JP7028807B2

JP7028807B2 - Genome assembly and haplotype phasing methods

Info

Publication number: JP7028807B2
Application number: JP2019002382A
Authority: JP
Inventors: Green, Richard E., Jr.; Lareau, Liana F.
Original assignee: University of California
Current assignee: University of California
Priority date: 2013-02-01
Filing date: 2019-01-10
Publication date: 2022-03-02
Anticipated expiration: 2034-01-31
Also published as: CA2899020C; US20190080050A1; US20150363550A1; GB2519255B; CN108624668B; AU2020202992B2; GB2519255A; JP2022065109A; US11081209B2; EP2951319B1; CN105121661B; AU2014212152A1; GB201501001D0; AU2020202992A1; CN105121661A; GB2547875B; HK1218433A1; GB2547875A; EP2951319A1; EP2951319A4

Description

Cross-Reference to Related Applications
This application claims the benefit of U.S. Provisional Patent Application No. 61/759,941, filed February 1, 2013, and U.S. Provisional Patent Application No. 61/892,355, filed October 17, 2013, the disclosures of which are incorporated herein by reference.

The present disclosure provides methods of genome assembly and haplotype phasing for identifying short-, medium-, and long-range connections within a genome.

Producing high-quality, highly contiguous genome sequences remains difficult, both in theory and in practice.

A persistent shortcoming of next-generation sequencing (NGS) data is that, because of short read lengths and relatively small insert sizes, it cannot span large repetitive regions of the genome. This shortcoming severely affects de novo assembly: contigs separated by long repetitive regions cannot be joined or ordered, because the nature and placement of genomic rearrangements are uncertain. Moreover, phasing information cannot be determined, because variants cannot be unambiguously associated with haplotypes over long distances. The present disclosure can solve all of these problems simultaneously by generating extremely long-range read pairs (XLRPs) spanning genomic distances of hundreds of kilobases, and up to megabases given suitable input DNA. Such data can be invaluable for overcoming the substantial barriers posed by large repetitive regions of the genome, including centromeres. They enable cost-effective de novo assembly and can produce resequencing data of sufficient completeness and accuracy for personalized medicine.

Of central importance is the use of reconstituted chromatin to form associations between DNA segments that are far apart but molecularly linked. The present disclosure allows distant segments to be brought together by chromatin higher-order structure and covalently linked, thereby physically connecting previously distant portions of a DNA molecule. Subsequent processing allows the associated segments to be sequenced, producing read pairs whose separation on the genome extends up to the full length of the input DNA molecule. Because the two reads of each pair derive from the same molecule, the read pairs also carry phase information.

In some embodiments, the present disclosure provides methods that can produce a high-quality assembly from far less data than previously required. For example, the methods disclosed herein provide a genome assembly from only two lanes of Illumina HiSeq data.

In other embodiments, the present disclosure provides methods capable of chromosome-level phasing using a long-range read pair approach. For example, the methods disclosed herein can phase 90% or more of heterozygous single nucleotide polymorphisms (SNPs), each with at least 99% accuracy. This accuracy is comparable to that of phasing produced by substantially more costly, time-consuming, and labor-intensive methods.

In some examples, methods capable of producing genomic DNA fragments up to the megabase scale can be used together with the methods disclosed herein. Such long DNA fragments can be generated to confirm that the present methods produce read pairs spanning the longest fragments the extraction provides. In some cases, DNA fragments longer than 150 kbp can be extracted and used to generate XLRP libraries.

The present disclosure provides methods that greatly accelerate and improve de novo genome assembly. The methods disclosed herein use data analysis methods that enable rapid, inexpensive de novo assembly of genomes from one or more subjects. The disclosure further provides that these methods can be used in a variety of applications, including haplotype phasing and metagenomic analysis.

In certain embodiments, the present disclosure provides a method of genome assembly comprising: generating a plurality of contigs; generating a plurality of read pairs from data produced by probing the physical layout of chromosomes, chromatin, or reconstituted chromatin; mapping or assembling the read pairs to the contigs; constructing an adjacency matrix of contigs using the read mapping or assembly data; and analyzing the adjacency matrix to determine a path through the contigs that represents their order and/or orientation with respect to the genome. In a further embodiment, at least about 90% of the read pairs are weighted by a function of the distance from each read to the end of its contig, so as to incorporate information about which read pairs indicate short-range contacts and which indicate longer-range contacts. In other embodiments, the adjacency matrix can be rescaled to down-weight the many contacts on some contigs that represent promiscuous regions of the genome, such as conserved binding sites for one or more agents that regulate chromatin scaffold interactions, for example the transcriptional repressor CTCF.
In other embodiments, the present disclosure provides a method of genome assembly for a human subject, wherein the plurality of contigs is generated from the human subject's DNA, and the plurality of read pairs is generated by analyzing the subject's chromosomes, chromatin, or reconstituted chromatin made from the subject's naked DNA.
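The scaffolding steps above (drop intra-contig pairs, weight each inter-contig pair by how close its reads lie to contig ends, rescale promiscuous contigs, then extract an order) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the disclosure. The exponential weight `exp(-(d_a + d_b) / decay_bp)`, the 50 kb decay length, the square-root normalization used for rescaling, and the greedy path cover used for ordering are all hypothetical choices. Contig orientation is not modeled, and all function names are illustrative.

```python
import numpy as np


def contig_adjacency(pairs, contig_lengths, decay_bp=50_000.0):
    """Build a symmetric contig-by-contig link matrix from mapped read pairs.

    pairs: iterable of (contig_a, pos_a, contig_b, pos_b), 0-based positions.
    contig_lengths: dict mapping contig name -> length in bp.
    decay_bp: hypothetical length scale for the distance weighting.
    """
    names = sorted(contig_lengths)
    index = {name: i for i, name in enumerate(names)}
    matrix = np.zeros((len(names), len(names)))
    for ca, pa, cb, pb in pairs:
        if ca == cb:
            continue  # intra-contig pairs inform the distance model, not adjacency
        # Distance from each read to the nearer end of its contig: reads near
        # contig ends imply a short gap, so the join is considered more likely.
        da = min(pa, contig_lengths[ca] - pa)
        db = min(pb, contig_lengths[cb] - pb)
        weight = np.exp(-(da + db) / decay_bp)  # assumed exponential decay
        i, j = index[ca], index[cb]
        matrix[i, j] += weight
        matrix[j, i] += weight
    return names, matrix


def rescale(matrix):
    """Down-weight promiscuous contigs by their total link weight (assumed scheme)."""
    totals = matrix.sum(axis=1)
    totals[totals == 0] = 1.0  # avoid dividing by zero for unlinked contigs
    return matrix / np.sqrt(np.outer(totals, totals))


def greedy_order(names, matrix):
    """Greedy path cover: accept the strongest links while keeping every
    contig's degree <= 2 and never closing a cycle, then walk each path."""
    n = len(names)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        ((matrix[i, j], i, j) for i in range(n) for j in range(i + 1, n) if matrix[i, j] > 0),
        reverse=True,
    )
    neighbors = [[] for _ in range(n)]
    for _, i, j in edges:
        if len(neighbors[i]) < 2 and len(neighbors[j]) < 2 and find(i) != find(j):
            parent[find(i)] = find(j)
            neighbors[i].append(j)
            neighbors[j].append(i)

    paths, seen = [], set()
    for start in range(n):
        if start in seen or len(neighbors[start]) == 2:
            continue  # begin walks only at path ends or isolated contigs
        path, prev, cur = [], None, start
        while cur is not None:
            path.append(names[cur])
            seen.add(cur)
            nxt = [k for k in neighbors[cur] if k != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        paths.append(path)
    return paths
```

The greedy path cover is only one heuristic for reading an order out of the matrix. It is used here because it is short and guarantees that each output is a simple chain of contigs.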

In a further embodiment, the plurality of contigs can be generated by shotgun sequencing. This comprises fragmenting long stretches of a subject's DNA into random fragments of indeterminate size, sequencing the fragments with a high-throughput sequencing method to generate a plurality of sequencing reads, and assembling the sequencing reads to form the plurality of contigs.
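As a toy illustration of the last step, assembling overlapping reads into contigs, the sketch below implements a greedy suffix/prefix overlap assembler. This is not the assembler used in the disclosure; production short-read assemblers generally rely on more sophisticated graph-based approaches. The sketch assumes error-free reads from a non-repetitive sequence, and `greedy_assemble` and `min_len` are hypothetical names.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`
    (at least min_len long), or 0 if there is none."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the longest overlap until no
    overlap of at least min_len remains; the survivors are the contigs."""
    unique = list(dict.fromkeys(reads))
    # Drop reads wholly contained in another read; they add no new sequence.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

Assembly stops wherever no sufficiently long overlap exists. With real genomes this typically happens at repeats, which is why the output is a set of contigs rather than whole chromosomes.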

In certain embodiments, the plurality of read pairs can be generated by probing the physical layout of chromosomes, chromatin, or reconstituted chromatin using a Hi-C-based technique. In a further embodiment, the Hi-C-based technique comprises:
- cross-linking the chromosomes, chromatin, or reconstituted chromatin with a fixative, such as formaldehyde, to form DNA-protein cross-links;
- cutting the cross-linked DNA-protein with one or more restriction enzymes to generate a plurality of DNA-protein complexes with sticky ends;
- filling in the sticky ends with nucleotides containing one or more markers, such as biotin, to create blunt ends that are then ligated together;
- fragmenting the plurality of DNA-protein complexes;
- pulling down junction-containing fragments using the one or more markers; and
- sequencing the junction-containing fragments using a high-throughput sequencing method to generate a plurality of read pairs.
In a further embodiment, the read pairs for the methods disclosed herein are generated from data produced by probing the physical layout of reconstituted chromatin.
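Filling in a 5' overhang and then blunt-ligating two ends creates a characteristic junction sequence. Searching reads for that sequence is one way to recognize ligation products. The sketch below derives the junction for a palindromic recognition site, given the top-strand cut offset within the site. The enzymes in the checks (HindIII, A^AGCTT; MboI, ^GATC; NcoI, C^CATGG) are illustrative only, because the text above does not name specific restriction enzymes. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def filled_in_junction(site: str, cut: int) -> str:
    """Junction formed by cutting two copies of a palindromic site, filling in
    the 5' overhangs, and blunt-ligating the ends.

    cut: top-strand cut offset inside the site (e.g. 1 for A^AGCTT).
    """
    site = site.upper()
    if site != revcomp(site):
        raise ValueError("this sketch assumes a palindromic recognition site")
    n = len(site)
    if n - 2 * cut <= 0:
        raise ValueError("fill-in applies to 5' overhangs; this cut is blunt or 3'")
    # The left end is filled in across the overhang up to position n - cut;
    # the right end's blunted sequence begins at the cut position.
    return site[: n - cut] + site[cut:]
```

For example, a HindIII site (cut offset 1) yields the junction `AAGCTAGCTT`, and an MboI site (offset 0) yields `GATCGATC`.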

In various embodiments, the plurality of read pairs can be determined by probing the physical layout of chromosomes or chromatin isolated from cultured cells or primary tissue. In other embodiments, the read pairs can be determined by probing the physical layout of reconstituted chromatin formed by complexing isolated histones with naked DNA obtained from samples of one or more subjects.

In other embodiments, the present disclosure provides a method of determining haplotype phase comprising identifying one or more heterozygous sites in the plurality of read pairs. Phasing data for allelic variants can then be determined by identifying read pairs that contain a pair of heterozygous sites.
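The phasing step can be sketched as follows, assuming each informative read pair has already been reduced to a tuple `(site_i, allele_i, site_j, allele_j)` with alleles coded 0 (reference) or 1 (alternate). Because both reads come from one molecule, a pair showing alleles a_i and a_j implies that the haplotype carrying a_i at site i carries a_j at site j. Votes are summed per site pair and then resolved with a union-find that tracks relative phase, accepting the best-supported links first (a maximum-spanning-tree heuristic). That resolution strategy and all names are assumptions, not taken from the disclosure.

```python
from collections import defaultdict


def phase_from_read_pairs(observations):
    """Group heterozygous sites into phased blocks.

    Returns a list of blocks; each block maps site -> allele carried on
    haplotype 1, normalized so the block's first site carries allele 0.
    """
    # Net vote per site pair: +1 if allele codes match (same relative phase),
    # -1 if they differ (opposite relative phase).
    score = defaultdict(int)
    for si, ai, sj, aj in observations:
        if si != sj:
            score[(min(si, sj), max(si, sj))] += 1 if ai == aj else -1

    parent, parity = {}, {}  # parity[x] = h[x] XOR h[parent[x]]

    def find(x):
        if parent[x] == x:
            return x, 0
        root, p = find(parent[x])
        parity[x] ^= p  # re-express relative to the root
        parent[x] = root
        return root, parity[x]

    for a, b in score:
        for s in (a, b):
            parent.setdefault(s, s)
            parity.setdefault(s, 0)

    # Strongest evidence first; later contradictory links are ignored.
    for (a, b), s in sorted(score.items(), key=lambda kv: -abs(kv[1])):
        if s == 0:
            continue  # tied evidence gives no phase call
        ra, pa = find(a)
        rb, pb = find(b)
        if ra != rb:
            parent[ra] = rb
            parity[ra] = pa ^ pb ^ (0 if s > 0 else 1)

    blocks = defaultdict(dict)
    for site in sorted(parent):
        root, p = find(site)
        blocks[root][site] = p
    result = []
    for block in blocks.values():
        flip = block[min(block)]
        result.append({site: h ^ flip for site, h in sorted(block.items())})
    return sorted(result, key=min)
```

Processing links in order of support lets a well-supported pair outvote an occasional erroneous read pair between the same sites.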

In various embodiments, the present disclosure provides a method of high-throughput bacterial genome assembly comprising generating a plurality of read pairs by probing the physical layout of a plurality of microbial chromosomes using a modified Hi-C-based method. The modifications comprise collecting microorganisms from an environment and adding a fixative, such as formaldehyde, to form cross-links within each microbial cell. Read pairs that map to different contigs indicate which contigs derive from the same species.
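Because the cross-links form inside intact cells, read pairs that join two contigs should mostly connect contigs from the same genome. A minimal sketch of turning that into species bins, assuming read pairs have already been reduced to `(contig_a, contig_b)` labels, follows. Contigs joined by at least `min_links` read pairs are merged with union-find. The threshold and function name are hypothetical; the threshold is meant to suppress occasional spurious links between cells.

```python
from collections import Counter


def cluster_contigs_by_species(contigs, pair_contigs, min_links=2):
    """Bin contigs that are repeatedly linked by read pairs.

    contigs: list of contig names.
    pair_contigs: iterable of (contig_a, contig_b), one per mapped read pair.
    Returns a sorted list of sorted contig groups (putative species).
    """
    links = Counter()
    for a, b in pair_contigs:
        if a != b:  # intra-contig pairs carry no binning information
            links[tuple(sorted((a, b)))] += 1

    parent = {c: c for c in contigs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in links.items():
        if n >= min_links:
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())
```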

In some embodiments, the present disclosure provides a method of genome assembly comprising: (a) generating a plurality of contigs; (b) determining a plurality of read pairs from data generated by probing the physical layout of chromosomes, chromatin, or reconstituted chromatin; (c) mapping the plurality of read pairs to the plurality of contigs; (d) constructing an adjacency matrix of contigs using the read mapping data; and (e) analyzing the adjacency matrix to determine a path through the contigs that represents their order and/or orientation with respect to the genome.

In a further embodiment, the present disclosure provides a method of generating a plurality of read pairs by probing the physical layout of chromosomes, chromatin, or reconstituted chromatin using a Hi-C-based technique. In a further embodiment, the Hi-C-based technique comprises: (a) cross-linking the chromosomes, chromatin, or reconstituted chromatin with a fixative to form DNA-protein cross-links; (b) cutting the cross-linked DNA-protein with one or more restriction enzymes to generate a plurality of DNA-protein complexes with sticky ends; (c) filling in the sticky ends with nucleotides containing one or more markers to create blunt ends that are then ligated together; (d) shearing the plurality of DNA-protein complexes into fragments; (e) pulling down junction-containing fragments using the one or more markers; and (f) sequencing the junction-containing fragments using a high-throughput sequencing method to generate a plurality of read pairs.

In certain embodiments, the plurality of read pairs is determined by probing the physical layout of chromosomes or chromatin isolated from cultured cells or primary tissue. In other embodiments, the read pairs are determined by probing the physical layout of reconstituted chromatin formed by complexing isolated histones with naked DNA obtained from samples of one or more subjects.

In some embodiments, at least about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, or about 99% or more of the plurality of read pairs are weighted by a function of each read's distance to the end of its contig, to incorporate the higher probability of short contacts relative to long contacts. In some embodiments, the adjacency matrix is rescaled to down-weight the many contacts on some contigs that represent promiscuous regions of the genome.

In certain embodiments, the promiscuous regions of the genome comprise one or more conserved binding sites for one or more agents that regulate chromatin scaffold interactions. In some examples, the agent is the transcriptional repressor CTCF.

In some embodiments, the methods disclosed herein provide genome assembly for a human subject, wherein the plurality of contigs is generated from the human subject's DNA, and the plurality of read pairs is generated by analyzing the subject's chromosomes, chromatin, or reconstituted chromatin made from the subject's naked DNA.

In other embodiments, the present disclosure provides a method of determining haplotype phase comprising identifying one or more heterozygous sites in the plurality of read pairs, wherein phasing data for allelic variants can be determined by identifying read pairs that contain a pair of heterozygous sites.

In yet other embodiments, the present disclosure provides a method of metagenomic assembly in which the plurality of read pairs is generated by probing the physical layout of a plurality of microbial chromosomes using a modified Hi-C-based method. The method comprises collecting microorganisms from an environment and adding a fixative to form cross-links within each microbial cell. Read pairs that map to different contigs indicate which contigs derive from the same species. In some examples, the fixative is formaldehyde.

In some embodiments, the present disclosure provides a method of assembling a plurality of contigs derived from a single DNA molecule, comprising generating a plurality of read pairs from the single DNA molecule and assembling the contigs using the read pairs. At least 1% of the read pairs span a distance greater than 50 kb on the single DNA molecule, and the read pairs are generated within 14 days. In some embodiments, at least 10% of the read pairs span a distance greater than 50 kb on the single DNA molecule. In other embodiments, at least 1% of the read pairs span a distance greater than 100 kb on the single DNA molecule. In a further embodiment, the read pairs are generated within 7 days.
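Span criteria such as "at least 1% of read pairs span more than 50 kb" reduce to a simple fraction over read-pair separations. A sketch follows, assuming both reads of each pair have been placed on the same molecule or scaffold so that their positions are comparable. The function name is illustrative.

```python
def fraction_spanning(pair_positions, threshold_bp):
    """Fraction of read pairs whose two reads lie more than threshold_bp apart.

    pair_positions: iterable of (pos_read1, pos_read2) in bp on the same
    molecule or scaffold.
    """
    separations = [abs(p2 - p1) for p1, p2 in pair_positions]
    if not separations:
        raise ValueError("no read pairs")
    return sum(d > threshold_bp for d in separations) / len(separations)
```

A library would meet the 1% / 50 kb criterion when `fraction_spanning(pairs, 50_000) >= 0.01`.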

In other embodiments, the present disclosure provides a method of assembling a plurality of contigs derived from a single DNA molecule, comprising generating a plurality of read pairs from the single DNA molecule in vitro and assembling the contigs using the read pairs, wherein at least 1% of the read pairs span a distance greater than 30 kb on the single DNA molecule. In some embodiments, at least 10% of the read pairs span a distance greater than 30 kb on the single DNA molecule. In other embodiments, at least 1% of the read pairs span a distance greater than 50 kb on the single DNA molecule.

In yet other embodiments, the present disclosure provides a method of haplotype phasing comprising generating a plurality of read pairs from a single DNA molecule and assembling a plurality of contigs of the DNA molecule using the read pairs. At least 1% of the read pairs span a distance greater than 50 kb on the single DNA molecule, and haplotype phasing is performed with an accuracy greater than 70%. In some embodiments, at least 10% of the read pairs span a distance greater than 50 kb on the single DNA molecule. In other embodiments, at least 1% of the read pairs span a distance greater than 100 kb on the single DNA molecule. In a further embodiment, haplotype phasing is performed with an accuracy greater than 90%.

In a further embodiment, the present disclosure provides a method of haplotype phasing comprising generating a plurality of read pairs from a single DNA molecule in vitro and assembling a plurality of contigs of the DNA molecule using the read pairs. At least 1% of the read pairs span a distance greater than 30 kb on the single DNA molecule, and haplotype phasing is performed with an accuracy greater than 70%. In some embodiments, at least 10% of the read pairs span a distance greater than 30 kb on the single DNA molecule. In other embodiments, at least 1% of the read pairs span a distance greater than 50 kb on the single DNA molecule. In yet other embodiments, haplotype phasing is performed with an accuracy greater than 90%. In a further embodiment, haplotype phasing is performed with an accuracy greater than 70%.
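Accuracy thresholds such as 70% or 90% presuppose a way to score a phased block against a known answer. The sketch below uses one simple per-site measure, assumed here rather than taken from the disclosure. It compares predicted and reference haplotypes while allowing the two haplotype labels to be swapped globally, since a block's labeling is arbitrary. Switch-error rate is a common alternative measure.

```python
def phasing_accuracy(predicted, truth):
    """Fraction of shared heterozygous sites phased correctly in one block.

    predicted, truth: dict mapping site -> allele (0/1) carried on haplotype 1.
    The better of the two global label assignments is scored.
    """
    shared = sorted(set(predicted) & set(truth))
    if not shared:
        raise ValueError("no sites in common")
    agree = sum(predicted[s] == truth[s] for s in shared)
    return max(agree, len(shared) - agree) / len(shared)
```

For example, a block that agrees with the truth at one of four sites has, after swapping labels, three of four sites correct (0.75). That score clears the 70% threshold but not the 90% one.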

In some embodiments, the present disclosure provides a method of generating a first read pair from a first DNA molecule, comprising: (a) cross-linking the first DNA molecule in vitro, wherein the first DNA molecule comprises a first DNA segment and a second DNA segment; (b) linking the first DNA segment to the second DNA segment, thereby forming a linked DNA segment; and (c) sequencing the linked DNA segment, thereby obtaining the first read pair.

In some embodiments, a plurality of association molecules, such as those derived from reconstituted chromatin, are cross-linked to the first DNA molecule. In some examples, the association molecules comprise amino acids. In further examples, the association molecules are peptides or proteins. In certain embodiments, the first DNA molecule is cross-linked with a fixative. In some examples, the fixative is formaldehyde. In some embodiments, the first and second DNA segments are generated by cutting the first DNA molecule. In certain embodiments, the method further comprises assembling a plurality of contigs of the first DNA molecule using the first read pair. In some embodiments, each of the first and second DNA segments is connected to at least one affinity label, and the linked DNA segment is captured using the affinity label.

In a further embodiment, the method further comprises: (a) providing a plurality of association molecules, such as those derived from reconstituted chromatin, to at least a second DNA molecule; (b) cross-linking the association molecules to the second DNA molecule, thereby forming a second complex in vitro; (c) cutting the second complex, thereby generating a third DNA segment and a fourth DNA segment; (d) linking the third DNA segment to the fourth DNA segment, thereby forming a second linked DNA segment; and (e) sequencing the second linked DNA segment, thereby obtaining a second read pair. In some examples, less than 40% of the DNA segments from a given DNA molecule are linked to DNA segments from any other DNA molecule. In further examples, less than 20% of the DNA segments from a given DNA molecule are linked to DNA segments from any other DNA molecule.

In other embodiments, the present disclosure provides a method of generating a first read pair from a first DNA molecule that comprises a predetermined sequence. The method comprises: (a) providing one or more DNA-binding molecules to the first DNA molecule, wherein the one or more DNA-binding molecules bind to the predetermined sequence; (b) cross-linking the first DNA molecule in vitro, wherein the first DNA molecule comprises a first DNA segment and a second DNA segment; (c) linking the first DNA segment to the second DNA segment, thereby forming a first linked DNA segment; and (d) sequencing the first linked DNA segment, thereby obtaining the first read pair. The probability that the predetermined sequence appears in the read pair is affected by the binding of the DNA-binding molecules to the predetermined sequence.

In some embodiments, the DNA-binding molecule is a nucleic acid capable of hybridizing to the predetermined sequence. In some examples, the nucleic acid is RNA. In other examples, the nucleic acid is DNA. In other embodiments, the DNA-binding molecule is a small molecule. In some examples, the small molecule binds to the predetermined sequence with a binding affinity of less than 100 μM. In further examples, the small molecule binds to the predetermined sequence with a binding affinity of less than 1 μM. In further embodiments, the DNA-binding molecule is immobilized on a surface or solid support.
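To see why the affinity thresholds matter, assume that "binding affinity" means a dissociation constant K_d for a single binding site, and that the small molecule is in large excess over its DNA targets, so its free concentration is roughly its total concentration. Under those assumptions, equilibrium site occupancy is θ = [L] / ([L] + K_d). The sketch below evaluates this expression; the function name and the 10 µM concentration used in the note are hypothetical.

```python
def fraction_bound(ligand_conc_uM: float, kd_uM: float) -> float:
    """Equilibrium occupancy of a single binding site with ligand in excess:
    theta = [L] / ([L] + Kd). Both arguments in the same units (here µM)."""
    if ligand_conc_uM < 0 or kd_uM <= 0:
        raise ValueError("concentration must be >= 0 and Kd > 0")
    return ligand_conc_uM / (ligand_conc_uM + kd_uM)
```

At 10 µM ligand, a binder with K_d = 1 µM would occupy about 91% of target sites, while one with K_d = 100 µM would occupy about 9%. A lower K_d therefore gives a much stronger effect on whether the predetermined sequence appears in read pairs.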

In some embodiments, the probability that the predetermined sequence appears in a read pair is decreased. In other embodiments, the probability that the predetermined sequence appears in a read pair is increased.

In yet other embodiments, the present disclosure provides an in vitro library comprising a plurality of read pairs, each comprising at least a first sequence element and a second sequence element. The first and second sequence elements derive from a single DNA molecule, and at least 1% of the read pairs comprise first and second sequence elements that are at least 50 kb apart on the single DNA molecule.

In some embodiments, at least 10% of the read pairs comprise first and second sequence elements that are at least 50 kb apart on the single DNA molecule. In other embodiments, at least 1% of the read pairs comprise first and second sequence elements that are at least 100 kb apart on the single DNA molecule.

In a further embodiment, less than 20% of the read pairs comprise one or more predetermined sequences. In a further embodiment, less than 10% of the read pairs comprise one or more predetermined sequences. In yet another embodiment, less than 5% of the read pairs comprise one or more predetermined sequences.
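Fractions such as "less than 20% of read pairs" can be measured by scanning both reads of each pair for the predetermined sequences on either strand. A minimal sketch follows, assuming exact string matching on plain-string reads. The function names are illustrative, and `TTAGGG` in the checks is an arbitrary placeholder target.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def fraction_with_targets(read_pairs, targets):
    """Fraction of read pairs in which either read contains any target
    sequence on either strand (exact matches only)."""
    if not read_pairs:
        raise ValueError("no read pairs")
    patterns = {t.upper() for t in targets} | {revcomp(t) for t in targets}
    hits = sum(
        1
        for r1, r2 in read_pairs
        if any(p in r1.upper() or p in r2.upper() for p in patterns)
    )
    return hits / len(read_pairs)
```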

In some embodiments, the predetermined sequence is defined by one or more nucleic acids capable of hybridizing to it. In some examples, the one or more nucleic acids are RNA. In other examples, the one or more nucleic acids are DNA. In further examples, the one or more nucleic acids are immobilized on a surface or solid support.

In other embodiments, the predetermined sequence is defined by one or more small molecules. In some examples, the one or more small molecules bind to the predetermined sequence with a binding affinity of less than 100 μM. In further examples, the one or more small molecules bind to the predetermined sequence with a binding affinity of less than 1 μM.

In some embodiments, the present disclosure provides a composition comprising a DNA fragment and a plurality of association molecules, such as those derived from reconstituted chromatin, wherein (a) the association molecules are cross-linked to the DNA fragment in an in vitro complex, and (b) the in vitro complex is immobilized on a solid support.

In other embodiments, the present disclosure provides a composition comprising a DNA fragment, a plurality of association molecules, and a DNA-binding molecule, wherein (a) the DNA-binding molecule is bound to a predetermined sequence of the DNA fragment, and (b) the association molecules are cross-linked to the DNA fragment.

In some embodiments, the DNA-binding molecule is a nucleic acid capable of hybridizing to the predetermined sequence. In some examples, the nucleic acid is RNA. In other examples, the nucleic acid is DNA. In further examples, the nucleic acid is immobilized on a surface or solid support.

In other embodiments, the DNA-binding molecule is a small molecule. In some examples, the small molecule binds to the predetermined sequence with a binding affinity of less than 100 μM. In other examples, the small molecule binds to the predetermined sequence with a binding affinity of less than 1 μM.

Incorporation by Reference
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application were specifically and individually indicated to be incorporated by reference. All publications, patents, and patent applications mentioned in this specification are incorporated herein by reference in their entirety, together with any references cited therein.

The novel features of the present disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description, which sets forth illustrative embodiments in which the principles of the disclosure are utilized, and to the accompanying drawings.

FIG. 1 illustrates genome assembly using high-throughput sequencing reads. The genome to be assembled is shown at top. Genomes generally contain many repetitive sequences that are difficult to assemble. Random, high-throughput sequence data are collected from the genome (middle) and assembled into "contigs" covering unique regions of the genome (bottom). Contig assembly usually stops at the many repeats. The final output is a set of thousands of contigs whose order and orientation relative to one another are unknown. In the figure, the contigs are numbered arbitrarily from longest to shortest.

FIG. 2 illustrates a Hi-C-based procedure of the present disclosure. (A) shows how DNA is crosslinked and processed to produce biotinylated junction fragments for sequencing. (B-D) provide contact-map data on human chr14 for various restriction enzymes. As shown, most contacts are local along the chromosome.

FIG. 3 illustrates methods of the present disclosure that use Hi-C sequence data to aid genome assembly. (A) illustrates how DNA is crosslinked and processed using a Hi-C-based procedure. (B) shows where read-pair data map onto assembled contigs generated by random shotgun sequencing and assembly. (C) shows that, after filtering and weighting, an adjacency matrix summarizing all inter-contig read-pair data can be constructed. This matrix can be reordered to reveal the correct assembly path. As shown, most read pairs map within contigs, from which the distribution of contact distances can be learned (see, e.g., FIG. 6). Read pairs that map to different contigs provide data on which contigs are adjacent in the correct genome assembly.

FIG. 4 illustrates an exemplary procedure of the present disclosure. DNA fragments are first generated and prepared, then assembled into chromatin in vitro and biotinylated. The chromatin/DNA complexes are fixed with formaldehyde and pulled down with streptavidin beads. The complexes are then restriction-digested to generate sticky ends, which are filled in with biotinylated dCTP and, internally, with sulfated GTP. After blunt-end ligation, the chromatin/DNA complexes are subjected to proteinase digestion, exonuclease digestion, and shearing. The DNA fragments are then pulled down via biotin and ligated to sequencing adapters. Finally, the DNA fragments are size-selected and sequenced.

FIG. 5 illustrates the ambiguities in genome assembly and alignment that arise from repetitive regions of the genome. (A) shows uncertainty in linkage caused by read pairs that cannot bridge a repeat region. (B) shows uncertainty in the placement of a segment because read pairs cannot cross the repeat boundaries.

FIG. 6 illustrates the distribution of genomic distances between read pairs in a human XLRP library. The maximum distances achievable with other technologies are shown for comparison.

FIG. 7 illustrates phasing accuracy for sample NA12878, whose haplotypes are well characterized. The distances shown are between phased SNPs.

FIG. 8 illustrates various components of an exemplary computer system according to various embodiments of the present disclosure.

FIG. 9 is a block diagram illustrating the architecture of an exemplary computer system that can be used in connection with various embodiments of the present disclosure.

FIG. 10 is a diagram illustrating an exemplary computer network that can be used in connection with various embodiments of the present disclosure.

FIG. 11 is a block diagram illustrating the architecture of another exemplary computer system that can be used in connection with various embodiments of the present disclosure.
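The inter-contig adjacency matrix described in panel (C) of the Hi-C assembly figure above can be illustrated with a minimal Python sketch. It assumes that each read pair has already been mapped to a (contig id, position) for each end. The function name `contig_adjacency` and the input layout are hypothetical, and the filtering and weighting steps mentioned in the figure description are omitted.

```python
from collections import Counter


def contig_adjacency(read_pairs, n_contigs):
    """Count inter-contig read pairs into a symmetric n x n matrix (hypothetical helper).

    read_pairs: iterable of ((contig_a, pos_a), (contig_b, pos_b)) tuples, where
    contig ids are integers in range(n_contigs).
    Returns (matrix, intra): the inter-contig link counts and the number of pairs
    whose two ends fall on the same contig.
    """
    links = Counter()
    intra = 0
    for (ca, _pa), (cb, _pb) in read_pairs:
        if ca == cb:
            # Both ends on one contig: informative about contact distances, not adjacency.
            intra += 1
            continue
        links[(min(ca, cb), max(ca, cb))] += 1
    matrix = [[0] * n_contigs for _ in range(n_contigs)]
    for (i, j), count in links.items():
        matrix[i][j] = matrix[j][i] = count
    return matrix, intra


pairs = [((0, 100), (0, 900)), ((0, 950), (1, 20)), ((1, 40), (0, 990)), ((1, 500), (2, 10))]
m, intra = contig_adjacency(pairs, 3)
print(m)      # [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
print(intra)  # 1
```

Keeping the matrix symmetric reflects that a read pair links two contigs without any inherent direction. Reordering the rows and columns of such a matrix to concentrate counts near the diagonal corresponds to the "correct assembly path" idea in the figure description.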

As used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a contig" includes a plurality of such contigs, and reference to "probing the physical layout of chromosomes" includes reference to one or more methods for probing the physical layout of chromosomes and equivalents thereof known to those skilled in the art, and so forth.

The use of "and" also means "and/or" unless stated otherwise. Similarly, "comprise," "comprises," "comprising," "include," "includes," and "including" are interchangeable and are not intended to be limiting.

It is further to be understood that where descriptions of various embodiments use the term "comprising," those skilled in the art would understand that in some specific instances an embodiment can alternatively be described using the language "consisting essentially of" or "consisting of."

As used herein, the term "sequencing read" refers to a fragment of DNA whose sequence has been determined.

As used herein, the term "contig" refers to a contiguous region of DNA sequence. Contigs can be determined by any of a number of methods known in the art. Examples include comparing sequencing reads for overlapping sequence and/or comparing sequencing reads against databases of known sequences, in order to identify which sequencing reads have a high probability of being contiguous.
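Comparing reads for overlapping sequence can be illustrated with a toy greedy overlap merger. This is a minimal sketch under stated assumptions: exact-match overlaps only, no sequencing errors, and no reverse complements. The helper names `overlap` and `greedy_contigs` are hypothetical. Production assemblers use overlap graphs or de Bruijn graphs rather than this quadratic greedy loop.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that exactly equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_contigs(reads, min_len=3):
    """Repeatedly merge the two reads with the largest exact overlap (hypothetical helper)."""
    reads = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return reads  # no overlaps remain: each remaining string is a contig
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]


print(greedy_contigs(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
# ['ATTAGACCTGCCGGAATAC']
```

Reads that share no overlap of at least `min_len` bases are left as separate contigs. This is the situation the disclosure describes when assembly stops at repeats and the contigs' relative order is unknown.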

As used herein, the term "subject" can refer to any eukaryotic or prokaryotic organism.

As used herein, the term "naked DNA" can refer to DNA that is substantially free of complexed proteins. For example, it can refer to DNA complexed with less than about 50%, about 40%, about 30%, about 20%, about 10%, about 5%, or about 1% of the endogenous proteins found in the cell nucleus.

As used herein, the term "reconstituted chromatin" can refer to chromatin formed by complexing isolated nuclear proteins with naked DNA.

As used herein, the term "read pair" or "read-pair" can refer to two or more elements that are linked to provide sequence information. In some cases, the number of read pairs can refer to the number of mappable read pairs. In other cases, the number of read pairs can refer to the total number of read pairs generated.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although any methods and reagents similar or equivalent to those described herein can be used in the practice of the disclosed methods and compositions, exemplary methods and materials are now described.

The present disclosure provides methods for generating extremely long-range read pairs and for using the resulting data to advance all of the tasks described above. In some embodiments, the present disclosure provides a method of producing a highly contiguous and accurate human genome assembly with only about 3 million read pairs. In other embodiments, the present disclosure provides a method of phasing 90% or more of the heterozygous variants in a human genome with 99% or greater accuracy. Furthermore, the range of the read pairs generated by the present disclosure can be extended to span even larger genomic distances. Assemblies are produced from a standard shotgun library as well as from an extremely long-range read-pair library. In still other embodiments, the present disclosure provides software capable of using both of these sets of sequencing data. Phased variants are produced from a single long-range read-pair library. The reads from this library are mapped to a reference genome and then used to assign each variant to one of the individual's two parental chromosomes. Finally, the present disclosure provides for extracting even larger DNA fragments using known techniques to generate extremely long reads.

The mechanism by which these repeats obstruct assembly and alignment is quite straightforward and is ultimately a consequence of ambiguity (FIG. 5). For large repeat regions, the difficulty is one of span. If a read or read pair is not long enough to span the repeat region, the regions flanking the repeat element cannot be confidently connected. For smaller repeat elements, the problem is primarily one of placement. If a region is flanked by two repeat elements that are common in the genome, its exact placement is difficult, if not impossible, to determine. This is because the flanking elements resemble all other members of their class. In both cases, the repeats lack distinguishing information, which makes a particular repeat difficult to identify and therefore to place. What is needed is the ability to establish experimentally the connections between unique segments flanked by, or separated by, repeat regions.
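The span problem can be made concrete with a small coordinate check. This sketch assumes the true positions of the reads and of the repeat are known (for example, on a reference). The names `Interval` and `bridges_repeat` and the example coordinates are hypothetical. A read pair can connect the two unique flanks only if one read lies wholly upstream of the repeat and the other wholly downstream.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def bridges_repeat(read_1, read_2, repeat):
    """True if one read lies wholly upstream and the other wholly downstream of the repeat.

    Reads overlapping the repeat are treated as unanchored because their placement is
    ambiguous (the placement problem). Pairs whose far read does not reach past the
    repeat cannot connect its flanks (the span problem).
    """
    a, b = sorted([read_1, read_2])
    return a.end <= repeat.start and b.start >= repeat.end


repeat = Interval(10_000, 60_000)                                  # a hypothetical 50 kb repeat
short_pair = (Interval(9_000, 9_150), Interval(9_400, 9_550))      # ~550 bp insert
long_pair = (Interval(8_000, 8_150), Interval(250_000, 250_150))   # long-range pair
print(bridges_repeat(*short_pair, repeat))  # False
print(bridges_repeat(*long_pair, repeat))   # True
```

However the reads are generated, a pair can only bridge a repeat if the distance between its ends exceeds the repeat length. This is why read pairs spanning hundreds of kilobases matter for repeats such as centromeres.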

By overcoming the substantial barriers posed by these repeat regions, the methods of the present disclosure can greatly advance the field of genomic research and enable important progress in many domains of genomic analysis. To perform de novo assembly with prior techniques, one must either accept an assembly fragmented into many small scaffolds or invest considerable time and resources. That investment means building large-insert libraries or using other approaches to generate a more contiguous assembly. Such approaches may include obtaining very deep sequencing coverage, constructing BAC or fosmid libraries, optical mapping, or, most often, some combination of these and other techniques. Their heavy resource and time requirements put such approaches out of reach for most small laboratories and hinder research on organisms other than model organisms. Because the methods described herein can produce extremely long-range read pairs, de novo assembly can be achieved with a single sequencing run. This can reduce assembly costs by orders of magnitude and shorten the required time from months or years to weeks.
In some cases, the methods disclosed herein allow a plurality of read pairs to be generated in less than 14 days, less than 13 days, less than 12 days, less than 11 days, less than 10 days, less than 9 days, less than 8 days, less than 7 days, less than 6 days, less than 5 days, less than 4 days, or within a range defined by any two of the foregoing times. For example, the method may allow a plurality of read pairs to be generated in about 10 to 14 days. Genome assembly for organisms from any ecological niche could become routine. Phylogenetic analyses would no longer suffer from a shortage of comparisons, and projects such as Genome 10K could become a reality.

Similarly, methods of structural and phasing analysis for medical purposes remain challenging. Cancers are remarkably heterogeneous, across individuals with the same cancer type and even within a single tumor. Disentangling causes from their resulting effects requires very high accuracy and throughput at a low cost per sample. In personalized medicine, one of the ultimate standards of genomic care is a sequenced genome in which all variants are fully characterized and phased, including large and small structural rearrangements and novel mutations. Achieving this with prior techniques requires effort comparable to that needed for de novo assembly, which is currently too expensive and labor-intensive for routine medical care. The disclosed methods can rapidly produce complete and accurate genomes at low cost, opening up many long-sought possibilities in the study and treatment of human disease.

Finally, applying the methods disclosed herein to phasing can combine the accuracy of familial (pedigree) analysis with the convenience of statistical methods, saving money, labor, and samples compared with using either approach alone. De novo variant phasing is a highly desirable phasing analysis that is prohibitively expensive with prior techniques, and it can readily be performed using the methods disclosed herein. This is especially important because the overwhelming majority of human variants are rare (minor allele frequency below 5%). Phasing information is valuable for population genetic studies, which benefit greatly from highly connected haplotype networks (sets of variants assigned to a single chromosome) compared with unlinked genotypes. Haplotype information can enable higher-resolution studies of historical changes in population size, migration, and exchange between subpopulations, and allows specific variants to be traced back to an individual's parents and grandparents. This in turn clarifies how disease-associated variants are inherited and how variants interact when they occur together in a single individual. The methods of the present disclosure can ultimately enable the preparation, sequencing, and analysis of extremely long-range read-pair (XLRP) libraries.
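How read pairs link heterozygous variants into a haplotype can be illustrated for the simplest case of two SNPs. This is a minimal sketch, not the disclosure's phasing method. It assumes each read pair has one end covering each SNP and that alleles are coded 0 (reference) or 1 (alternate), and the function name `phase_pair` is hypothetical. Pairs showing the same allele class at both SNPs vote "cis", mixed pairs vote "trans", and the majority wins. Genome-wide phasing tools instead combine many such pairwise votes in a larger optimization.

```python
from collections import Counter


def phase_pair(observations):
    """Call two heterozygous SNPs as cis or trans from read-pair evidence (hypothetical helper).

    observations: list of (allele_at_snp1, allele_at_snp2), alleles coded 0 (ref) or 1 (alt),
    one tuple per read pair linking the two SNPs. "cis" means ref/ref and alt/alt share a
    chromosome; "trans" means ref/alt and alt/ref do.
    Returns (call, fraction of read pairs supporting the call).
    """
    votes = Counter("cis" if a == b else "trans" for a, b in observations)
    total = sum(votes.values())
    if total == 0:
        raise ValueError("no read pairs link these two SNPs")
    call, n = votes.most_common(1)[0]
    return call, n / total


obs = [(0, 0), (1, 1), (1, 1), (0, 1), (0, 0)]
print(phase_pair(obs))  # ('cis', 0.8)
```

The support fraction gives a rough confidence for each link. Chaining high-confidence links between neighbouring SNPs is what builds the connected haplotype networks described above, and long-range read pairs let those links reach across widely separated variants.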

In some embodiments of the present disclosure, a tissue or DNA sample can be provided from a subject. The method can then return an assembled genome, alignments with called variants (including large structural variants), phased variant calls, or any additional analyses. In other embodiments, the methods disclosed herein can provide the XLRP library directly to an individual.

In various embodiments of the present disclosure, the methods disclosed herein can generate extremely long-range read pairs whose ends are separated by long genomic distances. The upper limit of this distance can be raised by improving the ability to collect DNA samples of large size. In some cases, read pairs can span genomic distances of up to 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 4000, 5000 kbp or longer. In some examples, read pairs can span genomic distances of up to 500 kbp. In other examples, read pairs can span genomic distances of up to 2000 kbp. The methods disclosed herein are compatible with, and can build upon, standard techniques in molecular biology, and are further well suited to improvements in efficiency, specificity, and genomic coverage. In some cases, read pairs can be generated in less than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 60, or 90 days. In some examples, read pairs can be generated in less than about 14 days.
In further examples, read pairs can be generated in less than about 10 days. In some cases, the methods of the present disclosure can provide more than about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, about 99%, or about 100% of read pairs with an accuracy of at least about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, about 99%, or about 100% in correctly ordering and/or orienting a plurality of contigs. For example, the method can provide about 90-100% accuracy in correctly ordering and/or orienting a plurality of contigs.
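One way to quantify "accuracy in correctly ordering and/or orienting a plurality of contigs" is to score each adjacent join in a predicted scaffold against a known true layout. This is a minimal sketch of that idea. The metric, the function name `join_accuracy`, and the `(contig_id, strand)` representation are assumptions for illustration, not a definition taken from the disclosure. A join counts as correct if it appears in the true layout either as written or as its reverse complement, because a scaffold has no intrinsic direction.

```python
def join_accuracy(predicted, truth):
    """Fraction of adjacent joins in `predicted` that match `truth` in order and orientation.

    Both arguments are lists of (contig_id, strand) with strand "+" or "-".
    A join (x, y) is correct if x->y or its reverse complement (flip(y)->flip(x))
    occurs consecutively in the true layout.
    """
    flip = {"+": "-", "-": "+"}
    true_joins = set()
    for (a, sa), (b, sb) in zip(truth, truth[1:]):
        true_joins.add(((a, sa), (b, sb)))
        true_joins.add(((b, flip[sb]), (a, flip[sa])))  # same join read from the other strand
    joins = list(zip(predicted, predicted[1:]))
    if not joins:
        raise ValueError("need at least two contigs to score joins")
    return sum(join in true_joins for join in joins) / len(joins)


truth = [(1, "+"), (2, "-"), (3, "+"), (4, "+")]
print(join_accuracy([(4, "-"), (3, "-"), (2, "+"), (1, "-")], truth))  # 1.0 (whole scaffold reversed)
print(join_accuracy([(1, "+"), (2, "+"), (3, "+"), (4, "+")], truth))  # contig 2 flipped: 1/3
```

Scoring joins rather than absolute positions means that a single misoriented contig costs two joins instead of invalidating the entire scaffold.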

In other embodiments, the methods disclosed herein can be used together with currently available sequencing technologies. For example, the methods can be used in combination with well-tested and/or widely deployed sequencing instruments. In further embodiments, the methods disclosed herein can be used together with technologies and approaches derived from currently available sequencing technologies.

The methods of the present disclosure dramatically simplify de novo genome assembly for a wide variety of organisms. With prior techniques, such assemblies are currently limited by the short inserts of economical mate-pair libraries. Read pairs at genomic distances of up to 40-50 kbp can be generated with fosmids. However, fosmids are expensive and cumbersome, and these distances are too short to span the longest repetitive stretches, including those within centromeres, which range in size from 300 kbp to 5 Mbp in humans. The methods disclosed herein can provide read pairs capable of spanning long distances (e.g., megabases or more), thereby overcoming these scaffold-completeness challenges. Producing chromosome-level assemblies can therefore become routine with the methods of the present disclosure. More time- and labor-intensive means of assembly currently cost laboratories enormous amounts of time and money and hinder the growth of the catalog of genomes. Those means become unnecessary, freeing resources for more meaningful analyses. Similarly, obtaining long-range phasing information can provide substantial additional power for population genomics, phylogenetics, and disease studies.
The methods disclosed herein enable accurate phasing for large numbers of individuals, thereby extending the breadth and depth of the ability to probe genomes at the population level and across deep time.

In the field of personalized medicine, XLRP read pairs generated by the methods disclosed herein represent a significant advance toward accurate, low-cost, phased, and rapidly produced personal genomes. Current methods are inadequate for phasing variants over long distances, which hinders characterization of the phenotypic effects of complex heterozygous genotypes. In addition, structural variants of substantial interest in genomic disease are difficult to identify and characterize accurately with current techniques. The difficulty arises because these variants are large relative to the reads and read-pair inserts used to study them. Read pairs spanning tens of kilobases to megabases or longer can help alleviate this difficulty, enabling highly parallel, individualized analysis of structural variation.

Basic evolutionary and biomedical research is being driven by technological advances in high-throughput sequencing. Whole-genome sequencing and assembly used to be the province of large genome sequencing centers. Commercial sequencers are now inexpensive enough that most research universities have one or several of these machines. Generating large amounts of DNA sequence data is now relatively inexpensive. With current technology, however, producing high-quality, highly contiguous genome sequences remains difficult both in theory and in practice. Furthermore, most organisms of interest, including humans, are diploid, so each individual has two haploid copies of the genome. At heterozygous sites (e.g., where the allele contributed by the mother differs from the allele contributed by the father), it is difficult to know which set of alleles came from which parent. Resolving this is known as haplotype phasing. Phasing information can be used to carry out many evolutionary and biomedical studies, such as disease and trait association studies.

In various embodiments, the present disclosure provides methods of genome assembly that combine DNA preparation techniques with paired-end sequencing to discover short-, medium-, and long-range connections within a given genome at high throughput. The present disclosure further provides methods that use these connections to assist genome assembly, haplotype phasing, and/or metagenomic studies. The methods presented herein can be used to determine the assembly of a subject's genome. It should also be understood that they can be used to determine the assembly of a portion of the subject's genome, such as a chromosome, or of the subject's chromatin of various lengths.

In some embodiments, the present disclosure provides one or more methods disclosed herein comprising generating a plurality of contigs from sequenced fragments of target DNA obtained from a subject. Long stretches of target DNA can be fragmented by cutting the DNA with one or more restriction enzymes, by shearing the DNA, or by a combination thereof. The resulting fragments can be sequenced using high-throughput sequencing methods to obtain a plurality of sequencing reads. High-throughput sequencing methods that can be used in the methods of the present disclosure include, but are not limited to, the following:
- 454 pyrosequencing, developed by Roche Diagnostics;
- "clusters" sequencing, developed by Illumina;
- SOLiD and ion semiconductor sequencing, developed by Life Technologies;
- DNA nanoball sequencing, developed by Complete Genomics.
The overlapping ends of different sequencing reads can then be assembled to form contigs. Alternatively, the fragmented target DNA can be cloned into vectors. Cells or organisms are then transfected with the DNA vectors to form a library. After the transfected cells or organisms replicate, the vectors are isolated and sequenced to generate a plurality of sequencing reads. The overlapping ends of different sequencing reads can then be assembled to form contigs.
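Fragmenting DNA by cutting with a restriction enzyme can be sketched as an in silico digest. This is a minimal sketch. The function name `digest` and its parameters are hypothetical. The defaults model a 4-bp recognition site GATC cut immediately 5' of the site, as MboI/DpnII do. Only the top strand is modelled, and overhangs, methylation sensitivity, and incomplete digestion are ignored.

```python
import re


def digest(seq, site="GATC", cut_offset=0):
    """Split `seq` at every occurrence of `site`, cutting `cut_offset` bases into the site.

    Hypothetical helper: top strand only; assumes complete digestion.
    """
    # Zero-width lookahead so that overlapping occurrences of the site are all found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return [seq[i:j] for i, j in zip(bounds, bounds[1:]) if j > i]


print(digest("AAGATCTTTTGATCCC"))  # ['AA', 'GATCTTTT', 'GATCCC']
```

Changing `site` and `cut_offset` models other enzymes. The fragment lengths a given enzyme produces on a genome determine how finely it partitions that genome.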

As shown in FIG. 1, genome assembly can be particularly problematic with high-throughput sequencing technologies. Assemblies often consist of thousands or tens of thousands of short contigs. The order and orientation of these contigs are generally unknown, which limits the usefulness of the genome assembly. Techniques exist for ordering and orienting these scaffolds, but they are generally expensive and labor-intensive, and they often fail to detect very long-range interactions.

Samples containing the target DNA used to generate contigs can be obtained from a subject by many means. These include collecting body fluids (e.g., blood, urine, serum, lymph, saliva, anal and vaginal secretions, sweat, and semen), taking tissue, or collecting cells/organisms. The obtained sample may consist of a single type of cell/organism or of multiple types of cells/organisms. DNA can be extracted and prepared from the subject's sample. For example, the sample can be treated with known lysis buffers, sonication, electroporation, and the like to lyse the cells containing the polynucleotides. The target DNA can be further purified by alcohol extraction, cesium gradients, and/or column chromatography to remove contaminants such as proteins.

In other embodiments of the present disclosure, methods for extracting extremely high-molecular-weight DNA are provided. In some cases, the data from an XLRP library can be improved by increasing the fragment size of the input DNA. In some examples, extracting megabase-sized DNA fragments from cells can produce read pairs separated by several megabases in the genome. In some cases, the read pairs produced can provide sequence information over ranges greater than about 10 kb, about 50 kb, about 100 kb, about 200 kb, about 500 kb, about 1 Mb, about 2 Mb, about 5 Mb, about 10 Mb, or about 100 Mb. In some examples, the read pairs can provide sequence information over ranges greater than about 500 kb. In further examples, the read pairs can provide sequence information over ranges greater than about 2 Mb. In some cases, extremely high-molecular-weight DNA can be extracted using very gentle cell lysis (Teague, B. et al. (2010) Proc. Natl. Acad. Sci. USA 107(24):10848-53) and agarose plugs (Schwartz, D. C. and Cantor, C. R. (1984) Cell 37(1):67-75). In other cases, extremely high-molecular-weight DNA can be extracted using commercially available machines capable of purifying DNA molecules up to megabases in length.
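Claims such as "sequence information over ranges greater than about 500 kb" can be checked on real data by summarizing the separation between the two ends of each read pair. This is a minimal sketch. It assumes that both ends of each pair map to the same chromosome and that their positions are given in base pairs. The function name `span_fractions` and the default thresholds are hypothetical choices for illustration.

```python
def span_fractions(pair_positions, thresholds_kb=(50, 500, 2000)):
    """Fraction of same-chromosome read pairs whose ends are at least each threshold apart.

    pair_positions: list of (pos_end1, pos_end2) in base pairs on one chromosome.
    Returns {threshold_kb: fraction}.
    """
    spans_kb = [abs(b - a) / 1_000 for a, b in pair_positions]
    n = len(spans_kb)
    if n == 0:
        raise ValueError("no read pairs supplied")
    return {t: sum(s >= t for s in spans_kb) / n for t in thresholds_kb}


pairs = [(0, 1_000), (0, 60_000), (100, 600_100), (5_000, 2_105_000)]
print(span_fractions(pairs))  # {50: 0.75, 500: 0.5, 2000: 0.25}
```

A cumulative summary of this kind is one simple way to compare libraries made from input DNA of different fragment sizes. Larger input fragments should shift more of the mass toward the higher thresholds.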

In various embodiments, the present disclosure provides one or more of the methods disclosed herein comprising a step of probing the physical layout of chromosomes in living cells. Examples of techniques that probe the physical layout of chromosomes through sequencing include "C"-family techniques such as chromosome conformation capture ("3C"), circular chromosome conformation capture ("4C"), carbon-copy chromosome capture ("5C"), and Hi-C-based methods, as well as ChIP-based methods such as ChIP-loop and ChIP-PET. These techniques rely on fixing chromatin in living cells to lock in spatial relationships within the nucleus. Subsequent processing and sequencing of the products allow researchers to recover a matrix of the closest associations among genomic regions. With further analysis, these associations can be used to build a three-dimensional geometric map of the chromosomes as they are physically arranged in the living nucleus. Such techniques describe the discrete spatial organization of chromosomes in living cells and give an accurate view of the functional interactions among chromosomal loci. One problem that has plagued these functional studies is the presence of nonspecific interactions, that is, associations present in the data that arise merely from chromosomal proximity. In the present disclosure, these nonspecific intrachromosomal interactions are captured by the methods presented herein and provide information useful for assembly.

In some embodiments, intrachromosomal interactions correlate with chromosomal connectivity. In some cases, intrachromosomal data can aid genome assembly. In some cases, chromatin is reconstituted in vitro. This can be advantageous because chromatin, and in particular histones, its major protein components, are important for fixation under the most common "C"-family techniques for detecting chromatin conformation and structure by sequencing: 3C, 4C, 5C, and Hi-C. Chromatin is highly nonspecific with respect to sequence and will generally assemble uniformly across the genome. In some cases, the genomes of species that do not use chromatin can be assembled on reconstituted chromatin, thereby extending the scope of the disclosure to all domains of life.

Chromatin conformation capture techniques are summarized in Figure 2. Briefly, crosslinks are created between genomic regions that are in close physical proximity. Crosslinking of proteins within chromatin (such as histones) to DNA molecules, e.g., genomic DNA, can be accomplished by a suitable method described in further detail elsewhere herein or by another method known in the art. In some cases, two or more nucleotide sequences can be crosslinked via proteins bound to one or more of the nucleotide sequences. One approach is to expose chromatin to ultraviolet irradiation (Gilmour et al., Proc. Natl. Acad. Sci. USA 81:4275-4279, 1984). Crosslinking of polynucleotide segments can also be performed by other approaches, such as chemical or physical (e.g., optical) crosslinking. Suitable chemical crosslinking agents include, but are not limited to, formaldehyde and psoralen (Solomon et al., Proc. Natl. Acad. Sci. USA 82:6470-6474, 1985; Solomon et al., Cell 53:937-947, 1988). For example, crosslinking can be performed by adding 2% formaldehyde to a mixture containing DNA molecules and chromatin proteins. Other examples of agents that can be used to crosslink DNA include, but are not limited to, ultraviolet light, mitomycin C, nitrogen mustard, melphalan, 1,3-butadiene diepoxide, cis-diamminedichloroplatinum(II), and cyclophosphamide. Optimally, the crosslinking agent forms crosslinks that bridge relatively short distances, such as about 2 Å, thereby selecting for intimate interactions that are reversible.

In some embodiments, the DNA molecules can be immunoprecipitated before or after crosslinking. In some cases, the DNA molecules can be fragmented. The fragments can be contacted with a binding partner, such as an antibody that specifically recognizes and binds acetylated histones, e.g., H3. Examples of such antibodies include, but are not limited to, anti-acetylated histone H3, available from Upstate Biotechnology, Lake Placid, N.Y. The polynucleotides can then be collected from the immunoprecipitate. Before the chromatin is fragmented, the acetylated histones can be crosslinked to adjacent polynucleotide sequences. The mixture is then treated to fractionate the polynucleotides in the mixture. Fractionation techniques are known in the art and include, for example, shearing techniques for generating smaller genomic fragments. Fragmentation can be achieved using established methods for fragmenting chromatin, including, for example, sonication, shearing, and/or the use of restriction enzymes. The restriction enzymes can have recognition sites 1, 2, 3, 4, 5, or 6 bases in length. Examples of restriction enzymes include, but are not limited to, AatII, Acc65I, AccI, AciI, AclI, AcuI, AfeI, AflII, AflIII, AgeI, AhdI, AleI, AluI, AlwI, AlwNI, ApaI, ApaLI, ApeKI, ApoI, AscI, AseI, AsiSI, AvaI, AvaII, AvrII, BaeGI, BaeI, BamHI, BanI, BanII, BbsI, BbvCI, BbvI, BccI, BceAI, BcgI, BciVI, BclI, BfaI, BfuAI, BfuCI, BglI, BglII, BlpI, BmgBI, BmrI, BmtI, BpmI, Bpu10I, BpuEI, BsaAI, BsaBI, BsaHI, BsaI, BsaJI, BsaWI, BsaXI, BscRI, BscYI, BsgI, BsiEI, BsiHKAI, BsiWI, BslI, BsmAI, BsmBI, BsmFI, BsmI, BsoBI, Bsp1286I, BspCNI, BspDI, BspEI, BspHI, BspMI, BspQI, BsrBI, BsrDI, BsrFI, BsrGI, BsrI, BssHII, BssKI, BssSI, BstAPI, BstBI, BstEII, BstNI, BstUI, BstXI, BstYI, BstZ17I, Bsu36I, BtgI, BtgZI, BtsCI, BtsI, Cac8I, ClaI, CspCI, CviAII, CviKI-1, CviQI, DdeI, DpnI, DpnII, DraI, DraIII, DrdI, EaeI, EagI, EarI, EciI, Eco53kI, EcoNI, EcoO109I, EcoP15I, EcoRI, EcoRV, FatI, FauI, Fnu4HI, FokI, FseI, FspI, HaeII, HaeIII, HgaI, HhaI, HincII, HindIII, HinfI, HinP1I, HpaI, HpaII, HphI, Hpy166II, Hpy188I, Hpy188III, Hpy99I, HpyAV, HpyCH4III, HpyCH4IV, HpyCH4V, KasI, KpnI, MboI, MboII, MfeI, MluI, MlyI, MmeI, MnlI, MscI, MseI, MslI, MspA1I, MspI, MwoI, NaeI, NarI, Nb.BbvCI, Nb.BsmI, Nb.BsrDI, Nb.BtsI, NciI, NcoI, NdeI, NgoMIV, NheI, NlaIII, NlaIV, NmeAIII, NotI, NruI, NsiI, NspI, Nt.AlwI, Nt.BbvCI, Nt.BsmAI, Nt.BspQI, Nt.BstNBI, Nt.CviPII, PacI, PaeR7I, PciI, PflFI, PflMI, PhoI, PleI, PmeI, PmlI, PpuMI, PshAI, PsiI, PspGI, PspOMI, PspXI, PstI, PvuI, PvuII, RsaI, RsrII, SacI, SacII, SalI, SapI, Sau3AI, Sau96I, SbfI, ScaI, ScrFI, SexAI, SfaNI, SfcI, SfiI, SfoI, SgrAI, SmaI, SmlI, SnaBI, SpeI, SphI, SspI, StuI, StyD4I, StyI, SwaI, TaqαI, TfiI, TliI, TseI, Tsp45I, Tsp509I, TspMI, TspRI, Tth111I, XbaI, XcmI, XhoI, XmaI, XmnI, and ZraI. The resulting fragments can vary in size. The resulting fragments can also include a single-stranded overhang at the 5' or 3' end.
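As a rough illustration of how the choice of enzyme sets fragment size, the following minimal Python sketch performs an in silico digest. It assumes a non-degenerate recognition site, cuts only the top strand at a fixed offset (e.g., 0 for DpnII/MboI, which cut before GATC), and uses the standard 4^k approximation for the average spacing of a k-base site in random sequence with equal base frequencies; the function names are hypothetical.

```python
import re


def digest(seq: str, site: str, cut_offset: int = 0) -> list[int]:
    """Return fragment lengths after cutting `seq` at every occurrence of `site`.

    `cut_offset` is the top-strand cut position within the site
    (0 for DpnII/MboI at ^GATC, 1 for HindIII at A^AGCTT).
    """
    seq = seq.upper()
    # Zero-width lookahead finds overlapping occurrences of the site
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces (e.g. a site at the very start of the sequence)
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def expected_spacing(site_len: int) -> int:
    """Mean distance between k-base sites in random sequence with equal base frequencies."""
    return 4 ** site_len


# A 4-base cutter is expected every ~256 bp, a 6-base cutter every ~4,096 bp
print(digest("AAGATCTTGATCAA", "GATC"))  # [2, 6, 6]
print(expected_spacing(4), expected_spacing(6))  # 256 4096
```

Real genomes deviate from the equal-composition assumption (for example, CpG-containing sites are depleted in mammals), so the 4^k figure is only a first-order guide.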

In some embodiments, sonication techniques can be used to obtain fragments of about 100 to 5,000 nucleotides. Alternatively, fragments of about 100-1,000, 150-1,000, 150-500, 200-500, or 200-400 nucleotides can be obtained. The sample can be prepared for sequencing of the crosslinked, bound sequence segments. In some cases, a single short stretch of polynucleotide can be created, for example by ligating two sequence segments that are crosslinked within a molecule. Sequence information can be obtained from the sample using any suitable sequencing technique described in further detail elsewhere herein, such as high-throughput sequencing, or another technique known in the art. For example, the ligation products can be subjected to paired-end sequencing to obtain sequence information from each end of the fragments. Pairs of sequence segments can be represented in the resulting sequence information, linking haplotype information across the linear distance that separates the two sequence segments along the polynucleotide.
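To make the read-pair logic concrete, here is a minimal sketch (hypothetical helper names, idealized error-free reads) of what paired-end sequencing of a ligation product yields: read 1 comes from the start of the fragment and read 2 is the reverse complement of its end, so when two crosslinked segments are ligated, each read samples a different segment and the two reads can be mapped independently.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Idealized paired-end reads from both ends of a ligation product.

    Read 1: first `read_len` bases of the forward strand.
    Read 2: first `read_len` bases of the reverse strand, i.e. the reverse
    complement of the fragment's last `read_len` bases.
    """
    fragment = fragment.upper()
    return fragment[:read_len], revcomp(fragment[-read_len:])


# Two segments that were crosslinked, then ligated into one short molecule
segment_a, segment_b = "ACGTACGT", "TTGGCCAA"
r1, r2 = paired_end_reads(segment_a + segment_b, 4)
# r1 ("ACGT") matches the start of segment_a; revcomp(r2) ("CCAA") matches the end of segment_b
```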

One characteristic of data generated by Hi-C is that, when mapped to the genome, most read pairs are found to be in close linear proximity; that is, most read pairs lie near each other in the genome. In the resulting data sets, the probability of intrachromosomal contact is, on average, much higher than that of interchromosomal contact, as expected if chromosomes occupy distinct territories. Furthermore, although the probability of interaction decays rapidly with linear distance, even loci separated by more than 200 Mb on the same chromosome are more likely to interact than loci on different chromosomes. When detecting long-range intrachromosomal, and especially interchromosomal, contacts, this "background" of short- and medium-range intrachromosomal contacts is noise that must be subtracted in Hi-C analysis.
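The cis/trans asymmetry and the dominance of short-range pairs described above can be summarized directly from mapped read pairs. The sketch below assumes each read pair has already been reduced to (chromosome, position) for both reads; the `ReadPair` type and the 1 kb cutoff for "short range" are illustrative choices, not values from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def contact_summary(pairs: list[ReadPair], short_range: int = 1_000) -> tuple[float, float]:
    """Return (fraction of pairs that are cis, fraction of cis pairs within `short_range` bp)."""
    cis = [p for p in pairs if p.chrom1 == p.chrom2]
    cis_fraction = len(cis) / len(pairs) if pairs else 0.0
    near = sum(1 for p in cis if abs(p.pos1 - p.pos2) <= short_range)
    near_fraction = near / len(cis) if cis else 0.0
    return cis_fraction, near_fraction


example = [
    ReadPair("chr1", 100, "chr1", 600),     # cis, 500 bp apart
    ReadPair("chr1", 100, "chr1", 50_000),  # cis, ~50 kb apart
    ReadPair("chr2", 10, "chr2", 900),      # cis, 890 bp apart
    ReadPair("chr1", 5, "chr3", 7),         # trans
]
print(contact_summary(example))  # (0.75, 0.6666666666666666)
```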

Notably, Hi-C experiments in eukaryotes have shown two canonical interaction patterns in addition to species-specific and cell-type-specific chromatin interactions. The first pattern, distance-dependent decay (DDD), is the general trend of decreasing interaction frequency as a function of genomic distance. The second pattern, the cis-trans ratio (CTR), is the markedly higher interaction frequency between loci located on the same chromosome than between loci on different chromosomes, even when the former are separated by tens of megabases of sequence. These patterns may reflect general polymer dynamics, in which proximal loci have a higher probability of interacting at random, as well as specific features of nuclear organization, such as the formation of chromosome territories, a phenomenon in which interphase chromosomes tend to occupy distinct volumes in the nucleus with little intermixing. The exact details of these two patterns can vary among species, cell types, and cell states, but the patterns are ubiquitous and prominent. They are so strong and consistent that they are used to assess experimental quality and are usually normalized out of the data to reveal detailed interactions. In the methods disclosed herein, however, genome assembly can take advantage of the three-dimensional structure of the genome. The very features that make the canonical Hi-C interaction patterns an obstacle to the analysis of specific loop interactions, namely their ubiquity, strength, and consistency, can be used as powerful tools for estimating the genomic positions of contigs.
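Because distance-dependent decay is so consistent, it is often summarized by a power-law exponent, P(s) ∝ s^(-α). The sketch below is one simple way to estimate α from cis read-pair separations (log-spaced bins, contact density per bp, ordinary least squares on the log-log values); the binning scheme and function name are assumptions for illustration, not the procedure of this disclosure.

```python
import math
from bisect import bisect_right


def decay_exponent(distances: list[float], n_bins: int = 12) -> float:
    """Estimate alpha in P(s) ~ s**(-alpha) from cis read-pair separations (bp)."""
    lo, hi = min(distances), max(distances)
    if lo <= 0 or hi <= lo:
        raise ValueError("need positive separations spanning more than one value")
    ratio = (hi / lo) ** (1 / n_bins)
    edges = [lo * ratio**i for i in range(n_bins + 1)]
    edges[-1] = hi  # guard against rounding so the largest value lands in the last bin
    counts = [0] * n_bins
    for d in distances:
        counts[min(bisect_right(edges, d) - 1, n_bins - 1)] += 1
    xs, ys = [], []
    for i, c in enumerate(counts):
        if c:  # skip empty bins (log of zero)
            width = edges[i + 1] - edges[i]
            xs.append(math.log(math.sqrt(edges[i] * edges[i + 1])))  # geometric bin centre
            ys.append(math.log(c / width))  # contact density per bp
    # Ordinary least-squares slope of log-density against log-distance
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


# Log-uniform separations between ~1 kb and ~10 Mb have density proportional to 1/s
synthetic = [1e3 * 1e4 ** ((i + 0.5) / 1000) for i in range(1000)]
print(round(decay_exponent(synthetic), 2))
```

Dividing counts by bin width matters: with log-spaced bins, raw counts alone would understate the decay by one power of s.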

In one specific implementation, examining the physical distances between intrachromosomal read pairs reveals several features of the data that are useful for genome assembly. First, short-range interactions are more common than long-range interactions (see, e.g., Figure 6); that is, each read of a read pair is more likely to be paired with a nearby region of the actual genome than with a distant one. Second, there is a long tail of intermediate- and long-range interactions; that is, read pairs carry information about intrachromosomal organization at kilobase (kb) or even megabase (Mb) distances. For example, read pairs can provide sequence information spanning more than about 10 kb, 50 kb, 100 kb, 200 kb, 500 kb, 1 Mb, 2 Mb, 5 Mb, 10 Mb, or 100 Mb. These features of the data simply indicate that genomic regions that are near each other on the same chromosome are more likely to be in close physical proximity, which is the expected result because such regions are chemically linked to one another through the DNA backbone. It was inferred that genome-wide chromatin interaction data sets, such as those generated by Hi-C, can provide long-range information about the grouping and linear organization of sequences along entire chromosomes.
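The "long tail" described above can be quantified as the fraction of intrachromosomal read pairs whose separation exceeds a given span. A minimal sketch, assuming the separations (in bp) have already been computed from uniquely mapped cis pairs; the default thresholds simply echo a few of the spans listed in the text, and the function name is hypothetical.

```python
from bisect import bisect_right


def span_fractions(separations: list[int],
                   thresholds=(10_000, 100_000, 1_000_000)) -> dict[int, float]:
    """Fraction of read pairs whose separation is strictly greater than each threshold (bp)."""
    ordered = sorted(separations)
    n = len(ordered)
    if n == 0:
        return {t: 0.0 for t in thresholds}
    # bisect_right counts separations <= t; the remainder exceed t
    return {t: (n - bisect_right(ordered, t)) / n for t in thresholds}


print(span_fractions([500, 2_000, 20_000, 300_000, 5_000_000]))
# {10000: 0.6, 100000: 0.4, 1000000: 0.2}
```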

Although the Hi-C experimental method is straightforward and relatively low cost, current protocols for genome assembly and haplotype phasing require 10⁶ to 10⁸ cells, an extremely large amount of material that cannot be obtained from certain samples, particularly certain human patient samples. In contrast, the methods disclosed herein include methods that provide accurate and predictive results for genotype assembly, haplotype phasing, and metagenomics with significantly less cell-derived material. For example, less than about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 μg of DNA can be used in the methods disclosed herein. In some examples, the DNA used in the methods disclosed herein can be extracted from fewer than about 1,000,000, 500,000, 100,000, 50,000, 10,000, 5,000, 1,000, 500, or 100 cells.
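To relate the DNA masses above to cell numbers, a back-of-envelope conversion helps. The sketch below assumes roughly 6.6 pg of DNA per diploid human cell, a commonly used approximation that is not stated in this disclosure; the constant and function names are illustrative.

```python
PG_PER_DIPLOID_HUMAN_CELL = 6.6  # assumed approximate DNA content of one diploid human cell (pg)
PG_PER_UG = 1e6


def cells_from_dna_mass(micrograms: float, pg_per_cell: float = PG_PER_DIPLOID_HUMAN_CELL) -> float:
    """Approximate number of diploid cell equivalents in a given mass of DNA."""
    return micrograms * PG_PER_UG / pg_per_cell


def dna_mass_from_cells(n_cells: float, pg_per_cell: float = PG_PER_DIPLOID_HUMAN_CELL) -> float:
    """Approximate DNA mass (in μg) contained in n diploid cells."""
    return n_cells * pg_per_cell / PG_PER_UG


# Under this assumption, 10^6 to 10^8 cells correspond to roughly 6.6 to 660 μg of DNA,
# while 1 μg of DNA is about 150,000 cell equivalents.
for cells in (1e6, 1e8):
    print(f"{cells:.0e} cells ≈ {dna_mass_from_cells(cells):.1f} μg")
print(f"1 μg ≈ {cells_from_dna_mass(1.0):,.0f} cells")
```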

In general, procedures for probing the physical layout of chromosomes, such as Hi-C-based techniques, use chromatin formed within cells/organisms, such as chromatin isolated from cultured cells or primary tissue. The present disclosure provides for the use of such techniques not only with chromatin isolated from cells/organisms but also with reconstituted chromatin. Reconstituted chromatin is distinguished from chromatin formed within cells/organisms by several features. First, for many samples, naked DNA can be collected by a variety of methods ranging from noninvasive to invasive, such as collecting bodily fluids, swabbing the oral cavity or rectal area, or taking epithelial samples. Second, reconstituting chromatin substantially prevents the formation of interchromosomal and other long-range interactions that generate artifacts for genome assembly and haplotype phasing. In some cases, according to the methods and compositions of the present disclosure, a sample can have less than about 20, 15, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.5, 0.4, 0.3, 0.2, or 0.1% interchromosomal or intermolecular crosslinks. In some examples, the sample can have less than about 5% interchromosomal or intermolecular crosslinks. In some examples, the sample can have less than about 3% interchromosomal or intermolecular crosslinks. In further examples, the sample can have less than about 1% interchromosomal or intermolecular crosslinks. Third, the frequency of sites capable of crosslinking, and therefore the frequency of intramolecular crosslinks within a polynucleotide, can be adjusted. For example, the ratio of DNA to histones can be varied so that the nucleosome density can be tuned to a desired value. In some cases, the nucleosome density is reduced below physiological levels. Accordingly, the distribution of crosslinks can be altered to favor longer-range interactions. In some embodiments, subsamples with different crosslink densities can be prepared to cover both short- and long-range associations. For example, the crosslinking conditions can be adjusted so that at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, 95%, or 100% of the crosslinks occur between DNA segments that are at least about 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, 100 kb, 110 kb, 120 kb, 130 kb, 140 kb, 150 kb, 160 kb, 180 kb, 200 kb, 250 kb, 300 kb, 350 kb, 400 kb, 450 kb, or 500 kb apart on the sample DNA molecule.

In various embodiments, the present disclosure provides various methods that allow multiple read pairs to be mapped to multiple contigs. Several publicly available computer programs exist for mapping reads to contig sequences. These read-mapping programs also provide data describing how unique a particular read mapping is within the genome. From the population of reads that map uniquely and with high confidence within contigs, the distribution of distances between the reads of each read pair can be inferred. These are the data shown in Figure 6. For read pairs whose reads map unambiguously to different contigs, the mapping data imply a connection between the two contigs in question. They also imply a distance between the two contigs, in proportion to the distance distribution learned from the analysis above. Each read pair whose reads map to different contigs therefore implies a connection between those two contigs in the correct assembly. The connections inferred from all such mapped read pairs can be summarized in an adjacency matrix, in which each contig is represented by both a row and a column. Read pairs connecting contigs are recorded as nonzero values in the rows and columns corresponding to the contigs to which the reads of the pair map. Most read pairs map within contigs, from which the distribution of distances between the reads of a pair can be learned; the read pairs that map to different contigs can then be used to construct the contig adjacency matrix.
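A minimal sketch of the bookkeeping described above, assuming each read pair has already been aligned and reduced to a contig name, position, and mapping-quality (MAPQ) score for each read, as reported by common short-read aligners. The `MappedPair` type and the MAPQ cutoff of 30 are illustrative assumptions. Pairs within one contig feed the distance distribution, and pairs across contigs increment a symmetric adjacency matrix.

```python
from typing import NamedTuple


class MappedPair(NamedTuple):
    contig1: str
    pos1: int
    mapq1: int
    contig2: str
    pos2: int
    mapq2: int


def build_adjacency(pairs, contigs, min_mapq=30):
    """Return (adjacency matrix, within-contig separations).

    adj[i][j] counts confidently mapped read pairs linking contigs i and j.
    """
    index = {name: i for i, name in enumerate(contigs)}
    adj = [[0] * len(contigs) for _ in contigs]
    within = []  # separations used to learn the read-pair distance distribution
    for p in pairs:
        if min(p.mapq1, p.mapq2) < min_mapq:
            continue  # at least one read is not confidently unique
        if p.contig1 == p.contig2:
            within.append(abs(p.pos1 - p.pos2))
        else:
            i, j = index[p.contig1], index[p.contig2]
            adj[i][j] += 1  # record the link in both the row and
            adj[j][i] += 1  # the column of each contig
    return adj, within


pairs = [
    MappedPair("c1", 100, 60, "c1", 900, 60),  # within c1: 800 bp apart
    MappedPair("c1", 50, 60, "c2", 10, 60),    # links c1 and c2
    MappedPair("c1", 50, 5, "c2", 10, 60),     # low MAPQ on one read: ignored
    MappedPair("c2", 1, 60, "c3", 2, 40),      # links c2 and c3
]
adj, within = build_adjacency(pairs, ["c1", "c2", "c3"])
# adj == [[0, 1, 0], [1, 0, 1], [0, 1, 0]], within == [800]
```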

In various embodiments, the present disclosure provides methods comprising a step of constructing an adjacency matrix of contigs using read-mapping data from read-pair data. In some embodiments, the adjacency matrix uses a weighting scheme for read pairs that incorporates the tendency of short-range interactions to outnumber long-range interactions (see, e.g., Figure 3). Read pairs spanning shorter distances are generally more common than read pairs spanning longer distances. A function describing the probability of a given distance can be fitted to read-pair data that map within a single contig in order to learn this distribution. Accordingly, one important feature of a read pair that maps to different contigs is where on those contigs its reads map. For a read pair whose reads both map near an end of their respective contigs, the inferred distance between these contigs can be short, and therefore the distance between the joined reads is small. Because short read-pair distances are more common than long ones, this configuration provides stronger evidence that the two contigs are adjacent than when the reads map far from the contig ends. The connections in the adjacency matrix are therefore further weighted by the distance of the reads to the contig ends. In further embodiments, the adjacency matrix can be further rescaled to down-weight the many contacts on some contigs that represent promiscuous regions of the genome. These regions of the genome, which can be identified by a high proportion of read mappings, are presumably likely to contain spurious read mappings that could give misleading information to the assembly. In still other embodiments, this scaling can be guided by searching for one or more conserved binding sites for one or more agents that regulate chromatin scaffold interactions, such as the transcriptional repressor CTCF, endocrine receptors, cohesins, or covalently modified histones.
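One way to express the end-distance weighting and the promiscuity rescaling described above is sketched below. It is an illustrative heuristic under stated assumptions, not the disclosed scoring: each cross-contig link is scored for every pairing of contig ends by the empirical probability that a within-contig read pair spans at least the implied gap (the sum of each read's distance to the chosen end), and edge weights are then divided by the geometric mean of the two contigs' total weights so that contigs with many contacts are down-weighted. All names are hypothetical.

```python
import math
from bisect import bisect_left
from collections import defaultdict


def survival(sorted_within: list[int], d: int) -> float:
    """Empirical P(separation >= d) learned from within-contig read pairs."""
    return (len(sorted_within) - bisect_left(sorted_within, d)) / len(sorted_within)


def end_link_weights(links, lengths, within):
    """Score links (contig_a, pos_a, contig_b, pos_b) for each pairing of contig ends.

    'L' is a contig's start and 'R' its end. Reads close to the joined ends imply a
    short gap, which is common, so such links contribute more weight.
    """
    ref = sorted(within)
    weights = defaultdict(float)
    for a, xa, b, xb in links:
        for end_a, da in (("L", xa), ("R", lengths[a] - xa)):
            for end_b, db in (("L", xb), ("R", lengths[b] - xb)):
                key = tuple(sorted([(a, end_a), (b, end_b)]))
                weights[key] += survival(ref, da + db)
    return dict(weights)


def rescale(weights):
    """Divide each edge by sqrt(total_a * total_b) to damp promiscuous contigs."""
    totals = defaultdict(float)
    for (u, v), w in weights.items():
        totals[u[0]] += w
        totals[v[0]] += w
    out = {}
    for (u, v), w in weights.items():
        denom = math.sqrt(totals[u[0]] * totals[v[0]])
        out[(u, v)] = w / denom if denom else 0.0
    return out


within = [100, 200, 300, 400, 1_000]
w = end_link_weights([("A", 950, "B", 30)], {"A": 1_000, "B": 1_000}, within)
best = max(w, key=w.get)  # (('A', 'R'), ('B', 'L')): the end of A joined to the start of B
```

Using the survival function rather than a fitted density keeps the sketch monotone and free of smoothing parameters; a fitted parametric curve would be a natural refinement.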

In some embodiments, the present disclosure provides one or more of the methods disclosed herein comprising a step of analyzing the adjacency matrix to determine a path through the contigs that represents their order and/or orientation relative to the genome. In other embodiments, the path through the contigs can be chosen so that each contig is visited exactly once. In further embodiments, the path through the contigs is chosen so that the path through the adjacency matrix maximizes the sum of the edge weights visited. In this way, the contig connections with the highest confidence are proposed as the correct assembly. In still other embodiments, the path through the contigs can be chosen so that each contig is visited exactly once and the edge weighting of the adjacency matrix is maximized.
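Choosing an order that visits every contig exactly once while maximizing the summed edge weight is a maximum-weight Hamiltonian path problem, which is NP-hard in general. For a handful of contigs it can be solved exactly by enumeration, as in the minimal sketch below. The sketch treats contigs as unoriented nodes (orientation could be handled by modeling each contig's two ends as separate nodes), and realistic assemblies would need heuristics instead; the function name is hypothetical.

```python
from itertools import permutations


def best_path(weights: list[list[float]]) -> tuple[list[int], float]:
    """Exhaustively find a contig order maximizing the summed weight of consecutive links.

    `weights` is a symmetric n x n adjacency matrix. Runtime is O(n! * n), so this is
    only practical for very small n.
    """
    n = len(weights)
    if n == 0:
        return [], 0.0
    best_order, best_score = None, float("-inf")
    for order in permutations(range(n)):
        if order[0] > order[-1]:
            continue  # a path and its reverse have the same score; evaluate one of each
        score = sum(weights[a][b] for a, b in zip(order, order[1:]))
        if score > best_score:
            best_order, best_score = list(order), score
    return best_order, best_score


W = [
    [0, 5, 1, 0],
    [5, 0, 4, 1],
    [1, 4, 0, 6],
    [0, 1, 6, 0],
]
print(best_path(W))  # ([0, 1, 2, 3], 15)
```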

In a diploid genome, it is often important to know which allelic variants are linked on the same chromosome. This is known as haplotype phasing. The short reads of high-throughput sequence data rarely allow direct observation of which allelic variants are linked. Computational inference of haplotype phase can become unreliable over long distances. The present disclosure provides one or more methods that use allelic variants on read pairs to determine which allelic variants are linked.
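As a simplified illustration of using read pairs to link alleles, the sketch below takes, for each read pair whose two reads cover heterozygous sites, the alleles observed at both sites (coded 0 for reference, 1 for alternate). Matching codes mean the alternate alleles are in cis; mismatched codes mean trans. Votes are tallied per site pair and phase is propagated by breadth-first search. This is a hypothetical, error-naive sketch: connected sites form phase blocks phased relative to their lowest-numbered site, and conflicting evidence around cycles is not reconciled.

```python
from collections import defaultdict, deque


def phase_sites(observations):
    """Assign a relative phase (0 or 1) to heterozygous sites linked by read pairs.

    observations: (site_i, allele_i, site_j, allele_j), alleles coded 0 = ref, 1 = alt.
    Sites with equal phase values carry their alt alleles on the same haplotype.
    """
    votes = defaultdict(int)  # > 0: evidence for cis, < 0: evidence for trans
    for i, ai, j, aj in observations:
        votes[(min(i, j), max(i, j))] += 1 if ai == aj else -1
    graph = defaultdict(list)
    for (i, j), v in votes.items():
        if v:  # a tie carries no phase information
            parity = 0 if v > 0 else 1
            graph[i].append((j, parity))
            graph[j].append((i, parity))
    phase = {}
    for start in sorted(graph):
        if start in phase:
            continue
        phase[start] = 0  # anchor each phase block at its lowest-numbered site
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w, parity in graph[u]:
                if w not in phase:
                    phase[w] = phase[u] ^ parity
                    queue.append(w)
    return phase


obs = [
    (100, 0, 200, 0), (100, 1, 200, 1),                    # cis, cis
    (200, 0, 300, 1), (200, 0, 300, 1), (200, 1, 300, 1),  # trans, trans, cis
]
print(phase_sites(obs))  # {100: 0, 200: 0, 300: 1}
```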

In various embodiments, the methods and compositions of the present disclosure enable haplotype phasing of diploid or polyploid genomes with respect to multiple allelic variants. The methods described herein can thus provide for determining which allelic variants are linked based on the variant information in read pairs and/or in contigs assembled using the same read pairs. Examples of allelic variants include, but are not limited to, those known from the 1000 Genomes, UK10K, and HapMap projects and other projects aimed at discovering genetic variation among humans. As has been demonstrated, disease associations of specific genes can be revealed more readily with haplotype phasing data, for example through the discovery of unlinked inactivating mutations in both copies of SH3TC2 causing Charcot-Marie-Tooth neuropathy (Lupski JR, Reid JG, Gonzaga-Jauregui C, et al., N. Engl. J. Med. 362:1181-91, 2010) and of unlinked inactivating mutations in both copies of ABCG5 causing hypercholesterolemia 9 (Rios J, Stein E, Shendure J, et al., Hum. Mol. Genet. 19:4313-18, 2010).

Humans are heterozygous at, on average, one site in 1,000. In some cases, a single lane of data from a high-throughput sequencing method can generate at least about 150,000,000 read pairs, and the reads can be about 100 base pairs long. From these parameters, about 1/10 of all reads from a human sample are estimated to cover a heterozygous site, so on average about 1/100 of all read pairs from a human sample are estimated to cover a pair of heterozygous sites. A single lane therefore yields about 1,500,000 read pairs (1/100 of 150,000,000) that provide phasing data. With approximately 3,000,000,000 bases in the human genome and one in 1,000 bases heterozygous, the average human genome contains approximately 3,000,000 heterozygous sites. Because each of the approximately 1,500,000 informative read pairs spans two heterozygous sites, the average coverage of each heterozygous site to be phased using a single lane of a typical high-throughput sequencer is about 1× (2 × 1,500,000 / 3,000,000). A diploid human genome can therefore be reliably and completely phased with one lane of high-throughput sequence data relating sequence variants from a sample prepared using the methods disclosed herein. In some examples, one lane of data can be a set of DNA sequence read data. In further examples, one lane of data can be a set of DNA sequence read data from a single run of a high-throughput sequencing instrument.
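The coverage estimate above is simple arithmetic and can be reproduced directly. This sketch assumes, as the text does, that a read overlaps a heterozygous site with probability of roughly read length × heterozygosity rate, and that the two reads of a pair hit heterozygous sites independently. The function name and defaults are illustrative, taken from the numbers in the paragraph.

```python
def phasing_coverage(read_pairs: int = 150_000_000, read_len: int = 100,
                     het_rate: float = 1 / 1000,
                     genome_size: int = 3_000_000_000):
    """Back-of-the-envelope phasing coverage for one sequencing lane.

    Hypothetical helper. Assumes each read overlaps a heterozygous site with
    probability ~read_len * het_rate and that the two reads of a pair do so
    independently.
    """
    p_read = read_len * het_rate             # ~1/10 of reads hit a het site
    p_pair = p_read ** 2                     # ~1/100 of pairs link two het sites
    informative_pairs = read_pairs * p_pair  # ~1.5 million phasing pairs
    het_sites = genome_size * het_rate       # ~3 million het sites
    # Each informative pair touches two heterozygous sites.
    coverage = 2 * informative_pairs / het_sites
    return informative_pairs, het_sites, coverage
```

Called with its defaults, the function gives about 1,500,000 informative pairs, about 3,000,000 heterozygous sites, and roughly 1× coverage per site, matching the figures in the text.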

Because the human genome consists of two homologous sets of chromosomes, understanding an individual's true genetic makeup requires delineating the maternal and paternal copies, or haplotypes, of the genetic material. Obtaining an individual's haplotypes is useful in several ways. First, haplotypes are clinically useful in predicting donor-host matching outcomes in organ transplantation and are increasingly used as a means of detecting disease associations. Second, in genes exhibiting compound heterozygosity, haplotypes indicate whether two deleterious variants lie on the same allele, which strongly affects the prediction of whether inheriting these variants is harmful. Third, haplotypes from groups of individuals have provided information on population structure and on the evolutionary history of humans. Finally, the recently described widespread allelic imbalance in gene expression suggests that genetic or epigenetic differences between alleles can contribute to quantitative differences in expression. Understanding haplotype structure will help elucidate the mechanisms by which variants contribute to allelic imbalance.
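To make the compound-heterozygosity point concrete, the sketch below takes two variants that have already been phased and written in VCF-style phased genotype notation, where "0|1" means the reference allele is on the first haplotype and the alternate allele is on the second. The function name is hypothetical. A trans result means both gene copies carry a variant, while a cis result leaves one copy intact.

```python
def alt_configuration(gt_a: str, gt_b: str) -> str:
    """Classify two phased heterozygous genotypes (VCF style, e.g. "0|1").

    Hypothetical helper. Returns "cis" when both alternate alleles sit on the
    same haplotype (one intact copy of the gene remains) and "trans" when
    they sit on different haplotypes (both copies carry a variant).
    """
    hap_a = [int(x) for x in gt_a.split("|")]
    hap_b = [int(x) for x in gt_b.split("|")]
    if sorted(hap_a) != [0, 1] or sorted(hap_b) != [0, 1]:
        raise ValueError("expected phased heterozygous genotypes such as '0|1'")
    # Index of the haplotype that carries the alternate allele at each site.
    return "cis" if hap_a.index(1) == hap_b.index(1) else "trans"
```

Unphased genotypes such as "0/1" are rejected, because without phase information the cis/trans question cannot be answered.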

In certain embodiments, the methods disclosed herein include in vitro techniques for fixing and capturing associations between distant regions of a genome, as needed for long-range linkage and phasing. In some cases, the method comprises constructing and sequencing an XLRP library to provide read pairs that lie very far apart in the genome. In some cases, the interactions arise primarily from random associations within a single DNA fragment. In some examples, segments close to each other within a DNA molecule interact more frequently and with higher probability, whereas interactions between distant parts of the molecule are less frequent, so the genomic distance between segments can be inferred. There is therefore a systematic relationship between the number of pairs connecting two loci and their proximity on the input DNA. As demonstrated in FIG. 2, the present disclosure can generate read pairs that span the largest DNA fragments in the extraction. The input DNA for this library had a maximum length of 150 kbp, matching the longest meaningful read-pair separation observed in the sequencing data. This suggests that the method could link loci even farther apart in the genome if larger input DNA fragments were obtained. Complete genome assembly may be possible by applying improved assembly software tools specifically adapted to the type of data produced by this method.
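One simple way to look at the distance relationship described above is to bin mapped read pairs by the genomic separation of their two reads and count the pairs in each bin. The sketch assumes both reads of each pair are mapped to the same chromosome and given as integer positions. The decade binning and the function name are illustrative choices. The shape of the resulting decay curve is something to measure from real data, not something this code assumes.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def separation_histogram(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Count read pairs per decade of genomic separation.

    Hypothetical helper. Each pair is (pos1, pos2) on the same chromosome.
    Bin k holds pairs with 10**k <= |pos2 - pos1| < 10**(k + 1); pairs with
    zero separation are skipped.
    """
    bins: Counter = Counter()
    for pos1, pos2 in pairs:
        distance = abs(pos2 - pos1)
        if distance > 0:
            # Digit count minus one gives the decade exactly, avoiding
            # floating-point error in log10 near exact powers of ten.
            bins[len(str(distance)) - 1] += 1
    return dict(sorted(bins.items()))
```

If link frequency falls with distance as the text describes, the counts in the higher bins of such a histogram should be smaller. The largest populated bin gives a rough sense of the maximum span supported by the input DNA.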

Data produced using the methods and compositions of the present disclosure can achieve extremely high phasing accuracy. Compared with previous methods, the methods described herein can phase a higher proportion of variants while maintaining a high level of accuracy. This phase information can extend over longer ranges, for example, longer than about 200 kbp, about 300 kbp, about 400 kbp, about 500 kbp, about 600 kbp, about 700 kbp, about 800 kbp, about 900 kbp, about 1 Mbp, about 2 Mbp, about 3 Mbp, about 4 Mbp, about 5 Mbp, or about 10 Mbp. In some embodiments, more than 90% of the heterozygous SNPs in a human sample can be phased with greater than 99% accuracy using fewer than about 250,000,000 reads or read pairs, for example, using only one lane of data from an Illumina HiSeq. In other cases, more than about 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the heterozygous SNPs in a human sample can be phased with greater than about 70%, 80%, 90%, 95%, or 99% accuracy using fewer than about 250,000,000 or about 500,000,000 reads or read pairs, for example, using only one or two lanes of data from an Illumina HiSeq. For example, more than 95% or 99% of the heterozygous SNPs in a human sample can be phased with greater than about 95% or 99% accuracy using fewer than about 250,000,000 or about 500,000,000 reads. In further cases, additional variants can be captured by increasing the read length to about 200 bp, 250 bp, 300 bp, 350 bp, 400 bp, 450 bp, 500 bp, 600 bp, 800 bp, 1000 bp, 1500 bp, 2 kbp, 3 kbp, 4 kbp, 5 kbp, 10 kbp, 20 kbp, 50 kbp, or 100 kbp.
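Figures such as "more than 90% of heterozygous SNPs phased with greater than 99% accuracy" combine two quantities: the fraction of heterozygous sites placed into phase blocks, and the agreement of those placements with a trusted truth set. The sketch below is one simplified way to compute both. It scores each block in whichever haplotype orientation best matches the truth, because haplotype labels are arbitrary within a block. It is not the disclosure's evaluation procedure. Published evaluations often report switch-error rates instead. The input layout and names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

def phasing_metrics(predicted: Dict[int, Tuple[str, int]],
                    truth: Dict[int, int],
                    total_het_sites: int) -> Tuple[float, float]:
    """Return (fraction of het sites phased, per-site phasing accuracy).

    Hypothetical helper. predicted maps SNP position -> (block_id, allele on
    haplotype 1); truth maps SNP position -> allele on the true haplotype 1.
    Each block is scored in its best-matching orientation.
    """
    blocks = defaultdict(list)
    for pos, (block, allele) in predicted.items():
        if pos in truth:
            blocks[block].append(allele == truth[pos])
    phased = correct = 0
    for matches in blocks.values():
        if len(matches) < 2:  # a lone SNP carries no phase information
            continue
        agree = sum(matches)
        phased += len(matches)
        correct += max(agree, len(matches) - agree)
    fraction = phased / total_het_sites if total_het_sites else 0.0
    accuracy = correct / phased if phased else 0.0
    return fraction, accuracy
```

In the test case, one block of three SNPs has a single misplaced allele and another block of two is fully correct. A singleton block is not counted as phased, so 5 of 10 sites are phased with 80% accuracy.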

In other embodiments of the present disclosure, data from an XLRP library can be used to confirm the phasing capability of long-range read pairs. As shown in FIG. 6, the accuracy of these results is comparable to that of the best previously available technologies, but extends to substantially longer distances. Current sample preparation procedures for particular sequencing methods recognize variants located within one read length (e.g., 150 bp) of the restriction sites targeted for phasing. In one example, using an XLRP library constructed for NA12878, the reference sample for the assembly, 44% of the 1,703,909 heterozygous SNPs present were phased with greater than 99% accuracy. In some cases, this proportion can be extended to almost all variable sites by judicious selection of restriction enzymes or combinations of different enzymes.
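The read-length constraint above can be estimated for a given enzyme choice by asking what fraction of known heterozygous SNP positions fall within one read length of a recognition site. The sketch assumes sites are given as literal motifs and measures distance from each motif's start position, which is a simplification. The example motifs GATC (recognized by MboI/DpnII) and AAGCTT (HindIII) are real recognition sequences, but the disclosure does not name enzymes here, so they are illustrative assumptions. The function names are hypothetical.

```python
import bisect
import re
from typing import Iterable, List

def site_positions(seq: str, motifs: Iterable[str]) -> List[int]:
    """Sorted start positions of every (possibly overlapping) motif hit."""
    hits = set()
    for motif in motifs:
        # A zero-width lookahead lets re.finditer report overlapping hits.
        for m in re.finditer(f"(?={re.escape(motif)})", seq):
            hits.add(m.start())
    return sorted(hits)

def fraction_reachable(seq: str, snps: List[int], motifs: Iterable[str],
                       read_len: int = 150) -> float:
    """Fraction of SNP positions within read_len bp of a motif start.

    Hypothetical helper for comparing enzyme choices on a sequence.
    """
    if not snps:
        return 0.0
    sites = site_positions(seq.upper(), motifs)
    reachable = 0
    for pos in snps:
        i = bisect.bisect_left(sites, pos)
        # Check the nearest site on each side of the SNP.
        nearest = [abs(sites[j] - pos) for j in (i - 1, i) if 0 <= j < len(sites)]
        if nearest and min(nearest) <= read_len:
            reachable += 1
    return reachable / len(snps)
```

On a toy 1 kb sequence with one GATC and one AAGCTT site, adding the second motif raises the reachable fraction from one half to three quarters. This illustrates, in miniature, why combining enzymes can extend phasing to more variable sites.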

In some embodiments, the compositions and methods described herein enable the study of metagenomes, such as those found in the human gut. Partial or whole genome sequences of some or all of the organisms living in a given ecological environment can thus be investigated. Examples include random sequencing of all gut microbes, of microbes found on particular areas of the skin, and of microbes living at toxic waste sites. The composition of the microbial populations in these environments, as well as aspects of the interrelated biochemistry encoded by their respective genomes, can be determined using the compositions and methods described herein. The methods described herein can enable metagenomic studies of complex biological environments containing, for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, 5000, 10000, or more organisms and/or organism variants.

The high accuracy required for sequencing cancer genomes can be achieved using the methods and systems described herein. When sequencing a cancer genome, an inaccurate reference genome can make base calling challenging. Heterogeneous samples and small amounts of starting material, such as samples obtained by biopsy, pose further challenges. Moreover, detecting large-scale structural variants and/or loss of heterozygosity is often critical for cancer genome sequencing, as is the ability to distinguish somatic variants from base-calling errors.

The systems and methods described herein can generate accurate long sequences from complex samples containing 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, or more different genomes. Mixed samples of normal, benign, and/or tumor origin can be analyzed, optionally without the need for a normal control. In some embodiments, starting samples as small as 100 ng, or even only a few hundred genome equivalents, are used to generate accurate long sequences. The systems and methods described herein can enable the detection of large-scale structural variants and rearrangements. Phased variant calls can be obtained for long sequences spanning about 1 kbp, 2 kbp, 5 kbp, 10 kbp, 20 kbp, 50 kbp, 100 kbp, 200 kbp, 500 kbp, 1 Mbp, 2 Mbp, 5 Mbp, 10 Mbp, 20 Mbp, 50 Mbp, or 100 Mbp or more. For example, phased variant calls can be obtained for long sequences spanning about 1 Mbp or about 2 Mbp.

Haplotypes determined using the methods and systems described herein can be assigned to computational resources, for example, networked computational resources such as a cloud system. Short variant calls can be corrected as needed using relevant information stored in those computational resources. Structural variants can be detected based on the combined information from short variant calls and the information stored in the computational resources. Problematic parts of the genome can be reassembled to improve accuracy. These include segmental duplications, regions prone to structural variation, the highly variable and medically relevant MHC region, centromeric and telomeric regions, and other heterochromatic regions, including, but not limited to, regions with repeats, low sequence accuracy, high variant rates, ALU repeats, segmental duplications, or any other relevant problematic features known in the art.

A sample type can be assigned to sequence information held in computational resources, either local or networked (such as the cloud). If the source of the information is known, for example, when it derives from cancer or normal tissue, the source can be assigned to the sample as part of its sample type. Other examples of sample types include, but are not limited to, tissue type, sample collection method, presence of infection, type of infection, processing method, and sample size. When a complete or partial comparison genome sequence is available, such as a normal genome for comparison with a cancer genome, the differences between the sample data and the comparison genome sequence can be determined and optionally output.
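A minimal sketch of attaching sample-type metadata to sequence information and reporting differences against a comparison genome, such as a tumor sample compared with a matched normal. Representing each genome as a set of (chromosome, position, ref, alt) variant tuples and the `Sample` class itself are hypothetical choices made for illustration. Variants found only in the tumor would be candidate somatic changes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

@dataclass
class Sample:
    """Hypothetical container: sequence-derived variants plus sample type."""
    name: str
    sample_type: Dict[str, str] = field(default_factory=dict)  # e.g. {"source": "tumor"}
    variants: Set[Variant] = field(default_factory=set)

def compare_to(sample: Sample, comparison: Sample) -> Dict[str, List[Variant]]:
    """Variants present in only one of the two samples, sorted for output."""
    return {
        "only_in_sample": sorted(sample.variants - comparison.variants),
        "only_in_comparison": sorted(comparison.variants - sample.variants),
    }
```

A real comparison would also need to handle regions with no coverage in one sample, so that missing data is not mistaken for an absent variant.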

The methods can be used to analyze genetic information in selected genomic regions of interest and in genomic regions that can interact with those selected regions. The amplification methods disclosed herein can be used with devices, kits, and methods known to those skilled in genetic analysis, including, but not limited to, those found in U.S. Pat. Nos. 6,449,562, 6,287,766, 7,361,468, 7,414,117, 6,225,109, and 6,110,709. In some cases, the amplification methods of the present disclosure can be used to amplify target nucleic acids for DNA hybridization studies and thereby determine the presence or absence of polymorphisms. Polymorphisms or alleles can be associated with a disease or condition, such as a genetic disorder. In other cases, polymorphisms can be associated with susceptibility to a disease or condition, for example, polymorphisms associated with addiction, degenerative and age-related conditions, cancer, and the like. In other cases, polymorphisms can be associated with beneficial traits, such as improved coronary health, resistance to diseases such as HIV or malaria, or resistance to degenerative diseases such as osteoporosis, Alzheimer's disease, or dementia.

The compositions and methods of the present disclosure can be used for diagnostic, prognostic, therapeutic, patient stratification, drug development, treatment selection, and screening purposes. The present disclosure offers the advantage that many different target molecules can be analyzed at once from a single biomolecular sample using the methods of the present disclosure. This allows, for example, several diagnostic tests to be performed on a single sample.

The compositions and methods of the present disclosure can be used in genomics research, where the methods described herein can rapidly provide highly sought answers. The methods and compositions described herein can be used in the process of discovering biomarkers that can serve for diagnosis or prognosis and as indicators of health and disease. They can also be used to screen drugs, for example, in drug development, treatment selection, determination of treatment efficacy, and/or identification of targets for pharmaceutical development. Because proteins are the final gene products in the body, the ability to test gene expression in screening assays involving drugs is extremely important. In some embodiments, the methods and compositions described herein measure the expression of both proteins and genes simultaneously, yielding the most information from a given screen.

The compositions and methods of the present disclosure can be used for gene expression analysis. The methods described herein discriminate between nucleotide sequences. Differences between target nucleotide sequences can be, for example, single nucleobase differences, nucleic acid deletions, nucleic acid insertions, or rearrangements. Such sequence differences involving more than one base can also be detected. The processes of the present disclosure can detect infectious diseases, genetic diseases, and cancer, and are also useful in environmental monitoring, forensics, and food science. Examples of genetic analyses that can be performed on nucleic acids include SNP detection, STR detection, RNA expression analysis, promoter methylation, gene expression, virus detection, viral subtyping, and drug resistance.
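To illustrate the kinds of differences listed above, the sketch classifies the differences between two short sequences as single-base substitutions, multi-base substitutions, insertions, deletions, or complex changes. It uses Python's general-purpose `difflib.SequenceMatcher` as a stand-in for a proper sequence aligner. This is an assumption for illustration, not the disclosure's method. In repetitive sequence the reported position of an indel can be ambiguous.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

def classify_differences(ref: str, alt: str) -> List[Tuple[str, int, str, str]]:
    """List (kind, ref_position, ref_bases, alt_bases) for each difference.

    Hypothetical helper; difflib stands in for a real sequence aligner.
    """
    diffs = []
    matcher = SequenceMatcher(None, ref, alt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            kind = "insertion"
        elif tag == "delete":
            kind = "deletion"
        elif i2 - i1 == 1 and j2 - j1 == 1:
            kind = "single-base substitution"
        elif i2 - i1 == j2 - j1:
            kind = "multi-base substitution"
        else:
            kind = "complex"
        diffs.append((kind, i1, ref[i1:i2], alt[j1:j2]))
    return diffs
```

For example, "ACGTACGT" against "ACCTACGT" is reported as a single-base substitution of G by C at position 2. Rearrangements, the last category in the text, would need a structural comparison rather than a linear diff.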

The methods can be applied to the analysis of biomolecular samples obtained or derived from a patient to determine whether a diseased cell type is present in the sample, the stage of the disease, the patient's prognosis, the patient's ability to respond to a particular treatment, or the best treatment for the patient. The methods can also be applied to identify biomarkers for particular diseases.

In some embodiments, the methods described herein are used to diagnose a condition. As used herein, the term "diagnosing" or "diagnosis" of a condition can include predicting or diagnosing the condition, determining a predisposition to the condition, monitoring treatment of the condition, diagnosing the therapeutic response of a disease, or determining the prognosis of the condition, its progression, or its response to a particular treatment. For example, a blood sample can be assayed according to any of the methods described herein to determine the presence and/or amount of a disease marker or malignant cell type in the sample, thereby diagnosing or staging a disease or cancer.

In some embodiments, the methods and compositions described herein are used for the diagnosis and prognosis of a condition.

Numerous immune, proliferative, and malignant diseases and disorders are particularly amenable to the methods described herein. Immune diseases and disorders include allergic diseases and disorders, disorders of immune function, and autoimmune diseases and conditions. Allergic diseases and disorders include, but are not limited to, allergic rhinitis, allergic conjunctivitis, allergic asthma, atopic eczema, atopic dermatitis, and food allergy. Immunodeficiencies include, but are not limited to, severe combined immunodeficiency (SCID), hypereosinophilic syndrome, chronic granulomatous disease, leukocyte adhesion deficiency I and II, hyper-IgE syndrome, Chediak-Higashi syndrome, neutrophilia, neutropenia, aplasia, agammaglobulinemia, hyper-IgM syndrome, DiGeorge/velocardiofacial syndrome, and interferon-gamma TH1 pathway defects. Autoimmune and immune dysregulation disorders include, but are not limited to, rheumatoid arthritis, diabetes, systemic lupus erythematosus, Graves' disease, Graves' ophthalmopathy, Crohn's disease, multiple sclerosis, psoriasis, systemic sclerosis, goiter and struma lymphomatosa (Hashimoto's thyroiditis, lymphadenoid goiter), alopecia areata, autoimmune cardiomyopathy, lichen sclerosus, autoimmune uveitis, Addison's disease, atrophic gastritis, myasthenia gravis, idiopathic thrombocytopenic purpura, hemolytic anemia, primary biliary cirrhosis, Wegener's granulomatosis, polyarteritis nodosa, inflammatory bowel disease, allograft rejection, and tissue destruction from allergic reactions to infectious microorganisms or environmental antigens.

Proliferative diseases and disorders that can be evaluated by the methods of the present disclosure include, but are not limited to, hemangiomatosis in newborns, secondary progressive multiple sclerosis, chronic progressive myelodegenerative disease, neurofibromatosis, ganglioneuromatosis, keloid formation, Paget's disease of bone, fibrocystic disease (e.g., of the breast or uterus), sarcoidosis, Peyronie's and Dupuytren's fibrosis, cirrhosis, atherosclerosis, and vascular restenosis.

Malignant diseases and disorders that can be evaluated by the methods of the present disclosure include hematological malignancies and solid tumors.

Hematological malignancies are particularly suited to the methods of the present disclosure when the sample is a blood sample, because such malignancies involve changes in blood-borne cells. Such malignancies include non-Hodgkin's lymphoma, Hodgkin's lymphoma, non-B-cell lymphomas and other lymphomas, acute or chronic leukemia, polycythemia, thrombocythemia, multiple myeloma, myelodysplastic disorders, myeloproliferative disorders, myelofibrosis, atypical immune lymphoproliferation, and plasma cell disorders.

Plasma cell disorders that can be evaluated by the methods of the present disclosure include multiple myeloma, amyloidosis, and Waldenström's macroglobulinemia.

Examples of solid tumors include, but are not limited to, colon cancer, breast cancer, lung cancer, prostate cancer, brain tumors, central nervous system tumors, bladder tumors, melanoma, liver cancer, osteosarcoma and other bone cancers, testicular and ovarian carcinomas, head and neck tumors, and cervical neoplasms.

Genetic diseases can also be detected by the processes of the present disclosure, through prenatal or postnatal screening for chromosomal and genetic aberrations or genetic diseases. Examples of detectable genetic diseases include 21-hydroxylase deficiency, cystic fibrosis, fragile X syndrome, Turner syndrome, Duchenne muscular dystrophy, Down syndrome or other trisomies, heart disease, single-gene diseases, HLA typing, phenylketonuria, sickle cell anemia, Tay-Sachs disease, thalassemia, Klinefelter syndrome, Huntington's disease, autoimmune diseases, lipidosis, obesity defects, hemophilia, inborn errors of metabolism, and diabetes.

The methods described herein can be used to diagnose pathogen infections, for example, infections by intracellular bacteria and viruses, by determining the presence and/or amount of bacterial or viral markers, respectively, in the sample.

A wide variety of infectious diseases can be detected by the processes of the present disclosure. Infectious diseases can be caused by bacterial, viral, parasitic, and fungal infectious agents. The resistance of various infectious agents to drugs can also be determined using the present disclosure.

Bacterial infectious agents that can be detected by the present disclosure include Escherichia coli, Salmonella, Shigella, Klebsiella, Pseudomonas, Listeria monocytogenes, Mycobacterium tuberculosis, Mycobacterium avium-intracellulare, Yersinia, Francisella, Pasteurella, Brucella, Clostridia, Bordetella pertussis, Bacteroides, Staphylococcus aureus, Streptococcus pneumoniae, B-hemolytic streptococci (B-Hemolytic strep.), Corynebacteria, Legionella, Mycoplasma, Ureaplasma, Chlamydia, Neisseria gonorrhoeae, Neisseria meningitidis, Haemophilus influenzae, Enterococcus faecalis, Proteus vulgaris, Proteus mirabilis, Helicobacter pylori, Treponema pallidum, Borrelia burgdorferi, Borrelia recurrentis, Rickettsial pathogens, Nocardia, and Actinomycetes.

Fungal infectious agents that can be detected by the present disclosure include Cryptococcus neoformans, Blastomyces dermatitidis, Histoplasma capsulatum, Coccidioides immitis, Paracoccidioides brasiliensis, Candida albicans, Aspergillus fumigatus, Phycomycetes (Rhizopus), Sporothrix schenckii, Chromomycosis, and Maduromycosis.

Viral infectious agents that can be detected by the present disclosure include human immunodeficiency virus, human T-cell lymphotropic virus, hepatitis viruses (e.g., hepatitis B virus and hepatitis C virus), Epstein-Barr virus, cytomegalovirus, human papillomaviruses, orthomyxoviruses, paramyxoviruses, adenoviruses, coronaviruses, rhabdoviruses, poliovirus, togaviruses, bunyaviruses, arenaviruses, rubella virus, and reoviruses.

Parasitic agents that can be detected by the present disclosure include Plasmodium falciparum, Plasmodium malariae, Plasmodium vivax, Plasmodium ovale, Onchocerca volvulus, Leishmania, Trypanosoma spp., Schistosoma spp., Entamoeba histolytica, Cryptosporidium, Giardia spp., Trichomonas spp., Balantidium coli, Wuchereria bancrofti, Toxoplasma spp., Enterobius vermicularis, Ascaris lumbricoides, Trichuris trichiura, Dracunculus medinensis, trematodes, Diphyllobothrium latum, Taenia spp., Pneumocystis carinii, and Necator americanus.

The present disclosure is also useful for detecting drug resistance in infectious agents. For example, vancomycin-resistant Enterococcus faecium, methicillin-resistant Staphylococcus aureus, penicillin-resistant Streptococcus pneumoniae, multidrug-resistant Mycobacterium tuberculosis, and AZT-resistant human immunodeficiency virus can all be identified with the present disclosure.

Accordingly, a target molecule detected using the compositions and methods of the present disclosure can be either a patient marker (such as a cancer marker) or a marker of infection by a foreign agent, such as a bacterial or viral marker.

The compositions and methods of the present disclosure can be used to identify and/or quantify target molecules whose abundance is indicative of a biological state or disease condition, for example, blood markers that are upregulated or downregulated as a result of a disease state.

In some embodiments, the methods and compositions of the present disclosure can be used to measure cytokine expression. The low sensitivity of the methods described herein can be helpful, for example, for the early detection of cytokines as biomarkers of a condition, for the diagnosis or prognosis of a disease such as cancer, and for the identification of latent conditions.

The different samples from which target polynucleotides are obtained can include multiple samples from the same individual, samples from different individuals, or combinations thereof. In some embodiments, a sample comprises a plurality of polynucleotides from a single individual. In some embodiments, a sample comprises a plurality of polynucleotides from two or more individuals. An individual is any organism, or portion thereof, from which target polynucleotides can be obtained. Non-limiting examples include plants, animals, fungi, protists, prokaryotes, viruses, mitochondria, and chloroplasts. Sample polynucleotides can be isolated from a subject, for example, from a cell sample, tissue sample, or organ sample derived from the subject, including cultured cell lines, biopsies, blood samples, or fluid samples containing cells. The subject may be an animal, including but not limited to a cow, pig, mouse, rat, chicken, cat, or dog, and is typically a mammal, such as a human. Samples can also be produced artificially, such as by chemical synthesis. In some embodiments, the sample comprises DNA. In some embodiments, the sample comprises genomic DNA. In some embodiments, the sample comprises mitochondrial DNA, chloroplast DNA, plasmid DNA, bacterial artificial chromosomes, yeast artificial chromosomes, oligonucleotide tags, or combinations thereof. In some embodiments, the sample comprises DNA generated by primer extension reactions using any suitable combination of primers and DNA polymerase, including but not limited to polymerase chain reaction (PCR), reverse transcription, and combinations thereof. Where the template for the primer extension reaction is RNA, the product of reverse transcription is referred to as complementary DNA (cDNA). Primers useful in primer extension reactions can comprise sequences specific to one or more targets, random sequences, partially random sequences, and combinations thereof. Reaction conditions suitable for primer extension reactions are known in the art. In general, sample polynucleotides comprise any polynucleotide present in a sample, which may or may not include target polynucleotides.

In some embodiments, template nucleic acid molecules (e.g., DNA or RNA) are isolated from a biological sample that contains a variety of other components, such as proteins, lipids, and non-template nucleic acids. Template nucleic acid molecules can be obtained from any cellular material, whether of animal, plant, bacterial, fungal, or any other cellular origin. Biological samples for use in the present disclosure include viral particles or preparations. Template nucleic acid molecules can be obtained directly from an organism or from a biological sample obtained from an organism, such as blood, urine, cerebrospinal fluid, seminal fluid, saliva, sputum, stool, and tissue. Any tissue or body fluid specimen can be used as a source of nucleic acid for use in the present disclosure. Template nucleic acid molecules can also be isolated from cultured cells, such as a primary cell culture or a cell line. The cells or tissues from which template nucleic acids are obtained can also be infected with a virus or other intracellular pathogen. A sample can also be total RNA extracted from a biological specimen, a cDNA library, viral DNA, or genomic DNA. A sample can also be isolated DNA of non-cellular origin, e.g., amplified/isolated DNA from the freezer.

Methods for extracting and purifying nucleic acids are well known in the art. For example, nucleic acids can be purified by organic extraction with phenol, phenol/chloroform/isoamyl alcohol, or similar formulations, including TRIzol and TriReagent. Other non-limiting examples of extraction techniques include: (1) organic extraction followed by ethanol precipitation, e.g., using a phenol/chloroform organic reagent (Ausubel et al., 1993), with or without an automated nucleic acid extractor such as the Model 341 DNA Extractor available from Applied Biosystems (Foster City, Calif.); (2) stationary-phase adsorption methods (U.S. Pat. No. 5,234,809; Walsh et al., 1991); and (3) salt-induced nucleic acid precipitation methods (Miller et al., 1988), commonly referred to as "salting-out" methods. Another example of nucleic acid isolation and/or purification uses magnetic particles: nucleic acids bind specifically or non-specifically to the particles, the beads are then isolated with a magnet and washed, and the nucleic acids are eluted from the beads (see, e.g., U.S. Pat. No. 5,705,628). In some embodiments, the isolation methods above can be preceded by an enzymatic digestion step that helps remove unwanted protein from the sample, e.g., digestion with proteinase K or another protease. See, e.g., U.S. Pat. No. 7,001,724. If desired, RNase inhibitors can be added to the lysis buffer. For certain cell or sample types, it may be desirable to add a protein denaturation/digestion step to the protocol. Purification methods can be directed to isolating DNA, RNA, or both. When both DNA and RNA are isolated together during or after an extraction procedure, further steps can be used to purify one or both separately from the other. Sub-fractions of the extracted nucleic acids can also be generated, for example, by purification based on size, sequence, or other physical or chemical properties. In addition to the initial nucleic acid isolation step, nucleic acids can be purified after any step in the methods of the present disclosure, for example, to remove excess or unwanted reagents, reactants, or products.

Template nucleic acid molecules can be obtained as described in U.S. Patent Application Publication No. 2002/0190663 A1, published October 9, 2003. Generally, nucleic acids can be extracted from a biological sample by a variety of techniques, such as those described by Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y., pp. 280-281 (1982). In some cases, the nucleic acids are first extracted from the biological sample and then cross-linked in vitro. In some cases, native associated proteins (e.g., histones) can be further removed from the nucleic acids.

In other embodiments, the present disclosure can be readily applied to any high-molecular-weight double-stranded DNA, including, for example, DNA isolated from tissues, cell cultures, bodily fluids, animal tissue, plants, bacteria, fungi, viruses, and the like.

In some embodiments, each of the plurality of independent samples can independently comprise at least about 1 ng, 2 ng, 5 ng, 10 ng, 20 ng, 30 ng, 40 ng, 50 ng, 75 ng, 100 ng, 150 ng, 200 ng, 250 ng, 300 ng, 400 ng, 500 ng, 1 μg, 1.5 μg, 2 μg, 5 μg, 10 μg, 20 μg, 50 μg, 100 μg, 200 μg, 500 μg, or 1000 μg, or more, of nucleic acid material. In some embodiments, each of the plurality of independent samples can independently comprise less than about 1 ng, 2 ng, 5 ng, 10 ng, 20 ng, 30 ng, 40 ng, 50 ng, 75 ng, 100 ng, 150 ng, 200 ng, 250 ng, 300 ng, 400 ng, 500 ng, 1 μg, 1.5 μg, 2 μg, 5 μg, 10 μg, 20 μg, 50 μg, 100 μg, 200 μg, 500 μg, or 1000 μg of nucleic acid.
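To relate input masses like those above to the number of genome copies they contain (genome equivalents), a back-of-the-envelope conversion can help. The sketch below is illustrative only. It assumes an average mass of about 650 g/mol per base pair of double-stranded DNA and a haploid human genome of about 3.1 Gb. Both constants are assumptions and should be adjusted for the organism in question.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one dsDNA base pair (g/mol)
HUMAN_HAPLOID_BP = 3.1e9     # assumed haploid human genome size (bp)


def genome_equivalents(mass_ng: float, genome_size_bp: float) -> float:
    """Approximate number of genome copies in `mass_ng` nanograms of dsDNA."""
    grams_per_genome = genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return (mass_ng * 1e-9) / grams_per_genome


for ng in (1, 10, 100, 1000):
    copies = genome_equivalents(ng, HUMAN_HAPLOID_BP)
    print(f"{ng:>5} ng ~ {copies:,.0f} haploid genome equivalents")
```

Under these assumptions, 1 ng of human DNA corresponds to roughly 300 haploid genome copies, and the count scales linearly with input mass.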

In some embodiments, end repair is performed using a commercially available kit, such as those available from Epicentre Biotechnologies (Madison, WI), to generate blunt-ended, 5'-phosphorylated nucleic acid ends.

Adapter oligonucleotides include any oligonucleotide that has a sequence, at least part of which is known, and that can be joined to a target polynucleotide. Adapter oligonucleotides can comprise DNA, RNA, nucleotide analogues, non-canonical nucleotides, labeled nucleotides, modified nucleotides, or combinations thereof. Adapter oligonucleotides can be single-stranded, double-stranded, or partially double-stranded. In general, a partially double-stranded adapter comprises one or more single-stranded regions and one or more double-stranded regions. Double-stranded adapters can comprise two separate oligonucleotides hybridized to one another (also referred to as an "oligonucleotide duplex"). Hybridization may leave one or more blunt ends, one or more 3' overhangs, one or more 5' overhangs, one or more bulges resulting from mismatched and/or unpaired nucleotides, or any combination of these. In some embodiments, a single-stranded adapter comprises two or more sequences that can hybridize with one another. When a single-stranded adapter contains two such hybridizable sequences, hybridization yields a hairpin structure (a hairpin adapter). When two hybridized regions of an adapter are separated from one another by a non-hybridized region, a "bubble" structure results. An adapter comprising a bubble structure can consist of a single adapter oligonucleotide with internal hybridization, or can comprise two or more adapter oligonucleotides hybridized to one another. Hybridization of internal sequences, such as between two hybridizable sequences in an adapter, can create a double-stranded structure within a single-stranded adapter oligonucleotide. Different types of adapters can be used in combination, such as a hairpin adapter and a double-stranded adapter, or adapters with different sequences. The hybridizable sequences in a hairpin adapter may or may not include one or both ends of the oligonucleotide. When neither end is included in a hybridizable sequence, both ends are "free" or "overhanging." When only one end can hybridize to another sequence in the adapter, the other end forms an overhang, such as a 3' overhang or a 5' overhang. When both the 5'-terminal nucleotide and the 3'-terminal nucleotide are included in hybridizable sequences, such that they are complementary and hybridize with one another, the end is referred to as "blunt." Different adapters can be joined to target polynucleotides in sequential reactions or simultaneously. For example, first and second adapters can be added to the same reaction. Adapters can be manipulated before they are combined with target polynucleotides. For example, terminal phosphates can be added or removed.
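The hairpin end terminology above (blunt, 3' overhang, 5' overhang, or both ends free) can be made concrete with a simple sequence check. This is a hypothetical, simplified sketch. It assumes perfect Watson-Crick pairing and ignores thermodynamics, mismatches, and bulges. It searches only for a stem of fixed length whose arms begin within a few nucleotides of each end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_hairpin_end(seq: str, stem_len: int, max_offset: int = 4, min_loop: int = 3):
    """Classify the ends of a single-stranded hairpin adapter (hypothetical helper).

    Searches for a 5' arm seq[a:a+stem_len] that is the reverse complement of a
    3' arm ending b nucleotides before the 3' end, preferring stems closest to
    the ends. Returns "blunt", "5' overhang", "3' overhang", "both ends free",
    or None if no perfect stem with a loop of at least min_loop nt is found.
    """
    n = len(seq)
    for total in range(2 * max_offset + 1):
        for a in range(min(total, max_offset) + 1):
            b = total - a
            if b > max_offset:
                continue
            three_end = n - b
            loop = (three_end - stem_len) - (a + stem_len)
            if loop < min_loop:
                continue  # arms would overlap or leave too small a loop
            five_arm = seq[a:a + stem_len]
            three_arm = seq[three_end - stem_len:three_end]
            if five_arm == revcomp(three_arm):
                if a == 0 and b == 0:
                    return "blunt"          # both terminal nucleotides are paired
                if b == 0:
                    return "5' overhang"    # unpaired nucleotides remain at the 5' end
                if a == 0:
                    return "3' overhang"    # unpaired nucleotides remain at the 3' end
                return "both ends free"
    return None


print(classify_hairpin_end("GACTTTTTAGTCA", 4))  # 3' overhang
```

The stem "GACT"/"AGTC" with a TTTT loop is an arbitrary example. Appending one extra A to the 3' end turns a blunt hairpin into one with a single-nucleotide 3' overhang.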

Adapters can contain one or more of a variety of sequence elements, including but not limited to: one or more amplification primer annealing sequences or complements thereof; one or more sequencing primer annealing sequences or complements thereof; one or more barcode sequences; one or more common sequences shared among multiple different adapters or subsets of different adapters; one or more restriction enzyme recognition sites; one or more overhangs complementary to one or more target polynucleotide overhangs; one or more probe binding sites (e.g., for attachment to a sequencing platform, such as a flow cell for massively parallel sequencing like those developed by Illumina, Inc.); one or more random or near-random sequences; and combinations thereof. In a random sequence, one or more nucleotides are selected at random from a set of two or more different nucleotides at one or more positions, and each of the different nucleotides selected at those positions is represented in the pool of adapters comprising the random sequence. Two or more sequence elements can be non-adjacent to one another (e.g., separated by one or more nucleotides), adjacent to one another, partially overlapping, or completely overlapping. For example, an amplification primer annealing sequence can also serve as a sequencing primer annealing sequence. Sequence elements can be located at or near the 3' end, at or near the 5' end, or in the interior of the adapter oligonucleotide. When an adapter oligonucleotide can form a secondary structure, such as a hairpin, sequence elements can be located partially or completely outside the secondary structure, partially or completely inside it, or between the sequences that participate in it. For example, when an adapter oligonucleotide comprises a hairpin structure, sequence elements can be located partially or completely inside or outside the hybridizable sequences (the "stem"), including in the sequence between the hybridizable sequences (the "loop"). In some embodiments, the first adapter oligonucleotides in a plurality of first adapter oligonucleotides having different barcode sequences comprise sequence elements common to all of the first adapter oligonucleotides. In some embodiments, all of the second adapter oligonucleotides comprise a sequence element common to all of the second adapter oligonucleotides that differs from the common sequence element shared by the first adapter oligonucleotides. A difference between sequence elements can be any difference such that at least part of the different adapters do not align completely, for example, a change in sequence length, a deletion or insertion of one or more nucleotides, or a change in nucleotide composition at one or more positions (such as a base change or base modification). In some embodiments, an adapter oligonucleotide comprises a 5' overhang, a 3' overhang, or both, complementary to one or more target polynucleotides. Complementary overhangs can be one or more nucleotides in length, including but not limited to 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more nucleotides. For example, complementary overhangs can be about 1, 2, 3, 4, 5, or 6 nucleotides in length. Complementary overhangs can comprise a fixed sequence. Complementary overhangs can also comprise a random sequence of one or more nucleotides. In that case, one or more nucleotides are selected at random from a set of two or more different nucleotides at one or more positions, and each of the selected nucleotides is represented in the pool of adapters whose complementary overhangs comprise the random sequence. In some embodiments, an adapter overhang is complementary to a target polynucleotide overhang produced by restriction endonuclease digestion. In some embodiments, an adapter overhang consists of adenine or thymine.
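Two sticky ends of the same type can anneal when one overhang, written 5'→3', is the reverse complement of the other. The illustrative sketch below expands an overhang template written with the IUPAC code N (any base) into the pool of concrete sequences it represents. It then reports which pool members pair with a given target overhang. The examples are assumptions chosen for illustration: the 4-nt 5'-GATC overhang left by GATC-cutting enzymes such as MboI or DpnII, and a single-T adapter overhang pairing with the 3'-A overhang of an A-tailed fragment.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expand_pool(template: str) -> list[str]:
    """All concrete overhangs represented by a template in which 'N' means A/C/G/T."""
    choices = ["ACGT" if base == "N" else base for base in template]
    return ["".join(p) for p in product(*choices)]


def compatible(adapter_overhang: str, target_overhang: str) -> bool:
    """True if two overhangs of the same type (both 5' or both 3', written 5'->3')
    can base-pair along their full length."""
    return (len(adapter_overhang) == len(target_overhang)
            and adapter_overhang == revcomp(target_overhang))


pool = expand_pool("GANC")
print(len(pool))                                    # 4
print([o for o in pool if compatible(o, "GATC")])   # ['GATC']
print(compatible("T", "A"))                         # True (T-overhang adapter, A-tailed insert)
```

Only exact-length, fully complementary overhangs count as compatible here. Real ligations can tolerate some mismatches, which this sketch deliberately ignores.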

Adapter oligonucleotides can be of any suitable length that is at least sufficient to accommodate the one or more sequence elements they comprise. In some embodiments, adapters are about, less than about, or more than about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200, or more nucleotides in length. In some examples, adapters can be about 10 to about 50 nucleotides in length. In further examples, adapters can be about 20 to about 40 nucleotides in length.

As used herein, the term "barcode" refers to a known nucleic acid sequence that allows some feature of the polynucleotide with which the barcode is associated to be identified. In some embodiments, the feature of the polynucleotide to be identified is the sample from which the polynucleotide is derived. In some embodiments, barcodes can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more nucleotides in length. For example, barcodes can be at least 10, 11, 12, 13, 14, or 15 nucleotides in length. In some embodiments, barcodes can be shorter than 10, 9, 8, 7, 6, 5, or 4 nucleotides in length. For example, barcodes can be shorter than 10 nucleotides in length. In some embodiments, barcodes associated with some polynucleotides differ in length from barcodes associated with other polynucleotides. In general, barcodes are long enough, and their sequences different enough, to allow samples to be identified from the barcodes with which they are associated. In some embodiments, a barcode, and the sample source with which it is associated, can be identified accurately after the mutation, insertion, or deletion of one or more nucleotides in the barcode sequence, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotides. In some examples, 1, 2, or 3 nucleotides can be mutated, inserted, and/or deleted. In some embodiments, each barcode in a plurality of barcodes differs from every other barcode in the plurality at two or more nucleotide positions, such as at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or more positions. In some examples, each barcode can differ from every other barcode in at least 2, 3, 4, or 5 positions. In some embodiments, both a first site and a second site comprise at least one of a plurality of barcode sequences. In some embodiments, the barcodes for the second sites are selected independently of the barcodes for the first adapter oligonucleotides. In some embodiments, first sites and second sites having barcodes are paired, such that the sequences of the pair comprise one or more barcodes that are the same or different. In some embodiments, the methods of the present disclosure further comprise identifying the sample from which a target polynucleotide is derived based on the barcode sequence to which the target polynucleotide is joined. In general, a barcode can comprise a nucleic acid sequence that, when joined to a target polynucleotide, serves as an identifier of the sample from which the target polynucleotide was derived.
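The error tolerance described above can be illustrated with a small decoder. This is an assumption-laden sketch, not the disclosure's method. It uses Levenshtein (edit) distance, so a substitution, insertion, or deletion each counts as one error. If the minimum pairwise distance within a barcode set is d, up to t = (d - 1) // 2 errors can be corrected unambiguously. An observed barcode is assigned to the unique closest known barcode within t errors. The homopolymer barcodes and sample names are hypothetical and were chosen only for readability.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def min_pairwise_distance(barcodes) -> int:
    """Smallest edit distance between any two barcodes in the set."""
    return min(edit_distance(x, y) for x, y in combinations(barcodes, 2))


def assign_sample(observed: str, barcode_to_sample: dict, max_errors: int):
    """Return the sample of the uniquely closest barcode within max_errors, else None."""
    scored = sorted((edit_distance(observed, bc), bc) for bc in barcode_to_sample)
    best_dist, best_bc = scored[0]
    if best_dist > max_errors:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # tie between barcodes: ambiguous
    return barcode_to_sample[best_bc]


# Hypothetical barcode set and sample names.
barcodes = {"AAAAAA": "sample_1", "CCCCCC": "sample_2",
            "GGGGGG": "sample_3", "TTTTTT": "sample_4"}
t = (min_pairwise_distance(barcodes) - 1) // 2
print(t)                                       # 2
print(assign_sample("AACAAA", barcodes, t))    # sample_1 (one substitution)
print(assign_sample("AAAAA", barcodes, t))     # sample_1 (one deletion)
```

Real barcode designs also avoid homopolymers and balance base composition. This example ignores those constraints.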

In eukaryotes, genomic DNA is packaged into chromatin and organized as chromosomes in the nucleus. The basic structural unit of chromatin is the nucleosome, which consists of 146 base pairs (bp) of DNA wrapped around a histone octamer. The histone octamer consists of two copies each of the core histone H2A-H2B dimer and the H3-H4 dimer. Nucleosomes are regularly spaced along the DNA in an arrangement commonly referred to as "beads on a string."

The assembly of core histones and DNA into nucleosomes is mediated by chaperone proteins and associated assembly factors. Nearly all of these factors are core-histone-binding proteins. Some histone chaperones, such as nucleosome assembly protein-1 (NAP-1), show a binding preference for histones H3 and H4. It has also been observed that newly synthesized histones are acetylated and then deacetylated after assembly into chromatin. Factors that mediate histone acetylation or deacetylation therefore play important roles in the chromatin assembly process.

In general, two in vitro methods have been developed for reconstituting or assembling chromatin. One method is ATP-independent, and the other is ATP-dependent. The ATP-independent method combines DNA and core histones with either a protein such as NAP-1 or salt to act as a histone chaperone. This method produces a random arrangement of histones on the DNA that does not accurately mimic native core nucleosome particles in the cell. These particles are often referred to as mononucleosomes because they do not form regularly ordered, extended nucleosome arrays, and the DNA sequences used are usually no longer than 250 bp (Kundu, T. K. et al., Mol. Cell 6: 551-561, 2000). To generate extended arrays of ordered nucleosomes on longer DNA sequences, chromatin must be assembled through an ATP-dependent process.

ATP-dependent assembly of periodic nucleosome arrays, similar to those found in native chromatin, requires a DNA sequence, core histone particles, a chaperone protein, and ATP-utilizing chromatin assembly factors. ACF (ATP-utilizing chromatin assembly and remodeling factor) and RSF (remodeling and spacing factor) are two widely studied assembly factors. They are used to generate chromatin consisting of extended, ordered nucleosome arrays in vitro (Fyodorov, D. V. and Kadonaga, J. T., Methods Enzymol. 371: 499-515, 2003; Kundu, T. K. et al., Mol. Cell 6: 551-561, 2000).

In particular embodiments, the methods of the present disclosure can be readily applied to any type of fragmented double-stranded DNA. Examples include, but are not limited to: free DNA isolated from plasma, serum, and/or urine; apoptotic DNA from cells and/or tissues; DNA fragmented enzymatically in vitro (e.g., by DNase I and/or restriction enzymes); and/or DNA fragmented by mechanical force (hydroshearing, sonication, nebulization, etc.).

Nucleic acids obtained from biological samples can be fragmented to produce fragments suitable for analysis. Template nucleic acids can be fragmented or sheared to a desired length using a variety of mechanical, chemical, and/or enzymatic methods. DNA can be randomly sheared by sonication (e.g., the Covaris method), by brief exposure to a DNase, or by using a mixture of one or more restriction enzymes, or a transposase or nicking enzyme. RNA can be fragmented by brief exposure to an RNase, by heat plus magnesium, or by shearing. RNA can be converted to cDNA. If fragmentation is used, the RNA can be converted to cDNA before or after fragmentation. In some embodiments, nucleic acid from a biological sample is fragmented by sonication. In other embodiments, nucleic acid is fragmented by a hydroshear instrument. Generally, individual nucleic acid template molecules can be from about 2 kb to about 40 kb. In various embodiments, the nucleic acids can be fragments of about 6 kb to 10 kb. Nucleic acid molecules can be single-stranded, double-stranded, or double-stranded with single-stranded regions (e.g., stem-and-loop structures).

In some embodiments, the crosslinked DNA molecules can be subjected to a size-selection step. Size selection can be performed to retain crosslinked DNA molecules smaller or larger than a particular size. The resulting size distribution can be further influenced by the crosslinking frequency and/or the fragmentation method, for example by choosing a frequent-cutting or rare-cutting restriction enzyme. In some embodiments, the composition can be prepared to comprise crosslinked DNA molecules in the range of about 1 kb to 5 Mb, 5 kb to 5 Mb, 5 kb to 2 Mb, 10 kb to 2 Mb, 10 kb to 1 Mb, 20 kb to 1 Mb, 20 kb to 500 kb, 50 kb to 500 kb, 50 kb to 200 kb, 60 kb to 200 kb, 60 kb to 150 kb, 80 kb to 150 kb, 80 kb to 120 kb, or 100 kb to 120 kb, or any range bounded by any of these values (e.g., about 150 kb to 1 Mb).
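In silico, a size-selection step amounts to a simple length filter. The sketch below uses a hypothetical `size_select` helper and made-up molecule lengths to keep only molecules within an inclusive window, here the about 100 kb to 120 kb range listed above; in the laboratory the equivalent step is performed physically, for example by gel- or column-based methods.

```python
from typing import Iterable, List


def size_select(lengths_bp: Iterable[int], min_bp: int, max_bp: int) -> List[int]:
    """Keep only molecules whose length lies within [min_bp, max_bp] (inclusive)."""
    if min_bp > max_bp:
        raise ValueError("min_bp must not exceed max_bp")
    return [n for n in lengths_bp if min_bp <= n <= max_bp]


# Hypothetical crosslinked-complex sizes in bp, selected to the ~100-120 kb window.
complexes = [45_000, 98_000, 105_000, 118_000, 150_000, 2_000_000]
print(size_select(complexes, 100_000, 120_000))  # [105000, 118000]
```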

In some embodiments, the sample polynucleotides are fragmented into a population of fragmented DNA molecules of one or more specific size ranges. In some embodiments, the fragments can be generated from at least about 1, 2, 5, 10, 20, 50, 100, 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000, 100,000, 200,000, 500,000, 1,000,000, 2,000,000, 5,000,000, or 10,000,000 or more genome equivalents of starting DNA. Fragmentation can be accomplished by methods known in the art, including chemical, enzymatic, and mechanical fragmentation. In some embodiments, the fragments have an average length of about 10 to about 10,000, 20,000, 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 150,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, 1,000,000, 2,000,000, 5,000,000, or 10,000,000 or more nucleotides. In some embodiments, the fragments have an average length of about 1 kb to about 10 Mb. In some embodiments, the fragments have an average length of about 1 kb to 5 Mb, 5 kb to 5 Mb, 5 kb to 2 Mb, 10 kb to 2 Mb, 10 kb to 1 Mb, 20 kb to 1 Mb, 20 kb to 500 kb, 50 kb to 500 kb, 50 kb to 200 kb, 60 kb to 200 kb, 60 kb to 150 kb, 80 kb to 150 kb, 80 kb to 120 kb, or 100 kb to 120 kb, or any range bounded by any of these values (e.g., about 60 to 120 kb). In some embodiments, the fragments have an average length of less than about 10 Mb, 5 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, or 50 kb. In other embodiments, the fragments have an average length of about 5 kb, 10 kb, 50 kb, 100 kb, 200 kb, 500 kb, 1 Mb, 5 Mb, or 10 Mb or more.

In some embodiments, fragmentation is accomplished mechanically, including by subjecting the sample DNA molecules to sonication. In some embodiments, fragmentation comprises treating the sample DNA molecules with one or more enzymes under conditions suitable for the enzymes to generate double-strand breaks. Enzymes useful for generating DNA fragments include sequence-specific and non-sequence-specific nucleases. Non-limiting examples of nucleases include DNase I, Fragmentase, restriction endonucleases, variants thereof, and combinations thereof. For example, digestion with DNase I can induce random double-strand breaks in DNA in the absence of Mg⁺⁺ and in the presence of Mn⁺⁺. In some embodiments, fragmentation comprises treating the sample DNA molecules with one or more restriction endonucleases. Fragmentation can produce fragments with 5' overhangs, 3' overhangs, blunt ends, or combinations thereof. In some embodiments, such as when fragmentation involves one or more restriction endonucleases, cleavage of the sample DNA molecules leaves overhangs with predictable sequences. In some embodiments, the method comprises size-selecting the fragments by standard methods such as column purification or isolation from an agarose gel.
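The predictable overhangs left by restriction endonucleases can be illustrated with a small in silico digest. The sketch assumes HindIII, which recognizes A^AGCTT and cuts after the first A on each strand, leaving a 4-nt 5'-AGCT overhang; `digest` is a hypothetical helper that works on the top strand only and ignores methylation sensitivity and star activity.

```python
import re
from typing import List, Tuple


def digest(seq: str, site: str, cut_offset: int, overhang_len: int) -> Tuple[List[str], List[str]]:
    """Cut the top strand of `seq` at every occurrence of `site`.

    cut_offset: top-strand cut position measured from the start of the site.
    overhang_len: length of the 5' overhang produced by the staggered cut.
    Returns the top-strand pieces and the overhang sequence at each cut.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping sites would also be found.
    starts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]
    cuts = [s + cut_offset for s in starts]
    bounds = [0] + cuts + [len(seq)]
    pieces = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    overhangs = [seq[c:c + overhang_len] for c in cuts]
    return pieces, overhangs


# HindIII: A^AGCTT, leaving a 5'-AGCT overhang on each new fragment end.
pieces, overhangs = digest("CCAAGCTTGGGAAGCTTCC", "AAGCTT", 1, 4)
print(pieces)     # ['CCA', 'AGCTTGGGA', 'AGCTTCC']
print(overhangs)  # ['AGCT', 'AGCT']
```

Because every cut yields the same overhang, any two HindIII ends are mutually compatible, which is what makes these overhangs "predictable" for downstream ligation.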

In some embodiments, the 5' and/or 3' terminal nucleotide sequences of the fragmented DNA are not modified before ligation. For example, fragmentation with a restriction endonuclease can leave a predictable overhang, followed by ligation to a nucleic acid end carrying an overhang complementary to the predictable overhang on the DNA fragment. In another example, cleavage by an enzyme that leaves predictable blunt ends can be followed by ligation of the blunt-ended DNA fragments to blunt-ended nucleic acids such as adapters, oligonucleotides, or polynucleotides.

In some embodiments, the fragmented DNA molecules are blunted (or "end-repaired") to produce blunt-ended DNA fragments before being joined to adapters. Blunting can be accomplished by incubation with a suitable enzyme, such as a DNA polymerase with both 3'→5' exonuclease activity and 5'→3' polymerase activity, for example T4 DNA polymerase. In some embodiments, end repair is followed by the addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more nucleotides, such as one or more adenines, thymines, guanines, or cytosines, to create an overhang. For example, 1, 2, 3, 4, 5, or 6 nucleotides can subsequently be added to the ends. DNA fragments with overhangs can be joined, for example in a ligation reaction, to one or more nucleic acids with complementary overhangs, such as oligonucleotides, adapter oligonucleotides, or polynucleotides. For example, a template-independent polymerase can be used to add a single adenine to the 3' ends of end-repaired DNA fragments, which are then ligated to one or more adapters each having a thymine at its 3' end. In some embodiments, nucleic acids such as oligonucleotides or polynucleotides can be joined to blunt-ended double-stranded DNA molecules that have been modified by extending their 3' ends by one or more nucleotides, followed by 5' phosphorylation. In some cases, the 3' extension can be carried out with a polymerase, such as Klenow polymerase or any suitable polymerase provided herein, in the presence of one or more dNTPs in a suitable buffer that can contain magnesium, or with terminal deoxynucleotidyl transferase. In some embodiments, target polynucleotides with blunt ends are joined to one or more adapters with blunt ends. Phosphorylation of the 5' ends of the DNA fragments can be performed, for example, with T4 polynucleotide kinase in a suitable buffer containing ATP and magnesium. Optionally, the fragmented DNA molecules can be treated with an enzyme known in the art, such as a phosphatase, to dephosphorylate their 5' or 3' ends.
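The logic of A-tailing followed by ligation to T-overhang adapters can be captured in a toy model of DNA ends. In this simplified sketch (the names `End`, `a_tail`, and `compatible` are hypothetical), two sticky ends can anneal only if they carry the same overhang type and one overhang is the reverse complement of the other; ligase efficiency and reaction conditions are ignored. The model also shows why A-tailed fragments are not expected to anneal to one another.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class End:
    kind: str           # "blunt", "5p" (5' overhang) or "3p" (3' overhang)
    overhang: str = ""  # single-stranded bases, written 5'->3'


def a_tail(end: End) -> End:
    """Add a single 3' A to a blunt, end-repaired end."""
    if end.kind != "blunt":
        raise ValueError("this sketch only A-tails blunt ends")
    return End("3p", "A")


def compatible(x: End, y: End) -> bool:
    """True if two ends can anneal for ligation under this toy model."""
    if x.kind == "blunt" or y.kind == "blunt":
        return x.kind == y.kind  # blunt ends pair only with blunt ends
    return x.kind == y.kind and x.overhang == revcomp(y.overhang)


insert_end = a_tail(End("blunt"))
t_adapter = End("3p", "T")
print(compatible(insert_end, t_adapter))   # True
print(compatible(insert_end, insert_end))  # False: A does not pair with A
```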

As used herein with respect to two polynucleotides, such as an adapter oligonucleotide and a target polynucleotide, the terms "connecting," "joining," and "ligation" refer to covalently linking two separate DNA segments to produce a single, longer polynucleotide with a continuous backbone. Methods for joining two DNA segments are known in the art and include, but are not limited to, enzymatic and non-enzymatic (e.g., chemical) methods. Examples of non-enzymatic ligation reactions include the non-enzymatic ligation techniques described in U.S. Pat. Nos. 5,780,613 and 5,476,930, which are incorporated herein by reference. In some embodiments, the adapter oligonucleotide is joined to the target polynucleotide by a ligase, such as a DNA ligase or an RNA ligase. Many ligases, each with characterized reaction conditions, are known in the art. These include, but are not limited to: NAD⁺-dependent ligases, such as tRNA ligase, Taq DNA ligase, Thermus filiformis DNA ligase, E. coli DNA ligase, Tth DNA ligase, Thermus scotoductus DNA ligases (I and II), thermostable ligase, Ampligase thermostable DNA ligase, VanC-type ligase, 9°N DNA ligase, Tsp DNA ligase, and novel ligases discovered by bioprospecting; ATP-dependent ligases, such as T4 RNA ligase, T4 DNA ligase, T3 DNA ligase, T7 DNA ligase, Pfu DNA ligase, DNA ligase I, DNA ligase III, DNA ligase IV, and novel ligases discovered by bioprospecting; and wild-type, mutant isoform, and genetically engineered variants thereof.

Ligation is possible between DNA segments with hybridizable sequences, such as complementary overhangs. Ligation is also possible between two blunt ends. Typically, a 5' phosphate is used in the ligation reaction. The 5' phosphate can be provided by the target polynucleotide, the adapter oligonucleotide, or both. 5' phosphates can be added to or removed from the DNA segments to be joined, as needed. Methods for adding or removing 5' phosphates are known in the art and include, but are not limited to, enzymatic and chemical processes. Enzymes useful for adding and/or removing 5' phosphates include kinases, phosphatases, and polymerases. In some embodiments, both ends joined in the ligation reaction (e.g., the adapter end and the target polynucleotide end) provide a 5' phosphate, so that two covalent bonds are formed when the ends are joined. In some embodiments, only one of the two ends (e.g., only the adapter end or only the target polynucleotide end) provides a 5' phosphate, so that only one covalent bond is formed.
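The bond-counting rule above follows from the geometry of the junction: joining two double-stranded ends creates two nicks, one on each strand, and each nick has a 5' end contributed by one partner and a 3' end contributed by the other. A minimal sketch, assuming both 3' ends carry hydroxyl groups and using a hypothetical `sealed_nicks` helper, makes the count explicit; a nick whose 5' end lacks a phosphate simply remains unsealed.

```python
def sealed_nicks(adapter_5p_phosphate: bool, target_5p_phosphate: bool) -> int:
    """Count the phosphodiester bonds a ligase can form when two
    double-stranded ends are joined.

    Each of the two nicks is flanked by a 5' end from one partner and a
    3'-OH from the other (assumed present); a nick can be sealed only if
    its 5' end carries a phosphate.
    """
    return int(adapter_5p_phosphate) + int(target_5p_phosphate)


for adapter_p in (True, False):
    for target_p in (True, False):
        print(adapter_p, target_p, sealed_nicks(adapter_p, target_p))
# True True 2
# True False 1
# False True 1
# False False 0
```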

In some embodiments, only one strand at one or both ends of the target polynucleotide is joined to an adapter oligonucleotide. In some embodiments, both strands at one or both ends of the target polynucleotide are joined to an adapter oligonucleotide. In some embodiments, 3' phosphates are removed before ligation. In some embodiments, adapter oligonucleotides are added to both ends of the target polynucleotide, and one or both strands at each end are joined to one or more adapter oligonucleotides. When both strands at both ends are joined to adapter oligonucleotides, joining can be followed by a cleavage reaction that leaves a 5' overhang, which can serve as a template for extending the corresponding 3' end; that 3' end may or may not include one or more nucleotides derived from the adapter oligonucleotide. In some embodiments, the target polynucleotide is joined to a first adapter oligonucleotide at one end and to a second adapter oligonucleotide at the other end. In some embodiments, the two ends of the target polynucleotide are joined to opposite ends of a single adapter oligonucleotide. In some embodiments, the target polynucleotide and the adapter oligonucleotide to be joined have blunt ends. In some embodiments, a separate ligation reaction can be performed for each sample, using for each sample a different first adapter oligonucleotide containing at least one barcode sequence, so that no barcode sequence is joined to the target polynucleotides of more than one sample. A DNA segment or target polynucleotide with a joined adapter oligonucleotide is considered "tagged" by the joined adapter.
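The per-sample barcoding scheme described above is what later allows pooled reads to be assigned back to their samples. The sketch below assumes an inline barcode at the very start of each read and uses a hypothetical `demultiplex` helper; reads matching no barcode, or more than one within the allowed mismatches, are set aside rather than guessed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str],
                max_mismatches: int = 0) -> Dict[str, List[str]]:
    """Assign each read to the sample whose barcode matches its first bases.

    barcodes maps barcode sequence -> sample name. The barcode is trimmed
    from assigned reads; reads that match no barcode, or more than one,
    are collected under "undetermined".
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        hits = [(sample, bc) for bc, sample in barcodes.items()
                if len(read) >= len(bc)
                and sum(a != b for a, b in zip(read, bc)) <= max_mismatches]
        if len(hits) == 1:
            sample, bc = hits[0]
            bins[sample].append(read[len(bc):])
        else:
            bins["undetermined"].append(read)
    return dict(bins)


barcodes = {"ACGT": "sample1", "TGCA": "sample2"}
print(demultiplex(["ACGTGGGG", "TGCATTTT", "CCCCAAAA"], barcodes))
# {'sample1': ['GGGG'], 'sample2': ['TTTT'], 'undetermined': ['CCCCAAAA']}
```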

In some cases, the ligation reaction can be performed at a DNA segment or target polynucleotide concentration of about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 300, 400, 500, 600, 800, or 1000 ng/μL. For example, ligation can be performed at a DNA segment or target polynucleotide concentration of about 100, 150, 200, 300, 400, or 500 ng/μL.

In some cases, the ligation reaction can be performed at a DNA segment or target polynucleotide concentration of about 0.1 to 1000, 1 to 1000, 1 to 800, 10 to 800, 10 to 600, 100 to 600, or 100 to 500 ng/μL.
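A mass concentration alone does not say how many molecules, or how many ligatable ends, are present; that depends on fragment length. The sketch below converts the ng/μL values above to molarity, assuming linear dsDNA of uniform length and an average base-pair mass of about 650 g/mol (some protocols use 660); `ng_per_ul_to_nM` and `ends_nM` are illustrative helpers. For example, 100 ng/μL of 500 bp fragments is roughly 308 nM of molecules, or about 615 nM of ends.

```python
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair of dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration to the molar concentration of molecules."""
    molar_mass = BP_MASS_G_PER_MOL * length_bp  # g/mol for one molecule
    grams_per_litre = conc_ng_per_ul * 1e-3     # 1 ng/uL == 1 mg/L == 1e-3 g/L
    return grams_per_litre / molar_mass * 1e9   # mol/L -> nmol/L


def ends_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Each linear duplex has two ends available for ligation."""
    return 2 * ng_per_ul_to_nM(conc_ng_per_ul, length_bp)


print(round(ng_per_ul_to_nM(100, 500), 1))  # 307.7
print(round(ends_nM(100, 500), 1))          # 615.4
```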

In some cases, the ligation reaction can be performed for longer than about 5, 10, 20, 30, 40, 50, 60, or 90 minutes, or about 2, 3, 4, 5, 6, 8, 10, 12, 18, 24, 36, 48, or 96 hours. In other cases, the ligation reaction can be performed for less than about 5, 10, 20, 30, 40, 50, 60, or 90 minutes, or about 2, 3, 4, 5, 6, 8, 10, 12, 18, 24, 36, 48, or 96 hours. For example, the ligation reaction can be performed for about 30 minutes to about 90 minutes. In some embodiments, joining an adapter to a target polynucleotide produces a joined polynucleotide product with a 3' overhang comprising a nucleotide sequence derived from the adapter.

In some embodiments, after at least one adapter oligonucleotide is joined to a target polynucleotide, the 3' ends of one or more target polynucleotides are extended using the one or more joined adapter oligonucleotides as a template. For example, when an adapter consisting of two hybridized oligonucleotides is joined only to the 5' end of the target polynucleotide, the unjoined 3' end of the target can be extended using the joined adapter strand as a template, concurrently with or after displacement of the unjoined adapter strand. Both strands of an adapter consisting of two hybridized oligonucleotides can be joined to the target polynucleotide so that the joined product has a 5' overhang, which can serve as a template for extending the complementary 3' end. As a further example, a hairpin adapter oligonucleotide can be joined to the 5' end of the target polynucleotide. In some embodiments, the 3' end of the target polynucleotide that is extended includes one or more nucleotides from the adapter oligonucleotide. For target polynucleotides with adapters joined at both ends, extension can be performed at both 3' ends of a double-stranded target polynucleotide with 5' overhangs. This 3'-end extension, or "fill-in," reaction generates a sequence complementary to (the "complement" of) the adapter oligonucleotide template to which the extended strand is hybridized, thereby filling in the 5' overhang and creating a double-stranded region. When both ends of a double-stranded target polynucleotide have 5' overhangs that are filled in by extension of the 3' end of the complementary strand, the product is completely double-stranded.

Extension can be performed with any suitable polymerase known in the art, such as a DNA polymerase, many of which are commercially available. The DNA polymerase can have DNA-dependent DNA polymerase activity, RNA-dependent DNA polymerase activity, or both. The DNA polymerase can be thermostable or non-thermostable. Examples of DNA polymerases include, but are not limited to, Taq, Tth, Tli, Pfu, PfuTurbo, Pyrobest, Pwo, KOD, Bst, Sac, Sso, Poc, Pab, Mth, Pho, ES4, VENT, DEEPVENT, EX-Taq, LA-Taq, Expand, Platinum Taq, Hi-Fi, Tbr, Tfl, Tru, Tac, Tne, Tma, Tih, and Tfi polymerases, the Klenow fragment, and variants, modified products, and derivatives thereof. The 3' ends can be extended before or after target polynucleotides from separate samples are pooled.
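At the sequence level, the fill-in reaction can be sketched with strings, assuming ideal, error-free extension and ignoring strand displacement; `fill_in` is a hypothetical helper. With both strands written 5'→3', the bases added to the recessed 3' end are the reverse complement of the opposite strand's 5' overhang, so after fill-in the bottom strand is the full reverse complement of the top strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def fill_in(recessed_strand: str, opposite_5p_overhang: str) -> str:
    """Extend a recessed 3' end across the opposite strand's 5' overhang.

    Both arguments are written 5'->3'. The polymerase copies the overhang,
    so the bases added to the 3' end are its reverse complement.
    """
    return recessed_strand + revcomp(opposite_5p_overhang)


top = "GATTCC"   # 5'-GAT single-stranded overhang, then duplex region TCC
bottom = "GGA"   # pairs with TCC; its 3' end is recessed
filled = fill_in(bottom, top[:3])
print(filled, filled == revcomp(top))  # GGAATC True
```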

In certain embodiments, the present disclosure provides methods for enriching target nucleic acids and analyzing the target nucleic acids. In some cases, enrichment is performed in a solution-based format. In some cases, the target nucleic acids can be labeled with a labeling agent. In other cases, the target nucleic acids can be crosslinked to one or more associated molecules that are labeled with a labeling agent. Examples of labeling agents include, but are not limited to, biotin, polyhistidine tags, and chemical tags (e.g., the alkyne and azide derivatives used in click chemistry). The labeled target nucleic acids can then be captured, and thereby enriched, using a capture agent. The capture agent can be streptavidin and/or avidin, an antibody, a chemical moiety (e.g., an alkyne or an azide), or any biological, chemical, physical, or enzymatic agent known in the art for affinity purification.

In some cases, immobilized or non-immobilized nucleic acid probes can be used to capture the target nucleic acids. For example, target nucleic acids can be enriched from a sample by hybridization to probes on a solid support or in solution. In some examples, the sample can be a genomic sample. In some examples, the probes can be amplicons, which can contain predefined sequences. The hybridized target nucleic acids can then be washed and/or eluted from the probes. The target nucleic acids can be DNA, RNA, cDNA, or mRNA molecules.

In some cases, the enrichment method can comprise contacting a sample containing the target nucleic acids with probes and binding the target nucleic acids to a solid support. In some cases, the sample can be fragmented using chemical, physical, or enzymatic methods to produce the target nucleic acids. In some cases, the probes can hybridize specifically to the target nucleic acids. In some cases, the target nucleic acids can have an average size of about 50 to 5000, 50 to 2000, 100 to 2000, 100 to 1000, 200 to 1000, 200 to 800, 300 to 800, 300 to 600, or 400 to 600 nucleotide residues. The target nucleic acids can be further separated from unbound nucleic acids in the sample. The solid support can be washed and/or eluted to obtain the enriched target nucleic acids. In some examples, the enrichment step can be repeated about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 times; for example, about 1, 2, or 3 times.
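Hybridization capture itself is a wet-lab process, but its selection logic is often approximated computationally, for instance to estimate which sequenced reads fall on target. The sketch below is such an approximation, not a model of hybridization thermodynamics: a read counts as "captured" if it shares at least `min_shared` k-mers with a probe or its reverse complement. The function names and the default k = 21 are assumptions chosen for illustration.

```python
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def in_silico_capture(reads: Iterable[str], probes: Iterable[str], k: int = 21,
                      min_shared: int = 1) -> List[str]:
    """Keep reads sharing at least `min_shared` k-mers with any probe,
    on either strand, as a rough stand-in for hybridization capture."""
    index: Set[str] = set()
    for probe in probes:
        index |= kmers(probe, k) | kmers(revcomp(probe), k)
    return [r for r in reads if len(kmers(r, k) & index) >= min_shared]


reads = ["TTTTTTTT", "GAGAGAGAGA", "CCCCCGG"]
print(in_silico_capture(reads, ["AAAAACCCCC"], k=5))  # ['TTTTTTTT', 'CCCCCGG']
```

Checking both strands matters because a probe can hybridize to either strand of a double-stranded target.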

In some cases, the enrichment method can comprise preparing probe-derived amplicons, where the probes used for amplification are bound to a solid support. The solid support can carry immobilized nucleic acid probes for capturing specific target nucleic acids from the sample. The probe-derived amplicons can hybridize to the target nucleic acids. After hybridization to the probe amplicons, the target nucleic acids in the sample can be enriched by capturing them (e.g., with a capture agent such as biotin or an antibody), washing, and/or eluting the hybridized target nucleic acids from the captured probes (FIG. 4). The target nucleic acid sequences can be further amplified, for example by PCR, to produce an amplified pool of enriched PCR products.

In some cases, the solid support can be a microarray, slide, chip, microwell, column, tube, particle, or bead. In some examples, the solid support can be coated with streptavidin and/or avidin. In other examples, the solid support can be coated with an antibody. The solid support can comprise glass, metal, ceramic, or polymeric materials. In some embodiments, the solid support can be a nucleic acid microarray (e.g., a DNA microarray). In other embodiments, the solid support can be paramagnetic beads.

In some cases, the enrichment method can comprise digestion with a second restriction enzyme, self-ligation (e.g., self-circularization), and re-digestion with the first restriction enzyme. In certain examples, only the ligation products are linearized and thus become available for adapter ligation and sequencing. In other cases, the ligation junction sequence itself can be used for enrichment, based on hybridization with bait probes complementary to the junction sequence.

In certain embodiments, the present disclosure provides methods for amplifying enriched DNA. In some cases, the enriched DNA is a read pair. Read pairs can be obtained by the methods of the present disclosure.

In some embodiments, one or more amplification and/or replication steps are used to prepare the library to be sequenced. Any amplification method known in the art can be used. Examples of amplification techniques that can be used include, but are not limited to, quantitative PCR, quantitative fluorescent PCR (QF-PCR), multiplex fluorescent PCR (MF-PCR), real-time PCR (RTPCR), single-cell PCR, restriction fragment length polymorphism PCR (PCR-RFLP), PCR-RFLP/RT-PCR-RFLP, hot-start PCR, nested PCR, in situ polony PCR, in situ rolling circle amplification (RCA), bridge PCR, ligation-mediated PCR, Qβ replicase amplification, inverse PCR, picotiter PCR, and emulsion PCR. Other suitable amplification methods include the ligase chain reaction (LCR), transcription amplification, self-sustained sequence replication, selective amplification of target polynucleotide sequences, consensus sequence primed polymerase chain reaction (CP-PCR), arbitrarily primed polymerase chain reaction (AP-PCR), degenerate oligonucleotide primed PCR (DOP-PCR), and nucleic acid sequence-based amplification (NASBA). Other amplification methods that can be used herein include those described in U.S. Pat. Nos. 5,242,794, 5,494,810, 4,988,617, and 6,582,938.

In certain embodiments, the DNA molecules are partitioned into individual compartments and then amplified by PCR. In some cases, PCR amplification uses one or more specific priming sequences within amplification adapters. The amplification adapters can be ligated to the fragmented DNA molecules before or after partitioning into individual compartments.

The outcome depends on which ends received an adapter:

- Polynucleotides with amplification adapters carrying suitable priming sequences at both ends can be amplified exponentially by PCR.
- Polynucleotides with only one suitable priming sequence, for example because adapters carrying priming sequences ligate with incomplete efficiency, can undergo only linear amplification.
- Polynucleotides to which no adapter with a suitable priming sequence has been ligated can be excluded from amplification, e.g., PCR amplification, altogether.

In some embodiments, the number of PCR cycles ranges from 10 to 30. It can also be as few as 9, 8, 7, 6, 5, 4, 3, 2 or fewer cycles, or as many as 40, 45, 50, 55 or 60 cycles. As a result, after PCR amplification, exponentially amplifiable fragments carrying amplification adapters with the appropriate priming sequences can be present at a much higher concentration (1000-fold or more) than linearly amplifiable or non-amplifiable fragments.

Compared with whole-genome amplification techniques such as amplification with randomized primers or multiple displacement amplification (MDA) using phi29 polymerase, the benefits of PCR include, but are not limited to:

- more uniform relative sequence coverage;
- a substantially lower rate of chimeric molecule formation than, for example, MDA (Lasken et al., 2007, BMC Biotechnology). This is because each fragment can be copied at most once per cycle and amplification is controlled by the thermal cycling program. Chimeric molecules pose a major challenge to accurate sequence assembly because they introduce non-biological sequences into the assembly graph. This can lead to higher rates of misassembly or to highly ambiguous, fragmented assemblies;
- reduced sequence-specific bias. Such bias can arise from the binding of the randomized primers commonly used in MDA, and is reduced by using specific priming sites with defined sequences;
- high reproducibility of the final amount of amplified DNA product, which can be controlled by the choice of the number of PCR cycles;
- higher replication fidelity of the polymerases commonly used in PCR compared with common whole-genome amplification techniques known in the art.
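The 1000-fold or greater excess of exponentially amplifiable fragments follows from the gap between exponential and linear growth. The Python sketch below illustrates this. It rests on idealized assumptions that are not part of the disclosure: 100% efficiency per cycle, no plateau phase, and one new copy per cycle for a fragment with a single priming site. The function names are hypothetical.

```python
def copies_after_pcr(n_cycles: int, primed_ends: int) -> float:
    """Idealized copies of one starting molecule after n_cycles of PCR.

    Assumes 100% efficiency and no plateau:
      primed_ends == 2 -> adapters on both ends: exponential growth, 2**n
      primed_ends == 1 -> adapter on one end: linear growth, one new copy per cycle
      primed_ends == 0 -> no usable adapter: not amplified
    """
    if primed_ends == 2:
        return 2.0 ** n_cycles
    if primed_ends == 1:
        return 1.0 + n_cycles
    return 1.0


def enrichment(n_cycles: int) -> float:
    """Fold excess of exponentially over linearly amplifiable fragments."""
    return copies_after_pcr(n_cycles, 2) / copies_after_pcr(n_cycles, 1)


def min_cycles_for(fold: float) -> int:
    """Smallest cycle count at which the idealized enrichment reaches `fold`."""
    n = 0
    while enrichment(n) < fold:
        n += 1
    return n


if __name__ == "__main__":
    for n in (10, 15, 20, 30):
        print(f"{n:>2} cycles: {enrichment(n):,.0f}-fold")
    print("cycles needed for 1000-fold:", min_cycles_for(1000))
```

Under these assumptions the excess passes 1000-fold at about 14 cycles. Real reactions with lower per-cycle efficiency would need more cycles to reach the same ratio.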

In some embodiments, a fill-in reaction is followed by, or carried out as part of, amplification of one or more target polynucleotides using a first primer and a second primer. The first primer comprises a sequence that can hybridize to at least a portion of one or more complements of a first adapter oligonucleotide. The second primer comprises a sequence that can hybridize to at least a portion of one or more complements of a second adapter oligonucleotide. Each primer can be of any suitable length, such as about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90 or 100 nucleotides, or shorter or longer. Any portion of a primer, or all of it, can be complementary to the corresponding target sequence, for example about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides, or fewer or more. For example, about 10-50 nucleotides can be complementary to the corresponding target sequence.
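The sketch below makes the primer/adapter relationship concrete. It measures how many 3′-terminal bases of a primer can pair perfectly with a template strand, with both sequences written 5′→3′, while allowing a non-complementary 5′ tail. The adapter sequence and function names are hypothetical and chosen only for illustration. Real primer design would also consider mismatches, melting temperature and secondary structure.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def complementary_3prime_length(primer: str, template: str) -> int:
    """Length of the longest 3'-terminal stretch of `primer` that can pair
    without mismatches with some segment of `template` (both 5'->3').

    A primer anneals antiparallel, so its 3' stretch pairs with the template
    wherever the reverse complement of that stretch occurs in the template.
    """
    for k in range(len(primer), 0, -1):
        if reverse_complement(primer[-k:]) in template:
            return k
    return 0


if __name__ == "__main__":
    adapter = "ACGTTGCAGGCTAACGTTCAGT"       # hypothetical first adapter oligonucleotide
    template = reverse_complement(adapter)   # strand carrying the adapter's complement
    primer = "GGGG" + adapter[:12]           # 4-nt 5' tail + 12 nt matching the adapter
    print(complementary_3prime_length(primer, template))
```

In this toy case only the 12 adapter-derived bases pair with the template. The 5′ tail, like a sequencing-adapter extension, does not pair.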

"Amplification" refers to any process that increases the copy number of a target sequence. In some cases, a replication reaction can produce only a single complementary copy of a polynucleotide. Methods for primer-directed amplification of target polynucleotides are known in the art and include, but are not limited to, methods based on the polymerase chain reaction (PCR).

Conditions favorable for amplifying target sequences by PCR are known in the art and can be optimized at various steps of the process. They depend on characteristics of the reaction components, some or all of which can be varied, such as:

- target type and target concentration;
- length of the sequence to be amplified;
- sequence of the target and/or of one or more primers;
- primer length and primer concentration;
- the polymerase used;
- reaction volume;
- the ratio of one or more components to one or more other components.

In general, PCR comprises three steps that are repeated (or "cycled") to amplify the target sequence: denaturation of the target to be amplified (if double-stranded), hybridization of one or more primers to the target, and extension of the primers by a DNA polymerase. Steps in this process can be optimized for various outcomes, such as increased yield, reduced formation of spurious products, and/or increased or decreased specificity of primer annealing. Methods of optimization are well known in the art. They include adjusting the type or amount of components in the amplification reaction and/or the conditions of a given step, such as the temperature of a particular step, the duration of a particular step and/or the number of cycles.
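The dependence of yield on cycle number and per-cycle efficiency can be sketched with the standard idealized model N_n = N_0(1 + E)^n. Here E is the fraction of templates copied per cycle, so E = 1 means perfect doubling. The model ignores the plateau phase, and the function names are hypothetical.

```python
import math


def amplicon_copies(n0: float, efficiency: float, cycles: int) -> float:
    """Idealized copy number after `cycles`, starting from n0 copies."""
    return n0 * (1.0 + efficiency) ** cycles


def cycles_needed(n0: float, target: float, efficiency: float) -> int:
    """Fewest cycles for n0 copies to reach at least `target` copies."""
    if target <= n0:
        return 0
    return math.ceil(math.log(target / n0) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    for eff in (1.0, 0.9, 0.8):
        print(f"E = {eff:.1f}: {cycles_needed(100, 1e6, eff)} cycles from 100 to 1e6 copies")
```

Lowering the efficiency from 1.0 to 0.9 raises the cycle count needed for the same 10,000-fold gain from 14 to 15. This is one reason cycle number and reaction conditions are optimized together.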

In some embodiments, the amplification reaction comprises at least about 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200 or more cycles. In some examples, the amplification reaction comprises at least about 20, 25, 30, 35 or 40 cycles. In some embodiments, the amplification reaction comprises no more than about 5, 10, 15, 20, 25, 35, 40, 50, 60, 70, 80, 90, 100, 150 or 200 cycles.

A cycle can contain any number of steps, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more steps. A step can use any temperature or temperature gradient suitable for its purpose. Such purposes include, but are not limited to, 3′ end extension (e.g., adapter fill-in), primer annealing, primer extension and strand denaturation. A step can have any duration, including, but not limited to, about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 180, 240, 300, 360, 420, 480, 540, 600, 1200 or 1800 seconds, or shorter or longer. A step can also be held for an unlimited time until it is manually interrupted.

Any number of cycles comprising different steps can be combined in any order. In some embodiments, different cycles comprising different steps are combined so that the total number of cycles in the combination is about 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150 or 200 cycles, or fewer or more. In some embodiments, amplification is performed after the fill-in reaction.
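The sketch below shows one way to represent the cycle structure described above. Each step has a purpose, a temperature and a duration; steps are grouped into repeated cycles, and groups can be combined in any order. The temperatures and times in the example program are placeholder values assumed for illustration, not conditions taken from the disclosure. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    purpose: str      # e.g. "denature", "anneal", "extend", "fill-in"
    temp_c: float     # hold temperature in degrees Celsius
    seconds: float    # hold time


@dataclass(frozen=True)
class Stage:
    cycles: int               # how many times the steps are repeated
    steps: tuple[Step, ...]   # executed in order within each cycle

    def hold_seconds(self) -> float:
        return self.cycles * sum(step.seconds for step in self.steps)


def total_cycles(program: list[Stage]) -> int:
    return sum(stage.cycles for stage in program)


def total_hold_seconds(program: list[Stage]) -> float:
    """Time spent at hold temperatures; ramping between steps is ignored."""
    return sum(stage.hold_seconds() for stage in program)


# Placeholder program: adapter fill-in, then PCR cycles, then a final extension.
PROGRAM = [
    Stage(1, (Step("fill-in", 72, 300),)),
    Stage(12, (Step("denature", 98, 10), Step("anneal", 60, 30), Step("extend", 72, 30))),
    Stage(1, (Step("final extension", 72, 300),)),
]

if __name__ == "__main__":
    print(total_cycles(PROGRAM), "cycles,", total_hold_seconds(PROGRAM), "s at hold")
```

Keeping stages as data lets differently structured cycles be reordered or concatenated. The total cycle count, summed over all stages, is the quantity the ranges above refer to.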

In some embodiments, the amplification reaction can be performed with at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 40, 50, 100, 200, 300, 400, 500, 600, 800 or 1000 ng of target DNA molecules. In other embodiments, the amplification reaction can be performed with less than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 40, 50, 100, 200, 300, 400, 500, 600, 800 or 1000 ng of target DNA molecules.
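An input mass can be related to the number of genome copies available as template by dividing it by the mass of one genome copy. The sketch below assumes a human haploid genome of about 3.1 × 10⁹ bp and an average of about 650 g/mol per base pair of double-stranded DNA. Both are approximations supplied here, not values from the disclosure, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def genome_copies(mass_ng: float, genome_bp: float = 3.1e9,
                  g_per_mol_per_bp: float = 650.0) -> float:
    """Approximate number of genome copies in `mass_ng` nanograms of dsDNA."""
    grams = mass_ng * 1e-9
    grams_per_copy = genome_bp * g_per_mol_per_bp / AVOGADRO
    return grams / grams_per_copy


if __name__ == "__main__":
    for ng in (1, 10, 100, 1000):
        print(f"{ng:>5} ng ~ {genome_copies(ng):,.0f} haploid genome copies")
```

Under these assumptions 1 ng of human DNA corresponds to roughly 300 haploid genome copies. For other organisms, `genome_bp` would need to be changed.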

Amplification can be performed before or after pooling target polynucleotides from independent samples.

The methods of the present disclosure include determining the amount of amplifiable nucleic acid present in a sample. Any known method can be used to quantify amplifiable nucleic acid. A typical method is the polymerase chain reaction (PCR), in particular quantitative PCR (qPCR).

qPCR is a technique based on the polymerase chain reaction that is used to amplify a targeted nucleic acid molecule and quantify it at the same time. It enables both detection and quantification of a specific sequence in a DNA sample. Quantities can be expressed as an absolute number of copies, or as a relative amount when normalized to input DNA or to additional normalizing genes. The procedure follows the general principle of the polymerase chain reaction, with one additional feature: the amplified DNA is quantified in real time as it accumulates in the reaction after each amplification cycle.

qPCR is described, for example, in:

- Kurnit et al. (U.S. Pat. No. 6,033,854);
- Wang et al. (U.S. Pat. Nos. 5,567,583 and 5,348,853);
- Ma et al. (The Journal of American Science, 2(3), 2006);
- Heid et al. (Genome Research, pp. 986-994, 1996);
- Sambrook and Russell (Quantitative PCR, Cold Spring Harbor Protocols, 2006);
- Higuchi (U.S. Pat. Nos. 6,171,785 and 5,994,056).

The contents of these references are incorporated herein by reference in their entirety.

Other methods of quantification use fluorescent dyes that intercalate into double-stranded DNA, or modified DNA oligonucleotide probes that fluoresce when hybridized to complementary DNA. These methods can be used broadly but are particularly suitable for real-time PCR, as described in more detail below by way of example.

In the first method, a DNA-binding dye binds to all double-stranded DNA (dsDNA) in the PCR, causing the dye to fluoresce. An increase in DNA product during PCR therefore leads to an increase in fluorescence intensity, which is measured at each cycle and allows the DNA concentration to be quantified. The reaction is prepared as a standard PCR reaction with the addition of the fluorescent dsDNA dye. The reaction is run in a thermocycler, and after each cycle a detector measures the fluorescence level. The dye fluoresces only when bound to dsDNA (i.e., the PCR product).

The dsDNA concentration in the PCR can be determined by reference to a standard dilution series. As with other real-time PCR methods, the values obtained have no absolute units associated with them. Comparing a measured DNA/RNA sample with a standard dilution gives a fraction or ratio of the sample relative to the standard. This allows relative comparisons between different tissues or experimental conditions. To ensure accuracy in the quantification and/or expression of a target gene, results can be normalized to a stably expressed gene. The copy number of an unknown gene can likewise be normalized to a gene of known copy number.
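Normalization to a stably expressed reference gene is commonly done with the comparative C_t (ΔΔC_t) method. That is one possible choice, not necessarily the one contemplated by the disclosure. The sketch below assumes that the target and reference genes amplify with the same efficiency; the function name is hypothetical.

```python
def relative_quantity(ct_target: float, ct_reference: float,
                      ct_target_calibrator: float, ct_reference_calibrator: float,
                      efficiency: float = 1.0) -> float:
    """Target amount in a sample relative to a calibrator, normalized to a reference gene.

    Assumes target and reference amplify with the same efficiency
    (efficiency = 1.0 means perfect doubling per cycle).
    """
    delta_sample = ct_target - ct_reference                             # normalize the sample
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator   # normalize the calibrator
    delta_delta = delta_sample - delta_calibrator
    return (1.0 + efficiency) ** (-delta_delta)


if __name__ == "__main__":
    # Target reaches threshold 6 cycles after the reference gene in the sample,
    # but 8 cycles after it in the calibrator.
    print(relative_quantity(24.0, 18.0, 26.0, 18.0))
```

In this example the target is 2 cycles closer to the reference gene in the sample than in the calibrator, so it comes out 2² = 4-fold higher.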

The second method uses sequence-specific RNA- or DNA-based probes to quantify only the DNA that contains the probe sequence. Reporter probes therefore increase specificity significantly and allow quantification even when some non-specific DNA amplification occurs. If all genes are amplified with comparable efficiency, this also allows multiplexing: several genes can be assayed in the same reaction by using specific probes with differently colored labels.

This method is commonly performed with DNA-based probes that carry a fluorescent reporter (e.g., 6-carboxyfluorescein) at one end and a fluorescence quencher (e.g., 6-carboxytetramethylrhodamine) at the other. While the reporter is close to the quencher, its fluorescence cannot be detected. The polymerase (e.g., Taq polymerase) degrades the probe through its 5′→3′ exonuclease activity, which separates the reporter from the quencher. Unquenched fluorescence is then emitted and can be detected. Each PCR cycle increases the amount of product targeted by the reporter probe, and probe degradation and reporter release cause a proportional increase in fluorescence.

The reaction is prepared as a standard PCR reaction with the addition of the reporter probe. During the annealing phase of the PCR, both the probe and the primers anneal to the DNA target. Polymerization of a new DNA strand starts from the primers. When the polymerase reaches the probe, its 5′→3′ exonuclease activity degrades the probe, physically separating the fluorescent reporter from the quencher and increasing fluorescence. A real-time PCR thermocycler detects and measures the fluorescence. The geometric increase in fluorescence, which corresponds to the exponential increase in product, is used to determine the threshold cycle of each reaction.
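The threshold cycle can be read off a fluorescence trace by finding the first pair of consecutive cycles that bracket the threshold and interpolating between them. Fluorescence rises geometrically during the exponential phase, so the sketch below interpolates in log space when both readings are positive and falls back to linear interpolation otherwise. It assumes background-subtracted readings, one per cycle starting at cycle 1. Instrument software may apply different baseline and interpolation rules, and the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def threshold_cycle(readings: Sequence[float], threshold: float) -> Optional[float]:
    """Fractional cycle at which fluorescence first reaches `threshold`.

    readings[i] is the background-subtracted fluorescence after cycle i + 1.
    Returns None if the threshold is never reached.
    """
    if readings and readings[0] >= threshold:
        return 1.0
    for i in range(1, len(readings)):
        lo, hi = readings[i - 1], readings[i]   # values after cycles i and i + 1
        if lo < threshold <= hi:
            if lo > 0:
                # geometric (log-linear) interpolation, matching exponential growth
                frac = (math.log(threshold) - math.log(lo)) / (math.log(hi) - math.log(lo))
            else:
                frac = (threshold - lo) / (hi - lo)
            return i + frac
    return None


if __name__ == "__main__":
    trace = [0.001 * 2 ** c for c in range(1, 31)]   # synthetic, ideal doubling
    print(threshold_cycle(trace, 1.0))
```

For an ideally doubling synthetic trace the log-space interpolation recovers the exact crossing point, log2(1000) ≈ 9.97 cycles. Linear interpolation would slightly underestimate it.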

Fluorescence is plotted against cycle number on a logarithmic scale, on which an exponentially increasing quantity gives a straight line. This allows the relative concentration of DNA present during the exponential (log-linear) phase of the reaction to be determined. A threshold for detecting fluorescence above background is then set. The cycle at which the fluorescence from a sample crosses the threshold is called the threshold cycle, C_t. During the exponential phase the amount of DNA doubles with each cycle, so relative amounts of DNA can be calculated. For example, a sample whose C_t is 3 cycles earlier than that of another sample contains 2³ = 8 times more template. The amount of nucleic acid (e.g., RNA or DNA) is then determined by comparing the results with a standard curve. The standard curve is generated by real-time PCR of serial dilutions (e.g., undiluted, 1:4, 1:16, 1:64) of a known amount of nucleic acid.
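The standard-curve comparison and the 2³ = 8 example can be sketched as follows. C_t is regressed on log10 of the known input amounts of the dilution series. The slope gives the amplification efficiency; a slope of about -3.32 corresponds to perfect doubling. An unknown is then read off the fitted line. The example uses synthetic C_t values for a 1:4 series, not measured data, and the function names are hypothetical.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 = perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Read an unknown's starting quantity off the fitted standard curve."""
    return 10 ** ((ct - intercept) / slope)


def fold_difference(delta_ct, efficiency=1.0):
    """Template ratio for a sample whose Ct is `delta_ct` cycles earlier."""
    return (1.0 + efficiency) ** delta_ct


if __name__ == "__main__":
    standards = [1000 / 4 ** k for k in range(4)]   # undiluted, 1:4, 1:16, 1:64
    cts = [35 - math.log2(q) for q in standards]    # synthetic, ideal doubling
    slope, intercept = fit_standard_curve(standards, cts)
    print(round(slope, 3), round(amplification_efficiency(slope), 3))
    print(fold_difference(3))
```

Because the efficiency is estimated from the standards, `fold_difference` can be given the fitted efficiency instead of assuming exact doubling.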

In certain embodiments, the qPCR reaction uses a dual-fluorophore approach based on fluorescence resonance energy transfer (FRET), e.g., LIGHTCYCLER hybridization probes, in which two oligonucleotide probes anneal to the amplicon (see, e.g., U.S. Pat. No. 6,174,670). The oligonucleotides are designed to hybridize in a head-to-tail orientation, with their fluorophores separated by a distance compatible with efficient energy transfer. Other labeled oligonucleotides are structured to emit a signal when bound to a nucleic acid or incorporated into an extension product. Examples include:

- SCORPIONS probes (e.g., Whitcombe et al., Nature Biotechnology 17:804-807, 1999, and U.S. Pat. No. 6,326,145);
- Sunrise (or AMPLIFLUOR) primers (e.g., Nazarenko et al., Nuc. Acids Res. 25:2516-2521, 1997, and U.S. Pat. No. 6,117,635);
- LUX primers and molecular beacon probes (e.g., Tyagi et al., Nature Biotechnology 14:303-308, 1996, and U.S. Pat. No. 5,989,823).
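FRET efficiency falls off steeply with distance, which is why the two fluorophores must be held at a distance compatible with efficient energy transfer. The relationship is E = 1 / (1 + (r/R₀)⁶), where r is the donor-acceptor distance and R₀ is the Förster distance at which E = 0.5. The sketch below evaluates this relationship. The 5 nm R₀ in the example is an assumed placeholder, since the actual value depends on the dye pair, and the function names are hypothetical.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency at donor-acceptor distance r (same units as R0)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def distance_for_efficiency(efficiency: float, r0_nm: float) -> float:
    """Inverse relation: separation that gives the requested efficiency (0 < E < 1)."""
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    R0 = 5.0  # assumed Förster distance in nm; depends on the actual dye pair
    for r in (2.5, 5.0, 7.5, 10.0):
        print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, R0):.3f}")
```

Doubling the separation from R₀ to 2R₀ drops the efficiency from 0.5 to 1/65, about 0.015. This is why probe design fixes the spacing between adjacent hybridization sites.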

In other embodiments, the qPCR reaction uses the fluorescent TaqMan method together with an instrument capable of measuring fluorescence in real time (e.g., the ABI Prism 7700 Sequence Detector). The TaqMan reaction uses a hybridization probe labeled with two different fluorescent dyes. One is a reporter dye (6-carboxyfluorescein) and the other is a quencher dye (6-carboxytetramethylrhodamine). While the probe is intact, fluorescence energy transfer occurs and the quencher dye absorbs the reporter dye's fluorescence emission. During the extension phase of the PCR cycle, the 5′→3′ nucleolytic activity of the DNA polymerase cleaves the fluorescent hybridization probe. Once the probe is cleaved, the reporter dye's emission is no longer efficiently transferred to the quencher dye, and the reporter dye's fluorescence emission increases.

Any nucleic acid quantification method, including real-time and single-point detection methods, can be used to quantify the amount of nucleic acid in the sample. Detection can be carried out by several different methods, for example:

- staining;
- hybridization with labeled probes;
- incorporation of biotinylated primers followed by avidin-enzyme conjugate detection;
- incorporation of ³²P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified segment.

Any other suitable detection method known in the art for nucleic acid quantification can also be used. Quantification may or may not include an amplification step.

In some embodiments, the present disclosure provides labels for identifying or quantifying ligated DNA segments. In some cases, ligated DNA segments can be labeled to support downstream applications such as array hybridization. For example, ligated DNA segments can be labeled by random priming or nick translation.

Various labels (e.g., reporters) can be used to label the nucleotide sequences described herein, for example during an amplification step, though labeling is not limited to that step. Suitable labels include radionuclides, enzymes, fluorescent, chemiluminescent or chromogenic agents, ligands, cofactors, inhibitors, magnetic particles and the like. Examples of such labels are found in U.S. Pat. Nos. 3,817,837, 3,850,752, 3,939,350, 3,996,345, 4,277,437, 4,275,149 and 4,366,241, which are incorporated by reference in their entirety.

Additional labels include, but are not limited to, β-galactosidase, invertase, green fluorescent protein, luciferase, chloramphenicol acetyltransferase, β-glucuronidase, exoglucanase and glucoamylase. Fluorescent labels and specially synthesized fluorescent reagents with particular chemical properties can also be used. Various methods of measuring fluorescence are available, because fluorescent labels report in different ways:

- some show changes in their excitation or emission spectra;
- some show resonance energy transfer, in which one fluorescent reporter loses fluorescence while a second reporter gains fluorescence;
- some show a loss (quenching) or appearance of fluorescence;
- some report rotational motion.

Further, multiple amplification reactions can be pooled to obtain enough material for labeling, instead of increasing the number of amplification cycles per reaction. Alternatively, labeled nucleotides can be incorporated during the final cycles of the amplification reaction, e.g., 30 cycles of PCR without label followed by 10 cycles of PCR with label.
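Both options can be quantified under idealized assumptions: a constant per-cycle efficiency, semi-conservative copying and no plateau. After k labeled cycles, every strand except those present before labeling began is newly synthesized, so a fraction 1 - (1 + E)^(-k) of strands carries label. Pooling m reactions yields as much material as about log(m)/log(1 + E) additional cycles in a single reaction. The function names are hypothetical.

```python
import math


def labeled_strand_fraction(labeled_cycles: int, efficiency: float = 1.0) -> float:
    """Fraction of strands made during the final labeled cycles.

    Strands present before labeling starts are never labeled; every strand
    synthesized afterwards is. Total strands grow by (1 + efficiency) per cycle.
    """
    growth = (1.0 + efficiency) ** labeled_cycles
    return 1.0 - 1.0 / growth


def pooled_equivalent_cycles(n_reactions: int, efficiency: float = 1.0) -> float:
    """Extra cycles one reaction would need to match the yield of n pooled reactions."""
    return math.log(n_reactions) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    print(f"{labeled_strand_fraction(10):.4f} of strands labeled after 10 labeled cycles")
    print(f"pooling 8 reactions ~ {pooled_equivalent_cycles(8):.1f} extra cycles")
```

With 10 labeled cycles after 30 unlabeled ones, about 99.9% of strands would carry label under these assumptions. Pooling eight reactions gives roughly the yield of three extra cycles without pushing any single reaction further toward its plateau.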

In certain embodiments, the present disclosure provides probes capable of binding to ligated DNA segments. As used herein, the term "probe" refers to a molecule capable of hybridizing to another molecule of interest (e.g., another oligonucleotide). The probe can be, for example, an oligonucleotide that occurs naturally, as in a purified restriction digest, or that is produced synthetically, recombinantly or by PCR amplification. Oligonucleotide probes can be single-stranded or double-stranded. Probes are useful for detecting, identifying and isolating particular targets (e.g., gene sequences). In some cases, a probe can be associated with a label that is detectable by any detection system. Such systems include, but are not limited to, enzyme-based systems (e.g., ELISA and enzyme-based histochemical assays) and fluorescent, radioactive and luminescent systems.

In the context of arrays and microarrays, the term "probe" refers to any hybridizable material attached to an array in order to detect nucleotide sequences that hybridize to it. In some cases, a probe can be about 10-500 bp, about 10-250 bp, about 20-250 bp, about 20-200 bp, about 25-200 bp, about 25-100 bp, about 30-100 bp or about 30-80 bp long. In some cases, a probe can be longer than about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400 or 500 bp. For example, a probe can be about 20 to about 50 bp long. Examples of probe design, and the rationale for it, can be found in WO 95/11995, EP 717,113 and WO 97/29212.

In some cases, one or more probes can be designed to hybridize near sites digested by restriction enzymes. For example, a probe can lie within about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400 or 500 bp of the restriction enzyme recognition site.

In other cases, a single, unique probe can be designed within about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400 or 500 bp on each side of the site digested by the restriction enzyme. Probes can be designed to hybridize on either side of the site digested by the restriction enzyme. For example, a single probe can be used on each side of the primary restriction enzyme recognition site.
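The flanking-probe layout can be sketched in three steps. First, locate each recognition site in a reference sequence. Next, take a window of fixed length at a chosen distance on each side of the site. Finally, use the reverse complement of each window as the probe, so that the probe is complementary to the top strand. The site GATC (the recognition site of enzymes such as DpnII) and the toy sequence are assumptions for illustration, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(genome: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of `site`."""
    hits = []
    pos = genome.find(site)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(site, pos + 1)
    return hits


def flanking_probes(genome: str, site: str, probe_len: int, gap: int):
    """One probe per side of each site, complementary to the top strand.

    Each probe targets a window of `probe_len` bases lying `gap` bases away
    from the edge of the recognition site; windows that would run off the
    end of the sequence are skipped. Returns (site_start, side, probe) tuples.
    """
    probes = []
    for start in find_sites(genome, site):
        end = start + len(site)
        left = (start - gap - probe_len, start - gap)
        right = (end + gap, end + gap + probe_len)
        for side, (a, b) in (("left", left), ("right", right)):
            if a >= 0 and b <= len(genome):
                probes.append((start, side, reverse_complement(genome[a:b])))
    return probes


if __name__ == "__main__":
    toy = "AAAACCCCGATCTTTTGGGG"   # hypothetical sequence with one GATC site
    print(flanking_probes(toy, "GATC", probe_len=4, gap=0))
```

The `gap` parameter corresponds to the distance ranges listed above. In practice probes would be tens of bases long, and they would be screened for uniqueness in the genome.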

In further cases, 2, 3, 4, 5, 6, 7, 8 or more probes can be designed on each side of the restriction enzyme recognition site and then used to interrogate the same ligation event. For example, two or three probes can be designed on each side of the restriction enzyme recognition site. In some examples, using multiple probes (e.g., 2, 3, 4, 5, 6, 7, 8 or more) per primary restriction enzyme recognition site can help minimize false-negative results from individual probes.

As used herein, the term "probe set" refers to a suite or collection of probes that can hybridize to one or more of the primary restriction enzyme recognition sites for a primary restriction enzyme in a genome.

In some cases, a probe set can be complementary in sequence to nucleic acid sequences flanking one or more of the primary restriction enzyme recognition sites for a restriction enzyme in genomic DNA. For example, a probe set can be complementary in sequence to about 10-500 bp, about 10-250 bp, about 20-250 bp, about 20-200 bp, about 25-200 bp, about 25-100 bp, about 30-100 bp or about 30-80 bp of nucleotides flanking one or more restriction enzyme recognition sites in genomic DNA. A probe set can be complementary in sequence to one side (e.g., either side) or to both sides of the restriction enzyme recognition site. Thus, the probes can be complementary in sequence to nucleic acid sequences flanking each side of one or more of the primary restriction enzyme recognition sites in genomic DNA. Further, a probe set can be complementary in sequence to nucleic acid sequences that lie less than about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400 or 500 bp from one or more of the primary restriction enzyme recognition sites in genomic DNA.

In some cases, two or more probes can be designed to hybridize to sequences flanking one or more restriction enzyme recognition sites in genomic DNA. The probes can overlap completely or partially.
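Overlapping probes over a flanking region can be laid out as a sliding-window tiling. The minimal sketch below assumes a fixed probe length and step, both illustrative parameters: adjacent windows overlap by `probe_len - step` bases. The windows are target sequences. The probes themselves would be their complements (for example, obtained with a reverse-complement function).

```python
def tile_windows(region: str, probe_len: int, step: int):
    """Return (offset, window) pairs covering `region`.

    Adjacent windows overlap by probe_len - step bases when step < probe_len;
    step == probe_len gives abutting, non-overlapping windows.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    return [
        (i, region[i:i + probe_len])
        for i in range(0, len(region) - probe_len + 1, step)
    ]


for offset, window in tile_windows("ACGTACGTAC", probe_len=4, step=2):
    print(offset, window)  # each window overlaps the next by 2 bases
```

If the region length minus `probe_len` is not a multiple of `step`, the last few bases are not covered. A practical design might add one final window anchored at the end of the region.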

A probe, an array of probes, or a probe set can be immobilized on a support. The support (e.g., a solid support) can be made of various materials, such as glass, silica, plastic, nylon, or nitrocellulose. The support is preferably rigid and has a planar surface. The support can have about 1 to 10,000,000 discrete loci. For example, the support can have about 10-10,000,000, about 10-5,000,000, about 100-5,000,000, about 100-4,000,000, about 1000-4,000,000, about 1000-3,000,000, about 10,000-3,000,000, about 10,000-2,000,000, about 100,000-2,000,000, or about 100,000-1,000,000 discrete loci. The density of discrete loci can be at least about 10, about 100, about 1000, about 10,000, about 100,000, or about 1,000,000 loci per square centimeter. In some cases, more than 95% of each discrete locus can be occupied by a single type of oligonucleotide. In other cases, each discrete locus can be occupied by a pooled mixture of probes or probe sets. In still other cases, some discrete loci are occupied by a pooled mixture of probes or probe sets, while other discrete loci are more than 95% occupied by a single type of oligonucleotide.

In some cases, the number of probes for a given nucleotide sequence on an array can be in large excess relative to the DNA sample hybridized to the array. For example, an array can have about 10, about 100, about 1000, about 10,000, about 100,000, about 1,000,000, about 10,000,000, or about 100,000,000 times as many probes as the amount of DNA in the input sample.

In some cases, an array can have about 10, about 100, about 1000, about 10,000, about 100,000, about 1,000,000, about 10,000,000, about 100,000,000, or about 1,000,000,000 probes.

An array of probes or a probe set can be synthesized on a support in a stepwise manner, or attached to the support in presynthesized form. One synthesis method is VLSIPS™ (described in US Pat. No. 5,143,854 and EP 476,014), which uses light to direct the synthesis of oligonucleotide probes in high-density, miniaturized arrays. Algorithms for designing masks that reduce the number of synthesis cycles are described in US Pat. Nos. 5,571,639 and 5,593,839. As described in EP 624,059, arrays can also be synthesized combinatorially by delivering monomers to cells of the support through mechanically constrained flow paths. Arrays can also be synthesized by spotting reagents onto the support with an inkjet printer (see, e.g., EP 728,520).
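A small model makes the motivation for reducing synthesis cycles concrete. Assume that each light-directed coupling cycle offers a single nucleotide type to all unmasked sites, and that nucleotides are offered in a fixed repeating order. A probe can then be built only if it is a subsequence of the offered sequence, and placing each base at the earliest possible cycle gives the fewest cycles. The sketch below counts cycles under those assumptions. It is not the mask-design algorithm of the cited patents, and the order `ACGT` is an arbitrary choice.

```python
def cycles_needed(probe: str, order: str = "ACGT") -> int:
    """Cycles needed to build `probe` when nucleotides are offered in the
    repeating sequence `order`, one type per cycle (greedy earliest placement)."""
    if any(base not in order for base in probe):
        raise ValueError("probe contains a base not offered in `order`")
    cycle = 0  # number of cycles consumed so far
    for base in probe:
        # Skip cycles that offer other nucleotides (this site stays masked).
        while order[cycle % len(order)] != base:
            cycle += 1
        cycle += 1  # this cycle couples `base`
    return cycle


def array_cycles(probes, order: str = "ACGT") -> int:
    """All sites share one deposition sequence, so the array needs as many
    cycles as its most demanding probe."""
    return max(cycles_needed(p, order) for p in probes)


print(cycles_needed("ACGT"), cycles_needed("TA"), array_cycles(["ACGT", "TA", "AAAA"]))  # 4 5 13
```

Under this model, a probe of length L never needs more than `len(order) * L` cycles. The cycle count for the whole array is set by its worst-case probe.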

In some embodiments, the present disclosure provides a method of hybridizing ligated DNA segments to an array. A "substrate" or "array" is an intentionally created collection of nucleic acids. It can be prepared either synthetically or biosynthetically and screened for biological activity in a variety of different formats (e.g., libraries of soluble molecules, and libraries of oligonucleotides tethered to resin beads, silica chips, or other solid supports). In addition, the term "array" includes libraries of nucleic acids prepared by spotting nucleic acids of essentially any length (e.g., from 1 to about 1000 nucleotide monomers in length) onto a substrate.

Array technology and its associated techniques and applications are described generally in numerous textbooks and documents. These include, for example: Lemieux et al., 1998, Molecular Breeding 4, pp. 277-289; Schena and Davis, Parallel Analysis with Biological Chips, in PCR Methods Manual (M. Innis, D. Gelfand, J. Sninsky, eds.); Schena and Davis, 1999, Genes, Genomes and Chips, in DNA Microarrays: A Practical Approach (M. Schena, ed.), Oxford University Press, Oxford, UK, 1999; The Chipping Forecast (Nature Genetics special issue, January 1999 supplement); Mark Schena (ed.), Microarray Biochip Technology (Eaton Publishing); Cortese, 2000, The Scientist 14(17):25; Gwynn and Page, Microarray analysis: the next revolution in molecular biology, Science, August 6, 1999; and Eakins and Chu, 1999, Trends in Biotechnology 17, pp. 217-218.

In general, any library can be arranged in an orderly fashion in an array by spatially separating its members. Libraries suitable for arraying include nucleic acid libraries (e.g., libraries of DNA, cDNA, or oligonucleotides), peptide, polypeptide, and protein libraries, and libraries of any other molecules, such as ligand libraries.

A library can be fixed or immobilized on a solid phase (e.g., a solid substrate) to limit diffusion and mixing of its members. In some cases, a library of ligand-bound DNA can be prepared. In particular, the library can be immobilized on a substantially planar solid phase, including membranes and non-porous substrates such as plastic and glass. The library can also be arranged to facilitate indexing (i.e., referencing or retrieving a particular member). In some examples, library members are applied as spots in a grid pattern, and common assay systems can be adapted for this purpose. For example, the array can be immobilized on the surface of a microplate, with either multiple members in a single well or a single member in each well. The solid substrate can also be a membrane, such as a nitrocellulose or nylon membrane (e.g., of the kind used in blotting experiments). Alternative substrates include glass or silica substrates. The library can therefore be immobilized by any suitable method known in the art, for example by charge interactions or by chemical coupling to the walls or bottom of the wells or to the membrane surface. Other means of arraying and immobilizing can also be used, such as pipetting, drop-touch, piezoelectric means, inkjet and Bubble Jet® technology, and electrostatic application. For silicon chips, photolithography can be used to arrange and immobilize the library on the chip.
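Indexing library members on a microplate amounts to a fixed mapping between a member's index and a well position. The sketch below assumes a standard 96-well plate (8 rows by 12 columns), row-major ordering, and one member per well. All three are illustrative assumptions, and the helper names are hypothetical.

```python
import string


def well_label(index: int, rows: int = 8, cols: int = 12) -> str:
    """Map a 0-based library index to a row-major well label (A1, A2, ..., H12)."""
    if not 0 <= index < rows * cols:
        raise IndexError("index outside the plate")
    row, col = divmod(index, cols)
    return f"{string.ascii_uppercase[row]}{col + 1}"


def well_index(label: str, rows: int = 8, cols: int = 12) -> int:
    """Inverse of well_label: recover the library index from a well label."""
    row = string.ascii_uppercase.index(label[0].upper())
    col = int(label[1:]) - 1
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("label outside the plate")
    return row * cols + col


print(well_label(0), well_label(12), well_label(95))  # A1 B1 H12
```

Because the mapping is invertible, a member can be retrieved from its well label and vice versa. Other plate formats only change `rows` and `cols`.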

A library can be arrayed by "spotting" it onto a solid substrate, either by hand or with robotics that place the members precisely. Arrays are generally described as macroarrays or microarrays, and the difference between them is spot size. Macroarrays contain spots of about 300 microns or larger and can be imaged readily with existing gel and blot scanners. Microarray spots can be less than 200 microns in diameter, and such arrays typically contain thousands of spots. Microarrays may therefore require specialized robotics and imaging equipment, which may need to be custom-made. Instrumentation is reviewed generally in Cortese, 2000, The Scientist 14[11]:26.
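The size distinction above, and its link to the loci densities listed earlier, can be expressed in a few lines. This is a minimal sketch that uses the stated thresholds (about 300 µm or larger for macroarrays, under 200 µm for microarrays). It assumes a square grid with a given centre-to-centre pitch, and the function names are hypothetical. The 200-300 µm range is left unclassified because the text assigns it to neither category.

```python
def classify_array(spot_diameter_um: float) -> str:
    """Classify an array by spot diameter using the thresholds stated above."""
    if spot_diameter_um >= 300:
        return "macroarray"
    if spot_diameter_um < 200:
        return "microarray"
    return "unclassified"  # 200-300 um falls between the two stated ranges


def spots_per_cm2(pitch_um: float) -> float:
    """Spots per square centimetre on a square grid with the given
    centre-to-centre pitch (1 cm = 10,000 um)."""
    per_cm = 10_000 / pitch_um
    return per_cm ** 2


print(classify_array(400), spots_per_cm2(500))  # macroarray 400.0
print(classify_array(80), spots_per_cm2(100))   # microarray 10000.0
```

On this grid model, a 100 µm pitch gives about 10,000 loci per cm², and a 10 µm pitch gives about 1,000,000.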

Techniques for producing immobilized libraries of DNA molecules have been described in the art. Most earlier methods synthesize single-stranded nucleic acid libraries using, for example, masking techniques that build up different permutations of sequences at distinct positions on a solid substrate. US Pat. No. 5,837,832 describes an improved method for producing DNA arrays immobilized on silicon substrates using very-large-scale integration (VLSI) technology. In particular, it describes a strategy called "tiling" for synthesizing specific probe sets at spatially defined locations on a substrate, which can be used to produce the immobilized DNA libraries of the present disclosure. US Pat. No. 5,837,832 also provides references to earlier techniques that can be used. In other cases, arrays can be assembled using photodeposition chemistry.

Arrays of peptides (or peptidomimetics) can also be synthesized on a surface so that each distinct library member (e.g., a unique peptide sequence) occupies a discrete, predefined location in the array. The identity of each library member is determined by its spatial location. The locations where a given molecule (e.g., a target or probe) binds to reactive library members are determined, and the sequences of those reactive members are thereby identified from their spatial locations. These methods are described in US Pat. No. 5,143,854; WO 90/15070 and WO 92/10092; Fodor et al. (1991) Science 251:767; and Dower and Fodor (1991) Ann. Rep. Med. Chem. 26:271.
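Decoding by spatial location reduces to looking up which member was placed at each position that shows binding. The minimal sketch below assumes that a layout map (position to member) and a per-position signal readout are available, and that a simple threshold calls binding. The peptide sequences, signal values, and threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) on the array


def decode_hits(layout: Dict[Position, str],
                signal: Dict[Position, float],
                threshold: float) -> List[Tuple[Position, str]]:
    """Return (position, member) for every position whose binding signal
    exceeds `threshold`; member identity comes only from the layout."""
    hits = [(pos, layout[pos]) for pos, value in signal.items() if value > threshold]
    return sorted(hits)


layout = {(0, 0): "YGGFL", (0, 1): "YGGFM", (1, 0): "GGGGG"}  # hypothetical members
signal = {(0, 0): 950.0, (0, 1): 40.0, (1, 0): 610.0}          # hypothetical readout
print(decode_hits(layout, signal, threshold=500.0))
```

A position that appears in the readout but not in the layout raises a `KeyError`. This deliberately surfaces mismatches between the synthesis map and the scan.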

To aid detection, labels such as any readily detectable reporter (e.g., a fluorescent, bioluminescent, phosphorescent, or radioactive reporter) can be used, as described above. Such reporters, their detection, and their coupling to targets and probes are discussed elsewhere in this document. Labeling of probes and targets is also disclosed in Shalon et al., 1996, Genome Res 6(7):639-45.

Examples of some commercially available microarray formats are given in Table 1 below (see also Marshall and Hodgson, 1998, Nature Biotechnology 16(1), pp. 27-31).

To generate data from array-based assays, signals can be detected that indicate the presence or absence of hybridization between a probe and a nucleotide sequence. Both direct and indirect labeling techniques can be used. In direct labeling, a fluorescent dye is incorporated directly into the nucleotide sequence that hybridizes to the array-associated probe. For example, the dye can be incorporated by enzymatic synthesis in the presence of labeled nucleotides or labeled PCR primers. Direct labeling schemes can yield strong hybridization signals and can be easy to implement, for example by using a family of fluorescent dyes with similar chemical structures and properties. When nucleic acids are directly labeled, cyanine or Alexa dye analogs can be used for multicolor comparative array analysis. In other embodiments, an indirect labeling scheme incorporates an epitope into the nucleic acid before or after hybridization to the microarray probes. One or more staining procedures and reagents are then used to label the hybridized complex. For example, a fluorescent molecule that binds the epitope produces a fluorescent signal through conjugation of the dye molecule to the epitope on the hybridized species.
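As an illustration of how presence/absence calls and two-color comparisons might be derived from spot intensities, the sketch below assumes two directly labeled channels (for example, two cyanine dyes) with foreground and local-background intensities already extracted from the image. The three-standard-deviation rule and the flooring of low values are arbitrary, hypothetical choices, not part of the disclosure.

```python
import math


def spot_present(fg: float, bg: float, bg_sd: float, k: float = 3.0) -> bool:
    """Call hybridization present if the background-subtracted signal
    exceeds k background standard deviations (k = 3 is an arbitrary choice)."""
    return (fg - bg) > k * bg_sd


def log2_ratio(fg1: float, bg1: float, fg2: float, bg2: float, floor: float = 1.0) -> float:
    """log2 of channel-1 / channel-2 background-subtracted intensity.
    Values are floored so a spot at or below background does not
    produce a zero or negative argument to log2."""
    s1 = max(fg1 - bg1, floor)
    s2 = max(fg2 - bg2, floor)
    return math.log2(s1 / s2)


print(spot_present(500, 100, 50), log2_ratio(1100, 100, 600, 100))  # True 1.0
```

A log2 ratio of 1.0 means the first channel's background-subtracted signal is twice the second's. Negative values mean the second channel dominates.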

In various embodiments, sequence information is obtained from nucleic acid molecules in a sample using a suitable sequencing method described herein or another method known in the art. Sequencing can be accomplished by the classical Sanger sequencing method, which is well known in the art. Sequencing can also be accomplished using high-throughput systems. Some of these detect a nucleotide immediately after or at the same time as it is incorporated into the growing strand, i.e., they detect sequence in real time or substantially in real time. In some cases, high-throughput sequencing generates at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000, or at least 500,000 sequence reads per hour. Each read can be at least about 50, about 60, about 70, about 80, about 90, about 100, about 120, about 150, about 180, about 210, about 240, about 270, about 300, about 350, about 400, about 450, about 500, about 600, about 700, about 800, about 900, or about 1000 bases long.
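The read-rate and read-length ranges above translate directly into raw output. The sketch below multiplies them out and estimates the time to reach a given raw coverage. The 3,000,000,000-base genome and 30x coverage are round, illustrative assumptions, and the estimate ignores duplicate, failed, and unmappable reads.

```python
def bases_per_hour(reads_per_hour: int, read_length: int) -> int:
    """Raw sequence output per hour."""
    return reads_per_hour * read_length


def hours_for_coverage(genome_size: int, coverage: float,
                       reads_per_hour: int, read_length: int) -> float:
    """Hours needed to generate `coverage`-fold raw coverage of a genome,
    ignoring duplicate, failed and unmappable reads."""
    return genome_size * coverage / bases_per_hour(reads_per_hour, read_length)


print(bases_per_hour(1_000, 50))        # 50000 at the low end of the ranges above
print(bases_per_hour(500_000, 1_000))   # 500000000 at the high end
print(hours_for_coverage(3_000_000_000, 30, 500_000, 1_000))  # 180.0
```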

In some embodiments, high-throughput sequencing involves the use of technology available from Illumina. Examples include the Genome Analyzer IIx, the MiSeq personal sequencer, and the HiSeq systems, such as the HiSeq 2500, HiSeq 1500, HiSeq 2000, and HiSeq 1000 instruments. These machines use reversible terminator-based sequencing-by-synthesis chemistry. They can produce 200,000,000,000 DNA reads or more in 8 days. Smaller systems can be used for runs of 3 days, 2 days, 1 day, or less.

In some embodiments, high-throughput sequencing involves the use of technology available from the ABI SOLiD System. This genetic analysis platform enables massively parallel sequencing of clonally amplified DNA fragments linked to beads. The sequencing method is based on sequential ligation with dye-labeled oligonucleotides.

Next-generation sequencing can include ion semiconductor sequencing (e.g., using Ion Torrent technology from Life Technologies). Ion semiconductor sequencing takes advantage of the fact that an ion can be released when a nucleotide is incorporated into a strand of DNA. To perform ion semiconductor sequencing, a high-density array of micromachined wells is formed, and each well can hold a single DNA template. Beneath each well is an ion-sensitive layer, and beneath that layer is an ion sensor. When a nucleotide is added to the DNA, H⁺ can be released, which can be measured as a change in pH. The H⁺ signal is converted to a voltage and recorded by the semiconductor sensor. The array chip is flooded sequentially with one nucleotide after another. No scanning, light, or cameras are required. In some cases, an ION PROTON™ sequencer is used to sequence the nucleic acid. In other cases, an ION PGM™ sequencer is used. The Ion Torrent Personal Genome Machine (PGM) can produce 10,000,000 reads in 2 hours.
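Because the released H⁺ scales with the number of nucleotides incorporated during a flow, a run of identical bases (a homopolymer) gives a proportionally larger signal. The sketch below turns per-flow signals into bases. It assumes that signals are already normalized so that one incorporation gives about 1.0, and it assumes a repeating flow order of `TACG`. Both are illustrative assumptions, not instrument specifications.

```python
def call_flowgram(flow_signals, flow_order: str = "TACG") -> str:
    """Round each normalized flow signal to a homopolymer length and
    emit that many copies of the nucleotide flowed in that cycle."""
    read = []
    for i, signal in enumerate(flow_signals):
        count = max(0, round(signal))  # negative noise is treated as no incorporation
        read.append(flow_order[i % len(flow_order)] * count)
    return "".join(read)


print(call_flowgram([1.02, 0.05, 2.10, 0.90, 0.00, 1.10]))  # TCCGA
```

Python's `round` rounds halves to the nearest even integer, so signals near x.5 are inherently ambiguous. That is where homopolymer-length errors tend to arise.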

In some embodiments, high-throughput sequencing involves the use of technology available from Helicos BioSciences Corporation (Cambridge, Massachusetts), such as Single Molecule Sequencing by Synthesis (SMSS). SMSS is distinctive in that it allows sequencing of an entire human genome within 24 hours. SMSS is described in part in US Patent Application Publication Nos. 2006/0024711, 2006/0024678, 2006/0012793, 2006/0012784, and 2005/0100932.

In some embodiments, high-throughput sequencing involves the use of technology available from 454 Life Sciences, Inc. (Branford, Connecticut), such as the PicoTiterPlate device. This device includes a fiber-optic plate that transmits the chemiluminescent signal generated by the sequencing reaction to a CCD camera in the instrument, which records it. This use of fiber optics allows detection of a minimum of 20,000,000 base pairs in 4.5 hours.

Methods that use bead amplification followed by fiber-optic detection are described in Margulies, M., et al., "Genome sequencing in microfabricated high-density picolitre reactors", Nature, doi:10.1038/nature03959, and in US Patent Application Publication Nos. 2002/0012930, 2003/0068629, 2003/0100102, 2003/0148344, 2004/0248161, 2005/0079510, 2005/0124022, and 2006/0078909.

In some embodiments, high-throughput sequencing is performed using Clonal Single Molecule Array (Solexa, Inc.) or sequencing-by-synthesis (SBS) with reversible terminator chemistry. These technologies are described in part in US Pat. Nos. 6,969,488, 6,897,023, 6,833,246, and 6,787,308; US Patent Application Publication Nos. 2004/0106110, 2003/0064398, and 2003/0022207; and Constans, A., The Scientist 2003, 17(13):36.
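In reversible-terminator SBS, each cycle adds at most one blocked, dye-labeled nucleotide per strand before imaging. Each cycle therefore yields one base call per cluster. The sketch below calls each cycle as the brightest of four dye channels. The intensities are hypothetical, and the sketch deliberately ignores the phasing and dye cross-talk effects that practical base callers typically correct for.

```python
from typing import Dict, List, Tuple


def call_cycle(intensities: Dict[str, float]) -> Tuple[str, float]:
    """Call one cycle as the brightest of the four dye channels and report
    its share of the total signal as a crude confidence score."""
    base = max(intensities, key=intensities.get)
    total = sum(intensities.values())
    share = intensities[base] / total if total > 0 else 0.0
    return base, share


def call_read(cycles: List[Dict[str, float]]) -> str:
    """Concatenate per-cycle calls into a read."""
    return "".join(call_cycle(c)[0] for c in cycles)


cycles = [
    {"A": 900.0, "C": 50.0, "G": 30.0, "T": 20.0},  # hypothetical cycle 1
    {"A": 40.0, "C": 820.0, "G": 90.0, "T": 50.0},  # hypothetical cycle 2
]
print(call_read(cycles))  # AC
```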

Next-generation sequencing techniques can include single-molecule real-time (SMRT™) technology from Pacific Biosciences. In SMRT, each of the four DNA bases can be attached to one of four different fluorescent dyes, and these dyes can be phospholinked. A single DNA polymerase can be immobilized with a single molecule of template single-stranded DNA at the bottom of a zero-mode waveguide (ZMW). A ZMW is a confinement structure that enables observation of the incorporation of a single nucleotide by DNA polymerase. It does so against a background of fluorescent nucleotides that diffuse rapidly (within microseconds) in and out of the ZMW. Incorporating a nucleotide into the growing strand can take several milliseconds. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is then cleaved off. The ZMW is illuminated from below, and attenuated light from the excitation beam penetrates only the lower 20-30 nm of each ZMW. This creates a microscope with a detection volume as small as 20 zeptoliters (20 × 10⁻²¹ liters). The tiny detection volume can provide a 1000-fold improvement in background noise. Detecting the fluorescence of the corresponding dye indicates which base was incorporated, and the process is then repeated.
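One way to see why a 20-zeptoliter observation volume suppresses background is to compute how many free labeled nucleotides it holds on average. The mean occupancy is concentration × volume × Avogadro's number. The sketch below assumes a 1 µM nucleotide concentration purely for illustration (the text does not specify one) and assumes Poisson-distributed occupancy.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1 (exact in the 2019 SI)
ZEPTOLITRE = 1e-21        # litres


def mean_occupancy(conc_molar: float, volume_litres: float) -> float:
    """Mean number of molecules in a volume at a given molar concentration."""
    return conc_molar * volume_litres * AVOGADRO


def p_occupied(conc_molar: float, volume_litres: float) -> float:
    """Probability that at least one molecule is present, assuming
    Poisson-distributed occupancy."""
    return 1.0 - math.exp(-mean_occupancy(conc_molar, volume_litres))


conc = 1e-6  # 1 uM, an illustrative assumption
print(mean_occupancy(conc, 20 * ZEPTOLITRE), p_occupied(conc, 20 * ZEPTOLITRE))
```

At this concentration, the mean occupancy is about 0.012 molecules, so a freely diffusing labeled nucleotide is present only about 1% of the time. A nucleotide held by the polymerase for milliseconds therefore stands out clearly.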

In some cases, next-generation sequencing is nanopore sequencing (see, e.g., Soni GV and Meller A. (2007) Clin Chem 53:1996-2001). A nanopore can be a small hole on the order of about 1 nanometer in diameter. Immersing the nanopore in a conducting fluid and applying a potential across it produces a slight electrical current from the conduction of ions through the pore. The amount of current is sensitive to the size of the nanopore. As a DNA molecule passes through the nanopore, each nucleotide obstructs the pore to a different degree. The change in current as the DNA molecule passes through the nanopore can therefore represent a reading of the DNA sequence.

The nanopore sequencing technology can be from Oxford Nanopore Technologies, for example the GridION system. A single nanopore can be inserted in a polymer membrane across the top of a microwell, and each microwell can have an electrode for individual sensing. Array chips can be fabricated with 100,000 or more microwells per chip (e.g., 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, or 1,000,000 or more). An instrument (or node) can be used to analyze the chip, and data can be analyzed in real time. One or more instruments can be operated at a time.

The nanopore can be a protein nanopore, e.g., the protein alpha-hemolysin, a heptameric protein pore. It can be a solid-state nanopore, e.g., a nanometer-sized hole formed in a synthetic membrane (e.g., SiNₓ or SiO₂). It can be a hybrid pore (e.g., a protein pore integrated into a solid-state membrane). It can also be a nanopore with integrated sensors, such as tunneling electrode detectors, capacitive detectors, or graphene-based nanogap or edge-state detectors (see, e.g., Garaj et al. (2010) Nature, vol. 67, doi:10.1038/nature09379). A nanopore can be functionalized to analyze a specific type of molecule (e.g., DNA, RNA, or protein).

Nanopore sequencing can comprise "strand sequencing", in which intact DNA polymers pass through a protein nanopore and are sequenced in real time as the DNA translocates through the pore. An enzyme can separate the strands of double-stranded DNA and feed one strand through the nanopore. The DNA can have a hairpin at one end so that the system can read both strands. In some cases, nanopore sequencing is "exonuclease sequencing". Here, a processive exonuclease cleaves individual nucleotides from a DNA strand, and these nucleotides then pass through a protein nanopore. The nucleotides can bind transiently to a molecule in the pore (e.g., cyclodextrin), and the characteristic disruption in current is used to identify the base.
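In the idealized exonuclease-sequencing picture, each released nucleotide produces one blockade event with a characteristic current. The sketch below segments a current trace into blockade events (runs of samples below an open-pore threshold) and assigns each event to the nucleotide with the nearest reference level. The reference levels, the threshold, and the trace are hypothetical. In strand sequencing, the current generally depends on several bases in the pore at once, so this one-level-per-base model does not apply there.

```python
from statistics import mean
from typing import Dict, List


def detect_events(trace: List[float], open_threshold: float) -> List[float]:
    """Split a current trace into blockade events (runs of samples below the
    open-pore threshold) and return the mean current of each event."""
    events, run = [], []
    for sample in trace:
        if sample < open_threshold:
            run.append(sample)
        elif run:
            events.append(mean(run))
            run = []
    if run:  # trace ended during a blockade
        events.append(mean(run))
    return events


def call_events(event_means: List[float], levels: Dict[str, float]) -> str:
    """Assign each event to the nucleotide with the nearest reference level."""
    return "".join(min(levels, key=lambda b: abs(levels[b] - m)) for m in event_means)


levels = {"A": 50.0, "C": 40.0, "G": 30.0, "T": 20.0}  # hypothetical levels (pA)
trace = [100.0, 100.0, 49.0, 51.0, 100.0, 31.0, 29.0, 30.0, 100.0, 21.0, 100.0]
print(call_events(detect_events(trace, open_threshold=80.0), levels))  # AGT
```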

Nanopore sequencing technology from GENIA can be used. An engineered protein pore can be embedded in a lipid bilayer membrane. "Active control" technology can be used to enable efficient nanopore-membrane assembly and control of DNA movement through the channel. In some cases, the nanopore sequencing technology is from NABsys. Genomic DNA can be fragmented into strands averaging about 100 kb in length. The 100 kb fragments can be made single-stranded and then hybridized with a 6-mer probe. The genomic fragments with bound probes can be driven through a nanopore, producing a current-versus-time trace. The current trace can provide the positions of the probes on each genomic fragment. The genomic fragments can then be lined up to create a probe map for the genome. The process can be run in parallel for a library of probes, generating a genome-length probe map for each probe. Errors can be corrected with a process termed "moving window Sequencing By Hybridization (mwSBH)". In some cases, the nanopore sequencing technology is from IBM/Roche. An electron beam can be used to make a nanopore-sized opening in a microchip, and an electrical field can be used to pull or thread DNA through the nanopore. A DNA transistor device in the nanopore can comprise alternating nanometer-sized layers of metal and dielectric. Discrete charges in the DNA backbone can be trapped inside the DNA nanopore by electrical fields. Turning the gate voltages on and off allows the DNA sequence to be read.
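The probe positions that a probe map records on a fragment can also be predicted in silico from a known reference, for example to compare a measured map against it. The minimal sketch below lists every position at which a 6-mer probe could bind either strand, i.e., where the probe or its reverse complement appears in the reference. The reference, the probe, and the helper names are hypothetical, and overlapping sites are reported.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_map(reference: str, probe: str) -> List[int]:
    """0-based start positions at which `probe` could bind either strand of
    `reference` (the probe or its reverse complement appears there)."""
    reference, probe = reference.upper(), probe.upper()
    targets = {probe, reverse_complement(probe)}
    k = len(probe)
    return [i for i in range(len(reference) - k + 1) if reference[i:i + k] in targets]


positions = probe_map("AACCGGTTTTCCGGTTAA", "AACCGG")
print(positions)                                          # [0, 2, 10]
print([b - a for a, b in zip(positions, positions[1:])])  # spacings: [2, 8]
```

The spacings between adjacent sites are the quantity that would be compared with the intervals read from a current trace.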

Next-generation sequencing can include DNA nanoball sequencing, as performed, for example, by Complete Genomics (see, e.g., Drmanac et al. (2010) Science 327:78-81). DNA can be isolated, fragmented, and size-selected. For example, DNA can be fragmented (e.g., by sonication) to an average length of about 500 bp. Adapters (Ad1) can be attached to the ends of the fragments and used to hybridize to anchors for sequencing reactions. DNA with adapters bound to each end can be PCR amplified. The adapter sequences can be modified so that complementary single-stranded ends bind to each other, forming circular DNA. The DNA can be methylated to protect it from cleavage by a type IIS restriction enzyme used in a subsequent step. An adapter (e.g., the right adapter) can contain a restriction recognition site that remains unmethylated. A restriction enzyme (e.g., AcuI) can recognize this unmethylated site and cleave the DNA 13 bp to the right of the right adapter, forming linear double-stranded DNA. A second round of right and left adapters (Ad2) can be ligated onto either end of the linear DNA, and all DNA with both adapters bound can be PCR amplified. The Ad2 sequences can be modified so that they bind to each other and form circular DNA. The DNA can be methylated, while a restriction enzyme recognition site on the left Ad1 adapter remains unmethylated. A restriction enzyme (e.g., AcuI) can then be applied to cleave the DNA 13 bp to the left of Ad1, forming a linear DNA fragment. A third round of right and left adapters (Ad3) can be ligated to the right and left flanks of the linear DNA, and the resulting fragments can be PCR amplified. These adapters can be modified so that they bind to each other and form circular DNA. A type III restriction enzyme (e.g., EcoP15) can then be added to cleave the DNA 26 bp to the left of Ad3 and 26 bp to the right of Ad2. This cleavage removes a large segment of DNA and linearizes the DNA once again. A fourth round of right and left adapters (Ad4) can be ligated to the DNA. The DNA can then be amplified (e.g., by PCR) and modified so that the ends bind to each other, forming the completed circular DNA template.

Rolling circle replication (e.g., using Phi29 DNA polymerase) can be used to amplify small fragments of DNA. The four adapter sequences can contain palindromic sequences that hybridize, so that the single strand can fold onto itself to form DNA nanoballs (DNB™) averaging approximately 200-300 nanometers in diameter. The DNA nanoballs can be attached (e.g., by adsorption) to a microarray (sequencing flow cell). The flow cell can be a silicon wafer coated with silicon dioxide, titanium, hexamethyldisilazane (HMDS), and a photoresist material. Sequencing can be performed by unchained sequencing, in which fluorescent probes are ligated to the DNA. The color of the fluorescence at the interrogated position can be visualized with a high-resolution camera, and the identity of the nucleotide sequence between the adapter sequences can be determined.

In some embodiments, high-throughput sequencing can be performed using AnyDot.chips (Genovoxx, Germany). In particular, AnyDot.chips can enhance nucleotide fluorescence signal detection 10- to 50-fold. AnyDot.chips and methods of using them are described in part in International Publication Nos. WO 02088382, WO 03020968, WO 03031947, and WO 2005044836; PCT/EP 05/05657 and PCT/EP 05/05655; and German Patent Application Nos. DE 101 49 786, DE 102 14 395, DE 103 56 837, DE 10 2004 009 704, DE 10 2004 025 696, DE 10 2004 025 746, DE 10 2004 025 694, DE 10 2004 025 695, DE 10 2004 025 744, DE 10 2004 025 745, and DE 10 2005 012 301.

Other high-throughput sequencing systems include those disclosed in Venter, J. et al., Science, February 16, 2001; Adams, M. et al., Science, March 24, 2000; M. J. Levene et al., Science 299:682-686, January 2003; and U.S. Patent Application Publication Nos. 2003/0044781 and 2006/0078937. Such systems generally involve sequencing a target nucleic acid molecule having a plurality of bases by the temporal addition of bases via a polymerization reaction measured on the nucleic acid molecule; that is, the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence can then be deduced by identifying which base is incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend an oligonucleotide primer at an active site. A plurality of labeled types of nucleotide analogs are provided proximate to the active site, each distinguishable type of analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the strand at the active site, where the analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labeled nucleotide analogs, polymerizing the growing strand, and identifying the added analog are repeated, so that the strand is further extended and the sequence of the target nucleic acid is determined.
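As a toy illustration of the inference step described above (not a model of any particular instrument), the sketch below assumes that incorporation events have already been detected and reduced to a time-ordered list of dye labels, one distinguishable label per nucleotide analog. The dye names and the `DYE_TO_BASE` mapping are hypothetical. Because each added analog is complementary to the template base at the active site, the template is recovered as the reverse complement of the growing strand.

```python
# Hypothetical dye-to-base mapping: one distinguishable label per analog type.
DYE_TO_BASE = {"dye488": "A", "dye532": "C", "dye594": "G", "dye647": "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_growing_strand(events):
    """Translate time-ordered incorporation events (dye labels) into the
    bases added to the growing strand, written 5'->3'."""
    return "".join(DYE_TO_BASE[e] for e in events)


def infer_template(events):
    """Each added base pairs with the template base at the active site, so
    the template (written 5'->3') is the reverse complement of the grown strand."""
    grown = call_growing_strand(events)
    return "".join(COMPLEMENT[b] for b in reversed(grown))


events = ["dye488", "dye647", "dye594", "dye532"]
print(call_growing_strand(events))  # ATGC
print(infer_template(events))       # GCAT
```

Real base callers must also handle missed or spurious pulses; the sketch assumes every event is a clean, correctly labeled incorporation.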

In certain embodiments, the present disclosure further provides kits comprising one or more components of the present disclosure. The kits can be used for any application apparent to those of skill in the art, including those described above. A kit can include, for example, a plurality of association molecules, a fixative, a restriction endonuclease, a ligase, and/or combinations thereof. In some cases, the association molecules can be proteins, including, for example, histones. In some cases, the fixative can be formaldehyde or any other DNA cross-linking agent.

In some cases, the kit can further include a plurality of beads. The beads can be paramagnetic and/or coated with a capture agent. For example, the beads can be coated with streptavidin and/or an antibody.

In some cases, the kit can include adapter oligonucleotides and/or sequencing primers. Further, the kit can include a device capable of amplifying the read pairs using the adapter oligonucleotides and/or sequencing primers.

In some cases, the kit can also include other reagents, including but not limited to lysis buffer, ligation reagents (e.g., dNTPs, polymerase, polynucleotide kinase, and/or ligase buffer), and PCR reagents (e.g., dNTPs, polymerase, and/or PCR buffer).

The kit can also include instructions for using its components and/or for generating read pairs.

The computer system 500 illustrated in FIG. 8 can be understood as a logical apparatus that can read instructions from media 511 and/or a network port 505, which can optionally be connected to a server 509 having fixed media 512. A system such as that shown in FIG. 8 can include a CPU 501, disk drives 503, optional input devices such as a keyboard 515 and/or mouse 516, and an optional monitor 507. Data communication can be achieved through the indicated communication medium to a server at a local or a remote location. The communication medium can include any means of transmitting and/or receiving data. For example, the communication medium can be a network connection, a wireless connection, or an internet connection. Such a connection can provide for communication over the World Wide Web. It is envisioned that data relating to the present disclosure can be transmitted over such networks or connections for reception and/or review by a party 522, as illustrated in FIG. 8.

FIG. 9 is a block diagram illustrating a first example architecture of a computer system 100 that can be used in connection with example embodiments of the present disclosure. As depicted in FIG. 9, the example computer system can include a processor 102 for processing instructions. Non-limiting examples of processors include the Intel Xeon™ processor, AMD Opteron™ processor, Samsung 32-bit RISC ARM 1176JZ(F)-S v1.0™ processor, ARM Cortex-A8 Samsung S5PC100™ processor, ARM Cortex-A8 Apple A4™ processor, Marvell PXA 930™ processor, or a functionally equivalent processor. Multiple threads of execution can be used for parallel processing. In some embodiments, multiple processors or processors with multiple cores can also be used, whether in a single computer system, in a cluster, or distributed across systems over a network comprising a plurality of computers, cell phones, and/or personal data assistant devices.

As illustrated in FIG. 9, a high-speed cache 104 can be connected to, or incorporated in, the processor 102 to provide high-speed memory for instructions or data that have been recently, or are frequently, used by the processor 102. The processor 102 is connected to a north bridge 106 by a processor bus 108. The north bridge 106 is connected to random access memory (RAM) 110 by a memory bus 112 and manages access to the RAM 110 by the processor 102. The north bridge 106 is also connected to a south bridge 114 by a chipset bus 116. The south bridge 114 is, in turn, connected to a peripheral bus 118. The peripheral bus can be, for example, PCI, PCI-X, PCI Express, or another peripheral bus. The north bridge and south bridge are often referred to as a processor chipset and manage data transfer between the processor, the RAM, and peripheral components on the peripheral bus 118. In some alternative architectures, the functionality of the north bridge can be incorporated into the processor instead of using a separate north bridge chip.

In some embodiments, the system 100 can include an accelerator card 122 attached to the peripheral bus 118. The accelerator can include field programmable gate arrays (FPGAs) or other hardware for accelerating certain processing. For example, an accelerator can be used for adaptive data restructuring or to evaluate algebraic expressions used in extended set processing.

Software and data are stored in external storage 124 and can be loaded into the RAM 110 and/or the cache 104 for use by the processor. The system 100 includes an operating system for managing system resources (non-limiting examples include Linux®, Windows®, MACOS™, BlackBerry OS™, iOS™, and other functionally equivalent operating systems), as well as application software running on top of the operating system for managing data storage and optimization in accordance with example embodiments of the present disclosure.

In this example, the system 100 also includes network interface cards (NICs) 120 and 121 connected to the peripheral bus to provide network interfaces to external storage, such as network attached storage (NAS), and to other computer systems that can be used for distributed parallel processing.

FIG. 10 is a diagram showing a network 200 with a plurality of computer systems 202a and 202b, a plurality of cell phones and personal data assistants 202c, and network attached storage (NAS) 204a and 204b. In example embodiments, the systems 202a, 202b, and 202c can manage data storage and optimize data access for data stored in the NAS 204a and 204b. A mathematical model can be used for the data and evaluated using distributed parallel processing across the computer systems 202a and 202b and the cell phone and personal data assistant systems 202c. The computer systems 202a and 202b and the cell phone and personal data assistant systems 202c can also provide parallel processing for adaptive data restructuring of the data stored in the NAS 204a and 204b. FIG. 10 illustrates an example only, and a wide variety of other computer architectures and systems can be used in conjunction with the various embodiments of the present disclosure. For example, a blade server can be used to provide parallel processing. Processor blades can be connected through a backplane to provide parallel processing. Storage can also be connected to the backplane, or as NAS through a separate network interface.

In some example embodiments, processors can maintain separate memory spaces and transmit data through network interfaces, a backplane, or other connectors for parallel processing by other processors. In other embodiments, some or all of the processors can use a shared virtual address memory space.

FIG. 11 is a block diagram of a multiprocessor computer system 300 using a shared virtual address memory space in accordance with an example embodiment. The system includes a plurality of processors 302a-f that can access a shared memory subsystem 304. The system incorporates a plurality of programmable hardware memory algorithm processors (MAPs) 306a-f in the memory subsystem 304. Each MAP 306a-f can comprise a memory 308a-f and one or more field programmable gate arrays (FPGAs) 310a-f. The MAPs provide configurable functional units, and particular algorithms or portions of algorithms can be provided to the FPGAs 310a-f for processing in close coordination with a respective processor. For example, the MAPs can be used to evaluate algebraic expressions regarding the data model and to perform adaptive data restructuring in example embodiments. In this example, each MAP is globally accessible by all of the processors for these purposes. In one configuration, each MAP can use direct memory access (DMA) to access an associated memory 308a-f, allowing it to execute tasks independently of, and asynchronously from, the respective microprocessor 302a-f. In this configuration, a MAP can feed results directly to another MAP for pipelining and parallel execution of algorithms.

The above computer architectures and systems are examples only, and a wide variety of other computer, cell phone, and personal data assistant architectures and systems can be used in connection with example embodiments, including systems using any combination of general-purpose processors, co-processors, FPGAs and other programmable logic devices, systems on chips (SOCs), application specific integrated circuits (ASICs), and other processing and logic elements. In some embodiments, all or part of the computer system can be implemented in software or hardware. Any variety of data storage media can be used in connection with example embodiments, including random access memory, hard drives, flash memory, tape drives, disk arrays, network attached storage (NAS), and other local or distributed data storage devices and systems.

In example embodiments, the computer system can be implemented using software modules executing on any of the above or other computer architectures and systems. In other embodiments, the functions of the system can be implemented partially or completely in firmware, programmable logic devices such as the field programmable gate arrays (FPGAs) referenced in FIG. 11, systems on chips (SOCs), application specific integrated circuits (ASICs), or other processing and logic elements. For example, the set processor and optimizer can be implemented with hardware acceleration through the use of a hardware accelerator card, such as the accelerator card 122 illustrated in FIG. 9.

The following examples are intended to illustrate, but not to limit, the present disclosure. They are representative of procedures that could be used; other procedures known to those of skill in the art can alternatively be used.

[Example 1]
Methods of generating chromatin in vitro

Two approaches to reconstituting chromatin are of particular interest: one uses ATP-independent random deposition of histones onto DNA, while the other uses ATP-dependent assembly of periodic nucleosomes. The present disclosure allows the use of either approach with one or more of the methods disclosed herein. Examples of both approaches to generating chromatin can be found in Lusser et al. ("Strategies for the reconstitution of chromatin," Nature Methods (2004), 1(1):19-26), which is incorporated herein by reference in its entirety, including the references cited therein.

[Example 2]
Genome assembly using Hi-C-based techniques

A genome from a human subject was fragmented into pseudo-contigs 500 kb in size. A Hi-C-based method was used to generate a plurality of read pairs by probing the physical layout of chromosomes within living cells. Any number of Hi-C-based methods can be used to generate read pairs, including the method presented in Lieberman-Aiden et al. ("Comprehensive mapping of long-range interactions reveals folding principles of the human genome," Science (2009), 326(5950):289-293), which is incorporated herein in its entirety, including the references cited therein. The read pairs are mapped to all pseudo-contigs, and pairs that map to two separate pseudo-contigs are used to construct an adjacency matrix based on the mapping data. At least about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, or about 99% of the read pairs are weighted by a function of each read's distance to the edge of its pseudo-contig, so as to mathematically incorporate the empirically known higher probability of short contacts relative to long contacts. For each pseudo-contig, the adjacency matrix was then analyzed to determine a path through the pseudo-contigs by finding the single best neighboring pseudo-contig, defined as the one with the highest sum of weights. Using these methods, the correct neighbor was identified for >97% of all pseudo-contigs. Additional experiments can be performed to test the effects of shorter contigs and of alternative weighting and path-finding schemes.
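The adjacency-matrix and best-neighbor steps of this example can be sketched as follows. This is a minimal sketch, assuming read pairs have already been mapped and are given as `(contig_a, pos_a, contig_b, pos_b)` tuples with known pseudo-contig lengths. The weight `1 / (1 + (d_a + d_b) / scale)`, where `d` is a read's distance to the nearer contig end, is a hypothetical choice; the example only states that the weights are a function of distance to the contig edge that favors short contacts.

```python
from collections import defaultdict


def end_distance(pos, length):
    """Distance from a mapped read to the nearer end of its pseudo-contig."""
    return min(pos, length - pos)


def build_adjacency(read_pairs, lengths, scale=1_000.0):
    """Weighted adjacency matrix (dict of dicts) built from read pairs that map
    to two different pseudo-contigs. Each pair adds 1 / (1 + (d_a + d_b) / scale),
    a hypothetical decreasing weight so that reads near contig ends, which imply
    short contacts if the contigs are neighbors, count for more."""
    adj = defaultdict(lambda: defaultdict(float))
    for ca, pa, cb, pb in read_pairs:
        if ca == cb:
            continue  # pairs within one pseudo-contig say nothing about adjacency
        d = end_distance(pa, lengths[ca]) + end_distance(pb, lengths[cb])
        w = 1.0 / (1.0 + d / scale)
        adj[ca][cb] += w
        adj[cb][ca] += w
    return adj


def best_neighbors(adj):
    """For each pseudo-contig, the single neighbor with the highest weight sum."""
    return {c: max(row, key=row.get) for c, row in adj.items()}


lengths = {"c1": 500_000, "c2": 500_000, "c3": 500_000}
pairs = [
    ("c1", 499_000, "c2", 1_000),
    ("c1", 498_500, "c2", 2_000),
    ("c1", 250_000, "c3", 250_000),
]
print(best_neighbors(build_adjacency(pairs, lengths)))
# {'c1': 'c2', 'c2': 'c1', 'c3': 'c1'}
```

Using distance to the nearer end sidesteps the unknown orientation of each pseudo-contig; a fuller implementation might instead score each of the four end-to-end joins separately.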

Alternatively, genome assembly using Hi-C data can include computational methods that exploit the genomic proximity signal in Hi-C datasets for ultra-long-range scaffolding of de novo genome assemblies. Examples of such computational methods that can be used with the methods disclosed herein include the ligating adjacent chromatin method of Burton et al. (Nature Biotechnology 31:1119-1125 (2013)) and the DNA triangulation method of Kaplan et al. (Nature Biotechnology 31:1143-1147 (2013)); these references, and any references cited therein, are incorporated herein in their entirety. Furthermore, it should be understood that these computational methods can be used in combination with the other genome assembly methods presented herein.

For example, a ligating adjacent chromatin method based on Burton et al., comprising (a) clustering contigs into chromosome groups, (b) ordering the contigs within one or more chromosome groups, and then (c) assigning a relative orientation to each contig, can be used with the methods disclosed herein. For step (a), the contigs are placed into groups using hierarchical clustering. A graph is constructed in which each node initially represents one contig and each edge between nodes has a weight equal to the number of Hi-C read pairs linking the two contigs. The contigs are merged using hierarchical agglomerative clustering with average linkage, applied until the number of groups is reduced to the expected number of distinct chromosomes (counting only groups with two or more contigs). Repetitive contigs (contigs whose average link density with other contigs, normalized by the number of restriction fragment sites, is greater than twice the average link density) and contigs with few restriction fragment sites are not clustered. However, after clustering, each of these contigs is assigned to a group if its average link density with that group is greater than four times its average link density with any other group.

For step (b), a graph is constructed as in the clustering step, but with edge weights between nodes equal to the inverse of the number of Hi-C links between the contigs, normalized by the number of restriction fragment sites per contig. Short contigs are excluded from this graph. A minimum spanning tree is calculated for the graph, and the longest path in this tree, the "trunk," is found. The spanning tree is then modified to lengthen the trunk by adding to it contigs adjacent to the trunk, in a way that heuristically keeps the total edge weight low. After a lengthened trunk has been found for each group, it is converted into a full ordering as follows. The trunk is removed from the spanning tree, leaving a set of "branches" containing all contigs not in the trunk. These branches are reinserted into the trunk, longest branch first, at the insertion sites that maximize the number of links between adjacent contigs in the ordering. Short fragments are not reinserted; as a result, many of the small contigs that were clustered are omitted from the final assembly.

For step (c), the orientation of each contig within its ordering is determined by taking into account the exact positions of the Hi-C link alignments on each contig. The likelihood of a Hi-C link connecting two reads at a genomic distance x is assumed to be roughly 1/x for x ≥ about 100 kb. A weighted directed acyclic graph (WDAG) is constructed that represents all possible ways to orient the contigs in the given order. Each edge in the WDAG corresponds to a pair of adjacent contigs in one of their four possible combinations of orientations, and the edge weight is set to the log-likelihood of observing the set of Hi-C link distances between the two contigs, assuming they are immediately adjacent in the given orientations. The orientations are chosen as the highest-scoring path through the WDAG. For each contig, a quality score for its orientation is calculated as follows. The log-likelihood of the observed set of Hi-C links between the contig and its neighbors is computed in its current orientation. The contig is then flipped and the log-likelihood recalculated. Because of the way the orientations are chosen, the first log-likelihood is guaranteed to be at least as high as the second. The difference between the two log-likelihoods is taken as the quality score.
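Step (c) can be sketched as a dynamic program over the WDAG, with each contig taking one of two orientations. This is a minimal sketch, not Burton et al.'s code: it assumes contigs abut with no gap, applies the 1/x link model at all distances (flooring x at a hypothetical `min_dist` to avoid log(0)), and drops the model's normalizing constant, which is the same for every orientation choice and so cancels in comparisons. `links` is assumed to hold `(contig_a, pos_a, contig_b, pos_b)` tuples.

```python
import math


def pair_loglik(links, a, b, lengths, fa, fb, min_dist=1_000):
    """Log-likelihood (up to an additive constant) of the Hi-C links between
    contig `a` and contig `b` placed immediately to its right, assuming the
    link probability at distance x is proportional to 1/x.
    fa/fb are orientations (True = forward)."""
    total = 0.0
    for ca, pa, cb, pb in links:
        if (ca, cb) == (b, a):
            ca, pa, cb, pb = cb, pb, ca, pa  # put the read on `a` first
        if (ca, cb) != (a, b):
            continue
        to_junction_a = (lengths[a] - pa) if fa else pa
        from_junction_b = pb if fb else (lengths[b] - pb)
        total -= math.log(max(to_junction_a + from_junction_b, min_dist))
    return total


def orient(order, lengths, links):
    """Choose orientations as the maximum-weight path through the WDAG whose
    nodes are (contig, orientation) and whose edges join adjacent contigs."""
    states = (True, False)
    score = {o: 0.0 for o in states}
    back = []  # back[i][o] = best orientation of order[i] given order[i+1] = o
    for a, b in zip(order, order[1:]):
        new, ptr = {}, {}
        for ob in states:
            cand = {oa: score[oa] + pair_loglik(links, a, b, lengths, oa, ob)
                    for oa in states}
            ptr[ob] = max(cand, key=cand.get)
            new[ob] = cand[ptr[ob]]
        score = new
        back.append(ptr)
    o = max(score, key=score.get)
    path = [o]
    for ptr in reversed(back):  # trace the best path back to the first contig
        o = ptr[o]
        path.append(o)
    return dict(zip(order, reversed(path)))


def orientation_quality(order, lengths, links, orientation, target):
    """Log-likelihood of `target`'s links to its neighbors in its current
    orientation minus the log-likelihood after flipping it."""
    i = order.index(target)

    def local(ori):
        ll = 0.0
        if i > 0:
            ll += pair_loglik(links, order[i - 1], target, lengths,
                              ori[order[i - 1]], ori[target])
        if i + 1 < len(order):
            ll += pair_loglik(links, target, order[i + 1], lengths,
                              ori[target], ori[order[i + 1]])
        return ll

    flipped = dict(orientation)
    flipped[target] = not flipped[target]
    return local(orientation) - local(flipped)


lengths = {"A": 200_000, "B": 200_000}
links = [("A", 190_000, "B", 10_000)] * 3
ori = orient(["A", "B"], lengths, links)
print(ori)  # {'A': True, 'B': True}
print(orientation_quality(["A", "B"], lengths, links, ori, "A"))  # 3 * ln(10), about 6.91
```

Because `orient` maximizes the summed edge weights and flipping one contig changes only its two incident edges, the quality score for the chosen orientations is never negative, consistent with the guarantee stated for step (c).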

A DNA triangulation approach similar to that of Kaplan et al. can alternatively be used in the methods disclosed herein to assemble a genome from contigs and read pairs. DNA triangulation uses high-throughput, genome-wide in vivo chromatin interaction data to infer genomic location. In this approach, the CTR pattern is first quantified by dividing the genome into 100 kb bins, each representing a large virtual contig, and calculating, for each placed contig, its average interaction frequency with each chromosome. To assess long-range localization, interaction data with the 1 Mb flanking each side of the contig are excluded. The average interaction frequency strongly separates intrachromosomal from interchromosomal interactions and is highly predictive of the chromosome to which a contig belongs. A naive Bayes classifier, a simple multiclass model, is then trained to predict each contig's chromosome from its average interaction frequency with each chromosome. Next, the assembled portion of the genome is used to fit a probabilistic, single-parameter exponential decay model that describes the relationship between Hi-C interaction frequency and genomic distance (the DDD pattern). In each round, a contig is removed from its chromosome together with 1 Mb of flanking sequence on each side, and the most probable position of the contig is estimated from its interaction profile and the decay model. The prediction error is quantified as the absolute distance between the predicted and actual positions.
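The position-estimation step can be illustrated with a small Python sketch. It assumes that the single decay parameter is the rate of an exponential distribution fitted by maximum likelihood to read-pair distances from the assembled portion, and that each contact of the held-out contig with a 100 kb bin is drawn from that decay normalized over all bins. The function names are hypothetical, and the naive Bayes chromosome-assignment step is not shown.

```python
import math
from typing import Sequence


def fit_decay_rate(distances: Sequence[float]) -> float:
    """Maximum-likelihood rate of an exponential P(d) = rate * exp(-rate * d),
    fitted on read-pair distances from the already assembled part of a chromosome."""
    return len(distances) / sum(distances)


def log_sum_exp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def most_probable_position(bin_counts: Sequence[int], bin_positions: Sequence[int],
                           rate: float) -> int:
    """Grid search for the bin position that best explains a contig's contacts.

    For a candidate position p, each contact with the bin at b is modeled as drawn
    from exp(-rate * |b - p|), normalized over all bins (computed in log space).
    """
    best_pos, best_ll = bin_positions[0], -math.inf
    for p in bin_positions:
        log_w = [-rate * abs(b - p) for b in bin_positions]
        log_z = log_sum_exp(log_w)
        ll = sum(n * (lw - log_z) for n, lw in zip(bin_counts, log_w))
        if ll > best_ll:
            best_pos, best_ll = p, ll
    return best_pos


# Toy example: a 5 Mb chromosome in 100 kb bins; the held-out contig's contacts
# peak at 2.0 Mb, which is also its true position here.
rate = fit_decay_rate([200_000, 300_000, 400_000])
bins = list(range(0, 5_000_000, 100_000))
counts = [round(100 * math.exp(-abs(b - 2_000_000) / 300_000)) for b in bins]
predicted = most_probable_position(counts, bins, rate)
prediction_error = abs(predicted - 2_000_000)   # |predicted - actual|
```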

Combining a long-insert library with DNA triangulation can further improve the predictions for each contig. Knowing each contig's chromosome assignment and approximate location greatly reduces the computational complexity of long-insert scaffolding, because each contig needs to be paired only with nearby contigs. This can resolve ambiguous contig joins and reduce assembly errors in which contigs located in distant regions of a chromosome, or on different chromosomes, are incorrectly joined.
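As a rough illustration of how a known chromosome assignment and approximate position shrink the scaffolding search, the hypothetical helper below keeps only contig pairs on the same predicted chromosome whose estimated positions lie within a chosen window; the 1 Mb window in the example is an assumption. Sorting contigs by position and sweeping a window avoids comparing every pair of contigs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def candidate_joins(placements: Dict[str, Tuple[str, int]], window: int) -> List[Tuple[str, str]]:
    """placements maps contig -> (predicted chromosome, approximate position)."""
    by_chrom = defaultdict(list)
    for contig, (chrom, pos) in placements.items():
        by_chrom[chrom].append((pos, contig))
    pairs = []
    for items in by_chrom.values():
        items.sort()
        start = 0
        for j, (pos_j, contig_j) in enumerate(items):
            # Advance the left edge until it is within `window` of contig j.
            while pos_j - items[start][0] > window:
                start += 1
            for _, contig_i in items[start:j]:
                pairs.append(tuple(sorted((contig_i, contig_j))))
    return sorted(pairs)


placements = {
    "c1": ("chr1", 100_000),
    "c2": ("chr1", 350_000),
    "c3": ("chr1", 5_000_000),
    "c4": ("chr2", 120_000),
}
joins = candidate_joins(placements, window=1_000_000)   # 1 of 6 possible pairs kept
```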

[Example 3]
Haplotype phasing method

Because the read pairs produced by the methods disclosed herein typically derive from intrachromosomal contacts, any read pair that contains heterozygous sites also carries information about their phasing. This information can be used to perform high-confidence phasing quickly and accurately over short, intermediate, and even long (megabase) distances. Experiments are designed to phase data from one of the 1000 Genomes trios (a mother/father/child set of genomes), for which the phasing can be reliably inferred. In addition, haplotype reconstruction using proximity ligation, similar to Selvaraj et al. (Nature Biotechnology 31:1111-1118 (2013)), can also be used together with the haplotype phasing methods disclosed herein.
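The phase information carried by individual read pairs can be sketched as a vote: if a read pair covers two heterozygous sites and shows both reference or both alternate alleles, it supports the alternate alleles being in cis; otherwise it supports trans. The data layout and function name below are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# (site_a, allele_a, site_b, allele_b) with alleles coded 0 = reference, 1 = alternate
Observation = Tuple[int, int, int, int]


def pairwise_phase(observations: Iterable[Observation]) -> Dict[Tuple[int, int], Tuple[str, float]]:
    """Majority phase call ('cis' or 'trans') and its support for each site pair."""
    votes: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for site_a, allele_a, site_b, allele_b in observations:
        if site_a > site_b:                      # store each pair in sorted order
            site_a, site_b = site_b, site_a
            allele_a, allele_b = allele_b, allele_a
        votes[(site_a, site_b)]["cis" if allele_a == allele_b else "trans"] += 1
    calls = {}
    for pair, counter in votes.items():
        call, n = counter.most_common(1)[0]
        calls[pair] = (call, n / sum(counter.values()))
    return calls


observations = [
    (1_000, 1, 250_000, 1),
    (1_000, 0, 250_000, 0),
    (250_000, 1, 1_000, 1),
    (1_000, 1, 250_000, 0),   # e.g. a sequencing error or a trans contact
]
phases = pairwise_phase(observations)
```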

For example, haplotype reconstruction using proximity-ligation-based methods can also be used with the methods disclosed herein to phase a genome. This approach combines proximity ligation and DNA sequencing with a probabilistic algorithm for haplotype assembly. First, proximity-ligation sequencing is performed using a chromosome conformation capture procedure such as Hi-C. These methods can capture DNA fragments from two distant genomic loci that are brought together in three-dimensional space. After shotgun DNA sequencing of the resulting DNA library, the paired-end sequencing reads have insert sizes ranging from a few hundred base pairs to tens of millions of base pairs. Thus, the short-range fragments produced in a Hi-C experiment can yield small haplotype blocks, and the long-range fragments can ultimately link these small blocks together. With sufficient sequencing coverage, this approach has the potential to link variants in discontinuous blocks and assemble all such blocks into a single haplotype. These data are then combined with a probabilistic algorithm for haplotype assembly. The algorithm uses a graph in which nodes correspond to heterozygous variants and edges correspond to overlapping sequence fragments that can link the variants. This graph may contain spurious edges arising from sequencing errors or trans interactions. A max-cut algorithm is then used to predict the parsimonious solution that is most consistent with the haplotype information provided by the set of input sequencing reads. Because proximity ligation produces larger graphs than conventional genome sequencing or mate-pair sequencing, the computation time and number of iterations can be adjusted so that haplotypes are predicted with reasonable speed and high accuracy. The resulting data can then be used, together with the Beagle software and sequencing data from the genome project, to guide local phasing, thereby generating chromosome-spanning haplotypes with high resolution and accuracy.
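The graph-based step is sketched below with a simplified stand-in for max-cut: sites are nodes, each read pair adds +1 (alleles on the same haplotype) or -1 (opposite haplotypes) to the edge between its two sites, and a greedy single-site-flip local search maximizes agreement. This is not the published max-cut algorithm, which moves whole sets of sites across a cut; the names and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Observation = Tuple[int, int, int, int]  # (site_a, allele_a, site_b, allele_b), 0=ref, 1=alt


def edge_weights(observations: Iterable[Observation]) -> Dict[Tuple[int, int], int]:
    """+1 per read pair seeing both alleles on the same haplotype, -1 otherwise."""
    weights: Dict[Tuple[int, int], int] = defaultdict(int)
    for site_a, allele_a, site_b, allele_b in observations:
        key = (min(site_a, site_b), max(site_a, site_b))
        weights[key] += 1 if allele_a == allele_b else -1
    return weights


def assemble_haplotype(observations: Iterable[Observation], max_rounds: int = 100) -> Dict[int, int]:
    """Assign each site +1/-1 (alt allele on haplotype 1 or 2) by greedy local
    search that maximizes sum(w_ab * h_a * h_b) over the graph edges."""
    neighbors = defaultdict(list)
    for (a, b), w in edge_weights(observations).items():
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))
    h = {site: 1 for site in neighbors}
    for _ in range(max_rounds):
        improved = False
        for site in sorted(h):
            # Flipping this site changes the objective by -2 * local.
            local = sum(w * h[site] * h[other] for other, w in neighbors[site])
            if local < 0:
                h[site] = -h[site]
                improved = True
        if not improved:
            break
    # The solution is defined only up to a global swap; fix the first site at +1.
    if h and h[min(h)] == -1:
        h = {site: -v for site, v in h.items()}
    return h


observations = [
    (100, 1, 200, 0), (100, 0, 200, 1),   # sites 100 and 200 in trans
    (200, 1, 300, 0),                     # sites 200 and 300 in trans
    (100, 1, 300, 1),                     # sites 100 and 300 in cis
    (100, 1, 200, 1),                     # a spurious edge (error or trans contact)
]
haplotype = assemble_haplotype(observations)
```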

[Example 4]
Metagenomic assembly method

Microorganisms are collected from an environment and fixed with a fixative such as formaldehyde to form crosslinks within the microbial cells. Multiple contigs derived from the microorganisms are generated using high-throughput sequencing, and multiple read pairs are generated using a Hi-C-based technique. Read pairs that map to different contigs indicate which contigs derive from the same species.
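One simple way to turn inter-contig read pairs into species groupings is union-find clustering with a minimum link count, sketched below. The `min_links` threshold, intended to suppress occasional spurious links, and the function name are assumptions rather than part of the described method.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bin_contigs(contigs: List[str], linking_pairs: Iterable[Tuple[str, str]],
                min_links: int = 2) -> List[List[str]]:
    """Group contigs that are connected by at least `min_links` read pairs."""
    parent = {c: c for c in contigs}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]   # path halving
            c = parent[c]
        return c

    links = Counter(tuple(sorted(p)) for p in linking_pairs if p[0] != p[1])
    for (a, b), n in links.items():
        if n >= min_links:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra
    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


contigs = ["a", "b", "c", "d", "e"]
pairs = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("d", "e"), ("a", "d")]
species_bins = bin_contigs(contigs, pairs)
```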

[Example 5]
Method for generating very-long-range read pairs (XLRPs)

DNA is extracted with fragment sizes of up to 150 kbp using a commercially available kit. The DNA is assembled in vitro into reconstituted chromatin using a commercial kit from Active Motif. The chromatin is biotinylated, fixed with formaldehyde, and immobilized on streptavidin beads. The DNA is digested with a restriction enzyme and incubated overnight. The resulting sticky ends are filled in with α-thio-dGTP and biotinylated dCTP to produce blunt ends, which are then ligated with T4 ligase. The reconstituted chromatin is digested with a proteinase to recover the ligated DNA. The DNA is extracted from the beads and subjected to exonuclease digestion to remove biotin from unligated ends. The recovered DNA is sheared, and the ends are filled in with dNTPs. Biotinylated fragments are purified by pull-down with streptavidin beads. In some cases, adapters are ligated and the fragments are amplified by PCR for high-throughput sequencing.

[Example 6]
Method for producing a high-quality human genome assembly

Because the present disclosure can generate read pairs spanning substantial genomic distances, the use of this information for genome assembly can be tested. The present disclosure can substantially improve the contiguity of de novo assemblies, potentially up to chromosome-length scaffolds. It can be determined how complete an assembly can be produced using the present disclosure and how much data that requires. To assess how effectively the method produces data useful for assembly, a standard Illumina shotgun library and an XLRP library can be constructed and sequenced. In some cases, one lane of Illumina HiSeq data is used for each of the standard shotgun and XLRP libraries. The data generated by each method are tested and compared using a variety of existing assemblers. Optionally, a new assembler tailored specifically to the unique data produced by the present disclosure is also developed. Optionally, a well-characterized human sample is used to provide a reference against which assemblies produced by the method are compared, thereby determining the accuracy and completeness of the method. The findings from these analyses are used to build an assembler that makes efficient and effective use of XLRP and shotgun data. A genome assembly of the quality of the December 2002 mouse genome draft, or better, is generated using the methods described herein.

One sample that can be used for this analysis is NA12878. DNA is extracted from the sample cells using various published techniques designed to maximize DNA fragment length. A standard Illumina TruSeq shotgun library and an XLRP library are each constructed. A single HiSeq lane of 2 × 150 bp sequence is obtained for each library, producing approximately 150,000,000 read pairs per library. The shotgun data are assembled into contigs using a whole-genome assembly algorithm; examples include Meraculous, described in Chapman et al. (PLOS ONE 6(8):e2350 (2011)), and SGA, described in Simpson et al. (Genome Research 22(3):549-56 (2012)). The XLRP library reads are aligned to the contigs produced by this initial assembly, and the alignments are used to link the contigs further. Once the effectiveness of the XLRP library in connecting contigs has been confirmed, the Meraculous assembler is extended to incorporate the shotgun and XLRP libraries simultaneously in a single assembly process. Meraculous provides a strong foundation for such an assembler. Optionally, an all-in-one assembler is developed to meet the specific needs of the present disclosure. The human genome assembled according to the present disclosure is compared with known sequences to assess the quality of the assembly.
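The step in which aligned XLRP reads are used to link contigs further can be illustrated with a greedy scaffolder over contig-to-contig link counts (a simple adjacency matrix). This is a hypothetical sketch, not Meraculous or any published assembler: it orders contigs into linear paths by accepting the strongest links first, allows at most two neighbours per contig, rejects cycles, and ignores orientation and gap sizes.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def greedy_scaffold(contigs: List[str], linking_pairs: Iterable[Tuple[str, str]],
                    min_links: int = 1) -> List[List[str]]:
    """Order contigs into linear scaffolds from XLRP links (orientation not handled)."""
    counts = Counter(tuple(sorted(p)) for p in linking_pairs if p[0] != p[1])
    parent = {c: c for c in contigs}          # union-find, used to reject cycles
    degree = {c: 0 for c in contigs}
    adj: Dict[str, List[str]] = {c: [] for c in contigs}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Strongest links first; each contig may have at most two neighbours.
    for (a, b), n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_links or degree[a] == 2 or degree[b] == 2:
            continue
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        parent[rb] = ra
        adj[a].append(b)
        adj[b].append(a)
        degree[a] += 1
        degree[b] += 1

    scaffolds, seen = [], set()
    for start in sorted(contigs):
        if start in seen or degree[start] == 2:   # walk each path from one of its ends
            continue
        path, prev, cur = [], None, start
        while cur is not None:
            path.append(cur)
            seen.add(cur)
            nxt = [x for x in adj[cur] if x != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        scaffolds.append(path)
    return scaffolds


contigs = ["A", "B", "C", "D"]
pairs = [("A", "B")] * 5 + [("C", "B")] * 4 + [("C", "D")] * 3 + [("A", "C"), ("D", "A")]
scaffolds = greedy_scaffold(contigs, pairs)
```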

[Example 7]
Method for phasing heterozygous SNPs in a human sample with high accuracy from a small dataset

In one experiment, approximately 44% of the heterozygous variants in a test human-sample dataset are phased. All or nearly all phasable variants within one read length of a restriction site are captured. In silico analysis indicates that more variants can be captured for phasing by using longer reads and by digesting with combinations of restriction enzymes. Using a combination of restriction enzymes with different recognition sites increases the fraction of the genome (and therefore of heterozygous sites) that lies within one read length of one of the two restriction sites involved in each read pair. In silico analysis shows that the disclosed methods can phase 95% or more of known heterozygous positions using various combinations of two restriction enzymes. Additional enzymes and longer reads further increase the fraction of heterozygous sites that are observed and phased, up to complete coverage and phasing.
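The in silico analysis can be sketched as follows: find recognition sites, then count heterozygous positions that fall within one read length of any cut site, for each pair of enzymes. The recognition sequences shown (GATC for MboI, AAGCTT for HindIII, GAATTC for EcoRI) are standard, but the toy sequence, positions, read length, and function names are illustrative assumptions.

```python
import bisect
import re
from itertools import combinations
from typing import Dict, List, Sequence


def site_positions(seq: str, motif: str) -> List[int]:
    """Start positions of a recognition sequence (overlapping matches included)."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def near_cut_site(pos: int, cut_sites: Sequence[int], read_length: int) -> bool:
    """True if pos lies within one read length of the nearest cut site."""
    i = bisect.bisect_left(cut_sites, pos)
    return any(0 <= j < len(cut_sites) and abs(pos - cut_sites[j]) < read_length
               for j in (i - 1, i))


def fraction_observable(seq: str, het_positions: Sequence[int],
                        motifs: Sequence[str], read_length: int) -> float:
    cut_sites = sorted(p for motif in motifs for p in site_positions(seq, motif))
    hits = sum(near_cut_site(h, cut_sites, read_length) for h in het_positions)
    return hits / len(het_positions)


def rank_enzyme_pairs(seq, het_positions, enzymes: Dict[str, str], read_length, top=3):
    scored = [(fraction_observable(seq, het_positions, [enzymes[a], enzymes[b]], read_length),
               (a, b))
              for a, b in combinations(sorted(enzymes), 2)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


seq = "A" * 50 + "GATC" + "A" * 100 + "AAGCTT" + "A" * 50
het_positions = [40, 100, 170]
enzymes = {"MboI": "GATC", "HindIII": "AAGCTT", "EcoRI": "GAATTC"}
ranking = rank_enzyme_pairs(seq, het_positions, enzymes, read_length=20)
```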

The coverage of heterozygous sites achievable with various combinations of two restriction enzymes is calculated. The three combinations that place the most heterozygous sites within read proximity are tested in this procedure. For each of these combinations, an XLRP library is constructed and sequenced. The resulting reads are aligned to the human reference genome and compared with the known haplotypes of the sample to determine the accuracy of the procedure. 90% or more of the heterozygous SNPs in the human sample are phased with an accuracy of 99% or more, using data from only one lane of Illumina HiSeq. In addition, increasing the read length to 300 bp captures further variants, effectively doubling the observable read area around each restriction site. Additional restriction enzyme combinations are implemented to increase coverage and accuracy.
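Comparing predicted phases with a sample's known haplotypes can be summarized with two simple metrics, sketched below under the assumption that phases are encoded as +1/-1 per heterozygous site: the fraction of sites correctly phased (allowing a global haplotype swap) and the switch error rate between consecutive sites. These definitions are common conventions chosen here for illustration; the text does not specify which accuracy measure is used.

```python
from typing import Dict


def phasing_accuracy(predicted: Dict[int, int], truth: Dict[int, int]) -> float:
    """Fraction of shared sites phased correctly, up to a global haplotype swap."""
    shared = sorted(set(predicted) & set(truth))
    agree = sum(predicted[s] == truth[s] for s in shared)
    return max(agree, len(shared) - agree) / len(shared)


def switch_error_rate(predicted: Dict[int, int], truth: Dict[int, int]) -> float:
    """Fraction of consecutive shared site pairs whose relative phase is wrong."""
    shared = sorted(set(predicted) & set(truth))
    switches = sum(predicted[a] * predicted[b] != truth[a] * truth[b]
                   for a, b in zip(shared, shared[1:]))
    return switches / (len(shared) - 1)


truth = {1: 1, 2: -1, 3: 1, 4: 1}
predicted = {1: -1, 2: 1, 3: -1, 4: 1}   # globally swapped, with one switch before site 4
```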

[Example 8]
Extraction and effects of high-molecular-weight DNA

DNA of 150 kbp or less was extracted with a commercially available kit. Figure 7 demonstrates that XLRP libraries can be generated from captured read pairs up to the maximum fragment length of the extracted DNA. Therefore, the methods disclosed herein can be expected to generate read pairs from even longer stretches of DNA. Many well-developed processes exist for recovering high-molecular-weight DNA, and these can be used together with the methods and procedures disclosed herein. Extraction methods that yield DNA of large fragment length can be used to create XLRP libraries from these fragments, and the resulting read pairs can be evaluated. For example, high-molecular-weight DNA can be extracted by (1) gentle lysis of cells according to Teague et al. (Proc. Nat. Acad. Sci. USA 107(24):10848-53 (2010)) or Zhou et al. (PLOS Genetics 5(11):e1000711 (2009)), or (2) agarose gel plugs according to Wing et al. (The Plant Journal: for Cell and Molecular Biology 4(5):893-8 (1993)), each of which references is incorporated herein in its entirety, including any references cited therein, or by using the Aurora System from Boreal Genomics. These methods can produce DNA fragments longer than those required for conventional next-generation sequencing, but any other suitable method known in the art can be substituted to achieve similar results. The Aurora System provides distinctive results and can separate and concentrate DNA of 1 megabase or more in length from tissues or other specimens. Using each of these methods, DNA extractions are prepared starting from a single GM12878 cell culture to control for possible sample-level differences. The fragment size distribution can be evaluated by pulsed-field gel electrophoresis according to Herschleb et al. (Nature Protocols 2(3):677-84 (2007)). Using the methods described above, very large stretches of DNA can be extracted and used to construct XLRP libraries. The XLRP libraries are then sequenced and aligned, and the resulting read data are analyzed by comparing the genomic distances between read pairs with the fragment sizes observed on the gel.
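Analyzing read data against fragment sizes can start from a summary of read-pair spans, as in the hypothetical helper below. It assumes read pairs are given as aligned (chromosome, position) pairs, measures spans only for intrachromosomal pairs, reports the fraction at or above chosen thresholds, and counts spans longer than the largest fragment observed on the gel.

```python
from typing import Dict, Iterable, Sequence, Tuple

ReadPair = Tuple[str, int, str, int]  # (chrom_a, pos_a, chrom_b, pos_b)


def span_summary(read_pairs: Iterable[ReadPair], max_fragment: int,
                 thresholds: Sequence[int] = (30_000, 50_000, 100_000)) -> Dict[str, object]:
    spans = [abs(pos_b - pos_a) for chrom_a, pos_a, chrom_b, pos_b in read_pairs
             if chrom_a == chrom_b]
    n = len(spans)
    return {
        "n_intra": n,
        "max_span": max(spans, default=0),
        "fraction_at_least": {t: (sum(s >= t for s in spans) / n if n else 0.0)
                              for t in thresholds},
        # Spans longer than any fragment on the gel would suggest chimeric ligations.
        "n_longer_than_fragment": sum(s > max_fragment for s in spans),
    }


read_pairs = [
    ("chr1", 1_000, "chr1", 61_000),
    ("chr1", 5_000, "chr1", 6_000),
    ("chr1", 10_000, "chr1", 130_000),
    ("chr1", 2_000, "chr2", 9_000),     # interchromosomal, excluded from spans
]
summary = span_summary(read_pairs, max_fragment=150_000)
```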

[Example 9]
Reducing read pairs from undesired genomic regions

RNA complementary to an undesired genomic region is produced by in vitro transcription and added to reconstituted chromatin before crosslinking. Because the added RNA binds to one or more undesired genomic regions, it reduces crosslinking efficiency in those regions, thereby reducing the abundance of DNA from these regions in the crosslinked complexes. The reconstituted chromatin is then biotinylated, immobilized, and used as described above. In some cases, the RNA is designed to target repetitive regions of the genome.
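Designing the blocking RNA amounts to taking the reverse complement of the target DNA strand and writing it with U in place of T. The sketch below computes only the sequences, with an assumed probe length and step, and does not cover the transcription template or promoter needed for in vitro transcription.

```python
from typing import List

COMPLEMENT_TO_RNA = str.maketrans("ACGTacgt", "UGCAugca")


def complementary_rna(dna: str) -> str:
    """5'->3' RNA sequence that base-pairs with the given 5'->3' DNA strand."""
    return dna.translate(COMPLEMENT_TO_RNA)[::-1]


def tile_probes(region: str, probe_length: int, step: int) -> List[str]:
    """Complementary RNA probes tiled across an undesired region (hypothetical helper)."""
    return [complementary_rna(region[i:i + probe_length])
            for i in range(0, len(region) - probe_length + 1, step)]


probes = tile_probes("AAAACCCCGGGG", probe_length=4, step=4)
```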

[Example 10]
Increasing read pairs from desired chromatin regions

For genome assembly or haplotyping, DNA from the desired chromatin regions is produced in double-stranded form, so that the representation of DNA from undesired regions is reduced. Double-stranded DNA from a desired chromatin region is produced using primers that tile the region at intervals of several kilobases. In other implementations of the method, the tiling interval is varied to target desired regions of different sizes with the desired replication efficiency. Optionally, the DNA is melted so that primer-binding sites throughout the desired region are brought into contact with the primers. New DNA strands are synthesized from the tiled primers. Undesired regions are reduced or removed by targeting them, for example, with an endonuclease specific for single-stranded DNA. The remaining desired regions can optionally be amplified. The prepared sample is then subjected to the sequencing library preparation methods described elsewhere herein. In some implementations, read pairs spanning distances up to the length of each desired chromatin region are generated from each such region.
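Primer placement at a fixed tiling interval can be computed as in this sketch. The 20-nt primer length, the half-open coordinates, and the 5 kb interval in the example are assumptions, and primer sequence selection (melting temperature, specificity) is not addressed.

```python
from typing import List, Tuple


def tile_primers(region_start: int, region_end: int, interval: int,
                 primer_length: int = 20) -> List[Tuple[int, int]]:
    """Half-open [start, end) coordinates of primers placed every `interval` bp
    across a desired region; primers never extend past region_end."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    last_start = region_end - primer_length
    return [(s, s + primer_length) for s in range(region_start, last_start + 1, interval)]


primers = tile_primers(0, 50_000, 5_000)   # a 50 kb region tiled every 5 kb
```

Changing `interval` is how this sketch would vary the tiling density for regions of different sizes.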

While preferred embodiments of the present disclosure have been shown and described herein, it will be apparent to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that methods and structures within the scope of these claims and their equivalents be covered thereby. The present invention includes the following embodiments.
[1] A method of genome assembly, comprising:
generating a plurality of contigs;
generating a plurality of read pairs from data produced by probing the physical layout of chromosomes, chromatin, or reconstituted chromatin;
mapping or assembling the plurality of read pairs to the plurality of contigs;
constructing an adjacency matrix of contigs using the read-mapping or assembly data; and
analyzing the adjacency matrix to determine a path through the contigs that represents their order and/or orientation with respect to the genome.
[2] The method according to embodiment 1, wherein the plurality of contigs is generated using a shotgun sequencing method comprising:
fragmenting a long stretch of DNA of interest into random fragments of indeterminate size;
sequencing the fragments using a high-throughput sequencing method to generate a plurality of sequencing reads; and
assembling the sequencing reads to form a plurality of contigs.
[3] The method according to embodiment 1 or 2, wherein the plurality of read pairs is generated by probing the physical layout of chromosomes, chromatin, or reconstituted chromatin using a Hi-C-based technique.
[4] The method according to embodiment 3, wherein the Hi-C-based technique comprises:
crosslinking chromosomes, chromatin, or reconstituted chromatin with a fixative to form DNA-protein crosslinks;
cutting the crosslinked DNA-protein with one or more restriction enzymes to generate a plurality of DNA-protein complexes comprising sticky ends;
filling in the sticky ends with nucleotides containing one or more markers to create blunt ends that are then ligated together;
fragmenting the plurality of DNA-protein complexes into fragments;
pulling down junction-containing fragments using the one or more markers; and
sequencing the junction-containing fragments using a high-throughput sequencing method to generate a plurality of read pairs.
[5] The method according to any of the preceding embodiments, wherein the plurality of read pairs is generated by probing the physical layout of chromosomes or chromatin isolated from cultured cells or primary tissue.
[6] The method according to any of embodiments 1 to 4, wherein the plurality of read pairs is generated by probing the physical layout of reconstituted chromatin formed by complexing naked DNA obtained from samples of one or more subjects with isolated histones.
[7] The method according to any of the preceding embodiments, wherein at least about 80% of the plurality of read pairs are weighted by a function of the distance from each read to the end of its contig, so as to incorporate the higher probability of short contacts relative to long contacts.
[8] The method according to any of the preceding embodiments, wherein the adjacency matrix is rescaled to down-weight the many contacts on contigs that represent promiscuous regions of the genome.
[9] The method according to embodiment 8, wherein the promiscuous regions of the genome comprise one or more conserved binding sites for one or more agents that regulate chromatin scaffold interactions.
[10] The method according to embodiment 9, wherein the one or more agents comprise the transcriptional repressor CTCF.
[11] The method according to any of the preceding embodiments, which provides a genome assembly of a human subject, wherein the plurality of contigs is generated from DNA of the human subject, and the plurality of read pairs is generated using chromosomes or chromatin of the human subject, or reconstituted chromatin made from naked DNA of the subject.
[12] A method of determining haplotype phasing, comprising the method according to any of the preceding embodiments and further comprising:
identifying one or more heterozygous sites in the plurality of read pairs; and
identifying read pairs that comprise a pair of heterozygous sites,
wherein the identification of the pair of heterozygous sites allows phasing data to be determined for allelic variants.
[13] A method of metagenomic assembly, comprising the method according to embodiment 1, wherein the plurality of read pairs is determined by probing the physical layout of a plurality of microbial chromosomes using a modified Hi-C-based method comprising:
collecting microorganisms from an environment; and
adding a fixative to form crosslinks within each microbial cell,
and wherein read pairs that map to different contigs indicate which contigs derive from the same species.
[14] The method of embodiment 13, wherein the fixative is formaldehyde.
[15] A method of assembling a plurality of contigs generated from a single DNA molecule, comprising:
generating a plurality of read pairs from the single DNA molecule; and
assembling the contigs using the read pairs;
wherein at least 1% of the read pairs span a distance of at least 50 kB on the single DNA molecule, and the read pairs are generated within 14 days.
[16] The method of embodiment 15, wherein at least 10% of the read pairs span a distance of at least 50 kB on the single DNA molecule.
[17] The method of embodiment 15, wherein at least 1% of the read pairs span a distance of at least 100 kB on the single DNA molecule.
[18] The method of any of embodiments 15-17, wherein the read pairs are generated within 7 days.
[19] A method of assembling a plurality of contigs derived from a single DNA molecule, comprising:
generating, in vitro, a plurality of read pairs from the single DNA molecule; and
assembling the contigs using the read pairs;
wherein at least 1% of the read pairs span a distance of at least 30 kB on the single DNA molecule.
[20] The method of embodiment 19, wherein at least 10% of the read pairs span a distance of at least 30 kB on the single DNA molecule.
[21] The method of embodiment 20, wherein at least 1% of the read pairs span a distance of at least 50 kB on the single DNA molecule.
[22] A haplotype phasing method, comprising:
generating a plurality of read pairs from a single DNA molecule; and
assembling a plurality of contigs of the DNA molecule using the read pairs;
wherein at least 1% of the read pairs span a distance of at least 50 kB on the single DNA molecule, and the haplotype phasing is performed with an accuracy greater than 70%.
[23] The method of embodiment 22, wherein at least 10% of the read pairs span a distance of at least 50 kB on the single DNA molecule.
[24] The method of embodiment 22, wherein at least 1% of the read pairs span a distance of at least 100 kB on the single DNA molecule.
[25] The method of any of embodiments 22-24, wherein the haplotype phasing is performed with an accuracy greater than 90%.
[26] A haplotype phasing method, comprising:
generating, in vitro, a plurality of read pairs from a single DNA molecule; and
assembling a plurality of contigs of the DNA molecule using the read pairs;
wherein at least 1% of the read pairs span a distance of at least 30 kB on the single DNA molecule, and the haplotype phasing is performed with an accuracy greater than 70%.
[27] The method of embodiment 26, wherein at least 10% of the read pairs span a distance of at least 30 kB on the single DNA molecule.
[28] The method of embodiment 26, wherein at least 1% of the read pairs span a distance of at least 50 kB on the single DNA molecule.
[29] The method of any of embodiments 26-28, wherein the haplotype phasing is performed with an accuracy greater than 90%.
[30] A method of in vitro haplotype phasing, wherein the haplotype phasing is performed with an accuracy greater than 70%.
[31] A method of generating a first read pair from a first DNA molecule, comprising:
(a) crosslinking the first DNA molecule in vitro, wherein the first DNA molecule contains a first DNA segment and a second DNA segment;
(b) linking the first DNA segment to the second DNA segment to form a linked DNA segment; and
(c) sequencing the linked DNA segment, thereby obtaining the first read pair.
[32] The method of embodiment 31, wherein a plurality of associated molecules is crosslinked to the first DNA molecule.
[33] The method of embodiment 32, wherein the associated molecule comprises an amino acid.
[34] The method of embodiment 33, wherein the associated molecule is a peptide or protein.
[35] The method of any of embodiments 31-34, wherein the first DNA molecule is crosslinked with a fixative.
[36] The method of embodiment 35, wherein the fixative is formaldehyde.
[37] The method of any of embodiments 31-36, wherein the first DNA segment and the second DNA segment are generated by cleaving the first DNA molecule.
[38] The method of any of embodiments 31-37, further comprising assembling a plurality of contigs of the first DNA molecule using the first read pair.
[39] The method of any of embodiments 31-38, wherein each of the first and second DNA segments is attached to at least one affinity label, and the linked DNA segment is captured using the affinity label.
[40] The method of embodiment 31, further comprising:
(a) providing a plurality of associated molecules to at least a second DNA molecule;
(b) crosslinking the associated molecules to the second DNA molecule, thereby forming a second complex in vitro;
(c) cleaving the second complex to generate a third DNA segment and a fourth DNA segment;
(d) linking the third DNA segment to the fourth DNA segment to form a second linked DNA segment; and
(e) sequencing the second linked DNA segment, thereby obtaining a second read pair.
[41] The method of embodiment 40, wherein less than 40% of the DNA segments from said DNA molecule are linked to DNA segments from any other DNA molecule.
[42] The method of embodiment 40, wherein less than 20% of the DNA segments from said DNA molecule are linked to DNA segments from any other DNA molecule.
[43] A method of generating a first read pair from a first DNA molecule containing a predetermined sequence, comprising:
(a) providing one or more DNA-binding molecules to the first DNA molecule, wherein the one or more DNA-binding molecules bind to the predetermined sequence;
(b) crosslinking the first DNA molecule in vitro, wherein the first DNA molecule contains a first DNA segment and a second DNA segment;
(c) linking the first DNA segment to the second DNA segment, thereby forming a first linked DNA segment; and
(d) sequencing the first linked DNA segment, thereby obtaining the first read pair;
wherein the probability that the predetermined sequence appears in the read pair is affected by the binding of the DNA-binding molecule to the predetermined sequence.
[44] The method of embodiment 43, wherein the DNA binding molecule is a nucleic acid capable of hybridizing to the predetermined sequence.
[45] The method of embodiment 44, wherein the nucleic acid is RNA.
[46] The method of embodiment 44, wherein the nucleic acid is DNA.
[47] The method of embodiment 43, wherein the DNA binding molecule is a small molecule.
[48] The method of embodiment 47, wherein the small molecule binds to the predetermined sequence with a binding affinity of less than 100 μM.
[49] The method of embodiment 47, wherein the small molecule binds to the predetermined sequence with a binding affinity of less than 1 μM.
[50] The method of any of embodiments 43-49, wherein the DNA-binding molecule is immobilized on a surface or a solid support.
[51] The method of embodiment 43, wherein the probability that the predetermined sequence will appear in the read pair is reduced.
[52] The method of embodiment 43, wherein the probability that the predetermined sequence will appear in the read pair is increased.
[53] An in vitro library comprising a plurality of read pairs, each comprising at least a first sequence element and a second sequence element, wherein the first and second sequence elements are derived from a single DNA molecule, and wherein at least 1% of the read pairs comprise first and second sequence elements separated by at least 50 kB on the single DNA molecule.
[54] The in vitro library according to embodiment 53, wherein at least 10% of the read pairs comprise first and second sequence elements separated by at least 50 kB on the single DNA molecule.
[55] The in vitro library according to embodiment 54, wherein at least 1% of the read pairs comprise first and second sequence elements separated by at least 100 kB on the single DNA molecule.
[56] The in vitro library according to any of embodiments 53-55, wherein less than 20% of the read pairs contain one or more predetermined sequences.
[57] The in vitro library according to embodiment 56, wherein less than 10% of the read pairs contain one or more predetermined sequences.
[58] The in vitro library according to embodiment 57, wherein less than 5% of the read pairs contain one or more predetermined sequences.
[59] The in vitro library according to any of embodiments 56-58, wherein the predetermined sequence is determined by one or more nucleic acids capable of hybridizing to the predetermined sequence, or by one or more small molecules.
[60] The in vitro library according to embodiment 59, wherein the one or more nucleic acids are RNA.
[61] The in vitro library according to embodiment 59, wherein the one or more nucleic acids are DNA.
[62] The in vitro library according to any of embodiments 59-61, wherein the one or more nucleic acids are immobilized on a surface or a solid support.
[63] The in vitro library according to embodiment 59, wherein the predetermined sequence is determined by one or more small molecules.
[64] The in vitro library according to embodiment 63, wherein the one or more small molecules bind to the predetermined sequence with a binding affinity of less than 100 μM.
[65] The in vitro library according to embodiment 63, wherein the one or more small molecules bind to the predetermined sequence with a binding affinity of less than 1 μM.
[66] A composition comprising a DNA fragment and a plurality of associated molecules, wherein the associated molecules are crosslinked to the DNA fragment in an in vitro complex, and the in vitro complex is immobilized on a solid support.
[67] A composition comprising a DNA fragment, a plurality of associated molecules, and a DNA-binding molecule, wherein the DNA-binding molecule is bound to a predetermined sequence of the DNA fragment and the associated molecules are crosslinked to the DNA fragment.
[68] The composition according to embodiment 67, wherein the DNA-binding molecule is a nucleic acid capable of hybridizing to the predetermined sequence.
[69] The composition according to embodiment 68, wherein the nucleic acid is RNA.
[70] The composition according to embodiment 68, wherein the nucleic acid is DNA.
[71] The composition according to any of embodiments 68-70, wherein the nucleic acid is immobilized on a surface or a solid support.
[72] The composition according to embodiment 67, wherein the DNA-binding molecule is a small molecule.
[73] The composition according to embodiment 72, wherein the small molecule binds to the predetermined sequence with a binding affinity of less than 100 μM.
[74] The composition according to embodiment 72, wherein the small molecule binds to the predetermined sequence with a binding affinity of less than 1 μM.
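Embodiments [15] to [29] and [53] to [55] define libraries by what fraction of read pairs span at least a given distance on one molecule (for example, at least 1% of read pairs spanning at least 50 kB). A minimal sketch of how that statistic could be computed, assuming each read pair has already been mapped and reduced to a hypothetical (position_a, position_b) tuple on the same molecule; the function names are illustrative, not part of the disclosure.

```python
from typing import Iterable, Tuple


def fraction_spanning(pairs: Iterable[Tuple[int, int]], min_span: int) -> float:
    """Fraction of read pairs whose two mapped positions lie at least min_span bases apart.

    Each pair is a hypothetical (position_a, position_b) tuple, both positions
    on the same DNA molecule (or reference chromosome).
    """
    spans = [abs(b - a) for a, b in pairs]
    if not spans:
        return 0.0  # an empty library has no qualifying pairs
    return sum(1 for s in spans if s >= min_span) / len(spans)


def meets_threshold(pairs: Iterable[Tuple[int, int]], min_span: int, min_fraction: float) -> bool:
    """True if at least min_fraction of the pairs span min_span bases or more."""
    return fraction_spanning(pairs, min_span) >= min_fraction


# Example: check "at least 1% of read pairs span at least 50 kB"
pairs = [(0, 120), (1_000, 61_000), (5_000, 5_400), (10_000, 160_000)]
print(fraction_spanning(pairs, 50_000))      # 0.5
print(meets_threshold(pairs, 50_000, 0.01))  # True
```

In practice the positions would come from aligning both reads of each pair; pairs whose reads land on different molecules or chromosomes would need to be filtered out before this calculation.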

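Embodiment [12] derives phasing from read pairs that link two heterozygous sites. The sketch below shows one simple way such linkage could be turned into phase assignments. It is an illustrative substitute, not the method of this disclosure. It assumes each informative read pair has been reduced to a hypothetical (site_i, allele_i, site_j, allele_j) tuple with alleles coded 0 (reference) and 1 (alternate). It tallies cis versus trans support per site pair, then joins sites from the strongest evidence down using a union-find structure that tracks relative haplotype parity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical observation: (site_i, allele_i, site_j, allele_j), alleles coded 0 = ref, 1 = alt.
Observation = Tuple[int, int, int, int]


def phase_sites(observations: List[Observation]) -> Dict[int, Tuple[int, int]]:
    """Assign each heterozygous site a (block_id, haplotype_bit).

    Sites sharing a block_id are phased relative to one another: equal bits mean
    their alternate alleles sit on the same haplotype (cis), unequal bits mean trans.
    """
    # Net support per site pair: +1 per read pair with matching alleles (cis), -1 otherwise (trans).
    votes: Dict[Tuple[int, int], int] = defaultdict(int)
    for si, ai, sj, aj in observations:
        if si != sj:
            votes[(min(si, sj), max(si, sj))] += 1 if ai == aj else -1

    parent: Dict[int, int] = {}
    parity: Dict[int, int] = {}  # haplotype bit of a site XOR that of its parent
    for i, j in votes:
        for s in (i, j):
            parent.setdefault(s, s)
            parity.setdefault(s, 0)

    def find(x: int) -> Tuple[int, int]:
        """Return (root, parity of x relative to root), compressing the path."""
        if parent[x] == x:
            return x, 0
        root, p = find(parent[x])
        parent[x] = root
        parity[x] ^= p
        return root, parity[x]

    # Join sites from the strongest evidence down; ties (net 0) carry no phase information.
    for (i, j), net in sorted(votes.items(), key=lambda kv: -abs(kv[1])):
        if net == 0:
            continue
        rel = 0 if net > 0 else 1  # desired bit_i XOR bit_j
        ri, pi = find(i)
        rj, pj = find(j)
        if ri != rj:  # already-connected sites were phased by stronger evidence
            parent[rj] = ri
            parity[rj] = pi ^ pj ^ rel

    return {s: find(s) for s in parent}


obs = [
    (100, 1, 200, 1), (100, 0, 200, 0),                      # cis support for 100-200
    (200, 1, 300, 0), (200, 0, 300, 1), (300, 0, 200, 1),   # trans support for 200-300
    (400, 1, 500, 1), (400, 1, 500, 0),                      # conflicting, net 0
]
print(phase_sites(obs))
# {100: (100, 0), 200: (100, 0), 300: (100, 1), 400: (400, 0), 500: (500, 0)}
```

Processing the strongest links first means a single erroneous read pair cannot override a well-supported phase relationship; sites 400 and 500 stay in separate blocks because their evidence cancels out.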
Claims

A method comprising:
contacting a sample with a fixative, wherein the sample comprises a nucleic acid molecule complexed to at least one nucleic acid binding protein;
cleaving the nucleic acid into multiple segments, including at least a first segment and a second segment;
connecting the first segment and the second segment at a junction;
obtaining at least a portion of the sequence on each side of the junction to generate a first read pair; and
mapping the first read pair to a set of contigs and determining a route through the set of contigs that represents their orientation and/or order with respect to the genome.

The method according to claim 1, comprising excluding the first read pair from contig assembly analysis because at least a portion of the sequence on each side of the junction maps to a common contig, and using a second read pair to determine the route through the set of contigs representing orientation and/or order with respect to the genome.

The method of claim 1, comprising contacting the sample with an antibody prior to contacting the sample with a fixative.

The method of claim 1, comprising contacting the sample with a fixative followed by contacting the sample with an antibody.

The method of claim 1, wherein cleaving the nucleic acid into at least a first segment and a second segment comprises cleaving the nucleic acid with at least one restriction enzyme.

The method of claim 1, wherein cleaving the nucleic acid into at least a first segment and a second segment comprises shearing the nucleic acid.

The method of claim 1, wherein cleaving the nucleic acid into at least a first segment and a second segment comprises sonicating the nucleic acid.

The method of claim 1, wherein contacting the sample with a fixative comprises irradiating with ultraviolet light.

The method of claim 1, wherein contacting the sample with a fixative comprises contacting the sample with a chemical fixative.

The method of claim 9, wherein the chemical fixative comprises formaldehyde.

The method of claim 9, wherein the chemical fixative comprises psoralen.

The method of claim 1, wherein the at least one nucleic acid binding protein comprises a natural chromatin component.

The method of claim 1, wherein the at least one nucleic acid binding protein comprises an externally sourced histone.

The method of claim 1, wherein connecting the first segment and the second segment comprises filling in sticky ends with nucleotides, at least some of which are biotin-labeled, and ligating the resulting blunt ends.

The method of claim 1, wherein the route through the set of contigs representing orientation and/or order with respect to the genome is determined such that each contig is visited exactly once.

The method of claim 1, wherein determining the route through the set of contigs representing orientation and/or order with respect to the genome comprises down-weighting contigs that represent promiscuous regions of the genome.

The method of claim 1, wherein the set of contigs is generated by a shotgun sequencing method.

The method of claim 1, wherein determining the route through the set of contigs representing orientation and/or order with respect to the genome comprises haplotype phasing of the set of contigs.

The method of claim 18, wherein haplotype phasing of the set of contigs comprises identifying one or more heterozygous sites in a plurality of read-pair sequences, and phasing data for allelic variants are determined by identifying read pairs containing pairs of heterozygous sites.

The method of claim 1, comprising excluding from analysis read-pair sequences that map to a common contig, thereby using read pairs that map to different contigs to determine the route through the set of contigs representing orientation and/or order with respect to the genome.
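The claims describe using read pairs that map to different contigs to find a route that visits each contig once, with same-contig pairs excluded and promiscuous contigs down-weighted. A minimal greedy sketch of that idea follows, under stated assumptions. Each read pair is given as a hypothetical (contig_a, contig_b) tuple. Link counts are normalized by the geometric mean of each contig's total links, as a stand-in for down-weighting promiscuous contigs. Contigs are then chained strongest link first, keeping every contig at two neighbors or fewer and forbidding cycles. The sketch determines order only, not orientation, and both the normalization and the greedy chaining are substitutes chosen for illustration rather than the claimed procedure.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def order_contigs(read_pairs: List[Tuple[str, str]]) -> List[List[str]]:
    """Chain contigs into ordered paths from read pairs linking different contigs.

    Each read pair is a hypothetical (contig_a, contig_b) tuple giving the contigs
    its two reads map to. Orientation is not inferred.
    """
    # Count links between distinct contigs; same-contig pairs are excluded.
    links: Counter = Counter()
    for a, b in read_pairs:
        if a != b:
            links[tuple(sorted((a, b)))] += 1

    # Total links per contig, used to down-weight contigs that link to everything.
    totals: Dict[str, int] = defaultdict(int)
    for (a, b), n in links.items():
        totals[a] += n
        totals[b] += n
    scores = {pair: n / math.sqrt(totals[pair[0]] * totals[pair[1]]) for pair, n in links.items()}

    # Union-find over contigs, used to reject links that would close a cycle.
    root = {c: c for c in totals}

    def find(x: str) -> str:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    # Greedy chaining: strongest score first, each contig joined to at most two others.
    degree: Dict[str, int] = defaultdict(int)
    neighbors: Dict[str, List[str]] = defaultdict(list)
    for (a, b), _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if degree[a] >= 2 or degree[b] >= 2 or find(a) == find(b):
            continue
        root[find(a)] = find(b)
        degree[a] += 1
        degree[b] += 1
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Walk each path from an end (a contig with fewer than two neighbors), visiting each contig once.
    paths: List[List[str]] = []
    seen = set()
    for start in sorted(totals):
        if start in seen or degree[start] == 2:
            continue
        path, prev, cur = [], None, start
        while cur is not None:
            path.append(cur)
            seen.add(cur)
            nxt = [n for n in neighbors[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        paths.append(path)
    return paths


pairs = [("A", "B")] * 5 + [("B", "C")] * 4 + [("C", "D")] * 5 + [("A", "C"), ("B", "D")] + [("A", "A")] * 10
print(order_contigs(pairs))  # [['A', 'B', 'C', 'D']]
```

The weak A-C and B-D links are rejected because A-B-C-D has already used up the available neighbor slots, and the ten A-A pairs contribute nothing because they map to a single contig.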