JP2013165661A - Method for determining base sequence of two or more specimens at a time by corresponding each specimen to sequence - Google Patents

Method for determining base sequence of two or more specimens at a time by corresponding each specimen to sequence

Info

Publication number
JP2013165661A
Authority
JP
Japan
Prior art keywords
sequence
base sequence
determining
index
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2012030173A
Other languages
Japanese (ja)
Inventor
Junya Yamagishi
山岸潤也
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Obihiro University of Agriculture and Veterinary Medicine NUC
Original Assignee
Obihiro University of Agriculture and Veterinary Medicine NUC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Obihiro University of Agriculture and Veterinary Medicine NUC
Priority to JP2012030173A
Publication of JP2013165661A
Legal status: Pending

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

PROBLEM TO BE SOLVED: To provide a method for determining the base sequences of two or more specimens in a single run while keeping each sequence matched to its specimen.

SOLUTION: Two or more nucleic acid specimens are mixed in specific combinations to prepare a plurality of groups, and an index sequence specific to each group is added. The groups are then sequenced together on a next-generation sequencer (massively parallel sequencer, NGS). Identical reads are matched to a specimen from the combination of index sequences attached to them, and the reads matched to the same specimen are assembled to determine that specimen's sequence.

Description

The present invention relates to a method for determining sequences in association with each specimen. Subgroups are prepared by mixing a plurality of nucleic acid specimens in specific combinations, an Index sequence unique to each subgroup is attached, the subgroups are sequenced on a massively parallel sequencer, and the resulting reads are classified by Index sequence and assembled.

A next-generation sequencer can output a very large number of sequences at once through parallel processing, but because the specimens are mixed before the reaction, the output sequences cannot be traced back to individual specimens. It is therefore used for analyses of DNA or RNA derived from a single specimen, such as genome and transcriptome analysis, and for analyses that aim to characterize a population, such as metagenome analysis (FIG. 1).

To make effective use of this capacity, the multiplex method is known: an artificial base sequence (Index) of several to a dozen or so bases is attached to each specimen during processing, so that each output sequence can be matched to its specimen and many specimens can be processed together. However, the number of specimens that can be run in parallel is limited to a few dozen.

Non-Patent Document 1: Cell Engineering (細胞工学), August 2011 issue, "Mastering the Next-Generation Sequencer": The basics of the basics.
Non-Patent Document 2: Cell Engineering (細胞工学), August 2011 issue, "Mastering the Next-Generation Sequencer": De novo DNA sequencing of bacterial genomes by the multiplex method.
Non-Patent Document 3: Cell Engineering (細胞工学), August 2011 issue, "Mastering the Next-Generation Sequencer": De novo DNA sequencing of bacterial genomes by the multiplex method.
Non-Patent Document 4: Parasitol Int. 2011 Jun;60(2):199-202. Epub 2011 Mar 21. Construction and analysis of full-length cDNA library of Cryptosporidium parvum.
Non-Patent Document 5: Genome Biol. 2009;10(3):R25. Epub 2009 Mar 4. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

An object of the present invention is to provide a method that handles more specimens than the multiplex method by improving how Indexes are assigned, and thereby makes effective use of the analytical capacity of next-generation sequencers.

As a result of intensive study of Index assignment and of the analysis of sequences output by next-generation sequencers, the present inventor found the following method and completed the present invention (FIG. 2). The samples are mixed in specific combinations, which compresses them into a number of mixed samples proportional to a root of the original sample count. An Index is attached to each mixed sample, base sequences are obtained with a next-generation sequencer, and the pre-mixing sequences are restored from the output in association with the specimen IDs.

That is, the present invention comprises the following aspects.

(1) A method for determining the base sequence of each specimen, comprising: assigning to each of a plurality of nucleic acid specimens (plasmids A, B, C, D) a plurality of IDs (1, 2, 3, 4) such that no two specimens receive the same combination (A = 1 & 2, B = 1 & 3, C = 2 & 4, D = 3 & 4); forming subgroups by combining the specimens that share an ID (split and mix: 1 = A + B, 2 = A + C, 3 = B + D, 4 = C + D); fragmenting each subgroup; attaching a subgroup-specific Index sequence to the nucleic acids of each subgroup (Index ligation: 1 = AA, 2 = CT, 3 = CG, 4 = AT); mixing everything and sequencing it (mixed sequencing); matching the resulting sequences to specimens by classifying them according to their Index sequences (grouping by Index: A = AA + CT, B = AA + CG, C = AT + CT, D = AT + CG); and assembling the matched sequences. (The parenthesized examples refer to FIG. 2, which illustrates simultaneous analysis of four samples.)
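The four-sample case in aspect (1) can be traced in a few lines of Python. This is a minimal sketch, not the inventor's program: the ID combinations and Index sequences are copied from the parenthesized example, while the function names `build_subgroups` and `specimen_for_indexes` are hypothetical.

```python
from collections import defaultdict

# ID combinations and Index sequences taken from the four-sample example (FIG. 2).
specimen_ids = {"A": {1, 2}, "B": {1, 3}, "C": {2, 4}, "D": {3, 4}}
index_of_id = {1: "AA", 2: "CT", 3: "CG", 4: "AT"}


def build_subgroups(specimen_ids):
    """Pool the specimens that share an ID into one subgroup per ID."""
    subgroups = defaultdict(set)
    for specimen, ids in specimen_ids.items():
        for group_id in ids:
            subgroups[group_id].add(specimen)
    return dict(subgroups)


def specimen_for_indexes(observed_indexes, specimen_ids, index_of_id):
    """Return the specimen whose ID combination matches the observed Index set, or None."""
    observed = set(observed_indexes)
    for specimen, ids in specimen_ids.items():
        if {index_of_id[i] for i in ids} == observed:
            return specimen
    return None


subgroups = build_subgroups(specimen_ids)
print(subgroups)
print(specimen_for_indexes({"AA", "CG"}, specimen_ids, index_of_id))
```

A read seen with both AA and CG can only have come from specimen B, because B is the only specimen pooled into subgroups 1 and 3.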

(2) The method for determining a base sequence according to (1), wherein the specimens are a^n specimens consisting of n0 specimens and (a^n - n0) mock specimens.

(3) The method for determining a base sequence according to (2), wherein the subgroups are formed by an operation equivalent to the following: for the X-th specimen, write (X - 1) in base a; regard each combination of digit position and digit value as an ID; pool the specimens that have the same value at a given digit position into one subgroup; and repeat this for every digit position.

(4) The method for determining a base sequence according to (3), wherein a is 2 or a multiple of 2.

(5) The method for determining a base sequence according to (4), wherein the matching of the obtained sequences to specimens by classification according to Index sequence is performed by comparing the combination of Index sequences attached to identical sequences with the combination of IDs assigned to each specimen.

(6) The method for determining a base sequence according to (1), wherein the nucleic acid specimen is DNA or RNA.

(7) The method for determining a base sequence according to (1), wherein the sequence analysis uses a next-generation sequencer.

(8) The method for determining a base sequence according to (1), wherein the Index sequence is DNA, RNA, or a mixture thereof.

(9) The method for determining a base sequence according to (8), wherein the Index sequence includes a parity element.

(10) A method for determining a base sequence in which the methods of (6) to (9) are combined as desired with any of (1) to (5).

(11) A computer program for executing the method according to (10) on a computer.

According to the present invention, a single next-generation sequencer run can determine the sequences of a number of specimens that grows exponentially with the number of Indexes used (FIGS. 1 and 2).

For example, 20 Index sequences suffice for 2^10 = 1,024 specimens, and 40 Index sequences for 2^20 = 1,048,576 specimens, all determined in a single next-generation sequencer run while the correspondence between specimens and output sequences is retained.
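The arithmetic behind these figures is short enough to check directly. The sketch below assumes the digit-based scheme described in this document, in which a * n Indexes encode n base-a digits; the function name `max_specimens` is hypothetical.

```python
def max_specimens(num_indexes, a=2):
    """Specimens addressable when num_indexes = a * n Indexes encode n base-a digits."""
    if num_indexes <= 0 or num_indexes % a != 0:
        raise ValueError("num_indexes must be a positive multiple of a")
    n = num_indexes // a  # number of digit positions
    return a ** n


print(max_specimens(20))  # 20 Indexes, a = 2
print(max_specimens(40))  # 40 Indexes, a = 2
```

With a = 2 this reproduces the 1,024 and 1,048,576 specimen counts quoted above.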

Specifically, base sequence analysis of plasmid DNA, which has so far been done by the capillary method, can be replaced by the combination of the present invention and a next-generation sequencer, and a substantial improvement in cost-effectiveness is expected.

FIG. 1: Effect of the invention
FIG. 2: Summary of the invention
FIG. 3: Overview of the specimen mixing method
FIG. 4: What parity is
FIG. 5: Summary of the invention (2)
FIG. 6: Accession numbers of the base sequences used in Example 1
FIG. 7: Composition of the mixed groups used in Example 1
FIG. 8: Example 1, restoration rate
FIG. 9: Example 1, restoration rate

Embodiments of the present invention are described in detail below.

In the present invention, a next-generation sequencer means any of a group of recently developed base sequence analyzers whose throughput has been dramatically increased by parallel processing. Specific examples include the Illumina HiSeq2000, the Roche Genome Sequencer, the Life Technologies Ion PGM, and Pacific Biosciences instruments (Non-Patent Document 1). The term also covers instruments developed in the future and is not limited to those listed here.

In the present invention, a nucleic acid specimen is a collection of DNA or RNA molecules having the same sequence; a plasmid amplified and cloned in an E. coli system is one example. The nucleic acid specimen may also be a PCR product. "The same sequence" here covers not only perfectly identical sequences but also collections of molecules that contain partially identical sequences. A certain amount of contamination is also acceptable.

A method for mixing nucleic acid specimens is illustrated in FIG. 3 and below, but the method is not limited to what is shown here.

Let n0 be the number of specimens and let a be an integer constant. Find the smallest n such that a^n > n0, and add (a^n - n0) mock specimens to prepare a^n specimens numbered 0 to (a^n - 1). For the X-th specimen, write (X - 1) in base a, regard each combination of digit position and digit value as an ID, and mix the specimens that have the same value at a given digit position into one subgroup.
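The base-a procedure above can be sketched as follows. All function names are hypothetical, and IDs are represented as (digit position, digit value) pairs with position 0 as the least significant digit, which is a representation chosen for this sketch. One deliberate assumption: the text asks for a^n > n0, but the sketch accepts a^n >= n0, because Example 1 uses 256 sequences with a = 2 and n = 8.

```python
def smallest_exponent(n0, a):
    """Smallest n with a**n >= n0 (assumption: '>=' rather than the '>' in the text)."""
    n = 1
    while a ** n < n0:
        n += 1
    return n


def digits_base_a(value, a, n):
    """The n base-a digits of value, least significant digit first."""
    digits = []
    for _ in range(n):
        digits.append(value % a)
        value //= a
    return digits


def assign_ids(n0, a):
    """Return n, the ID list of every specimen number, and the mock specimen numbers."""
    n = smallest_exponent(n0, a)
    total = a ** n
    ids = {
        number: list(enumerate(digits_base_a(number, a, n)))
        for number in range(total)
    }
    mocks = list(range(n0, total))  # numbers n0 .. a**n - 1 are mock specimens
    return n, ids, mocks


def build_subgroups(ids):
    """Pool specimen numbers that share a (digit position, digit value) ID."""
    groups = {}
    for number, pairs in ids.items():
        for pair in pairs:
            groups.setdefault(pair, []).append(number)
    return groups


n, ids, mocks = assign_ids(7, 3)
groups = build_subgroups(ids)
print(n, mocks, len(groups))
```

For 7 specimens with a = 3, two digit positions are enough, two mock specimens are added, and the 9 specimens are pooled into 3 * 2 = 6 subgroups of 3 specimens each.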

A specific example for a = 2 is given below together with Table 1, but the mixing method is not limited to this example.

When 2^n nucleic acid samples, numbered #0 to #(2^n - 1), are analyzed in parallel, the specimens are mixed according to the following rule.

Write the specimen number in binary, counting digits from the least significant. If the k-th digit is 0, assign mixed group ID (2 × (n - k) + 1); if it is 1, assign mixed group ID (2 × (n - k) + 2). Repeat this for every k from 1 to n.

For example, when n = 3 the total number of specimens is 8. Specimen #6 is written 110 in binary. Its 3rd digit is 1, giving (2 × (3 - 3) + 2) = 2; its 2nd digit is 1, giving (2 × (3 - 2) + 2) = 4; its 1st digit is 0, giving (2 × (3 - 1) + 1) = 5. Specimen #6 therefore receives mixed group IDs 2, 4, and 5. Conversely, the specimens that receive ID 2 are #4 (100), #5 (101), #6 (110), and #7 (111), so a mixed specimen combining these four is prepared.
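The binary rule and the n = 3 example can be reproduced with a short sketch. The function names are hypothetical; the formula is the one stated in the rule, with the k-th digit counted from the least significant end as in the worked example.

```python
def mixed_group_ids(specimen_number, n):
    """Mixed group IDs of a specimen: digit k gives 2*(n-k)+1 if 0, 2*(n-k)+2 if 1."""
    ids = []
    for k in range(1, n + 1):
        bit = (specimen_number >> (k - 1)) & 1  # k-th binary digit, k = 1 is the lowest
        ids.append(2 * (n - k) + 1 + bit)
    return sorted(ids)


def members_of_group(group_id, n):
    """All specimen numbers that are pooled into the given mixed group."""
    return [s for s in range(2 ** n) if group_id in mixed_group_ids(s, n)]


print(mixed_group_ids(6, 3))
print(members_of_group(2, 3))
```

Each of the 2n mixed groups ends up containing exactly half of the 2^n specimens.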

The 2^n nucleic acid specimens, organized into 2n mixed specimens by the above rule, are subjected to the known multiplex method on a next-generation sequencer (Non-Patent Document 2) to obtain base sequences.

A parity base can be added to the Index used here (FIG. 4).
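FIG. 4 is not reproduced in this text, so the parity scheme actually intended is not known here. Purely as an illustration of the idea, the sketch below assumes a hypothetical checksum in which the bases A, C, G, T are valued 0 to 3 and one extra base makes the sum a multiple of 4; both the scheme and the function names are assumptions.

```python
BASES = "ACGT"  # hypothetical numeric values: A=0, C=1, G=2, T=3


def add_parity(index_seq):
    """Append one base so that the base values sum to a multiple of 4."""
    total = sum(BASES.index(base) for base in index_seq)
    return index_seq + BASES[(-total) % 4]


def parity_ok(index_with_parity):
    """True if the Index plus parity base passes the checksum."""
    return sum(BASES.index(base) for base in index_with_parity) % 4 == 0


tagged = add_parity("CT")
print(tagged, parity_ok(tagged))
print(parity_ok("CGA"))
```

Under this assumed scheme any single-base substitution in the Index changes the sum modulo 4, so a read whose Index was misread at one position can be discarded instead of being assigned to the wrong mixed group.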

The base sequences are reconstructed as follows (FIGS. 2 and 5).

Every sequence output by the next-generation sequencer carries an Index, and the Index makes it possible to match the sequence to a mixed ID.

When 2^n nucleic acid specimens are organized into 2n mixed specimens and analyzed in parallel under the rule above, specimen #0 is contained in mixed IDs 1, 3, 5, ..., 2n - 1. Similarly, specimen #1 is contained in mixed IDs 1, 3, 5, ..., 2n - 3 and 2n.

That is, sequences derived from specimen #0 carry the Indexes corresponding to mixed IDs 1, 3, 5, ..., 2n - 1. Similarly, sequences derived from specimen #1 carry the Indexes corresponding to mixed IDs 1, 3, 5, ..., 2n - 3 and 2n.

Conversely, the original specimen number can be identified by collecting the Indexes attached to identical sequences and analyzing their distribution.

For example, if a base sequence output by the next-generation sequencer carries the Indexes corresponding to mixed IDs 1, 3, 5, ..., 2n - 1, it is a sequence belonging to specimen #0. Likewise, if a sequence carries the Indexes corresponding to mixed IDs 1, 3, 5, ..., 2n - 3 and 2n, it is a sequence belonging to specimen #1.
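The reverse lookup from a set of mixed IDs to a specimen number can be sketched as below. It assumes the binary rule given earlier (digit k maps to ID 2 × (n - k) + 1 for 0 and 2 × (n - k) + 2 for 1); the function name and the choice to return None for incomplete or conflicting patterns are assumptions of this sketch.

```python
def specimen_from_ids(observed_ids, n):
    """Specimen number for a complete set of n mixed group IDs, or None if unresolved."""
    observed = set(observed_ids)
    number = 0
    for k in range(1, n + 1):
        id_zero = 2 * (n - k) + 1  # group holding specimens whose k-th digit is 0
        id_one = id_zero + 1       # group holding specimens whose k-th digit is 1
        has_zero = id_zero in observed
        has_one = id_one in observed
        if has_zero == has_one:
            return None  # digit k is either missing or seen with both values
        if has_one:
            number |= 1 << (k - 1)
    return number


print(specimen_from_ids({2, 4, 5}, 3))
print(specimen_from_ids({1, 3, 6}, 3))
print(specimen_from_ids({1, 2, 3, 5}, 3))
```

A sequence shared by two specimens shows both values at some digit position, which is why such a pattern is reported as unresolved rather than forced onto one specimen.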

Contigs are then built with an assembler from the output reads matched to each original specimen number, which gives the base sequence of each sample. ABySS (Non-Patent Document 3) or Velvet, among others, may be used as the assembler, but the choice is not limited to these.

The output reads from the next-generation sequencer may be shortened as appropriate before the above analysis.

Embodiments of the present invention are described below with reference to an example, which is merely illustrative and does not limit the present invention in any way.

(Example 1) Simulated analysis by computer
The target of the simulated analysis was the set of 1,066 known full-length cDNA sequences of Cryptosporidium parvum. These sequences had been determined by comparing sequence fragments obtained by the Sanger method, together with 18,308,250 sequence fragments obtained by shotgun analysis on a next-generation sequencer (accession numbers SRX004536 to SRX004538), with the known genome sequence of Cryptosporidium parvum (Non-Patent Document 4).

In this example, a of claim 2 was set to 2 and n to 8, and 256 sequences were selected from the 1,066 sequences (FIG. 6). From the 18,308,250 fragment sequences, the 5,474,164 fragments corresponding to these 256 sequences were selected and used in what follows.

The example attempted, by simulated analysis on a computer, to reconstruct the 256 sequences from the 5,474,164 fragment sequences using the processing of the present invention.

Sixteen groups were formed by mixing the 256 target sequences in the combinations shown in FIG. 7. Each sequence therefore belongs to 8 of the groups.

Using the mapping tool Bowtie (Non-Patent Document 5), the 5,474,164 fragment sequences were mapped to the 256 target sequences, and each fragment was then randomly assigned to one of the 8 groups to which its sequence belongs. This step is specific to the simulation; in a real analysis it is replaced by processing based on the Index sequences. In addition, the 4 bases at the 5' end were removed as the Index-equivalent sequence, and the remaining 32 bases were used in the subsequent analysis.
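The simulation-only step can be sketched as follows. FIG. 7 is not reproduced in this text, so the sketch assumes the 16 groups follow the binary rule given earlier with n = 8; the function names, the 36-base placeholder read, and the fixed random seed are all hypothetical.

```python
import random


def groups_of(seq_number, n=8):
    """The n mixed groups a target sequence belongs to (binary rule, assumed here)."""
    return [2 * (n - k) + 1 + ((seq_number >> (k - 1)) & 1) for k in range(1, n + 1)]


def simulate_read(read, seq_number, rng, index_len=4, n=8):
    """Pick one of the sequence's groups at random and trim the Index-equivalent bases."""
    group = rng.choice(groups_of(seq_number, n))
    return group, read[index_len:]


rng = random.Random(0)              # fixed seed so the sketch is repeatable
read = "ACGT" + "GATTACAG" * 4      # placeholder 36-base read, not real data
group, insert = simulate_read(read, 6, rng)
print(group, len(insert))
```

In a real run the group would come from the Index read off the fragment, and no mapping to known target sequences would be needed.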

From the 32 bases under analysis, every run of 22 consecutive bases was extracted, giving a subset of 11 sequences from each read.
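The count of 11 follows from sliding a 22-base window over 32 bases (32 - 22 + 1 = 11). A minimal sketch, with a hypothetical function name and a placeholder read:

```python
def kmers(seq, k=22):
    """All runs of k consecutive bases in seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


read = "GATTACAG" * 4  # placeholder 32-base read, not real data
windows = kmers(read)
print(len(windows))
```

Each window is 22 bases long, and a read shorter than 22 bases yields no windows at all.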

Identical 22-base sequences were tallied, and the pattern of Index sequences attached to them was analyzed to infer which of the 256 sequences each 22-base sequence came from.
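The tallying step can be sketched as a dictionary from each short sequence to the set of mixed groups it was seen in, followed by the reverse lookup under the binary rule. This is an illustrative sketch: the function names are hypothetical, the toy data use n = 3 and 4-base strings instead of n = 8 and 22 bases, and treating incomplete or conflicting patterns as unassigned is an assumption.

```python
from collections import defaultdict


def tally_groups(tagged_kmers):
    """Map each k-mer to the set of mixed group IDs it was observed with."""
    seen = defaultdict(set)
    for group_id, kmer in tagged_kmers:
        seen[kmer].add(group_id)
    return dict(seen)


def infer_source(group_ids, n):
    """Source sequence number for a complete group pattern, else None."""
    number = 0
    for k in range(1, n + 1):
        id_zero = 2 * (n - k) + 1
        has_zero = id_zero in group_ids
        has_one = (id_zero + 1) in group_ids
        if has_zero == has_one:
            return None  # digit k missing or contradictory
        if has_one:
            number |= 1 << (k - 1)
    return number


# Toy observations: (mixed group ID, k-mer).
observations = [(2, "AAAC"), (4, "AAAC"), (5, "AAAC"), (4, "AAAC"), (1, "GGGT"), (3, "GGGT")]
patterns = tally_groups(observations)
assignments = {kmer: infer_source(ids, 3) for kmer, ids in patterns.items()}
print(assignments)
```

Here "AAAC" is assigned to sequence #6, while "GGGT" stays unassigned because no observation fixes its first digit.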

For each of the 256 sequences, the 22-base sequences matched to it were collected, and contigs were built with the assembler ABySS (Non-Patent Document 3).

The degree of restoration was estimated by comparing the resulting contigs with the original sequences.

Here the degree of restoration is defined as (length of the longest contig / length of the original sequence).
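The definition translates directly into code. In this minimal sketch the function name is hypothetical, and returning 0.0 when no contig was produced is an assumption, since the text does not cover that case.

```python
def restoration_degree(contigs, original):
    """Length of the longest contig divided by the length of the original sequence."""
    if not original:
        raise ValueError("original sequence must not be empty")
    longest = max((len(contig) for contig in contigs), default=0)
    return longest / len(original)


print(restoration_degree(["ACGT", "ACGTAC"], "ACGTACGTAC"))
```

Because only lengths are compared, the measure says how much of the sequence was recovered in one piece, not whether the contig is free of base errors.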

The results showed that the method can restore most of the sequences. Of the 256 sequences, 104 (40.6%) were restored over 99% or more of their length, 160 (62.5%) over 95% or more, and 199 (77.7%) over 90% or more. Details are shown in FIGS. 8 and 9.

Base sequence analysis of plasmid DNA, which has so far been done by the capillary method, can be replaced by the combination of the present invention and a next-generation sequencer, and a substantial improvement in cost-effectiveness is expected.

Claims (11)

1. A method for determining the base sequence of each specimen, comprising: assigning to each of a plurality of nucleic acid specimens (plasmids A, B, C, D) a plurality of IDs (1, 2, 3, 4) such that no two specimens receive the same combination (A = 1 & 2, B = 1 & 3, C = 2 & 4, D = 3 & 4); forming subgroups by combining the specimens that share an ID (split and mix: 1 = A + B, 2 = A + C, 3 = B + D, 4 = C + D); fragmenting each subgroup; attaching a subgroup-specific Index sequence to the nucleic acids of each subgroup (Index ligation: 1 = AA, 2 = CT, 3 = CG, 4 = AT); mixing everything and sequencing it (mixed sequencing); matching the resulting sequences to specimens by classifying them according to their Index sequences (grouping by Index: A = AA + CT, B = AA + CG, C = AT + CT, D = AT + CG); and assembling the matched sequences. (The parenthesized examples refer to FIG. 2, which illustrates simultaneous analysis of four samples.)
2. The method for determining a base sequence according to claim 1, wherein the specimens are a^n specimens consisting of n0 specimens and (a^n - n0) mock specimens.
3. The method for determining a base sequence according to claim 2, wherein the subgroups are formed by an operation equivalent to the following: for the X-th specimen, write (X - 1) in base a; regard each combination of digit position and digit value as an ID; pool the specimens that have the same value at a given digit position into one subgroup; and repeat this for every digit position.
4. The method for determining a base sequence according to claim 3, wherein a is 2 or a multiple of 2.
5. The method for determining a base sequence according to claim 1, wherein the matching of the obtained sequences to specimens by classification according to Index sequence is performed by comparing the combination of Index sequences attached to identical sequences with the combination of IDs assigned to each specimen.
6. The method for determining a base sequence according to claim 1, wherein the nucleic acid specimen is DNA or RNA.
7. The method for determining a base sequence according to claim 1, wherein the sequence analysis uses a next-generation sequencer.
8. The method for determining a base sequence according to claim 1, wherein the Index sequence is DNA, RNA, or a mixture thereof.
9. The method for determining a base sequence according to claim 8, wherein the Index sequence includes a parity element.
10. A method for determining a base sequence in which the methods of claims 6 to 9 are combined as desired with any of claims 1 to 5.
11. A computer program for executing the method according to claim 10 on a computer.
JP2012030173A 2012-02-15 2012-02-15 Method for determining base sequence of two or more specimens at a time by corresponding each specimen to sequence Pending JP2013165661A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2012030173A JP2013165661A (en) 2012-02-15 2012-02-15 Method for determining base sequence of two or more specimens at a time by corresponding each specimen to sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2012030173A JP2013165661A (en) 2012-02-15 2012-02-15 Method for determining base sequence of two or more specimens at a time by corresponding each specimen to sequence

Publications (1)

Publication Number Publication Date
JP2013165661A true JP2013165661A (en) 2013-08-29

Family

ID=49176675

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2012030173A Pending JP2013165661A (en) 2012-02-15 2012-02-15 Method for determining base sequence of two or more specimens at a time by corresponding each specimen to sequence

Country Status (1)

Country Link
JP (1) JP2013165661A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160082715A (en) * 2014-12-26 2016-07-11 연세대학교 산학협력단 Detection method of gene deletion based on next-generation sequencing
KR101638473B1 (en) 2014-12-26 2016-07-12 연세대학교 산학협력단 Detection method of gene deletion based on next-generation sequencing
CN110872616A (en) * 2018-08-31 2020-03-10 希森美康株式会社 Analysis method, information processing device, gene analysis system, program, and recording medium

Similar Documents

Publication Publication Date Title
Zielezinski et al. Benchmarking of alignment-free sequence comparison methods
Kozich et al. Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform
Finet et al. Multigene phylogeny of the green lineage reveals the origin and diversification of land plants
Wu et al. A simple, fast, and accurate method of phylogenomic inference
Giongo et al. PANGEA: pipeline for analysis of next generation amplicons
Lemmon et al. Anchored hybrid enrichment for massively high-throughput phylogenomics
Chen et al. Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species
Nagy et al. Re-mind the gap! Insertion–deletion data reveal neglected phylogenetic potential of the nuclear ribosomal internal transcribed spacer (ITS) of fungi
Song et al. Development of chloroplast genomic resources for Oryza species discrimination
Chagnon et al. Trait‐based partner selection drives mycorrhizal network assembly
Xu et al. Genome sequence of Malania oleifera, a tree with great value for nervonic acid production
Parks et al. Signal, uncertainty, and conflict in phylogenomic data for a diverse lineage of microbial eukaryotes (Diatoms, Bacillariophyta)
Qu et al. Insights into the existence of isomeric plastomes in Cupressoideae (Cupressaceae)
Zhang et al. Plastome phylogenomics of Saussurea (Asteraceae: cardueae)
KR101798229B1 (en) ribosomal RNA sequence extraction method and microorganism identification method using extracted ribosomal RNA sequence
KR101447593B1 (en) Method for determining whole genome sequence of chloroplast, mitochondria or nuclear ribosomal DNA of organism using next generation sequencing
KR20170012390A (en) Sequencing process
JP2017517282A5 (en)
Pang et al. Assessing the potential of candidate DNA barcodes for identifying non‐flowering seed plants
Reddy et al. MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets
Choi et al. Identifying genetic markers for a range of phylogenetic utility–From species to family level
Shevtsov et al. Genetic diversity of Francisella tularensis subsp. holarctica in Kazakhstan
JP2013165661A (en) Method for determining base sequence of two or more specimens at a time by corresponding each specimen to sequence
Zhang et al. Subgenome-aware analyses suggest a reticulate allopolyploidization origin in three Papaver genomes
Keim et al. Microbial forensics: DNA fingerprinting of Bacillus anthracis (anthrax)