WO2018235938A1 - Method for sequencing and method for analyzing nucleic acids - Google Patents

Method for sequencing and method for analyzing nucleic acids

Publication number
WO2018235938A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
nucleic acid
barcode
index
molecular barcode
Application number
PCT/JP2018/023778
Other languages
English (en)
Japanese (ja)
Inventor
克之 城口
Original Assignee
国立研究開発法人理化学研究所
Application filed by 国立研究開発法人理化学研究所
Priority to JP2019525702A (patent JP7160349B2)
Publication of WO2018235938A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6869: Methods for sequencing

Definitions

  • The present invention relates to methods of sequencing and analyzing nucleic acids.
  • Next-generation sequencer platforms have made it possible to simultaneously analyze the sequences of a large number of nucleic acids in one run.
  • By adding a unique molecular barcode to each nucleic acid molecule present in a sample, the number of unique molecular barcode types can be made to correspond to the number of nucleic acid molecules; together with next-generation sequencer platforms, this opened the way to digital quantification of nucleic acid molecules (Patent Document 1 and Non-Patent Document 1).
  • By using random bases as the molecular barcode and lengthening the base sequence, great diversity can easily be added to the barcode sequence, and the dynamic range over which nucleic acid molecules can be quantified digitally is expanded (Patent Document 1 and Non-Patent Document 1).
  • The sequence of a molecular barcode may change during analysis, and the newly generated molecular barcodes may affect the accuracy of quantification of nucleic acid molecules.
  • Because the sequence of the molecular barcode is randomly designed, it is difficult to detect that the sequence has changed.
  • Owing to the randomness of the molecular barcode sequence, it is difficult to analyze what kinds of error may occur in digital quantification, and it is also difficult to propose a solution.
  • The present invention provides methods of sequencing and analyzing nucleic acids.
  • The present inventors have clarified that, when an index and a barcode are used in the digital quantification of a target nucleic acid molecule and a plurality of samples are mixed for quantification, a mis-index can occur in which an unexpected index, different from that of the sample of origin, is added to a nucleic acid. We also revealed that the accuracy of the digital quantification method can be improved by regarding the most frequent pair as the correct pair when two different indexes are added to the same barcode, and excluding any or all of the others as mis-indexes. We have found that when counting the number of types of barcode sequences, mutations (e.g.
  • The present inventors have clarified that the accuracy of the digital quantification method can be improved by clustering sequences having a certain sequence similarity into a group and quantifying a target nucleic acid molecule based on the number of clusters. We have found that, when digitally counting nucleic acids, the problem of misidentifying the template can occur. We also revealed that the accuracy of the digital quantification method can be improved by identifying the most frequent pair as the correct pair, and excluding any or all of the others as false identifications, when two different target nucleic acid sequences are added to the same barcode.
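
The "most frequent pair is the correct pair" rule described above can be sketched as follows. This is a minimal illustration only, not the applicant's implementation: the function name `resolve_index_pairs`, the read representation as `(index, barcode)` tuples, and the arbitrary handling of ties are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def resolve_index_pairs(reads):
    """Hypothetical helper: for each barcode, keep the most frequent index.

    reads: iterable of (index, barcode) tuples, one tuple per sequencing read.
    Returns (correct, mispairs), where correct maps barcode -> index and
    mispairs lists the (index, barcode) pairs treated as mis-indexed.
    """
    # Count how many reads support each index for every barcode.
    counts = defaultdict(Counter)
    for index, barcode in reads:
        counts[barcode][index] += 1

    correct = {}
    mispairs = []
    for barcode, index_counts in counts.items():
        ranked = index_counts.most_common()  # highest read count first
        correct[barcode] = ranked[0][0]      # ties are broken arbitrarily (assumption)
        mispairs.extend((idx, barcode) for idx, _ in ranked[1:])
    return correct, mispairs


reads = [("A", "ACGT")] * 5 + [("B", "ACGT")] + [("B", "TTTT")] * 3
correct, mispairs = resolve_index_pairs(reads)
print(correct)   # barcode -> index judged correct
print(mispairs)  # pairs excluded as mis-indexes
```

Here every lower-ranked pair is excluded; the text also allows excluding only some of them, which would need an additional rule.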
  • The method according to (1A) above, wherein the target nucleic acid molecule to which at least a molecular barcode has been added is subjected to amplification prior to step (I).
  • In step (II), a sequence similar to the sequence having a specific molecular barcode is a sequence that differs from the sequence having the specific molecular barcode by no more than a given number of mismatched bases in the molecular barcode sequence portion.
  • (4A) The method according to any of the above (1A) to (3A), wherein the molecular barcode has a fixed base at a specific position.
  • The method according to (4A) above, wherein the sequence similar to the sequence having the specific molecular barcode in step (II) is selected based on whether it contains the fixed base at the specific position and/or whether the position of the fixed base is shifted from the specific position.
  • (6A) The method according to (4A) above, further comprising excluding from analysis any sequence having a molecular barcode that does not contain the fixed base at the specific position.
  • (8A) The method according to any one of (1A) to (7A) above, further comprising the step of determining the number of target nucleic acid molecules contained in the sample from which the target nucleic acid molecule is derived, based on the number of groups having a specific molecular barcode or a sequence similar thereto.
  • (9A) A method of analyzing nucleic acid, including: (I) subjecting a mixture of a plurality of nucleic acid molecules to which molecular barcodes have been added to sequencing to obtain sequence information; and (II) selecting a sequence having a specific molecular barcode or a sequence similar thereto from the sequence information obtained in (I) above, and creating a group from the selected sequences.
  • A sequence similar to the sequence having the specific molecular barcode in step (II) is a sequence that differs from the sequence having the specific molecular barcode by no more than a given number of mismatched bases in the molecular barcode sequence portion.
  • The method according to (11A) above, wherein the sequence similar to the sequence having the specific molecular barcode in step (II) is selected based on whether it contains the fixed base at the specific position and/or whether the position of the fixed base is shifted from the specific position.
  • (13A) The method according to (11A) above, further including the step of excluding from analysis any sequence having a molecular barcode which does not contain the fixed base at the specific position.
  • (14A) The method according to any one of (9A) to (13A) above, further comprising the step of determining the number of target nucleic acid molecules contained in the sample from which the target nucleic acid molecule is derived, based on the number of groups having a specific molecular barcode or a sequence similar thereto.
  • The method according to any one of (9A) to (14A) above, wherein the target nucleic acid molecule to which at least a molecular barcode has been added is subjected to amplification prior to step (I).
  • (16A) A method of analyzing nucleic acid: (I) subjecting a mixture of a plurality of nucleic acid molecules to which a molecular barcode having a fixed base at a specific position is added to sequencing to obtain sequence information; (IIa) excluding from analysis a sequence having a molecular barcode which does not contain the fixed base at the specific position; (IIb) obtaining sequence information comprising a sequence comprising the fixed base at the specific position in the step (I) or after the step (I); or (IIc) as the step (II) above Further comprising the step of selecting a sequence having a specific molecular barcode or a sequence similar thereto from the sequence information obtained in 4.), and creating a group according to the selected sequence, and in step (II
  • In step (E), sequences having a specific index are selected and a group is created for each index;
  • in step (F), for nucleic acid sequences having molecular barcodes that appear in a plurality of groups, the barcode/index pair with the highest number of reads is determined as the correct barcode/index pair, or the index and molecular barcode pair with the highest detection frequency is determined as the correct index and molecular barcode pair. The method according to (1B) above.
  • In step (E), sequences having a specific molecular barcode are selected and a group is created for each molecular barcode;
  • the index and molecular barcode pair with the highest detection frequency in the created group is determined as the correct index and molecular barcode pair.
  • In step (E), sequences containing the sequence of the target nucleic acid molecule are selected to create a group;
  • in step (F), sequences having a specific index are further selected from the group to create subgroups, and, for nucleic acid sequences having molecular barcodes appearing in a plurality of subgroups, the barcode/index pair with the largest number of reads is determined as the correct barcode/index pair, or the index/molecular barcode pair with the highest detection frequency is determined as the correct index/molecular barcode pair. The method according to (1B) above.
  • In step (E), sequences containing the sequence of the target nucleic acid molecule are selected to create a group;
  • in step (F), molecules having a specific molecular barcode are further selected from the group to create subgroups, and the index and molecular barcode pair with the highest detection frequency in one created subgroup is determined as the correct index and molecular barcode pair. The method according to (1B) above.
  • In step (F), at least any one or all of the index and molecular barcode pairs other than the determined correct pair are determined as index and molecular barcode mispairs. The method according to (2B) to (5B) above.
  • In step (E), molecules having a specific index are selected and a group is created for each index;
  • in step (F), for sequences having molecular barcodes appearing in a plurality of groups, at least any one or all of the index and molecular barcode pairs with a low detection frequency are determined as index and molecular barcode mispairs. The method according to (1B) above.
  • In step (E), sequences having a specific molecular barcode are selected and a group is created for each molecular barcode;
  • at least any one or all of the index and molecular barcode pairs with a low detection frequency are determined as index and molecular barcode mispairs. The method according to (1B) above.
  • In step (E), sequences containing the target nucleic acid molecule are selected to create a group;
  • molecules having a specific index are further selected from the group to create subgroups, and, for nucleic acid molecules having molecular barcodes that appear in a plurality of subgroups, at least any one or all of the index and molecular barcode pairs with a low detection frequency are determined as index and molecular barcode mispairs. The method according to (1B) above.
  • In step (E), molecules containing the target nucleic acid molecule are selected to create a group;
  • in step (F), molecules having a specific molecular barcode are selected from the group to create subgroups, and at least any one or all of the index and molecular barcode pairs with a low detection frequency in one created subgroup are determined as index and molecular barcode mispairs. The method according to (1B) above.
  • (11B) The step of creating a group is performed by clustering, as one group, molecules presumed to have had the same sequence, as determined based on sequence identity or similarity. The method according to (1B) to (10B) above.
  • In step (E), clustering is performed, in the sequence of the molecular barcode portion, by: (i) grouping nucleic acid molecules having the same sequence as the unique molecular barcode sequence into the same cluster; (ii) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 1 base of mismatch from it into the same cluster; (iii) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 2 bases of mismatch from it into the same cluster; or (iv) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 3 bases of mismatch from it into the same cluster. The method according to (11B) above.
  • In step (E), clustering is performed by classifying into the same cluster nucleic acid molecules whose molecular barcode portion was sequenced as having an insertion or deletion (indel) of bases. The method according to (11B) or (12B) above.
  • In step (E), clustering is performed on the nucleic acid molecules that remain after excluding sequences whose molecular barcode portion was sequenced as having an insertion or deletion (indel) of bases. The method according to (11B) or (12B) above.
  • A method of determining the number of target nucleic acid molecules contained in a specific original sample from sequence information: (E) selecting nucleic acid molecules containing the sequence of the target nucleic acid molecule from the obtained sequence information; (F) clustering the nucleic acid molecules selected in (E) above for each unique molecular barcode sequence, and then identifying clusters having a plurality of sequences in the index nucleic acid molecule portion; (G) in each of the clusters identified in (F), the index and molecular barcode pair with the highest detection frequency is identified as the correctly indexed target nucleic acid molecule, and the other index and molecular barcode pairs are identified.
  • the number of kinds of sequence of the unique molecular barcode linked to the target nucleic acid molecule correctly indexed Is the number of Method.
  • Clustering is performed, in the sequence of the molecular barcode portion, by: (i) grouping nucleic acid molecules having the same sequence as the unique molecular barcode sequence into the same cluster; (ii) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 1 base of mismatch from it into the same cluster; (iii) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 2 bases of mismatch from it into the same cluster; or (iv) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 3 bases of mismatch from it into the same cluster. The method as described in (16B) above.
  • Clustering is performed by classifying into the same cluster nucleic acid molecules whose molecular barcode portion was sequenced as having an insertion or deletion (indel) of bases. The method according to (16B) or (17B) above.
  • (19B) In (E), clustering is performed on the nucleic acid molecules that remain after excluding sequences whose molecular barcode portion was sequenced as having an insertion or deletion (indel) of bases. The method according to (16B) or (17B) above.
  • (20B) The method according to (18B) or (19B) above, further comprising identifying the insertion or deletion of bases from the difference between the position of each of the one or more fixed bases arranged in all the molecular barcode sequences linked to the nucleic acid molecules and the position of each of the one or more fixed bases in the sequenced molecular barcode portion.
  • (21B) A method for digital quantification of a target nucleic acid molecule using a barcode sequence, comprising, based on the obtained nucleic acid sequence information, clustering the sequence of a molecular barcode after mutation into one group together with other sequences having sequence similarity, and determining the number of target nucleic acid molecules based on the number of clusters obtained.
  • A method in which clustering is performed, in the sequence of the molecular barcode portion, by: (i) grouping nucleic acid molecules having the same sequence as the unique molecular barcode sequence into the same cluster; (ii) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 1 base of mismatch from it into the same cluster; (iii) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 2 bases of mismatch from it into the same cluster; or (iv) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 3 bases of mismatch from it into the same cluster.
  • a method comprising detecting a base insertion or deletion (indel).
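
The mismatch-based clustering options (i) to (iv) above can be illustrated with a short sketch. The text does not say how clusters are seeded or how ambiguous assignments are resolved, so the greedy, abundance-ordered strategy below is an assumption, as are the names `hamming` and `cluster_barcodes` and the `max_mismatch` parameter (0, 1, 2 or 3, mirroring options (i) to (iv)). Sequences of a different length are not merged here, since indels are treated separately in the text.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster_barcodes(barcodes, max_mismatch=1):
    """Greedy clustering sketch (assumed strategy, not from the source).

    barcodes: iterable of barcode strings, one per read.
    Returns a dict mapping each cluster centre to its member sequences.
    """
    counts = Counter(barcodes)
    clusters = {}
    # Visit the most abundant sequences first so that they become cluster centres.
    for seq, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centre in clusters:
            if len(centre) == len(seq) and hamming(centre, seq) <= max_mismatch:
                clusters[centre].append(seq)
                break
        else:
            clusters[seq] = [seq]  # no centre close enough: start a new cluster
    return clusters


reads = ["AAAA"] * 10 + ["AAAT"] + ["CCCC"] * 5
clusters = cluster_barcodes(reads, max_mismatch=1)
print(len(clusters))  # the digital count is the number of clusters
```

With `max_mismatch=0` the same input is counted as three distinct barcodes, which is the overcounting that clustering is meant to avoid.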
  • A method for determining the correct pair or mispair of an index and a molecular barcode added to a target nucleic acid molecule, from sequence information obtained by sequencing (i.e., multiplex sequencing) a mixture of multiple samples containing target nucleic acid molecules (e.g., DNA or RNA) to which an index unique to each sample (meaning an index sequence nucleic acid molecule; a sample may contain a plurality of index nucleic acid molecules as long as they are unique to that sample) and a molecular barcode (barcode sequence nucleic acid molecule) that is unique to each nucleic acid molecule or arbitrary have been added, the method including: (A) separately obtaining a plurality of samples containing nucleic acid molecules (e.g., DNA or RNA) (at least one of the samples includes a target nucleic acid molecule); (B) (e.g., in each of the plurality of obtained samples) before amplification
  • the samples may be mixed before obtaining the amplification product, or the samples containing the amplification product may be mixed after the amplification product is obtained), sequencing the nucleic acid molecules to which an index unique to each sample and a molecular barcode that is unique to each target nucleic acid molecule or arbitrary have been added, to determine for each nucleic acid molecule the sequence of the index portion, the sequence of the molecular barcode portion and, as needed, the sequence of the target nucleic acid molecule portion linked thereto.
  • In step (E), sequences having a specific index are selected and a group is created for each index;
  • in step (F), for nucleic acid sequences having molecular barcodes that appear in a plurality of groups, the barcode/index pair with the highest number of reads is determined as the correct barcode/index pair, or the index and molecular barcode pair with the highest detection frequency is determined as the correct index and molecular barcode pair.
  • (1C) The index and molecular barcode pair with the highest detection frequency is determined as the correct index and molecular barcode pair.
  • In step (E), sequences having a specific molecular barcode are selected and a group is created for each molecular barcode;
  • in step (F), the index and molecular barcode pair with the highest detection frequency in the created group is determined as the correct index and molecular barcode pair.
  • In step (E), sequences containing the sequence of the target nucleic acid molecule are selected to create a group;
  • in step (F), sequences having a specific index are further selected from the group to create subgroups, and, for nucleic acid sequences having molecular barcodes appearing in a plurality of subgroups, the barcode/index pair with the largest number of reads is determined as the correct barcode/index pair, or the index/molecular barcode pair with the highest detection frequency is determined as the correct index/molecular barcode pair. The method according to (1C) above.
  • In step (E), sequences containing the sequence of the target nucleic acid molecule are selected to create a group;
  • in step (F), molecules having a specific molecular barcode are further selected from the group to create subgroups, and the index and molecular barcode pair with the highest detection frequency in one created subgroup is determined as the correct index and molecular barcode pair. The method according to (1C) above.
  • In step (F), at least any one or all of the index and molecular barcode pairs other than the determined correct pair are determined as index and molecular barcode mispairs. The method according to any one of (2C) to (5C) above.
  • In step (E), molecules having a specific index are selected and a group is created for each index;
  • in step (F), for sequences having molecular barcodes appearing in a plurality of groups, at least any one or all of the index and molecular barcode pairs with a low detection frequency (for example, pairs whose detection frequency is lower than a predetermined reference value; the reference value includes, but is not limited to, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less in the group) may be determined as index and molecular barcode mispairs. The method according to (1C) above.
  • At least any one or all of the index and molecular barcode pairs with a low detection frequency (for example, pairs whose detection frequency is lower than a certain reference value, where the reference value is, without limitation, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less in the group; or, for example, the pairs ranked second or lower in detection frequency) are determined as index and molecular barcode mispairs. The method according to (1C) above.
  • In step (E), sequences containing the target nucleic acid molecule are selected to create a group;
  • molecules having a specific index are further selected from the group to create subgroups, and, for nucleic acid molecules having molecular barcodes that appear in a plurality of subgroups, at least any one or all of the index and molecular barcode pairs with a low detection frequency (e.g., pairs whose detection frequency is lower than a predetermined reference value, where the reference value is, without limitation, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less; or, for example, the pairs ranked second or lower in detection frequency) are determined as index and molecular barcode mispairs. The method according to (1C) above.
  • In step (E), molecules containing the target nucleic acid molecule are selected to create a group;
  • in step (F), molecules having a specific molecular barcode are further selected from the group to create subgroups, and at least any one or all of the index and molecular barcode pairs with a low detection frequency in one created subgroup (e.g., pairs whose detection frequency is lower than a fixed reference value, where the reference value is, without limitation, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less in the group; or, for example, the pairs ranked second or lower in detection frequency) may be determined as index and molecular barcode mispairs. The method according to (1C) above.
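
The reference-value variant described in the preceding items can be sketched as a simple threshold on each pair's share of reads within its group. The function name `low_frequency_indexes`, the `max_fraction` parameter, and the interpretation of "N% or less in the group" as a fraction of the group's reads are assumptions for illustration; the text lists several possible reference values without prescribing one.

```python
from collections import Counter


def low_frequency_indexes(index_counts, max_fraction=0.1):
    """Return indexes whose share of reads in one barcode group is at or below
    max_fraction (hypothetical helper; the threshold semantics are assumed).

    index_counts: mapping index -> number of reads observed with one barcode.
    """
    total = sum(index_counts.values())
    if total == 0:
        return []
    return sorted(
        idx for idx, n in index_counts.items() if n / total <= max_fraction
    )


# Illustrative counts for one barcode group (made-up numbers, not data from the source).
group = Counter({"A": 95, "B": 4, "C": 1})
print(low_frequency_indexes(group, max_fraction=0.05))
print(low_frequency_indexes(group, max_fraction=0.02))
```

A stricter reference value flags fewer pairs as mispairs, so the choice trades off removing mis-indexed reads against discarding genuine low-abundance pairs.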
  • In step (E), the step of creating a group is carried out by clustering, as one group, molecules presumed to have had the identical sequence as determined based on sequence identity or similarity (preferably in the sequence of the molecular barcode portion), e.g., molecules presumed to have undergone a sequence change in any of steps (A) to (D). The method according to (1C) to (10C) above.
  • In step (E), clustering is performed by classifying into the same cluster nucleic acid molecules whose molecular barcode portion was sequenced as having an insertion or deletion (indel) of bases (for example, up to 1 base, up to 2 bases, or up to 3 bases). The method according to (11C) or (12C) above.
  • In step (E), clustering is performed on the nucleic acid molecules that remain after excluding sequences whose molecular barcode portion was sequenced as having an insertion or deletion (indel) of bases (for example, up to 1 base, up to 2 bases, or up to 3 bases). The method according to (11C) or (12C) above.
  • One or more (eg, one, two, three, four, five or six) insertions or deletions of said bases are arranged in all molecular barcode sequences linked to the nucleic acid molecule.
  • Each fixed base may be designed to be any one base selected from the group consisting of A, T, G and C; or may be designed to be a base selected at random from the bases contained in any one combination selected from the group consisting of: A and T; A and G; A and C; T and G; T and C; G and C; A, T and G; A, T and C; A, G and C; and T, G and C.
  • A method for determining the number of target nucleic acid molecules contained in a specific original sample from sequence information obtained by sequencing (i.e., multiplex sequencing) a mixture of multiple samples containing target nucleic acid molecules (e.g., DNA or RNA) to which a unique index (index sequence nucleic acid molecule) and a molecular barcode unique to each nucleic acid molecule (barcode sequence nucleic acid molecule) have been added, the method comprising: (A) separately obtaining a plurality of samples comprising nucleic acid molecules (e.g., DNA or RNA) (at least one of the samples comprising the target nucleic acid molecule); (B) in each of the plurality of obtained samples, before amplification of the nucleic acid molecules contained in the sample, linking an arbitrary molecular barcode to each of the target nucleic acid molecules to obtain target nucleic acid molecules to which mutually different molecular barcodes are linked; (C) Before
  • the samples may be mixed before obtaining the amplification product, or the samples containing the amplification product may be mixed after the amplification product is obtained), sequencing the nucleic acid molecules to which an index unique to each sample and a molecular barcode that is unique to each target nucleic acid molecule or arbitrary have been added, to identify for each nucleic acid molecule the sequence of the index portion, the sequence of the molecular barcode portion, and the sequence of the nucleic acid molecule portion linked thereto; (E) selecting nucleic acid molecules containing the sequence of the target nucleic acid molecule from the obtained sequence information; (F) clustering the nucleic acid molecules selected in (E) above for each unique molecular barcode sequence, and then identifying clusters having a plurality of sequences in the index nucleic acid molecule portion; (G) in each of the clusters identified in (F), the index and molecular barcode pair with the highest detection frequency is identified as the correctly indexed target nucleic acid molecule, and the other
  • the number of target nucleic acid molecules contained in the sample corresponding to the index is the number of kinds of unique molecular barcode sequences linked to the correctly indexed target nucleic acid molecules (or the number of clusters of correctly indexed target nucleic acid molecules).
  • Clustering is performed, in the sequence of the molecular barcode portion, by: (i) grouping nucleic acid molecules having the same sequence as the unique molecular barcode sequence into the same cluster; (ii) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 1 base of mismatch from it into the same cluster; (iii) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 2 bases of mismatch from it into the same cluster; or (iv) grouping nucleic acid molecules having the unique molecular barcode sequence or a sequence with up to 3 bases of mismatch from it into the same cluster. The method as described in (16C) above.
  • Clustering is performed by classifying into the same cluster nucleic acid molecules whose molecular barcode portion was sequenced as having an insertion or deletion (indel) of bases (for example, up to 1 base, up to 2 bases, or up to 3 bases). The method according to (16C) or (17C) above.
  • Clustering is performed on the nucleic acid molecules that remain after excluding sequences whose molecular barcode portion was sequenced as having an insertion or deletion (indel) of bases (e.g., up to 1 base, up to 2 bases, or up to 3 bases). The method according to (16C) or (17C) above.
  • One or more (eg, one, two, three, four, five or six) insertions or deletions of said bases are arranged in all molecular barcode sequences linked to the nucleic acid molecule 20.
  • Each fixed base can be designed to be any one base selected from the group consisting of A, T, G and C; or may be designed to be a base selected at random from the bases contained in any one combination selected from the group consisting of: A and T; A and G; A and C; T and G; T and C; G and C; A, T and G; A, T and C; A, G and C; and T, G and C.
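
The fixed-base check referred to above (a fixed base found at its designed position, found at a shifted position, or missing) can be sketched as follows. This is a heuristic illustration under stated assumptions: the names `fixed_base_offsets` and `classify_barcode`, the 0-based position map `fixed`, the `max_shift` parameter, and the example design (T at position 3 and G at position 7 of an 8-base barcode) are all hypothetical. Because the remaining bases are random, a substitution at a fixed position can occasionally be mistaken for a shift.

```python
def fixed_base_offsets(seq, fixed, max_shift=1):
    """For each designed fixed base, find the offset at which it is observed.

    seq: sequenced barcode string.
    fixed: dict mapping 0-based position -> string of allowed bases there.
    Returns one entry per fixed position: 0 (in place), a signed shift,
    or None if the base is not found within max_shift.
    """
    # Try the designed position first, then +1, -1, +2, -2, ...
    shifts = [0]
    for s in range(1, max_shift + 1):
        shifts.extend((s, -s))

    offsets = []
    for pos in sorted(fixed):
        found = None
        for shift in shifts:
            p = pos + shift
            if 0 <= p < len(seq) and seq[p] in fixed[pos]:
                found = shift
                break
        offsets.append(found)
    return offsets


def classify_barcode(seq, fixed, max_shift=1):
    """Label a barcode 'ok', 'indel' (fixed base shifted) or 'exclude'."""
    offsets = fixed_base_offsets(seq, fixed, max_shift)
    if any(o is None for o in offsets):
        return "exclude"
    if all(o == 0 for o in offsets):
        return "ok"
    return "indel"


design = {3: "T", 7: "G"}  # hypothetical barcode layout NNNTNNNG
for barcode in ("ACATCCAG", "AATCCAGC", "ACAACCAG"):
    print(barcode, classify_barcode(barcode, design))
```

Allowing a set of bases per fixed position (for example `"AT"`) covers the case in the text where a fixed base is drawn at random from a restricted combination.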
  • FIG. 1 is a diagram for explaining a digital quantification method of nucleic acid molecule and its effectiveness.
  • Panel A of FIG. 1 shows the scheme of digital counting. A unique molecular barcode is added to each target nucleic acid molecule. After amplification, both the target nucleic acid portion and the barcode portion are sequenced. The copy number is determined not by the number of reads but by the number of unique barcodes. The dotted frame indicates the experimental design used in this example.
  • Panel B of FIG. 1 illustrates the first requirement for accurate digital counting: each target nucleic acid molecule must be labeled with a different barcode.
  • FIG. 1 shows the second requirement for accurate digital counting: all barcode sequences attached to the target nucleic acid molecules must be detected (with at least one read each). If the number of unique barcodes measured becomes constant as the sequencing depth is increased, then that range of sequencing depth meets the second requirement.
  • FIG. 2 shows the intrinsic features observed in digital counting with random-base barcodes, in relation to the two requirements for accurate digital quantification.
  • Panel A of FIG. 2 represents the dependence of the number of clusters (unique barcodes shown in gray) on the number of random bases (base length). The results of ST1 are shown.
  • Gray lines indicate the number of unique barcodes.
  • Panel B of FIG. 2 shows the dependence of the number of barcode clusters on the number of random bases.
  • C Panel C in FIG.
  • Gray lines represent the dependence of the number of unique barcode sequences observed on the number of random bases.
  • FIG. 3 shows the results of analysis using Distance and fixed base. In FIG. 3, the results of ST1 with index A (indicated by circles) and index B (indicated by triangles) are shown. The length of the random base barcode was 24.
  • Panel A of FIG. 3 shows the effect of clustering at different Distances on the number of clusters.
  • FIG. 4 shows the absolute counts of each template.
  • Panel C in FIG. 4 shows the correlation between the input (ie, the number of molecules before PCR amplification, x axis) and the output (ie, the result of digital counting, the y axis). The number of outputs was determined from FIGS. 4A and 11 with a coverage of 12.6 to 20.9 indicated by the large symbols.
  • The gray line shows a regression line with slope 1 on a logarithmic scale. The circles and triangles correspond to index A and index B, respectively. Pearson's product-moment correlation coefficient r of the linear regression and the coefficient of determination R^2 are shown.
  • FIG. 5 shows the required number of random bases for digital counting.
  • The x-axis indicates the number of input molecules to be measured, and the y-axis indicates the number of random bases at which the relative number of clusters reaches 0.95 for each curve in panel B of FIG. 4 and panel B of FIG.
  • FIG. 6A shows a conventional method involving amplification, indexing, mixing and sequencing.
  • No barcode sequence is used; a unique index is added to the amplified sequences of each sample, and the samples are mixed and sequenced. The index may instead be added before amplification.
  • FIG. 6B shows a conventional method involving amplification, indexing, mixing and sequencing. In the conventional method, misindexing may occur, but the resulting misindexed molecules cannot be identified.
  • FIG. 6C shows the use of molecular barcodes. Each nucleic acid molecule of interest having sequence 1 is labeled with a unique barcode sequence, so that each molecule is uniquely labeled.
  • FIG. 6D illustrates the use of molecular barcodes, including molecular barcode addition, amplification, indexing, mixing and sequencing. It shows a scheme in which a unique barcode is attached to each nucleic acid molecule and a unique index is added to each sample, and multiple samples are then mixed and sequenced. The index may also be added after the molecular barcodes are added but before amplification.
  • FIG. 6E illustrates the use of molecular barcodes, including molecular barcode addition, amplification, indexing, mixing and sequencing. It shows a scheme of an example of the method of identifying misindexing in the first embodiment of the present invention. Although misindexing may occur, in the first embodiment of the present invention the misindexed molecules that arise can be identified.
  • FIG. 7A shows a scheme for adding a barcode to a target nucleic acid molecule contained in a plurality of samples.
  • FIG. 7B shows a scheme of indexing and amplification, and shows the case in which partial switching occurs due to indexes that are contaminated with other indexes.
  • FIG. 7C shows counting of the number of barcodes, confirmation of identical barcodes, and identification of errors (index and barcode mispairs).
  • FIG. 8 is a supplementary drawing of FIG. 2 and shows the observed intrinsic features of digital counts using barcodes with random bases.
  • Panel A of FIG. 8 represents the dependence of the number of clusters detected on the number of random bases (base length) for ST1, ST2, LT1 and LT2.
  • Gray lines represent the dependence of the number of unique barcode sequences observed on the number of random bases.
  • FIG. 9 is a supplementary drawing of FIG. 3 and shows the analysis results using Distance and fixed base for ST2, LT1 and LT2.
  • Panel A of FIG. 9 shows the effect of clustering on the number of clusters as in panel A of FIG. 3 but for ST2, LT1 and LT2 with different Distance parameters.
  • Panel B of FIG. 9 is the same as panel B of FIG. 3, but shows the dependence of the position of the fixed base on ST2, LT1 and LT2. Asterisks indicate no filtering.
  • Panel C of FIG. 9 is the same as panel C of FIG. 3, but shows the dependence of the number of fixed bases on ST2, LT1 and LT2.
  • FIG. 10 shows a histogram of the number of reads in each cluster for ST1 with index A (see panel A) and index B (see panel B). The colors correspond to the sample colors of the plot of FIG. 2C.
  • FIG. 12 is a supplementary drawing of FIG. 5 and shows an estimation of the required number of random bases for digital counting. This plot is the same as FIG.
  • FIG. 13 shows the barcode design and the number of input molecules.
  • the capital letters in the sequences of LT1 to 6 are binding sites of PCR amplification primers.
  • Barcode indicates a random region including random bases and fixed bases, and target indicates a target nucleic acid.
  • LT1-6 were PAGE purified products, and the 5 'end of ST1-5 was amine-modified. The fixed base between the random bases helps to avoid long homopolymeric barcodes which may have lower amplification efficiency.
  • N represents any one of A, T, G, or C.
  • FIG. 14 shows primer sequences for preparation of the library. The underlined part indicates an index sequence (index A is included in Rv primer 1 and index B is included in Rv primer 2). All primers were PAGE purified.
  • FIG. 15 shows the number of reads at each processing step. * This fraction may be greater than the number of reads in the misindex exclusion step (e.g., contaminating indexes) (see Examples).
  • A "molecular barcode" is a tag having a unique sequence that is attached to each molecule of nucleic acid. It is also called a "primer ID" or a "Unique Molecular Identifier (UMI)".
  • Methods for determining the number of nucleic acid molecules have attracted a great deal of attention as next-generation sequencer platforms have made it possible to analyze large amounts of nucleic acid sequence in a single run.
  • Various methods have been developed to determine this number digitally. Because such methods can count the number of molecules digitally as the number of types of barcodes (sometimes referred to as the "number of unique barcodes"), they are sometimes called the "digital counting method", the "digital quantification method" and the like. The digital counting method can accurately determine the absolute number of molecules in a sample, even in the presence of noise and bias in the measurement system.
  • One example is RNA-Seq using molecular barcodes, i.e., digital RNA-Seq (dRNA-Seq) or quantitative RNA-Seq.
  • Digital counting is also used in many applications on next-generation sequencer platforms, which can acquire large amounts of sequence data. Examples include RNA-Seq, individual-nucleotide resolution UV cross-linking and immunoprecipitation (iCLIP), antibody repertoire analysis, genetic analysis of bacterial 16S rRNA, and chromatin immunoprecipitation experiments with nucleotide resolution through exonuclease, unique barcode and single ligation (ChIP-nexus).
  • When the molecular barcodes comprise nucleotide sequences with sufficient diversity, the possibility that the same barcode is added to multiple nucleic acid molecules present in the original sample is substantially limited. This allows the number of types of barcode sequences to correspond to the number of nucleic acid molecules present in the sample, so that the nucleic acid molecules present in the sample can be quantified.
  • Molecular barcodes can be obtained, for example, as nucleic acid groups with random bases.
  • Because the molecular barcode is used by looking at the number of types of its sequence in order to determine the number of molecules to be measured, the sequence may be random (the sequences only need to be diverse, and it is not necessary for humans to recognize the content of each sequence), and the barcodes may be synthesized accordingly.
  • the molecular barcode may be a set of nucleic acids of known sequence designed to obtain sufficient diversity.
  • a molecular barcode may be simply referred to as a barcode, and a sequence possessed by the molecular barcode may be referred to as a barcode sequence.
  • the number of unique barcode sequences is a number that represents the degree of diversity of barcode sequences.
  • The number of unique barcode sequences is n if n different barcode sequences are detected (where n is a natural number).
  • The number of random bases means the base length of the random bases.
  • Random bases refer to a contiguous stretch of bases having a random sequence.
  • The random bases may be drawn from two, three or four kinds of bases.
  • an "index" is a nucleic acid that serves as a unique label attached to a nucleic acid molecule for each sample from which it is derived. For example, an index can be added that has a different nucleotide sequence for each sample.
  • The sample from which each individual nucleic acid molecule is derived can be identified based on the sequence of the added index. Because of the large capacity of a single sequencing run on next-generation sequencer platforms, multiple samples can be mixed and sequenced in one run, and the index is useful in such cases.
  • The index may be added before, during or after processing (e.g., amplification) of the nucleic acid molecule.
  • The terms "template", "target nucleic acid", "target nucleic acid molecule", "nucleic acid of interest" and "nucleic acid molecule of interest" refer to a nucleic acid molecule (e.g., DNA or RNA) to be quantified in digital quantification, and can be used interchangeably.
  • The sequence originally possessed by the target nucleic acid molecule (i.e., the sequence before a barcode or index is added for analysis) is referred to as the target nucleic acid sequence.
  • nucleic acid means a macromolecule having a nucleic acid sequence.
  • Nucleic acids include deoxyribonucleic acid (DNA) and ribonucleic acid (RNA).
  • Ribonucleic acid includes messenger RNA (mRNA), non-coding RNA such as microRNA, transfer RNA (tRNA), and ribosomal RNA (rRNA).
  • Sequence depth refers to the total amount or total number of molecules to be sequenced. For example, a high sequence depth (i.e., more sequence information is obtained) may increase the likelihood that sequences present only in small amounts in the sample will be detected.
  • Coverage means the average number of reads per cluster (number of reads / number of clusters), where each cluster is obtained by clustering reads regarded as originating from the same nucleic acid molecule.
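  • The definition of coverage above can be illustrated with a minimal sketch. It assumes, for simplicity, that reads with an identical barcode sequence form one cluster; the function name `coverage` and the example reads are hypothetical and are not taken from the patent.

```python
from collections import Counter


def coverage(barcode_reads):
    """Average number of reads per cluster (read number / cluster number).

    Simplifying assumption: each distinct barcode sequence is one cluster.
    """
    clusters = Counter(barcode_reads)  # barcode sequence -> number of reads
    if not clusters:
        return 0.0
    return len(barcode_reads) / len(clusters)


# Hypothetical example: 6 reads falling into 3 clusters.
example_reads = ["AACGT", "AACGT", "GGTCA", "AACGT", "GGTCA", "TTAGC"]
example_coverage = coverage(example_reads)
```

  • In practice the clusters would come from whichever clustering is used (for example, clustering within a chosen Distance), not from exact sequence identity alone.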
  • "Unique to each molecule" means that at least a portion of the molecules contained in the system differ from one another.
  • "Specific to each molecule" means that this holds for all molecules contained in the system, substantially all molecules, or most of the molecules (e.g., 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more).
  • Molecular barcodes with a different sequence for each molecule are added to the nucleic acids (target nucleic acid molecules), for example DNA (e.g., complementary DNA, cDNA) (see FIG. 6C). Such a nucleic acid, to which a molecular barcode having a unique sequence for each molecule is added, may be referred to as a "barcoded nucleic acid".
  • the barcoded target nucleic acid molecule (cDNA obtained from RNA if the starting nucleic acid is RNA) is amplified (see, eg, FIG. 6D).
  • The target nucleic acid sequence and the barcode sequence of the barcoded and amplified nucleic acid are sequenced together (see, e.g., FIG. 6D).
  • The number of unique barcodes attached to the target nucleic acid sequence is quantified, rather than the number of amplified molecules (the so-called "read number"), so that the absolute copy number of the original nucleic acid molecule of interest (before amplification) can be determined.
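  • The difference between the read number and the number of unique barcodes can be sketched as follows. This is only an illustration under the assumption that every read has already been reduced to a (target, barcode) pair; the function `digital_counts`, the target names and the sequences are hypothetical.

```python
from collections import defaultdict


def digital_counts(reads):
    """Return {target: (read_number, unique_barcode_number)}.

    `reads` is an iterable of (target, barcode) pairs.
    """
    read_numbers = defaultdict(int)
    barcodes = defaultdict(set)
    for target, barcode in reads:
        read_numbers[target] += 1       # counts amplified copies (reads)
        barcodes[target].add(barcode)   # counts original molecules
    return {t: (read_numbers[t], len(barcodes[t])) for t in read_numbers}


# Hypothetical reads: geneA was amplified unevenly, geneB evenly.
example = [
    ("geneA", "ACGT"), ("geneA", "ACGT"), ("geneA", "ACGT"), ("geneA", "TTAG"),
    ("geneB", "GGCA"), ("geneB", "GGCA"),
]
counts = digital_counts(example)
```

  • In this example geneA has 4 reads but only 2 unique barcodes, so the digital count of original molecules is 2, independent of how strongly each molecule was amplified.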
  • In digital quantification, the barcode sequence only needs to be added to the target nucleic acid molecules so that each nucleic acid molecule has a unique sequence; what the specific sequence is does not matter. Barcodes whose specific sequence is known may also be used in digital quantification.
  • Next-generation sequencer platforms have been developed, making it possible to decode a large number of base sequences in a single sequencing run. As a result, measuring a single sample does not use up the sequencing capacity of a run, and the need to sequence multiple samples simultaneously in a single run is growing. When multiple samples are sequenced in a single run, a unique index can be added to each sample to distinguish which sample each nucleic acid came from.
  • the index may be added to the target nucleic acid molecule so as to have any sequence unique to each sample, regardless of the specific sequence.
  • an index whose specific sequence is known may be used.
  • The index may be added to the amplified target nucleic acid molecule after the target nucleic acid molecule is amplified, or it may be added to the target nucleic acid molecule before it is amplified.
  • An index may be added after performing amplification on each sample. For example, the addition of an index can be performed on each of the amplification products by adapter ligation. Alternatively, the index may be added while the nucleic acid molecule of interest is amplified.
  • the addition of an index can be performed during amplification of the nucleic acid molecule by including it in the sequence of the primer.
  • the index when the index is added to the target nucleic acid molecule before amplification, the index may be added to the target nucleic acid molecule before, simultaneously with or after the addition of the barcode sequence.
  • the index, the barcode sequence, and the nucleic acid molecule of interest may be linked in any order.
  • The index may be provided linked to the barcode sequence.
  • The target nucleic acid molecules derived from a specific sample can be identified using the index as a marker, and the number of unique barcode sequences added to the target nucleic acid sequence is quantified to determine the absolute copy number of the original (before amplification) nucleic acid molecule of interest (see, e.g., FIG. 6D).
  • However, it has become apparent that a problem may arise in which an unexpected index is added to a nucleic acid derived from a sample (see FIG. 6E, FIG. 7B). This problem may occur whenever an index is used, and is referred to as index switching, index hopping, misindexing, and the like. The existence of the index switching problem has already been pointed out (Sinha, R. et al. Index switching causes “spreading-of-signal” among multiplexed samples in Illumina HiSeq 4000 DNA sequencing).
  • the present invention provides solutions to each of these problems.
  • In a digital quantification method that uses sample-specific indexes to distinguish samples, a single barcode should not be found together with multiple indexes among the target nucleic acid molecules to which indexes have been added (since each nucleic acid molecule has a unique barcode attached).
  • According to the present invention, when multiple indexes are found in a cluster of nucleic acid molecules to which the same barcode is added, it can be determined that misindexing has occurred (see, for example, FIG. 6E and FIG. 7C).
  • In that case, the numbers of reads carrying each index sequence are compared, and the most frequently occurring index sequence is taken to be the correctly added index.
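  • The rule described above (same barcode, several indexes, keep the most frequent index) can be sketched as follows. This is a minimal illustration, assuming each read has been reduced to an (index, barcode) pair and that identical barcode sequences form one cluster; the function `resolve_index_pairs` and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict


def resolve_index_pairs(reads):
    """Pick the most frequent index for each barcode and flag the rest.

    `reads` is an iterable of (index, barcode) pairs.
    Returns (correct, mispairs): `correct` maps barcode -> index, and
    `mispairs` lists the (index, barcode) pairs judged to be misindexed.
    Ties are broken by first appearance, which is an arbitrary choice here.
    """
    per_barcode = defaultdict(Counter)
    for index, barcode in reads:
        per_barcode[barcode][index] += 1

    correct = {}
    mispairs = []
    for barcode, index_counts in per_barcode.items():
        best_index, _ = index_counts.most_common(1)[0]
        correct[barcode] = best_index
        mispairs.extend(
            (index, barcode) for index in index_counts if index != best_index
        )
    return correct, mispairs


# Hypothetical reads: barcode ACGTTG is seen 9 times with idxA and once with idxB.
example = [("idxA", "ACGTTG")] * 9 + [("idxB", "ACGTTG")] + [("idxB", "GGATCC")] * 7
correct, mispairs = resolve_index_pairs(example)
```

  • Here the single (idxB, ACGTTG) read is flagged as a mispair and can be left out when the molecules of sample B are counted.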
  • this method can be carried out independently of the sequence of the target nucleic acid molecule.
  • the method may or may not include decoding the nucleic acid sequence of interest. This method corresponds to the first embodiment described below.
  • Misidentification of the index sequence or the barcode sequence affects the accuracy of quantification.
  • If a barcode sequence is recognized as a different sequence because of a mutation (for example, insertion, substitution or deletion) of bases in the sequence, then, regardless of whether or not an index is added, digital quantification, which uses the number of sequence types to determine the original number of molecules before amplification and the like, yields an inaccurate number of molecules.
  • To address base substitutions in a barcode, sequences within a certain distance (Distance) of one another are clustered into one cluster, and the number of molecules is determined from the number of clusters.
  • This addresses the problem that sequences that were originally identical are recognized as different sequences because of base substitutions.
  • Distance refers to the number of bases that differ between two given barcode sequences. For example, if one barcode sequence is exactly the same as another barcode sequence except for one base change at any one position, the distance between these two barcode sequences is 1. If the two barcode sequences are exactly the same except for two base changes at any two positions, the distance between them is 2. If one barcode sequence is exactly the same as another barcode sequence except for three base changes at any three positions, the distance between these two barcode sequences is 3. It is believed that the accuracy of the method of the first embodiment increases as the diversity of barcode sequences increases.
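  • The distance defined above is the number of positions at which two barcodes of equal length differ (commonly called the Hamming distance). A minimal sketch follows; the function name `barcode_distance` and the example sequences are hypothetical.

```python
def barcode_distance(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical barcodes differing at 0, 1, 2 and 3 positions.
d0 = barcode_distance("ACGTAC", "ACGTAC")
d1 = barcode_distance("ACGTAC", "ACGTAA")
d2 = barcode_distance("ACGTAC", "ACCTAA")
d3 = barcode_distance("ACGTAC", "TCCTAA")
```

  • This measure covers substitutions only; insertions and deletions change the length or shift the sequence and are not captured by it.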
  • the value of Distance is not particularly limited as long as it is appropriately determined according to the present disclosure, but is, for example, 1 to 10, preferably 1 to 5, more preferably 1 to 3, and still more preferably 3. It goes without saying that this method can be carried out independently of the sequence of the target nucleic acid molecule. Thus, the method may or may not include decoding the nucleic acid sequence of interest. This method corresponds to the second embodiment described below. In a system for adding an index, it can be used similarly to determine the difference between the indexes.
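  • Clustering within a chosen Distance can be sketched as follows. The text does not fix a particular algorithm at this point, so the greedy scheme below (most-read barcodes become cluster representatives, and less frequent barcodes within the allowed distance are merged into them) is an assumption made for illustration only; all names and sequences are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster_barcodes(barcode_reads, max_distance=1):
    """Greedy clustering: return {representative barcode: total reads}.

    Barcodes are visited from most to least frequent; each one joins the
    first representative of the same length within `max_distance`, or
    otherwise starts a new cluster.
    """
    counts = Counter(barcode_reads)
    clusters = {}  # representative -> reads (insertion order = creation order)
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in clusters:
            if len(rep) == len(seq) and hamming(rep, seq) <= max_distance:
                clusters[rep] += n
                break
        else:
            clusters[seq] = n
    return clusters


# Hypothetical reads: two original molecules, each with one erroneous copy.
example = ["ACGTAC"] * 5 + ["ACGTAA"] + ["TTGCAT"] * 3 + ["TTGCAA"]
clusters = cluster_barcodes(example, max_distance=1)
```

  • Without clustering, this example would show 4 unique barcode sequences; with Distance 1 it gives 2 clusters, which is the number used as the molecule count.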
  • the barcode may or may not be added with an index.
  • To address insertions and deletions (indels), a base at a predetermined position in the barcode sequence is set as a fixed base (i.e., the base at that position is a specific, defined base), and the occurrence of an indel is indicated by the absence of the fixed base at the predetermined position (this method may be referred to herein as "fixed base match filtering"). That is, it is determined that an insertion or deletion of a base has occurred in the barcode sequence when the sequenced barcode sequence contains a base different from the original base at any of the fixed base positions.
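  • Fixed base match filtering can be sketched as a simple position check. The barcode layout used here (fixed bases at 0-based positions 4 and 9 of a 14-base barcode) is a hypothetical design chosen for illustration and is not the design shown in the figures; the function names are likewise assumptions.

```python
def passes_fixed_base_filter(barcode, fixed_bases):
    """True if every fixed position carries its expected base.

    `fixed_bases` maps a 0-based position to the base expected there.
    """
    return all(
        pos < len(barcode) and barcode[pos] == base
        for pos, base in fixed_bases.items()
    )


def filter_barcodes(barcodes, fixed_bases):
    """Keep only barcodes consistent with the fixed-base design."""
    return [b for b in barcodes if passes_fixed_base_filter(b, fixed_bases)]


# Hypothetical design: NNNNANNNNTNNNN (fixed A at position 4, fixed T at position 9).
design = {4: "A", 9: "T"}
intact = "CGTAAGGCATCCGA"    # fixed bases present at positions 4 and 9
deleted = "CGAAGGCATCCGA"    # one base lost near the start; later bases shift left
kept = filter_barcodes([intact, deleted], design)
```

  • A shifted sequence can still show the expected base at a fixed position by chance, which is one reason a design may place several fixed bases along the barcode.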
  • The number of fixed bases in the barcode sequence is not particularly limited as long as it is appropriately determined according to the present disclosure, but it is, for example, 1 to 15, preferably 2 to 12, more preferably 3 to 10, and still more preferably 4 to 6. It goes without saying that this method can be carried out independently of the sequence of the target nucleic acid molecule. Thus, the method may or may not include decoding the nucleic acid sequence of interest. This method corresponds to the third embodiment described below. In a system in which an index is added, it can be used similarly to determine the difference between the indexes.
  • First embodiment of the present invention: According to the first embodiment of the present invention, there is provided a method of determining, from sequence information obtained by sequencing a mixture of multiple samples (multiplex sequencing), the correct pair or mispair of the index and the molecular barcode added to a target nucleic acid molecule, wherein the samples contain target nucleic acid molecules carrying an index unique to each sample containing multiple nucleic acid molecules and a molecular barcode that is unique or arbitrary for each nucleic acid molecule, the method comprising: (E) selecting, from the obtained sequence information, sequences having a specific index or a sequence similar thereto, sequences having a specific molecular barcode or a sequence similar thereto, or sequences containing a target nucleic acid molecule or a sequence similar thereto, and creating a group from the selected sequences; and (F) in the group created in (E) above, determining the pair of index and molecular barcode with the highest detection frequency as the correct pair of index and molecular barcode, and/or the pair of index and molecular barcode with a low detection frequency
  • The method of the invention may further include: (A) separately obtaining a plurality of samples containing nucleic acid molecules (e.g., DNA or RNA) {at least one of the samples contains a target nucleic acid molecule}; (B) {e.g., in each of the plurality of obtained samples, before amplification of the nucleic acid molecules contained in the sample} linking to each target nucleic acid molecule a molecular barcode that is unique or arbitrary for each nucleic acid molecule, thereby obtaining target nucleic acid molecules to which different molecular barcodes are linked; (C) {e.g., before mixing the plurality of samples} adding to the target nucleic acid molecules an index unique to each sample containing a plurality of target nucleic acid molecules, thereby obtaining target nucleic acid molecules to which a different index is linked for each sample of origin (step (C) may be performed after step (B), or step (
  • the nucleic acid molecules can be amplified to obtain amplification products of the target nucleic acid molecules
  • (the samples may be mixed before the amplification products are obtained, or the samples containing the amplification products may be mixed after they are obtained); and sequencing the nucleic acid molecules to which the index unique to each sample and the molecular barcode unique or arbitrary for each target nucleic acid molecule have been added, to determine, for each nucleic acid molecule, the sequence of the index portion, the sequence of the molecular barcode portion and, as needed, the sequence of the target nucleic acid molecule portion linked thereto.
  • any index may be used as long as it has a unique base sequence for each sample.
  • The index may have a predetermined sequence (e.g., the sample of origin can be determined by referring to the sequence), or its sequence may be unknown (e.g., the sample of origin cannot be determined from the sequence itself, but it is known that different sequences correspond to different samples).
  • Molecular barcodes can be made to have sufficient diversity relative to the number of nucleic acid molecules in the sample.
  • The molecular barcode may have any base sequence as long as it has sufficient diversity relative to the number of nucleic acid molecules in the sample.
  • The sequence of the molecular barcode can be a randomly determined sequence, in order to save the time and effort of sequence design.
  • the molecular barcode may have sufficient diversity by including a plurality of randomly determined bases (ie, random bases). In order to ensure the diversity of molecular barcodes, the length of the base sequence of molecular barcodes can be increased.
  • The number of random bases required in the base sequence of the molecular barcode may be determined experimentally based on a graph as exemplified in FIG. Although the present invention is not limited thereto, it can be understood from the Examples that, for example, setting the number of random bases in the base sequence of the molecular barcode to 38 or more secures sufficient diversity for digitally quantifying numbers of molecules up to 10^15. If the four bases are arranged randomly and the base length is 38, the diversity of the molecular barcodes theoretically extends to 4^38 (i.e., about 7.56 × 10^22).
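  • The theoretical diversity quoted above follows directly from the number of random bases. A minimal arithmetic sketch is given below; the function name `barcode_diversity` is hypothetical, and the calculation assumes every random position can take any of the four bases.

```python
def barcode_diversity(n_random_bases, alphabet_size=4):
    """Theoretical number of distinct barcodes with n random positions."""
    return alphabet_size ** n_random_bases


diversity_38 = barcode_diversity(38)          # 4 ** 38, an exact integer
formatted = f"{diversity_38:.2e}"             # scientific notation, 3 significant figures
ratio_to_1e15 = diversity_38 / 10 ** 15       # barcodes available per molecule at 10^15 molecules
```

  • With 38 random bases the theoretical diversity exceeds 10^15 molecules by a factor of roughly 7.6 × 10^7. Fixed bases do not add to this diversity, since they are the same in every barcode.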
  • To ensure sequence diversity, the number of random bases in the molecular barcode is, for example, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 16 or more, 17 or more, 18 or more, 19 or more, or 20 or more. Alternatively, the number of random bases may be 25 or more, 30 or more, 35 or more, or 40 or more.
  • The plurality of samples is 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, or 10 or more samples, and may be any number of samples that can be distinguished from one another; there is no particular upper limit to the number.
  • Sequences having a specific index, sequences having a specific molecular barcode, and/or sequences containing a target nucleic acid molecule can be selected based on sequence identity, and groups can be formed from the selected sequences.
  • By selecting sequences having a specific molecular barcode and forming a group for each molecular barcode from the selected sequences, a number of groups corresponding to the number of types of molecular barcodes can be formed.
  • Likewise, by selecting sequences having a specific index, a number of groups corresponding to the number of indexes (for example, the number of samples when a different index is added to each sample) can be formed. Also, by selecting sequences having a specific target nucleic acid and forming a group from the selected sequences, a nucleic acid group including the target nucleic acid can be obtained.
  • The step of (E) above can be performed by creating a group by clustering, as one group, molecules presumed to have had the same sequence {e.g., the sequence may have been changed in any of the steps (A) to (D)}, based on sequence identity or similarity {preferably in the sequence of the molecular barcode portion}.
  • the above (E) can be implemented in combination with the second embodiment. Details will be described in the second embodiment.
  • the above (E) can be implemented in combination with the second embodiment and the third embodiment. Details will be described in the third embodiment.
  • The index and molecular barcode pair with the highest detection frequency can be determined as the correct pair of index and molecular barcode.
  • At least any one or all of the index and molecular barcode pairs with a low detection frequency may be determined as mispairs of index and molecular barcode.
  • Alternatively, the index and molecular barcode pair with the highest detection frequency may be determined as the correct pair of index and molecular barcode, and at least any one or all of the index and molecular barcode pairs with a low detection frequency may be determined as mispairs of index and molecular barcode.
  • Nucleic acid molecules determined to be mispaired can be excluded from the count of the number of molecules. The determination of the correct pair and the determination of mispairs can each be performed regardless of the sequence of the nucleic acid molecule of interest.
  • The target nucleic acid molecule may be selected before the correct pair and the mispairs are determined, or it may be selected after the correct pair and the mispairs are determined.
  • When sequences having a specific molecular barcode are selected to create a group for each molecular barcode, (i) in step (F), the index and molecular barcode pair with the highest detection frequency in each created group can be determined as the correct pair of index and molecular barcode; or (ii) in step (F), at least any one or all of the index and molecular barcode pairs with a low detection frequency in each created group (for example, pairs with a detection frequency lower than a certain reference value; reference values include, but are not limited to, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, and 1% or less) may be determined as mispairs of index and molecular barcode.
  • In step (F), for nucleic acid sequences having molecular barcodes that appear in a plurality of groups, the barcode and index pair with the highest number of reads can be determined as the correct barcode and index pair, or the index and molecular barcode pair with the highest detection frequency can be determined as the correct index and molecular barcode pair; or (iv) in step (F), for sequences having molecular barcodes that appear in a plurality of groups, at least any one or all of the index and molecular barcode pairs with a low detection frequency (for example, pairs with a detection frequency lower than a predetermined reference value, where the reference value includes, but is not limited to, 50% or less, 40% or less, 30% or less, 20% or less, 10% or less, 5% or less, or 1% or less in the group; or the pairs with the second and subsequent detection frequencies) may be determined as mispairs of index and molecular barcode.
  • When sequences including the target nucleic acid molecule are selected to create a group, (v) in step (F), sequences having a specific index are further selected from the group to create subgroups, and for nucleic acid sequences having a molecular barcode that appears in a plurality of subgroups, the barcode and index pair with the highest number of reads can be determined as the correct barcode and index pair, or the index and molecular barcode pair with the highest detection frequency can be determined as the correct index and molecular barcode pair;
  • in step (F), molecules having a specific molecular barcode are further selected from the group to create subgroups, and in each created subgroup, the index and molecular barcode pair with the highest detection frequency can be determined as the correct pair of index and molecular barcode;
  • molecules having a specific index are further selected from the group to create subgroups, and nucleic acid molecules having molecular barcodes
  • the correct pair of barcode and index sequences can be determined and / or mispairing can be determined.
  • not counting mispairs can improve the accuracy of digital quantification of a target nucleic acid molecule.
  • Second embodiment of the present invention: In digital quantification of nucleic acid molecules using barcode sequences, it has become clear that mutations (insertions, substitutions or deletions) occur in the barcode sequences during analysis and that these mutations affect quantitative accuracy. The second embodiment of the present invention is a method for digitally quantifying a target nucleic acid molecule using a barcode sequence, and relates to grouping (clustering), based on the obtained nucleic acid sequence information, the sequences of mutated molecular barcodes into one group together with the sequences to which they are similar. This seeks to minimize the effects of mutations in barcode sequences that occur during analysis.
  • Similar sequences may have been generated by mutation (insertion, substitution or deletion) from the same sequence. The Examples also suggest that this clustering actually improves the accuracy of digital quantification. More specifically, for example, the step of creating a group may be performed by clustering, as one group, molecules presumed to have had the same sequence {e.g., when (A) to (D) are performed, a sequence mutation may have occurred in any of these steps}, based on sequence identity or similarity {preferably determined in the sequence of the molecular barcode portion}. Accordingly, "a sequence having a specific index or a sequence similar thereto" encompasses both sequences having the specific index and sequences similar to a sequence having the specific index.
  • the obtained nucleic acid sequence is indexed, based on sequence similarity, a barcode and / or a target nucleic acid molecule Can be grouped (clustered).
  • the clustering may be carried out, for example, by classifying into the same cluster nucleic acid molecules whose molecular barcode portion was sequenced as having an insertion or deletion (indel; e.g., of up to 1 base, up to 2 bases, or up to 3 bases).
  • alternatively, the clustering may be performed on the group of nucleic acid molecules that remains after excluding sequences whose molecular barcode portion was sequenced as having an insertion or deletion (indel; e.g., of up to 1 base, up to 2 bases, or up to 3 bases).
  • nucleic acid sequences can be selected according to whether the sequences of a particular barcode are similar, and groups can be created from the selected sequences.
  • similar means that the sequences differ by one, two, three or more bases (eg, insertion, deletion or substitution), but the other bases are identical.
  • the proportion of matching bases between similar base sequences is, for example, 50% or more, 55% or more, 60% or more, 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more , 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more.
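The two notions of similarity described above (a bounded number of differing bases, and a minimum proportion of matching bases) can be sketched as follows. This is a minimal illustration, not the method of the application: the function names are hypothetical, and only substitutions between equal-length sequences are handled, so comparing sequences that differ by an insertion or deletion would need an edit-distance measure instead.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def identity_fraction(a: str, b: str) -> float:
    """Proportion of matching bases between two equal-length sequences."""
    return 1.0 - hamming_distance(a, b) / len(a)


def is_similar(a: str, b: str, max_mismatches: int = 1) -> bool:
    """True if the sequences differ by at most max_mismatches substitutions."""
    return hamming_distance(a, b) <= max_mismatches
```

For example, `identity_fraction("ACGTACGT", "ACGTACGA")` compares two 8-base sequences that differ at one position, so seven of eight bases match.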
  • insertions or deletions may occur in the resulting nucleic acid sequences.
  • Third embodiment of the invention: to detect indels that may occur in a nucleic acid sequence (in particular a barcode sequence), one or more fixed bases (e.g., one, two, three, four, five, or six or more) are placed in every barcode sequence linked to the nucleic acid molecules, and it can be detected whether some (one or more) or all of the fixed bases have changed to a base other than the predetermined fixed base at the original position.
  • each fixed base is usually designed to be any one base selected from the group consisting of A, T, G and C.
  • each fixed base can alternatively be designed to be a base selected from the bases contained in any one combination selected from the group consisting of: A and T; A and G; A and C; T and G; T and C; G and C; A, T and G; A, T and C; A, G and C; and T, G and C. This makes it possible to detect an indel using, as an indicator, that one or more fixed bases are present at a position shifted from the predetermined position and, preferably, as a further indicator, that another base is present at the position where the fixed base should be.
  • a group of nucleic acid molecules sequenced as having an indel may be classified into the same cluster as sequences having no indel, or may be excluded (e.g., nucleic acid molecules sequenced as having an indel may be excluded from the obtained sequence information, or clusters of such nucleic acid molecules may be excluded).
  • when two or more fixed bases are present, one or more other bases are preferably interposed between the fixed bases.
  • "fixed base" refers to a common base that is present, in a plurality of barcode sequences, at a predetermined position from the end (the 5' end, the 3' end, or both the 5' and 3' ends) of the barcode sequence (the common base may be a base determined by design to be common to the plurality of barcode sequences, as described above).
  • A method for analyzing nucleic acids may include: (I) subjecting a mixture of a plurality of target nucleic acid molecules to which a molecular barcode and an index have been added to sequencing to obtain sequence information; (II) selecting, from the sequence information obtained in (I), sequences having a specific index or sequences similar thereto, and/or sequences having a specific molecular barcode or sequences similar thereto, and creating a group from the selected sequences; and (III) determining, in the group created in (II), the index and molecular barcode pair with the highest detection frequency as the correct pair of index and molecular barcode.
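Steps (II) and (III) can be sketched as follows, under simplifying assumptions that are not taken from the source: reads are already reduced to `(index, barcode)` tuples, groups are formed by exact barcode match rather than by similarity, and a tie between indices is broken by first occurrence. The function name `resolve_index_pairs` is hypothetical.

```python
from collections import Counter, defaultdict


def resolve_index_pairs(reads):
    """Pick, for each molecular barcode, the index seen in the most reads.

    reads: iterable of (index, barcode) tuples, one per sequenced read.
    Returns (correct, mispairs): correct maps barcode -> index, and
    mispairs lists every other (index, barcode) pair that was observed.
    """
    counts = defaultdict(Counter)
    for index, barcode in reads:
        counts[barcode][index] += 1  # group reads by barcode, tally indices

    correct = {}
    mispairs = []
    for barcode, per_index in counts.items():
        best_index = per_index.most_common(1)[0][0]  # highest detection frequency
        correct[barcode] = best_index
        mispairs.extend((idx, barcode) for idx in per_index if idx != best_index)
    return correct, mispairs
```

The pairs returned in `mispairs` are the ones that would be left out of the molecule count.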
  • a second embodiment of the present invention is a method of analyzing nucleic acids that may include: (I) subjecting a mixture of a plurality of nucleic acid molecules to which molecular barcodes have been added to sequencing to obtain sequence information; and (II) selecting, from the sequence information obtained in (I), sequences having a specific molecular barcode or sequences similar thereto, and creating a group from the selected sequences.
  • a third embodiment of the present invention is a method of analyzing nucleic acids that may include: (I) subjecting a mixture of a plurality of nucleic acid molecules, to which a molecular barcode having a fixed base at a specific position has been added, to sequencing to obtain sequence information; and (IIa) excluding from analysis sequences having a molecular barcode that does not contain the fixed base at the specific position.
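Step (IIa) amounts to a position check on each barcode read. The sketch below assumes 0-based positions and a hypothetical layout of fixed bases passed in as a dictionary; the `max_missing` parameter is an illustrative way to tolerate some missing fixed bases and defaults to requiring all of them.

```python
def has_fixed_bases(barcode: str, fixed: dict, max_missing: int = 0) -> bool:
    """Check that the expected fixed bases sit at their expected positions.

    fixed maps a 0-based position to the base expected there.
    A position beyond the end of the barcode counts as missing.
    """
    missing = sum(
        1
        for pos, base in fixed.items()
        if pos >= len(barcode) or barcode[pos] != base
    )
    return missing <= max_missing


def filter_barcodes(barcodes, fixed, max_missing=0):
    """Keep only barcodes that pass the fixed-base check."""
    return [bc for bc in barcodes if has_fixed_bases(bc, fixed, max_missing)]
```

Because a single-base insertion or deletion shifts every downstream base, a barcode carrying an indel upstream of a fixed position will usually fail this check, which is what lets the fixed bases act as an indel detector.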
  • the target nucleic acid molecule to which at least a molecular barcode has been added may be subjected to amplification prior to step (I).
  • "a target nucleic acid molecule to which at least a molecular barcode has been added" means that, as long as a molecular barcode has been added, an index may or may not also have been added.
  • the molecular barcode can be attached to the target nucleic acid molecule by a known method, e.g., by amplifying the target nucleic acid molecule (e.g., by polymerase chain reaction) using a primer comprising the molecular barcode sequence.
  • an index may be added to the amplification product of the target nucleic acid molecule to which the molecular barcode has been added.
  • Methods for adding an index to an amplification product include known methods, for example, adapter ligation using an adapter having an index sequence.
  • an index may be added to the target nucleic acid molecule along with the molecular barcode.
  • examples include a method in which the target nucleic acid molecule is amplified (e.g., by polymerase chain reaction) using a known method.
  • the sequence similar to the sequence having the specific molecular barcode in step (II) is a sequence whose molecular barcode portion contains no more than a predetermined number of bases that mismatch the sequence having the specific molecular barcode.
  • the predetermined number of bases is an integer in the range of 1 to 10, 1 to 9, 1 to 8, 1 to 7, 1 to 6, 1 to 5, 1 to 4, 1 to 3, or 1 to 2.
  • the base other than the mismatched base exactly matches the sequence of the specific molecular barcode.
  • the method of the first embodiment can be implemented in combination with the third embodiment.
  • the method of the second embodiment can be implemented in combination with the third embodiment.
  • the molecular barcode may have a fixed base at a particular position.
  • a sequence similar to the sequence having the specific molecular barcode in step (II) may be selected based on whether it comprises the fixed base at the specific position and/or on the fact that the position of the fixed base is shifted from the specific position.
  • the method may further include excluding, from analysis, a sequence having a molecular barcode that does not include the fixed base at the specific position.
  • the method may further include excluding from analysis a sequence having a molecular barcode that does not contain the fixed base at the specific position.
  • excluding from the analysis a sequence having a molecular barcode that does not contain the fixed base at the specific position may be before, after or during clustering.
  • a sequence having a molecular barcode that does not contain the fixed base at the specific position may be excluded from the sequence information of step (I), or may be excluded from the group created in step (II).
  • in step (I) or after step (I), sequence information consisting of sequences comprising the fixed base at the specific position may be obtained.
  • a group consisting of a sequence comprising the fixed base at the specific position may be obtained.
  • step (IIb): in the step (I), or after step (I) the identification in the molecular barcode portion
  • step (II): selecting, from the sequence information obtained in (I) above, a sequence having a specific molecular barcode or a sequence similar thereto, and creating a group from the selected sequences
  • step (IIc): in step (II) or after step (II), obtaining a group consisting of sequences containing the fixed base at the specific position in the molecular barcode portion.
  • the sequence information or group consisting of the sequence containing the fixed base at the specific position in the molecular barcode portion may consist of the sequence containing the fixed base at all specific positions.
  • when the number of fixed bases is n (where n is a natural number), the sequence information or group consisting of sequences including the fixed base at the specific position in the molecular barcode portion may consist of sequences including the fixed bases at all n specific positions, or at n − m of the specific positions (where m can be 1, 2, or 3, or a natural number ranging from 1 to n − 1).
  • in step (III), index and molecular barcode pairs other than the determined correct pair may be determined as mispairs of index and molecular barcode, and the determined mispairs may be excluded from the analysis.
  • the nucleic acid analysis method may further comprise the step of determining, based on the number of groups created from the sequences having the specific molecular barcode or sequences similar thereto, the number of target nucleic acid molecules contained in the sample from which the molecules are derived.
  • the first embodiment of the present invention can be combined with the second embodiment of the present invention.
  • the first embodiment of the present invention can be combined with the third embodiment of the present invention.
  • the first embodiment of the present invention may be combined with the second and third embodiments of the present invention.
  • the second embodiment of the present invention can be combined with the third embodiment of the present invention.
  • a fourth embodiment of the present invention is a method for digitally quantifying a target nucleic acid molecule using a barcode sequence, comprising carrying out an embodiment selected from the group consisting of the first embodiment, the second embodiment, the third embodiment, and combinations thereof.
  • a fourth embodiment of the present invention may be a method for digital quantification of a target nucleic acid molecule using barcode sequences, comprising: (e) selecting, from the obtained sequence information, nucleic acid molecules containing the sequence of the target nucleic acid molecule; (f) clustering the nucleic acid molecules selected in (e) by unique molecular barcode sequence, and then identifying clusters that contain more than one sequence in the index portion; and (g) in each of the clusters identified in (f), identifying the index and molecular barcode pair with the highest detection frequency as a correctly indexed target nucleic acid molecule, identifying the other index and molecular barcode pairs as mispairs, and determining the number of target nucleic acid molecules contained in the sample corresponding to the index based on the number of distinct unique molecular barcode sequences linked to the correctly indexed target nucleic acid molecules (or the number of clusters of correctly indexed target nucleic acid molecules).
  • in step (g), the number of distinct unique molecular barcode sequences linked to the correctly indexed target nucleic acid molecules (or the number of clusters of correctly indexed target nucleic acid molecules) may be determined as the number of target nucleic acid molecules contained in the sample corresponding to the index; in principle, the accuracy of quantification is considered to increase as the number of reads increases.
  • the fourth embodiment of the present invention may include: (a) separately obtaining a plurality of samples comprising nucleic acid molecules (e.g., DNA or RNA) {at least one of the samples comprising the target nucleic acid molecule}; (b) before amplifying the nucleic acid molecules contained in the samples, linking an arbitrary molecular barcode to each target nucleic acid molecule in each of the obtained samples, to obtain target nucleic acid molecules to which mutually different molecular barcodes are linked; (c) before mixing the samples, adding to the target nucleic acid molecules an index unique to each sample containing a plurality of target nucleic acid molecules, to obtain a library of target nucleic acid molecules to which a different index is linked for each sample of origin (either of steps (b) and (c) may be performed first; after step (b) or (c), the nucleic acid molecules can be amplified to obtain an amplification product of the target nucleic acid molecules; and the samples may be mixed either before or after the amplification product is obtained); and (d) sequencing the nucleic acid molecules to which an index unique to each sample and a molecular barcode unique or arbitrary for each target nucleic acid molecule have been added, to identify, for each nucleic acid molecule, the sequence of the index portion, the sequence of the molecular barcode portion, and the sequence of the nucleic acid molecule portion linked thereto.
  • the clustering may be performed by: (i) classifying into the same cluster nucleic acid molecules whose molecular barcode portion has the same sequence as a unique molecular barcode sequence; (ii) classifying into the same cluster nucleic acid molecules whose molecular barcode portion has the unique molecular barcode sequence or a sequence with up to 1 base of mismatch from it; (iii) classifying into the same cluster nucleic acid molecules whose molecular barcode portion has the unique molecular barcode sequence or a sequence with up to 2 bases of mismatch from it; or (iv) classifying into the same cluster nucleic acid molecules whose molecular barcode portion has the unique molecular barcode sequence or a sequence with up to 3 bases of mismatch from it.
  • clustering may be performed by classifying into the same cluster nucleic acid molecules whose molecular barcode portion was sequenced as having an insertion or deletion (indel) of bases (for example, up to 1 base, up to 2 bases, or up to 3 bases). At this time, the molecular barcode having the fixed base described in the third embodiment may be used.
  • clustering may be performed on the group of nucleic acid molecules obtained by excluding sequences whose molecular barcode portion was sequenced as having an insertion or deletion (indel) of bases (e.g., up to 1 base, up to 2 bases, or up to 3 bases).
  • the molecular barcode having the fixed base described in the third embodiment may be used.
  • accurate digital quantification becomes possible by labeling each of the target nucleic acid molecules, obtaining a sufficiently large number of reads compared to the number of original target nucleic acid molecules, and detecting all the molecular barcodes attached to the target nucleic acid molecules.
  • It is used for many applications on next-generation sequencing platforms, such as gene expression analysis by RNA sequencing (RNA-Seq) 2-7, iCLIP (individual-nucleotide resolution UV cross-linking and immunoprecipitation) 8, antibody repertoire analysis 9, bacterial 16S rRNA gene analysis 10,11, and ChIP-nexus (chromatin immunoprecipitation experiments with nucleotide resolution through exonuclease, unique barcode and single ligation) 12. These methods make it possible to determine digitally and accurately the absolute number of molecules in a given sample, even in the presence of noise and/or bias in the measurement system.
  • RNA-Seq using molecular barcodes ie digital RNA-Seq (dRNA-Seq) 3 or quantitative RNA-Seq 13 is one of the most widely used applications of digital counting.
  • dRNA-Seq is often used for single cell gene expression analysis as it works well even for small sample sizes.
  • the detection limit is important in such measurements, because single cells have been shown to contain many low-copy RNAs 13,14, and the detection limit implies that many low-copy RNAs may go undetected, which can affect the subsequent interpretation of biological phenomena. Because the barcode system used determines the detection limit of nucleic acid quantification, investigating the efficacy of barcodes for absolute, digital quantification is crucial.
  • this is because, in RNA-Seq studies, barcodes can be used to identify thousands of individual cells in a single sequencing run 7.
  • the general procedure for digital quantification of nucleic acid molecules is as follows (see FIG. 1, panel A). (i) Uniquely tag each RNA (or complementary DNA, cDNA) or DNA with externally added DNA (molecular barcodes) containing diverse sequences 1-3. (ii) Amplify the barcoded DNA or cDNA (produced from RNA when starting from RNA).
  • When sequence-limited barcodes were used previously, the diversity of barcode sequences required for accurate quantification was estimated by theoretical calculation 16, and the capacity of this technology for absolute quantification of barcoded molecules was confirmed experimentally 3,16.
  • The use of sequence-limited barcodes has the following disadvantage: many different individually designed barcode sequences must be prepared to measure over a high dynamic range, which is not cost-effective. To minimize costs while increasing the dynamic range of counting, random (or pseudorandom) base barcodes are used instead 2,4-9,11,12,18,20. Here too, it must be confirmed that the sequence diversity of the barcode set is sufficient 17,18.
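As a rough illustration of why barcode diversity matters, the sketch below estimates how many distinct barcodes are expected when N molecules each draw a barcode uniformly at random from the 4^n sequences available with n random bases. This is an idealized model chosen for illustration, not the calculation used in the cited references: it assumes uniform, independent labeling and no amplification or sequencing errors, so real requirements will be higher. The function names are hypothetical.

```python
import math


def expected_distinct_barcodes(n_molecules: int, n_random_bases: int) -> float:
    """Expected number of distinct barcodes among n_molecules random draws.

    With M = 4**n_random_bases possible barcodes, the expectation is
    M * (1 - (1 - 1/M)**N); log1p/expm1 keep it accurate for large M.
    """
    m = 4 ** n_random_bases
    return -m * math.expm1(n_molecules * math.log1p(-1.0 / m))


def min_random_bases(n_molecules: int, target_fraction: float = 0.95,
                     max_bases: int = 50) -> int:
    """Smallest n for which distinct barcodes / molecules >= target_fraction."""
    for n in range(1, max_bases + 1):
        if expected_distinct_barcodes(n_molecules, n) / n_molecules >= target_fraction:
            return n
    raise ValueError("target not reached within max_bases")
```

Calling `min_random_bases(100_000)` gives the shortest random barcode for which, in this error-free model, fewer than 5% of molecules are lost to barcode collisions.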
  • Thermal cycling (ProFlex PCR System; Thermo Fisher Scientific Inc.) was performed as follows: half cycle at 98 °C; 4 cycles of 98 °C for 10 seconds, 60 °C for 10 seconds, and 68 °C for 1 minute; 19 cycles of 98 °C for 10 seconds, 60 °C for 2 seconds, and 68 °C for 1 minute; 68 cycles of 1 ⁇ 5 of the cycle. The amplified products were then column-purified twice (DNA Clean & Concentrator™-5; Zymo Research Corp., CA, USA), and the length distribution of the amplified products was confirmed using a 2100 Bioanalyzer (Agilent Technologies, Inc., CA, USA). Concentrations were determined with a qPCR kit (KK4602; KAPA Biosystems, Inc., MA, USA) on a real-time PCR system (7500; Thermo Fisher Scientific Inc.).
  • Sequencing: the two index-tagged samples (CGCTCATT: index A; GAGATTCC: index B) were sequenced in a single run on a MiSeq sequencer (Illumina, Inc.) using a 150-cycle v3 kit (Read 1: 100 cycles; Read 2: 50 cycles; Index 1: 8 cycles). Read 2 was not used for analysis because the sequence in Read 2 is part of the sequence in Read 1.
  • Raw sequence data used for analysis were deposited in the GEO database under accession GSE94895.
  • the barcode region was 50 bp from the 5 'end for the long template and 30 bp from the 5' end for the short template (see “barcode” in FIG. 13) and these were extracted from the mapped reads.
  • the fixed bases in the barcode region (up to 6 bases for short templates and up to 12 bases for long templates; see "barcode" in FIG. 13) were used for filtering, and barcodes having at least one fixed-base mismatch were excluded.
  • the reads with index A and index B were combined prior to clustering when cross-contamination of the index was taken into account.
  • Nucleotide Sequence Clusterizer: in-house software named "Nucleotide Sequence Clusterizer" was written in C for clustering. This tool clusters DNA sequences using specified nucleotide positions of each sequence. It performs bounded single-link clustering: initially, each sequence is in its own cluster; if any two sequences differ from each other by no more than D mismatches, their clusters are merged. Here, D is a configurable "Distance" parameter. This process continues until there are no more clusters to merge, at which point the Nucleotide Sequence Clusterizer reports the number of clusters and the sequences within each cluster. The Nucleotide Sequence Clusterizer is available upon request.
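The bounded single-link clustering described above can be sketched in a few lines. This is an independent illustration in Python, not the Nucleotide Sequence Clusterizer itself (which the source says was written in C and is available on request): the function name, the union-find bookkeeping, and the simple all-pairs comparison are assumptions, and all sequences are taken to have equal length.

```python
def cluster_sequences(seqs, max_distance, positions=None):
    """Bounded single-link clustering of equal-length sequences.

    Two sequences are linked if they differ at no more than max_distance
    of the compared positions; clusters are the connected components.
    positions (optional) restricts the comparison to chosen 0-based indices.
    """
    seqs = list(seqs)
    parent = list(range(len(seqs)))  # union-find forest, one node per sequence

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def key(s):
        return s if positions is None else "".join(s[p] for p in positions)

    keys = [key(s) for s in seqs]
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            mismatches = sum(a != b for a, b in zip(keys[i], keys[j]))
            if mismatches <= max_distance:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two clusters

    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())
```

Note the single-link behavior: with D = 1, the sequences AAAA, AAAT, and AATT end up in one cluster even though AAAA and AATT differ at two positions, because AAAT links them. The all-pairs loop is quadratic in the number of sequences, so a production tool would need a faster neighbor search.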
  • the nucleic acid molecules of LT1 to 6 were designed to consist, from the 5' end to the 3' end, of the sequence of SEQ ID NO: 1, a barcode sequence, a target nucleic acid sequence, and the sequence of SEQ ID NO: 2.
  • the barcode sequences of LT1-6 and the target nucleic acid sequences are shown in SEQ ID NOs: 5-16.
  • the nucleic acid molecules of ST1 to 5 were designed to consist, from the 5' end to the 3' end, of the sequence of SEQ ID NO: 3, a barcode sequence, a target nucleic acid sequence, and the sequence of SEQ ID NO: 4.
  • the barcode sequences of ST1-5 and the target nucleic acid sequences are shown in SEQ ID NOs: 17-26.
  • the template DNAs all contained random base barcodes, shown as molecular barcodes in panel A of FIG. 1; the long template had its target nucleic acid sequence downstream of a 50-base barcode consisting of 38 random bases and 12 fixed bases.
  • the short template had a target nucleic acid sequence of 8 bases downstream of a 30-base barcode consisting of 24 random bases and 6 fixed bases (see FIG. 13).
  • all templates contained common sequences at both the 5 'and 3' ends used for PCR amplification (see Figures 13 and 14). In this example, 40 000, 40000, 4000, 300, 100 and 20 copies of LT1, LT2, LT3, LT4, LT5 and LT6, 20000 copies of ST1 and ST2, and 4000 copies, respectively, as model measurement samples.
  • the sequences of index A and index B were added to the templates by including them in the reverse primers used for amplification (see FIG. 14).
  • Sequence of reverse primer for amplification of index A (Rv primer in FIG. 14): CAAGCAGAGAGACGGCATACGAGAT AATGAGCG GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 28)
  • Sequence of reverse primer for amplification of index B (Rv primer 2 in FIG.
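Because the index is carried on the reverse primer, it appears there as its reverse complement. The short check below illustrates this with the index A sequence (CGCTCATT) and the index A reverse primer quoted above; the variable and function names are chosen for illustration only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


INDEX_A = "CGCTCATT"
# Reverse primer for index A as quoted in the text (SEQ ID NO: 28), spaces removed.
RV_PRIMER_A = (
    "CAAGCAGAGAGACGGCATACGAGAT"
    "AATGAGCG"
    "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
)

# The 8-base block in the middle of the primer is the reverse complement of index A.
index_in_primer = reverse_complement(INDEX_A)
primer_carries_index = index_in_primer in RV_PRIMER_A
```

Here `index_in_primer` evaluates to `AATGAGCG`, the 8-base block set off by spaces in the primer sequence.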
  • because the number of unique barcodes identified should not depend on coverage (sequencing depth) once coverage reaches a sufficient level, a plateau should be observed in the plot of the number of unique barcodes determined against coverage. Even if the sequencing depth (i.e., the number of times each barcode is read) is increased, the measured number of original target nucleic acid sequences should not increase beyond the original copy number of 20000, so the plateau is expected to be at 20000. However, this expected plateau was not observed, and the number of unique barcodes determined increased monotonically as coverage increased. This implies that the digital counting system needs improvement under this condition.
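The coverage dependence discussed above can be examined by subsampling reads and counting distinct barcodes at each depth. The sketch below is one simple way to do this and is not the analysis code of the source; the function name, the use of nested random subsamples, and the fixed seed are assumptions.

```python
import random


def saturation_curve(read_barcodes, fractions, seed=0):
    """Count distinct barcodes in nested random subsamples of the reads.

    read_barcodes: one barcode string per read.
    fractions: increasing fractions of the reads to sample (e.g. 0.1 to 1.0).
    Returns a list of (fraction, number_of_distinct_barcodes).
    """
    rng = random.Random(seed)
    reads = list(read_barcodes)
    rng.shuffle(reads)  # a prefix of the shuffled list is a random subsample
    curve = []
    for f in fractions:
        k = int(round(f * len(reads)))
        curve.append((f, len(set(reads[:k]))))
    return curve
```

With error-free barcodes the curve flattens at the true number of molecules once every barcode has been seen. Barcodes altered by errors keep adding new "unique" sequences as depth grows, which is consistent with the monotonic increase reported above when no correction is applied.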
  • the number of barcode clusters decreased as the position of the fixed base moved away from the sequencing primer site. This is reasonable as fixed base mismatches can detect indel-type sequence changes that occur between the sequence initiation site and the position of the fixed base. Also, the dependence of the number of fixed bases on the number of barcode clusters determined was analyzed. At this time, the fixed base located at the farthest position from the sequence primer site was used (see panel C of FIG. 3 and panel C of FIG. 9). When the number of fixed bases used was small, the number of barcode clusters determined decreased significantly, and as the number of fixed bases used increased, the number of barcode clusters determined became almost constant. To observe the effect of mismatch exclusion on the above two requirements for accurate digital quantification, the number of random bases (panel A in FIG.
  • the possibility of being added to the tube is considered to be very small. Therefore, it was considered that a PCR primer containing the specific index was mixed into the tube, that the index sequence contained an error, and/or that index switching occurred in the sequencing step (Sinha, R. et al., bioRxiv, 10.1101/125724 (2017)). To eliminate this effect, all the reads sorted into the two indices were first mixed for each template, and clustering was performed on these mixed reads.
  • each barcode cluster was then counted as belonging to the index that accounted for the highest number of its reads. Using this process, it was finally found that the determined number of clusters showed a plateau as a function of coverage (see the yellow lines in panel C of FIG. 2 and panel C of FIG. 8). Importantly, whereas the blue lines in panel C of FIG. 2 and panel C of FIG. 8 show that the number of clusters determined increases slightly with coverage, once the effect of index switching was excluded by the above process, the number of clusters showed a plateau even as coverage increased.
  • index switching affects the accuracy of barcode clustering, and the process of excluding index switching (mis-indexing) is shown to allow a digital quantification system with improved accuracy that is not affected by coverage.
  • the number was 4 (for all templates, barcodes with fixed bases at the 16th, 21st, 24th, and 28th positions from the left were used; FIG. 13). Cross-contamination of the index and misidentification of the template were also taken into account. It is believed that all of the above quantitative analysis and insights can be used to accurately quantify the target nucleic acid molecule using the digital counting scheme of the present invention. Based on these conditions, the two requirements were examined for all the templates to determine the dynamic range of this digital counting system (see panels A and B of FIG. 4 and FIG. 11). For the coverage dependence, 20 random bases were used for clustering (panel A of FIG. 4 and FIG. 11), and for the dependence on the number of random bases, 10% of the original total number of reads was analyzed (panel B of FIG. 4).
  • the number of random bases required to count the absolute number of molecules in the presence of errors can be presented (see FIG. 5, panel A).
  • the x-axis indicates the number of inputs of the molecule to be measured, and the y-axis indicates the number of random bases when each curve in panel B of FIG. 4 and panel B of FIG. Show.
  • Panel B in FIG. 5 shows the dependence of the relative number of clusters on the number of random bases as done in panel B of FIG. Did not have a significant effect). By including these data in panel A of FIG. 5 to show more data in the lower range of molecule numbers, it was found that, for example, at least 16 random bases are required to quantify about 10^5 molecules with an accuracy of over 95%.
  • molecular barcodes were used to identify sample cross-contamination (caused by physical contamination of primers, errors in the index, and/or index switching in the sequencing process) or misidentification of the target nucleic acid sequence in the alignment process.
  • the former can solve the serious problems reported in the next generation sequencer platform 23, 24 .
  • the effectiveness of random base barcodes has been demonstrated in general applications, and the strategies for validating barcode usage presented here are applicable to various platforms.
  • the effectiveness of random base barcodes on barcoded molecules has been demonstrated, so that one skilled in the art, who can vary from application to application, can evaluate the effectiveness or efficacy of barcode addition.
  • the present invention can be widely used in digital counting for nucleic acid quantification using molecular barcodes, including gene expression analysis, iCLIP 8, antibody repertoire analysis 9, bacterial 16S rRNA gene analysis 10,11, and counting of molecules in ChIP-nexus 12, as well as in other applications using barcodes, such as cells 9,25,26 and viruses 1. Recently, commercially available apparatus such as Single Cell Sequencing Solution (Illumina, Inc., CA, USA and Bio-Rad Laboratories, Inc., CA, USA) and Chromium Single Cell 3' Solution (10x Genomics, Inc., CA, USA) may be used to perform these applications. We believe that systems biology will be advanced by the large amount of quantitative data obtained experimentally.
  • SEQ ID NO: 1 Base sequence of the 5' region of LT1 to 6
  • SEQ ID NO: 2 Base sequence of the 3' region of LT1 to 6
  • SEQ ID NO: 3 Base sequence of the 5' region of ST1 to 5
  • SEQ ID NO: 4 Base sequence of the 3' region of ST1 to 5
  • SEQ ID NO: 5 LT1 barcode sequence
  • SEQ ID NO: 6 LT1 target nucleic acid sequence
  • SEQ ID NO: 8 LT2 target nucleic acid sequence
  • SEQ ID NO: 9 LT3 barcode sequence
  • SEQ ID NO: 10 LT3 target nucleic acid sequence
  • SEQ ID NO: 12 LT4 target nucleic acid sequence
  • SEQ ID NO: 13 LT5 barcode sequence
  • SEQ ID NO: 14 LT5
  • SEQ ID NO: 16 LT

Abstract

The invention provides a method for correcting errors that arise in digital quantification of nucleic acids using molecular barcodes. Specifically, the invention provides: a method that identifies mispairing between a molecular barcode and an index sequence based on detection frequency; a method that assigns molecular barcodes having base substitutions to the same group; and a method that identifies molecular barcodes having an insertion or deletion by using molecular barcodes containing fixed bases and random bases.
PCT/JP2018/023778 2017-06-23 2018-06-22 Procédé pour séquençage et procédé pour analyse d'acides nucléiques WO2018235938A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2019525702A JP7160349B2 (ja) 2017-06-23 2018-06-22 核酸をシークエンシングする方法および解析する方法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762523857P 2017-06-23 2017-06-23
US62/523,857 2017-06-23

Publications (1)

Publication Number Publication Date
WO2018235938A1 true WO2018235938A1 (fr) 2018-12-27

Family

ID=64735948

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/023778 WO2018235938A1 (fr) 2017-06-23 2018-06-22 Method for sequencing and method for analyzing nucleic acids

Country Status (2)

Country Link
JP (1) JP7160349B2 (fr)
WO (1) WO2018235938A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021241721A1 (fr) 2020-05-29 2021-12-02 国立研究開発法人理化学研究所 Method for treating a cell population and method for analyzing genes included in a cell population
WO2023092872A1 (fr) * 2021-11-26 2023-06-01 广州达安基因股份有限公司 High-throughput sequencing method based on internal reference of a known marker

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150275289A1 (en) * 2012-05-31 2015-10-01 Board Of Regents, The University Of Texas System Method for Accurate Sequencing of DNA
WO2016002875A1 (fr) * 2014-07-02 2016-01-07 株式会社Dnaチップ研究所 Method for counting nucleic acid molecules
JP2017503512A (ja) * 2014-01-16 2017-02-02 イラミーナ インコーポレーテッド Amplicon preparation and sequencing on solid supports
JP2017514487A (ja) * 2014-04-29 2017-06-08 イルミナ インコーポレイテッド Multiplexed single-cell gene expression analysis using template switching and tagmentation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MATSUMOTO, Y. ET AL.: "A framework for digital RNA sequencing (dRNA-seq) and its application for gene expression profiling of small number of immune-related cells", EUROPEAN JOURNAL OF IMMUNOLOGY, vol. 46, no. suppl. 1, 1203, 2016, pages 1162, XP055655719 *
MATSUMOTO, YUKI ET AL.: "Frame for digital RNA-seq: Accurate quantification of nucleic acid molecule by random barcode and its evaluation criteria (non-official translation)", ABSTRACTS OF COLLABORATIVE CONFERENCE OF THE 38TH ANNUAL CONFERENCE OF THE MOLECULAR BIOLOGY SOCIETY OF JAPAN AND THE 88TH CONFERENCE OF THE JAPANESE BIOCHEMICAL SOCIETY, 2015 *
SHIROGUCHI, KATSUYUKI: "Development of genome-wide digital quantification method for non-coding RNAs which has single-base identification and single-molecule level quantification", RESEARCH PERFORMANCE REPORT ON KAKENHI, no. 15K14425, 22 March 2018 (2018-03-22) *
SHIROGUCHI, KATSUYUKI: "Standardization of genome-wide absolute and accurate counting of RNA molecules", RESEARCH PERFORMANCE REPORT ON KAKENHI, no. 26640119, 3 June 2016 (2016-06-03) *

Also Published As

Publication number Publication date
JPWO2018235938A1 (ja) 2020-04-23
JP7160349B2 (ja) 2022-10-25

Similar Documents

Publication Publication Date Title
EP3601598B1 (fr) Methods for targeted nucleic acid sequence enrichment with applications to error-corrected nucleic acid sequencing
JP7011392B2 (ja) Methods and compositions for DNA profiling
US10619163B2 (en) Tagging and assessing a target sequence
Mamanova et al. Target-enrichment strategies for next-generation sequencing
JP7332733B2 (ja) High molecular weight DNA sample tracking tags for next-generation sequencing
US20110257031A1 (en) Nucleic acid, biomolecule and polymer identifier codes
US9334532B2 (en) Complexity reduction method
WO2017020024A2 (fr) Systems and methods for genetic analysis
CN110878345A (zh) Increasing confidence of allele calls with molecular counting
US20160115544A1 (en) Molecular barcoding for multiplex sequencing
KR20220025213A (ko) Sequencing process
Ogawa et al. The efficacy and further functional advantages of random-base molecular barcodes for absolute and digital quantification of nucleic acid molecules
JP7160349B2 (ja) Method for sequencing and method for analyzing nucleic acids
US20140287946A1 (en) Nucleic acid control panels
EP2510114B1 (fr) RNA analytical method
WO2011145614A1 (fr) Method for designing a probe for detecting a nucleic acid reference material, probe for detecting a nucleic acid reference material, and nucleic acid detection system having a probe for detecting a nucleic acid reference material
Edwards Whole-genome sequencing for marker discovery
CN110582577A (zh) Library quantification and identification

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18820103; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2019525702; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 18820103; Country of ref document: EP; Kind code of ref document: A1)