GB2599049A - Analysis system and analysis method - Google Patents

Info

Publication number
GB2599049A
Authority
GB
United Kingdom
Prior art keywords
time
series data
kinds
color
electrophoresis device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB2117852.0A
Other versions
GB2599049B (en)
GB202117852D0 (en)
Inventor
Anazawa Takashi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi High Tech Corp
Original Assignee
Hitachi High Tech Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi High Tech Corp
Priority to GB2117852.0A (GB2599049B)
Priority claimed from GB1910590.7A (GB2573692B)
Publication of GB202117852D0
Publication of GB2599049A
Application granted
Publication of GB2599049B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/26Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
    • G01N27/416Systems
    • G01N27/447Systems using electrophoresis
    • G01N27/44704Details; Accessories
    • G01N27/44717Arrangements for investigating the separated zones, e.g. localising zones
    • G01N27/44721Arrangements for investigating the separated zones, e.g. localising zones by optical means
    • G01N27/44726Arrangements for investigating the separated zones, e.g. localising zones by optical means using specific dyes, markers or binding molecules
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/26Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
    • G01N27/416Systems
    • G01N27/447Systems using electrophoresis
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/64Fluorescence; Phosphorescence
    • G01N21/6428Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes"
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/64Fluorescence; Phosphorescence
    • G01N21/6486Measuring fluorescence of biological material, e.g. DNA, RNA, cells
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869Methods for sequencing
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2565/00Nucleic acid analysis characterised by mode or means of detection
    • C12Q2565/60Detection means characterised by use of a special device
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/64Fluorescence; Phosphorescence
    • G01N21/6428Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes"
    • G01N2021/6439Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes" with indicators, stains, dyes, tags, labels, marks
    • G01N2021/6441Measuring fluorescence of fluorescent products of reactions or of fluorochrome labelled reactive substances, e.g. measuring quenching effects, using measuring "optrodes" with indicators, stains, dyes, tags, labels, marks with two or more labels
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/64Fluorescence; Phosphorescence
    • G01N21/645Specially adapted constructive features of fluorimeters
    • G01N2021/6463Optics
    • G01N2021/6471Special filters, filter wheel
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/88Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86
    • G01N2030/8809Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample
    • G01N2030/8813Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials
    • G01N2030/8827Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials involving nucleic acids
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/64Fluorescence; Phosphorescence
    • G01N21/6408Fluorescence; Phosphorescence with measurement of decay time, time resolved fluorescence
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/26Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
    • G01N27/416Systems
    • G01N27/447Systems using electrophoresis
    • G01N27/44704Details; Accessories
    • G01N27/44717Arrangements for investigating the separated zones, e.g. localising zones
    • G01N27/44721Arrangements for investigating the separated zones, e.g. localising zones by optical means
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/62Detectors specially adapted therefor
    • G01N30/74Optical detectors

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Electrochemistry (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Optics & Photonics (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)

Abstract

A capillary electrophoresis device comprising: a sample containing four or more types of fluorophores; a capillary 1 for performing electrophoretic analysis on the sample; a light source to irradiate the capillary with a laser beam 12; an optical system 14 to condense fluorescence 13 emitted from a light-emitting point 15 of the capillary 1 when irradiated by the laser beam 12; and an RGB colour sensor to measure an image of the light-emitting point 15 generated by the optical system 14. The device may perform DNA sequencing or DNA fragment analysis by the electrophoretic analysis. Also provided is a capillary electrophoresis device comprising: a plurality of samples each containing four or more types of fluorophores; a plurality of capillaries; a capillary array; a light source to irradiate the capillary array; an optical system to condense fluorescence emitted from light-emitting points of the plurality of capillaries; and an RGB colour area sensor that measures an image of the fluorescence emitted from the light-emitting points, the image being generated by the optical system.

Description
Title of Invention: ANALYSIS SYSTEM AND ANALYSIS METHOD
Technical Field
[0001] The present invention relates to an analysis system and an analysis method.
Background Art
[0002] Analysis methods are widely used in which a plurality of kinds of components included in a sample, including biological samples such as DNA, proteins, and cells, are labeled with a plurality of kinds of fluorescent substances (the correspondence is not always one-to-one), the fluorescence emitted from the plurality of kinds of fluorescent substances is detected and identified, and the plurality of kinds of components are thereby analyzed. Examples of such analysis methods include chromatography, DNA sequencing, DNA fragment analysis, flow cytometry, PCR, HPLC, Western blot, Northern blot, Southern blot, and microscopic observation.
[0003] Generally, the fluorescence spectra of a plurality of kinds of fluorescent substances overlap with each other (hereinafter referred to as spectral overlap). The fluorescence emissions from a plurality of kinds of fluorescent substances also overlap with each other temporally or spatially (hereinafter referred to as space-time overlap). When these overlaps are present, a technique is needed that detects and identifies the fluorescence emitted from each of the plurality of kinds of fluorescent substances.
[0004] Next, this technique will be described, taking as an example a DNA sequencer using electrophoresis. DNA sequencers using electrophoresis began with slab electrophoresis in the 1980s and shifted to capillary electrophoresis from the 1990s onward. However, the technique used to solve the above problem has remained basically unchanged. Fig. 3 of Nonpatent Literature 1 shows the method for slab electrophoresis. Note that the same technique is also used in capillary electrophoresis. The technique consists of processes (1) to (4) below under the condition M ≤ N.
(1) A sample including M kinds of DNA fragments labeled with M kinds of fluorescent substances is irradiated with a laser beam while being separated by electrophoresis, so that the fluorescent substances emit fluorescence. The fluorescence is detected in N colors, i.e., in N kinds of wavelength bands, and time-series data of the N-color fluorescence intensities is thereby acquired.
(2) Color conversion is applied at each time of the time-series data of (1), and time-series data of the concentrations of the M kinds of fluorescent substances, i.e., the M kinds of DNA fragments, is acquired.
(3) Correction based on the difference in mobility among the M kinds of fluorescent substances (hereinafter referred to as mobility correction) is applied to the time-series data of (2), and mobility-corrected time-series data of the concentrations of the M kinds of fluorescent substances, i.e., the M kinds of DNA fragments, is acquired.
(4) Base-calling is performed based on the time-series data of (3).
[0005] On the basis of the processes (1) to (4), Fig. 3 of Nonpatent Literature 1 will be described in detail. Here, M = 4 and N = 4. In the following, the case in which a single sample is analyzed using a single electrophoresis channel will be described. In the case in which a plurality of samples is analyzed using a plurality of electrophoresis channels, the processes below are performed in parallel.
[0006] First, (1) will be described. When copies of DNA fragments of various lengths are prepared from a template DNA by the Sanger reaction, each DNA fragment is labeled with one of four kinds of fluorescent substances corresponding to its terminal base species C, A, G, or T (hereinafter, for simplicity, these fluorescent substances are referred to as C, A, G, and T). The DNA fragments labeled with the fluorescent substances are sequentially irradiated with a laser beam while being separated by length through electrophoresis, so that the fluorescent substances emit fluorescence. The emitted fluorescence is detected in four kinds of wavelength bands b, g, y, and r (hereinafter referred to as four-color detection). The wavelength bands may correspond to the maximum emission wavelengths of C, A, G, and T. Thus, time-series data of the fluorescence intensities I(b), I(g), I(y), and I(r) of these four colors is acquired. Fig. 3A of Nonpatent Literature 1 shows this time-series data (also referred to as raw data), in which I(b), I(g), I(y), and I(r) are indicated by blue, green, black, and red, respectively.
[0007] Subsequently, (2) will be described. The fluorescence intensities of four colors at each time are expressed by the concentrations D(C), D(A), D(G), and D(T) of four kinds of fluorescent substances C, A, G, and T at that time as Equation (1).
[Equation 1]
\[
\begin{pmatrix} I(b) \\ I(g) \\ I(y) \\ I(r) \end{pmatrix}
=
\begin{pmatrix}
w(bC) & w(bA) & w(bG) & w(bT) \\
w(gC) & w(gA) & w(gG) & w(gT) \\
w(yC) & w(yA) & w(yG) & w(yT) \\
w(rC) & w(rA) & w(rG) & w(rT)
\end{pmatrix}
\begin{pmatrix} D(C) \\ D(A) \\ D(G) \\ D(T) \end{pmatrix}
\tag{1}
\]
Here, elements w(XY) of a four-by-four matrix W express the spectral-overlap-based intensity ratio at which a fluorescent substance Y (C, A, G, or T) is detected in a wavelength band X (b, g, y, or r). w(XY) is the fixed value determined only by the characteristics of the fluorescent substance Y (C, A, G, or T) and the wavelength band X (b, g, y, or r), and is not changed during electrophoresis.
[0008] Therefore, the concentrations of the four kinds of fluorescent substances at each time are found from the fluorescence intensities of the four colors at that time as Equation (2).
[Equation 2]
\[
\begin{pmatrix} D(C) \\ D(A) \\ D(G) \\ D(T) \end{pmatrix}
=
\begin{pmatrix}
w(bC) & w(bA) & w(bG) & w(bT) \\
w(gC) & w(gA) & w(gG) & w(gT) \\
w(yC) & w(yA) & w(yG) & w(yT) \\
w(rC) & w(rA) & w(rG) & w(rT)
\end{pmatrix}^{-1}
\begin{pmatrix} I(b) \\ I(g) \\ I(y) \\ I(r) \end{pmatrix}
\tag{2}
\]
As described above, the fluorescence intensities of the four colors are multiplied by the inverse matrix W^{-1} of the matrix W, and the spectral overlap is thereby resolved (hereinafter, this process is referred to as color conversion). Thus, time-series data of the concentrations of the four kinds of fluorescent substances C, A, G, and T, i.e., of the concentrations of the DNA fragments having the four kinds of bases at their ends, is acquired.
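The color conversion of Equation (2) can be sketched in a few lines of Python. This is only a minimal illustration: the numerical values of W are hypothetical (they are not taken from the specification or from Nonpatent Literature 1), and the helper names `solve_linear` and `color_convert` are introduced here for the example. Instead of forming the inverse matrix explicitly, the sketch solves Equation (1) for the concentrations by Gaussian elimination, which gives the same result when W is invertible.

```python
def solve_linear(W, I):
    """Solve W * D = I for D by Gaussian elimination with partial pivoting."""
    n = len(W)
    # Augmented matrix [W | I]; copied so the inputs are left untouched.
    A = [[float(v) for v in W[r]] + [float(I[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("W is singular; color conversion is not possible")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    D = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * D[c] for c in range(r + 1, n))
        D[r] = (A[r][n] - s) / A[r][r]
    return D


def color_convert(W, intensities):
    """Concentrations (C, A, G, T) from four-color intensities (b, g, y, r)."""
    return solve_linear(W, intensities)


# Hypothetical W: rows are bands b, g, y, r; columns are dyes C, A, G, T.
W = [
    [1.0, 0.3, 0.1, 0.0],
    [0.4, 1.0, 0.3, 0.1],
    [0.1, 0.4, 1.0, 0.3],
    [0.0, 0.1, 0.4, 1.0],
]

# Forward model (Equation (1)) for assumed concentrations at one time point.
D_true = [2.0, 0.0, 1.0, 0.5]
I_obs = [sum(W[x][y] * D_true[y] for y in range(4)) for x in range(4)]

# Color conversion (Equation (2)) recovers the concentrations.
D_est = color_convert(W, I_obs)
```

For a whole electropherogram, the same call would be applied independently to the intensity vector at each time, for example `[color_convert(W, row) for row in raw_data]`, where `raw_data` is an assumed list of four-color samples.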
[0009] Fig. 3B of Nonpatent Literature 1 shows this time-series data. Color conversion is feasible regardless of the presence or absence of the space-time overlap. As shown in Fig. 3B, nonzero concentrations of a plurality of kinds of fluorescent substances at the same time indicate the presence of a space-time overlap.
[0010] Fig. 3C of Nonpatent Literature 1 shows a process that corresponds to none of (1) to (4) and is not necessarily performed. The time-series data in Fig. 3B is separated by deconvolution into individual peaks, that is, into signals each expressing the concentration of DNA fragments of a single length labeled with one of the four kinds of fluorescent substances C, A, G, and T, and the space-time overlap is thereby resolved.
[0011] Lastly, (3) and (4) will be described. Generally, DNA fragments of various lengths differing by one base length are separated at almost regular intervals by electrophoresis. However, the fluorescent substance with which a DNA fragment is labeled affects its mobility (electrophoresis velocity), and this sometimes makes the intervals irregular. Therefore, the relative magnitudes of the mobility for each kind of labeling fluorescent substance are checked in advance, and the time-series data in Fig. 3B or Fig. 3C of Nonpatent Literature 1 is corrected based on this information. Thus, time-series data in which DNA fragments of various lengths differing by one base length are arranged at almost regular intervals is acquired. Fig. 3D of Nonpatent Literature 1 shows this time-series data. Blue, green, black, and red peaks respectively express the concentrations of single-length DNA fragments having the terminal base species C, A, G, and T. In Fig. 3D, the DNA fragments are arranged in order of length, each differing from the next by one base. Therefore, as shown in Fig. 3D, the terminal base species are arranged in order, and the result of base-calling can be acquired.
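Processes (3) and (4) can be illustrated with a deliberately simplified Python sketch. It assumes that the mobility difference of each fluorescent substance can be expressed as a constant shift in sample points measured in advance, and it uses naive peak picking in place of real base-calling; both are assumptions made for illustration, and the function names, offsets, and traces are hypothetical.

```python
def shift_trace(trace, offset):
    """Shift a trace by an integer number of samples (positive = later)."""
    n = len(trace)
    out = [0.0] * n
    for t, value in enumerate(trace):
        u = t + offset
        if 0 <= u < n:
            out[u] = value
    return out


def mobility_correct(traces, offsets):
    """Apply a dye-specific shift to each concentration trace."""
    return {dye: shift_trace(tr, offsets.get(dye, 0)) for dye, tr in traces.items()}


def call_bases(traces, threshold=0.6):
    """Naive base-calling: list the dyes of all local maxima in time order."""
    peaks = []
    for dye, tr in traces.items():
        for t in range(1, len(tr) - 1):
            if tr[t] > threshold and tr[t] >= tr[t - 1] and tr[t] > tr[t + 1]:
                peaks.append((t, dye))
    return "".join(dye for _, dye in sorted(peaks))


def peak_at(center, length=12):
    """Small triangular peak used as toy concentration data."""
    tr = [0.0] * length
    tr[center - 1], tr[center], tr[center + 1] = 0.5, 1.0, 0.5
    return tr


# Toy data after color conversion: the G-labeled fragment is assumed to
# migrate 3 samples faster than its length alone would dictate.
observed = {"C": peak_at(2), "A": peak_at(4), "G": peak_at(3), "T": peak_at(8)}
offsets = {"G": 3}  # hypothetical, determined in advance

corrected = mobility_correct(observed, offsets)
uncorrected_call = call_bases(observed)
corrected_call = call_bases(corrected)
```

In this toy case the uncorrected traces give the order C, G, A, T, while the corrected traces restore the regular spacing and give C, A, G, T.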
[0012] During the processes above, or before or after them, the time-series data is sometimes subjected to processing such as smoothing, noise filtering, and baseline removal as appropriate.
[0013] In the color conversion process in (2), as expressed by Equation (1), the unknown concentrations D(C), D(A), D(G), and D(T) of the four kinds of fluorescent substances C, A, G, and T are found at each time by solving four simultaneous equations whose known quantities are the four fluorescence detection intensities I(b), I(g), I(y), and I(r) in the wavelength bands b, g, y, and r. Generally, since this corresponds to solving for M unknowns using N simultaneous equations, the condition M ≤ N is necessary as described above. If M > N, no solution can be uniquely found (that is, a plurality of solutions may exist), and hence color conversion (Equation (2)) is unfeasible.
[0014] However, in Nonpatent Literature 2, DNA sequencing by electrophoresis is addressed under the condition M > N, where M = 4 and N = 3. The emitted fluorescence is detected in three kinds of wavelength bands b, g, and r (hereinafter referred to as three-color detection), and the four-color fluorescence intensities at each time in Equation (1) are replaced by the three-color fluorescence intensities at each time, as in Equation (3).
[Equation 3]
\[
\begin{pmatrix} I(b) \\ I(g) \\ I(r) \end{pmatrix}
=
\begin{pmatrix}
w(bC) & w(bA) & w(bG) & w(bT) \\
w(gC) & w(gA) & w(gG) & w(gT) \\
w(rC) & w(rA) & w(rG) & w(rT)
\end{pmatrix}
\begin{pmatrix} D(C) \\ D(A) \\ D(G) \\ D(T) \end{pmatrix}
\tag{3}
\]
At this time, the matrix W has three rows and four columns and has no inverse matrix. Thus, unlike Equation (2), the concentrations of the four kinds of fluorescent substances cannot be uniquely found. As described above, generally, no solution can be found under the condition M > N. However, a solution can be found by additionally imposing the preconditions below.
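The non-uniqueness under M > N can be made concrete with a small Python sketch. The three-by-four matrix W below is hypothetical. The helper `null_vector` (a name introduced here) builds a nonzero vector n with W n = 0 from the signed three-by-three minors of W, so that D and D + n give exactly the same three-color intensities in Equation (3).

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def null_vector(W):
    """Vector n with W * n = 0 for a 3x4 matrix W.

    Component j is (-1)**j times the determinant of W with column j removed.
    """
    n = []
    for j in range(4):
        minor = [[row[c] for c in range(4) if c != j] for row in W]
        n.append((-1) ** j * det3(minor))
    return n


def intensities(W, D):
    """Right-hand side of Equation (3): three-color intensities for D."""
    return [sum(w * d for w, d in zip(row, D)) for row in W]


# Hypothetical W: rows are bands b, g, r; columns are dyes C, A, G, T.
W = [
    [1.0, 0.5, 0.2, 0.0],
    [0.3, 1.0, 0.6, 0.2],
    [0.0, 0.2, 0.6, 1.0],
]

n = null_vector(W)
D1 = [1.0, 1.0, 1.0, 1.0]
D2 = [d + k for d, k in zip(D1, n)]  # a different, still non-negative, solution

I1 = intensities(W, D1)
I2 = intensities(W, D2)  # equal to I1 up to rounding
```

Because two physically plausible concentration vectors produce the same measurement, three-color data at a single time cannot by itself distinguish them; this is why additional preconditions are needed.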
[0015] In the first step, it is assumed that there is no space-time overlap among the plurality of kinds of fluorescent substances, i.e., that only one kind of fluorescent substance emits fluorescence at a time. Under this assumption, at each time, the fluorescent substance Y (C, A, G, or T) can be selected for which the ratio of the three-color fluorescence intensities (I(b) I(g) I(r))^T is closest to the ratio of the corresponding column (w(bY) w(gY) w(rY))^T among the four columns of the matrix W. In other words, one kind of fluorescent substance is selected in Equation (3) and the concentrations of the remaining three kinds of fluorescent substances are set to zero, i.e., (D(C) D(A) D(G) D(T))^T is set to (D(C) 0 0 0)^T, (0 D(A) 0 0)^T, (0 0 D(G) 0)^T, or (0 0 0 D(T))^T, and for each case the D(Y) (Y is C, A, G, or T) that minimizes the difference between the left-hand side and the right-hand side is found. The Y (C, A, G, or T) for which this difference is the smallest can then be selected. If the difference between the left-hand side and the right-hand side is not sufficiently small in any of these cases, the process goes to the subsequent second step.
[0016] In the second step, it is assumed that only two kinds of fluorescent substances emit fluorescence at a time. Two kinds of fluorescent substances are selected in Equation (3) and the concentrations of the remaining two kinds of fluorescent substances are set to zero, i.e., (D(C) D(A) D(G) D(T))^T is set to (D(C) D(A) 0 0)^T, (D(C) 0 D(G) 0)^T, (D(C) 0 0 D(T))^T, (0 D(A) D(G) 0)^T, (0 D(A) 0 D(T))^T, or (0 0 D(G) D(T))^T, and for each case the two values D(Y) (Y is C, A, G, or T) that minimize the difference between the left-hand side and the right-hand side are found. The two kinds of Y (C, A, G, or T) for which this difference is the smallest can then be selected.
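The two steps above can be sketched in Python as a search over single dyes and then over pairs of dyes. This is an illustrative reading of the procedure, not the implementation of Nonpatent Literature 2: the matrix W is hypothetical, "difference between the left-hand side and the right-hand side" is taken to be the least-squares residual, and the acceptance threshold `rel_tol` is an assumed parameter.

```python
from itertools import combinations

DYES = "CAGT"  # column order of W


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_subset(W, I, subset):
    """Least-squares fit of I using only the dyes in subset (size 1 or 2).

    Returns (concentrations, residual norm).
    """
    cols = [[row[j] for row in W] for j in subset]
    if len(cols) == 1:
        d = [dot(cols[0], I) / dot(cols[0], cols[0])]
    else:
        a, b = cols
        g11, g12, g22 = dot(a, a), dot(a, b), dot(b, b)
        r1, r2 = dot(a, I), dot(b, I)
        det = g11 * g22 - g12 * g12
        d = [(g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det]
    model = [sum(dk * col[x] for dk, col in zip(d, cols)) for x in range(len(I))]
    residual = sum((m - i) ** 2 for m, i in zip(model, I)) ** 0.5
    return d, residual


def identify(W, I, rel_tol=1e-6):
    """First step: one dye. Second step: two dyes. None if neither fits."""
    limit = rel_tol * dot(I, I) ** 0.5
    for k in (1, 2):
        fits = [(fit_subset(W, I, s), s) for s in combinations(range(len(DYES)), k)]
        (d, residual), subset = min(fits, key=lambda f: f[0][1])
        if residual <= limit:
            return {DYES[j]: dj for j, dj in zip(subset, d)}
    return None


# Hypothetical W: rows are bands b, g, r; columns are dyes C, A, G, T.
W = [
    [1.0, 0.5, 0.2, 0.0],
    [0.3, 1.0, 0.6, 0.2],
    [0.0, 0.2, 0.6, 1.0],
]

single = identify(W, [2.0 * W[x][1] for x in range(3)])                  # only A
pair = identify(W, [1.0 * W[x][0] + 3.0 * W[x][2] for x in range(3)])    # C and G
triple = identify(W, [W[x][0] + W[x][1] + W[x][2] for x in range(3)])    # C, A, G
```

With three dyes emitting at once, neither step yields an acceptable fit and the sketch returns `None`. The sketch does not constrain the fitted concentrations to be non-negative, and with noisy data the threshold would have to be chosen to suit the noise level.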
[0017] In this manner, similarly to the process (2) in Nonpatent Literature 1, the time-series data of the concentrations of four kinds of fluorescent substances, i.e., four kinds of DNA fragments, is acquired. After that, processes equivalent to processes (3) and (4) in Nonpatent Literature 1 are performed, and hence the result of base-calling can be acquired.
[0018] For the preconditions of the first and second steps in Nonpatent Literature 2 to hold, i.e., for the assumption that only one kind or two kinds of fluorescent substances emit fluorescence at a time to hold, the space-time overlap must be small; that is, the following two conditions must be satisfied.
(a) In the time-series data of the concentrations of four kinds of DNA fragments, a plurality of peaks derived from DNA fragments of single lengths is arranged at almost regular intervals.
(b) In the time-series data of the concentrations of four kinds of DNA fragments, two adjacent peaks derived from DNA fragments differing by one base length are well separated.
[0019] In Nonpatent Literature 2, the four kinds of primers used in the Sanger reaction are labeled with four kinds of fluorescent substances different from each other (more precisely, the labeling method for three of the fluorescent substances is modified), so that the difference in mobility caused by the fluorescent substance with which the DNA fragment is labeled is made sufficiently small. In addition, the condition that the electrophoresis separation performance is sufficiently high is satisfied. That is, time-series data of the three-color fluorescence intensities (I(b) I(g) I(r))^T for which the conditions (a) and (b) hold is acquired (the upper part of Fig. 2 in Nonpatent Literature 2). Under these conditions, the first and second steps are performed. Thus, the time-series data of the concentrations of four kinds of fluorescent substances, i.e., four kinds of DNA fragments, is acquired, and the result of base-calling is obtained (the lower part of Fig. 2 in Nonpatent Literature 2).
Citation List
Nonpatent Literature
[0020] Nonpatent Literature 1: Genome Res. pp. 644-65, 8(6), Jun.
Nonpatent Literature 2: Electrophoresis. pp. 1403-14, 19(8-9), Jun. 1998
Summary of Invention
Technical Problem
[0021] An RGB color sensor is a two-dimensional sensor in which three kinds of pixels are arrayed, the pixels respectively detecting the wavelength bands of red (R), green (G), and blue (B), which correspond to the three primary colors that human eyes can identify. In recent years, the RGB color sensor has been used not only in single-lens reflex digital cameras and compact digital cameras but also in the digital cameras installed in smartphones, and it has spread explosively worldwide. As a result, its performance has improved remarkably and its price has fallen remarkably. Therefore, it would be extremely useful to apply the RGB color sensor to analysis methods in which fluorescence emitted from a plurality of kinds of fluorescent substances is detected and identified and a plurality of kinds of components is thereby detected. However, since the RGB color sensor can detect only three colors, it is difficult to identify the fluorescence emitted from four or more kinds of fluorescent substances. As described above, generally, in order to identify the fluorescence emitted from M kinds of fluorescent substances in the case in which the M kinds of fluorescent substances have a spectral overlap and a space-time overlap, N-color detection has to be performed in N kinds of wavelength bands satisfying the condition M ≤ N.
[0022] Nonpatent Literature 2 solves the problem for the case in which the fluorescence emitted from M = 4 kinds of fluorescent substances is detected in N = 3 colors on a DNA sequencer using electrophoresis (that is, M > N) by imposing preconditions on the space-time overlap. These preconditions are that only one kind or two kinds of fluorescent substances emit fluorescence at a time, i.e., that the conditions (a) and (b) hold.
[0023] However, in many cases the conditions (a) and (b) do not hold. Even in the time-series data of the three-color fluorescence intensities (upper part) and the time-series data of the concentrations of four kinds of fluorescent substances (lower part) in Fig. 2 of Nonpatent Literature 2, in the regions indicated by asterisks, although the electrophoresis separation performance is high, three or more peaks each derived from DNA fragments of a single length are crowded together due to a phenomenon referred to as compression. Thus, the condition (a) does not hold, and correct base-calling is not achieved. In the closing stage of electrophoresis in Fig. 2 of Nonpatent Literature 2, the electrophoresis separation performance is reduced. Therefore, because the separation of two adjacent peaks derived from DNA fragments differing by one base length is insufficient, the condition (b) does not hold, and correct base-calling is not achieved.
[0024] In Nonpatent Literature 2, the condition (a) is achieved using a primer labeling method in which the four kinds of primers used in the Sanger reaction are labeled with four kinds of fluorescent substances different from each other. Nowadays, however, a terminator labeling method, in which the four kinds of terminators used in the Sanger reaction are labeled with four kinds of different fluorescent substances, is mainly used instead of the primer labeling method. In the primer labeling method, the Sanger reaction has to be performed separately for each of the four kinds of primers, i.e., in four different sample tubes. In the terminator labeling method, on the other hand, the Sanger reaction using the four kinds of terminators can be performed together, i.e., in one sample tube. Therefore, the terminator labeling method can greatly simplify the Sanger reaction.
[0025] However, whereas the difference in mobility among DNA fragments labeled with the four kinds of fluorescent substances is small in the primer labeling method, it is large in the terminator labeling method, and hence the condition (a) inevitably does not hold. That is, it is not necessarily the case that only one kind or two kinds of fluorescent substances emit fluorescence at a time, and three or more kinds of fluorescent substances may emit fluorescence at the same time. Therefore, at least in the case in which the terminator labeling method is used, it is difficult to perform DNA sequencing by the method of Nonpatent Literature 2.
[0026] Therefore, an object of the present invention is to provide an analysis technique that detects M kinds of components by N-color detection in N kinds (M > N) of wavelength bands in a state in which the fluorescence emitted from M kinds of fluorescent substances has a spectral overlap and a space-time overlap.
Solution to Problem [0027] For example, in order to solve the problem, configurations described in claims are adopted. The present application includes multiple schemes that solve the problem. For an example, there is provided an analyzer configured to separate a sample including a plurality of components labeled with any of M kinds of fluorescent substances by chromatography and acquire first time-series data of fluorescence signals detected in N kinds (M > N) of wavelength bands in a state in which at least a part of the plurality of components is not completely separated; a storage unit configured to store second time-series data of individual model fluorescence signals of the plurality of components; and a computer configured to compare the first time-series data with the second time-series data, and determine that which kinds of fluorescent substances of the M kinds of fluorescent substances individually label the plurality of components.
[0028] According to another example, there is provided an analysis method comprising: separating a sample including a plurality of components labeled with any of M kinds of fluorescent substances by chromatography to acquire first time-series data of fluorescence signals detected in N kinds (M > N) of wavelength bands in a state in which at least a part of the plurality of components is not completely separated; and determining which kinds of fluorescent substances of the M kinds of fluorescent substances individually label the plurality of components by comparing the first time-series data with second time-series data of individual model fluorescence signals of the plurality of components.
Advantageous Effects of Invention
[0029] According to the present invention, M kinds of components can be detected even when M > N, in the state in which fluorescence emitted from M kinds of fluorescent substances has a spectral overlap and a space-time overlap. Note that further characteristics relating to the present invention will be apparent from the description of the present specification and the accompanying drawings. Problems, configurations, and effects other than those described above will be apparent from the description of embodiments below.
Brief Description of Drawings
[0030] Fig. 1 is a diagram showing a DNA sequencing method according to Nonpatent Literature 1 using model data.
Fig. 2 is a diagram showing a DNA sequencing method according to Nonpatent Literature 2 using model data.
Fig. 3 is a diagram showing an example of a DNA sequencing method using model data according to a first embodiment.
Fig. 4 is a diagram showing another example of a DNA sequencing method using model data according to the first embodiment.
Fig. 5 is a diagram showing process steps and the configuration of a system according to the first embodiment.
Fig. 6 is a diagram showing process steps and the configuration of a system according to the first embodiment.
Fig. 7 is a diagram showing process steps and the configuration of a system according to a second embodiment.
Fig. 8 is a diagram showing process steps and the configuration of a system according to a third embodiment (M = 4, N = 3).
Fig. 9 is a diagram showing process steps and the configuration of a system according to a fourth embodiment (M = 4, N = 2).
Fig. 10 is a block diagram of a capillary-electrophoresis apparatus.
Fig. 11 is a block diagram of a multicolor detection system of a multi-capillary-electrophoresis apparatus.
Fig. 12 is a block diagram of a computer.
Fig. 13 shows a fifth embodiment in which DNA sequencing is performed using the process steps and the configurations of the system shown in Figs. 9 to 12.
Fig. 14 is a diagram showing time-series data of two color fluorescence intensities of model peaks of single-length DNA fragments labeled with four kinds of fluorescent substances according to the fifth embodiment.
Fig. 15 is a diagram illustrating the process of DNA sequencing according to the fifth embodiment.
Fig. 16 is a diagram illustrating the process of DNA sequencing according to the fifth embodiment.
Fig. 17 is a diagram collecting the terminal base species, the fitting accuracy, and the QV (Quality Value) for model peaks in Fig. 15 (5) in the temporal order of electrophoresis.
Fig. 18 is a diagram collecting the terminal base species, the fitting accuracy, and the QV for model peaks in Fig. 15 (6) in the temporal order of electrophoresis after correction.
Description of Embodiments
[0031] In the following, embodiments of the present invention will be described with reference to the accompanying drawings. Note that the accompanying drawings illustrate specific embodiments according to the principle of the present invention. However, these drawings are provided for understanding the present invention, and are in no way to be used for interpreting the present invention in a limited manner.
[0032] The embodiments below relate to a device that detects, while distinguishing between them, the fluorescences from a sample including a plurality of components labeled with a plurality of fluorescent substances, and thereby analyzes the components. The embodiments below are applicable, for example, to the fields of chromatography, DNA sequencing, DNA fragment analysis, flow cytometry, PCR, HPLC, Western blot, Northern blot, Southern blot, microscopic observation, and any other method.
[0033] In the case in which DNA sequencing by electrophoresis is performed, the content of Nonpatent Literatures 1 and 2 will be described in more detail using Figs. 1 to 4.
[0034] Fig. 1 shows a method of Nonpatent Literature 1. Fig. 1 (1) shows time-series data of four color fluorescence intensities I(b), I(g), I(y), and I(r) obtained by four-color detection of the emissions of fluorescence from four kinds of fluorescent substances C, A, G, and T in four kinds of wavelength bands b, g, y, and r. The horizontal axis in Fig. 1 expresses time, and the vertical axis expresses fluorescence intensity.
[0035] Fig. 1 (W) shows elements w(XY) in four rows and four columns of a matrix W. For example, four black bar graphs express w(bC), w(gC), w(yC), and w(rC) from the left. Similarly, horizontal stripe bar graphs express w(XA) (X is b, g, y, and r), oblique stripe bar graphs express w(XG) (X is b, g, y, and r), and check bar graphs express w(XT) (X is b, g, y, and r). The elements w(bY), w(gY), w(yY), and w(rY) are normalized such that w(bY) + w(gY) + w(yY) + w(rY) = 1 (Y is C, A, G, or T).
[0036] Specifically, the matrix W and the inverse matrix W^{-1} of the matrix W are as follows.
[Equation 4]
$$
W = \begin{pmatrix} w(bC) & w(bA) & w(bG) & w(bT) \\ w(gC) & w(gA) & w(gG) & w(gT) \\ w(yC) & w(yA) & w(yG) & w(yT) \\ w(rC) & w(rA) & w(rG) & w(rT) \end{pmatrix} = \begin{pmatrix} 0.56 & 0.26 & 0.16 & 0.09 \\ 0.28 & 0.43 & 0.24 & 0.18 \\ 0.11 & 0.22 & 0.40 & 0.27 \\ 0.06 & 0.09 & 0.20 & 0.45 \end{pmatrix} \tag{4}
$$
$$
W^{-1} = \begin{pmatrix} w(bC) & w(bA) & w(bG) & w(bT) \\ w(gC) & w(gA) & w(gG) & w(gT) \\ w(yC) & w(yA) & w(yG) & w(yT) \\ w(rC) & w(rA) & w(rG) & w(rT) \end{pmatrix}^{-1} = \begin{pmatrix} 2.55 & -1.44 & -0.27 & 0.23 \\ -1.76 & 4.27 & -1.68 & -0.34 \\ 0.33 & -2.12 & 4.64 & -2.00 \\ -0.12 & 0.29 & -1.69 & 3.12 \end{pmatrix} \tag{5}
$$
[0037] I(b), I(g), I(y), and I(r) at each time in Fig. 1 (1) are multiplied by the inverse matrix W^{-1} of the matrix W (i.e., color conversion is performed), and hence the time-series data of the concentrations D(C), D(A), D(G), and D(T) of four kinds of fluorescent substances C, A, G, and T, i.e., four kinds of DNA fragments having base species C, A, G, and T at the ends, is acquired as shown in Fig. 1 (2).
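The color conversion of paragraph [0037] can be illustrated with a minimal sketch. It assumes NumPy, uses the values of Equation (4) with w(bT) = 0.09 taken as an assumption (the value that is consistent with the inverse matrix of Equation (5)), and applies the conversion to synthetic data; the names `W4` and `color_convert` and the peak parameters are hypothetical and are not part of the specification.

```python
import numpy as np

# Equation (4): rows are the wavelength bands b, g, y, r;
# columns are the fluorescent substances C, A, G, T.
# w(bT) = 0.09 is an assumed value chosen to agree with Equation (5).
W4 = np.array([
    [0.56, 0.26, 0.16, 0.09],
    [0.28, 0.43, 0.24, 0.18],
    [0.11, 0.22, 0.40, 0.27],
    [0.06, 0.09, 0.20, 0.45],
])

def color_convert(intensities):
    """Solve W4 @ D = I for D at every time point.

    intensities has shape (4, T): one row per wavelength band.
    The result has shape (4, T): one row per fluorescent substance.
    """
    return np.linalg.solve(W4, intensities)

# Synthetic concentrations: overlapping Gaussian peaks of A and G
# (illustrative heights, times, and width).
t = np.arange(0.0, 100.0)
d_true = np.zeros((4, t.size))
d_true[1] = 80 * np.exp(-0.5 * ((t - 55) / 3.0) ** 2)  # A
d_true[2] = 90 * np.exp(-0.5 * ((t - 60) / 3.0) ** 2)  # G

i_measured = W4 @ d_true          # four-color detection
d_recovered = color_convert(i_measured)
```

Because W is square and invertible when M = N = 4, the concentrations are recovered exactly even where the peaks of A and G overlap in time, which is the situation described for Fig. 1 (2).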
[0038] In Fig. 1 (2), four peaks of C, A, G, and T are obtained. The heights of the peaks, i.e., the concentrations (arbitrary unit), are D(C) = 100, D(A) = 60, D(G) = 90, and D(T) = 60. The times (arbitrary unit) of the peaks are 30, 55, 60, and 75. Although the peak of A and the peak of G have a large space-time overlap with each other, color conversion can correctly find each of the concentrations even in this case.
[0039] Fig. 1 (3) shows mobility-corrected time-series data of the concentrations D(C), D(A), D(G), and D(T) of four kinds of fluorescent substances or DNA fragments, obtained by applying to the results in Fig. 1 (2) the differences in mobility due to the kinds of labeling fluorescent substances, which are checked in advance. Specifically, it is known that the mobility of the DNA fragments labeled with the fluorescent substance A is lower than the mobilities of the DNA fragments labeled with the other fluorescent substances, so that these fragments are delayed by a duration (arbitrary unit) of 10. Therefore, mobility correction is performed in which the peak-detection time (arbitrary unit) of the DNA fragments labeled with the fluorescent substance A is advanced by 10. That is, the detection time (arbitrary unit) of the peak of A in Fig. 1 (2) is corrected from 55 to 45. As a result of this mobility correction, time-series data is acquired in which DNA fragments of various lengths differing by one base length are arranged at almost regular intervals.
[0040] Fig. 1 (4) shows results of base-calling performed based on Fig. 1 (3). It suffices to read, in temporal order, the labeling fluorescent substance species to which the peaks belong, i.e., the terminal base species of the DNA fragments.
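The mobility correction and base-calling of paragraphs [0039] and [0040] can be sketched as follows. The peak times and the delay of 10 for the fluorescent substance A are taken from the text; the function names and the zero delays assumed for C, G, and T are illustrative assumptions.

```python
# Peaks of Fig. 1 (2): (terminal base species, detection time in arbitrary units).
peaks = [("C", 30), ("A", 55), ("G", 60), ("T", 75)]

# Delay caused by each label. Only the delay of A (10) is stated in the text;
# the zero delays for the other labels are an assumption of this sketch.
MOBILITY_DELAY = {"C": 0, "A": 10, "G": 0, "T": 0}

def correct_mobility(peaks, delay):
    """Advance each peak by the delay associated with its label."""
    return [(base, time - delay[base]) for base, time in peaks]

def call_bases(peaks):
    """Read the terminal base species in temporal order."""
    return "".join(base for base, _ in sorted(peaks, key=lambda p: p[1]))

corrected = correct_mobility(peaks, MOBILITY_DELAY)
sequence = call_bases(corrected)
```

After correction the peaks lie at times 30, 45, 60, and 75, i.e., at regular intervals, and reading them in temporal order gives the base sequence.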
[0041] Fig. 2 shows a method of Nonpatent Literature 2. Fig. 2 (1) shows time-series data of three color fluorescence intensities I(b), I(g), and I(r) obtained by three-color detection of the emissions of fluorescence from four kinds of fluorescent substances C, A, G, and T in three kinds of wavelength bands b, g, and r.
[0042] In Fig. 2 (1), only the time-series data of I(y) is removed from Fig. 1 (1), and the time-series data of I(b), I(g), and I(r) are the same in both drawings.
[0043] Fig. 2 (W) shows elements w(XY) in three rows and four columns of a matrix W. For example, three black bar graphs express w(bC), w(gC), and w(rC) from the left. Similarly, horizontal stripe bar graphs express w(XA) (X is b, g, and r), oblique stripe bar graphs express w(XG) (X is b, g, and r), and check bar graphs express w(XT) (X is b, g, and r). w(bY), w(gY), and w(rY) are normalized such that w(bY) + w(gY) + w(rY) = 1 (Y is C, A, G, or T).
[0044] Specifically, the matrix W is as follows.
[Equation 5]
$$
W = \begin{pmatrix} w(bC) & w(bA) & w(bG) & w(bT) \\ w(gC) & w(gA) & w(gG) & w(gT) \\ w(rC) & w(rA) & w(rG) & w(rT) \end{pmatrix} = \begin{pmatrix} 0.63 & 0.33 & 0.27 & 0.13 \\ 0.31 & 0.56 & 0.40 & 0.25 \\ 0.06 & 0.11 & 0.33 & 0.63 \end{pmatrix} \tag{6}
$$
[0045] As the first step, the case in which only one kind of fluorescent substance emits fluorescence at a time is considered for Fig. 2 (1) using Fig. 2 (W). In Fig. 2 (1), the two peaks observed on the left side and the right side have three-color fluorescence intensity ratios (I(b) I(g) I(r))^T close to (0.63 0.31 0.06)^T and (0.13 0.25 0.63)^T, respectively. Thus, these two peaks can be determined to be single peaks of C and T. The heights of these peaks of C and T, i.e., the concentrations (arbitrary unit), are D(C) = 100 and D(T) = 80, and the times (arbitrary unit) of these peaks of C and T are 30 and 75, respectively.
[0046] On the other hand, the peak observed in the center of Fig. 2 (1) has a three-color fluorescence intensity ratio (I(b) I(g) I(r))^T that is not close to any of the ratios (w(bY) w(gY) w(rY))^T (Y is C, A, G, or T). Therefore, as the second step, the case in which only two kinds of fluorescent substances emit fluorescence at a time is considered using Fig. 2 (W). Here, the solution that the peaks of A and G are detected at the same time can be derived. Specifically, when the heights of the peaks, i.e., the concentrations (arbitrary unit), are D(A) = 80 and D(G) = 90 and the times (arbitrary unit) of the peaks are 55 and 55, the difference between the left-hand side and the right-hand side in Equation (3) is decreased, and hence the peak observed in the center of Fig. 2 (1) can be explained. From the results above, time-series data of the concentrations D(C), D(A), D(G), and D(T) of four kinds of fluorescent substances C, A, G, and T, i.e., four kinds of DNA fragments having base species C, A, G, and T at the ends, is acquired as shown in Fig. 2 (2).
[0047] Similarly to Fig. 1 (3), Fig. 2 (3) shows mobility-corrected time-series data of the concentrations D(C), D(A), D(G), and D(T) of four kinds of fluorescent substances or DNA fragments. The detection time (arbitrary unit) of the peak of A is advanced by 10, i.e., corrected from 55 to 45. In order to align the peak intervals, the detection time (arbitrary unit) of the peak of G is delayed by 5, i.e., corrected from 55 to 60. As a result, Fig. 2 (3) and Fig. 1 (3) are the same. Fig. 2 (4) shows results of base-calling similarly to Fig. 1, and is the same as Fig. 1 (4).
[0048] On the other hand, Fig. 2 also shows that the process from Fig. 2 (1) to Fig. 2 (2) is not uniquely determined. Although the first step is the same as in the process above, in the second step, another solution can be derived, and Fig. 2 (2)' is obtained instead of Fig. 2 (2). That is, the solution that the peaks of A and T are detected at the same time can be derived. Specifically, when the heights of the peaks, i.e., the concentrations (arbitrary unit), are D(A) = 107 and D(T) = 34 and the times (arbitrary unit) of the peaks are 55 and 55, the difference between the left-hand side and the right-hand side in Equation (3) is decreased, and hence the peak observed in the center of Fig. 2 (1) can be explained. From the results above, time-series data of the concentrations D(C), D(A), D(G), and D(T) of four kinds of fluorescent substances C, A, G, and T, i.e., four kinds of DNA fragments having base species C, A, G, and T at the ends, is acquired as shown in Fig. 2 (2)'.
[0049] However, in Fig. 2 (2)', because G is not detected, D(G) = 0. Similarly to Fig. 1 (3), Fig. 2 (3)' shows mobility-corrected time-series data of the concentrations D(C), D(A), D(G), and D(T) of four kinds of fluorescent substances or DNA fragments. The detection time (arbitrary unit) of the peak of A is advanced by 10, i.e., corrected from 55 to 45. Fig. 2 (4)' shows results of base-calling performed based on Fig. 2 (3)'. Although the same Fig. 2 (1) and Fig. 2 (W) are used, Fig. 2 (2), Fig. 2 (3), and Fig. 2 (4) are totally different from Fig. 2 (2)', Fig. 2 (3)', and Fig. 2 (4)', and different base-calling results are derived. In the model data used here, since the base-calling results shown in Fig. 1 (4) are the correct answer, Fig. 2 (4) shows correct base-calling results, but Fig. 2 (4)' shows wrong base-calling results.
[0050] Therefore, in the method of Nonpatent Literature 2, a plurality of solutions can be derived from the same measured results, and there is a risk that wrong base-calling results, or more generally wrong analysis results, are derived.
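The non-uniqueness described in paragraphs [0046] to [0050] can be reproduced with a minimal sketch that uses only the intensity ratios of Equation (6) at a single time point. It assumes NumPy; the three-color intensity vector is synthesized from D(A) = 80 and D(G) = 90 as in the text, and the function `fit_two_dyes` and the use of an ordinary least-squares fit are illustrative assumptions, not the procedure of Nonpatent Literature 2 itself.

```python
import itertools
import numpy as np

DYES = "CAGT"
# Equation (6): rows are the wavelength bands b, g, r; columns are C, A, G, T.
W3 = np.array([
    [0.63, 0.33, 0.27, 0.13],
    [0.31, 0.56, 0.40, 0.25],
    [0.06, 0.11, 0.33, 0.63],
])

def fit_two_dyes(intensity, pair):
    """Least-squares concentrations of two dyes and the relative residual."""
    cols = [DYES.index(d) for d in pair]
    conc, *_ = np.linalg.lstsq(W3[:, cols], intensity, rcond=None)
    residual = np.linalg.norm(W3[:, cols] @ conc - intensity)
    return conc, residual / np.linalg.norm(intensity)

# Three-color intensities when A (80) and G (90) emit at the same time.
measured = 80 * W3[:, 1] + 90 * W3[:, 2]

# Try every hypothesis "exactly these two dyes emit at this time".
results = {pair: fit_two_dyes(measured, pair)
           for pair in itertools.combinations(DYES, 2)}
```

In this sketch the pair (A, G) reproduces the measured vector exactly, but the pair (A, T) also reproduces it to within a few percent with positive concentrations, so a ratio-only criterion with a realistic tolerance cannot reliably tell the two hypotheses apart.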
First Embodiment
[0051] Fig. 3 is a diagram showing a DNA sequencing method using model data according to a first embodiment. Fig. 3 (1) shows time-series data of three color fluorescence intensities I(b), I(g), and I(r) obtained by three-color detection of the emissions of fluorescence from four kinds of fluorescent substances C, A, G, and T in three kinds of wavelength bands b, g, and r, which is the same as Fig. 2 (1).
[0052] Fig. 3 (5) shows time-series data of the three color fluorescence intensities I(b), I(g), and I(r) of a model peak obtained when DNA fragments of a single length labeled with the fluorescent substance C are detected in three colors. The vertical axis in Fig. 3 (5) expresses the fluorescence intensity, and the horizontal axis is time. The data in Fig. 3 (5) expresses a temporal change in fluorescence intensity ratio between the model peaks of three fluorescence colors (b, g, and r) when the DNA fragments of the single length labeled with the fluorescent substance C are detected. Here, the three-color fluorescence intensity ratio at any time is (w(bC) w(gC) w(rC))^T = (0.63 0.31 0.06)^T of the matrix W in Equation (6). Similarly, Figs. 3 (6), (7), and (8) show time-series data of the three color fluorescence intensities I(b), I(g), and I(r) of model peaks when DNA fragments of single lengths labeled with the fluorescent substances A, G, and T are detected in three colors. The three color fluorescence intensity ratios in Figs. 3 (6), (7), and (8) are respectively (0.33 0.56 0.11)^T, (0.27 0.40 0.33)^T, and (0.13 0.25 0.63)^T.
[0053] The shape of each of the model peaks is Gaussian here, and the dispersion of the Gaussian distribution is matched with the spatial dispersion of the DNA fragments of a single length observed in experiments. Note that the shapes of the model peaks are not limited to this example, and the model peaks may have other shapes. Here, the model peaks in Figs. 3 (5), (6), (7), and (8) are fitted to the time-series data of Fig. 3 (1) only by changing the heights and times of the model peaks. An example of the fitting process will be described with reference to Fig. 3. For example, the fitting process is executed in a stepwise fashion from the left end of the data of Fig. 3 (1). For example, the height (fluorescence intensity) and center (electrophoresis time) of the Gaussian distribution in Fig. 3 (5) are varied. It is determined that fitting is achieved when the difference from the data of Fig. 3 (1) is smaller than a predetermined error. That is, here, the place in the data of Fig. 3 (1) where the fluorescence intensity and electrophoresis time match best is searched for. Note that the fitting may be performed while the width of the Gaussian distribution is also changed. In the case in which no fitting is achieved with Fig. 3 (5), it is determined whether fitting is achieved using other data (Figs. 3 (6), (7), and (8)) or a combination of other data. As described above, fitting to the data of Fig. 3 (1) is performed using any of the data of Figs. 3 (5), (6), (7), and (8) or a combination of these pieces of data. The error in the fitting process can be calculated based on the difference between the peak shapes of Fig. 3 (1) and the shapes of the model peaks, for example. Various publicly known methods may be applied to calculating the error. Note that from the viewpoint of efficiency, fitting is preferably performed in a stepwise manner from the end of the data of Fig. 3 (1), for example.
This is because the tail of an adjacent fluorescence peak leaks into a given fluorescence peak, so performing fitting from the end of the data makes it possible to efficiently take the leakage from the adjacent peak into account.
[0054] Fig. 3 (2) shows results of performing the fitting process. At this time, the peak shape of C is the time-series data of I(b) with its height multiplied by 1/w(bC) = 1/0.63 = 1.59. The peak shape of A is the time-series data of I(g) with its height multiplied by 1/w(gA) = 1/0.56 = 1.79. The peak shape of G is the time-series data of I(g) with its height multiplied by 1/w(gG) = 1/0.40 = 2.50. The peak shape of T is the time-series data of I(r) with its height multiplied by 1/w(rT) = 1/0.63 = 1.59.
[0055] Based on these results, Fig. 3 (2) is the same as Fig. 1 (2). A method of acquiring Figs. 3 (3) and (4) further is similar to the method of acquiring Figs. 1 (3) and (4). According to the embodiment, since the process that derives Fig. 3 (2) from Fig. 3 (1) is uniquely determined, correct base-calling results can be acquired as shown in Fig. 3 (4).
[0056] In the method of Nonpatent Literature 2 in Fig. 2, in deriving Fig. 2 (2) or Fig. 2 (2)' from Fig. 2 (1), only the matrix W shown in Equation (6), i.e., the three-color fluorescence detection intensity ratios of the emissions of fluorescence from the fluorescent substances, is used. On the other hand, in the embodiment shown in Fig. 3, in deriving Fig. 3 (2) from Fig. 3 (1), the peak shapes of the emissions of fluorescence from the fluorescent substances, i.e., temporal change information, are used in addition to the matrix W shown in Equation (6), i.e., the three-color fluorescence detection intensity ratios of the emissions of fluorescence from the fluorescent substances. This difference determines whether the solution can be uniquely derived, i.e., whether correct base-calling results can be obtained.
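The role of the temporal information can be illustrated with a minimal fitting sketch. It assumes NumPy, Gaussian model peaks with an assumed width `SIGMA`, a grid search over peak times combined with linear least squares for peak heights, and synthetic data containing a peak of A at time 55 and a peak of G at time 60 with illustrative heights. The function names and the search strategy are assumptions of this sketch and are not the fitting procedure prescribed by the embodiment.

```python
import itertools
import numpy as np

DYES = "CAGT"
# Equation (6): rows are the wavelength bands b, g, r; columns are C, A, G, T.
W3 = np.array([
    [0.63, 0.33, 0.27, 0.13],
    [0.31, 0.56, 0.40, 0.25],
    [0.06, 0.11, 0.33, 0.63],
])
SIGMA = 3.0  # assumed standard deviation of a model peak (arbitrary time units)

def model_peak(dye, center, t):
    """Three-color model peak of unit height: shape (3, len(t))."""
    shape = np.exp(-0.5 * ((t - center) / SIGMA) ** 2)
    return np.outer(W3[:, DYES.index(dye)], shape)

def fit_pair(data, t, pair, centers):
    """Fit two model peaks by varying only their heights and times."""
    best = None
    for c1, c2 in itertools.product(centers, repeat=2):
        basis = np.column_stack([model_peak(pair[0], c1, t).ravel(),
                                 model_peak(pair[1], c2, t).ravel()])
        heights, *_ = np.linalg.lstsq(basis, data.ravel(), rcond=None)
        if np.any(heights < 0):
            continue  # concentrations cannot be negative
        err = np.linalg.norm(basis @ heights - data.ravel())
        if best is None or err < best["error"]:
            best = {"error": err, "centers": (c1, c2), "heights": heights}
    return best

t = np.arange(40.0, 76.0)
data = 80 * model_peak("A", 55, t) + 90 * model_peak("G", 60, t)
centers = np.arange(50, 66)

fit_ag = fit_pair(data, t, "AG", centers)  # hypothesis: peaks of A and G
fit_at = fit_pair(data, t, "AT", centers)  # hypothesis: peaks of A and T
```

With the full time series, the hypothesis (A, G) fits the data with essentially zero error at the true times and heights, whereas the best fit under the hypothesis (A, T) leaves a clearly larger error, so the fitting error can be used to select between the hypotheses.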
[0057] Fig. 4 shows an example of the case in which three-color detection performed in Fig. 3 is further limited to two-color detection. Fig. 4 (1) shows time-series data of two color fluorescence intensities I(b) and I(r) obtained by two-color detection of the emissions of fluorescence from four kinds of fluorescent substances C, A, G, and T in two kinds of wavelength bands b and r, and is the time-series data in Fig. 3 (1) from which the time-series data of I(g) is removed.
[0058] Fig. 4 (5) shows time-series data of the two color fluorescence intensities I(b) and I(r) of a model peak when DNA fragments of a single length labeled with the fluorescent substance C are detected in two colors, and is the time-series data in Fig. 3 (5) from which the time-series data of I(g) is removed. Similarly, Figs. 4 (6), (7), and (8) are time-series data of the two color fluorescence intensities I(b) and I(r) of model peaks when DNA fragments of single lengths labeled with the fluorescent substances A, G, and T are detected in two colors, and are the time-series data in Figs. 3 (6), (7), and (8) from which the time-series data of I(g) is removed.
[0059] Here, the model peaks in Figs. 4 (5), (6), (7), and (8) are fitted to the time-series data of Fig. 4 (1) only by changing the heights and times of the model peaks. Fig. 4 (2) shows the results of performing the fitting process. At this time, the peak shape of C is the time-series data of I(b) with its height multiplied by 1/w(bC) = 1/0.63 = 1.59. The peak shape of A is the time-series data of I(b) with its height multiplied by 1/w(bA) = 1/0.33 = 3.03. The peak shape of G is the time-series data of I(r) with its height multiplied by 1/w(rG) = 1/0.33 = 3.03. The peak shape of T is the time-series data of I(r) with its height multiplied by 1/w(rT) = 1/0.63 = 1.59.
[0060] Based on these results, Fig. 4 (2) is the same as Fig. 1 (2). A method of further acquiring Figs. 4 (3) and (4) is similar to the method of acquiring Figs. 1 (3) and (4). According to the embodiment, since the process that derives Fig. 4 (2) from Fig. 4 (1) is uniquely determined, correct base-calling results can be acquired as shown in Fig. 4 (4).
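The two-color case can be sketched in the same way. The sketch below assumes NumPy, uses the rows b and r of Equation (6) without renormalization (consistent with removing I(g) from the three-color data), and, as a simplification, assumes that the peak times are already known so that only the heights are fitted by linear least squares. The peak heights, the width `SIGMA`, and the function names are illustrative assumptions.

```python
import numpy as np

DYES = "CAGT"
# Rows b and r of Equation (6); columns are C, A, G, T.
W2 = np.array([
    [0.63, 0.33, 0.27, 0.13],
    [0.06, 0.11, 0.33, 0.63],
])
SIGMA = 3.0  # assumed standard deviation of a model peak (arbitrary time units)

def model_peak(dye, center, t):
    """Two-color model peak of unit height: shape (2, len(t))."""
    shape = np.exp(-0.5 * ((t - center) / SIGMA) ** 2)
    return np.outer(W2[:, DYES.index(dye)], shape)

def fit_heights(data, t, peaks):
    """Least-squares heights of model peaks whose labels and times are given."""
    basis = np.column_stack([model_peak(d, c, t).ravel() for d, c in peaks])
    heights, *_ = np.linalg.lstsq(basis, data.ravel(), rcond=None)
    return heights

t = np.arange(0.0, 101.0)
peaks = [("C", 30), ("A", 55), ("G", 60), ("T", 75)]
true_heights = [100, 80, 90, 80]  # illustrative concentrations

# Synthetic two-color time-series data with overlapping A and G peaks.
data = sum(h * model_peak(d, c, t) for h, (d, c) in zip(true_heights, peaks))
heights = fit_heights(data, t, peaks)
```

Although only two wavelength bands are observed for four fluorescent substances, the combination of intensity ratio and peak position makes the four model peaks linearly independent over the time series, so the four heights are recovered in this noise-free example.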
[0061] Fig. 5 shows process steps and a configuration of a system according to the embodiment. The system according to the first embodiment includes an analyzer 510, a computer 520, and a display device 530. The analyzer 510 is a liquid chromatography device, for example. The computer 520 may be implemented using a general-purpose computer, for example. The processing unit of the computer 520 may be implemented as functions of programs executed on a computer. The computer includes at least a processor, such as a CPU (Central Processing Unit), and a storage unit, such as a memory. The processing of the computer 520 may be implemented by storing program codes corresponding to the processes in the memory and having the processor execute the program codes.
[0062] In this configuration, the analyzer 510 separates a sample including a plurality of components labeled with any of M kinds of fluorescent substances by chromatography, and acquires first time-series data of fluorescence signals detected in N kinds (M > N) of wavelength bands in a state in which at least a part of the plurality of components is not completely separated. The first time-series data of the fluorescence signals corresponds to N-color-detection time-series data 513 of an M-color-labeled sample, as described below. The computer 520 includes a storage unit (e.g., a memory and an HDD). The storage unit stores in advance second time-series data of individual model fluorescence signals of the plurality of components. The second time-series data of the individual model fluorescence signals of the plurality of components corresponds to N-color-detection time-series data 541 of a single peak of each of the M-color labels described below. The computer 520 compares the first time-series data with the second time-series data, and determines which kinds of fluorescent substances of the M kinds of fluorescent substances individually label each of the plurality of components. The display device 530 displays third time-series data of concentrations of the M kinds of fluorescent substances contributing to the fluorescence signals. The third time-series data of the concentrations of the fluorescent substances corresponds to M-color-label time-series data 523 described below. In the following, the processes will be described more specifically.
[0063] First, the M-color-labeled sample 501 including the plurality of components labeled with M kinds of fluorescent substances is injected into the analyzer 510. Subsequently, in the analyzer 510, a separation analysis process 511 of the plurality of components included in the sample 501 is performed. The analyzer 510 detects fluorescence emissions from M kinds of fluorescent substances in N kinds (M > N) of wavelength bands (N-color detection), and acquires the N-color-detection time-series data (fluorescence detection time-series data) 513 of the M-color-labeled sample. Here, the plurality of components are not always excellently separated. That is, fluorescence from a part of different components labeled with different fluorescent substances is detected in a space-time overlap state. The analyzer 510 outputs the N-color-detection time-series data 513 to the computer 520.
[0064] Subsequently, the computer 520 acquires, as input information, the N-color-detection time-series data 513 and the N-color-detection time-series data 541 of the single peak of each of the M-color labels, which is the N-color-detection time-series data of a single component labeled with any of M kinds of fluorescent substances. The N-color-detection time-series data 541 of the single peak of each of the M-color labels is data corresponding to Fig. 3 (5), (6), (7), and (8), for example. Note that the N-color-detection time-series data 541 of the single peak of each of the M-color labels is stored in advance in a first database 540.
[0065] Subsequently, the computer 520 executes a comparison analysis process 521 between the N-color-detection time-series data 513 and the N-color-detection time-series data 541 of the single peak of each of the M-color labels. As a result, the computer 520 acquires the M-color-label time-series data 523 that is the time-series data of the detected concentrations of M kinds of fluorescent substances, i.e., the concentrations of components labeled with M kinds of fluorescent substances. The M-color-label time-series data 523 is data corresponding to Fig. 3 (2), for example. Lastly, the display device 530 performs a display process 531 of the M-color-label time-series data 523.
[0066] Fig. 6 is a diagram that embodies the comparison analysis on the computer 520 in Fig. 5. As the comparison analysis, the computer 520 performs a fitting analysis process 522 on the N-color-detection time-series data 513 using the N-color-detection time-series data 541 of the single peak of each of the M-color labels. As a result, the computer 520 acquires, together with the M-color-label time-series data 523, fitting error data (or fitting accuracy data) 524 that is the difference between the N-color-detection time-series data 513 and its fitting result. The display device 530 performs the display process 531 for either or both of the M-color-label time-series data 523 and the fitting error data 524.
Second Embodiment
[0067] Fig. 7 shows the process steps and a configuration of a system in the case in which the present invention is applied to electrophoresis analysis of DNA fragments. A plurality of components that are analytical targets may be nucleic acid fragments of different lengths or different compositions, and chromatography may be electrophoresis.
[0068] An analyzer 510 is an electrophoresis apparatus. First, an M-color-labeled DNA sample 502 including a plurality of kinds of DNA fragments labeled with M kinds of fluorescent substances is injected into the analyzer 510. Subsequently, in the analyzer 510, an electrophoresis separation analysis process 512 of the plurality of kinds of DNA fragments included in the DNA sample is performed. The analyzer 510 detects the emissions of fluorescence from M kinds of fluorescent substances in N kinds (M > N) of wavelength bands (N-color detection), and acquires N-color-detection time-series data 513. Here, the plurality of kinds of DNA fragments are not always well separated. That is, fluorescence from a part of different kinds of DNA fragments labeled with different fluorescent substances is detected in a space-time overlap state. The analyzer 510 outputs the N-color-detection time-series data 513 to a computer 520.
[0069] Subsequently, the computer 520 acquires, as input information, the N-color-detection time-series data 513 and N-color-detection time-series data 541 of a single peak of each of M-color labels, which is the N-color-detection time-series data of a single kind of DNA fragments labeled with any one of M kinds of fluorescent substances. Subsequently, the computer 520 performs comparison analysis between the N-color-detection time-series data 513 and the N-color-detection time-series data 541 of the single peak of each of the M-color labels. Specifically, the computer 520 performs the fitting analysis process 522 on the N-color-detection time-series data 513 using the N-color-detection time-series data 541 of the single peak of each of the M-color labels. As a result, the computer 520 acquires M-color-label time-series data 523 that is the time-series data of the detected concentrations of M kinds of fluorescent substances, i.e., the concentrations of the DNA fragments labeled with M kinds of fluorescent substances. At the same time, the computer 520 acquires fitting error data (or fitting accuracy data) 524.
[0070] Here, the mobility of DNA fragments in electrophoresis is affected by which of the M kinds of fluorescent substances labels them. Therefore, in order to reduce this influence, the computer 520 performs a process using mobility-difference data 551 of the M-color-labeled samples indicating the difference in the mobility due to the M kinds of labeling fluorescent substances. The mobility-difference data 551 of the M-color-labeled samples is stored in advance in a second database 550. The computer 520 executes a mobility correction process 525 on the M-color-label time-series data 523 using the mobility-difference data 551 of the M-color-labeled samples, and acquires mobility-corrected data 526 of the M-color-label time-series data (in the following, referred to as corrected data). The corrected data 526 is data corresponding to Fig. 3 (3), for example.
[0071] Lastly, the display device 530 performs a display process 531 for a part or all of the M-color-label time-series data 523, fitting error data (or fitting accuracy data) 524, and the corrected data 526.
Third Embodiment
[0072] Fig. 8 shows the process steps and a configuration of a system in the case in which the present invention is applied to DNA sequencing by electrophoresis. In Fig. 8, N = 3 and M = 4. An analyzer 510 is a DNA sequencer.
[0073] First, a four-color-labeled DNA sequencing sample 503 is prepared. The four-color-labeled DNA sequencing sample 503 includes four kinds of DNA fragments that are prepared by a Sanger method using a target DNA as a template and labeled with four kinds of fluorescent substances corresponding to four kinds of terminal base species. The four-color-labeled DNA sequencing sample 503 is injected into the analyzer 510. Subsequently, in the analyzer 510, an electrophoresis separation analysis process 512 is performed on four kinds of DNA fragments included in the DNA sequencing sample. The analyzer 510 detects fluorescence emissions from four kinds of fluorescent substances in three kinds of wavelength bands, and acquires three-color-detection time-series data 513. Here, four kinds of DNA fragments are not always well separated. That is, fluorescence from a part of the DNA fragments of different lengths labeled with different fluorescent substances is detected in the space-time overlap state. The analyzer 510 outputs the three-color-detection time-series data 513 to a computer 520.
[0074] Subsequently, the computer 520 acquires, as input information, the three-color-detection time-series data 513 and three-color-detection time-series data 541 of a single peak of each of four-color labels, which is the three-color-detection time-series data of the DNA fragments of a single length labeled with any one of four kinds of fluorescent substances. The three-color-detection time-series data 541 of the single peak of each of the four-color labels is stored in advance in a first database 540. The computer 520 performs comparison analysis between the three-color-detection time-series data 513 and the three-color-detection time-series data 541 of the single peak of each of the four-color labels. Specifically, the computer 520 performs a fitting analysis process 522 on the three-color-detection time-series data 513 of the four-color-labeled DNA sequencing sample using the three-color-detection time-series data 541 of the single peak of each of the four-color labels. As a result, the computer 520 acquires four-color-label time-series data 523 that is the time-series data of the detected concentrations of four kinds of fluorescent substances, i.e., the concentrations of four kinds of DNA fragments having different terminal base species and labeled with four kinds of fluorescent substances. At the same time, the computer 520 acquires fitting error data (or fitting accuracy data) 524.
[0075] Here, the mobility of DNA fragments by electrophoresis is affected by four kinds of fluorescent substances to be labeled. Therefore, in order to reduce the influence, a process is performed using mobility-difference data 551 of four-color labels indicating the difference in the mobility due to four kinds of fluorescent substances to be labeled. The mobility-difference data 551 of four-color labels is stored in advance on a second database 550. The computer 520 executes a mobility correction process 525 on the fourcolor-label time-series data 523 using the mobility-difference data 551 of four-color labels, and acquires corrected data (in the following, referred to as corrected data) 526 of the four-color-label time-series data.
[0076] The computer 520 performs a DNA base sequence determination process 528 using the corrected data 526. On the other hand, the computer 520 acquires base-sequence-determination error data (or base-sequence-determination accuracy data) 527 of each base of the determined DNA base sequence using the fitting error data (or fitting accuracy data) 524.
[0077] Lastly, the display device 530 performs a display process 531 for a part or all of the four-color-label time-series data 523, the fitting error data (or fitting accuracy data) 524, the corrected data 526, the DNA-base-sequence-determination results, and the base-sequence-determination error data (or base-sequence-determination accuracy data) 527.
[0078] Fourth Embodiment Fig. 9 shows the process steps and a configuration of a system in the case in which the configuration of Fig. 8 is substituted under the conditions N = 2 and M = 4. The process steps and the configuration of the system are similar to Fig. 8, and the description is therefore omitted.
[0079] Fig. 10 is a block diagram of a capillary electrophoresis apparatus that is an example of the analyzer 510. A capillary electrophoresis apparatus 100 is used as a DNA sequencer and a DNA fragment analysis device, for example. The inside of a capillary 1 is filled with an electrophoresis separation medium containing an electrolyte, and a sample injection end 2 and a sample elution end 3 of the capillary 1 are immersed in a cathode-side electrolytic solution 4 and an anode-side electrolytic solution 5, respectively. A negative electrode 6 is immersed in the cathode-side electrolytic solution 4, and a positive electrode 7 is immersed in the anode-side electrolytic solution 5. A high-voltage power supply 8 applies a high voltage across the negative electrode 6 and the positive electrode 7, and hence electrophoresis is performed.
[0080] Sample injection into the capillary 1 is performed by immersing the sample injection end 2 and the negative electrode 6 in a sample solution 9 and applying a high voltage across the negative electrode 6 and the positive electrode 7 from the high-voltage power supply 8 for a short time. The sample solution 9 includes a plurality of kinds of components labeled with a plurality of kinds of fluorescent substances. After sample injection, the sample injection end 2 and the negative electrode 6 are again immersed in the cathode-side electrolytic solution 4, a high voltage is applied across the negative electrode 6 and the positive electrode 7, and hence electrophoresis is performed.
[0081] Negatively charged components included in the sample, e.g. DNA fragments are electrophoretically migrated in an electrophoresis direction 10, indicated by an arrow, from the sample injection end 2 to the sample elution end 3 in the capillary 1. By the difference in the mobility due to electrophoresis, the plurality of kinds of components included in the sample solution 9 is gradually separated. At a position (a laser beam irradiation position 15) where the components electrophoretically migrated by a certain distance in the capillary 1, a laser beam 12 emitted from a laser light source 11 is irradiated. When the components pass the laser beam irradiation position 15, emission of fluorescence 13 from the plurality of kinds of fluorescent substances labeled on the components is induced. The fluorescence 13 varying over time of electrophoresis is measured by a multicolor detection system 14 that performs optical detection in a plurality of kinds of wavelength bands. Although only one capillary 1 is depicted in Fig. 10, there may be used a multi-capillary electrophoresis apparatus that performs electrophoresis analysis in parallel using a plurality of capillaries 1.
[0082] Fig. 11 shows an example of a multicolor detection system of a multi-capillary electrophoresis apparatus. The laser beam irradiation positions 15 of a plurality of capillaries 1 are arranged on the same plane at regular intervals. The left drawing in Fig. 11 is a cross-sectional view vertical to the major axis of the plurality of capillaries 1, and the right drawing in Fig. 11 is a cross-sectional view parallel with the major axis of a given capillary 1.
[0083] The laser beam 12 is irradiated along the arrangement plane of the plurality of capillaries 1. Thus, the laser beam 12 is simultaneously irradiated to the plurality of capillaries 1. The fluorescence 13 emitted from each of the capillaries 1 is condensed in parallel by separate lenses 16. The condensed beams directly enter a two-dimensional color sensor 17. The two-dimensional color sensor 17 is an RGB color sensor that can perform three-color detection in three kinds of wavelength bands. The fluorescence 13 emitted from the capillaries 1 respectively forms spots at different positions on the two-dimensional color sensor 17, and hence the fluorescence 13 can be independently detected in three colors.
[0084] Fig. 12 shows an exemplary configuration of a computer 520. As shown in Figs. 5 to 9, the computer 520 is connected to the analyzer 510. The computer 520 may control not only data analysis described in Figs. 5 to 9 but also the analyzer 510. In Figs. 5 to 9, a display device 530 and databases 540 and 550 are depicted on the outer side of the computer 520. However, the databases 540 and 550 may be included in the computer 520.
[0085] The computer 520 includes a CPU (processor) 1201, a memory 1202, a display unit 1203, a HDD 1204, an input unit 1205, and a network interface (NIF) 1206. The display unit 1203 is a display, for example, and may be used as the display device 530. The input unit 1205 is an input device that is a keyboard and a mouse, for example. A user can set the conditions of data analysis and the conditions of controlling the analyzer 510 through the input unit 1205. N-color detection time-series data 513 outputted from the analyzer 510 is sequentially stored on the memory 1202.
[0086] The HDD 1204 may include the databases 540 and 550. The HDD 1204 may include programs that perform the fitting analysis process, the mobility correction process, and the DNA base sequence determination process, and any other process of the computer 520. The process of the computer 520 may be achieved in which processes corresponding to program codes are stored on the memory 1202 and the CPU 1201 executes the program codes.
[0087] For example, N-color-detection time-series data 541 of the single peak of each of the M-color labels stored on the HDD 1204 is stored on the memory 1202, and the CPU 1201 executes the comparison analysis process using the N-color-detection time-series data 513 and the N-color-detection time-series data 541 of the single peak of each of the M-color labels. The display unit 1203 displays the analyzed results. Note that the analyzed results may be checked against information on a network through the NIF 1206.
[0088] Fifth Embodiment Figs. 13 to 18 show an embodiment in which DNA sequencing is performed using the process steps and the configurations of the system shown in Figs. 9 to 12.
[0089] As a four-color-labeled DNA sequencing sample, a sample was prepared by dissolving 3500/3500xL Sequencing Standards, BigDye Terminator v3.1 (Thermo Fisher Scientific) in 300 µL of formamide. This sample includes four kinds of DNA fragments having terminal base species C, A, G, and T labeled with four kinds of fluorescent substances dROX (a maximum emission wavelength of 618 nm), dR6G (a maximum emission wavelength of 568 nm), dR110 (a maximum emission wavelength of 541 nm), and dTAMRA (a maximum emission wavelength of 595 nm), respectively.
[0090] There are four capillaries 1 with an outer diameter of 360 µm, an inner diameter of 50 µm, a total length of 56 cm, and an effective length of 36 cm. For the electrophoresis separation medium, POP-7 (Thermo Fisher Scientific), which is a polymer solution, was used. In electrophoresis, the capillary 1 was adjusted to a temperature of 60°C, and the electric field strength was 182 V/cm. The sample injection was performed by electrokinetic injection at an electric field strength of 27 V/cm for eight seconds. The laser beam 12 was at a wavelength of 505 nm and an output of 20 mW. Between the lens 16 and the two-dimensional color sensor 17, a long-pass filter that blocked the laser beam 12 was used.
[0091] Fig. 13 (1) corresponds to Fig. 4 (1), and is two-color-detection time-series data obtained by detecting the fluorescence 13 emitted from one of the four capillaries 1 with the RGB color sensor during electrophoresis. The RGB color sensor used in the embodiment can perform three-color detection with three kinds of wavelength bands r, g, and b corresponding to R (red), G (green), and B (blue). However, the fluorescence emissions from the four kinds of fluorescent substances were hardly detected in the wavelength band b, and were detected only in the wavelength bands r and g. Thus, Fig. 13 (1) shows a time series of the fluorescence intensities I(r) and I(g) in these two colors. The horizontal axis expresses the time elapsed (electrophoresis time) from the start of electrophoresis in units of seconds, and the vertical axis expresses the fluorescence intensity in arbitrary units.
[0092] Fig. 14 (1) corresponds to Fig. 4 (5), and is time-series data of two color fluorescence intensities I(g) and I(r) of a model peak when DNA fragments having terminal base species C and a single length, and labeled with the fluorescent substance dROX, were detected in two colors. Here, the ratio of the two color fluorescence intensities at any time is (w(gC) w(rC))^T = (0.02 0.96)^T. Similarly, Fig. 14 (2) corresponds to Fig. 4 (6), and is time-series data of two color fluorescence intensities I(g) and I(r) of a model peak when DNA fragments having terminal base species A and a single length, and labeled with the fluorescent substance dR6G, were detected in two colors. The ratio of the two color fluorescence intensities is (w(gA) w(rA))^T = (0.50 0.50)^T. Fig. 14 (3) corresponds to Fig. 4 (7), and is time-series data of two color fluorescence intensities I(g) and I(r) of a model peak when DNA fragments having terminal base species G and a single length, and labeled with the fluorescent substance dR110, were detected in two colors. The ratio of the two color fluorescence intensities is (w(gG) w(rG))^T = (0.71 0.29)^T. Fig. 14 (4) corresponds to Fig. 4 (8), and is time-series data of two color fluorescence intensities I(g) and I(r) of a model peak when DNA fragments having terminal base species T and a single length, and labeled with the fluorescent substance dTAMRA, were detected in two colors. The ratio of the two color fluorescence intensities is (w(gT) w(rT))^T = (0.16 0.84)^T. The shape, offset, and standard deviation of the model peaks were Gaussian, zero, and one second, respectively. These settings match the peak shape of DNA fragments of a single length measured under the present electrophoresis conditions.
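The model peaks described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the intensity ratios, the Gaussian shape, the zero offset, and the one-second standard deviation are taken from the text, whereas the names RATIOS, SIGMA, and model_peak, the 0.1-second sampling interval, and the peak height of 100 are hypothetical choices made for the example.

```python
import math

# Two-color intensity ratios (w_g, w_r) quoted in the text for each terminal base.
RATIOS = {
    "C": (0.02, 0.96),  # dROX
    "A": (0.50, 0.50),  # dR6G
    "G": (0.71, 0.29),  # dR110
    "T": (0.16, 0.84),  # dTAMRA
}
SIGMA = 1.0  # standard deviation of the Gaussian model peak, in seconds


def model_peak(base, center, height, times, sigma=SIGMA):
    """Return (I_g, I_r) time series of one Gaussian model peak with zero offset."""
    w_g, w_r = RATIOS[base]
    shape = [height * math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in times]
    return [w_g * s for s in shape], [w_r * s for s in shape]


# Hypothetical sampling: 0.1 s steps over 10 s, peak of height 100 centered at 5 s.
times = [i * 0.1 for i in range(101)]
i_g, i_r = model_peak("G", center=5.0, height=100.0, times=times)
```

Because both channels share one Gaussian shape, the ratio I(g) : I(r) stays fixed at every time point of the peak, which is the property the fitting relies on.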
[0093] The two-color-detection time-series data of four kinds of model peaks shown in Figs. 14 (1) to (4) were sequentially fit to the two-color-detection time-series data obtained by electrophoresis analysis shown in Fig. 13 (1). In the fitting, only medians (electrophoresis times) and heights (fluorescence intensities) of the Gaussian distributions shown in Figs. 14 (1) to (4) were varied while the two color fluorescence intensity ratios and standard deviations of the Gaussian distributions were kept, and the medians and the heights were determined such that the errors from the two-color-detection time-series data in Fig. 13 (1) were minimized.
[0094] Fig. 13 (2) is the result that the model peak of G (dR110) in Fig. 14 (3) was accurately fit to the peak of g observed on the leftmost of Fig. 13 (1). Fig. 13 (3) is the result that the model peak of T (dTAMRA) in Fig. 14 (4) was accurately fit to the peak of r on the right side of the above peak of g in Fig. 13 (1). Fig. 13 (4) is the result that the model peak of C (dROX) in Fig. 14 (1) was accurately fit to the peak of r on the right side of the above peak of r in Fig. 13 (1). Fig. 13 (5) is the two-color-detection time-series data that summated those in Figs. 13 (2), (3) and (4), i.e., time-series data that added I(g) and I(r) in Figs. 13 (2), (3) and (4). As shown in Fig. 13 (5), the corresponding portions in Fig. 13 (1) were faithfully reproduced.
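A minimal sketch of fitting one model peak in the manner described above is given here, assuming a simple grid search rather than whatever optimizer the embodiment actually used. For each candidate dye and candidate median, the intensity ratio and standard deviation are held fixed, the least-squares height is obtained in closed form, and the combination with the smallest squared error is kept. The names unit_peak and fit_single_peak, the candidate grid, and the synthetic input data are hypothetical.

```python
import math

RATIOS = {"C": (0.02, 0.96), "A": (0.50, 0.50), "G": (0.71, 0.29), "T": (0.16, 0.84)}
SIGMA = 1.0  # seconds, kept fixed during fitting


def unit_peak(base, center, times):
    """Unit-height model peak; the g channel followed by the r channel."""
    w_g, w_r = RATIOS[base]
    shape = [math.exp(-0.5 * ((t - center) / SIGMA) ** 2) for t in times]
    return [w_g * s for s in shape] + [w_r * s for s in shape]


def fit_single_peak(times, i_g, i_r, centers):
    """Return (base, median, height) minimizing the squared error to the data."""
    data = list(i_g) + list(i_r)
    best = None
    for base in RATIOS:
        for c in centers:
            m = unit_peak(base, c, times)
            # Closed-form least-squares height for a fixed shape, clipped at zero.
            height = max(0.0, sum(d * x for d, x in zip(data, m)) / sum(x * x for x in m))
            sse = sum((d - height * x) ** 2 for d, x in zip(data, m))
            if best is None or sse < best[0]:
                best = (sse, base, c, height)
    return best[1], best[2], best[3]


# Synthetic, noise-free test data: a T (dTAMRA) peak of height 80.
times = [i * 0.1 for i in range(101)]
n = len(times)
data = [80.0 * x for x in unit_peak("T", times[43], times)]
base, median, height = fit_single_peak(times, data[:n], data[n:], centers=times)
```

Fitting peaks one at a time like this is the simplest case; where neighboring peaks overlap in time, their heights and medians would have to be optimized together.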
[0095] Fitting error and fitting accuracy were evaluated as follows. The fitting error was found by taking the standard deviation of the difference between a fit model peak and the corresponding measured two-color-detection time-series data over a two-second section (twice the standard deviation of the Gaussian distribution) centered on the time of the top of the fit model peak, and dividing it by the larger of the measured two-color-fluorescence intensities at the time of the top of the model peak. The fitting accuracy was obtained by subtracting the fitting error from one. The fitting accuracy is 100% when the fitting perfectly agrees with the measurement. The fitting accuracy decreases as the deviation grows, and becomes 0% when the deviation is larger than or equal to the larger of the measured two-color-fluorescence intensities. In the embodiment, the fitting error and accuracy are defined as described above. However, other definitions may of course be used.
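The accuracy measure defined above might be computed as in the following sketch. Two points are assumptions rather than statements of the text: the "standard deviation of the difference" is interpreted here as the root-mean-square deviation over both colors, and the function name fitting_accuracy and its arguments are hypothetical.

```python
import math


def fitting_accuracy(times, meas_g, meas_r, fit_g, fit_r, peak_time, window=2.0):
    """Fitting accuracy = 1 - fitting error, clipped at 0 (see assumptions above)."""
    # Samples inside the two-second section centered on the top of the model peak.
    idx = [k for k, t in enumerate(times) if abs(t - peak_time) <= window / 2]
    diffs = [meas_g[k] - fit_g[k] for k in idx] + [meas_r[k] - fit_r[k] for k in idx]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Larger of the two measured intensities at the sample nearest the peak top.
    top = min(range(len(times)), key=lambda k: abs(times[k] - peak_time))
    scale = max(meas_g[top], meas_r[top])
    if scale <= 0:
        return 0.0
    return max(0.0, 1.0 - rms / scale)


# Hypothetical flat signals, used only to exercise the arithmetic.
times = [k * 0.5 for k in range(9)]
meas_g, meas_r = [10.0] * 9, [20.0] * 9
perfect = fitting_accuracy(times, meas_g, meas_r, meas_g, meas_r, peak_time=2.0)
offset = fitting_accuracy(times, meas_g, meas_r, meas_g, [18.0] * 9, peak_time=2.0)
```

With a perfect fit the value is 1 (100%), and a deviation at least as large as the reference intensity gives 0, matching the two limiting cases stated in the text.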
[0096] The fitting accuracy of the model peak of T in Fig. 13 (3) alone was only 43.41%. However, the fitting accuracy of the model peak of T when the model peaks of G, T, and C were summated as shown in Fig. 13 (5) was 95.56%. This means that, in the case in which a space-time overlap is present, it is important to fit neighboring model peaks together rather than to fit a single model peak alone.
[0097] Similarly, the two-color-detection time-series data of four kinds of model peaks shown in Figs. 14 (1) to (4) were sequentially fit to all the peaks of g and r in Fig. 13 (1). Fig. 13 (6) shows two-color-detection time-series data that summated the individual fit model peaks. As shown in Fig. 13 (6), all the peaks in Fig. 13 (1) were faithfully reproduced.
[0098] Fig. 15 (1) shows time-series data of concentration of the fluorescent substance dR110, i.e., concentration of DNA fragments of the terminal base species G, obtained by summating I(g) and I(r) in Fig. 13 (2). Fig. 15 (2) shows time-series data of concentration of the fluorescent substance dTAMRA, i.e., concentration of DNA fragments of the terminal base species T, obtained by summating I(g) and I(r) in Fig. 13 (3). Fig. 15 (3) shows time-series data of concentration of the fluorescent substance dROX, i.e., concentration of DNA fragments of the terminal base species C obtained by summating I(g) and I(r) in Fig. 13 (4). Similarly, Fig. 15 (5) shows time-series data of concentrations of DNA fragments of the terminal base species C, A, G, and T obtained by all the fit model peaks in Fig. 13 (6).
[0099] Fig. 17 shows a list summarizing terminal base species, fitting accuracy, and QV (Quality Value) for all the model peaks in Fig. 15 (5) in the temporal order of electrophoresis. Here, QV is found from QV = -10*log10(1 - S), where S is the fitting accuracy. In the 200 seconds from electrophoresis time 1100 seconds to 1300 seconds, 76 kinds of DNA fragments differing by one base in length are measured, and their terminal base species are individually identified. Because the mean fitting accuracy is 94.72% and the mean QV is 13.95, highly accurate fitting is achieved. Note that the computer 520 may calculate the data in Fig. 17, and the display device 530 may display the data in Fig. 17.
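As a small worked example of the QV formula, the sketch below converts a fitting accuracy S into a QV. It assumes a base-10 logarithm, as in the usual Phred-style quality score; the function name quality_value is hypothetical.

```python
import math


def quality_value(accuracy):
    """QV = -10 * log10(1 - S) for a fitting accuracy S with 0 <= S < 1."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must satisfy 0 <= S < 1")
    return -10.0 * math.log10(1.0 - accuracy)


qv_90 = quality_value(0.90)      # about 10
qv_99 = quality_value(0.99)      # about 20
qv_mean = quality_value(0.9472)  # about 12.8
```

The QV of the mean accuracy (about 12.8) is lower than the reported mean QV of 13.95. This is to be expected, because the formula is convex in S, so averaging the per-peak QVs gives a value at least as large as the QV of the averaged accuracy.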
[0100] Fig. 15 (6) shows corrected time-series data of concentrations of the DNA fragments of the terminal base species C, A, G, and T obtained by applying mobility correction to Fig. 15 (5) based on the difference in mobility due to the differences between the four kinds of fluorescent substances. Specifically, the electrophoresis times of the medians of the model peaks of the terminal base species G in Fig. 15 (5) were shifted backward by 1.6 seconds, the electrophoresis times of the medians of the model peaks of the terminal base species T in Fig. 15 (5) were shifted forward by 1.1 seconds, and the model peaks of the terminal base species C and A were not corrected. As a result, the peaks of the DNA fragments differing by one base in length were arranged at almost regular intervals in order of increasing length. Notably, the measurement order of some DNA fragments of the terminal base species G and the terminal base species T is reversed in Fig. 15 (5) because of the difference in mobility due to the difference in the fluorescent substances (that is, a longer DNA fragment overtakes a shorter DNA fragment during electrophoresis), whereas this is well corrected in Fig. 15 (6).
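The mobility correction and the subsequent reading of the base order can be sketched as follows. The shift magnitudes (1.6 seconds for G, 1.1 seconds for T, none for C and A) come from the text. The sign convention is an assumption: "backward" is treated here as a shift toward later times and "forward" as a shift toward earlier times. The names SHIFT_S and correct_and_call and the peak times in the example are hypothetical.

```python
# Per-dye median shifts in seconds; the signs are an assumed convention (see above).
SHIFT_S = {"C": 0.0, "A": 0.0, "G": +1.6, "T": -1.1}


def correct_and_call(peaks):
    """peaks: list of (base, median_time_s) from the fitting step.

    Returns the mobility-corrected peaks sorted by time and the called sequence.
    """
    corrected = sorted(((b, t + SHIFT_S[b]) for b, t in peaks), key=lambda p: p[1])
    return corrected, "".join(b for b, _ in corrected)


# Hypothetical fitted medians in which G is measured before T.
peaks = [("A", 1100.0), ("G", 1102.5), ("T", 1103.0), ("C", 1106.0)]
corrected, sequence = correct_and_call(peaks)
```

In this made-up example the measured order A, G, T, C becomes A, T, G, C after correction, which illustrates how a G/T pair measured in reversed order is restored before base-calling.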
[0101] Fig. 18 shows a list summarizing terminal base species, fitting accuracy, and QV on all the model peaks in Fig. 15 (6) in the corrected temporal order of electrophoresis. Fig. 18 is the list that rearranges Fig. 17, and the numerical values used are the same. The arrangement of the terminal base species in Fig. 18 provides DNA sequencing results (base-calling results). Fitting accuracy and QV of the bases are indexes in correlation with accuracy of base-sequence-determination, but are not the same with accuracy of base-sequence-determination. Generally, it is expected that accuracy of base-sequence-determination is greater than fitting accuracy. Actually, accuracy of the DNA sequencing results in Fig. 18 was 100%. Note that the computer 520 may calculate the data in Fig. 18, and the display device 530 may display the data in Fig. 18. [0102] Fig. 16 summarizes the processes of DNA sequencing according to the embodiment. Fig. 16 (1) is the same as Fig. 15 (1), and is the two-color-detection time-series data obtained by electrophoresis analysis, corresponding to Fig. 4 (1). Fig. 16 (2) is the same as Fig. 15 (5), and is the time-series data of concentrations of the DNA fragments of the terminal base species C, A, G, and T, corresponding to Fig. 4 (2). Fig. 16 (3) is the same as Fig. 15 (6), and is the corrected-time-series data of concentrations of the DNA fragments of the terminal base species C, A, G, and T obtained by mobility correction, corresponding to Fig. 4 (3). Lastly, Fig. 16 (4) shows the results of performing base-calling based on the results in Fig. 16 (3), corresponding to Fig. 4 (4). The base-calling results in Fig. 16 (4) are matched with the base sequences of the target DNA.
[0103] The schemes and the effects of the first to the fifth embodiments will be summarized. According to the foregoing embodiments, analysis methods can be provided in which M kinds of components are identified and detected by N-color detection in N kinds (M > N) of wavelength bands in the state in which fluorescence emitted from M kinds of fluorescent substances has spectral overlaps and space-time overlaps. In the following, following Nonpatent Literature 2, on a DNA sequencer using electrophoresis, a scheme that detects emissions of fluorescence from M = four kinds of fluorescent substances in N = three colors will be described.
[0104] The analyzer 510 detects the emissions of fluorescence from four kinds of fluorescent substances C, A, G, and T in three colors in three kinds of wavelength bands b, g, and r. The process is similar to Nonpatent Literature 2 up to the process of obtaining the three-color-fluorescence intensities in Equation (3) at each time. Here, the HDD (the storage unit) 1204 of the computer 520 stores four kinds of model peak data, that is, four kinds of time-series data obtained when DNA fragments of single lengths labeled with any of the four kinds of fluorescent substances C, A, G, and T are detected in three colors. The three-color-fluorescence intensity ratio of the model peak data of the DNA fragments labeled with the fluorescent substance Y (C, A, G, or T) is (w(bY) w(gY) w(rY))^T. Therefore, the model peak data includes information equivalent to the matrix W. In addition to this, the model peak data includes information on the shapes of the peaks, i.e., time-series information.
[0105] In Nonpatent Literature 2, one kind or two kinds of fluorescent substances emitting fluorescence are selected at each time, and their concentrations are found using the matrix W. On the other hand, in the foregoing embodiments, the computer 520 executes the fitting analysis process on the time-series data of the three-color-fluorescence intensities expressed by Equation (3) using the model peak data of the four kinds of fluorescent substances. Even in the case in which fluorescence emitted from the four kinds of fluorescent substances has a spectral overlap and a space-time overlap, the computer 520 can execute the fitting analysis process. For example, there is no problem when three or more kinds of fluorescent substances emit fluorescence at the same time. The fitting results composed of the model peak data of C, A, G, and T express the time-series data of the concentrations D(C), D(A), D(G), and D(T) of C, A, G, and T. That is, although no color conversion is performed, the time-series data of the concentrations of the four kinds of fluorescent substances, i.e., of the four base species, corresponding to Equation (2) of Nonpatent Literature 1 can be acquired from the time-series data of the three-color-fluorescence intensities. Unlike Nonpatent Literature 2, a significant characteristic of the foregoing embodiments is that they utilize time-series information on the concentrations of the four kinds of fluorescent substances. After that, the computer 520 performs processes equivalent to the processes (3) and (4) in Nonpatent Literature 1, and hence the computer 520 can acquire the results of base-calling.
[0106] According to the foregoing embodiments, fitting is performed to the time-series data of N-color-fluorescence intensities obtained in N kinds (M > N) of wavelength bands by N-color detection in the state in which fluorescence emitted from M kinds of fluorescent substances has spectral overlaps and space-time overlaps using the model peak data of M kinds of fluorescent substances, and hence the time-series data of the concentrations of M kinds of fluorescent substances, i.e., M kinds of components can be analyzed.
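The fitting summarized above can be written compactly as follows. This is a sketch rather than a formula stated in the specification: the Gaussian peak shape is borrowed from the fifth embodiment, and the symbols f(t), w_Y, h_{Y,k}, t_{Y,k}, and sigma are notation assumed here for illustration.

```latex
\[
\mathbf{f}(t) \;\approx\; \sum_{Y=1}^{M} \mathbf{w}_Y \, D_Y(t),
\qquad
D_Y(t) \;=\; \sum_{k} h_{Y,k}\,
\exp\!\left(-\frac{(t - t_{Y,k})^{2}}{2\sigma^{2}}\right),
\]
\[
\min_{\{h_{Y,k},\, t_{Y,k}\}} \;\sum_{t}
\Bigl\lVert \mathbf{f}(t) - \sum_{Y=1}^{M} \mathbf{w}_Y \, D_Y(t) \Bigr\rVert^{2}.
\]
```

Here f(t) is the vector of N-color fluorescence intensities at time t, w_Y is the N-color intensity ratio of fluorescent substance Y (a column of the matrix W), and h_{Y,k} and t_{Y,k} are the height and median of the k-th peak of substance Y. At a single time point there are only N equations for the M unknown concentrations D_Y(t), which is underdetermined when M > N. Constraining each D_Y(t) to a sum of peaks of known shape ties neighboring time points together, which is what allows the M concentration traces to be recovered.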
[0107] In order to perform analysis in which M kinds of components are identified and detected by N-color detection in N kinds of wavelength bands in the state in which fluorescence emitted from M kinds of fluorescent substances has a spectral overlap and a space-time overlap, the conventionally necessary condition is M ≤ N. According to the foregoing embodiments, analysis in which M kinds of components are identified and detected can be similarly performed even when M > N. That is, similar analysis can be achieved with a much simpler, smaller, and less expensive device configuration. For example, by N = three-color detection using an RGB color sensor, whose performance is being enhanced and whose cost is being reduced rapidly, analysis in which M = four or more kinds of components labeled with M = four or more kinds of fluorescent substances are identified and detected is feasible. Thus, analysis by highly accurate and inexpensive multicolor detection is feasible. For example, N = three-color detection using an inexpensive RGB color sensor is performed while M = four kinds of DNA fragments labeled with M = four kinds of fluorescent substances by the Sanger reaction are subjected to electrophoresis separation. Thus, even though the DNA fragments of different lengths labeled with M = four kinds of fluorescent substances are measured in a mixed state, the time-series data of the concentrations of M = four kinds of DNA fragments can be acquired, and hence DNA sequencing can be performed well.
[0108] The present invention is not limited to the foregoing embodiments and includes various modifications. The foregoing embodiments are described in detail to make the present invention easy to understand, and the invention is not necessarily limited to embodiments having all the described configurations. A part of the configuration of an embodiment may be substituted with the configuration of another embodiment. The configuration of another embodiment may be added to the configuration of an embodiment. Other configurations can be added to, removed from, or substituted for a part of the configuration of each embodiment.
[0109] The configurations, functions, processing units, and processing schemes, for example, of the computer 520 may be achieved by hardware by designing a part or all of those using an integrated circuit, for example. The configurations, functions, and any other component may be achieved by software by a processor that interprets and executes programs implementing the functions. Information, such as programs, tables, and files, that achieves the functions can be stored on various types of computer readable media. Examples of the non-transitory computer readable media that are used include a flexible disk, CD-ROM, DVD-ROM, hard disk, optical disk, magneto-optical disk, CD-R, magnetic tape, non-volatile memory card, and ROM. [0110] In the foregoing embodiments, control lines and information lines that are considered as necessary lines for description are shown. All control lines and information lines of products are not necessarily shown. All the configurations may be connected to each other.
List of Reference Signs
[0111]
c Fluorescence signal detected in wavelength band c
g Fluorescence signal detected in wavelength band g
y Fluorescence signal detected in wavelength band y
r Fluorescence signal detected in wavelength band r
C Fluorescent substance C or base species C
A Fluorescent substance A or base species A
G Fluorescent substance G or base species G
T Fluorescent substance T or base species T
1 Capillary
2 Sample injection end
3 Sample elution end
4 Cathode-side electrolytic solution
5 Anode-side electrolytic solution
6 Negative electrode
7 Positive electrode
8 High-voltage power supply
9 Sample solution
10 Electrophoresis direction
11 Laser light source
12 Laser beam
13 Fluorescence
14 Multicolor detection system
15 Laser beam irradiation position
16 Lens
17 Two-dimensional color sensor
510 Analyzer
520 Computer
530 Display device
540, 550 Database
The following labelled clauses set out further aspects of the present invention:
Clause 1 An analysis system comprising: an analyzer configured to separate a sample including a plurality of components labeled with any of M kinds of fluorescent substances by chromatography and acquire first time-series data of fluorescence signals detected in N kinds (M > N) of wavelength bands in a state in which at least a part of the plurality of components is not completely separated; a storage device configured to store second time-series data of individual model fluorescence signals of the plurality of components; and a computer configured to compare the first time-series data with the second time-series data, and determine which kind of fluorescent substance of M kinds of fluorescent substances individually labels each of the plurality of components.
Clause 2 The analysis system according to clause 1, wherein the second time-series data are data expressing temporal change in fluorescence intensity ratios between N kinds of fluorescences of model peaks when a component labeled with each of the fluorescent substances is detected.
Clause 3 The analysis system according to clause 1, wherein the computer outputs third time-series data of concentrations of M kinds of fluorescent substances contributing to the fluorescence signals by fitting the second time-series data to the first time-series data.
Clause 4 The analysis system according to clause 3, wherein the storage device stores mobility difference data relating to differences in mobility due to differences in the M kinds of fluorescent substances, and the computer corrects the differences in the mobility in the third time-series data based on the mobility difference data.
Clause 5 The analysis system according to clause 3, wherein the computer outputs fitting error data or fitting accuracy data relating to a difference between the first time-series data and a result of the fitting on each of the plurality of components.
Clause 6 The analysis system according to clause 5, further comprising a display configured to display at least one of (i) the result of the fitting relating to each of the plurality of components, (ii) fitting error data or fitting accuracy data relating to each of the plurality of components, and (iii) the third time-series data.
Clause 7 The analysis system according to clause 1, wherein the plurality of components are nucleic acid fragments of different lengths or of different compositions, and the chromatography is electrophoresis.
Clause 8 The analysis system according to clause 7, wherein the plurality of components are DNA fragments terminally labeled with M = four kinds of fluorescent substances according to terminal base species prepared by a Sanger method using a target DNA as a template; the first time-series data is time-series data of fluorescence signals detected in N = three kinds or two kinds of wavelength bands, and the computer determines a base sequence of the target DNA.
Clause 9 An analysis method comprising: separating a sample including a plurality of components labeled with any of M kinds of fluorescent substances by chromatography to acquire first time-series data of fluorescence signals detected in N kinds (M > N) of wavelength bands in a state in which at least a part of the plurality of components is not completely separated, and determining which kind of fluorescent substance of M kinds of fluorescent substances individually labels each of the plurality of components by comparing the first time-series data with second time-series data of individual model fluorescence signals of the plurality of components.
Clause 10 The analysis method according to clause 9, wherein the second time-series data are data expressing fluorescence intensity ratios between N kinds of fluorescences of model peaks when each of the fluorescent substances emits fluorescence alone.
Clause 11 The analysis method according to clause 9, wherein the determining includes outputting third time-series data of concentrations of M kinds of fluorescent substances contributing to the fluorescence signals by fitting the second time-series data to the first time-series data.
Clause 12 The analysis method according to clause 11, wherein the determining includes correcting the differences in the mobility in the third time-series data based on mobility difference data relating to differences in mobility due to differences in the M kinds of fluorescent substances.
Clause 13 The analysis method according to clause 11, wherein the determining includes outputting fitting error data or fitting accuracy data relating to a difference between the first time-series data and a result of the fitting on each of the plurality of components.
Clause 14 The analysis method according to clause 13, further comprising displaying at least one of (i) the result of the fitting relating to each of the plurality of components, (ii) fitting error data or fitting accuracy data relating to each of the plurality of components, and (iii) the third time-series data.
Clause 15 The analysis method according to clause 9, wherein the plurality of components are nucleic acid fragments of different lengths or of different compositions, and the chromatography is electrophoresis.
Clause 16 The analysis method according to clause 15, wherein the plurality of components are DNA fragments terminally labeled with M = four kinds of fluorescent substances according to terminal base species prepared by a Sanger method using a target DNA as a template; the first time-series data is time-series data of fluorescence signals detected in N = three kinds or two kinds of wavelength bands, and the determining includes determining a base sequence of the target DNA.
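
The comparison in clauses 9 to 11 and the fitting error of clause 13 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes M = 4 dyes detected in N = 3 wavelength bands, a Gaussian model peak shape, known peak centers, and noise-free data. The ratio table `RATIOS` and the helpers `model_peak`, `fit_two`, and `identify` are hypothetical names introduced only for this example. Each candidate dye assignment for two overlapping peaks is fitted to the observed signals by least squares, and the assignment with the smallest fitting error is selected.

```python
import math
from itertools import product

# Hypothetical fluorescence intensity ratios of M = 4 dyes in N = 3 bands.
# These numbers are illustrative assumptions, not values from the patent.
RATIOS = {
    "A": (0.7, 0.2, 0.1),
    "C": (0.4, 0.5, 0.1),
    "G": (0.1, 0.5, 0.4),
    "T": (0.1, 0.2, 0.7),
}

def model_peak(dye, center, times, sigma=2.0):
    """Model signal of one component (second time-series data):
    an assumed Gaussian peak shape scaled by the dye's ratio vector,
    flattened over (time, band)."""
    out = []
    for t in times:
        g = math.exp(-0.5 * ((t - center) / sigma) ** 2)
        out.extend(g * r for r in RATIOS[dye])
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_two(observed, p1, p2):
    """Least-squares amplitudes of two model peaks via the 2x2 normal
    equations, plus the squared fitting error."""
    a11, a12, a22 = dot(p1, p1), dot(p1, p2), dot(p2, p2)
    b1, b2 = dot(p1, observed), dot(p2, observed)
    det = a11 * a22 - a12 * a12
    c1 = (b1 * a22 - b2 * a12) / det
    c2 = (a11 * b2 - a12 * b1) / det
    err = sum((y - c1 * x1 - c2 * x2) ** 2
              for y, x1, x2 in zip(observed, p1, p2))
    return c1, c2, err

def identify(observed, centers, times):
    """Try every dye pair for two overlapping peaks and keep the pair
    with the smallest fitting error and non-negative amplitudes."""
    best = None
    for d1, d2 in product(RATIOS, repeat=2):
        p1 = model_peak(d1, centers[0], times)
        p2 = model_peak(d2, centers[1], times)
        c1, c2, err = fit_two(observed, p1, p2)
        if c1 < 0 or c2 < 0:
            continue
        if best is None or err < best[0]:
            best = (err, (d1, d2), (c1, c2))
    return best

# Simulated first time-series data: two peaks only 3 time steps apart,
# so they are not completely separated (noise-free by assumption).
times = list(range(60))
observed = [100 * a + 80 * b
            for a, b in zip(model_peak("C", 28.0, times),
                            model_peak("G", 31.0, times))]
error, dyes, amplitudes = identify(observed, (28.0, 31.0), times)
print(dyes, [round(a, 3) for a in amplitudes], error)
```

Because the fit uses the whole time series rather than one time point, four dyes can be distinguished with only three bands, provided no two ratio vectors are proportional. With real data the peak positions and widths would also have to be estimated, and the exhaustive search over dye pairs would not scale to long reads.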

Claims (1)

  1. Claims
[Claim 1] A capillary electrophoresis device comprising: a sample containing four or more types of fluorophores; a capillary for performing an electrophoretic analysis of the sample; a light source that irradiates the capillary with a laser beam; an optical system that condenses fluorescence emitted from a light-emitting point of the capillary by irradiation with the laser beam; and a sensor that measures an image of the light-emitting point generated by the optical system, wherein the sensor is an RGB color sensor.
[Claim 2] The capillary electrophoresis device according to claim 1, wherein the capillary electrophoresis device performs DNA sequencing of the sample by the electrophoretic analysis.
[Claim 3] The capillary electrophoresis device according to claim 1, wherein the capillary electrophoresis device performs DNA fragment analysis of the sample by the electrophoretic analysis.
[Claim 4] The capillary electrophoresis device according to any one of claims 1 to 3, wherein the capillary electrophoresis device performs comparing a piece of time-series data of a signal of the RGB color sensor obtained during the electrophoretic analysis of the sample with pieces of time-series data of signals of the RGB color sensor obtained during electrophoretic analyses of each of the four or more types of fluorophores.
[Claim 5] The capillary electrophoresis device according to any one of claims 1 to 3, wherein the capillary electrophoresis device performs representing a piece of time-series data of a signal of the RGB color sensor obtained during the electrophoretic analysis of the sample by a combination of pieces of time-series data of signals of the RGB color sensor obtained during electrophoretic analyses of each of the four or more types of fluorophores.
[Claim 6] A capillary electrophoresis device comprising: a plurality of types of samples each of which contain four or more types of fluorophores; a plurality of capillaries for performing electrophoretic analyses of the plurality of types of samples; a capillary array in which measured parts of the plurality of capillaries are arranged on a same plane; a light source that irradiates the capillary array with a laser beam; an optical system that condenses fluorescences emitted from light-emitting points of the plurality of capillaries by irradiation with the laser beam; and an area sensor that measures an image of fluorescence emitted from the light-emitting points generated by the optical system, wherein the area sensor is an RGB color sensor.
[Claim 7] The capillary electrophoresis device according to claim 6, wherein the capillary electrophoresis device performs DNA sequencing of the plurality of types of samples by the electrophoretic analyses.
[Claim 8] The capillary electrophoresis device according to claim 6, wherein the capillary electrophoresis device performs DNA fragment analysis of the plurality of types of samples by the electrophoretic analyses.
[Claim 9] The capillary electrophoresis device according to any one of claims 6 to 8, wherein the capillary electrophoresis device performs comparing pieces of time-series data of each of signals of the RGB color sensor obtained during the electrophoretic analyses of the plurality of types of sample with pieces of time-series data of signals of the RGB color sensor obtained during electrophoretic analyses of each of the four or more types of fluorophores.
[Claim 10] The capillary electrophoresis device according to any one of claims 6 to 8, wherein the capillary electrophoresis device performs representing pieces of time-series data of each of signals of the RGB color sensor obtained during the electrophoretic analyses of the plurality of types of sample by a combination of pieces of time-series data of signals of the RGB color sensor obtained during electrophoretic analyses of each of the four or more types of fluorophores.
[Claim 11] The capillary electrophoresis device according to claim 6, wherein the optical system includes a plurality of lenses that individually condense fluorescences emitted from the light-emitting points of the plurality of capillaries, and the individually condensed fluorescences are directly incident on the RGB color sensor.
GB2117852.0A 2017-02-20 2017-02-20 Capillary electrophoresis device Active GB2599049B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2117852.0A GB2599049B (en) 2017-02-20 2017-02-20 Capillary electrophoresis device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2117852.0A GB2599049B (en) 2017-02-20 2017-02-20 Capillary electrophoresis device
GB1910590.7A GB2573692B (en) 2017-02-20 2017-02-20 Analysis system and analysis method

Publications (3)

Publication Number Publication Date
GB202117852D0 GB202117852D0 (en) 2022-01-26
GB2599049A GB2599049A (en) 2022-03-23
GB2599049B GB2599049B (en) 2022-11-09

Family

ID=80080154

Family Applications (1)

Application Number Priority Date Filing Date Title
GB2117852.0A Active GB2599049B (en) 2017-02-20 2017-02-20 Capillary electrophoresis device

Country Status (1)

Country Link
GB (1) GB2599049B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101464411A (en) * 2009-01-15 2009-06-24 浙江大学 Capillary array analyzer
US20130177913A1 (en) * 2006-06-30 2013-07-11 Canon U.S. Life Sciences, Inc. Real-time pcr in micro-channels
WO2015045586A1 (en) * 2013-09-25 2015-04-02 株式会社日立ハイテクノロジーズ Fluorescence detection device and fluorescence detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Kheterpal et al. A three-wavelength labeling approach for DNA sequencing using energy transfer primers and capillary electrophoresis. Electrophoresis. 1998 Jun;19(8-9):1403-14. *

Also Published As

Publication number Publication date
GB2599049B (en) 2022-11-09
GB202117852D0 (en) 2022-01-26

Similar Documents

Publication Publication Date Title
US20240085329A1 (en) Analysis system and analysis method
EP1835281B1 (en) Multiplexed capillary electrophoresis system
EP2937685A1 (en) Device for genotypic analysis and method for genotypic analysis
WO2020211403A1 (en) Method and apparatus for identifying electrophoretogram, device, and storage medium
JP2023120217A (en) Software for microfluidic systems interfacing with mass spectrometry
US10041884B2 (en) Nucleic acid analyzer and nucleic acid analysis method using same
US20230010104A1 (en) Software for microfluidic systems interfacing with mass spectrometry
US20040168915A1 (en) Two-dimensional protein separations using chromatofocusing and multiplexed capillary gel electrophoresis
JP7050122B2 (en) Capillary electrophoresis device
US6833919B2 (en) Multiplexed, absorbance-based capillary electrophoresis system and method
EP1597570A1 (en) Multiplexed absorbance-based capillary electrophoresis system and method
GB2599049A (en) Analysis system and analysis method
JP2023101563A (en) Analysis device and analysis method
US7534335B2 (en) Multiplexed, absorbance-based capillary electrophoresis system and method
CN112513618B (en) Biopolymer analysis method and biopolymer analysis device
JP2000258392A (en) Cataphoresis device
US20240132951A1 (en) Analysis method of base sequence and gene analyzer
CN115380208A (en) Electrophoresis device and analysis method
CN118765370A (en) System for adaptive spectral calibration
US10466200B2 (en) Gel electrophoresis chip
WO2024167819A1 (en) Method for correlating separation and deconvoluted mass spectral data
Ventzki et al. Automated protein analysis by online detection of laser‐induced fluorescence in slab gels and 3‐D geometry gels
CN118318166A (en) Software for a microfluidic system interfacing with mass spectrometry
JPH10132785A (en) Electrophoresis device