JPH08173197A

JPH08173197A - Method for determining waveform peaks for DNA base sequencing, and device for determining DNA base sequences

Info

Publication number: JPH08173197A
Application number: JP6338517A
Authority: JP
Inventors: Sanpei Usui; 三平臼井; Toshiyuki Sakurai; 利之桜井; Noboru Takatsuki; 昇高杯
Original assignee: Hitachi Electronics Engineering Co Ltd
Current assignee: Hitachi High Tech Corp
Priority date: 1994-12-28
Filing date: 1994-12-28
Publication date: 1996-07-09
Anticipated expiration: 2016-05-28
Also published as: JP3171302B2

Abstract

PURPOSE: To provide a method for determining waveform peaks for base sequencing that can extract a substantially true waveform even when the waveform signal of fluorescence intensity versus time (frequency of occurrence) contains concatenated waveforms. CONSTITUTION: The left side of a waveform peak is modeled as a Gaussian distribution and the right side as a Cauchy distribution, so that each peak is treated as having different waveform characteristics on its left and right sides. The half-value widths of the left and right sides are computed separately and used as indices. From these computed half-value widths and the peak intensity of each waveform, bases whose waveform data show little fluctuation in peak position are first extracted as normal waveforms. The waveform pitch ΔTn is then determined for each interval from the waveform signal data of the normal waveforms. For consecutive concatenated waveform signals that were not selected as normal-waveform bases, the peak intensity I and the peak position of the true base signal are determined from a correction equation and the value ΔTn.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a waveform peak determination method for DNA base sequencing and to a DNA base sequencing device. More specifically, it relates to a method and device for a DNA base sequencing apparatus (hereinafter, DNA sequencer) in which laser light is irradiated from the side of an electrophoresis plate, a single-color fluorescent dye marker is used, and the fluorescence from each migrating DNA base fragment is received by a line sensor and detected as a waveform of received-light intensity versus time. Even when waveform signals overlap because consecutive bases appear, the method and device obtain the correct peak values and their positions, and thereby improve the base determination rate for long DNA, for example sequences of 400 or more bases.

[0002]

[Prior Art] In a conventional method for determining a base sequence, fragments for each of the DNA bases A (adenine), C (cytosine), G (guanine), and T (thymine) are electrophoresed through an electrophoresis plate, and their arrival is detected by receiving, at a predetermined position in each detection lane, the fluorescence generated by laser irradiation. The sequence is then determined by combining the peak positions of the detected waveform signals, found by referring to a pre-stored table of the base pitch ΔT, with various decision conditions such as peak intensity.

[0003]

However, for long DNA base sequences the migration speed decreases with length and the emission width broadens, so the waveforms of consecutive bases overlap. The detected waveform signal is then not a normal waveform from a single base; instead it overlaps the adjacent waveform signal and becomes a concatenated waveform signal. Smiling of each base and pitch fluctuation during electrophoresis also cause waveform signals to overlap and produce concatenated waveform signals. Furthermore, DNA fragments whose reaction terminated in the deoxy state are detected together with the proper DNA base sequence, and they appear as ghost signals relative to the detection signals of the fully bonded DNA fragments. Such signals are another cause of waveform concatenation.

[0004]

[Problems to Be Solved by the Invention] The base pitch ΔT varies with the electrophoresis conditions, and concatenated waveforms cause missed or erroneous readings when a weak base lies just before or after a strong one, which lowers the probability of determining the base sequence correctly. In particular, for long DNA sequences of 400 or 500 bases or more, the half-value width ΔT_1/2 becomes larger than the base pitch ΔT and the waveform overlap becomes pronounced, so that the determination probability falls to, for example, about 90%.

[0005]

To improve the determination probability, the signal waveforms buried in the concatenated waveforms must be recovered. However, even if each local maximum is taken as a peak position, the peak positions themselves fluctuate, so they cannot be treated as true peak positions, and the base sequence cannot be determined by using them as an accurate pitch. Moreover, when peaks have disappeared because waveform signals overlap, the waveform signals selected by intensity and the calculated base pitch ΔT itself become inaccurate. The present invention solves these problems of the prior art. Its object is to provide a waveform peak determination method for base sequencing that can extract a substantially true waveform even when the waveform signal of fluorescence intensity (frequency of occurrence) versus time contains concatenated waveforms. Another object of the present invention is to provide a DNA sequencer that can improve the determination probability for bases in long DNA sequences.

[0006]

[Means for Solving the Problems] To achieve these objects, the waveform peak determination method for base sequencing and the sequencer of the present invention are characterized by comprising:

(a) calculation-target waveform extraction means that computes the sum of the waveform signals of the DNA bases A (adenine), C (cytosine), G (guanine), and T (thymine) as a function of migration time, and extracts the data of the base waveform signals to be processed on the basis of the average appearance pitch ΔTm of the resulting peaks and of peak values at or above a predetermined level;

(b) half-value width calculation means that obtains from the waveform data of the extracted base waveform signals, each as a function of migration time, a first half-value width ΔT_R1/2 of the waveform after the peak position and a second half-value width ΔT_L1/2 of the waveform before the peak position;

(c) normal-waveform base extraction means that, using the first and second half-value widths ΔT_R1/2 and ΔT_L1/2 and the peak intensity as criteria, extracts as normal-waveform bases the base waveforms whose peak positions fluctuate little;

(d) smiling correction means that applies smiling correction to the waveform data of the bases extracted by the normal-waveform base extraction means;

(e) base pitch calculation means that divides the base waveform signal data corrected by the smiling correction means into predetermined time intervals and obtains, for each interval, the base pitch ΔTn from the average appearance pitch of the peaks in the DNA base waveform signals; and

(f) true waveform identification means that, for concatenated waveform signals not selected by the normal-waveform base extraction means, calculates the peak intensity I* of the true base signal from the per-interval value ΔTn obtained by the base pitch calculation means and the first and second half-value widths ΔT_R1/2 and ΔT_L1/2 obtained by the half-value width calculation means, as

$$I^*_n = \frac{I_n + 2\beta_1\beta_2\, I_{n-3} - (\beta_1 - 2\alpha\beta_2)\, I_{n-1} - \alpha\, I_{n+1}}{1 - 2\alpha\beta_1}$$

where I_n is the n-th observed intensity in the interval; I_{n+1}, I_{n-1}, and I_{n-3} are the observed intensities at positions ΔTn, −ΔTn, and −3ΔTn away from the position of I_n; and

$$\alpha = \exp(-\ln 2 \cdot P^2), \quad \beta_1 = \frac{1}{1+Q^2}, \quad \beta_2 = \frac{1}{1+4Q^2}, \quad P = \frac{2\,\Delta T_n}{\Delta T_{L1/2}}, \quad Q = \frac{2\,\Delta T_n}{\Delta T_{R1/2}},$$

and that identifies the peak position on the basis of the value ΔTn, using as reference the positions of the waveforms of adjacent bases extracted as normal waveforms.
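The correction equation in [0006] can be transcribed directly into code. The sketch below is a minimal Python rendering that assumes the observed intensities at 0, −ΔTn, −3ΔTn, and +ΔTn from the peak have already been sampled from the waveform; the function and parameter names (`correction_coefficients`, `true_peak_intensity`, `dT_L`, `dT_R`) are hypothetical, and the code reproduces the equation as printed rather than re-deriving it.

```python
import math


def correction_coefficients(dTn, dT_L, dT_R):
    """Return (alpha, beta1, beta2) as defined in [0006]."""
    P = 2.0 * dTn / dT_L  # pitch relative to the left half-value width
    Q = 2.0 * dTn / dT_R  # pitch relative to the right half-value width
    alpha = math.exp(-math.log(2.0) * P ** 2)  # Gaussian (left-side) factor
    beta1 = 1.0 / (1.0 + Q ** 2)               # Cauchy factor at one pitch
    beta2 = 1.0 / (1.0 + 4.0 * Q ** 2)         # Cauchy factor at two pitches
    return alpha, beta1, beta2


def true_peak_intensity(I_n, I_nm1, I_nm3, I_np1, dTn, dT_L, dT_R):
    """Estimate I*_n from observed intensities at 0, -dTn, -3*dTn and +dTn."""
    alpha, beta1, beta2 = correction_coefficients(dTn, dT_L, dT_R)
    numerator = (I_n
                 + 2.0 * beta1 * beta2 * I_nm3
                 - (beta1 - 2.0 * alpha * beta2) * I_nm1
                 - alpha * I_np1)
    return numerator / (1.0 - 2.0 * alpha * beta1)


# Example: equal pitch and widths, with a strong left neighbour
print(true_peak_intensity(100.0, 60.0, 0.0, 20.0, dTn=1.0, dT_L=1.0, dT_R=1.0))
```

With no neighbours the estimate reduces to I_n / (1 − 2αβ₁); a stronger earlier neighbour lowers the estimate, since its right-hand (Cauchy) tail is assumed to have inflated I_n.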

[0007]

[Function] The waveform of each base waveform signal is modeled with its left side (before the peak) as a Gaussian distribution and its right side (after the peak) as a Cauchy distribution, that is, as having different characteristics on the left and right, and the half-value widths are calculated separately for the two sides. Using these left and right half-value widths as indices, together with the peak intensity of each waveform signal, the bases whose waveform data show little fluctuation in peak position are first extracted as normal waveforms. The waveform pitch ΔTn is then obtained for each interval from the waveform signal data of these normal waveforms. For consecutive concatenated waveform signals not selected as normal-waveform bases, the peak intensity I and position of the true base signal are determined from ΔTn, the left and right half-value widths ΔT_R1/2 and ΔT_L1/2, and the above equation. As a result, many base waveform signals, including the base signals hidden in concatenated waveforms, can be obtained as substantially correct waveform data.
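The coefficients α, β₁, and β₂ can be read as tail heights of the asymmetric peak model just described. The sketch below assumes that ΔT_L1/2 and ΔT_R1/2 are full half-value widths, so each side falls to half height at half that distance from the peak; under that assumption, the tail of the next peak (ΔTn later) seen at the current position is α times its height, and the tails of the previous peaks (ΔTn and 2ΔTn earlier) are β₁ and β₂ times theirs. The name `asymmetric_peak` is hypothetical.

```python
import math


def asymmetric_peak(t, t0, height, dT_L, dT_R):
    """Peak centred at t0: Gaussian before the peak, Cauchy after it.

    dT_L and dT_R are assumed to be full half-value widths, so each side
    falls to height / 2 at a distance of dT_L / 2 (left) or dT_R / 2 (right).
    """
    x = t - t0
    if x <= 0:
        # Left side (earlier migration time): Gaussian profile
        return height * math.exp(-math.log(2.0) * (2.0 * x / dT_L) ** 2)
    # Right side (later migration time): Cauchy (Lorentzian) profile
    return height / (1.0 + (2.0 * x / dT_R) ** 2)


# Tail of the next peak (one pitch later) seen at t = 0, i.e. alpha
print(asymmetric_peak(0.0, 1.0, 1.0, 1.0, 1.0))
# Tails of the previous peaks (one and two pitches earlier), i.e. beta1, beta2
print(asymmetric_peak(1.0, 0.0, 1.0, 1.0, 1.0), asymmetric_peak(2.0, 0.0, 1.0, 1.0, 1.0))
```

For ΔTn = ΔT_L1/2 = ΔT_R1/2 = 1 (so P = Q = 2), the three tail values are 1/16, 1/5, and 1/17, the same as α, β₁, and β₂ from the definitions in [0006].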

[0008]

DNA fragments whose reaction terminates in the deoxy state arise from incomplete bonding caused by reaction-inhibiting factors. They usually occur at a rate of a few percent relative to the proper, fully bonded DNA bases, so their frequency of appearance is a few percent of that of the fully bonded bases. Nevertheless, they become ghost signal components. Because a ghost signal is similar in shape to its own base's signal, it has almost no effect on that base's own waveform signal; the other DNA bases, however, have different waveform shapes, so the ghost's effect on them is a problem. On the other hand, removing a base's ghost component from its own signal strongly affects weak waveform signal components, which matters when a high base determination probability of 98% to 99% is sought. Therefore, in addition to the above, to remove ghost signals effectively without reducing the original signal intensity, the ghost component of a given DNA base is not removed from its own signal; instead, the fluorescence intensity (frequency of occurrence) of the ghost components arising from the signal components of the remaining three DNA bases is calculated by multiplying those components by the ghost occurrence rate ε. Subtracting this from the original waveform signal allows more substantially true waveform signals, including weak signal waveforms, to be detected, which further improves the determination probability.

[0009]

[Embodiment] FIG. 1 is a block diagram of an embodiment of a DNA sequencer to which the waveform peak determination method for base sequencing of the present invention is applied; FIG. 2 is a flowchart of its ghost signal removal process; FIG. 3 illustrates its waveform data generation method; FIG. 4 is a flowchart for extracting true peak waveform data; FIG. 5 illustrates the pitch frequency, the waveform distribution, and the half-value widths; and FIG. 6 illustrates the relationship between a concatenated waveform and the true peaks. In FIG. 1, reference numeral 1 denotes a DNA sequencer, in which the fluorescence of base fragment groups of predetermined width on an electrophoresis plate 2 irradiated with laser light L is received through a lens (not shown) by a one-dimensional line image sensor (CCD) 3. In this way, using a single-color fluorescent dye marker, the received fluorescence intensity in each lane in which the DNA bases A (adenine), C (cytosine), G (guanine), and T (thymine) were electrophoresed is detected as an analog signal, which a CCD drive/control circuit 4 receives and continuously outputs to an arithmetic processing unit 5.

[0010]

The CCD drive/control circuit 4 contains a timing controller, a timing generation circuit, a multiplexer, and the like, and outputs the signals read from the line image sensor 3 to the arithmetic processing unit 5. The line image sensor 3 generates, for example, a signal for each 125 μm pixel, integrates 4 pixels for each lane, and outputs a signal approximately 600 μm wide in the lane direction as one unit. The arithmetic processing unit 5 consists of a microprocessor (MPU) 50, an amplifier 51 that receives and amplifies the light-reception signal from the CCD drive/control circuit 4, a low-pass filter (LPF) 52, an A/D conversion circuit (A/D) 53, a memory 54, a waveform memory 55, a CRT display (CRT) 56, a keyboard 57, a printer (not shown), a timer (not shown), and so on, and these circuits are interconnected with the MPU 50 via a bus 58. The received fluorescence intensity is output to the CRT display 56 or the printer as waveform data of fluorescence intensity (frequency of appearance of DNA bases) versus time.

[0011]

The MPU 50 sends sampling pulses to the A/D 53 at a predetermined period so that the light-reception signal, from which noise has been removed by the LPF 52, is converted into digital values; it receives these values via the bus 58 and stores them sequentially in the waveform memory 55 as measurement data. As a result, the waveform memory 55 holds fluorescence-intensity-versus-time data as digital values for each measurement time point corresponding to the sampling time, for example as the waveform signal shown in FIG. 3(a). The memory 54, in turn, stores a base waveform data extraction program 54a, a ghost data calculation program 54b, a ghost data removal program 54c, a waveform display/output program 54d, a calculation-target waveform extraction program 54e, a half-value width calculation program 54f, a normal-waveform base extraction program 54g, a smiling correction program 54h, a base pitch calculation program 54i, a true waveform identification program 54j, and various other processing programs.

[0012]

The base waveform data extraction program 54a extracts, from the measured waveform data stored in the waveform memory 55, the waveform data corresponding to each of the DNA bases A, C, G, and T, and subtracts a predetermined data value corresponding to background noise from the data read for each measurement time point of the waveform data memory. It also separates, lane by lane, the data extracted over time at the timings corresponding to the lanes of DNA bases A, C, G, and T, and stores them sequentially in separate storage areas in the work area of the memory 54. The background noise removal is, for example, a process that finds the local minimum of the waveform data measured in each predetermined measurement section and successively subtracts that minimum.
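A minimal sketch of the section-minimum background subtraction described in [0012], assuming the samples for one lane are held in a plain Python list and that the measurement sections have a fixed number of samples; `subtract_background` and `section_len` are illustrative names, not taken from the source.

```python
def subtract_background(signal, section_len):
    """Subtract each section's minimum from every sample in that section.

    signal: list of intensity samples for one lane (assumed layout).
    section_len: number of samples per measurement section (assumed fixed).
    """
    corrected = []
    for start in range(0, len(signal), section_len):
        section = signal[start:start + section_len]
        floor = min(section)  # section minimum taken as the background level
        corrected.extend(value - floor for value in section)
    return corrected


print(subtract_background([5, 6, 7, 2, 3, 4, 9], 3))
```

A shorter section tracks a drifting baseline more closely but also starts to eat into broad peaks, so in practice the section length would be chosen to be several base pitches long.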

[0013]

The ghost data calculation program 54b assigns each of the DNA bases A, C, G, and T in turn to i and computes the ghost signal component g_i at a given measurement time point according to

$$g_i = \varepsilon\,(d_1 + d_2 + d_3 + d_4) - \varepsilon\, d_i \qquad (1)$$

It then advances the measurement time point successively over the entire measurement period, calculating the level of the ghost signal component g_i as data aligned in time with each original data set, and stores the results sequentially, in that time alignment, in a predetermined area of the memory 55. Here d_1, d_2, d_3, and d_4 are the waveform data values for the DNA bases A, C, G, and T, and the number of bases n is 4 in this embodiment. d_i is the waveform data value of one DNA base i among A, C, G, and T, and ε is the ghost occurrence rate, here 0.03. This is because, for each of the DNA bases A, C, G, and T, the rate at which the reaction terminates in the deoxy state is about 3% relative to the proper, fully bonded DNA bases. The ghost occurrence rate ε need only be small enough not to exclude from detection the weak fluorescence signals of the long sequences to be detected; a value of about 0.02 to 0.1 is usually appropriate.

[0014]

Equation (1) calculates the signal level of the ghost signal component at each measurement time point by multiplying the sum of the waveform data of all DNA bases at that time point by the ghost occurrence rate. Subtracting the second term, εd_i, which corresponds to the base's own original component, leaves only the ghost signal components of the other DNA bases. This rescues the base's own weak signal waveforms. The ghost data removal program 54c removes the ghost data calculated by the ghost data calculation program 54b for each measurement time point from the original waveform data at that time point, generates the original data for each of the DNA bases A, C, G, and T, and stores the results in the memory 54 for each measurement time point.
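Equation (1) and the subtraction performed by the ghost data removal program 54c can be sketched as follows. This is a minimal illustration assuming the four lane traces are equal-length Python lists keyed by base letter; the function names are hypothetical, and ε defaults to the 0.03 given in [0013].

```python
def ghost_components(point, eps=0.03):
    """Equation (1): g_i = eps * (d1 + d2 + d3 + d4) - eps * d_i."""
    total = sum(point.values())
    return {base: eps * total - eps * d_i for base, d_i in point.items()}


def remove_ghosts(traces, eps=0.03):
    """Subtract the ghost components of the other bases at every time point.

    traces: dict mapping "A", "C", "G", "T" to equal-length sample lists.
    """
    bases = list(traces)
    cleaned = {base: [] for base in bases}
    for samples in zip(*(traces[b] for b in bases)):
        point = dict(zip(bases, samples))
        ghosts = ghost_components(point, eps)
        for base in bases:
            cleaned[base].append(point[base] - ghosts[base])
    return cleaned


traces = {"A": [100.0, 0.0], "C": [3.0, 50.0], "G": [3.0, 1.5], "T": [3.0, 1.5]}
print(remove_ghosts(traces))
```

Because each base's own term εd_i is cancelled, a strong peak is never reduced by its own ghost; only the other lanes are corrected by ε times its height.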

[0015]

In this case, instead of subtracting the ghost signal components of the other DNA bases from each of the bases A, C, G, and T individually, the ghost data removal program 54c may successively subtract the ghost signal components of the other DNA bases from the summed waveform data of the bases A, C, G, and T at each measurement time point, generating the waveform data in summed form.

[0016]

Next, the removal of ghost signal components will be described with reference to the processing flow of FIG. 2 and to FIG. 3. First, when a predetermined amount of measurement data has been stored in the waveform memory 55, the MPU 50 executes the base waveform data extraction program 54a by interrupt processing. It removes the background noise (step 101), then extracts the waveform data for each of the DNA bases A, C, G, and T from the waveform data at a given measurement time point shown in FIG. 3(a) (for convenience of explanation, FIG. 3 shows the digital values in analog form) and stores them in the memory 54 (step 102). Plotting this stored state continuously over many measurement time points gives FIG. 3(b).

[0017]

Next, the MPU 50 executes the ghost data calculation program 54b, calculates each ghost signal component according to Equation (1), and stores it in the memory 54 (step 103). This is represented, for DNA base A, by the hatched waveform labeled Gst in FIG. 3(c). The MPU 50 then executes the ghost data removal program 54c: it removes the ghost signal data for each measurement time point, calculated by the ghost data calculation program 54b, from the original data at that time point, generates the original data for each of the bases A, C, G, and T, and stores them in the memory 54 for each measurement time point (step 104). In this case, each base's own waveform data still contains its own ghost signal component.

[0018]

This data is then passed to the process of finding the true waveform signal peaks for concatenated waveform signals (step 105). In this process, via calculation of the half-value widths of the waveform data and extraction of the normal-waveform bases, the peak position of each waveform and its value are calculated as the intensity of each base, and the base sequence is determined by an analysis based on these calculated values. The details of step 105 are shown in FIG. 4. In addition, after step 104, the MPU 50 may execute the waveform display/output program 54d to output the measurement result data, with the ghost signal components removed, to the CRT display 56 or the printer (step 106).

[0019]

Before describing the process of FIG. 4 for finding the true waveform signal peaks, the programs involved in that process are described. The calculation-target waveform extraction program 54e computes the sum of the waveform signals of the DNA bases A (adenine), C (cytosine), G (guanine), and T (thymine) as a function of migration time, applies discrete differentiation to this sum, for example second-order five-point smoothing differentiation, detects the peaks in the result, and generates frequency distribution data for the pitch between the detected peaks. This frequency distribution data is shown in FIG. 5(a). The sum of the DNA base waveform signals is given by

$$I_{\mathrm{sum}} = I_A + I_C + I_G + I_T \qquad (2)$$

From the peak positions of the frequency distribution, the values ΔT, 2ΔT, and 3ΔT are obtained, and from these the base pitch ΔTm is calculated as the average value of ΔT. Then, using the average appearance pitch ΔTm and a threshold set at a predetermined level, the base waveform signal data to be processed are extracted on the basis of the peak values at or above that threshold.
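The steps of [0019] can be sketched as below: Equation (2), a quadratic five-point smoothing derivative (the Savitzky–Golay first-derivative kernel (−2, −1, 0, 1, 2)/10, assumed here to be what "second-order five-point smoothing differentiation" means), peak detection at + to − sign changes, and a pitch histogram. How ΔTm is combined from the histogram peaks is not specified in the source; averaging the most frequent pitch with the histogram modes near 2ΔT and 3ΔT (divided by 2 and 3) is our assumption, as are all function names and the ±25% search window.

```python
import math
from collections import Counter


def sum_traces(traces):
    """Equation (2): I_sum = I_A + I_C + I_G + I_T at every sample."""
    return [sum(values) for values in zip(*traces.values())]


def smoothed_derivative(y):
    """Quadratic 5-point smoothing derivative, kernel (-2, -1, 0, 1, 2) / 10."""
    d = [0.0] * len(y)
    for i in range(2, len(y) - 2):
        d[i] = (-2 * y[i - 2] - y[i - 1] + y[i + 1] + 2 * y[i + 2]) / 10.0
    return d


def peak_positions(y, min_height):
    """Indices where the smoothed derivative turns from + to - above min_height."""
    d = smoothed_derivative(y)
    peaks = []
    for i in range(2, len(y) - 3):
        if d[i] > 0 and d[i + 1] <= 0:
            j = i if y[i] >= y[i + 1] else i + 1  # higher of the two samples
            if y[j] >= min_height:
                peaks.append(j)
    return peaks


def mean_base_pitch(peaks, tolerance=0.25):
    """Estimate dTm from the pitch histogram peaks near dT, 2dT and 3dT."""
    pitches = [b - a for a, b in zip(peaks, peaks[1:])]
    hist = Counter(pitches)
    p1 = max(hist, key=hist.get)  # most frequent pitch, taken as dT
    estimates = [p1]
    for k in (2, 3):
        near = [p for p in hist if abs(p - k * p1) <= tolerance * k * p1]
        if near:
            estimates.append(max(near, key=hist.get) / k)
    return sum(estimates) / len(estimates)


# Three synthetic Gaussian peaks, 10 samples apart
y = [sum(math.exp(-((i - c) / 2.0) ** 2) for c in (10, 20, 30)) for i in range(40)]
print(peak_positions(y, 0.5))
```

Including the 2ΔT and 3ΔT histogram peaks lets pitches that skipped a missing (merged) peak still contribute to the ΔTm estimate.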

[0020]

In this case, only single-waveform bases may additionally be extracted as calculation targets. To do so, any waveform data for which another peak exists within b × ΔTm of a peak position (b being, for example, a value in the range 1.2 to 1.8) are discarded from the waveform data of the calculation-target bases. The following condition is then applied to the remaining waveform data to extract only the single-waveform bases. As described in detail for the half-value width calculation program 54f below, let ΔT_R1/2 be the half-value width of the waveform after (to the right of) the peak position and ΔT_L1/2 the half-value width of the waveform before (to the left of) it. Only peaks satisfying

$$\left| \frac{\Delta T_{L1/2} - \Delta T_{R1/2}}{\Delta T_{L1/2} + \Delta T_{R1/2}} \right| < \delta, \qquad \delta \approx 0.1 \text{ to } 0.3$$

are extracted. By choosing δ in an appropriate range, only single-waveform bases are selected, and subsequent basic data such as the half-value widths may be calculated from them. The half-value width calculation program 54f takes the waveform data of the calculation-target base waveform signals extracted by the above processing and, as shown in the graph of FIG. 5(b), averages the half-value width ΔT_R1/2 on the right of the peak and the half-value width ΔT_L1/2 on the left as functions of migration time, creating a table of half-value widths.
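The single-waveform screening in [0020] can be sketched with three small helpers. The source does not say exactly how the one-sided half-value widths are measured; here we assume each is twice the (linearly interpolated) distance from the peak to the half-maximum crossing on that side, so that a symmetric peak gives its ordinary full width at half maximum. All names are hypothetical; b and δ defaults are picked from the ranges stated in the text.

```python
def half_value_widths(y, peak):
    """Return (dT_L, dT_R) for the peak at index `peak`, or None.

    Each side is the distance from the peak to the half-maximum crossing,
    linearly interpolated between samples, then doubled (an assumption
    about how the one-sided half-value widths are defined).
    """
    if y[peak] <= 0:
        return None
    half = y[peak] / 2.0

    def crossing(step):
        i = peak
        while 0 <= i + step < len(y) and y[i + step] > half:
            i += step
        if not 0 <= i + step < len(y):
            return None  # signal never drops to half height in the record
        frac = (y[i] - half) / (y[i] - y[i + step])
        return abs(i - peak) + frac

    left, right = crossing(-1), crossing(+1)
    if left is None or right is None:
        return None
    return 2.0 * left, 2.0 * right


def isolated_peaks(peaks, dTm, b=1.5):
    """Drop peaks that have another peak within b * dTm (b from 1.2 to 1.8)."""
    return [p for p in peaks
            if all(q == p or abs(q - p) > b * dTm for q in peaks)]


def is_single_waveform(dT_L, dT_R, delta=0.2):
    """Asymmetry test |(dT_L - dT_R) / (dT_L + dT_R)| < delta (0.1 to 0.3)."""
    return abs((dT_L - dT_R) / (dT_L + dT_R)) < delta


print(half_value_widths([0, 4, 8, 6, 5, 4, 3, 0], 2))
```

For that skewed test peak the widths come out as (2.0, 6.0), giving an asymmetry of 0.5, so it would be rejected as a likely concatenated waveform for any δ in the suggested 0.1 to 0.3 range.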

【００２１】The normal waveform base extraction program 54g extracts, as normal base waveforms, the bases whose waveform data show little fluctuation in peak position, using as criteria the half-value widths ΔT_R1/2 and ΔT_L1/2 and the peak value (intensity) obtained for each waveform.
Specifically, the extraction reference widths for the left and right half-value widths are taken as the range of data scatter in the half-value width data table of Fig. (b). A range of 3σ is used, as shown in Fig. (c): the dotted lines 3σLA and 3σLB bound the selection range of the left half-value width, and the dash-dot lines 3σRA and 3σRB bound the selection range of the right half-value width. A discrimination range is obtained for the migration time of each base waveform, and a peak-intensity threshold is set such that about 80% (a value chosen from the range 70% to 90%) of the waveform data whose left and right half-value widths fall within this range are extracted; the target waveform data are then selected in turn as a function of migration time. The smiling correction program 54h corrects the positions of the waveforms of bases that are shifted in time by smiling, for the waveform signal data of each base extracted by the normal waveform base extraction program 54g.
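The selection step can be sketched with NumPy. This is a hypothetical illustration, not the patent's code: the helper name `select_normal_waveforms`, the use of fixed migration-time bins with a per-bin mean and population standard deviation as a stand-in for the half-value width table, and applying `keep_fraction` to the in-band waveforms are all assumptions.

```python
import numpy as np


def select_normal_waveforms(t, w_left, w_right, intensity,
                            n_bins=10, k_sigma=3.0, keep_fraction=0.8):
    """Boolean mask of waveforms treated as 'normal' (hypothetical helper).

    t:          migration time of each waveform peak
    w_left:     ΔT_L1/2 of each waveform
    w_right:    ΔT_R1/2 of each waveform
    intensity:  peak intensity of each waveform
    """
    t = np.asarray(t, dtype=float)
    w_left = np.asarray(w_left, dtype=float)
    w_right = np.asarray(w_right, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # Assign each waveform to a migration-time bin (a coarse width table).
    edges = np.linspace(t.min(), t.max(), n_bins + 1)
    bins = np.clip(np.digitize(t, edges) - 1, 0, n_bins - 1)

    # Keep waveforms whose left AND right widths lie within ±k·σ of the bin mean.
    in_band = np.ones(t.size, dtype=bool)
    for w in (w_left, w_right):
        for b in range(n_bins):
            sel = bins == b
            if not sel.any():
                continue
            mu, sd = w[sel].mean(), w[sel].std()
            in_band[sel] &= np.abs(w[sel] - mu) <= k_sigma * sd

    # Intensity threshold chosen so that keep_fraction of in-band waveforms pass.
    threshold = np.quantile(intensity[in_band], 1.0 - keep_fraction)
    return in_band & (intensity >= threshold)


# 20 waveforms in one bin; waveform 5 has a distorted left half-value width.
t = np.arange(20.0)
w_left = np.ones(20)
w_left[5] = 100.0
w_right = np.ones(20)
intensity = np.arange(1.0, 21.0)
mask = select_normal_waveforms(t, w_left, w_right, intensity, n_bins=1)
print(mask.sum())
```

The outlier at index 5 falls outside the 3σ band, and the weakest in-band waveforms are then dropped by the intensity threshold.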

【００２２】For the bases selected as normal waveforms, this correction determines the alignment of the base waveform data that minimizes the dispersion of the base pitch ΔT with respect to the total S of the data of the base waveforms. For example, taking the base waveform data sequence of A as the reference, the positions of the waveform data of the other bases are shifted so that

$$\sum_i \left| \frac{\sum \Delta T}{S} - \Delta T_i \right|$$

is minimized (ΣΔT/S being the mean pitch); the reference base is then changed from A to another base and the same processing is repeated in turn. The base pitch calculation program 54i divides the smiling-corrected base waveform signal data into predetermined time intervals and, for each section, obtains the average appearance pitch ΔTn of the peaks found in the sum of the DNA base waveform signals in that section.
Each section is preferably wide enough to contain several tens to about one hundred waveforms. The ΔTn values obtained for the sections can then be connected by straight lines to form a continuous table as a function of migration time, like the table of the half-value widths ΔT_R1/2 and ΔT_L1/2 in FIG. 5(c).
In this way, ΔT_R1/2 and ΔT_L1/2 are obtained from the half-value widths calculated along the straight lines at the time position of each waveform, corresponding to its migration time, and the true peak intensity is obtained by calculating equation (3) below; likewise, the peak position can be calculated more accurately from the ΔTn calculated along the straight line.
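The per-section pitch table and its straight-line connection can be sketched as follows. This is an assumption-laden illustration: the helper name `section_pitch_table`, taking already-detected peak times of the summed signal as input, and assigning each pitch to a section by the midpoint of its two peaks are choices made here, not details from the text.

```python
import numpy as np


def section_pitch_table(peak_times, section_length):
    """Per-section mean pitch ΔTn, returned as (section centres, ΔTn values).

    peak_times:     peak times of the (smiling-corrected) summed signal
    section_length: duration of one section, in the same time units
    """
    t = np.sort(np.asarray(peak_times, dtype=float))
    pitches = np.diff(t)
    mids = (t[:-1] + t[1:]) / 2.0           # each pitch is placed at its midpoint
    section = ((mids - t[0]) // section_length).astype(int)

    centres, dtn = [], []
    for s in np.unique(section):
        sel = section == s
        centres.append(mids[sel].mean())
        dtn.append(pitches[sel].mean())      # average appearance pitch in section s
    return np.array(centres), np.array(dtn)


# Pitch 10 in the first part of the run, 12 later on.
peaks = np.concatenate([np.arange(0, 500, 10), 490 + 12 * np.arange(1, 41)])
centres, dtn = section_pitch_table(peaks, section_length=250.0)
# Straight-line connection between section values, as a function of time.
print(np.interp(125.0, centres, dtn))   # 10.0
```

`np.interp` gives the piecewise-linear table; outside the first and last section centres it holds the end values constant.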

【００２３】The true waveform identification program 54j uses the value ΔTn obtained by the base pitch calculation program 54i and the half-value widths ΔT_R1/2 and ΔT_L1/2 of each waveform in each section, obtained by the half-value width calculation program 54f. For continuous concatenated waveform signals in each section that were not selected as normal waveforms by the normal waveform base extraction means, it obtains the peak intensity of the true base signal by the following equation (3) and identifies its peak position:

$$I^{*}_{n} = \frac{I_n + 2\beta_1\beta_2\, I_{n-3} - (\beta_1 - 2\alpha\beta_2)\, I_{n-1} - \alpha\, I_{n+1}}{1 - 2\alpha\beta_1} \qquad (3)$$

where $I^{*}_{n}$ is the true peak intensity, $I_n$ is the n-th observed intensity in the section, and $I_{n+1}$, $I_{n-1}$, $I_{n-3}$ are the observed intensities at positions $+\Delta T_n$, $-\Delta T_n$ and $-3\Delta T_n$ from the position of $I_n$, with

$$\alpha = \exp(-\ln 2 \cdot P^2), \quad \beta_1 = \frac{1}{1+Q^2}, \quad \beta_2 = \frac{1}{1+4Q^2}, \quad P = \frac{2\Delta T_n}{\Delta T_{L1/2}}, \quad Q = \frac{2\Delta T_n}{\Delta T_{R1/2}}.$$

As shown in FIG. 5(b), equation (3) treats the base waveform signal as a Gaussian distribution on the left side of the peak and a Cauchy distribution on the right side, each expressed relative to the peak value h. For the Gaussian distribution, the distribution value y at a time T from the peak is

$$y = h \exp\left\{-\ln 2 \left(\frac{2T}{\Delta T_{1/2}}\right)^2\right\},$$

and for the Cauchy distribution it is

$$y = \frac{h}{1 + (2T/\Delta T_{1/2})^2}.$$

Calculating the intensity of the concatenated waveform shown in Fig. (c) on this basis yields equation (3).
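The peak model and equation (3) can be written out directly. This is a minimal sketch under the definitions above; the function names are hypothetical. The comments record a consistency check: α is the left-side (Gaussian) value of a unit peak one pitch later, while β₁ and β₂ are the right-side (Cauchy) values of a unit peak one and two pitches earlier.

```python
import math


def peak_model(T, h, dt_left, dt_right):
    """Asymmetric peak: Gaussian before the peak (T < 0), Cauchy after it (T >= 0).

    T is the time offset from the peak, h the peak value, and dt_left / dt_right
    the half-value widths ΔT_L1/2 and ΔT_R1/2.
    """
    if T < 0:
        return h * math.exp(-math.log(2) * (2 * T / dt_left) ** 2)
    return h / (1 + (2 * T / dt_right) ** 2)


def true_peak_intensity(i_n, i_nm1, i_nm3, i_np1, dtn, dt_left, dt_right):
    """Equation (3): corrected intensity I*_n of a peak inside a concatenated waveform.

    i_nm1, i_nm3, i_np1 are the observed intensities at -ΔTn, -3ΔTn and +ΔTn
    from the position of i_n.
    """
    p = 2 * dtn / dt_left
    q = 2 * dtn / dt_right
    alpha = math.exp(-math.log(2) * p ** 2)   # = peak_model(-dtn, 1, dt_left, dt_right)
    beta1 = 1 / (1 + q ** 2)                  # = peak_model(dtn, 1, dt_left, dt_right)
    beta2 = 1 / (1 + 4 * q ** 2)              # = peak_model(2 * dtn, 1, dt_left, dt_right)
    numerator = (i_n + 2 * beta1 * beta2 * i_nm3
                 - (beta1 - 2 * alpha * beta2) * i_nm1
                 - alpha * i_np1)
    return numerator / (1 - 2 * alpha * beta1)


# Both half-value widths equal to 2ΔTn, so P = Q = 1 (α ≈ 0.5, β₁ = 0.5, β₂ = 0.2).
print(true_peak_intensity(1.0, 0.0, 0.0, 0.0, dtn=10.0, dt_left=20.0, dt_right=20.0))  # ≈ 2.0
```

The model falls to h/2 at T = -ΔT_L1/2 / 2 and at T = +ΔT_R1/2 / 2, which is what makes ΔT_L1/2 and ΔT_R1/2 behave as half-value widths.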

【００２４】Together with this intensity, the peak position of the n-th waveform data can be obtained from the pitch value ΔTn, using as a reference the waveform of an adjacent base extracted as a normal waveform, for example the waveform data K at the position of intensity I_n-2 in FIG. 6: relative to the time position ts of the waveform data K, the peak position is ts + {n − (n − 2)} × ΔTn. The peak positions and intensities of the (n+1)-th and (n+2)-th waveforms are obtained in the same manner. FIG. 6 shows the results calculated in this way for n = 548, n + 1 = 549 and n + 2 = 550.
Note that n is counted from the beginning of the data, not within the section; the rank within the section is, for example, the value obtained by subtracting 500 from these numbers.
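The position arithmetic is simple enough to show directly. In this sketch the function names, the reference time ts = 100.0 and the pitch ΔTn = 12.5 are hypothetical values chosen for illustration; the numbering (n = 548 to 550, offset 500) follows the example in the text.

```python
def peak_position(ts, m, n, dtn):
    """Peak time of waveform n, counted from a normal-waveform reference m at time ts."""
    return ts + (n - m) * dtn


def rank_in_section(n, section_offset):
    """Convert a run-wide waveform number to its rank within a section."""
    return n - section_offset


# Reference waveform K is number n - 2 = 546 at time ts; ΔTn taken as 12.5 here.
ts, dtn = 100.0, 12.5
for n in (548, 549, 550):
    print(n, rank_in_section(n, 500), peak_position(ts, 546, n, dtn))
# 548 48 125.0
# 549 49 137.5
# 550 50 150.0
```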

【００２５】Next, the processing for obtaining the peaks of the true waveform signals together with the normal waveform signals is described with reference to the processing flow of FIG. 4 and to FIG. 5. First, the MPU 50 executes the calculation target waveform extraction program 54e and, for the waveform data of each base from which the background noise and ghost signals were removed in step 104 of FIG. 2, calculates the sum of the waveform signals of the DNA bases A (adenine), C (cytosine), G (guanine) and T (thymine) as a function of migration time according to equation (2) (step 105A). Quadratic (second-order) five-point smoothing differentiation, as one example of discretization processing, is then applied to this sum data (step 105B); peaks are detected in the result, and frequency distribution data of the pitches of the detected peaks are generated (step 105C). Finally, the base pitch ΔTm is calculated as an average value from this frequency distribution data (step 105D).
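Steps 105A to 105D can be sketched with NumPy. This sketch assumes that the five-point smoothing differentiation is the standard quadratic Savitzky-Golay first-derivative filter with weights (-2, -1, 0, 1, 2)/10, that a peak is a positive-to-non-positive sign change of that derivative, and that ΔTm is the mean of the pitch histogram; the helper names and the `min_peak` level are hypothetical.

```python
import numpy as np


def smoothed_derivative(y):
    """Quadratic five-point (Savitzky-Golay) first derivative; two samples at each edge left at 0."""
    y = np.asarray(y, dtype=float)
    d = np.zeros_like(y)
    d[2:-2] = (-2 * y[:-4] - y[1:-3] + y[3:-1] + 2 * y[4:]) / 10.0
    return d


def mean_base_pitch(d_a, d_c, d_g, d_t, min_peak):
    """Sketch of steps 105A-105D: sum, differentiate, find peaks, average the pitch."""
    total = (np.asarray(d_a, dtype=float) + np.asarray(d_c, dtype=float)
             + np.asarray(d_g, dtype=float) + np.asarray(d_t, dtype=float))  # 105A
    deriv = smoothed_derivative(total)                                          # 105B
    # A peak is where the derivative goes from positive to non-positive.
    idx = np.where((deriv[:-1] > 0) & (deriv[1:] <= 0))[0] + 1
    idx = idx[total[idx] >= min_peak]           # keep peaks at or above the level
    pitches = np.diff(idx)
    hist = np.bincount(pitches)                 # pitch frequency distribution, 105C
    dtm = float((np.arange(hist.size) * hist).sum() / hist.sum())              # 105D
    return dtm, hist


# Synthetic run: one peak every 10 samples, bases assigned A, C, G, T in turn.
t = np.arange(200)
channels = [np.zeros(200) for _ in range(4)]
for k, c in enumerate(range(20, 180, 10)):
    channels[k % 4] += np.exp(-((t - c) / 2.0) ** 2)
dtm, hist = mean_base_pitch(*channels, min_peak=0.5)
print(dtm, hist.sum())
```

With sampled data, an individual pitch can come out one sample long or short, but the histogram mean stays close to the true pitch of 10.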

【００２６】The MPU 50 executes the half-value width calculation program 54f and, for the waveform data of the calculation target base waveform signals extracted by the above processing, calculates the half-value width ΔT_R1/2 of the waveform on the right side of the peak and the half-value width ΔT_L1/2 of the waveform on the left side of the peak, each as a function of migration time, as shown in FIG. 5(b). It then creates a data table of the average half-value widths calculated as functions of migration time, as shown in Fig. (c) (step 105E).
Next, the MPU 50 executes the normal waveform base extraction program 54g, refers to this half-value width data table, and extracts as normal-waveform bases the bases whose waveform data show little fluctuation in peak position, using the half-value widths ΔT_R1/2 and ΔT_L1/2 and the peak intensity as criteria (step 105F). As described above, this extraction is performed by setting the selection ranges of the left and right half-value widths (for example, the 3σ ranges mentioned earlier) and the peak-intensity threshold to values such that some percentage in the range of about 70% to 90% of all waveform signals is extracted.
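One way to measure ΔT_L1/2 and ΔT_R1/2 on sampled data is sketched below. The text does not say how the widths are measured; this sketch assumes linear interpolation to the half-maximum crossing on each side and, to match the peak model of equation (3) (which falls to h/2 at a distance of ΔT/2 from the peak), takes each width as twice the peak-to-crossing distance. The function name and the sample spacing `dt` are hypothetical.

```python
import numpy as np


def half_value_widths(y, peak, dt=1.0):
    """Return (ΔT_L1/2, ΔT_R1/2) for the peak at index `peak`; NaN if no crossing."""
    y = np.asarray(y, dtype=float)
    half = y[peak] / 2.0

    # Left side (earlier in time): walk back until the signal drops to half height.
    j = peak
    while j > 0 and y[j] > half:
        j -= 1
    if y[j] > half:
        left = np.nan
    else:
        x_left = j + (half - y[j]) / (y[j + 1] - y[j])   # interpolated crossing
        left = 2.0 * (peak - x_left) * dt

    # Right side (later in time): walk forward until the signal drops to half height.
    j = peak
    while j < y.size - 1 and y[j] > half:
        j += 1
    if y[j] > half:
        right = np.nan
    else:
        x_right = (j - 1) + (y[j - 1] - half) / (y[j - 1] - y[j])
        right = 2.0 * (x_right - peak) * dt

    return float(left), float(right)


# A peak that rises slowly and falls quickly.
print(half_value_widths([0, 1, 2, 3, 4, 2, 0], peak=4))  # (4.0, 2.0)
```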

【００２７】Next, the MPU 50 executes the smiling correction program 54h to perform smiling correction on the base data regarded as the selected normal waveforms (step 105G). The MPU 50 then executes the base pitch calculation program 54i and, for the smiling-corrected base waveform signal data, obtains the average appearance pitch ΔTn of the peaks found in the sum of the DNA base waveform signals in each section (step 105H). Next, the MPU 50 executes the true waveform identification program 54j and obtains the peak positions and intensities of the concatenated waveform data that were not extracted by the normal waveform base extraction (step 105I). Using the peak positions and intensities of all the waveform data thus obtained, both normal and concatenated, the MPU 50 analyzes the base sequence in the conventional manner (step 105J).
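The shift search behind the smiling correction of step 105G (criterion given in paragraph 【００２２】) might look like the following. This is a simplified sketch, not the patent's procedure: it makes a single greedy pass with A fixed as the reference and tries integer shifts within an assumed `max_shift`, whereas the text also repeats the search with the other bases as reference. The helper names are hypothetical.

```python
import numpy as np


def pitch_deviation(peak_times):
    """Sum over i of |mean pitch - ΔT_i| for the merged, sorted peak times."""
    t = np.sort(np.concatenate(peak_times))
    dt = np.diff(t)
    return float(np.sum(np.abs(dt.mean() - dt)))


def smiling_correction(peaks_by_base, reference="A", max_shift=5):
    """Greedy shift search: keep the reference base fixed and shift each other base."""
    shifts = {b: 0 for b in peaks_by_base}
    for base in peaks_by_base:
        if base == reference:
            continue
        best = None
        for s in range(-max_shift, max_shift + 1):
            trial = [np.asarray(p, dtype=float) + (s if b == base else shifts[b])
                     for b, p in peaks_by_base.items()]
            cost = pitch_deviation(trial)
            if best is None or cost < best[0]:
                best = (cost, s)
        shifts[base] = best[1]
    return shifts


grid = np.arange(0.0, 400.0, 10.0)                 # ideal, evenly spaced peak times
peaks = {"A": grid[0::4], "C": grid[1::4] + 3.0,   # C lags by 3 time units
         "G": grid[2::4], "T": grid[3::4]}
print(smiling_correction(peaks))  # {'A': 0, 'C': -3, 'G': 0, 'T': 0}
```

A greedy pass is cheap but can settle on a poor shift when several bases are displaced at once; repeating with each base as the reference, as the text describes, reduces that risk.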

【００２８】In the embodiment described above, the half-value width ΔT_1/2 and the waveform pitch ΔT are obtained from the raw waveform data. More accurate data can be obtained, however, by deriving the waveform data, the half-value width ΔT_1/2 and the waveform pitch ΔT from the second derivative and extracting the normal base waveforms and so on from it. Differentiating (first derivative) waveform data that have a single mountain-shaped peak produces waveform data with a zero crossing at the position of the maximum and with one positive and one negative peak. Differentiating again (second derivative) produces waveform data in which each of those peaks becomes a zero crossing; the negative-side waveform is a sharp peak narrower than the original waveform, while the positive-side waveforms are broad, low peaks. Inverting the sign so that the negative-side waveform becomes positive therefore allows it to be handled in the same way as the original waveform data.
That is, the calculation target waveform extraction program 54e obtains the value ΔTm from the positive-side waveform data produced by second-differentiating the sum data of the DNA base waveform signals and inverting the sign; the half-value width calculation program 54f inverts the sign of the second derivative of the waveform data of the extracted base waveform signals and obtains the first and second half-value widths ΔT_R1/2 and ΔT_L1/2 from the positive-side waveform data; and the base extraction program 54g extracts, from the second-differentiated waveform data of each base, the base waveforms whose peak positions fluctuate little as normal-waveform bases, using the first and second half-value widths ΔT_R1/2 and ΔT_L1/2 of the second derivative and the peak intensity as criteria.
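The sign-inverted second derivative can be illustrated on a synthetic peak. The text does not specify the differentiation scheme, so this sketch assumes a plain three-point second difference; a smoothing differentiator such as the five-point filter of step 105B could be substituted. The helper name is hypothetical.

```python
import numpy as np


def inverted_second_derivative(y):
    """Second difference of y with the sign inverted, keeping only the positive side."""
    y = np.asarray(y, dtype=float)
    d2 = np.zeros_like(y)
    d2[1:-1] = y[:-2] - 2.0 * y[1:-1] + y[2:]   # discrete second derivative
    return np.clip(-d2, 0.0, None)              # invert sign, drop the negative part


t = np.arange(101)
y = np.exp(-((t - 50) / 5.0) ** 2)
z = inverted_second_derivative(y)
print(int(np.argmax(y)), int(np.argmax(z)))                          # 50 50
print(int((y >= y.max() / 2).sum()), int((z >= z.max() / 2).sum()))  # 9 5
```

The peak stays at the same position while its width above half height shrinks from 9 to 5 samples, which is why adjacent, partly overlapping peaks separate more clearly after this transform.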

【００２９】As explained earlier, ghost signals are eliminated in this embodiment by calculating the sum of the waveform data and then calculating and subtracting the ghost signal components of the other three bases in turn while updating the base being calculated; this efficiently yields summed waveform data from which the ghost signal components have been removed. Ghost signal components and noise may, of course, be removed by other methods.
In the embodiment, the half-value width is calculated separately for the left and right sides of the base signal waveform, but it may instead be calculated as the average of the two, ΔT_1/2 = (ΔT_L1/2 + ΔT_R1/2)/2. In that case, ΔT_L1/2 = ΔT_1/2 and ΔT_R1/2 = ΔT_1/2.
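The ghost subtraction rule stated in claims 3 and 6, g_i = ε(d_1 + d_2 + d_3 + d_4) − εd_i, vectorizes neatly over all samples. This is a minimal sketch; the helper name, the (4, N) array layout and the default ε = 0.05 (from the claimed range 0.02 to 0.1) are assumptions.

```python
import numpy as np


def remove_ghosts(d, eps=0.05):
    """Subtract g_i = eps*(d_1 + d_2 + d_3 + d_4) - eps*d_i from each base channel.

    d:   array of shape (4, N) holding the A, C, G, T waveform data
    eps: ghost occurrence rate (0.02 to 0.1 in the claims)
    """
    d = np.asarray(d, dtype=float)
    g = eps * d.sum(axis=0) - eps * d   # ghost in channel i = eps * (sum of the other three)
    return d - g


observed = np.array([[10.0, 0.0],
                     [0.5, 0.0],
                     [0.5, 0.0],
                     [0.5, 8.0]])
print(remove_ghosts(observed))
```

Each channel's ghost depends only on the other three channels, so a strong A peak lowers the C, G and T traces at the same time point while leaving A nearly unchanged.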

【００３０】[0030]

[Effects of the Invention] In the present invention, the left side of each peak is treated as a Gaussian distribution and the right side as a Cauchy distribution, so that the peak has different waveform characteristics on its left and right. The half-value widths are calculated separately for the left and right sides, and, using them as indices together with the peak intensity of each waveform signal, the bases whose waveform data show little fluctuation in peak position are first extracted as normal waveforms. The waveform pitch ΔTn is then obtained for each section from the waveform signal data of these normal waveforms, and, for continuous concatenated waveform signals not selected as normal-waveform bases, the peak intensity I of the true base signal and its position are obtained section by section from the above equation, using ΔTn and the left and right half-value widths ΔT_R1/2 and ΔT_L1/2. As a result, many base waveform signals, including those of concatenated waveforms, can be obtained as nearly correct waveform data.
This improves the probability of correctly determining the bases in long DNA base sequences of 400 or 500 bases.

[Brief description of drawings]

FIG. 1 is a block diagram of an embodiment of a DNA sequencer to which the waveform peak determination method for DNA base sequence determination of the present invention is applied.

FIG. 2 is a flowchart of the ghost signal removal processing.

FIG. 3 is an explanatory diagram of the waveform data generation method.

FIG. 4 is a flowchart for extracting true peak waveform data.

FIG. 5 is an explanatory diagram of pitch frequency, waveform distribution, and half-value width.

FIG. 6 is an explanatory diagram of the relationship between a concatenated waveform and the true peaks.

[Explanation of symbols]

1 ... DNA sequencer, 2 ... electrophoresis plate, 3 ... line image sensor, 4 ... CCD drive/control circuit, 5 ... arithmetic processing unit, 50 ... MPU, 51 ... amplifier, 52 ... low-pass filter (LPF), 53 ... A/D conversion circuit (A/D), 54 ... memory, 55 ... waveform memory, 56 ... CRT display, 57 ... keyboard, 58 ... printer, 54a ... base waveform data extraction program, 54b ... ghost data calculation program, 54c ... ghost data removal program, 54d ... waveform display/output program, 54e ... calculation target waveform extraction program, 54f ... half-value width calculation program, 54g ... normal waveform base extraction means, 54h ... smiling correction program, 54i ... base pitch calculation program, 54j ... true waveform identification program.

Claims

1. A waveform peak determination method for determining a DNA base sequence, in a sequence determination device that electrophoreses a group of DNA base fragments labeled with a single-color dye marker, detects their arrival at a position reached after migrating a fixed distance by receiving the light emitted from the dye, treats the received light intensity as a waveform signal that is a function of time, and determines the sequence of the DNA bases on the basis of this waveform signal, the method having:
calculation target waveform extraction means for extracting the data of the base waveform signals to be calculated, on the basis of the average value ΔTm of the appearance pitch of the peaks obtained by calculating the sum data of the waveform signals of the DNA bases A (adenine), C (cytosine), G (guanine) and T (thymine) as a function of migration time, and of peak values at or above a predetermined level;
half-value width calculation means for calculating, from the waveform data of the extracted base waveform signals and as functions of migration time, a first half-value width ΔT_R1/2 of the waveform temporally after the peak position and a second half-value width ΔT_L1/2 of the waveform temporally before the peak position;
normal waveform base extraction means for extracting, as normal-waveform bases, the base waveforms of waveform data with little fluctuation in peak position, on the basis of the first and second half-value widths ΔT_R1/2 and ΔT_L1/2;
smiling correction means for performing smiling correction on the base waveform data extracted by the normal waveform base extraction means;
base pitch calculation means for dividing the base waveform signal data corrected by the smiling correction means into predetermined time intervals and calculating, for each section, a base pitch ΔTn as the average appearance pitch of the peaks obtained from the DNA base waveform signals in that section; and
true waveform identification means for obtaining, for concatenated waveform signals not selected by the normal waveform base extraction means, the peak intensity of the true base signal from the value ΔTn of each section obtained by the base pitch calculation means and the first and second half-value widths ΔT_R1/2 and ΔT_L1/2 obtained by the half-value width calculation means, as

$$I^{*}_{n} = \frac{I_n + 2\beta_1\beta_2\, I_{n-3} - (\beta_1 - 2\alpha\beta_2)\, I_{n-1} - \alpha\, I_{n+1}}{1 - 2\alpha\beta_1}$$

where $I_n$ is the n-th observed intensity in the section; $I_{n+1}$, $I_{n-1}$ and $I_{n-3}$ are the observed intensities at positions $+\Delta T_n$, $-\Delta T_n$ and $-3\Delta T_n$ from the position of $I_n$; $\alpha = \exp(-\ln 2 \cdot P^2)$, $\beta_1 = 1/(1+Q^2)$, $\beta_2 = 1/(1+4Q^2)$, $P = 2\Delta T_n/\Delta T_{L1/2}$ and $Q = 2\Delta T_n/\Delta T_{R1/2}$; and for identifying the peak position on the basis of the value ΔTn and the position of the waveform of an adjacent base extracted as a normal waveform.

2. The waveform peak determination method for determining a DNA base sequence as described above, wherein a half-value width ΔT_1/2 is calculated as the average of the first half-value width ΔT_R1/2 and the second half-value width ΔT_L1/2, and ΔT_1/2 is used in place of the first half-value width ΔT_R1/2 and the second half-value width ΔT_L1/2.

3. The waveform peak determination method for determining a DNA base sequence as described above, wherein the emitted light is fluorescence, the waveform signal is stored in the memory as digital values, the ghost occurrence rate ε is 0.02 to 0.1, and the data g_i of the i-th ghost signal (i being a subscript from 1 to 4) of the four DNA bases A (adenine), C (cytosine), G (guanine) and T (thymine) is obtained as g_i = ε(d_1 + d_2 + d_3 + d_4) − εd_i and subtracted from the waveform signal data, where d_1, d_2, d_3 and d_4 are the data values of the waveform signal for each DNA base.

4. The waveform peak determination method for determining a DNA base sequence according to claim 1, wherein the calculation target waveform extraction means obtains the value ΔTm on the basis of the positive-side waveform data produced by second-differentiating the sum data of the DNA base waveform signals and inverting the sign; the half-value width calculation means inverts the sign of the second-differentiated waveform data of the extracted base waveform signals and obtains the first and second half-value widths ΔT_R1/2 and ΔT_L1/2 from the positive-side waveform data; and the base extraction means extracts, from the second-differentiated waveform data of each base, the base waveforms of waveform data with little fluctuation in peak position as normal-waveform bases, using the first and second half-value widths ΔT_R1/2 and ΔT_L1/2 of the second derivative and their peak intensities as criteria.

5. A DNA base sequence determination device that electrophoreses a group of DNA base fragments labeled with a single-color dye marker, detects their arrival at a position reached after migrating a fixed distance by receiving the light emitted from the dye, treats the received light intensity as a waveform signal that is a function of time, and determines the sequence of the DNA bases on the basis of this waveform signal, the device comprising: an optical detection system that irradiates the migration plate with a laser and detects, with a line sensor, the light emitted according to the arrival state of the base fragment group of DNA or RNA; and an arithmetic processing unit that has a memory, detects the level of the electric signal obtained from the line sensor as a function of time, converts it into digital values and stores them in the memory as data; the arithmetic processing unit having:
calculation target waveform extraction means for extracting the data of the base waveform signals to be calculated, on the basis of the average value ΔTm of the appearance pitch of the peaks obtained by calculating the sum of the waveform signals of the DNA bases A (adenine), C (cytosine), G (guanine) and T (thymine) as a function of migration time, and of peak values at or above a predetermined level;
half-value width calculation means for calculating, from the waveform data of the extracted base waveform signals and as functions of migration time, a first half-value width ΔT_R1/2 of the waveform temporally after the peak position and a second half-value width ΔT_L1/2 of the waveform temporally before the peak position;
normal waveform base extraction means for extracting, as normal-waveform bases, the base waveforms of waveform data with little fluctuation in peak position, on the basis of the first and second half-value widths ΔT_R1/2 and ΔT_L1/2;
smiling correction means for performing smiling correction on the base waveform data extracted by the normal waveform base extraction means;
base pitch calculation means for dividing the base waveform signal data corrected by the smiling correction means into predetermined time intervals and calculating, for each section, a base pitch ΔTn as the average appearance pitch of the peaks obtained from the DNA base waveform signals in that section; and
true waveform identification means for obtaining, for concatenated waveform signals not selected by the normal waveform base extraction means, the peak intensity of the true base signal from the value ΔTn of each section obtained by the base pitch calculation means and the first and second half-value widths ΔT_R1/2 and ΔT_L1/2 obtained by the half-value width calculation means, as

$$I^{*}_{n} = \frac{I_n + 2\beta_1\beta_2\, I_{n-3} - (\beta_1 - 2\alpha\beta_2)\, I_{n-1} - \alpha\, I_{n+1}}{1 - 2\alpha\beta_1}$$

where $I_n$ is the n-th observed intensity in the section; $I_{n+1}$, $I_{n-1}$ and $I_{n-3}$ are the observed intensities at positions $+\Delta T_n$, $-\Delta T_n$ and $-3\Delta T_n$ from the position of $I_n$; $\alpha = \exp(-\ln 2 \cdot P^2)$, $\beta_1 = 1/(1+Q^2)$, $\beta_2 = 1/(1+4Q^2)$, $P = 2\Delta T_n/\Delta T_{L1/2}$ and $Q = 2\Delta T_n/\Delta T_{R1/2}$; and for identifying the peak position on the basis of the value ΔTn and the position of the waveform of an adjacent base extracted as a normal waveform.

6. The DNA base sequence determination device according to claim 3, wherein the emitted light is fluorescence, the waveform signal is stored in the memory as digital values, the ghost occurrence rate ε is 0.02 to 0.1, and the data g_i of the i-th ghost signal (i being a subscript from 1 to 4) of the four DNA bases A (adenine), C (cytosine), G (guanine) and T (thymine) is obtained as g_i = ε(d_1 + d_2 + d_3 + d_4) − εd_i and subtracted from the waveform signal data, where d_1, d_2, d_3 and d_4 are the data values of the waveform signal for each DNA base; and wherein a half-value width ΔT_1/2 is calculated as the average of the first half-value width ΔT_R1/2 and the second half-value width ΔT_L1/2 and is used in place of the first half-value width ΔT_R1/2 and the second half-value width ΔT_L1/2.

7. The DNA base sequence determination device according to claim 3, wherein the calculation target waveform extraction means obtains the value ΔTm on the basis of the positive-side waveform data produced by second-differentiating the sum data of the DNA base waveform signals and inverting the sign; the half-value width calculation means inverts the sign of the second-differentiated waveform data of the extracted base waveform signals and obtains the first and second half-value widths ΔT_R1/2 and ΔT_L1/2 from the positive-side waveform data; and the base extraction means extracts, from the second-differentiated waveform data of each base, the base waveforms of waveform data with little fluctuation in peak position as normal-waveform bases, using the first and second half-value widths ΔT_R1/2 and ΔT_L1/2 of the second derivative and their peak intensities as criteria.