JP5344670B2

JP5344670B2 - Gene expression analysis method, gene expression analysis apparatus, and gene expression analysis program

Info

Publication number: JP5344670B2
Application number: JP2008032466A
Authority: JP
Inventors: 康次笠間; 真澄安倍
Original assignee: National Institute of Radiological Sciences
Current assignee: National Institute of Radiological Sciences
Priority date: 2008-02-13
Filing date: 2008-02-13
Publication date: 2013-11-20
Anticipated expiration: 2028-02-13
Also published as: JP2009193273A

Abstract

PROBLEM TO BE SOLVED: To provide a method for easily comparing reference profile data (reference PD) with created measurement profile data (measurement PD) and analyzing the expression state of a gene.
SOLUTION: The method includes: a reference PD acquisition step S1 of acquiring stored reference PD, in which a reference range of base-number-equivalent values, obtained from the positions and detection amounts of the base-number-equivalent values of DNA fragments, is represented as a first waveform, and the transcript species from which predetermined peaks in the first waveform are derived have been identified and stored; a measurement PD creation step S2 of creating measurement PD in which a measurement range of base-number-equivalent values, obtained from the positions and detection amounts of the base-number-equivalent values of the DNA fragments to be measured, is represented as a second waveform; a peak association step S3 of applying correction processing to part or all of at least one of the first and second waveforms and associating the positions of their peaks; and a gene expression analysis step S4 of reading the origin of each associated peak from the transcript species information, identifying the gene of each peak in the measurement PD, and thereby analyzing the expression state of the gene.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a gene expression analysis method, a gene expression analysis apparatus, and a gene expression analysis program that analyze the expression state of genes using profile data represented as a single waveform having a plurality of peaks derived from a plurality of expressed gene transcripts.

After the symbolic milestone of the "complete decoding of human genetic information (the human genome)", genomic science is shifting toward so-called post-sequence research: elucidating the mechanisms that control gene expression and the functions of gene products. Post-sequence research can elucidate a wide range of biological phenomena, so it is not only of great scientific significance but may also contribute substantially to drug development and make advanced personalized (custom-made) treatment possible. Expectations for progress in post-sequence research are therefore very high.

One example of such post-sequence research is gene expression analysis, which analyzes the expression state of genes. The expression level of even a single gene changes from moment to moment depending on the state of the organism, and alternative splicing and similar mechanisms generate multiple mRNAs, corresponding to different proteins, from one gene. Higher animals and plants also produce non-coding RNAs and microRNAs that are not translated into proteins, and it is becoming clear that these regulate gene expression and are conserved across many species.
Among these RNAs, analyzing the expression level of mRNA, which is directly related to protein expression, is particularly important for realizing more advanced personalized treatment, as described above.

Methods widely used for gene expression analysis that can comprehensively analyze the expression state of genes include the differential display method, the SAGE (Serial Analysis of Gene Expression) method, the DNA microarray method, and the DNA chip method.

In recent years, the High Coverage Expression Profiling method (hereinafter the "HiCEP method"), which enables comprehensive and more accurate gene expression analysis, has been developed and has attracted attention (see, for example, Patent Document 1).
In the HiCEP method, cDNA obtained by reverse transcription of mRNA is cleaved with two types of restriction enzymes, and a DNA fragment of about 20 bases with a special base sequence, called an adapter, is ligated to the cut ends. Selective PCR is then performed using fluorescently labeled selective PCR primers whose base sequences are complementary to the adapter, the products are separated by length by capillary electrophoresis, and a capillary DNA sequencer that analyzes the separation is used to obtain measurement profile data represented as a single waveform having a plurality of peaks. One waveform obtained in this way contains about 200 peaks. For the same sample, peaks derived from transcripts of the same gene are in principle detected at the same migration position (peak position on the profile), even if their intensity changes because of differing conditions.

The HiCEP method therefore has the advantage that it can analyze the expression state even of unknown genes whose base sequences have not been determined. If coverage is defined as the proportion of transcripts (mRNA) analyzed relative to all transcripts (all mRNA) of the expressed genes, the conventional methods described above achieve a coverage of 10 to 30%, whereas the HiCEP method achieves 70 to 80%. In addition, it can reliably detect small changes of about ±20%. The HiCEP method thus achieves a level of accuracy and sensitivity that conventional methods such as DNA microarrays could not.

Patent Document 1: International Publication No. 02/048352 pamphlet

However, the differential display method, the SAGE method, the DNA microarray method, the DNA chip method, and similar methods have several problems: they can handle only genes whose base sequences are known in advance; their sensitivity is low (for example, detection is said to require a two- to threefold change in mRNA level); and their results are not sufficiently reproducible unless the expression level changes substantially.

The HiCEP method described in Patent Document 1 can analyze the expression state of gene transcripts with a coverage, accuracy, and sensitivity that conventional methods could not achieve. Because it separates fragments by electrophoresis, however, a peak may appear at a position different from the one corresponding to its true number of bases, for example when cDNA fragments have the same size or because of the three-dimensional molecular structure of a cDNA fragment during electrophoresis. Peak positions in the measurement profile data can also shift because of deterioration of the electrophoresis polymer used in the capillary DNA sequencer, use of polymer from a different production lot, or changes in voltage or temperature during electrophoresis. Furthermore, when sequencers of different models are used, differences in measurement conditions such as the polymer can shift measured peak positions by several bp, so care is needed when comparing data from different researchers.

Consequently, when created measurement profile data are compared with reference profile data, or when measurement profile data obtained under the same conditions at different facilities or on different days are compared with each other, the peak positions may not match, which makes the comparison difficult. In such cases, one could correct each peak individually, for example by multiplying it by a correction coefficient for changes in temperature or voltage during electrophoresis, and compare the profiles after correcting the peak positions. Applying such processing to every peak, however, is very burdensome and time-consuming. Moreover, unlike the number of bases in the fragment sequence that forms a peak, an absolute peak size that could serve as a reference cannot, in principle, be assumed.

The present invention has been made in view of these problems, and its object is to provide a gene expression analysis method, a gene expression analysis apparatus, and a gene expression analysis program that make it easy to compare created measurement profile data with reference profile data, or created measurement profile data with each other, and to analyze the expression state of genes.

(1) A gene expression analysis method according to the present invention that solves the above problems analyzes the expression state of genes using profile data that represent, as a single waveform, a plurality of peaks derived from a plurality of expressed gene transcripts, and is characterized by including a reference profile data acquisition step, a measurement profile data creation step, a peak association step, and a gene expression analysis step.

Specifically, in the reference profile data acquisition step, reference profile data are acquired in advance. In these data, a reference range of base-number-equivalent values, obtained from the positions of the base-number-equivalent values of DNA fragments (produced by amplifying part of the cDNA reverse-transcribed from a plurality of gene transcripts) and from the detection amounts at those positions, which correspond to the transcription amounts of the gene transcripts, is represented as a first waveform; in addition, as transcript species information for the gene transcripts, predetermined peaks in the first waveform and the transcript species from which those peaks are derived have been identified and stored. Next, in the measurement profile data creation step, measurement profile data are created in which a measurement range of base-number-equivalent values, obtained from the positions of the base-number-equivalent values of the DNA fragments to be measured (produced by amplifying part of the cDNA reverse-transcribed from a plurality of gene transcripts) and from the detection amounts at those positions, which correspond to the transcription amounts of the measured objects, is represented as a second waveform.
Next, in the peak association step, correction processing is performed that corrects part or all of the region of at least one of the first and second waveforms to adjust the peak positions, and a plurality of peaks in the second waveform that include the region of interest are associated with the corresponding plurality of peaks in the first waveform. Then, in the gene expression analysis step, information on the gene transcript from which each associated peak of the measurement profile data is derived is read from the transcript species information, and the expression state of the genes is analyzed by identifying the gene from which each associated peak of the measurement profile data is derived.
In the peak association step of the present invention, when the peak positions of the reference profile data acquired in the reference profile data acquisition step coincide with the peak positions of the measurement profile data created in the measurement profile data creation step, the coinciding peaks of the reference profile data and the measurement profile data are associated with each other. When the peak positions of the reference profile data and those of the measurement profile data are partly or wholly shifted, correction processing is performed on at least one of the waveforms, correcting part or all of its region and adjusting the peak positions so that the similarity between the waveforms is maximized, and the peaks of the reference profile data are then associated with the peaks of the measurement profile data.
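The association rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each profile is assumed to be a list of intensities sampled at integer base-number-equivalent positions, the "correction" is assumed to be a single global shift of the measured waveform, and "similarity" is taken to be the Pearson correlation coefficient. The source also allows correcting only part of a waveform; the helper names (`profile_from_peaks`, `best_shift`, `associate`) and the pairing tolerance are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def profile_from_peaks(centers: Sequence[float], length: int, sigma: float = 2.0) -> List[float]:
    """Toy sampled waveform: unit-height Gaussian peaks on an integer base-number grid."""
    return [
        sum(math.exp(-((i - c) ** 2) / (2 * sigma ** 2)) for c in centers)
        for i in range(length)
    ]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient, used here as the waveform similarity (assumption)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def shifted(wave: Sequence[float], k: int) -> List[float]:
    """Move a waveform by k samples (a peak at p ends up at p + k); vacated samples are zero."""
    n = len(wave)
    return [wave[i - k] if 0 <= i - k < n else 0.0 for i in range(n)]


def best_shift(reference: Sequence[float], measured: Sequence[float], max_shift: int) -> Tuple[int, float]:
    """Try every integer shift of the measured waveform and keep the most similar one."""
    best_k, best_r = 0, -2.0
    for k in range(-max_shift, max_shift + 1):
        r = pearson(reference, shifted(measured, k))
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


def associate(ref_peaks: Sequence[float], meas_peaks: Sequence[float], shift: int, tol: float) -> Dict[float, float]:
    """Pair each shift-corrected measured peak with the nearest reference peak within tol."""
    pairs: Dict[float, float] = {}
    for m in meas_peaks:
        corrected = m + shift
        nearest = min(ref_peaks, key=lambda p: abs(p - corrected))
        if abs(nearest - corrected) <= tol:
            pairs[m] = nearest
    return pairs


# Usage: the measured profile is the reference displaced by +3 positions.
ref_peaks, meas_peaks = [20, 50, 80], [23, 53, 83]
reference = profile_from_peaks(ref_peaks, 100)
measured = profile_from_peaks(meas_peaks, 100)
k, r = best_shift(reference, measured, max_shift=10)
print(k, round(r, 3))
print(associate(ref_peaks, meas_peaks, k, tol=1.0))
```

An exhaustive search over a small shift range is the simplest way to "maximize similarity"; a real tool would more likely use a local, piecewise correction so that different regions of the profile can shift by different amounts.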

Because the gene expression analysis method according to the present invention represents the acquired reference profile data and the created measurement profile data as waveforms and compares them, the peaks of the reference profile data can easily be associated with the peaks of the created measurement profile data. The expression state of genes can then be analyzed appropriately by identifying the gene from which each peak of interest in the associated measurement profile data is derived.
In addition, when the peak positions of the reference profile data and the measurement profile data coincide, the peaks are associated as they are, without correction; correction processing to adjust peak positions is performed only when the positions are shifted. This enables rapid association.
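A minimal sketch of this shortcut, assuming peak positions are given as base-number-equivalent values and that "coincide" means "within a tolerance `tol`" (the source does not specify a tolerance). The function name `split_coincident` is hypothetical. Coinciding peaks are paired immediately; only the peaks left in `pending` would be passed on to the correction processing.

```python
from typing import Dict, List, Sequence, Tuple


def split_coincident(
    ref_peaks: Sequence[float], meas_peaks: Sequence[float], tol: float
) -> Tuple[Dict[float, float], List[float]]:
    """Pair measured peaks that already coincide with a reference peak (within tol).

    Returns (pairs, pending): `pairs` maps measured -> reference positions for peaks
    that can be associated without correction; `pending` lists measured peaks that
    would still need the correction step.
    """
    pairs: Dict[float, float] = {}
    pending: List[float] = []
    for m in meas_peaks:
        nearest = min(ref_peaks, key=lambda r: abs(r - m))
        if abs(nearest - m) <= tol:
            pairs[m] = nearest
        else:
            pending.append(m)
    return pairs, pending


pairs, pending = split_coincident([100.0, 150.0, 210.0], [100.0, 150.2, 214.5], tol=0.5)
print(pairs)    # coincident peaks, associated directly
print(pending)  # shifted peaks, handed to the correction step
if not pending:
    print("no correction needed")
```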

(2) In the reference profile data acquisition step of the present invention, the reference profile data are preferably acquired from a database storing known profile data, created artificially from the transcript species information, obtained by adding one or more peaks to, or deleting them from, known profile data or the measurement profile data, synthesized from a plurality of sets of reference profile data, or synthesized from a plurality of sets of measurement profile data.
In this way, reference profile data can be acquired quickly and simply and compared with the measurement profile data.
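Several of these acquisition routes can be illustrated with a small peak table. This is a sketch under assumptions not found in the source: each transcript species is described by a (position, height, species) tuple, an artificial waveform is rendered by placing a Gaussian of fixed width `sigma` at each position, and "synthesizing" several profiles is taken to mean a point-wise mean on a shared grid. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (base-number-equivalent position, height, transcript species): illustrative layout
Peak = Tuple[float, float, str]


def render(peaks: Sequence[Peak], length: int, sigma: float = 1.5) -> List[float]:
    """Create an artificial waveform by placing one Gaussian peak per transcript species."""
    wave = [0.0] * length
    for pos, height, _species in peaks:
        for i in range(length):
            wave[i] += height * math.exp(-((i - pos) ** 2) / (2 * sigma ** 2))
    return wave


def add_peak(peaks: Sequence[Peak], peak: Peak) -> List[Peak]:
    """Return a new peak table with one extra peak."""
    return list(peaks) + [peak]


def delete_peak(peaks: Sequence[Peak], species: str) -> List[Peak]:
    """Return a new peak table without the peaks of the given transcript species."""
    return [p for p in peaks if p[2] != species]


def average_profiles(waves: Sequence[Sequence[float]]) -> List[float]:
    """Synthesize one profile from several profiles sampled on the same grid (point-wise mean)."""
    return [sum(column) / len(waves) for column in zip(*waves)]


species_table = [(20.0, 100.0, "speciesA"), (50.0, 40.0, "speciesB")]
reference = render(species_table, length=80)
edited = delete_peak(add_peak(species_table, (65.0, 10.0, "speciesC")), "speciesB")
merged = average_profiles([render(species_table, 80), render(edited, 80)])
print([p[2] for p in edited])
```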

(3) In the present invention, the correction processing in the peak association step is preferably performed by function approximation based on a Gaussian function.
In this way, the correction processing can be performed accurately and simply while reducing its burden.
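The source states only that the correction uses function approximation based on a Gaussian function. One common way to do this, shown here as an assumption rather than the patented procedure, is to fit a Gaussian through the three samples around each peak maximum: the logarithm of a Gaussian is a parabola, so the parabola's vertex gives a sub-sample peak centre. Differences between the fitted centres of corresponding reference and measured peaks could then drive the peak-position correction. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_gaussian_peak(wave: Sequence[float], i: int) -> Tuple[float, float, float]:
    """Fit a Gaussian through the samples at i-1, i, i+1 (i = index of the local maximum).

    Requires 0 < i < len(wave) - 1 and strictly positive samples.
    Returns (centre, sigma, height) assuming unit sample spacing.
    """
    l_m, l_0, l_p = (math.log(wave[j]) for j in (i - 1, i, i + 1))
    curvature = l_m - 2.0 * l_0 + l_p  # equals -1 / sigma**2 for an ideal Gaussian
    if curvature >= 0:
        raise ValueError("samples do not form a peak")
    offset = 0.5 * (l_m - l_p) / curvature  # vertex of the log-parabola relative to i
    sigma = math.sqrt(-1.0 / curvature)
    height = math.exp(l_0 - 0.25 * (l_m - l_p) * offset)
    return i + offset, sigma, height


def sampled(mu: float, sigma: float, height: float, n: int) -> List[float]:
    """Sample an ideal Gaussian peak at integer positions 0..n-1 (test data only)."""
    return [height * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in range(n)]


ref_wave = sampled(10.3, 2.0, 5.0, 20)
meas_wave = sampled(12.1, 2.0, 4.0, 20)
c_ref, s_ref, h_ref = fit_gaussian_peak(ref_wave, 10)
c_meas, _, _ = fit_gaussian_peak(meas_wave, 12)
print(round(c_ref, 3), round(s_ref, 3), round(h_ref, 3))
print("shift:", round(c_meas - c_ref, 3))
```

For an ideal sampled Gaussian this three-point fit is exact; on noisy data a least-squares fit over more samples would be more robust.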

(4) In the present invention, the correction processing in the peak association step preferably associates the peaks of the measurement profile data with the peaks of the reference profile data by moving the peak positions of the measurement profile data with reference to those of the reference profile data, by moving the peak positions of the reference profile data with reference to those of the measurement profile data, or by moving the peak positions of both.
In this way, the peaks of the reference profile data and those of the measurement profile data can be associated appropriately.
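The three options can be sketched for the simple case of a single constant displacement. The assumptions here are not from the source: the displacement (measured minus reference) is estimated as the median difference over provisionally paired peaks, and the "both" option moves each profile halfway toward the other. Names and mode strings are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def estimate_offset(pairs: Sequence[Tuple[float, float]]) -> float:
    """Median displacement (measured - reference) over (reference, measured) peak pairs."""
    return median(m - r for r, m in pairs)


def align(ref: Sequence[float], meas: Sequence[float], offset: float, mode: str) -> Tuple[List[float], List[float]]:
    """Remove a displacement `offset` (measured - reference) in one of three ways."""
    if mode == "to_reference":  # reference fixed, measured peaks moved
        return list(ref), [m - offset for m in meas]
    if mode == "to_measured":  # measured fixed, reference peaks moved
        return [r + offset for r in ref], list(meas)
    if mode == "both":  # both moved halfway toward each other
        half = offset / 2.0
        return [r + half for r in ref], [m - half for m in meas]
    raise ValueError(f"unknown mode: {mode}")


ref, meas = [10.0, 20.0, 30.0], [13.0, 23.0, 33.5]
offset = estimate_offset(list(zip(ref, meas)))
for mode in ("to_reference", "to_measured", "both"):
    print(mode, align(ref, meas, offset, mode))
```

The median keeps one outlying peak (here the one at 33.5) from distorting the estimated displacement.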

(5) In the gene expression analysis step of the present invention, the peaks for which the reference profile data and the measurement profile data could be associated are preferably displayed so that they can be distinguished from the peaks that could not be associated. In addition, when gene information on the gene from which a peak is derived has been attached to that peak of the reference profile data, the gene of the associated peak in the measurement profile data is preferably identified by citing that gene information, and the expression state of the gene is analyzed.

In this way, peaks that could not be associated are easy to distinguish in the analysis results, and the gene from which each associated peak of the measurement profile data is derived has already been identified. The results are therefore highly useful to the user: for example, when they are consulted later, there is no need to reanalyze the base sequence to determine which gene a peak is derived from.
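A minimal sketch of this step, assuming the association result is a mapping from measured peak positions to reference peak positions and that gene information is stored per reference peak position. The function name, the text display format, and the gene identifier `GeneX` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def annotate(
    meas_peaks: Sequence[float],
    pairs: Dict[float, float],
    ref_gene_info: Dict[float, str],
) -> List[Tuple[float, str, Optional[str]]]:
    """Label each measured peak as matched/unmatched and carry over reference gene info.

    pairs:          measured position -> associated reference position
    ref_gene_info:  reference position -> gene identifier attached to that peak
    """
    rows = []
    for m in meas_peaks:
        r = pairs.get(m)
        if r is None:
            rows.append((m, "unmatched", None))
        else:
            rows.append((m, "matched", ref_gene_info.get(r)))  # None if no gene info attached
    return rows


rows = annotate([100.0, 150.2, 214.5], {100.0: 100.0, 150.2: 150.0}, {100.0: "GeneX"})
for pos, status, gene in rows:
    marker = "*" if status == "unmatched" else " "  # flag unmatched peaks visually
    print(f"{marker} {pos:7.1f}  {status:9s}  {gene or '-'}")
```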

(6) The gene expression analysis step of the present invention preferably includes a step of attaching related information to each peak that could not be associated in the peak association step, and (7) the related information preferably includes at least one of an evaluation value based on a correlation coefficient for the similarity of the waveforms, the peak position, the primer set, the expression intensity, characteristics of the peak shape, and cell information or experimental information for the sample.

In this way, related information can be attached in the gene expression analysis step to peaks that could not be associated in the peak association step, so that when the results are consulted later, information such as the evaluation value based on the correlation coefficient for waveform similarity, the peak position, the primer set, the expression intensity, the peak-shape characteristics, and the cell and experimental information for the sample is available. The analysis results obtained in this way are therefore even more useful to the user.
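A record of this related information might look like the sketch below. The field selection follows the list in (7), but the concrete definitions are assumptions: the evaluation value is taken to be the Pearson correlation between reference and measured waveforms in a window around the peak, the peak-shape feature is a crude width (number of contiguous samples at or above half the peak value), and cell or experimental information is a free-form dictionary. All names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class UnmatchedPeakInfo:
    """Related information attached to a peak that could not be associated (illustrative fields)."""
    position: int
    intensity: float
    evaluation: float  # local Pearson correlation with the reference waveform
    width: int  # crude shape feature: contiguous samples at or above half maximum
    primer_set: str
    sample_info: Dict[str, str] = field(default_factory=dict)


def local_correlation(ref: Sequence[float], meas: Sequence[float], center: int, half_window: int) -> float:
    """Pearson correlation of the two waveforms within [center - half_window, center + half_window]."""
    lo, hi = max(0, center - half_window), min(len(ref), center + half_window + 1)
    a, b = ref[lo:hi], meas[lo:hi]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def half_max_width(wave: Sequence[float], center: int) -> int:
    """Count contiguous samples around `center` whose value is at least half the peak value."""
    half = wave[center] / 2.0
    left = center
    while left > 0 and wave[left - 1] >= half:
        left -= 1
    right = center
    while right < len(wave) - 1 and wave[right + 1] >= half:
        right += 1
    return right - left + 1


def describe_unmatched(
    ref: Sequence[float], meas: Sequence[float], index: int,
    primer_set: str, sample_info: Dict[str, str], half_window: int = 5,
) -> UnmatchedPeakInfo:
    return UnmatchedPeakInfo(
        position=index,
        intensity=meas[index],
        evaluation=local_correlation(ref, meas, index, half_window),
        width=half_max_width(meas, index),
        primer_set=primer_set,
        sample_info=dict(sample_info),
    )


ref = [0.0] * 60
ref[34] = 4.0  # a nearby reference peak that does not line up
meas = [0.0] * 60
meas[28:33] = [1.0, 3.0, 5.0, 3.0, 1.0]
info = describe_unmatched(ref, meas, 30, "primer-set-01", {"cell": "hypothetical cell line"})
print(info)
```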

(8) A gene expression analysis apparatus according to the present invention analyzes the expression state of genes using profile data represented as a single waveform having a plurality of peaks derived from a plurality of expressed gene transcripts, and includes:
reference profile data acquisition means for acquiring in advance reference profile data in which a reference range of base-number-equivalent values, obtained from the positions of the base-number-equivalent values of DNA fragments produced by amplifying part of the cDNA reverse-transcribed from a plurality of gene transcripts and from the detection amounts at those positions, which correspond to the transcription amounts of the gene transcripts, is represented as a first waveform, and in which, as transcript species information for the gene transcripts, predetermined peaks in the first waveform and the transcript species from which those peaks are derived have been identified and stored;
measurement profile data creation means for creating measurement profile data in which a measurement range of base-number-equivalent values, obtained from the positions of the base-number-equivalent values of the DNA fragments to be measured, produced by amplifying part of the cDNA reverse-transcribed from a plurality of gene transcripts, and from the detection amounts at those positions, which correspond to the transcription amounts of the measured objects, is represented as a second waveform;
peak association means for performing correction processing that corrects part or all of the region of at least one of the first and second waveforms to adjust the peak positions, and for associating a plurality of peaks in the second waveform that include the region of interest with the corresponding plurality of peaks in the first waveform; and
gene expression analysis means for reading, from the transcript species information, information on the gene transcript from which each associated peak of the measurement profile data is derived, and analyzing the expression state of the genes by identifying the gene from which each associated peak is derived.
The peak association means is characterized in that, when the peak positions of the reference profile data acquired by the reference profile data acquisition means coincide with the peak positions of the measurement profile data created by the measurement profile data creation means, it associates the coinciding peaks of the two data sets with each other, and when these peak positions are partly or wholly shifted, it performs correction processing on at least one of the waveforms, correcting part or all of its region and adjusting the peak positions so that the similarity between the waveforms is maximized, and then associates the peaks of the reference profile data with the peaks of the measurement profile data.

In this way, the acquired reference profile data and the created measurement profile data are represented as waveforms, so they can easily be compared, and the peaks of the reference profile data can easily be associated with those of the measurement profile data and the expression state of genes analyzed.
Also, when the peak positions of the reference profile data and the measurement profile data coincide, the peaks are associated as they are, without correction; correction processing to adjust peak positions is performed only when the positions are shifted, which enables rapid association.

(9) The reference profile data acquisition means of the present invention preferably acquires the reference profile data from a database storing known profile data, creates them artificially from the transcript species information, obtains them by adding one or more peaks to, or deleting them from, known profile data or the measurement profile data, or synthesizes them from a plurality of sets of reference profile data or of measurement profile data.
In this way, reference profile data can be acquired quickly and easily and compared with the measurement profile data.

(10) In the present invention, the correction processing in the peak association means is preferably performed by function approximation based on a Gaussian function.
In this way, the correction processing can be performed accurately and simply while reducing its burden.

(11) In the present invention, the correction processing in the peak association means preferably associates the peaks of the measurement profile data with the peaks of the reference profile data by moving the peak positions of the measurement profile data with reference to those of the reference profile data, by moving the peak positions of the reference profile data with reference to those of the measurement profile data, or by moving the peak positions of both.
In this way, the peaks of the reference profile data and those of the measurement profile data can be associated appropriately.

(12) The gene expression analysis means of the present invention preferably displays the peaks for which the reference profile data and the measurement profile data could be associated so that they can be distinguished from the peaks that could not be associated. In addition, when gene information on the gene from which a peak is derived has been attached to that peak of the reference profile data, it preferably identifies the gene of the associated peak in the measurement profile data by citing that gene information and analyzes the expression state of the gene.

(13) The gene expression analysis means of the present invention preferably includes means for attaching related information to each peak that could not be associated by the peak association means, and (14) the related information preferably includes at least one of an evaluation value based on a correlation coefficient for the similarity of the waveforms, the peak position, the primer set, the expression intensity, characteristics of the peak shape, and cell information or experimental information for the sample.

In this way, the gene expression analysis means can attach related information to the peaks that could not be associated in the peak association step. When the results are consulted later, this related information is available: the correlation-based evaluation value of waveform similarity, the peak position, the primer set, the expression intensity, peak shape features, and the cell and experimental information of the sample. This makes the analysis results more useful to the user.
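The evaluation value in (14) is described only as being based on a correlation coefficient of waveform similarity. The sketch below assumes the Pearson correlation coefficient computed over two equal-length waveform windows, and bundles it with a few of the other listed items. `related_info` and its dictionary keys are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat window carries no shape information
    return sxy / math.sqrt(sxx * syy)


def related_info(position, ref_window, meas_window, primer_set, intensity, sample):
    """Bundle related information for a peak that could not be associated."""
    return {
        "evaluation_value": round(pearson(ref_window, meas_window), 3),
        "peak_position": position,
        "primer_set": primer_set,
        "intensity": intensity,
        "sample": sample,
    }


info = related_info(180.0, [0, 1, 4, 1, 0], [0, 2, 7, 3, 1], "AG_CC", 1.0,
                    {"cell": "A", "experiment": "run-1"})
print(info)
```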

(15) A gene expression analysis program according to the present invention causes a computer to execute the gene expression analysis method described in (1) to (7).

Because the program causes a computer to execute the gene expression analysis method described in (1) to (7), the computer can readily compare the reference profile data with the measurement profile data, associate their peaks, and analyze the expression state of genes.

According to the gene expression analysis method of the present invention, the reference profile data and the measurement profile data are both represented as waveforms and compared. The comparison is therefore easy to perform, and so is the analysis of gene expression state.

Likewise, the gene expression analysis apparatus of the present invention represents the reference profile data and the measurement profile data as waveforms and compares them. Both the comparison and the analysis of gene expression state are therefore easy to perform.

The gene expression analysis program of the present invention causes a computer to carry out the whole sequence: represent the reference profile data and the measurement profile data as waveforms, compare them, associate their peaks, and analyze the expression state of genes.

Consider, for example, the HiCEP method, a comprehensive and highly accurate gene expression analysis method. Measurement profile data obtained with it on a DNA sequencer are represented as a waveform with many peaks. Each peak indicates the abundance of the mRNA derived from a particular gene in the sample; strictly, this is a value relative to the other peaks in the same profile. Observing the intensities of all the peaks allows the expression levels of tens of thousands of mRNAs to be measured simultaneously with high accuracy. However, the base sequence of a peak, and the gene it derives from, cannot be known unless the peak is fractionated and its base sequence is determined. Linking the vast body of known gene information to the results of HiCEP experiments therefore depends on knowing the gene (mRNA name) from which each peak derives. Here, gene information means the name of the transcript(s) transcribed from a gene (a list, if there are several), the gene's function, and other associated information. It includes the transcript species information described later.
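Because each HiCEP peak gives a value relative to the rest of its own profile, one simple way to express that value is to normalize to the total peak intensity of the profile. The source does not specify the normalization; this is an assumption.

```python
def relative_abundance(heights):
    """Express each peak's intensity as a fraction of the profile's total intensity."""
    total = sum(heights)
    if total <= 0:
        raise ValueError("profile has no signal")
    return [h / total for h in heights]


print(relative_abundance([2.0, 3.0, 5.0]))  # [0.2, 0.3, 0.5]
```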

The present invention represents measurement profile data created by the HiCEP method, or a similar method, as a waveform, and aligns its peak positions in one of two ways. The first corrects the waveform against a reference waveform whose gene information has been investigated (that is, the waveform of the reference profile data). The second corrects the waveform by adding one or more peaks to, or deleting them from, the measurement profile data. By associating the peaks of HiCEP measurement profile data with gene information accurately in this way, the invention provides a gene expression analysis method that identifies the gene from which each peak derives and analyzes the expression state of that gene.

The best mode for carrying out the gene expression analysis method, gene expression analysis apparatus, and gene expression analysis program according to the present invention is described in detail below, with reference to the drawings as appropriate.
First, the gene expression analysis method according to the present invention is described with reference to FIG. 1. FIG. 1 is a flowchart showing the sequence of steps of the method.

The gene expression analysis method according to the present invention analyzes the expression state of genes using profile data. The profile data are represented as a single waveform with multiple peaks, each derived from one of the expressed gene transcripts.
As shown in FIG. 1, the method comprises a reference profile data acquisition step S1, a measurement profile data creation step S2, a peak association step S3, and a gene expression analysis step S4.

In this technical field, "expression" generally means that mRNA is transcribed from genomic DNA and translated into protein. In the present invention, however, "expression" means only that a gene transcript, typically mRNA, is transcribed from genomic DNA.
In the present invention, a "profile" is information about a sample under given conditions: its gene expression pattern, whether known and unknown genes are expressed, and the expression levels of all the expressed genes. "Profile data" are data representing such information as a single waveform with multiple peaks. A "peak" is a part of that waveform, drawn for example as a roughly triangular shape, that reflects the number of bases in a DNA fragment and the fragment's detected amount.

As shown in FIG. 1, the reference profile data acquisition step S1 acquires reference profile data in advance. The DNA fragments involved are, for example, PCR amplification products obtained by amplifying part of the cDNA reverse-transcribed from multiple gene transcripts such as mRNA. The reference profile data represent, as a first waveform, the reference range of base-number-equivalent values of these fragments. Each point of the waveform is a position (a base-number-equivalent value) together with the amount detected there, which corresponds to the transcription amount of the gene transcript. The reference profile data also store transcript species information: selected peaks in the first waveform are identified, together with the transcript species from which each peak derives. Transcript species information is information about the type of gene transcript. For example, the peak position and peak intensity (height) have been measured, the base sequence has been determined, a sequence name or accession number has been assigned, and the gene of origin has been identified. It may also include information such as the title of the publication in which the result was reported, the authors, and the institution.
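One way to hold reference profile data together with transcript species information is sketched below. The class and field names are hypothetical. The optional `transcript` field reflects the fact that not every peak needs an identified origin. The accession string in the example is a placeholder, not a real record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranscriptSpeciesInfo:
    sequence_name: str
    accession: str
    gene_name: str
    publication: Optional[str] = None  # where the identification was reported


@dataclass
class ReferencePeak:
    position: float  # base-number-equivalent value
    height: float    # peak intensity
    transcript: Optional[TranscriptSpeciesInfo] = None  # None: origin not yet identified


@dataclass
class ReferenceProfile:
    primer_set: str  # selective bases, e.g. "AG_CC"
    peaks: List[ReferencePeak] = field(default_factory=list)

    def identified(self):
        """Peaks whose transcript species has been determined."""
        return [p for p in self.peaks if p.transcript is not None]


profile = ReferenceProfile("AG_CC", [
    ReferencePeak(152.3, 840.0, TranscriptSpeciesInfo("seq-001", "ACC-0001", "GENE_X")),
    ReferencePeak(187.9, 120.0),  # kept even though its origin is unknown
])
print([p.position for p in profile.identified()])  # [152.3]
```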

The reference profile data used in the present invention include peaks that carry link information to a database storing gene information. The data can still be used as reference profile data even if the transcript species (and/or gene) has not been determined for every peak. Multiple sets of profile data may be used, or only a profile region (peak region) of interest may be extracted and used. Reference profile data may also be created by combining multiple waveforms, or created artificially using only peak positions.
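Creating reference profile data artificially from peak positions alone can be sketched as a sum of Gaussian peaks. The common width `sigma`, the sampling grid, and the default height of 1.0 are assumptions chosen for illustration; the function name is hypothetical.

```python
import math


def synthesize_profile(positions, heights=None, sigma=0.8,
                       start=50.0, stop=500.0, step=0.5):
    """Build an artificial waveform as a sum of Gaussian peaks.

    positions: peak centres in bases. heights default to 1.0 when only
    positions are known. sigma (common peak width, bases) and the grid are
    illustrative assumptions. Returns (xs, ys) sampled from start to stop.
    """
    if heights is None:
        heights = [1.0] * len(positions)
    n = int(round((stop - start) / step)) + 1
    xs = [start + i * step for i in range(n)]
    ys = [sum(h * math.exp(-((x - p) ** 2) / (2 * sigma ** 2))
              for p, h in zip(positions, heights))
          for x in xs]
    return xs, ys


xs, ys = synthesize_profile([120.0, 245.5, 310.0])
print(len(xs), max(ys))  # 901 1.0
```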

Such reference profile data can suitably be obtained in any of the following ways:
- acquired from a database storing known profile data;
- created artificially from transcript species information;
- obtained by adding one or more peaks to, or deleting them from, known profile data or measurement profile data;
- synthesized from multiple sets of reference profile data;
- synthesized from multiple sets of measurement profile data.

In short, all that is needed is profile data against which the measurement profile data can be associated in the subsequent peak association step S3 shown in FIG. 1. The invention is therefore not limited to profile data prepared in advance. As noted above, one of several sets of measurement profile data may serve as the reference profile data and the others as measurement profile data.
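Synthesizing one reference from several profiles can be sketched by resampling each profile onto a common grid with linear interpolation and taking the mean. The sketch assumes the profiles are already aligned and that a plain pointwise mean is acceptable; the source does not say how waveforms are combined. The function names are hypothetical.

```python
from bisect import bisect_left


def interp(x, xs, ys):
    """Linear interpolation of the sampled waveform (xs, ys) at x; xs ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # xs[i - 1] < x <= xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def combine_profiles(profiles, grid):
    """Pointwise mean of several (xs, ys) waveforms resampled onto a common grid."""
    return [sum(interp(x, xs, ys) for xs, ys in profiles) / len(profiles)
            for x in grid]


p1 = ([0.0, 1.0, 2.0], [0.0, 2.0, 4.0])
p2 = ([0.0, 2.0], [2.0, 2.0])
print(combine_profiles([p1, p2], [0.0, 1.0, 2.0]))  # [1.0, 2.0, 3.0]
```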

Known profile data have a particular advantage. For some peaks in the known profile data stored in such a database, the base sequences of the source genes may already have been determined and gene information attached (that is, transcript species information for the gene transcripts). Referring to the known profile data then yields this information at the same time. Because it makes transcript species information available, referring to known profile data is preferable to using, as reference profile data, data obtained by adding peaks to or deleting peaks from the measurement profile data.

The reference profile data should preferably come from the same biological species as the specific cells (sample) under the given conditions. The closer the sample conditions and state, such as the strain and the tissue (cell) of origin, the more similar the measurement profile data (described under the measurement profile data creation step S2 below) will be to the reference profile data, and the more effective the analysis. Even so, in the present invention it is sufficiently effective to use reference profile data whose conditions are only relatively close to those of the sample, for example the waveform of a similar cell (see FIG. 2).

FIGS. 2(a) to 2(h) compare measurement profile data obtained by the HiCEP method for two different cell lines, A and B, derived from the same species. The HiCEP method is described in detail under the measurement profile data creation step S2 below. Each panel is labeled with the combination of N₁N₂ of the X adapter and N₃N₄ of the Y adapter:
(a) cell A, AA_AA; (b) cell B, AA_AA;
(c) cell A, CT_TC; (d) cell B, CT_TC;
(e) cell A, AG_CG; (f) cell B, AG_CG;
(g) cell A, AG_CC; (h) cell B, AG_CC.
As FIGS. 2(a) to 2(h) show, HiCEP measurement profile data have similar waveforms even for different cell lines.

As shown in FIG. 1, the next step, the measurement profile data creation step S2, creates the measurement profile data. The DNA fragments to be measured are, for example, PCR amplification products obtained by amplifying part of the cDNA reverse-transcribed from multiple gene transcripts such as mRNA. The measurement profile data represent, as a second waveform, the measurement range of base-number-equivalent values of these fragments. Each point is a position (a base-number-equivalent value) together with the amount detected there, which corresponds to the transcription amount of the measured object.

The measurement profile data can be represented more faithfully as a waveform after processing such as extrapolation completion, exclusion from analysis, specification and exclusion of a noise level, and separation of overlapping peaks.
Extrapolation completion deals with very highly expressed transcripts. Their signal saturates the DNA sequencer's sensor, so the peak may become trapezoidal or appear as a huge peak with a dip in its center. Before analysis, the original peak is estimated and its shape is completed so that it is represented as a whole peak.
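Extrapolation completion for a saturated peak can be sketched by fitting a Gaussian to the unsaturated flanks. Because ln y of a Gaussian is a parabola in x, a least-squares quadratic fit to ln y gives the apex position and height. The sketch assumes a Gaussian peak shape. It treats every sample from the first to the last saturated point as unusable, which also discards a central dip. The function names are hypothetical.

```python
import math


def _det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m @ sol = v by Cramer's rule."""
    d = _det3(m)
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        sol.append(_det3(mc) / d)
    return sol


def extrapolate_saturated(xs, ys, saturation):
    """Estimate (centre, height) of a clipped peak from its unsaturated flanks.

    Samples from the first to the last one at or above `saturation` are
    discarded; the rest are fitted with ln(y) = a + b*u + c*u**2 (a Gaussian),
    where u is x minus the mean x of the usable points.
    """
    sat = [i for i, y in enumerate(ys) if y >= saturation]
    lo, hi = (sat[0], sat[-1]) if sat else (len(ys), -1)
    pts = [(x, y) for i, (x, y) in enumerate(zip(xs, ys))
           if (i < lo or i > hi) and y > 0]
    if len(pts) < 3:
        raise ValueError("need at least three usable points")
    x_mean = sum(x for x, _ in pts) / len(pts)  # centring keeps the system well conditioned
    us = [(x - x_mean, math.log(y)) for x, y in pts]
    s = [sum(u ** k for u, _ in us) for k in range(5)]        # sums of u^0..u^4
    t = [sum(ly * u ** k for u, ly in us) for k in range(3)]  # sums of ln(y)*u^0..u^2
    a, b, c = _solve3([[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]], t)
    if c >= 0:
        raise ValueError("usable points do not describe a peak")
    centre = x_mean - b / (2 * c)
    height = math.exp(a - b * b / (4 * c))
    return centre, height


xs = list(range(90, 111))
ideal = [1000.0 * math.exp(-((x - 100) ** 2) / 8.0) for x in xs]
clipped = [min(y, 600.0) for y in ideal]
clipped[10] = 300.0  # x = 100: saturated peak with a dip in its centre
print(extrapolate_saturated(xs, clipped, 600.0))  # approximately (100.0, 1000.0)
```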

Exclusion from analysis deals with a specific artifact of the HiCEP method run on a capillary DNA sequencer. When impurities or debris, such as precipitated crystals, are present, the laser light that excites fluorescence is simply scattered, and the scattered light is detected in all four colors (A, T, G, and C) at once. The result is a peak of the same intensity at the same position in several profiles, and such abnormal peaks are excluded from the analysis. They can also be recognized from the peak shape, for example the peak width (σ) and the peak's symmetry.
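Laser-scatter artifacts can be flagged by looking for a peak that appears at the same position, with nearly the same intensity, in every color profile. In the sketch below, the tolerances (`pos_tol` in bases, and `rel_tol` as a fraction of the largest height) are assumptions. The shape-based test (σ, symmetry) mentioned above is not implemented. The function name is hypothetical.

```python
def scatter_artifacts(profiles, pos_tol=0.5, rel_tol=0.1):
    """Flag positions where every color profile has a peak of near-equal height.

    profiles: dict mapping a color (e.g. "A", "T", "G", "C") to a list of
    (position, height) peaks. Returns positions, taken from the first profile,
    that look like scattered laser light rather than real fragments.
    """
    colors = list(profiles)
    flagged = []
    for pos, height in profiles[colors[0]]:
        heights = [height]
        for color in colors[1:]:
            near = [h for p, h in profiles[color] if abs(p - pos) <= pos_tol]
            if not near:
                break  # missing in this color: not a scatter artifact
            heights.append(max(near))
        else:  # found in every color
            if max(heights) - min(heights) <= rel_tol * max(heights):
                flagged.append(pos)
    return flagged


profiles = {
    "A": [(120.0, 500.0), (250.0, 80.0)],
    "T": [(120.2, 510.0), (300.0, 40.0)],
    "G": [(119.9, 495.0)],
    "C": [(120.1, 505.0), (250.1, 90.0)],
}
print(scatter_artifacts(profiles))  # [120.0]
```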

Specification and exclusion of the noise level means designating the small peaks that should be regarded as noise and excluding them. Genes expressed at very low levels produce many small peaks, and these are excluded from the analysis as noise. For example, peaks corresponding to 5 or fewer transcript copies per cell may be excluded.
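The noise cut can be sketched as a threshold on estimated abundance. Converting peak height to copies per cell needs a calibration that the source does not give, so `copies_per_unit_height` is a hypothetical parameter. The default of 5 copies per cell follows the example above, and peaks at or below that value are excluded.

```python
def drop_noise(peaks, copies_per_unit_height, min_copies=5.0):
    """Keep peaks whose estimated abundance exceeds min_copies transcripts per cell.

    peaks: list of (position, height). copies_per_unit_height is a hypothetical
    calibration factor converting peak height to copies per cell.
    """
    return [(p, h) for p, h in peaks if h * copies_per_unit_height > min_copies]


print(drop_noise([(100.0, 16.0), (150.0, 24.0), (200.0, 20.0)], 0.25))  # [(150.0, 24.0)]
```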

Separation of overlapping peaks deals with cases in which a small peak is buried in the foot of a large peak, or another peak appears on the shoulder of a peak. The waveform of such overlapping peaks is distorted compared with that of a single peak, so the peaks are separated. Overlapping peaks can be predicted from the waveform distortion by function approximation or waveform analysis. The prediction can, for example, be stored in a database as one item of information about the peak and, where necessary, attached as related information about the peak in the gene expression analysis step S4 described later.

When the measurement profile data contain overlapping peaks, the individual peaks can also be separated and represented as follows.
(1) Where peaks overlap, the waveform may be built by summing the contributions of the overlapping peaks. In principle this matches the actually measured waveform accurately, so it is preferred.
(2) The overlap may be ignored, and the waveform built by switching to the other peak's function at the point where the overlapping peak shapes intersect. This separates the peaks quickly and makes them easier to see, but the resulting shape may differ from the actual measured waveform.
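The two display methods for overlapping peaks can be compared using Gaussian peak models; the Gaussian shape is an assumption here. Method (1) sums all contributions. Method (2) keeps only the dominant peak at each point, which amounts to switching to the other peak's function where the curves cross. The function names are hypothetical.

```python
import math


def gaussian(x, centre, height, sigma):
    return height * math.exp(-((x - centre) ** 2) / (2 * sigma ** 2))


def render(xs, peaks, method="sum"):
    """Draw overlapping fitted peaks given as (centre, height, sigma) tuples.

    "sum": add every peak's contribution (method (1); tracks the measured trace).
    "switch": show only the dominant peak at each x, i.e. switch to the other
    peak's function where the curves cross (method (2); easier to read).
    """
    if method not in ("sum", "switch"):
        raise ValueError(f"unknown method: {method}")
    out = []
    for x in xs:
        values = [gaussian(x, c, h, s) for c, h, s in peaks]
        out.append(sum(values) if method == "sum" else max(values))
    return out


peaks = [(100.0, 10.0, 1.0), (101.5, 4.0, 1.0)]  # a small peak on the shoulder
xs = [99.0 + 0.5 * i for i in range(8)]
print([round(v, 2) for v in render(xs, peaks, "sum")])
print([round(v, 2) for v in render(xs, peaks, "switch")])
```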

If several peaks overlap completely, only the largest one is visible even with these methods. In that case the peak should be fractionated and analyzed, for example by determining its base sequence.

The measurement profile data described above can be obtained, for example, by the procedure shown in FIG. 3. These data represent, as a second waveform, the measurement range of base-number-equivalent values of the DNA fragments to be measured, which are obtained by amplifying part of the cDNA reverse-transcribed from multiple gene transcripts (for example, mRNA). Each point is a position together with the amount detected there, corresponding to the transcription amount. FIG. 3 is an explanatory diagram showing an example of the procedure for obtaining measurement profile data.

As shown in FIG. 3(a), single-strand cDNA (first strand cDNA) 13 is first synthesized using poly(A) RNA (mRNA) 11 as a template; it carries a tag substance 12, such as biotin, at its 5' end (step a). Next, as shown in FIG. 3(b), double-stranded cDNA (second strand cDNA) 14 is obtained using the single-strand cDNA 13 from step a as a template (step b). Then, as shown in FIG. 3(c), the double-stranded cDNA 14 from step b is cleaved with a first restriction enzyme X, for example a restriction enzyme that recognizes 4 bases (step c). As shown in FIG. 3(d), the cDNA fragments that carry the tag substance are then recovered from the fragments produced in step c, using a substance with high affinity for the tag substance (step d).

Next, as shown in FIG. 3(e), an X adapter 16 is ligated, on the 5' end side, to the site cut by the first restriction enzyme X in the cDNA fragments recovered in step d. The X adapter 16 contains a sequence complementary to the cleavage site sequence and a sequence complementary to the X primer (step e). As shown in FIG. 3(f), the cDNA fragments from step e are cleaved with a second restriction enzyme Y, for example a restriction enzyme that recognizes 4 bases, which does not cut the X adapter 16 (step f). Then, as shown in FIG. 3(g), the cDNA fragments bound to the tag substance 12 are removed from the fragments produced in step f, using a substance 15 with high affinity for the tag substance 12. This leaves the cDNA fragments that contain the site cut by the second restriction enzyme Y, which are recovered (step g). As shown in FIG. 3(h), a Y adapter 17 is ligated to the site cut by the second restriction enzyme Y in the fragments recovered in step g, giving cDNA fragments 18. The Y adapter 17 contains a sequence complementary to the cleavage site sequence and a sequence complementary to the Y primer (step h).

Next, as shown in FIG. 3(i), a PCR reaction is performed using the cDNA fragments 18 from step h as templates and a primer set consisting of an X primer 19 and a Y primer 21 (step i). The X primer 19 contains a sequence complementary to the X adapter 16 sequence, has the two-base sequence N₁N₂ at its 3' end, and carries a fluorescent substance 20 at its 5' end. N₁ and N₂ may be the same or different, and each is a base selected from the group consisting of adenine, thymine, guanine, and cytosine. The Y primer 21 contains a sequence complementary to the Y adapter 17 sequence and has the two-base sequence N₃N₄ at its 3' end, where N₃ and N₄ are likewise bases, the same or different, selected from the same group. Then, as shown in FIG. 3(j), the PCR amplification products from step i are run on a DNA sequencer, which detects their migration distance (that is, the number of bases) and their fluorescence (that is, the detected amount) (step j). Representing these detection results as a waveform produces the measurement profile data.
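The fragment that steps a to j carry through to the sequencer can be predicted in silico from a transcript sequence. This is a simplified sketch resting on assumptions that are not in the source:
- The tag-bearing primer anneals at the poly(A) tail, so the tag marks the transcript's 3' end.
- The recovered fragment runs from the 3'-most X site to the first Y site downstream of it.
- N₁N₂ are the two bases immediately 3' of the X site.
- The recognition sequences `CCGG` and `TTAA` are illustrative stand-ins for the sites of X and Y.
- Adapter lengths and sticky-end offsets are ignored.

The function name is hypothetical.

```python
def predict_fragment(seq, x_site="CCGG", y_site="TTAA"):
    """Predict the X-Y fragment recovered from one transcript (simplified model).

    seq: sense-strand transcript, 5'->3', poly(A) tail at the 3' end.
    Assumes the tagged piece after step c starts at the 3'-most X site and the
    recovered fragment runs from there to the first Y site downstream.
    Returns (start, end, n1n2) with 0-based, end-exclusive coordinates,
    or None when the transcript yields no such fragment.
    """
    x = seq.rfind(x_site)                # 3'-most X recognition site
    if x < 0:
        return None
    after_x = x + len(x_site)
    y = seq.find(y_site, after_x)        # first Y site downstream of it
    if y < 0:
        return None
    n1n2 = seq[after_x:after_x + 2]      # two bases next to the X site
    return x, y + len(y_site), n1n2


start, end, n1n2 = predict_fragment("AAAACCGGTACGTTAAGGGCCGGATTTTAACCCCAAAAAAAA")
print(start, end, end - start, n1n2)  # 19 30 11 AT
```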

In this way, the expression patterns of poly(A) RNA (mRNA), known or unknown, in a specific cell under given conditions can be classified (subgrouped) by the combination of the four bases N₁, N₂, N₃, and N₄, that is, into 4⁴ = 256 subgroups. The present invention is not limited to this. For example, six selective bases may be used, with N_A, N_B, and N_C in place of N₁ and N₂, and N_D, N_E, and N_F in place of N₃ and N₄. This gives 4⁶ = 4096 subgroups. In that case, of course, both restriction enzymes X and Y must be enzymes that recognize 6 bases.
Methods of creating such measurement profile data are described, for example, in International Publication No. WO 2002/048352 and Japanese Patent Application Laid-Open No. 2005-6554.
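The subgroup counts follow from choosing each selective base from four nucleotides. The short enumeration below uses the `N1N2_N3N4` label style of FIG. 2; the function name is hypothetical.

```python
from itertools import product

BASES = "ACGT"


def primer_subgroups(n_x=2, n_y=2):
    """All selective-base combinations, labelled 'N1N2_N3N4' as in FIG. 2."""
    return ["".join(xb) + "_" + "".join(yb)
            for xb in product(BASES, repeat=n_x)
            for yb in product(BASES, repeat=n_y)]


print(len(primer_subgroups()), len(primer_subgroups(3, 3)))  # 256 4096
```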

As shown in FIG. 1, the next step is the peak association step S3. It first applies correction processing to part or all of at least one of the first and second waveforms to adjust the peak positions. It then associates the peaks in the second waveform, including a region of interest, with the corresponding peaks in the first waveform.
In other words, for the individual peaks in these data, part or all of one or both data sets is corrected to adjust the peak positions. Peaks that lie at the same position and peaks that do not are then identified, and peaks at the same position are associated with each other.

The peak association step S3 can suitably be performed, for example, as follows.
(1) If the peak positions of the reference profile data acquired in the reference profile data acquisition step S1 (see FIG. 1) coincide with the peak positions of the measurement profile data created in the measurement profile data creation step S2 (see FIG. 1), the coinciding peaks of the two data sets are associated.
(2) If the peak positions of the reference profile data acquired in step S1 and those of the measurement profile data created in step S2 are shifted, partly or wholly, correction processing is applied to part or all of at least one of the waveforms to adjust the peak positions so that the similarity of the waveforms is maximized. The peaks of the reference profile data are then associated with the peaks of the measurement profile data.
A partial shift of the measurement profile data's peak positions covers cases such as some of several sets of measurement profile data being shifted relative to the peak positions of the reference profile data, or one size region of a measurement profile (for example, between 200 bp and 350 bp) being shifted.
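For case (2), one simple similarity-maximizing correction tries integer shifts of the measured waveform and keeps the one with the highest Pearson correlation to the reference. This sketch assumes that both waveforms are sampled on the same grid and that one rigid shift suffices. A region-specific shift, such as the 200 bp to 350 bp example, would apply the same search within that window. The function names are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def best_shift(ref, meas, max_shift=10, min_overlap=3):
    """Return (k, r): the integer sample shift k that maximises Pearson r.

    Shift k compares ref[i] with meas[i + k]; k > 0 means the measured trace
    lags the reference by k samples, so it should be moved k samples earlier.
    """
    best_k, best_r = None, -2.0
    for k in range(-max_shift, max_shift + 1):
        pairs = [(ref[i], meas[i + k]) for i in range(len(ref))
                 if 0 <= i + k < len(meas)]
        if len(pairs) < min_overlap:
            continue
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


ref = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
meas = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]  # same peak, two samples later
print(best_shift(ref, meas))
```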

The correction processing in the peak association step S3 comes in two forms.
- Global correction: calculation reference points are prepared in advance. The regions on either side of another reference point lying between them are stretched or compressed along the migration size (base-number-equivalent value) axis, so as to improve an evaluation value of the waveforms' mutual similarity (a quantity akin to a correlation coefficient).
- Local correction: when a waveform's peak is slightly shifted, an individual correction amount is calculated that maximizes the evaluation value (again akin to a correlation coefficient) around that peak.

Global correction can absorb relatively large measurement fluctuations, such as those caused by misrecognition of size markers or by experimental conditions. Local correction can absorb relatively small measurement fluctuations, such as those caused by the three-dimensional structure of the cDNA fragments during electrophoresis or by minor differences in experimental conditions.
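Global correction can be sketched as a piecewise-linear remapping of the size axis between calculation reference points: moving a middle anchor stretches the axis on one side and compresses it on the other. The anchor values are assumed inputs. Choosing them to maximize the evaluation value, for example by scoring candidate positions with a correlation coefficient, is not shown. The function name is hypothetical.

```python
from bisect import bisect_right


def warp_size(x, anchors_meas, anchors_ref):
    """Map a measured size x onto the reference size axis (global correction sketch).

    anchors_meas / anchors_ref: matching ascending calculation reference points.
    Between neighbouring anchors the axis is stretched or compressed linearly;
    beyond the outermost anchors the end segments are extrapolated.
    """
    i = bisect_right(anchors_meas, x) - 1
    i = max(0, min(i, len(anchors_meas) - 2))  # clamp to a valid segment
    m0, m1 = anchors_meas[i], anchors_meas[i + 1]
    r0, r1 = anchors_ref[i], anchors_ref[i + 1]
    return r0 + (x - m0) * (r1 - r0) / (m1 - m0)


# Middle anchor moved from 200 to 210: the left side stretches, the right compresses.
anchors_meas, anchors_ref = [100.0, 200.0, 300.0], [100.0, 210.0, 300.0]
print([warp_size(x, anchors_meas, anchors_ref) for x in (150.0, 200.0, 250.0)])
# [155.0, 210.0, 255.0]
```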

Accordingly, either global or local correction, or a suitable combination of the two, can be chosen according to the state of the measurement and reference waveforms. As described above, this adjusts the waveforms so that their similarity is maximized.

Examples of such correction processing include (a) moving the peak positions of the measurement profile data using the peak positions of the reference profile data as the reference, (b) moving the peak positions of the reference profile data using the peak positions of a given set of measurement profile data as the reference, and (c) moving the peak positions of both the reference profile data and the measurement profile data. In each case the waveforms can be adjusted so that the similarity between the measurement profile waveform and the reference profile waveform is maximized, and as a result the peaks of the reference profile data and those of the measurement profile data can be associated with high accuracy.

The more waveforms are subjected to the correction processing, the more accurate the correction becomes; in effect, a majority vote moves the peaks in the more plausible direction. Therefore, when the number of waveforms to be corrected is sufficiently large, the peak positions serving as calculation reference points may be designated, but need not be.
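The majority-vote effect can be illustrated with iterative alignment to a consensus, a common technique swapped in here for illustration because the source does not specify the algorithm. Each waveform is shifted to best match the mean of all waveforms, so when most waveforms agree, the outliers are pulled to the majority position without any designated reference point. Integer shifts, edge padding, and all function names are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def shift_trace(trace, s):
    """Shift a trace by s samples toward larger indices, padding with edge values."""
    n = len(trace)
    return [trace[min(max(i - s, 0), n - 1)] for i in range(n)]


def best_shift(target, trace, max_shift):
    """Integer shift of `trace` that maximises its correlation with `target`."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: pearson(target, shift_trace(trace, s)))


def consensus_align(traces, max_shift=5, rounds=3):
    """Repeatedly align every trace to the mean of all currently aligned
    traces, so the majority of traces determines the common peak position."""
    aligned = [list(t) for t in traces]
    for _ in range(rounds):
        mean = [sum(col) / len(aligned) for col in zip(*aligned)]
        aligned = [shift_trace(t, best_shift(mean, t, max_shift)) for t in aligned]
    return aligned
```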

This correction processing is preferably performed by function approximation based on a Gaussian function (the Gaussian approximation method), because it allows simple and highly accurate correction. The Gaussian approximation method may also be combined with a trial subtraction method, in which the waveform contribution of each approximated peak is successively subtracted from the original data and the function approximation is repeated. This enables more appropriate correction processing.

Correction processing that combines the Gaussian approximation method with the trial subtraction method can be performed, for example, as follows.
The contribution of the main peak (the peak whose first approximation is judged to be reliable) is subtracted from the whole waveform of the measurement profile data, and the same waveform approximation is applied to the remainder. This is repeated until the residual falls within a predetermined range or a predetermined number of iterations is exceeded. The approximation is more accurate if it uses the rising portions on both sides of each peak rather than its base (tail) region. In this case, the number of measurement points at which the corrected waveform overlaps the waveform before correction may be used as the criterion for evaluating how reliable the correction is.
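A minimal sketch of Gaussian approximation with trial subtraction, assuming a single trace on an integer grid and the peak model A*exp(-(x - mu)^2 / (2*sigma^2)). Each peak is fitted by least squares to ln y using only its rising and falling flanks (points between `lo_frac` and `hi_frac` of the apex height, so the base region is excluded), its Gaussian is subtracted, and the loop repeats until the residual drops below `stop_height` or `max_iter` peaks have been fitted. The thresholds and function names are illustrative assumptions, and strongly overlapping peaks would need a more careful flank selection than this.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_gaussian(xs, ys, x0):
    """Least-squares fit of ln y = a + b*u + c*u**2 (u = x - x0), i.e. a
    Gaussian in log space. Returns (A, mu, sigma) or None if not peak-shaped."""
    us = [x - x0 for x in xs]
    ls = [math.log(y) for y in ys]
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(u ** k * l for u, l in zip(us, ls)) for k in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(m)
    if abs(d) < 1e-12:
        return None
    # Cramer's rule: replace column k of m by t to solve for a, b, c
    a, b, c = (det3([[t[r] if j == k else m[r][j] for j in range(3)]
                     for r in range(3)]) / d for k in range(3))
    if c >= 0:
        return None  # opens upward: not a peak
    return math.exp(a - b * b / (4 * c)), x0 - b / (2 * c), math.sqrt(-1 / (2 * c))


def flank_indices(trace, apex, lo_frac, hi_frac):
    """Indices on the rising and falling flanks of the peak at `apex` whose
    values lie between lo_frac and hi_frac of the apex height."""
    height = trace[apex]
    idx = []
    for step in (-1, 1):
        i = apex
        while 0 <= i + step < len(trace) and trace[i + step] >= lo_frac * height:
            i += step
            if trace[i] <= hi_frac * height:
                idx.append(i)
    return idx


def trial_subtraction(trace, stop_height, max_iter=10, lo_frac=0.2, hi_frac=0.9):
    """Fit the tallest remaining peak, subtract its Gaussian, and repeat.
    Returns (list of (A, mu, sigma), final residual)."""
    residual = list(trace)
    peaks = []
    for _ in range(max_iter):
        apex = max(range(len(residual)), key=residual.__getitem__)
        if residual[apex] < stop_height:
            break
        idx = flank_indices(residual, apex, lo_frac, hi_frac)
        fit = fit_gaussian(idx, [residual[i] for i in idx], apex) if len(idx) >= 3 else None
        if fit is None:
            break
        amp, mu, sigma = fit
        peaks.append(fit)
        residual = [r - amp * math.exp(-(i - mu) ** 2 / (2 * sigma ** 2))
                    for i, r in enumerate(residual)]
    return peaks, residual
```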

With this approach, the Gaussian approximation method can first be applied to list only reliable peaks; the results are displayed and automatically compared with criteria based on the user's empirical rules, and if higher-order approximated peaks are judged necessary, the correction is repeated with the trial subtraction method so that peaks with lower evaluation values are also obtained.

When a saturated peak is present, the following correction processing may be performed. A saturated peak is detected with a flattened apex, for example because the sensor of the measuring instrument is saturated. One possible procedure is to estimate the apex shape in the central part of the waveform from the "rising portion" and "falling portion" at the base on either side of the saturated peak, and then to regenerate the peak at the height it would originally have had, using a Gaussian function or the like. Whether to perform this correction can be decided by checking whether the signal exceeds a threshold set with the dynamic range of the instrument in mind.
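A sketch of the saturated-peak correction, assuming a single clipped peak per trace and a Gaussian peak shape. Samples at or above `threshold` (which would be chosen from the instrument's dynamic range) are treated as clipped; a Gaussian is fitted in log space to the unclipped rising and falling portions down to `lo_frac * threshold`, and the clipped samples are replaced by the model. The function names and parameters are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_gaussian(xs, ys, x0):
    """Least-squares fit of ln y = a + b*u + c*u**2 (u = x - x0).
    Returns (A, mu, sigma) or None if the points are not peak-shaped."""
    us = [x - x0 for x in xs]
    ls = [math.log(y) for y in ys]
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(u ** k * l for u, l in zip(us, ls)) for k in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(m)
    if abs(d) < 1e-12:
        return None
    a, b, c = (det3([[t[r] if j == k else m[r][j] for j in range(3)]
                     for r in range(3)]) / d for k in range(3))
    if c >= 0:
        return None
    return math.exp(a - b * b / (4 * c)), x0 - b / (2 * c), math.sqrt(-1 / (2 * c))


def reconstruct_saturated(trace, threshold, lo_frac=0.1):
    """Replace a clipped apex by a Gaussian fitted to its unclipped flanks.
    Returns (new_trace, estimated_height or None)."""
    clipped = [i for i, v in enumerate(trace) if v >= threshold]
    if not clipped:
        return list(trace), None  # no saturation: leave the trace unchanged
    first, last = clipped[0], clipped[-1]  # assumes one saturated peak
    idx = []
    i = first - 1  # rising portion, walking away from the apex
    while i >= 0 and trace[i] >= lo_frac * threshold:
        idx.append(i)
        i -= 1
    i = last + 1  # falling portion
    while i < len(trace) and trace[i] >= lo_frac * threshold:
        idx.append(i)
        i += 1
    fit = fit_gaussian(idx, [trace[i] for i in idx], (first + last) / 2) if len(idx) >= 3 else None
    if fit is None:
        return list(trace), None
    amp, mu, sigma = fit
    out = list(trace)
    for i in range(first, last + 1):
        out[i] = max(trace[i], amp * math.exp(-(i - mu) ** 2 / (2 * sigma ** 2)))
    return out, amp
```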

An example is described with reference to FIGS. 4 to 6, in which the correction processing described above is applied to the reference and measurement profile waveforms, some or all peak positions are adjusted so that the similarity between the waveforms is maximized, and the peaks of the measurement profile data are associated with those of the reference profile data. FIGS. 4 to 6 show measurement profile data (24 waveforms in total) obtained by performing the HiCEP method twice on each of 12 samples.

FIG. 4 shows an example of measurement profile data in which the peak positions of the 24 waveforms are locally shifted; (a) shows the state before correction and (b) the state after correction.
In FIG. 4(a), the two central peaks are shifted, and it is difficult to identify either peak from its position (base-number-equivalent value) alone. However, when the measurement profile data shown at the very bottom of the lower panel of FIG. 4(b) is used as the reference profile data and the peak positions of the other 23 waveforms are adjusted against it by the correction processing described above, the two central peaks of the reference profile data can be accurately associated with the two central peaks of the measurement profile data, as shown in the upper display area of FIG. 4(b).

FIG. 5 shows an example of measurement profile data in which the peak positions of the 24 waveforms are shifted substantially as a whole (only part of the measurement profile data is shown in FIG. 5); (a) shows the state before correction and (b) the state after correction.
In FIG. 5(a), six of the 24 waveforms are shifted substantially along the peak-position (base-number-equivalent) axis, so there appear to be three peaks. However, as shown in FIG. 5(b), when one of the 24 waveforms is used as the reference profile data and the peak positions of the other 23 waveforms, including the six shifted ones, are adjusted against it by the correction processing described above, the waveforms line up, it becomes clear that there are only two peak positions, and these peaks can be accurately associated.

FIG. 6 shows an example of measurement profile data in which the peaks of the 24 waveforms lie close together and overlap; (a) shows the state before correction and (b) the state after correction.
In FIG. 6(a), because the peaks of the 24 waveforms lie close together and overlap, even a slight shift in the peak positions makes it difficult to distinguish the individual peaks. However, when the measurement profile data shown at the very bottom of the lower panel of FIG. 6(b) is used as the reference profile data and the peak positions of the other 23 waveforms are adjusted against it by the correction processing described above, all peaks line up at specific positions, as shown in the upper display area of FIG. 6(b), and they can be accurately associated.

FIG. 7 shows a case in which a waveform having peaks at arbitrary positions is created artificially, with reference to 10 measurement profile waveforms, and used as the reference profile data.
As shown in the upper display area of FIG. 7, a waveform having peaks at arbitrary positions is created artificially with reference to the 10 measurement profile waveforms and used as the reference profile data. When the peak positions of the 10 measurement profile waveforms are adjusted against it by the correction processing described above, all peaks line up, as shown in the lower display area of the figure, and they can be accurately associated. As described above, such an artificial profile can be created from the peak position information registered in a database. If the experimental conditions differ, the intensity values (peak heights) can be set arbitrarily, for example as high-expression, low-expression, and intermediate peaks. Peaks created in this way can function effectively in the correction processing.
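An artificial reference profile of this kind can be sketched as a sum of Gaussians placed at peak positions taken from a database, with heights chosen freely. The grid `step`, the common peak width `sigma`, and the function name are assumptions made for illustration; the source does not say how the artificial peaks are shaped.

```python
import math


def synthetic_reference(peaks, start, stop, step=0.1, sigma=0.5):
    """Build an artificial reference waveform on a uniform base-number grid.

    peaks: list of (position_in_bases, height) pairs, e.g. positions from a
    peak database and heights set freely (high, intermediate, low)."""
    n = int(round((stop - start) / step)) + 1
    xs = [start + i * step for i in range(n)]
    ys = [sum(h * math.exp(-(x - p) ** 2 / (2 * sigma ** 2)) for p, h in peaks)
          for x in xs]
    return xs, ys
```

For example, `synthetic_reference([(100.0, 1.0), (105.0, 0.3)], 90.0, 115.0, step=0.5)` yields a high peak at 100 bases and a low peak at 105 bases that can then serve as the target for the correction sketches above.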

If gene information about the origin of a peak is attached to that peak in the reference profile data, the gene information can be taken from the reference profile data for each measurement profile peak that was associated with that reference peak, and treated as the gene information of the measurement profile peak.

As shown in FIG. 1, the next step, gene expression analysis step S4, reads from the transcript species information the gene transcript from which each associated peak of the measurement profile data is derived, and analyzes the gene expression state by identifying the gene from which that peak is derived.
In gene expression analysis step S4, for example, peaks of the measurement profile data that could be associated with peaks of the reference profile data are displayed so as to be distinguishable from peaks that could not. If gene information about the gene from which a reference peak is derived is attached to that peak, the gene of the associated peak in the measurement profile data can be identified by citing that gene information, and the gene expression state can thereby be analyzed. Displaying associated and unassociated peaks distinguishably means, for example, displaying one group of peaks in a different color or making one group blink.
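The association and annotation transfer can be sketched at the level of peak lists, assuming peak positions have already been corrected. Here a simple greedy nearest-first, one-to-one matching within a tolerance `tol` (in bases) is used; the tolerance, the matching rule, and the gene names in the usage note are hypothetical. Each measured peak is returned with the gene information of its matched reference peak (or `None`) and a flag that a display could use to colour or blink unmatched peaks.

```python
def associate_peaks(ref_peaks, meas_positions, tol=0.5):
    """ref_peaks: list of (position, gene_info or None).
    Returns a list of (measured_position, gene_info or None, matched)."""
    # All candidate pairs within tolerance, closest first
    pairs = sorted(
        (abs(m - r[0]), mi, ri)
        for mi, m in enumerate(meas_positions)
        for ri, r in enumerate(ref_peaks)
        if abs(m - r[0]) <= tol
    )
    used_m, used_r, match = set(), set(), {}
    for _, mi, ri in pairs:
        if mi in used_m or ri in used_r:
            continue  # keep the association one-to-one
        used_m.add(mi)
        used_r.add(ri)
        match[mi] = ri
    result = []
    for mi, m in enumerate(meas_positions):
        if mi in match:
            result.append((m, ref_peaks[match[mi]][1], True))
        else:
            result.append((m, None, False))
    return result
```

With reference peaks `[(120.0, "GeneA"), (120.8, "GeneB"), (150.0, None)]` and measured peaks `[120.1, 120.7, 133.0, 150.2]`, the first two inherit `GeneA` and `GeneB`, the peak at 133.0 is flagged as unmatched, and the peak at 150.2 is matched but carries no gene information.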

In gene expression analysis step S4, if some peaks cannot be associated between the reference profile data and the measurement profile data, a more useful analysis result is obtained by attaching related information to those peaks. Such related information includes an evaluation value based on a correlation coefficient expressing similarity to the reference profile waveform (see FIG. 8), the peak position, the primer set, the expression intensity, characteristics of the peak shape, information about the sample cells, and experimental information such as experimental conditions; preferably at least one of these is included. FIG. 8 illustrates the evaluation value based on a correlation coefficient expressing similarity to the reference profile waveform.

As shown in the middle display area of FIG. 8, the measurement profile data may contain a peak that is absent from the reference profile data, or a peak present in the reference profile data may be absent from the measurement profile data. In such cases the evaluation value relative to the reference profile data is low. Consequently, as shown near the center of the lower display area of the figure, the evaluation value for such a peak is somewhat lower than for the other peaks; that is, the size assigned to that peak is somewhat less reliable than for the other peaks.
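One way to obtain a position-by-position evaluation value like the one plotted in FIG. 8 is a sliding-window Pearson correlation between the reference and measurement waveforms. This is an assumption for illustration (the source only says the value is "based on a correlation coefficient"); the window size, the `eps` flatness cut-off, and the function names are hypothetical. A window in which one profile has a peak and the other is flat scores 0.0 here.

```python
import math


def pearson(a, b, eps=1e-12):
    """Pearson correlation; 0.0 when either window is numerically flat."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va < eps or vb < eps:
        return 0.0
    return cov / math.sqrt(va * vb)


def evaluation_profile(reference, measured, half_window=5):
    """Correlation between reference and measured in a window around each
    sample; None where the window would extend past either end."""
    n = len(reference)
    out = []
    for i in range(n):
        lo, hi = i - half_window, i + half_window + 1
        if lo < 0 or hi > n:
            out.append(None)
        else:
            out.append(pearson(reference[lo:hi], measured[lo:hi]))
    return out
```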

Related information such as the evaluation value based on the correlation coefficient with the reference profile data may, of course, also be attached to the measurement profile peaks that were successfully associated with reference peaks.
When the evaluation value based on the correlation coefficient with the reference profile waveform is attached as related information in this way, it also serves as an assessment of how reliably the transcript species information has been associated with the measurement profile data.

As described above, the gene expression analysis method of the present invention represents the reference profile data and the measurement profile data as waveforms and compares them, so the comparison, and hence the analysis of the gene expression state, can be performed easily. Conventionally, to find out which gene transcript (mRNA) a measurement profile peak is derived from, the peak had to be fractionated each time, its base sequence determined, and the gene identified by a similarity search against some database. When gene information is attached to the peaks of the reference profile data used for association, however, the gene from which a measurement profile peak is derived can be identified from that information, so this work can be omitted.

Next, the gene expression analysis apparatus according to the present invention is described with reference to FIG. 9, a block diagram showing an example of its configuration.

The gene expression analysis apparatus A according to the present invention analyzes a gene expression state using profile data represented as a single waveform having a plurality of peaks derived from a plurality of expressed gene transcripts. As shown in FIG. 9, it comprises reference profile data acquisition means 1, measurement profile data creation means 2, peak association means 3, and gene expression analysis means 4.
These means 1 to 4 correspond respectively to the reference profile data acquisition step S1, measurement profile data creation step S2, peak association step S3, and gene expression analysis step S4 of the gene expression analysis method shown in FIG. 1. Descriptions that would duplicate the foregoing are therefore omitted below.

The gene expression analysis apparatus A can be, for example, a commonly used general-purpose computer or workstation. Through connection means (not shown) it is connected to a DNA sequencer DS, such as an external capillary DNA sequencer, and via a communication network NW such as the Internet to the reference profile database DB1, the gene information database DB2, and so on, so that it can obtain the required information as needed. The apparatus A also has storage means 5, such as a hard disk drive, in which the reference data acquired, the data created, the analysis results, and so on from each of the means described above can be stored and read out as necessary.

The reference profile data acquisition means 1 acquires in advance, from the reference profile database DB1 via the communication network NW, the reference profile data against which the measurement profile data created by the measurement profile data creation means 2 (described later) is compared. As described above, the reference profile data represents as a first waveform a reference range of base-number-equivalent values, obtained from the base-number-equivalent positions of DNA fragments (produced by amplifying part of the cDNA reverse-transcribed from a plurality of gene transcripts) and from the detection amounts at those positions, which correspond to the transcription amounts of the gene transcripts. As transcript species information of the gene transcripts, it identifies and stores predetermined peaks in the first waveform together with the transcript species from which each peak is derived. If gene information such as gene names is attached to the peaks in the reference profile data, or if settings specify that it should be attached, that gene information may be acquired from the gene information database DB2 via the communication network NW.

The reference profile database DB1 and the gene information database DB2 are not limited to single databases; they also include so-called database pools, in which data usable as reference profile data is stored across several different databases. In that case, the required data is retrieved from the appropriate databases to obtain the reference profile data. The user can freely decide which data to use as reference profile data. For example, the data having the most peaks, the most similar biological material or experimental conditions, and the most associated gene information can be retrieved, from several databases if necessary, and used as reference profile data. This matters because when there are only a few measurement profiles, say one or two, acquiring a single reference profile may give poor accuracy in the waveform correction. In such a case, several relatively similar data sets may be retrieved as described above and used as reference profile data for the waveform correction and peak association.

An example of the reference profile database DB1 is the HiCEP database provided by the National Institute of Radiological Sciences (Japan) (URL: http://133.63.22.11/peakdb/query?request=dbmain&lang=ja). Examples of the gene information database DB2 are databases provided by the NCBI (National Center for Biotechnology Information, USA; URL: http://www.ncbi.nlm.nih.gov/), such as GenBank (URL: http://www.ncbi.nlm.nih.gov/Genbank/index.html), dbSNP, a database of single nucleotide polymorphisms (SNPs) (URL: http://www.ncbi.nlm.nih.gov/projects/SNP/), dbEST, a database of ESTs (Expressed Sequence Tags) (URL: http://www.ncbi.nlm.nih.gov/dbEST/), and the literature database MEDLINE (PubMed, URL: http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed).

As described with reference to FIG. 3, for example, the measurement profile data creation means 2 obtains a sequence result SR by running the DNA fragments to be measured (PCR amplification products) on the DNA sequencer DS. These fragments are obtained by amplifying, by PCR or a similar method, part of the cDNA reverse-transcribed from the gene transcripts to be measured. SR contains the measured base-number-equivalent positions and, at each position, a detection amount corresponding to the transcription amount of the measurement object. From SR, the means 2 creates measurement profile data that represents the measurement range of base-number-equivalent values as a second waveform having a plurality of peaks, for example as a graph whose horizontal axis is the number of bases and whose vertical axis is the detection amount.
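A minimal sketch of turning a sequence result into a waveform, assuming the sequencer already reports calibrated base-number-equivalent positions, possibly unevenly spaced, and that a uniform grid is wanted so waveforms can be compared sample by sample. Linear interpolation, zero signal outside the measured range, and the function name are illustrative assumptions.

```python
import bisect


def make_profile(sr, start, stop, step):
    """Resample a sequence result onto a uniform base-number grid.

    sr: iterable of (base_number_equivalent, detection_amount) pairs,
    in any order. Returns (grid, signal); positions outside the measured
    range get 0.0."""
    pts = sorted(sr)
    px = [p[0] for p in pts]
    py = [p[1] for p in pts]
    n = int(round((stop - start) / step)) + 1
    grid = [start + i * step for i in range(n)]
    signal = []
    for x in grid:
        if x < px[0] or x > px[-1]:
            signal.append(0.0)
            continue
        k = bisect.bisect_right(px, x)
        if k == len(px):  # x equals the last measured position
            signal.append(py[-1])
            continue
        x0, x1, y0, y1 = px[k - 1], px[k], py[k - 1], py[k]
        signal.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return grid, signal
```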

The peak association means 3 performs correction processing that corrects some or all regions of at least one of the first and second waveforms and adjusts the peak positions, and then associates a plurality of peaks in the second waveform, including a region of interest, with the corresponding plurality of peaks in the first waveform.

The gene expression analysis means 4 reads from the transcript species information the gene transcript from which each associated peak of the measurement profile data is derived, and analyzes the gene expression state by identifying the gene from which that peak is derived. When gene information is to be attached to the measurement profile peaks associated by the peak association means 3, the gene expression analysis means 4 may access the gene information database DB2 and attach the gene information for the associated peaks.

The analysis results obtained in this way are output to display means 6, such as a display or printer, to present the gene expression state to the user.

The gene expression analysis program according to the present invention may be recorded on a computer-readable recording medium (not shown) such as a CD-ROM or flexible disk. For example, a recording medium drive (not shown) connected to the gene expression analysis apparatus A may read the program from the medium and install it in the storage means 5, so that the computer executes the reference profile data acquisition step S1, measurement profile data creation step S2, peak association step S3, and gene expression analysis step S4 shown in FIG. 1.

When the gene expression analysis program is stored on another computer (server) connected via the communication network NW, a gene expression analysis apparatus (client) connected to NW may download the program from that computer via NW, so that the computer executes the reference profile data acquisition step S1, measurement profile data creation step S2, peak association step S3, and gene expression analysis step S4.

The gene expression analysis method, gene expression analysis apparatus, and gene expression analysis program of the present invention have been described in detail above by way of the best mode for carrying out the invention. The present invention is not limited to this mode, however, and should of course be interpreted broadly on the basis of the claims.

FIG. 1 is a flowchart explaining the procedure of the steps of the gene expression analysis method according to the present invention.
FIG. 2 (a) to (h) compare measurement profile data obtained by the HiCEP method for two different cell lines, cell line A and cell line B.
FIG. 3 is an explanatory diagram showing an example of the procedure for obtaining measurement profile data.
FIG. 4 shows an example of measurement profile data in which the peak positions of 24 waveforms are locally shifted; (a) shows the state before correction and (b) the state after correction.
FIG. 5 shows an example of measurement profile data in which the peak positions of 24 waveforms are shifted substantially as a whole; (a) shows the state before correction and (b) the state after correction.
FIG. 6 shows an example of measurement profile data in which the peaks of 24 waveforms lie close together and overlap; (a) shows the state before correction and (b) the state after correction.
FIG. 7 shows a case in which a waveform having peaks at arbitrary positions is created artificially, with reference to 10 measurement profile waveforms, and used as reference profile data.
FIG. 8 illustrates evaluation values based on a correlation coefficient expressing similarity to the reference profile waveform.
FIG. 9 is a block diagram showing an example of the configuration of the gene expression analysis apparatus according to the present invention.

Explanation of symbols

S1 Reference profile data acquisition step
S2 Measurement profile data creation step
S3 Peak association step
S4 Gene expression analysis step
A Gene expression analysis apparatus
1 Reference profile data acquisition means
2 Measurement profile data creation means
3 Peak association means
4 Gene expression analysis means
5 Storage means
6 Display means
DS DNA sequencer
SR Sequence result (number of bases and detection amount of DNA fragments (PCR amplification products))
NW Communication network
DB1 Reference profile database
DB2 Gene information database

Claims

A gene expression analysis method for analyzing a gene expression state using profile data that represents, as a single waveform, a plurality of peaks derived from a plurality of expressed gene transcripts, the method comprising:
a reference profile data acquisition step of acquiring reference profile data that represents, as a first waveform, a reference range of base-number-equivalent values obtained on the basis of positions corresponding to the numbers of bases of DNA fragments, obtained by amplifying part of cDNA produced by reverse transcription of a plurality of gene transcripts, and of detected amounts at those positions corresponding to the transcription amounts of the gene transcripts, and in which, for a predetermined peak in the first waveform, the transcript species from which that peak is derived has been identified and stored in advance as transcript species information on the gene transcripts;
a measurement profile data creation step of creating measurement profile data that represents, as a second waveform, a measurement range of base-number-equivalent values obtained on the basis of positions corresponding to the numbers of bases of DNA fragments to be measured, obtained by amplifying part of cDNA produced by reverse transcription of a plurality of gene transcripts, and of detected amounts at those positions corresponding to the transcription amounts of the measurement objects;
a peak association step of performing correction processing that corrects part or all of at least one of the first waveform and the second waveform to adjust peak positions, and associating a plurality of peaks in a plurality of regions of the second waveform, including a region of interest, with the corresponding plurality of peaks in the first waveform; and
a gene expression analysis step of reading, from the transcript species information, information on the gene transcript from which each associated peak of the measurement profile data is derived, identifying the gene from which that peak is derived, and thereby analyzing the expression state of the gene;
wherein
the peak association step comprises:
when the position of a peak of the reference profile data acquired in the reference profile data acquisition step matches the position of a peak of the measurement profile data created in the measurement profile data creation step, associating the matching peak of the reference profile data with that peak of the measurement profile data; and
when the peak positions of the reference profile data acquired in the reference profile data acquisition step and the peak positions of the measurement profile data created in the measurement profile data creation step are partially or entirely shifted from each other, correcting part or all of the regions of at least one of these waveforms so that the similarity between the waveforms is maximized, thereby adjusting the peak positions, and associating the peaks of the reference profile data with the peaks of the measurement profile data.
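The two branches of the peak association step can be illustrated with a minimal Python sketch. It is not the patent's implementation and rests on assumptions the claim does not state: waveforms are equally spaced samples, peaks are given as positions on the same axis, a fixed tolerance decides whether two positions "match", and "similarity" is taken to be the Pearson correlation under a single global integer shift. The function names and the `tol` and `max_shift` parameters are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def best_shift(ref, meas, max_shift):
    """Integer shift s maximizing corr(ref[i], meas[i + s]) over the overlapping samples."""
    best_r, best_s = float("-inf"), 0
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(len(ref), len(meas) - s)
        if hi - lo < 2:
            continue  # too little overlap to compute a correlation
        r = pearson(ref[lo:hi], meas[lo + s:hi + s])
        if r > best_r:
            best_r, best_s = r, s
    return best_s


def associate(ref_peaks, meas_peaks, tol):
    """Pair each reference peak with the nearest unused measured peak within tol.

    Returns a list of (reference_index, measured_index) pairs.
    """
    pairs, used = [], set()
    for i, rp in enumerate(ref_peaks):
        cands = [(abs(mp - rp), j) for j, mp in enumerate(meas_peaks)
                 if j not in used and abs(mp - rp) <= tol]
        if cands:
            _, j = min(cands)
            used.add(j)
            pairs.append((i, j))
    return pairs


def associate_peaks(ref, meas, ref_peaks, meas_peaks, tol=0.5, max_shift=20):
    """Associate peaks directly if positions match, otherwise after a similarity-maximizing shift.

    Returns (pairs, shift), where shift is the offset removed from the measured axis.
    """
    pairs = associate(ref_peaks, meas_peaks, tol)
    if len(pairs) == min(len(ref_peaks), len(meas_peaks)):
        return pairs, 0  # first branch: positions already coincide
    s = best_shift(ref, meas, max_shift)
    shifted = [mp - s for mp in meas_peaks]  # move measured peaks onto the reference axis
    return associate(ref_peaks, shifted, tol), s
```

For example, with Gaussian peaks at 20, 45 and 70 in the reference and the same peaks at 25, 50 and 75 in the measurement, no pair matches directly, so `associate_peaks` falls back to the correlation search, finds a shift of 5 and pairs all three peaks. A real system would more likely use a local rather than a single global correction.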

The gene expression analysis method according to claim 1, wherein, in the reference profile data acquisition step, the reference profile data is:
acquired from a database that stores known profile data;
acquired by artificial creation from the transcript species information;
acquired by adding or deleting one or more peaks in the known profile data or the measurement profile data;
acquired by combining a plurality of pieces of the reference profile data; or
acquired by synthesizing a plurality of pieces of the measurement profile data.
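Several of the listed acquisition routes can be sketched in Python. Purely for illustration, this assumes that the transcript species information supplies an expected peak position and height for each transcript and that each peak is drawn as a Gaussian of fixed width; sample-wise averaging is only one conceivable rule for synthesizing a profile from several profiles. All names here are hypothetical.

```python
from math import exp


def synthesize_reference(peaks, length, sigma=1.5):
    """Build an artificial reference waveform as a sum of Gaussian peaks.

    peaks maps a transcript-species label to (position, height); positions are
    in sample units of the base-number-equivalent axis (an assumption).
    """
    wave = [0.0] * length
    for pos, height in peaks.values():
        for x in range(length):
            wave[x] += height * exp(-((x - pos) ** 2) / (2 * sigma ** 2))
    return wave


def add_peak(peaks, label, pos, height):
    """Return a copy of the peak table with one peak added (or replaced)."""
    return {**peaks, label: (pos, height)}


def delete_peak(peaks, label):
    """Return a copy of the peak table without the given peak."""
    return {k: v for k, v in peaks.items() if k != label}


def combine_profiles(*waves):
    """Synthesize one profile from several by sample-wise averaging (one possible rule)."""
    if not waves or len({len(w) for w in waves}) != 1:
        raise ValueError("need one or more waveforms of equal length")
    return [sum(vals) / len(waves) for vals in zip(*waves)]
```

Keeping the peak table separate from the rendered waveform means that adding or deleting a peak is an edit to the table followed by a re-render, so the transcript label stays attached to each artificial peak.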

The gene expression analysis method according to claim 1 or 2, wherein the correction processing in the peak association step is performed by function approximation based on a Gaussian function.
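The claim does not specify how the Gaussian function approximation is carried out. One common technique, shown here as a hedged sketch, fits a Gaussian to a sampled peak using the peak sample and its two neighbours: the logarithm of a Gaussian is a parabola, so three positive samples determine the centre, width and height exactly, and the fitted sub-sample centre is a position that a correction step could then adjust. The function name and interface are hypothetical.

```python
from math import exp, log, sqrt


def fit_gaussian_peak(wave, i):
    """Fit h * exp(-(x - mu)**2 / (2 * sigma**2)) to wave[i-1], wave[i], wave[i+1].

    ln(Gaussian) = A + B*t + C*t**2 with t = x - i, so the three log-samples give
    B and C directly; the vertex -B/(2C) is the centre offset and C = -1/(2 sigma^2).
    Samples must be positive. Returns (mu, sigma, h) in sample units.
    """
    if not 0 < i < len(wave) - 1:
        raise ValueError("peak must have a neighbour on each side")
    lm, l0, lp = log(wave[i - 1]), log(wave[i]), log(wave[i + 1])
    curv = lm - 2 * l0 + lp          # equals 2*C; negative at a strict maximum
    if curv >= 0:
        raise ValueError("sample i is not a strict local maximum")
    b = (lp - lm) / 2                # linear coefficient B
    c = curv / 2                     # quadratic coefficient C
    delta = -b / (2 * c)             # vertex offset from sample i
    sigma = sqrt(-1 / (2 * c))
    height = exp(l0 + b * delta + c * delta ** 2)
    return i + delta, sigma, height
```

On noise-free Gaussian samples the fit is exact up to rounding; on real electropherogram peaks it is an approximation, and a least-squares fit over more samples would be more robust to noise.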

The gene expression analysis method according to any one of claims 1 to 3, wherein the correction processing in the peak association step associates the peaks of the measurement profile data with the peaks of the reference profile data by:
moving the positions of the peaks of the measurement profile data on the basis of the positions of the peaks of the reference profile data;
moving the positions of the peaks of the reference profile data on the basis of the positions of the peaks of the measurement profile data; or
moving both the positions of the peaks of the reference profile data and the positions of the peaks of the measurement profile data.
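The three alternatives can be illustrated with a piecewise-linear position correction. This is a sketch under assumptions the claim does not state: a few peaks are already known to correspond (anchor pairs), positions between anchors are interpolated linearly, and positions outside the anchor range keep the nearest anchor's offset. For the "move both" alternative, the sketch moves positions onto the axis midway between each anchor pair. The function names and the `mode` values are hypothetical.

```python
from bisect import bisect_right


def warp(pos, pairs):
    """Map pos via sorted (source, target) anchor pairs using piecewise-linear interpolation.

    Outside the anchor range, the nearest anchor's offset is applied unchanged.
    """
    xs = [src for src, _ in pairs]
    k = bisect_right(xs, pos)
    if k == 0:
        src, tgt = pairs[0]
        return pos + (tgt - src)
    if k == len(pairs):
        src, tgt = pairs[-1]
        return pos + (tgt - src)
    (s0, t0), (s1, t1) = pairs[k - 1], pairs[k]   # s0 <= pos < s1
    t = (pos - s0) / (s1 - s0)
    return t0 + t * (t1 - t0)


def correct_positions(positions, anchors, mode="measured"):
    """Move peak positions using anchors given as (measured_position, reference_position).

    mode="measured":  positions are measured peaks, moved onto the reference axis.
    mode="reference": positions are reference peaks, moved onto the measured axis.
    mode="both":      positions are measured peaks, moved onto the midway axis;
                      reference peaks would be moved the same way with swapped anchors.
    """
    if mode == "measured":
        pairs = list(anchors)
    elif mode == "reference":
        pairs = [(r, m) for m, r in anchors]
    elif mode == "both":
        pairs = [(m, (m + r) / 2) for m, r in anchors]
    else:
        raise ValueError(f"unknown mode: {mode}")
    pairs.sort()
    return [warp(p, pairs) for p in positions]
```

With anchors `[(22, 20), (52, 50), (80, 80)]`, the measured positions 22, 37, 52, 66 and 90 are moved to 20, 35, 50, 65 and 90, i.e. a local shift that fades out toward the last anchor, which matches the kind of locally shifted profile the corrections address.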

The gene expression analysis method according to any one of claims 1 to 4, wherein, in the gene expression analysis step, the associated peaks of the reference profile data and of the measurement profile data are displayed so as to be distinguishable from peaks that could not be associated, and, when gene information on the gene from which a peak of the reference profile data is derived has been attached to that peak, the gene of the associated peak in the measurement profile data is identified by citing that gene information, and the expression state of the gene is analyzed.
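Citing gene information from associated reference peaks, and displaying associated peaks distinguishably, might look like the following sketch. It assumes association results in the form of (reference index, measured index) pairs and a plain text display in which an asterisk marks associated peaks; both are illustrative choices, and the function names are hypothetical.

```python
def annotate(pairs, ref_genes, n_meas):
    """Return one (associated, gene_info) tuple per measured peak.

    pairs: (reference_index, measured_index) tuples from the association step.
    ref_genes: reference_index -> gene information, for reference peaks that carry it.
    """
    out = [(False, None)] * n_meas
    for ri, mi in pairs:
        out[mi] = (True, ref_genes.get(ri))  # cite the reference peak's gene information
    return out


def render(positions, annotations):
    """Text display in which '*' marks associated peaks so they stand apart from the rest."""
    lines = []
    for pos, (assoc, gene) in zip(positions, annotations):
        if assoc:
            lines.append(f"* {pos:.2f} {gene or '(no gene information)'}")
        else:
            lines.append(f"  {pos:.2f} (unassociated)")
    return lines
```

Keeping "associated" separate from "has gene information" lets the display distinguish an associated peak whose reference peak is not yet annotated from a peak that could not be associated at all.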

The gene expression analysis method according to any one of claims 1 to 5, wherein the gene expression analysis step further comprises a step of adding, for a peak that could not be associated in the peak association step, related information about that peak.

The gene expression analysis method according to claim 6, wherein the related information includes at least one of an evaluation value based on a correlation coefficient relating to waveform similarity, a peak position, a primer set, an expression intensity, a peak shape characteristic, sample cell information, and experimental information.
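For a peak that could not be associated, an evaluation value based on a correlation coefficient could be computed as in the following sketch and stored alongside the other listed items. The choices here are assumptions: the evaluation value is taken as the highest Pearson correlation between a window around the measured peak and equally sized windows of the reference waveform within a small search range, and the record's field names (`RelatedInfo`, `primer_set`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def evaluation_value(ref, meas, p, half_width=5, search=10):
    """Highest correlation between the measured window centred on p and nearby reference windows."""
    if not half_width <= p < len(meas) - half_width:
        raise ValueError("window around p falls outside the measured waveform")
    win = meas[p - half_width:p + half_width + 1]
    best = -1.0
    for c in range(p - search, p + search + 1):
        lo, hi = c - half_width, c + half_width + 1
        if lo >= 0 and hi <= len(ref):
            best = max(best, pearson(ref[lo:hi], win))
    return best


@dataclass
class RelatedInfo:
    """Hypothetical record of the related information listed in the claim."""
    position: float
    evaluation: float
    primer_set: str = ""
    expression_intensity: float = 0.0
    peak_shape: dict = field(default_factory=dict)
    sample_cell: str = ""
    experiment: str = ""


def describe_unassociated(ref, meas, p, **extra):
    """Build a RelatedInfo record for an unassociated measured peak at sample index p."""
    return RelatedInfo(position=p, evaluation=evaluation_value(ref, meas, p),
                       expression_intensity=meas[p], **extra)
```

A high evaluation value for an unassociated peak suggests a nearby reference peak of similar shape that the association tolerance missed; a value near zero suggests the peak has no counterpart in the reference profile.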

A gene expression analysis apparatus that analyzes a gene expression state using profile data that represents, as a single waveform, a plurality of peaks derived from a plurality of expressed gene transcripts, the apparatus comprising:
reference profile data acquisition means for acquiring reference profile data that represents, as a first waveform, a reference range of base-number-equivalent values obtained on the basis of positions corresponding to the numbers of bases of DNA fragments, obtained by amplifying part of cDNA produced by reverse transcription of a plurality of gene transcripts, and of detected amounts at those positions corresponding to the transcription amounts of the gene transcripts, and in which, for a predetermined peak in the first waveform, the transcript species from which that peak is derived has been identified and stored in advance as transcript species information on the gene transcripts;
measurement profile data creation means for creating measurement profile data that represents, as a second waveform, a measurement range of base-number-equivalent values obtained on the basis of positions corresponding to the numbers of bases of DNA fragments to be measured, obtained by amplifying part of cDNA produced by reverse transcription of a plurality of gene transcripts, and of detected amounts at those positions corresponding to the transcription amounts of the measurement objects;
peak association means for performing correction processing that corrects part or all of at least one of the first waveform and the second waveform to adjust peak positions, and associating a plurality of peaks in a plurality of regions of the second waveform, including a region of interest, with the corresponding plurality of peaks in the first waveform; and
gene expression analysis means for reading, from the transcript species information, information on the gene transcript from which each associated peak of the measurement profile data is derived, identifying the gene from which that peak is derived, and thereby analyzing the expression state of the gene;
wherein
the peak association means:
when the position of a peak of the reference profile data acquired by the reference profile data acquisition means matches the position of a peak of the measurement profile data created by the measurement profile data creation means, associates the matching peak of the reference profile data with that peak of the measurement profile data; and
when the peak positions of the reference profile data acquired by the reference profile data acquisition means and the peak positions of the measurement profile data created by the measurement profile data creation means are partially or entirely shifted from each other, corrects part or all of the regions of at least one of these waveforms so that the similarity between the waveforms is maximized, thereby adjusting the peak positions, and associates the peaks of the reference profile data with the peaks of the measurement profile data.

The gene expression analysis apparatus according to claim 8, wherein the reference profile data acquisition means acquires the reference profile data:
from a database that stores known profile data;
by artificial creation from the transcript species information;
by adding or deleting one or more peaks in the known profile data or the measurement profile data;
by combining a plurality of pieces of the reference profile data; or
by synthesizing a plurality of pieces of the measurement profile data.

The gene expression analysis apparatus according to claim 8 or 9, wherein the correction processing in the peak association means is performed by function approximation based on a Gaussian function.

The gene expression analysis apparatus according to any one of claims 8 to 10, wherein the correction processing in the peak association means associates the peaks of the measurement profile data with the peaks of the reference profile data by:
moving the positions of the peaks of the measurement profile data on the basis of the positions of the peaks of the reference profile data;
moving the positions of the peaks of the reference profile data on the basis of the positions of the peaks of the measurement profile data; or
moving both the positions of the peaks of the reference profile data and the positions of the peaks of the measurement profile data.

The gene expression analysis apparatus according to any one of claims 8 to 11, wherein the gene expression analysis means displays the associated peaks of the reference profile data and of the measurement profile data so as to be distinguishable from peaks that could not be associated, and, when gene information on the gene from which a peak of the reference profile data is derived has been attached to that peak, identifies the gene of the associated peak in the measurement profile data by citing that gene information and analyzes the expression state of the gene.

The gene expression analysis apparatus according to any one of claims 8 to 12, wherein the gene expression analysis means includes means for adding, for a peak that could not be associated by the peak association means, related information about that peak.

The gene expression analysis apparatus according to claim 13, wherein the related information includes at least one of an evaluation value based on a correlation coefficient relating to waveform similarity, a peak position, a primer set, an expression intensity, a peak shape characteristic, sample cell information, and experimental information.

A gene expression analysis program that causes a computer to execute the gene expression analysis method according to any one of claims 1 to 7.