JP2007312653A

JP2007312653A - Analyzing method for character extraction and comparison classification of sequential gene expression data and analyzing apparatus based on the analyzing method

Info

Publication number: JP2007312653A
Application number: JP2006144599A
Authority: JP
Inventors: Tomoko Koshi; 智子越; Shu Muto; 周武藤
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2006-05-24
Filing date: 2006-05-24
Publication date: 2007-12-06
Anticipated expiration: 2026-05-24
Also published as: JP4555256B2

Abstract

PROBLEM TO BE SOLVED: To provide an analysis method that, for Luc-Tag line transformed plants, extracts features from the time-series expression data of the recombinant gene in each individual plant and compares and classifies the individuals based on those features, and to provide an analysis apparatus based on the method.

SOLUTION: Cluster analysis is performed on multiple specimens within a line, based on the waveforms that represent the change over time in luciferase activity of the Luc-Tag line transformed plants. A group of individuals with high similarity is selected, a model waveform reflecting that similarity is constructed, and the model waveform is decomposed into a plurality of single-peaked waveforms, thereby capturing the features of the change over time in luciferase activity for the highly similar group of individuals within the line.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an analysis method for extracting features from time-series gene expression level data of individual organisms and for comparing and classifying individuals based on those features, and to an analysis apparatus based on that method. In particular, it relates to an analysis method, and an apparatus based on it, that targets genetically modified (transgenic) plants: the method extracts features from the time-series expression level data of the recombinant gene in each individual plant and compares and classifies the individuals based on those features.

The temporal change in the expression level of a gene in an individual organism reflects the mechanism that controls the expression of that gene, or the role of the gene product in the life activity of the organism. For example, in the series of cell division steps known as the cell cycle, the genes encoding the specific proteins involved in the progress of each step are expressed as if following a pre-programmed timetable.

In addition, multicellular higher organisms have various organs, and specific gene products are produced in their cells as the individual organs form, that is, in the course of differentiation. Cases have therefore been reported in which the expression level of a specific gene in a differentiated cell changes over time according to the stage the cell has reached. In multicellular higher organisms, a protein-coding gene on the genomic DNA is expressed through a series of steps, for example transcription, maturation into mRNA (splicing), translation from the mRNA, folding of the translated peptide chain, and conversion to the mature protein (post-translational processing), to yield the mature protein. Each of these steps is a reaction involving enzyme proteins and is governed by the rate of the corresponding enzymatic reaction. In general, the activity of an enzyme protein depends on temperature, so the reaction rate varies with the temperature inside the cell.

For example, in plant cells cultured at a fixed temperature, the intracellular temperature is kept constant, so the enzyme activity itself is generally maintained at a constant level. In a plant growing where the ambient temperature varies over time, by contrast, the temperature in each cell follows the ambient temperature, and the enzyme activity fluctuates accordingly. Moreover, a photosynthetic plant draws a considerable part of the energy it consumes to sustain its life activity from the ATP and NADPH that it produces from light energy through its own photosynthesis. For example, the synthesis of the substrates used in expressing protein-coding genes, namely the various amino acids and ribonucleoside 5'-triphosphates, relies to a considerable extent on the ATP and NADPH produced by photosynthesis in the cells of such a plant. Therefore, when the amount of light the plant receives varies over time, the amounts of amino acids and ribonucleoside 5'-triphosphates synthesized in its cells vary as well. For example, when the light intensity and the ambient temperature vary periodically and in synchrony, the life activity of a photosynthetic plant is expected to show a corresponding periodic variation. In plants cultivated under sunlight and subject to day-night temperature changes, temporal changes in gene expression levels that are synchronized with the external cycle may be induced, in part through the periodic effects of the factors described above.

Furthermore, in the cells that make up the various tissues and organs of a plant, transcription from a specific gene is itself often induced by a transcription factor. For example, an external stimulus may increase or activate a transcription factor and thereby induce transcription from the gene; once the stimulus is removed, the transcription factor is no longer increased or activated, and the amount of transcription often declines over time. In this case too, the expression level of the gene changes over time.

The genomic DNA of a multicellular higher organism carries many genes, each encoding a different protein. The expression level of each gene is thought to show its own characteristic temporal change, depending on the biological function of the protein it encodes and on the mechanism by which that function is exerted. In other words, information on the temporal change in a gene's expression level is important for studying the biological function and role of the protein derived from that gene.

Meanwhile, in a multicellular higher organism, a protein produced by expression of its coding gene in a fully differentiated cell may, depending on its biological function, lose its physiological activity over time (inactivation) and may further be degraded by endogenous proteases. Through this protein metabolism, which involves both synthesis and degradation, the concentration of a protein in the cell varies over time. For example, if some factor promotes the expression of a specific gene and raises the concentration of the encoded protein, and the factor is then removed, the protein concentration subsequently falls through degradation. As a result, the intracellular concentration of the protein follows a peak-shaped time course with a maximum at a certain time.
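The peak-shaped time course described above can be illustrated with a minimal sketch. It assumes a simple first-order model that is not stated in the source: constant synthesis while an inducing factor is present, first-order degradation throughout, and hypothetical parameter names (`t_off`, `k_syn`, `k_deg`) with arbitrary values.

```python
def simulate_protein(t_off=5.0, k_syn=1.0, k_deg=0.3, dt=0.01, t_end=30.0):
    """Euler integration of dC/dt = synthesis(t) - k_deg * C.

    Assumed model (illustrative only): synthesis equals k_syn while the
    inducing factor is present (t < t_off) and zero after it is removed.
    """
    steps = int(round(t_end / dt))
    times, conc = [], []
    c = 0.0
    for i in range(steps + 1):
        t = i * dt
        times.append(t)
        conc.append(c)
        synthesis = k_syn if t < t_off else 0.0
        c += (synthesis - k_deg * c) * dt
    return times, conc


times, conc = simulate_protein()
# Index of the maximum concentration: the "peak" of the time course.
peak_index = max(range(len(conc)), key=conc.__getitem__)
peak_time = times[peak_index]
peak_value = conc[peak_index]
```

Under these assumptions the concentration rises until the inducing factor is removed and decays afterward, so the maximum falls at about the removal time.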

After transcription and maturation (splicing), the mRNA used to translate the peptide chain of the protein is likewise degraded by endogenous ribonucleases (RNases) once it has been used for translation. Through this mRNA metabolism, which involves both production and degradation, the intracellular concentration of the mRNA varies over time. For example, if some factor promotes the expression of a specific gene and raises the mRNA concentration, and the factor is then removed, the mRNA concentration subsequently falls through enzymatic degradation. As a result, the intracellular mRNA concentration follows a peak-shaped time course with a maximum at a certain time. In general, enzymatic degradation of mRNA proceeds far more rapidly than enzymatic degradation of the protein translated from it, so the mRNA concentration reaches its maximum slightly earlier than the concentration of the corresponding protein, and it falls rapidly afterward.

When the intracellular concentration of a specific protein follows such a peak-shaped time course, the changes are not perfectly synchronized among the cells of the same type, many of which are contained in each tissue or organ of a multicellular higher organism such as a plant. Nevertheless, when the whole population of same-type cells in a given tissue or organ is considered, the "average value" obtained by averaging the intracellular concentration over the individual cells still tends to follow a peak-shaped time course with a maximum at a certain time. That is, if the times at which the individual cells reach their peak concentration tend to cluster within a particular time window, the "average value" for the whole population shows a maximum within that window.
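The effect of averaging over unsynchronized cells can be sketched as follows. This is an illustration under assumptions that are not in the source: each cell's time course is given a Gaussian shape, the population has 41 hypothetical cells, and their peak times are spread evenly over a window from 8.0 to 12.0 time units.

```python
from math import exp


def cell_trace(times, peak_time, width=2.0):
    """Peak-shaped concentration of one cell (Gaussian shape is an assumption)."""
    return [exp(-0.5 * ((t - peak_time) / width) ** 2) for t in times]


times = [0.1 * i for i in range(201)]           # 0.0 .. 20.0
# Hypothetical population: peak times spread over the window 8.0 .. 12.0.
peak_times = [8.0 + 0.1 * i for i in range(41)]
traces = [cell_trace(times, p) for p in peak_times]

# Population "average value" at each time point.
average = [sum(column) / len(traces) for column in zip(*traces)]

peak_index = max(range(len(average)), key=average.__getitem__)
average_peak_time = times[peak_index]
average_peak_height = average[peak_index]
```

In this sketch the averaged trace still has a single maximum inside the window of individual peak times, but it is lower and broader than any single-cell peak.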

In a plant, for example, it is technically difficult to track changes in the intracellular concentration of a target protein in each of the individual cells that make up its tissues and organs. Instead, the time course of the "average value" of the intracellular concentration over the whole population of same-type cells in a given tissue or organ is tracked, and this information is used to study the biological function and role of the protein derived from each gene.

For studying the biological function and role of the protein derived from each gene, it would be most desirable to track the temporal change in the expression level of the gene within individual cells. In a multicellular higher organism, however, tracking the temporal change in the expression level of a specific gene in each cell of an individual is technically difficult. The approach used instead takes, from among the many cells of an individual, a whole population of same-type cells, for example the cells that constitute a tissue or organ after differentiation, and tracks the temporal change in the "average value" of the gene's expression level over the cells in that population.

For example, to track over time, while the plant continues to grow, the "average value" of the expression level of a specific gene over the whole population of same-type cells constituting a tissue or organ, the conventional approach has been to collect some of the cells from the population and analyze the mRNA concentration, or the concentration of the corresponding protein, in the collected sample. Sampling a portion of the cells is applicable only when the population as a whole can be assumed to be reasonably homogeneous. It is also effective only when the sampling operation does not act on the remaining population as an external stress.

In practice, when multiple cell samples were collected at the same time from the whole population of same-type cells in a growing plant, and the mRNA concentration or the corresponding protein concentration in each sample was analyzed and compared, the "average value" of the intracellular mRNA concentration and that of the corresponding protein concentration were found, in many cases, to vary considerably between samples. Sampling a portion of the cells is therefore judged ineffective for tracking, as a time series over a considerable period, the "average value" of the intracellular mRNA or protein concentration over the whole population of same-type cells in a growing plant.

The present inventors and their collaborators have demonstrated that this "destructive evaluation method", which samples a portion of the cells, can be replaced with a non-destructive one: a luciferase gene encoding a photoprotein (luciferase) is introduced so that it is expressed in complete synchrony with the gene of interest and at a level quantitatively proportional to it, the recombinant luciferase is produced in the cells, and the chemiluminescence derived from it is measured.

Specifically, for a specific gene on the plant's genomic DNA, the region (ORF) of the luciferase gene that encodes the photoprotein is inserted and ligated immediately after a sequence (IRES) that allows transcription of the pre-mRNA, which includes the region encoding the amino acid sequence of the gene's protein, to continue. The upstream part of this chimeric gene carries the promoter sequence that drives transcription of the specific gene, so transcription is initiated by the same endogenous factors that induce expression of that gene. The resulting mRNA can be translated into the peptide chain of the luciferase encoded by the inserted gene. That is, once the mRNA is translated and the product folds, active luciferase is produced in the cell as a reporter protein. Its expression level corresponds to that of the protein encoded by the specific gene in the wild-type plant.

The concentration of the reporter protein (luciferase) present in a cell can be determined quantitatively by evaluating the enzyme activity of the photoprotein. Specifically, when the luciferase substrate is present in cell $l$ at a given concentration $C_{\mathrm{sub}}$ and the intracellular concentration of the fusion protein is $C_{\mathrm{fusion}}(l)$, the chemiluminescence per unit time, $dP_{\mathrm{chem}}(l)/dt$, produced by the enzyme activity of the luciferase in the C-terminal portion can be written, using the apparent rate constant $k_{\mathrm{react}}$ of the enzymatic reaction, as

\[ \frac{dP_{\mathrm{chem}}(l)}{dt} = k_{\mathrm{react}} \cdot C_{\mathrm{sub}} \cdot C_{\mathrm{fusion}}(l) \tag{1} \]

Measuring the chemiluminescence per unit time, $dP_{\mathrm{chem}}(l)/dt$, therefore makes it possible to estimate the intracellular concentration $C_{\mathrm{fusion}}(l)$ of the fusion protein. For an entire cell population, summing over the cells $l$ (with $k_{\mathrm{react}}$ and $C_{\mathrm{sub}}$ taken as common to all cells) gives

\[ \sum_{l} \frac{dP_{\mathrm{chem}}(l)}{dt} = k_{\mathrm{react}} \cdot C_{\mathrm{sub}} \cdot \sum_{l} C_{\mathrm{fusion}}(l) \]

so the chemiluminescence per unit time observed for the entire population, $\sum_{l} dP_{\mathrm{chem}}(l)/dt$, can be used to estimate the sum of the intracellular fusion protein concentrations over the population, $\sum_{l} C_{\mathrm{fusion}}(l)$.
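As a worked illustration of equation (1) and its population sum, the following sketch inverts the relation to recover the summed fusion protein concentration from a measured total luminescence rate. The function names and all numeric values are hypothetical, and the rate constant and substrate concentration are assumed to be known and identical in every cell.

```python
def luminescence_rate(k_react, c_sub, c_fusion):
    """Equation (1): chemiluminescence per unit time for one cell."""
    return k_react * c_sub * c_fusion


def estimate_total_fusion(total_rate, k_react, c_sub):
    """Invert the population sum: sum of C_fusion(l) = total rate / (k_react * C_sub)."""
    if k_react <= 0 or c_sub <= 0:
        raise ValueError("k_react and c_sub must be positive")
    return total_rate / (k_react * c_sub)


# Hypothetical values, in arbitrary units.
k_react, c_sub = 2.0, 0.5
cell_concentrations = [1.0, 3.0, 4.0]

# Forward model: luminescence observed for the whole population.
total_rate = sum(luminescence_rate(k_react, c_sub, c) for c in cell_concentrations)

# Inverse step: recover the summed concentration from the measurement.
estimated_total = estimate_total_fusion(total_rate, k_react, c_sub)
```

Only the sum over cells is recoverable from a whole-population measurement; the individual values of `cell_concentrations` are not.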

In other words, the luciferase gene is inserted as a "Tag", here a "reporter enzyme protein" gene, so that it quantitatively reflects the expression level of a specific gene, in particular the frequency of its transcription; it is produced as a recombinant protein, and the enzyme activity of this "Tag" luciferase is measured non-destructively. In particular, a transformant is created by inserting the luciferase gene, as the "reporter enzyme protein", into the genomic DNA of the target plant immediately after a sequence (IRES) that allows transcription of the specific gene to continue. Each transformant plant is self-crossed, its T1 seeds are harvested, and the lines selected as retaining the genetic modification are referred to as "LucTag lines".

A "LucTag line" is a transformant, but expression of the specific gene carrying the "LucTag" insertion follows essentially the same control mechanisms as in the wild-type plant. That is, the mechanisms governing induction and repression of expression, transcription, translation, protein folding, and degradation of the produced protein are essentially the same in the "LucTag line" transformant and in the wild-type plant. Consequently, by tracking the temporal change in the expression level of the specific gene in the cell population of a "LucTag line" transformant as it grows, one can obtain information fully equivalent to the temporal change in the expression level of that gene in the cell population of the wild-type plant.

Note that the luciferase "reporter enzyme protein" gene is inserted into only one chromosome of the homologous pair (a·a) that carries the specific gene in the diploid chromosomal DNA of the wild-type plant. A transformed plant therefore has an (a*·a) chromosome configuration, consisting of the modified chromosome a* and the unmodified homologous chromosome a. When such a plant (a*·a) is self-crossed and its T1 seeds are harvested, each T1 seed has one of four chromosome configurations: (a*·a*), (a*·a), (a·a*), or (a·a). The "LucTag line" transformants maintained as T1 seeds are thus a mixture of seeds with different genotypes: "homozygous" seeds with the (a*·a*) configuration and "heterozygous" seeds with the (a*·a) or (a·a*) configuration.
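The four T1 chromosome configurations listed above can be enumerated with a short sketch. It assumes ordinary Mendelian segregation, with each parental chromosome passed on with equal probability; the labels `"homo"`, `"hetero"`, and `"untagged"` are illustrative names, not terms from the source.

```python
from collections import Counter
from itertools import product

TAGGED, WILD = "a*", "a"  # modified and unmodified homologous chromosomes


def classify(pair):
    """Label a chromosome pair by how many tagged copies it carries."""
    return {2: "homo", 1: "hetero", 0: "untagged"}[pair.count(TAGGED)]


# Self-crossing an (a*, a) parent: each gamete carries one of the two chromosomes.
parent = (TAGGED, WILD)
offspring = list(product(parent, parent))  # (a*,a*), (a*,a), (a,a*), (a,a)

counts = Counter(classify(pair) for pair in offspring)
fractions = {label: n / len(offspring) for label, n in counts.items()}
```

Under this assumption the expected proportions are one quarter homozygous, one half heterozygous, and one quarter without the insertion.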

Plants germinated and grown from seeds of a "LucTag line" transformant can differ in phenotype from one individual to another even when they are "homozygous" (a*·a*), because alleles are partially recombined between homologous chromosome pairs during self-crossing. That is, even among homozygous plants, the temporal change in the expression level of the specific gene carrying the "LucTag" shows some distribution across individuals. In addition, between homozygous (a*·a*) plants and heterozygous (a*·a) or (a·a*) plants, the difference in genotype may produce a significant difference in the temporal change in the expression level of the tagged gene.

In addition, when the luciferase "reporter enzyme protein" gene is inserted into the chromosomal DNA of a wild-type plant, the insertion site may happen to block translation of the C-terminal part of the coding region of the "specific gene". Expression of the disrupted gene then yields not the wild-type protein but a C-terminal deletion mutant lacking the C-terminal portion. The function of the wild-type protein may be lost or reduced in this mutant. For example, if the disrupted gene is self-regulating, the loss of function may cause the temporal change in its intracellular expression level to differ between homozygous and heterozygous plants.

When "LucTag line" transformants are used, it is therefore necessary first to verify the variance between individuals of the same genotype and the presence or absence of phenotypic differences between genotypes, and then to extract the tendency (feature) of the "temporal change in the expression level of the specific gene" that is common within each genotype.

Specifically, multiple seeds of the "LucTag line" transformant are first sown, germinated, and grown. For the resulting plants, the time-series gene expression level data measured in each individual using the "LucTag", which show the "temporal change in the expression level of the specific gene" observed over the whole cell population of that individual, are compared with one another to verify the degree of variance between individuals of the same genotype and the presence or absence of phenotypic differences between genotypes. Next, when the phenotypes of different genotypes differ markedly, the common tendency (feature) of the "temporal change in the expression level of the specific gene" must be extracted within each genotype.

When the phenotypic differences between genotypes are slight, on the other hand, the tendency (feature) of the "temporal change in the expression level of the specific gene" that is common to the "line" as a whole, across its genotypes, must be extracted.

Elucidating the tendency (feature) of the "temporal change in the expression level of the specific gene" with "LucTag line" transformants requires the analysis described above. To carry out that analysis efficiently and with high validity, numerical and statistical analysis methods of the following kinds need to be developed.

That is, statistical analysis methods need to be developed for multiple sets of "time-series data (t_i, P(t_i))" that show quantitative variation (a waveform) over time and are likely, as described above, to share some "commonality or similarity". Three methods are needed: a data-processing and numerical-analysis method for "cluster analysis", which compares the data sets with one another and classifies them hierarchically by their degree of statistical commonality or similarity; statistical processing that, for each subset (class) of time-series data judged by the cluster analysis to be statistically similar, derives a representative time-series data set (t_i, P(t_i)) through a statistical "averaging" that reflects the similarity found among the members of the subset; and "profiling", which, based on that representative data set, extracts the "waveform features" that express the statistical similarity among the members of the subset.
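The clustering and averaging steps named above can be sketched in a few lines. This is not the method of the source: the distance measure (1 minus the Pearson correlation), the average-linkage merging rule, the `max_distance` threshold, and the pointwise mean used as the representative waveform are all assumptions chosen for illustration, and the five example waveforms are synthetic.

```python
from math import exp, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cluster_waveforms(waves, max_distance=0.2):
    """Average-linkage agglomerative clustering with distance 1 - Pearson r."""
    clusters = [[i] for i in range(len(waves))]

    def linkage(a, b):
        d = [1.0 - pearson(waves[i], waves[j]) for i in a for j in b]
        return sum(d) / len(d)

    while len(clusters) > 1:
        dist, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if dist > max_distance:      # remaining clusters are too dissimilar
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


def model_waveform(waves, members):
    """Representative waveform: pointwise mean over the cluster members."""
    return [sum(waves[i][k] for i in members) / len(members)
            for k in range(len(waves[0]))]


# Synthetic data: two waveform shapes peaking at t = 6 and t = 18.
times = list(range(24))
shape_a = [exp(-0.5 * ((t - 6) / 2.0) ** 2) for t in times]
shape_b = [exp(-0.5 * ((t - 18) / 2.0) ** 2) for t in times]
waves = [
    shape_a,
    [1.2 * v + 0.1 for v in shape_a],
    [0.9 * v for v in shape_b],
    [1.1 * v for v in shape_b],
    [0.8 * v for v in shape_a],
]

clusters = cluster_waveforms(waves)
groups = sorted(sorted(c) for c in clusters)
model_a = model_waveform(waves, groups[0])
model_a_peak = max(range(len(model_a)), key=model_a.__getitem__)
```

Because the Pearson correlation ignores scale and baseline offset, waveforms with the same shape but different amplitudes fall into the same cluster here; a different similarity measure would group the data differently.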

The present invention solves the above problems. Its object is to provide an analysis method, suited to the “use” described above, for extracting features of time-series gene expression level data in multicellular individual organisms and for comparative classification based on the features of such data between individuals, and to provide an analysis apparatus based on that method. In particular, the object is to provide, for use with “LucTag line” type transformant plants and suited to the above “use”, an analysis method targeting genetically modified transformed plants that extracts features of the time-series expression level data of the recombinant gene in each individual plant and performs comparative classification based on the features of such data between individuals, and to provide an analysis apparatus based on that method.

The genomes of various organisms have been sequenced, and many experiments are now under way to assign meaning to the information contained in those genomes (exploring function; annotation). Among these experimental systems, mutants and cultured cells in which the expression of a specific gene can be followed over time have been developed, and there are efforts to elucidate gene function using them. Studies that prepare large numbers of such mutants or cultured cells (lines) and explore gene function comprehensively are also being carried out.

The time-series gene expression data obtained in such experimental systems trace complex waveforms specific to each line, and even within a single line the waveform varies with genotype. With conventional manual methods it has therefore been difficult to grasp the waveform characteristics of each line or to compare similarity between lines.

The analysis method according to the present invention was developed as an analysis tool specialized for processing time-series gene expression data obtained in such experimental systems, capturing the characteristics of each line, and comparing and classifying the lines. From the acquired data it displays the waveform characteristics of each line in an easily readable form, automatically determines the genotype of each sample from the waveform characteristics, and creates and analyzes a model waveform for each genotype. Finally, it performs cluster analysis on the model waveforms of the lines to compare and classify them.

The analysis system according to the present invention has the following functions.

“Measurement error correction” 14 receives the numerical data obtained from the experimental measuring instrument, detects any false-positive values in the gene expression data, and corrects them.

“Multiple-waveform discrimination” 17 determines from the gene expression values whether several different waveforms, arising from genotype or other causes, are mixed together. Waveforms can differ greatly by genotype, so this step is needed both to detect such differences and to allow the waveform types to be handled separately in the model waveform creation 7 described later.

“Time axis unification” 18 unifies the time axes when the start time or measurement interval of the gene expression measurement experiments differs between lines.
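The time axis unification described above can be sketched as resampling each series onto a shared grid of elapsed times. This is a minimal illustration that assumes linear interpolation between neighbouring measurements; the source does not state which interpolation is used, and the function name `resample` and the example values are hypothetical.

```python
from bisect import bisect_right


def resample(times, values, grid):
    """Linearly interpolate (times, values) onto the elapsed-time points in grid.

    times must be ascending elapsed times measured from the start of the run.
    Grid points outside the measured range are returned as None (no extrapolation).
    """
    out = []
    for g in grid:
        if g < times[0] or g > times[-1]:
            out.append(None)
            continue
        j = bisect_right(times, g) - 1      # last sample at or before g
        if j >= len(times) - 1:             # g coincides with the final sample
            out.append(values[-1])
            continue
        frac = (g - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out


# Hypothetical example: a series sampled every 2 h mapped onto a 1 h grid.
print(resample([0, 2, 4], [0, 20, 40], [1, 2, 3]))
```

Once every individual is expressed on the same grid, values at the same index can be compared directly in the later clustering and model waveform steps.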

“Model waveform creation” 19 creates, from the multiple samples of a line, the waveform that best reflects the characteristics of that line. The model waveform can be regarded as standardized data in which errors due to genotype, differences in measurement time between lines, and similar discrepancies have been corrected.

“Waveform decomposition” 20 assumes that the model waveform (a complex curve) is a composite of known functions (simple curves such as a normal distribution) and decomposes it into multiple simple curves. Viewing the waveform this way makes complex waveforms, and peaks in particular, easier to see. For each separated curve, a profile of its time, half-width, and apex is created.

“Waveform comparison and classification” 23 performs cluster analysis on the data standardized by the waveform analysis 5 above and statistically classifies similar lines.

That is, the method for analyzing time-series gene expression level data according to the present invention is
an analysis method for feature extraction of time-series gene expression level data in individual organisms derived from the same “line” and for comparative classification based on the features of such data between individuals, wherein
the time-series gene expression level data in the individual organisms are data observed as time-series changes in “luciferase enzyme activity”, using a luciferase gene as a “reporter gene” that allows the expression level of the gene of interest to be monitored non-destructively, and
the analysis method comprises:
a “measurement error correction” step of determining whether “measurement error” data attributable to the measuring instrument are mixed into the numerical data of the time-series changes in “luciferase enzyme activity” obtained from the measuring instrument, and correcting any such “measurement error” data to an “estimated value” based on the numerical data immediately before and after it;
a “multiple-waveform discrimination” step of comparing the numerical data of the time-series changes in “luciferase enzyme activity” for multiple individuals derived from the same “line”, and performing a “cluster analysis” operation that clusters them hierarchically according to their degree of similarity to form at least one or two groups each containing multiple individuals;
a “time axis unification” step of converting, for each individual derived from the same “line”, the numerical data of the time-series changes in “luciferase enzyme activity” into numerical data of the “luciferase enzyme activity” predicted to be observed in that individual at desired elapsed times measured from the measurement start time, thereby unifying the time axes among the numerical data of the individuals;
a “model waveform creation” step of creating, from the time-axis-unified numerical data of each individual and for a group of multiple individuals determined by the “multiple-waveform discrimination” to belong to the same group, a “model waveform” showing the time-series change of a representative value of “luciferase enzyme activity” that reflects the similarity of the time-series changes in “luciferase enzyme activity” within that group; and
a “waveform decomposition” step of superimposing multiple “unimodal waveform functions”, based on the “model waveform” for the group of multiple individuals determined to belong to the same group, to create a “composite waveform” that approximates the waveform features of the “model waveform”, and determining the peak position, peak height, and half-width of each of the “unimodal waveform functions”.

In addition, the method for analyzing time-series gene expression level data according to the present invention preferably further includes a “waveform comparison and classification” step in which, using the “model waveforms” created in the “model waveform creation” step for the group of organisms derived from each “line”, the “model waveforms” of the groups derived from multiple “lines” are compared and a “cluster analysis” operation is performed that clusters them hierarchically according to their degree of similarity to form at least one or two groups each containing multiple “lines”.

In the method for analyzing time-series gene expression level data according to the present invention, the “unimodal waveform function” used in the “waveform decomposition” step is desirably a Lorentzian-type waveform function.
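As an illustration of a composite waveform built from Lorentzian-type components, the sketch below evaluates a sum of peaks that are each described by the three quantities the text profiles: peak position, peak height, and half-width. The specific parametrization (height at the centre, width given as full width at half maximum) is an assumption made here; the source names only the function type, and the fitting procedure that would determine the parameters is not shown.

```python
def lorentzian(t, center, height, fwhm):
    """Single Lorentzian peak: equals height at t == center and
    height / 2 at t == center +/- fwhm / 2."""
    half = fwhm / 2.0
    return height / (1.0 + ((t - center) / half) ** 2)


def composite(t, peaks):
    """Sum of Lorentzian components; peaks is a list of
    (center, height, fwhm) tuples (hypothetical layout)."""
    return sum(lorentzian(t, c, h, w) for (c, h, w) in peaks)


# Hypothetical two-component profile: (peak position, peak height, FWHM).
profile = [(10.0, 4.0, 6.0), (16.0, 2.0, 6.0)]
curve = [composite(float(t), profile) for t in range(0, 25)]
```

In a full decomposition the entries of `profile` would be adjusted until `composite` approximates the model waveform, for example by least squares.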

In the “model waveform creation” step, it is preferable to select, as the representative value of “luciferase enzyme activity” that reflects the similarity of the time-series changes in “luciferase enzyme activity” within a group of multiple individuals determined to belong to the same group, the central value of the “luciferase enzyme activity” at each time point within that group of individuals.
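A minimal sketch of this representative-value selection follows. It assumes that the “central value” can be read as the median across individuals at each time point, which is an interpretation made here rather than something the source states, and it assumes all waveforms already share one time grid. The function name `model_waveform` is hypothetical.

```python
from statistics import median


def model_waveform(waveforms):
    """Median across individuals at each time point of a shared time grid."""
    n_points = len(waveforms[0])
    if any(len(w) != n_points for w in waveforms):
        raise ValueError("all waveforms must share the same time grid")
    return [median(w[k] for w in waveforms) for k in range(n_points)]


# Hypothetical group of three individuals; the outlying 100 does not
# dominate the representative value at the second time point.
group = [[1, 5, 2], [2, 6, 9], [3, 100, 4]]
print(model_waveform(group))
```

A median-type value is less sensitive to a single aberrant individual than an arithmetic mean, which fits the stated aim of reflecting what the group has in common.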

In the “multiple-waveform discrimination” step, when the numerical data of the time-series changes in “luciferase enzyme activity” are compared for multiple individuals derived from the same “line” and clustered hierarchically according to their degree of similarity, it is desirable to evaluate the similarity between individuals by computing the “distance” between their numerical data with an inter-row matrix calculation method.

In that case, it is preferable to use the Manhattan method as the inter-row matrix calculation method and the Ward method as the linkage method for clustering based on the “distances” between the numerical data of the time-series changes in “luciferase enzyme activity”.
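The combination of Manhattan distances and Ward linkage can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in the source: it applies the Lance-Williams update for Ward linkage directly to unsquared Manhattan distances, which is a heuristic (Ward's criterion is defined for Euclidean distances), and the source does not say which variant or software was used. The names `manhattan` and `ward_clusters` are hypothetical.

```python
def manhattan(a, b):
    """Manhattan (city-block) distance between two waveforms on the same grid."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_clusters(waveforms, n_groups=2):
    """Agglomerative clustering down to n_groups clusters.

    Distances to a merged cluster follow the Lance-Williams update for
    Ward linkage, applied here to unsquared Manhattan distances.
    Returns sorted lists of waveform indices.
    """
    clusters = {i: [i] for i in range(len(waveforms))}
    dist = {(i, j): manhattan(waveforms[i], waveforms[j])
            for i in clusters for j in clusters if i < j}
    next_id = len(waveforms)
    while len(clusters) > n_groups:
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        updated = {}
        for k, members in clusters.items():
            nk = len(members)
            d_ki = dist[(min(i, k), max(i, k))]
            d_kj = dist[(min(j, k), max(j, k))]
            updated[k] = ((ni + nk) * d_ki + (nj + nk) * d_kj
                          - nk * d_ij) / (ni + nj + nk)
        # Drop every pair involving the two merged clusters.
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        for k, d in updated.items():
            dist[(k, next_id)] = d          # existing ids are always < next_id
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in clusters.values())


# Hypothetical line with two waveform types among four individuals.
samples = [[1, 2, 3], [1, 2, 4], [10, 12, 11], [11, 12, 10]]
print(ward_clusters(samples))
```

Stopping at one or two groups mirrors the claim language; a full dendrogram would instead record each merge and its height.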

In addition, the analysis system according to the present invention is
an analysis system usable for analysis aimed at feature extraction of time-series gene expression level data in individual organisms derived from the same “line” and at comparative classification based on the features of such data between individuals, wherein
the analysis system comprises, as mechanisms for performing analysis in accordance with the method for analyzing time-series gene expression level data of the present invention:
a “measurement error correction” mechanism that determines whether “measurement error” data attributable to the measuring instrument are mixed into the numerical data of the time-series changes in “luciferase enzyme activity” obtained from the measuring instrument, and corrects any such “measurement error” data to an “estimated value” based on the numerical data immediately before and after it;
a “multiple-waveform discrimination” mechanism that compares the numerical data of the time-series changes in “luciferase enzyme activity” for multiple individuals derived from the same “line”, and performs a “cluster analysis” operation that clusters them hierarchically according to their degree of similarity to form at least one or two groups each containing multiple individuals;
a “time axis unification” mechanism that converts, for each individual derived from the same “line”, the numerical data of the time-series changes in “luciferase enzyme activity” into numerical data of the “luciferase enzyme activity” predicted to be observed in that individual at desired elapsed times measured from the measurement start time, thereby unifying the time axes among the numerical data of the individuals;
a “model waveform creation” mechanism that creates, from the time-axis-unified numerical data of each individual and for a group of multiple individuals determined by the “multiple-waveform discrimination” to belong to the same group, a “model waveform” showing the time-series change of a representative value of “luciferase enzyme activity” that reflects the similarity of the time-series changes in “luciferase enzyme activity” within that group; and
a “waveform decomposition” mechanism that superimposes multiple “unimodal waveform functions”, based on the “model waveform” for the group of multiple individuals determined to belong to the same group, to create a “composite waveform” that approximates the waveform features of the “model waveform”, and determines the peak position, peak height, and half-width of each of the “unimodal waveform functions”.

The system for analyzing time-series gene expression level data according to the present invention can also be configured to further include a “waveform comparison and classification” mechanism that, using the “model waveforms” created in the “model waveform creation” step for the group of organisms derived from each “line”, compares the “model waveforms” of the groups derived from multiple “lines” and performs a “cluster analysis” operation that clusters them hierarchically according to their degree of similarity to form at least one or two groups each containing multiple “lines”.

The analysis system and analysis method according to the present invention have, in particular, the following effects.
・When a large volume of time-series expression data is obtained, lines can be compared and classified using objectively analyzed, quantitative data.
・Because waveform decomposition lets a single waveform be viewed as multiple component waveforms, its features can be seen more distinctly.

The present invention is described in more detail below.

(1) “LucTag line” type transformed plants to be measured
First, the “temporal change in the expression level of a specific gene” addressed by the analysis method of the present invention refers to the situation in which, in a plant having “diploid” chromosomal DNA, the expression level of a “specific gene” present as an allele on each chromosome of a homologous pair changes over time while the growth of the plant is maintained. Specifically, it is not the “temporal change in the expression level of a specific gene” within a single cell of the plant, but the “temporal change in the expression level of a specific gene” in an entire cell population, comprising many cells, that makes up part of the plant, for example a specific organ or tissue such as a leaf.

The number of cells $N_{\mathrm{total}}$ making up this cell population changes gradually over time while the plant continues to grow. In a cell population making up an organ whose differentiation is complete, such as a leaf, however, the number of constituent cells can be regarded as essentially constant over a span of several days. In an organ such as a leaf, the cells of the population are, as a whole, placed in the same environment. An organ such as a leaf, for example the epidermal cells of a leaf, is a population of cells of the same type, but viewed microscopically the cells differ slightly depending on where they are located. The number of cells $N_{\mathrm{total}}$ making up the population can therefore be written as the sum of the cell numbers $N_{\mathrm{subgr},i}$ of several subsets, $N_{\mathrm{total}} = \sum_i N_{\mathrm{subgr},i}$. The cells within each subset are, even microscopically, essentially of the same type.

Considered microscopically within a group of essentially identical cells, the “temporal change” in which expression of the specific gene begins in a cell, the expression level reaches a maximum, and it finally declines can be regarded as essentially the same from cell to cell. On the other hand, even among essentially identical cells placed in the same environment, the times at which expression of the specific gene begins are not perfectly synchronized; for the cell group as a whole, however, the onset can be approximated as following a probability function that has its maximum at a certain time.

Consider, for example, transformed E. coli: a single clone, that is, a culture in which all cells are genetically identical, in which a recombinant gene is overexpressed using an inducer. If the frequency of overexpression in the cell group is followed over time, taking the moment of inducer addition as $t = 0$, the number of cells beginning overexpression gradually increases, reaches a maximum at some time $t_{\mathrm{max}}$, and then decreases. The time course of the number of cells beginning overexpression, $N_s(t)$, can be approximated by a probability distribution similar to the Poisson distribution
$$f(x) = \frac{k^{x}\, e^{-k}}{x!}.$$
Once overexpression occurs, if the protein product of the gene is produced in the cell within a short time and is not subsequently degraded, the total amount of the protein in the whole cell group is proportional to the cumulative number of cells that have overexpressed. With $N_s(t_i)$ the number of cells beginning overexpression at each time $t_i$, this cumulative number $\int N_s(t)\,dt$ can be approximated as
$$\int N_s(t)\,dt \approx \sum_i \frac{1}{2}\,\{N_s(t_i) + N_s(t_{i+1})\}\,(t_{i+1} - t_i).$$
When the time course of the number of cells beginning overexpression follows the Poisson distribution and can be written
$$N_s(t) = N_a \times \left\{\frac{t_{\mathrm{max}}^{\,t}\, e^{-t_{\mathrm{max}}}}{t!}\right\},$$
the cumulative number of overexpressing cells $\int N_s(t)\,dt$ is a monotonically increasing function whose rate of increase $N_s(t)$ is maximal at $t = t_{\mathrm{max}}$ and which tends toward saturation by $t = 2t_{\mathrm{max}}$. Correspondingly, for the total amount $P_{\mathrm{pro}}(t)$ of the protein in the whole cell group, when the rate of increase at each time, $dP_{\mathrm{pro}}(t)/dt$, can be written
$$\frac{dP_{\mathrm{pro}}(t)}{dt} \propto N_s(t)$$
$$\frac{dP_{\mathrm{pro}}(t)}{dt} = P_{\mathrm{pro},a} \times \left\{\frac{t_{\mathrm{max}}^{\,t}\, e^{-t_{\mathrm{max}}}}{t!}\right\},$$
the rate of increase $dP_{\mathrm{pro}}(t)/dt$ is maximal at $t = t_{\mathrm{max}}$, and $P_{\mathrm{pro}}(t)$ is a monotonically increasing function that tends toward saturation by $t = 2t_{\mathrm{max}}$.
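The no-degradation model above can be checked numerically with a short sketch. It evaluates the Poisson-type onset rate for continuous time by taking $t!$ as $\Gamma(t+1)$, which is an assumption made here so that non-integer times are defined, and accumulates it with the trapezoidal rule given in the text. The function names and the choice $t_{\mathrm{max}} = 8$ are hypothetical.

```python
import math


def onset_rate(t, t_max, amplitude=1.0):
    """Ns(t) = Na * t_max**t * exp(-t_max) / t!, with t! taken as gamma(t + 1)."""
    return amplitude * math.exp(t * math.log(t_max) - t_max - math.lgamma(t + 1.0))


def cumulative(times, rates):
    """Trapezoidal running total of the onset rate, one entry per time point."""
    total, out = 0.0, [0.0]
    for k in range(len(times) - 1):
        total += 0.5 * (rates[k] + rates[k + 1]) * (times[k + 1] - times[k])
        out.append(total)
    return out


# Hypothetical run: t_max = 8, sampled every 0.5 time units up to 20.
times = [0.5 * k for k in range(41)]
rates = [onset_rate(t, 8.0) for t in times]
totals = cumulative(times, rates)
```

With these settings the onset rate peaks close to $t_{\mathrm{max}}$ and the running total rises monotonically and flattens at later times, consistent with the saturating behaviour described above.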

When, once overexpression occurs, the protein product of the gene is produced in the cell within a short time and is subsequently degraded enzymatically, the concentration of the protein in the cell decreases according to the apparent rate constant $k_{\mathrm{dig}}$ of the enzymatic degradation reaction. In that case, when the number of cells beginning overexpression, $N_s(t)$, follows the Poisson distribution and can be written
$$N_s(t) = N_a \times \left\{\frac{t_{\mathrm{max}}^{\,t}\, e^{-t_{\mathrm{max}}}}{t!}\right\},$$
the component of increase in protein amount arising from the cells beginning overexpression is
$$\frac{\partial P_{\mathrm{pro}+}(t)}{\partial t} = P_{\mathrm{pro},a} \times \left\{\frac{t_{\mathrm{max}}^{\,t}\, e^{-t_{\mathrm{max}}}}{t!}\right\}.$$
Correspondingly, the total amount $P_{\mathrm{pro}}(t)$ of the protein in the whole cell group can be expressed approximately as
$$P_{\mathrm{pro}}(t) = \int \left[\frac{\partial P_{\mathrm{pro}+}(s)}{\partial s}\, e^{-k_{\mathrm{dig}}(t-s)}\right] ds$$
$$\approx \sum_i \frac{1}{2}\left[\frac{\partial P_{\mathrm{pro}+}(t_i)}{\partial t}\, e^{-k_{\mathrm{dig}}(t-t_i)} + \frac{\partial P_{\mathrm{pro}+}(t_{i+1})}{\partial t}\, e^{-k_{\mathrm{dig}}(t-t_{i+1})}\right](t_{i+1} - t_i).$$
The total amount $P_{\mathrm{pro}}(t)$ of the protein in the whole cell group can then be approximated as a twice-differentiable continuous function that has a maximum between time $t_{\mathrm{max}}$ and time $t_{\mathrm{dig}} \equiv \{1/k_{\mathrm{dig}}\}$.
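The production-with-degradation integral above can be evaluated with the same trapezoidal rule. The sketch below is an illustration only: it reuses a Poisson-type production rate with $t!$ taken as $\Gamma(t+1)$ (an assumption made here for non-integer times), integrates from $s = 0$ to $s = t$, and uses hypothetical values $t_{\mathrm{max}} = 8$ and $k_{\mathrm{dig}} = 0.2$. It does not attempt to verify where the maximum falls relative to $t_{\mathrm{max}}$ and $t_{\mathrm{dig}}$.

```python
import math


def onset_rate(t, t_max, amplitude=1.0):
    """Poisson-type production rate, with t! taken as gamma(t + 1)."""
    return amplitude * math.exp(t * math.log(t_max) - t_max - math.lgamma(t + 1.0))


def protein_total(times, production, k_dig):
    """P(t_n) = integral from 0 to t_n of production(s) * exp(-k_dig * (t_n - s)) ds,
    evaluated with the trapezoidal rule on the sampled time points."""
    out = []
    for n, t in enumerate(times):
        total = 0.0
        for i in range(n):
            left = production[i] * math.exp(-k_dig * (t - times[i]))
            right = production[i + 1] * math.exp(-k_dig * (t - times[i + 1]))
            total += 0.5 * (left + right) * (times[i + 1] - times[i])
        out.append(total)
    return out


# Hypothetical run: production peaking near t = 8, degradation constant 0.2.
times = [0.5 * k for k in range(81)]
production = [onset_rate(t, 8.0) for t in times]
protein = protein_total(times, production, 0.2)
```

With degradation switched on, the computed total rises to a single broad maximum and then decays, whereas setting `k_dig` to zero reproduces the monotonically increasing, saturating curve of the no-degradation case.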

The same holds for the “temporal change in the expression level of a specific gene” in an entire cell population, comprising many cells, that makes up part of a plant, for example a specific organ or tissue such as a leaf. When the number of cells $N_{\mathrm{total}}$ making up the population can be written as the sum of the cell numbers $N_{\mathrm{subgr},i}$ of several subsets of cells of the same type, $N_{\mathrm{total}} = \sum_i N_{\mathrm{subgr},i}$, the “temporal change in the expression level of a specific gene” in each group of $N_{\mathrm{subgr},i}$ cells of the same type can be approximated by the model described above.

That is, in a group of $N_{\mathrm{subgr},i}$ cells of the same type, the number of cells $N_{s,i}(t_i)$ that begin expressing the “specific gene” at time $t_i$ can be written approximately, following the Poisson distribution, as
$$N_{s,i}(t) = N_{a,i} \times \left\{\frac{t_{\mathrm{max},i}^{\,t}\, e^{-t_{\mathrm{max},i}}}{t!}\right\},$$
and the component of increase in protein amount arising from the cells beginning expression is
$$\frac{\partial P_{\mathrm{pro}+,i}(t)}{\partial t} = P_{\mathrm{pro},i\,a} \times \left\{\frac{t_{\mathrm{max},i}^{\,t}\, e^{-t_{\mathrm{max},i}}}{t!}\right\}.$$
Correspondingly, the total amount $P_{\mathrm{pro},i}(t)$ of the protein in the whole cell group of this subset of $N_{\mathrm{subgr},i}$ cells can be expressed approximately as
$$P_{\mathrm{pro},i}(t) = \int \left[\frac{\partial P_{\mathrm{pro}+,i}(s)}{\partial s}\, e^{-k_{\mathrm{dig}}(t-s)}\right] ds$$
$$\approx \sum \frac{1}{2}\left[\frac{\partial P_{\mathrm{pro}+,i}(t_i)}{\partial t}\, e^{-k_{\mathrm{dig}}(t-t_i)} + \frac{\partial P_{\mathrm{pro}+,i}(t_{i+1})}{\partial t}\, e^{-k_{\mathrm{dig}}(t-t_{i+1})}\right](t_{i+1} - t_i),$$
where the subscript on $t_i$ and $t_{i+1}$ indexes the sampling times over which the sum runs, not the subset. The total amount $P_{\mathrm{pro},i}(t)$ of the protein in the whole cell group making up the subset can then be approximated as a twice-differentiable continuous function that has a maximum between time $t_{\mathrm{max},i}$ and time $t_{\mathrm{dig}} \equiv \{1/k_{\mathrm{dig}}\}$.

The target cell population is an assembly of several subsets, each consisting of cells of the same type, and the total amount $P_{\mathrm{pro}}(t)$ of protein expressed from the “specific gene” in the whole population can be written as
$$P_{\mathrm{pro}}(t) = \sum_i P_{\mathrm{pro},i}(t).$$
In this case, within the time course of $P_{\mathrm{pro}}(t)$ for the whole population, the totals $P_{\mathrm{pro},i}(t)$ of several subsets may reach their maxima in the same time window and overlap one another.

Meanwhile, the total amount $P_{\mathrm{pro},i}(t)$ of the protein in the whole cell group making up such a subset can be approximated as a twice-differentiable continuous function that has a maximum between time $t_{\mathrm{max},i}$ and time $t_{\mathrm{dig}} \equiv \{1/k_{\mathrm{dig}}\}$, so the apparent full width at half maximum $\Delta t_{\mathrm{hw}}$ of its peak cannot, at the least, be extremely small compared with $t_{\mathrm{max},i}$ or $t_{\mathrm{dig}}$. Accordingly, in the time course of the total amount $P_{\mathrm{pro}}(t)$ of protein expressed from the “specific gene” in the whole cell population, the apparent full width at half maximum $\Delta t_{\mathrm{hw}}$ of a peak likewise cannot be extremely small compared with $t_{\mathrm{max},i}$ or $t_{\mathrm{dig}}$. When, while observing the total amount of protein expressed from the “specific gene” in the whole target cell population, an “extremely sharp peak” is observed whose full width at half maximum $\Delta t_{\mathrm{hw}}$ is extremely small compared with $t_{\mathrm{max},i}$ or $t_{\mathrm{dig}}$, that “extremely sharp peak” is very likely a “noise peak” caused by the measurement system.
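The reasoning above suggests a simple screening rule: a peak only one sample wide is far narrower than any biologically plausible peak and can be treated as measurement noise. The sketch below flags such single-sample spikes and replaces each with an estimate from the values immediately before and after it. The detection criterion (a point exceeding `ratio` times the mean of its two neighbours) and the default `ratio=3.0` are hypothetical choices made for illustration; the source does not specify a threshold.

```python
def correct_spikes(values, ratio=3.0):
    """Replace single-sample spikes with the mean of their two neighbours.

    A point is flagged when it exceeds ratio times the mean of the original
    values immediately before and after it. Returns the corrected series and
    the list of flagged indices.
    """
    fixed = list(values)
    flagged = []
    for k in range(1, len(values) - 1):
        estimate = 0.5 * (values[k - 1] + values[k + 1])
        if estimate > 0 and values[k] > ratio * estimate:
            fixed[k] = estimate
            flagged.append(k)
    return fixed, flagged


# Hypothetical readings: the isolated 90 is replaced, the broad peak is kept.
print(correct_spikes([10, 12, 90, 13, 11]))
print(correct_spikes([1, 2, 4, 6, 4, 2, 1]))
```

A broad peak that spans several samples passes unchanged because each of its points stays close to the mean of its neighbours.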

Admittedly, the possibility that many cells undergo highly synchronized “expression” cannot be excluded, but such highly synchronized “expression” is limited to genes encoding a very small number of maintenance proteins, such as “heat-shock” proteins, that must respond instantly to a specific “stress”.

Also, when the “cell cycle” proceeds with very high synchrony in many cells (that is, synchronous division occurs), the gene products (proteins) of genes expressed only at particular stages of the “cell cycle” (S phase, G2 phase, M phase) change in concentration with the period of the “cell cycle”. In the cell populations that make up a plant, particularly those of already differentiated organs and tissues, the mitotic index is generally low, in the range of a few percent to 20%. In view of this, the possibility of observing a “steep peak” caused by such synchronous division cannot be excluded, but it is extremely rare.

In the present invention, a form that uses "LucTag line" type transformant plants is selected as the means for detecting the "temporal change in the expression level of a specific gene".

Specifically, for a specific gene present on the genomic DNA of the plant, the region (ORF portion) of a luciferase gene that encodes the luminescent protein (luciferase) is inserted and linked immediately after a sequence that enables ribosome binding (IRES), so that transcription of the pre-mRNA containing the region encoding the amino acid sequence of the protein encoded by that gene continues. The upstream portion of this chimeric gene carries the promoter sequence that drives transcription of the specific gene, so transcription is initiated in the same way by the endogenous factors that induce expression of the specific gene. The resulting mRNA can be translated into the peptide chain of the luminescent protein (luciferase) encoded by the inserted luciferase gene. That is, when translation based on that mRNA takes place and folding into the protein follows, active luciferase is produced in the cell as a reporter protein. Its expression level corresponds to the expression level, in the wild-type plant, of the target protein encoded by the specific gene of interest.

In the specific examples described later, Arabidopsis thaliana transformants are used. They are created by a technique that uses the IRES to insert a DNA fragment of known base sequence, containing the reporter gene, into the chromosomal DNA of Arabidopsis thaliana. Conditions are selected under which insertion of the known DNA fragment occurs randomly at any one of the multiple sites in the chromosomal DNA that have the IRES base sequence. A different transformant is therefore obtained depending on which of those sites has received the known DNA fragment.

In particular, the reporter gene used is a luciferase gene encoding firefly-derived luciferase, which has previously been verified to yield an enzymatically active protein when recombinantly expressed in plants. As verified, for example, in firefly grass, firefly-derived luciferase is produced as an enzymatically active recombinant protein in the cells constituting various organs of the plant, such as roots, stems and leaves. When the substrate of the firefly-derived luciferase is absorbed through the roots of the plant and supplied through the vessels to the cells of each organ and tissue, the substrate luciferin is converted in the cells to oxyluciferin by the enzymatic activity of the recombinantly expressed luciferase, and blue chemiluminescence derived from the oxyluciferin occurs stoichiometrically.

Multiple transformants that differ in insertion site are each self-pollinated, and the T1 seeds of each individual plant are harvested. The T1 seeds are sown, and lines in which recombinant expression of firefly-derived luciferase is confirmed in the grown plants are self-pollinated to collect T2 seeds. In the specific examples, T2 seeds collected from lines showing the phenotype of recombinant firefly luciferase expression are sown, and the plants grown from them are the measurement targets.

The "LucTag line" type transformants of Arabidopsis thaliana are created by applying the technique disclosed in the following reference.

Reference:
The Plant Journal (2003) 35, 273-283
Gene trapping of the Arabidopsis genome with a firefly luciferase reporter
Yoshiharu Y. Yamamoto, Yumi Tsuhara, Kazuhito Gohda, Kumiko Suzuki and Minami Matsui

In "LucTag line" type transformant plants created by the above technique, the inserted DNA fragment itself has no promoter sequence, so the "reporter gene" inserted downstream is transcribed together with the expression of a gene whose expression is endogenously induced in the plant. Moreover, if the "reporter gene" has been inserted at two or more sites in the chromosomal DNA, both "reporter genes" may be transcribed. So that the amount of luciferase protein recombinantly expressed upon transcription of the luciferase "reporter gene" reflects the expression level of one specific gene, transformants in which the "reporter gene" is inserted at only one site in the chromosomal DNA are selected as the "LucTag line" type transformants used in the present invention.

That is, at the line-establishment stage, chromosomal DNA is collected from the plants grown from the sown T1 seeds, and it is confirmed that they are "heterozygous", with an (a*·a) chromosome configuration consisting of the chromosome a* in which the "reporter gene" has been inserted and the homologous chromosome a without the insertion. "LucTag line" type transformed plants for which this heterozygosity has been confirmed in addition to the phenotype are self-pollinated to collect T2 seeds. The T2 seeds collected from such a heterozygous transformed plant are a mixture of seeds of different genotypes: "homozygous" seeds with the (a*·a*) configuration, "heterozygous" seeds with the (a*·a) or (a·a*) configuration, and "wild type" seeds with the (a·a) configuration.

During self-pollination, genetic recombination between homologous chromosomes results in crossing over between gene linkage groups, so that among T2 seeds even the "heterozygous" seeds with the (a*·a) or (a·a*) configuration differ in "genotype". The frequency of expression may then differ because of the makeup of the linkage group on the chromosome carrying the "target gene" into which the "reporter gene" has been inserted. Alternatively, when the "target gene" and its allele are expressed in parallel, the ratio of their expression frequencies may differ because of the makeup of the linkage group.

To explain this more concretely: when the linkage-group configuration of the chromosomes of the T1 seed is (A₁A₂*, a₁a₂), at least two configurations, (A₁A₂*, a₁a₂) and (A₁a₂, a₁A₂*), coexist among the "heterozygous" T2 seeds. When the "target gene" A₂* and its allele a₂ are expressed in parallel, it is conceivable that in the (A₁A₂*, a₁a₂) configuration the expression frequency is "target gene" A₂* > allele a₂, whereas in the (A₁a₂, a₁A₂*) configuration it is "target gene" A₂* < allele a₂.

Likewise, among the "homozygous" T2 seeds, at least two linkage-group configurations, (A₁A₂*, A₁A₂*) and (A₁A₂*, a₁A₂*), coexist. Between these two as well, the expression frequency of the "target gene" A₂* may differ slightly.

Therefore, even within a single "LucTag line" derived from one T1 seed, closer inspection may show that plants grown from different T2 seeds differ, between the "homozygous" and "heterozygous" genotypes, in the temporal change of the amount of the "reporter gene" product, the recombinantly expressed luciferase protein, that accompanies expression of the specific gene. Furthermore, even within the plant groups broadly classified as "homozygous" or "heterozygous", microscopic differences reflecting the "genotype" may be found.

When the linkage-group configuration of the chromosomes in the wild-type plant used to create the "LucTag line" is (A₁A₂, a₁a₂) and the luciferase "reporter gene" is inserted into the allele "A₂" on one of the homologous chromosomes, the transformed plant has the genotype (A₁A₂*, a₁a₂). The T1 seeds collected after self-pollinating this transformed plant may include the following combinations: "homozygous" (A₁A₂*, A₁A₂*), (A₁A₂*, a₁A₂*), (a₁A₂*, a₁A₂*); "heterozygous" (A₁A₂*, a₁a₂), (A₁A₂*, A₁a₂), (a₁A₂*, A₁a₂), (a₁A₂*, a₁a₂); "wild type" (A₁a₂, A₁a₂), (A₁a₂, a₁a₂), (a₁a₂, a₁a₂). Of the plants subsequently grown from the T1 seeds, those able to produce the "reporter gene" product, the recombinantly expressed luciferase protein, are the "homozygous" (A₁A₂*, A₁A₂*), (A₁A₂*, a₁A₂*), (a₁A₂*, a₁A₂*) and the "heterozygous" (A₁A₂*, a₁a₂), (A₁A₂*, A₁a₂), (a₁A₂*, A₁a₂), (a₁A₂*, a₁a₂).

Next, when the linkage-group configuration of the chromosomes of the T1 seed is the "heterozygous" (A₁A₂*, a₁a₂), the T2 seeds collected by self-pollinating the "LucTag line" plant grown from that T1 seed may include the following combinations: "homozygous" (A₁A₂*, A₁A₂*), (A₁A₂*, a₁A₂*), (a₁A₂*, a₁A₂*); "heterozygous" (A₁A₂*, a₁a₂), (A₁A₂*, A₁a₂), (a₁A₂*, A₁a₂), (a₁A₂*, a₁a₂); "wild type" (A₁a₂, A₁a₂), (A₁a₂, a₁a₂), (a₁a₂, a₁a₂). Likewise, when the linkage-group configuration of the chromosomes of the T1 seed is the "heterozygous" (a₁A₂*, A₁a₂), the T2 seeds collected by self-pollinating the "LucTag line" plant grown from that T1 seed may include the same combinations: "homozygous" (A₁A₂*, A₁A₂*), (A₁A₂*, a₁A₂*), (a₁A₂*, a₁A₂*); "heterozygous" (A₁A₂*, a₁a₂), (A₁A₂*, A₁a₂), (a₁A₂*, A₁a₂), (a₁A₂*, a₁a₂); "wild type" (A₁a₂, A₁a₂), (A₁a₂, a₁a₂), (a₁a₂, a₁a₂).

On the other hand, when the linkage-group configuration of the chromosomes of the T1 seed is the "heterozygous" (A₁A₂*, A₁a₂), the T2 seeds collected by self-pollinating the "LucTag line" plant grown from that T1 seed may include the following combinations: "homozygous" (A₁A₂*, A₁A₂*); "heterozygous" (A₁A₂*, A₁a₂); "wild type" (A₁a₂, A₁a₂). Similarly, when the linkage-group configuration of the chromosomes of the T1 seed is the "heterozygous" (A₁A₂*, a₁A₂), the T2 seeds collected by self-pollinating the "LucTag line" plant grown from that T1 seed may include the following combinations: "homozygous" (A₁A₂*, A₁A₂*); "heterozygous" (A₁A₂*, a₁A₂); "wild type" (a₁A₂, a₁A₂).

Thus, even among lines broadly classified as "heterozygous", the composition of the collected T2 seeds differs greatly depending on the linkage-group configuration of the chromosomes of the T1 seed from which the line is established. To verify whether differences exist among "LucTag lines" broadly classified as "heterozygous", the number of plants grown from T2 seeds and subjected to measurement must, for each line, exceed the number of combinations listed above.
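The genotype lists above can be reproduced by enumerating gamete pairs. The following is a minimal sketch, not part of the disclosed method: it assumes two linked loci, that crossing over between them can occur, and that the insertion-carrying allele is written "A2*". All function and variable names are hypothetical. It counts combinations only; it does not give Mendelian frequencies, because linkage makes the gametes non-equiprobable.

```python
from collections import Counter
from itertools import product

REPORTER = "A2*"  # assumed label for the allele carrying the luciferase insertion


def gametes(parent):
    """Haplotypes a parent can transmit, assuming crossing over between the two loci."""
    (l1a, l2a), (l1b, l2b) = parent
    return sorted({(x, y) for x in (l1a, l1b) for y in (l2a, l2b)})


def zygosity(genotype):
    """Classify a genotype by the number of reporter-carrying haplotypes (0, 1 or 2)."""
    copies = sum(hap[1] == REPORTER for hap in genotype)
    return ("wild type", "heterozygous", "homozygous")[copies]


def t2_genotypes(parent):
    """Count ordered gamete pairs for each distinct (unordered) offspring genotype."""
    pairs = product(gametes(parent), repeat=2)
    return Counter(tuple(sorted(pair)) for pair in pairs)


parent = (("A1", "A2*"), ("a1", "a2"))      # heterozygous T1 plant (A1A2*, a1a2)
table = t2_genotypes(parent)
by_class = Counter(zygosity(g) for g in table)
for genotype, n_pairs in sorted(table.items()):
    print(zygosity(genotype), genotype, n_pairs)
```

For the parent (A₁A₂*, a₁a₂) this yields 16 ordered gamete pairs and ten distinct genotypes (three homozygous, four heterozygous, three wild type), in agreement with the lists given in the text; for the parent (A₁A₂*, A₁a₂) it yields three.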

Taking this into account, in the specific examples below, for each "LucTag line" plant grown from a T1 seed, at least 2⁴ (16) of the T2 seeds collected after self-pollination are sown as one set. The plants grown from them are measured for the presence or absence of the "reporter gene" product that accompanies expression of the specific gene, namely the recombinantly expressed luciferase protein, and for the temporal change in the amount of that protein.

If, in any line, the number of "wild type" individuals among the plants grown from T2 seeds is larger than expected, it is preferable to add a further set of 2⁴ (16). That is, even under the same experimental conditions, when the "expression of the specific gene" in each individual plant follows a "Poisson distribution" type random variable as described above, the time at which a peak appears or the height of that peak, for example, may vary. The number of individuals producing the "reporter gene" product, the recombinantly expressed luciferase, must therefore be kept at or above a certain number.

Each plant grown from a T2 seed in each line to be measured is given an ID that identifies the individual. The ID has the form "line name"-"branch number" and is formed by joining the identifier of the "line" name assigned to the T1 seed of origin with the identifier (branch number) assigned to that T2 seed. The "line" name identifier assigned to a T1 seed is in turn written in the form "specific gene" name-"branch number", formed by joining an identifier that specifies the site where "LucTag" is inserted, that is, the "specific gene", with the identifier assigned to that T1 seed.
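The two-level naming scheme can be illustrated as follows. This is only a sketch under assumptions: the ASCII hyphen as separator, integer branch numbers, and the gene identifier "geneX" are all hypothetical, since the text does not fix the concrete notation.

```python
def line_name(gene_id: str, t1_branch: int) -> str:
    """'specific gene' name joined with the branch number of the T1 seed."""
    return f"{gene_id}-{t1_branch}"


def plant_id(line: str, t2_branch: int) -> str:
    """'line' name joined with the branch number of the T2 seed."""
    return f"{line}-{t2_branch}"


def split_plant_id(pid: str):
    """Recover (line name, T2 branch number); splits on the last separator only."""
    line, branch = pid.rsplit("-", 1)
    return line, int(branch)


pid = plant_id(line_name("geneX", 3), 12)   # hypothetical gene identifier
print(pid, split_plant_id(pid))
```

Splitting on the last separator keeps the line name intact even though it contains a separator itself.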

For each T1 seed, the presence or absence of the target "reporter gene" product, the recombinantly expressed luciferase protein, has been checked at the stage when the seed has been sown and has grown into a plant, so the "phenotype" (presence or absence of the recombinantly expressed luciferase protein) is already known. Naturally, a "line" for which the presence of recombinantly expressed luciferase protein is not confirmed at this stage in the plant grown from the T1 seed is judged not to be a "LucTag line" type transformed plant.

(2) Measurement of the temporal change in "luciferase enzyme activity" in the individual "LucTag line" type transformed plants to be measured
For each line of "LucTag line" type transformed plants, 2⁴ (16) T2 seeds form one set. One seed is sown in each well of a 96-well plate, and after germination is confirmed the "luciferase enzyme activity" of each seedling is measured at predetermined time intervals. Specifically, the seeds are germinated on an aqueous medium containing the substrate luciferin at a predetermined concentration. Luciferin absorbed through the roots is enzymatically converted to oxyluciferin by the luciferase recombinantly expressed in the cell populations constituting the aerial organs of the seedling, and the intensity of the blue chemiluminescence derived from that oxyluciferin is measured at the predetermined time intervals.

The spectrum of the blue chemiluminescence derived from oxyluciferin is known, and the chemiluminescence intensity is measured in a narrow wavelength band that contains the peak wavelength. Because the amount of recombinantly expressed luciferase protein is proportional to the expression level of the "specific gene" of interest, it remains low during periods when expression of that gene is low, and the observed intensity of the blue chemiluminescence derived from oxyluciferin is correspondingly low in those periods. To achieve sufficient sensitivity even for such extremely weak chemiluminescence, the photon counting method is used. In the photon counting method, the photons received by the detector are counted one by one, and the time required from the start of measurement until a predetermined number of photons has been counted is measured. The time $\Delta t_{\mathrm{obs}}$ needed to reach the predetermined number of photons, that is, the threshold photon number $N_{\mathrm{p\text{-}th}}$, can be approximated using the received light intensity, that is, the number of photons received by the detector per unit time, $dN_{\mathrm{photon}}/dt$, as

$$N_{\mathrm{p\text{-}th}} \approx \left\{\frac{dN_{\mathrm{photon}}}{dt}\right\} \times \Delta t_{\mathrm{obs}}$$

The received light intensity $dN_{\mathrm{photon}}/dt$ is therefore expressed as

$$\frac{dN_{\mathrm{photon}}}{dt} \approx \frac{N_{\mathrm{p\text{-}th}}}{\Delta t_{\mathrm{obs}}}$$

However, when the received light intensity, that is, the number of photons received by the detector per unit time, $dN_{\mathrm{photon}}/dt$, is substantially "0", the time $\Delta t_{\mathrm{obs}}$ needed to reach the threshold photon number $N_{\mathrm{p\text{-}th}}$ becomes "∞". In practice, therefore, when the number of photons $N_{\mathrm{p\text{-}gate}}$ received by the detector during a predetermined time width $\Delta t_{\mathrm{gate}}$ does not reach the threshold photon number $N_{\mathrm{p\text{-}th}}$, the received light intensity $dN_{\mathrm{photon}}/dt$ is approximated as

$$\frac{dN_{\mathrm{photon}}}{dt} \approx \frac{N_{\mathrm{p\text{-}gate}}}{\Delta t_{\mathrm{gate}}}$$
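The two approximations can be combined into a single estimator. The sketch below is illustrative only; the function name, argument names, and the convention of passing `None` when the threshold was never reached are assumptions, not part of the described apparatus.

```python
from typing import Optional


def estimate_intensity(n_threshold: int,
                       dt_obs: Optional[float],
                       n_gate: int,
                       dt_gate: float) -> float:
    """Estimate dN_photon/dt (photons per unit time).

    dt_obs  : time taken to count n_threshold photons, or None if the
              threshold was not reached within the gate (assumed convention).
    n_gate  : photons actually counted during the gate width dt_gate.
    """
    if dt_obs is not None and dt_obs <= dt_gate:
        # Threshold reached: intensity ~ N_p-th / dt_obs
        return n_threshold / dt_obs
    # Threshold not reached within the gate: intensity ~ N_p-gate / dt_gate
    return n_gate / dt_gate


bright = estimate_intensity(1000, dt_obs=2.0, n_gate=1000, dt_gate=10.0)
dim = estimate_intensity(1000, dt_obs=None, n_gate=30, dt_gate=10.0)
dark = estimate_intensity(1000, dt_obs=None, n_gate=0, dt_gate=10.0)
print(bright, dim, dark)
```

The gate fallback keeps the estimate finite when no photons arrive, which is the situation in which the first formula would require an infinite waiting time.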

An actual photon-counting measurement system works with preset, very short time slices $\delta t_k$. For each slice it determines whether the accumulated value $\sum N_{\mathrm{p\text{-}obs}}(\delta t_k)$ of the photon counts $N_{\mathrm{p\text{-}obs}}(\delta t_k)$ received per slice exceeds the threshold photon number $N_{\mathrm{p\text{-}th}}$, and at the moment it does, sets $\Delta t_{\mathrm{obs}} = \sum \delta t_k$. The sensitivity of the detector itself is set extremely high, so that it can count with high accuracy even when only a few photons are received within one slice $\delta t_k$. Specifically, the sensitivity is often set such that the "measurable upper limit" is exceeded when light exceeding about 1/10 of the threshold photon number $N_{\mathrm{p\text{-}th}}$ is incident within a single slice $\delta t_k$.
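The slice-by-slice accumulation can be sketched as a simple loop. This is an illustration under assumptions, not the instrument's firmware: slices are taken to have equal width, the per-slice limit is modelled as exactly 1/10 of the threshold, and the status strings and names are hypothetical.

```python
def accumulate(slice_counts, slice_width, n_threshold, limit_fraction=0.1):
    """Accumulate per-slice photon counts until the threshold is exceeded.

    Returns (status, elapsed_time), where status is one of:
      "ok"              accumulated count exceeded n_threshold
      "saturated"       one slice exceeded the per-slice measurable limit
      "below_threshold" all slices consumed without reaching the threshold
    """
    per_slice_limit = limit_fraction * n_threshold
    total = 0
    for k, n in enumerate(slice_counts, start=1):
        if n > per_slice_limit:
            # A single slice above the measurable upper limit (e.g. stray light)
            return "saturated", k * slice_width
        total += n
        if total > n_threshold:
            return "ok", k * slice_width      # dt_obs = sum of slice widths
    return "below_threshold", len(slice_counts) * slice_width


normal = accumulate([10] * 11, slice_width=0.5, n_threshold=100)
spike = accumulate([5, 60, 5], slice_width=0.5, n_threshold=100)
weak = accumulate([1, 2, 3], slice_width=0.5, n_threshold=100)
print(normal, spike, weak)
```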

Suppose that, while the chemiluminescence intensity is being measured, far more photons than the chemiluminescence that should be observed enter the detector for a very short time because of "pulsed stray light". That very short slice $\delta t_k$ then exceeds the "measurable upper limit". Once the detector has exceeded the "measurable upper limit", it stops measuring and enters a mode in which the temporary damage to the photosensitive surface caused by the "light incidence beyond the limit" is allowed to recover. Although the actual measurement has not been completed, the detector reports, as a "provisional measurement result", a light intensity at the "measurable upper limit" set as the measurement sensitivity. In practice, the value output as the "provisional measurement result" is $N_{\mathrm{p\text{-}obs.current}}(\delta t_k)/\delta t_k$: the photon number $N_{\mathrm{p\text{-}obs.current}}(\delta t_k)$ actually counted at the moment the "measurable upper limit" photon number $N_{\mathrm{p\text{-}obs.LIMIT}}$ was exceeded within the slice $\delta t_k$ in which the "pulsed stray light" entered, divided by that slice width $\delta t_k$. Naturally, the "provisional measurement result" exceeds the "measurable upper limit" intensity $N_{\mathrm{p\text{-}obs.LIMIT}}/\delta t_k$.

When analysis according to the present invention is performed, "erroneous measurement results" caused by "measurement errors" of the detector system, such as the incidence of "pulsed stray light" described above, are removed beforehand. The "measurement result" that would have been observed at that time point ($t_i$) is then estimated, and "measurement-error-corrected data" showing the temporal change in "luciferase enzyme activity", complemented with these estimates, are used.

For each plant (sample) $S_{\mathrm{Plant}}(m)$ to be measured, the "original data" recording the temporal change in "luciferase enzyme activity" are obtained by starting measurement at a certain time ($t_{0\text{-}m}$) and thereafter, at the target time interval $\Delta t_{\mathrm{interval}}$, successively measuring the value $P_{\mathrm{lum.obs.}\text{-}m}(t_{i\text{-}m})$ of the intensity of the blue chemiluminescence derived from oxyluciferin, which is the index of "luciferase enzyme activity" at that time point ($t_{i\text{-}m}$). That is, the data take the form of a time-series record of pairs $(t_{i\text{-}m},\ P_{\mathrm{lum.obs.}\text{-}m}(t_{i\text{-}m}))$ consisting of the time at which the measurement was actually made and the measured chemiluminescence intensity. The actual measurement time ($t_{i\text{-}m}$) is given in the format HH:MM:SS [AM/PM].

(3) Processing of the measurement data on the temporal change in "luciferase enzyme activity" of each plant
(3-1) Conversion from clock-time notation to elapsed-time notation ("elapsed time conversion" processing)
Each plant (sample) $S_{\mathrm{Plant}}(m)$ to be measured germinates after sowing, and the temporal change in its "luciferase enzyme activity" is measured under predetermined growth conditions. For each "LucTag line", the 2⁴ (16) T2 seeds of one set share the same experiment start time. Taking this start time ($t_{\mathrm{start}\text{-}m}$) as the reference, the elapsed time up to the time ($t_{i\text{-}m}$) of each actual measurement, $t_m[i] = (t_{i\text{-}m} - t_{\mathrm{start}\text{-}m})$, is calculated, and the clock-time time-series data $(t_{i\text{-}m},\ P_{\mathrm{lum.obs.}\text{-}m}(t_{i\text{-}m}))$ are converted to elapsed-time time-series data $(t_m[i],\ P_{\mathrm{lum.obs.}\text{-}m}(t_m[i]))$.
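The elapsed-time conversion can be sketched as follows. Because the recorded times carry no date, this sketch assumes that the measurements are in chronological order and that consecutive measurements are less than 24 hours apart, so that a backwards jump in clock time means midnight was crossed. The exact string layout ("HH:MM:SS AM" with a space), the unit (hours), and all names are assumptions.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%I:%M:%S %p"  # assumed layout of "HH:MM:SS [AM/PM]"


def to_elapsed_hours(start, stamps, fmt=TIME_FORMAT):
    """Convert clock-time stamps to hours elapsed since the experiment start."""
    t0 = datetime.strptime(start, fmt)
    previous = t0
    day_offset = timedelta(0)
    elapsed = []
    for stamp in stamps:
        t = datetime.strptime(stamp, fmt) + day_offset
        while t < previous:
            # Clock time went backwards: a midnight boundary was crossed
            t += timedelta(days=1)
            day_offset += timedelta(days=1)
        elapsed.append((t - t0).total_seconds() / 3600.0)
        previous = t
    return elapsed


hours = to_elapsed_hours("09:00:00 AM",
                         ["09:30:00 AM", "11:30:00 PM", "01:00:00 AM"])
print(hours)
```

If the measuring software can also record the date, parsing full timestamps would remove the need for the midnight heuristic.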

(3-2) "Measurement error correction" processing that removes "erroneous measurement results" caused by "measurement errors" of the detector system and complements them with "estimated values"
As explained above, the temporal change in "luciferase enzyme activity" is expected to be expressible, in principle, by a twice-differentiable continuous function $P_{\mathrm{lum.}\text{-}m}(t_m)$. In particular, its peaks do not show a "pulse-like" abrupt increase and decrease.

By contrast, an "erroneous measurement result" caused by a "measurement error" of the detector system has a value at or above the "measurable upper limit" of the detector system, and shows a "pulse-like" abrupt increase and decrease compared with the "error-free measurement results" at the measurement times before and after it.

As shown in graph [A2] illustrated in FIG. 4, an "erroneous measurement result" caused by a "measurement error" of the detector system produces a "pulse-like" peak of the kind often seen in so-called "spike noise". Provided that the measurement interval $\Delta t_{\mathrm{interval}}$ is set sufficiently shorter than at least $t_{\mathrm{dig}} \equiv 1/k_{\mathrm{dig}}$, the reciprocal of the rate constant of the process by which recombinantly expressed "luciferase" is inactivated or degraded, a "pulse-like" peak arising with a time constant much shorter than $t_{\mathrm{dig}}$ can be judged to be an "erroneous measurement result" caused by a "measurement error" of the detector system.

The following conditions can be used as criteria for judging whether the "luciferase enzyme activity" datum $P_{\mathrm{lum.obs.}\text{-}m}(t_m[i])$ at a given time $t_m[i]$ is an "erroneous measurement result" caused by a "measurement error" of the detector system.

Specifically, the "measurable upper limit" light intensity described above, $P_{\mathrm{lum\text{-}LIMIT}}$ ($\equiv N_{\mathrm{p\text{-}obs.LIMIT}}/\delta t_k$), is taken as the reference. In situations where the light intensity is changing gently at levels of 1/10 of that limit or less, peaks that show a "pulse-like" abrupt increase and decrease are singled out and removed as measurement errors.

Condition (1-0): The apex of a "peak" lies at time $t_m[i]$.

$$\{P_{\mathrm{lum.obs.}\text{-}m}(t_m[i]) - P_{\mathrm{lum.obs.}\text{-}m}(t_m[i-1])\} \times \{P_{\mathrm{lum.obs.}\text{-}m}(t_m[i+1]) - P_{\mathrm{lum.obs.}\text{-}m}(t_m[i])\} < 0$$

Condition (1-1): The light intensity is estimated to be changing gently.

$$|P_{\mathrm{lum.obs.}\text{-}m}(t_m[i+1]) - P_{\mathrm{lum.obs.}\text{-}m}(t_m[i-1])| < \{(P_{\mathrm{lum.obs.}\text{-}m}(t_m[i+1]))^{1/2} + (P_{\mathrm{lum.obs.}\text{-}m}(t_m[i-1]))^{1/2}\}$$

In graph [A2] illustrated in FIG. 4, this condition is approximated as

$$|P_{\mathrm{lum.obs.}\text{-}m}(t_m[i+1]) - P_{\mathrm{lum.obs.}\text{-}m}(t_m[i-1])| < 20$$

Condition (1-2): While the light intensity is changing gently, the value is presumed to exceed the range of “statistically acceptable” variance.

|P_{lum.obs.-m}(t_m[i]) − 1/2·{P_{lum.obs.-m}(t_m[i+1]) + P_{lum.obs.-m}(t_m[i−1])}| < 1/2·{P_{lum.obs.-m}(t_m[i+1]) + P_{lum.obs.-m}(t_m[i−1])}

If the measured value P_{lum.obs.-m}(t_m[i]) is taken to follow a Poisson distribution with mean {P_{lum.-m}(t_m[i])}, and that mean is assumed to equal the average of the preceding and following measurements, the allowable difference between the measured value and the mean is no more than the mean {P_{lum.-m}(t_m[i])}.

In graph [A2] of FIG. 4, this condition is approximated as

|P_{lum.obs.-m}(t_m[i]) − 1/2·{P_{lum.obs.-m}(t_m[i+1]) + P_{lum.obs.-m}(t_m[i−1])}| < |Div._{max}|,

with |Div._{max}| = 100 selected.

When conditions (1-0) to (1-2) above are satisfied, the measurement-error “peak” P_{lum.obs.-m}(t_m[i]) at time t_m[i] is replaced by the value 1/2·{P_{lum.obs.-m}(t_m[i+1]) + P_{lum.obs.-m}(t_m[i−1])}, giving the “measurement-error-corrected” time-series data (t_m[i], P_{lum.corrected.-m}(t_m[i])).
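The correction step can be illustrated with a minimal sketch. It applies conditions (1-0) to (1-2) with the approximate thresholds quoted for graph [A2] (20 and 100) and, where all three hold, substitutes the mean of the two neighbouring values. The function and parameter names are hypothetical, and condition (1-2) is coded with the same “<” that the text prints; a reader who interprets it as an exceedance test would flip that comparison.

```python
def correct_spikes(p, neighbour_tol=20.0, div_max=100.0):
    """Return a copy of p in which flagged points are replaced by the neighbour mean.

    p             : observed intensities P_lum.obs.-m(t_m[i]) at consecutive times
    neighbour_tol : threshold of condition (1-1), 20 in the graph [A2] example
    div_max       : |Div._max| of condition (1-2), 100 in the graph [A2] example
    (Both parameter names are illustrative, not taken from the source.)
    """
    corrected = list(p)
    for i in range(1, len(p) - 1):
        prev_v, cur, next_v = p[i - 1], p[i], p[i + 1]
        neighbour_mean = 0.5 * (next_v + prev_v)
        # Condition (1-0): the slope changes sign at i, so i is the apex of a peak.
        is_apex = (cur - prev_v) * (next_v - cur) < 0
        # Condition (1-1): the two neighbours are close, i.e. the trend is gentle.
        is_gentle = abs(next_v - prev_v) < neighbour_tol
        # Condition (1-2): inequality taken literally as printed in the text.
        within_div = abs(cur - neighbour_mean) < div_max
        if is_apex and is_gentle and within_div:
            corrected[i] = neighbour_mean
    return corrected


print(correct_spikes([50, 52, 120, 51, 49]))
```

The conditions are evaluated on the original values and written to a copy, so a correction at one index does not influence the test at the next.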

After this “measurement error correction”, the resulting corrected “time-series data” are, as shown in graph [A3] of FIG. 5, free of the “erroneous measurement results” caused by “measurement errors” in the photodetector system. What remains is the gently time-varying “luciferase enzyme activity” waveform with ordinary “measurement scatter”, within the statistically acceptable range, superimposed on it.

(4) Determination of the presence or absence of expression of the “luciferase gene” serving as the “reporter gene”, based on the temporal change in the “luciferase enzyme activity” of each plant
As shown in graph [A3] of FIG. 5, in the “line” under study the “luciferase enzyme activity” observed over the whole cell population of the measured part of each plant is presumed to change gently, with both its rise and its fall taking 2 hours or more.

With this in mind, “measurement-error-corrected” time-series data (a subset) giving measurements every 2 hours are created from the “measurement-error-corrected” time-series data measured every hour, such as those shown in graph [A3].

That is, the “measurement-error-corrected” time-series data [A3] shown in FIG. 2, or the “measurement error correction”-processed time-series data [B2] shown in FIG. 3, are created, and the presence or absence of expression of the “luciferase gene” serving as the “reporter gene” is determined under the following conditions.

In addition to the “pulse-like stray light” described above, the measurement system also receives a weak but continuously incident “background”-type “stray light component”. Expression of the “luciferase gene” serving as the “reporter gene” is therefore determined by judging whether a clear light-intensity level, reflecting the temporal change in the chemiluminescence that indicates “luciferase enzyme activity”, is sustained for at least a certain period.

Specifically, since the “luciferase enzyme activity” observed over the whole cell population of the measured part of each plant is presumed to change gently, with both its rise and its fall taking 2 hours or more, the “luciferase gene” serving as the “reporter gene” is judged to be expressed when, in the “measurement-error-corrected” time-series data (subset) giving measurements every 2 hours, the light intensity P_{lum.corrected.-m}(t_m[i]) exceeds a certain level at three consecutive times. For the “measurement error correction”-processed time-series data [B2], the criterion for “light intensity attributable to chemiluminescence is significantly measured” is chosen as follows: the times t_m[i] satisfying
P_{lum.corrected.-m}(t_m[i]) ≥ 30
cover three consecutive time points, that is, span 4 hours.

The intensity of the continuously incident “background”-type “stray light component” is at the same level as the light intensity observed, under the same conditions, in a “wild-type” plant into which the “luciferase gene” serving as the “reporter gene” has not been introduced. The lower limit of the significance level above is preferably set to at least 2 to 3 times the intensity of this “background”-type “stray light component”.

When the “line” is derived from a T1 seed showing a “heterozygous” genotype with the chromosome constitution (a^*·a) or (a·a^*), a plant grown from a T2 seed that fails the criterion “light intensity attributable to chemiluminescence is significantly measured” is normally regarded as “wild type” with the chromosome constitution (a, a).

In practice, the whole cell population of the measured part of each plant can be written as the sum of the cell counts N_{subgr-i} of several subsets, N_{total} = ΣN_{subgr-i}. When the subsets are numerous, each cell count N_{subgr-i} is small, and the timing of expression is completely dispersed among them, the numerical integral of the measured light intensity P_{lum.corrected.-m}(t_m[i]) over a long period,
∫P_{lum.corrected.-m}(t)dt
≈ Σ 1/2·{P_{lum.corrected.-m}(t_m[i+1]) + P_{lum.corrected.-m}(t_m[i])}·(t_m[i+1] − t_m[i]),
may exceed a certain level as a whole even though the criterion above is not met at the individual measurement times t_m[i].

Specifically, suppose the measured light intensities P_{lum.corrected.-m}(t_m[i]) at four consecutive measurement times are, for example, 29, 45, 45, 29, that is, effectively 30 or more, and such four-point windows are scattered over a long period, so that the criterion above is “formally” not satisfied. Even then, it is not regarded that “light intensity attributable to chemiluminescence is significantly measured”. That is, the plant is judged to show no expression of the “specific gene” into which the “luciferase gene” serving as the “reporter gene” is inserted.

Conversely, suppose that at only one place over a long period the measured light intensities P_{lum.corrected.-m}(t_m[i]) at three consecutive measurement times are 31, 30, 31, while the five consecutive measurement times including the points before and after read 28, 31, 30, 31, 27, so that it is unclear whether the intensity is effectively 30 or more. Because the criterion above is “formally” satisfied, it is regarded that “light intensity attributable to chemiluminescence is significantly measured”. That is, the plant is judged to show expression of the “specific gene” into which the “luciferase gene” serving as the “reporter gene” is inserted.
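The formal criterion and the two numerical cases above can be checked with a short sketch. It assumes an hourly corrected series thinned to every second point and a level of 30 held at three consecutive points; the function names and default arguments are illustrative only.

```python
def two_hourly_subset(hourly_values):
    """Keep every second point of an hourly series (0 h, 2 h, 4 h, ...)."""
    return hourly_values[::2]


def is_expressed(subset_values, level=30.0, run_length=3):
    """Return True if `run_length` consecutive values are all >= `level`.

    With 2-hourly data and run_length=3 the run spans 4 hours,
    as in the criterion stated for the time-series data [B2].
    """
    run = 0
    for value in subset_values:
        run = run + 1 if value >= level else 0
        if run >= run_length:
            return True
    return False


# The two cases discussed in the text (values already on the 2-hourly grid).
print(is_expressed([29, 45, 45, 29]))      # only two consecutive points reach 30
print(is_expressed([28, 31, 30, 31, 27]))  # three consecutive points reach 30
```

The test is deliberately formal: it looks only at the length of the run above the level, not at the integrated intensity.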

For each measured plant (sample) of the “line” under study, the result (phenotype) of the “gene expression” presence/absence determination made under the criterion above is stored in the line table 31.

In the branching based on “presence or absence of expression”, for each “line” under study the plants (samples) S_{Plant}(m) to be measured form a set of 2⁴ (16) grown from its T2 seeds. The “gene expression” determination (phenotype) is obtained for all of them, and if at least one plant (sample) S_{Plant}(m) is judged to show “gene expression”, the analysis operations “model waveform formation” and “line feature characterization” are carried out.

Conversely, if the “gene expression” determination (phenotype) is obtained for all 2⁴ (16) plants (samples) S_{Plant}(m) of the “line” under study and every one of them is judged to show no “gene expression”, the analysis operations “model waveform formation” and “line feature characterization” are not carried out.

Accordingly, when every plant (sample) measured in the “line” under study is one whose “luciferase enzyme activity”, taken over the whole cell population of the measured part, shows no clearly significant peak, identifying peaks and extracting features is difficult. This criterion makes it possible to exclude lines that are poor in this kind of “distinctive feature”.

Setting this criterion makes it possible, when the whole cell population of the measured part of each plant can be written as the sum of the cell counts N_{subgr-i} of several subsets, N_{total} = ΣN_{subgr-i}, to select plants in which at least one of the subsets exhibits “luciferase enzyme activity” with a clearly significant peak in a particular time window. Judging similarity and commonality on the basis of such characteristic peaks increases the accuracy of the “cluster analysis” described below.

(5) “Cluster analysis” by similarity of the expression pattern of the “luciferase gene” serving as the “reporter gene”, based on the temporal change in the “luciferase enzyme activity” of each plant judged to show “gene expression” within the “line” under study

Using the “measurement-error-corrected” time-series data (t_m[i], P_{lum.corrected.-m}(t_m[i])) as the temporal change in “luciferase enzyme activity” of each plant judged to show “gene expression” within the “line” under study, the plants (samples) S_{Plant}(m) in the line are classified into a plurality of “clusters” according to the similarity of their temporal changes in “luciferase enzyme activity”.

Here, the total cell count N_{total} of the observed cell population differs from plant (sample) S_{Plant}(m) to plant, and when the “specific gene” and its allele are expressed in parallel within individual cells, the ratio of their expression frequencies is not known. In view of this, using the “measurement-error-corrected” time-series data (t_m[i], P_{lum.corrected.-m}(t_m[i])) over a preselected, desired “long period”, the time integral of the measured light intensity P_{lum.corrected.-m}(t_m[i]) is calculated as the numerical integral
∫P_{lum.corrected.-m}(t)dt
≈ Σ 1/2·{P_{lum.corrected.-m}(t_m[i+1]) + P_{lum.corrected.-m}(t_m[i])}·(t_m[i+1] − t_m[i]).
The measured light intensity P_{lum.corrected.-m}(t_m[i]) divided by this time integral is an index of the normalized temporal change in “luciferase enzyme activity”. That is, it indicates which time windows within this “long” measurement period show relatively high “luciferase enzyme activity”, in other words, in which time windows the number of cells expressing the “specific gene” is relatively high.
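As a minimal sketch of this normalization, the trapezoidal sum above can be computed directly and each intensity divided by it. The function names are hypothetical; the times and intensities are assumed to be plain Python sequences of equal length.

```python
def trapezoid_integral(times, values):
    """Trapezoidal approximation of the time integral of `values` over `times`."""
    return sum(
        0.5 * (values[i + 1] + values[i]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def normalise_by_integral(times, values):
    """Divide each intensity by the time integral of the whole series."""
    area = trapezoid_integral(times, values)
    if area <= 0:
        raise ValueError("time integral must be positive to normalise")
    return [v / area for v in values]


# Three points measured at 0 h, 2 h and 4 h: the integral is 80.
print(normalise_by_integral([0, 2, 4], [10, 30, 10]))
```

After normalization, two plants with the same temporal pattern but different overall brightness yield the same vector, which is what the subsequent distance calculation relies on.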

In which time windows the number of cells expressing the “specific gene” is relatively high, that is, the similarity between the “specific gene” expression patterns of the plants (samples), is assessed from the time-series data of normalized “luciferase enzyme activity”: the “distance” between the vectors {…, P_{lum.corrected.-m}(t_m[i])/∫P_{lum.corrected.-m}(t)dt, …} is calculated, and the magnitude of this inter-vector “distance” is used as the index of similarity.

Specifically, the inter-vector “distance” is calculated by applying a row-to-row distance matrix calculation method.

A “cluster analysis” is performed on a “line” whose plants grown from T1 seeds have been confirmed in advance, by a separate genotype test, to show a “heterozygous” genotype with the chromosome constitution (a^*·a) or (a·a^*). The analysis covers those plants grown from its T2 seeds that were judged to express the “specific gene”.

Two sets of 2⁴ (16) samples each were taken from this “line”. Various “cluster analysis” techniques were applied to cluster each set separately, and the two sets were then combined and the 2⁵ (32) samples clustered in the same way. The proportion of samples judged to be the most similar “pair” within each set that were again judged to be the most similar “pair” in the combined set was then calculated.

The “cluster analysis” technique giving the higher such proportion was selected as the optimal “cluster analysis” technique.
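The selection procedure can be sketched in simplified form. The sketch below assumes, as a simplification not stated in the source, that the most similar “pair” of a sample is simply its nearest neighbour under the chosen distance; a full implementation would read the pairs from the lowest merges of each dendrogram. All names are hypothetical.

```python
def manhattan(u, v):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_index(vectors, i, dist):
    """Index of the vector closest to vectors[i], excluding i itself."""
    candidates = [j for j in range(len(vectors)) if j != i]
    return min(candidates, key=lambda j: dist(vectors[i], vectors[j]))


def pair_retention(set_a, set_b, dist=manhattan):
    """Fraction of samples whose nearest partner within their own set
    is still their nearest partner after the two sets are combined."""
    combined = set_a + set_b
    kept = 0
    for offset, group in ((0, set_a), (len(set_a), set_b)):
        for i in range(len(group)):
            within = nearest_index(group, i, dist) + offset
            overall = nearest_index(combined, i + offset, dist)
            if within == overall:
                kept += 1
    return kept / len(combined)


set_a = [[0, 0], [0, 1], [5, 5]]
set_b = [[10, 10], [10, 11], [5, 6]]
print(pair_retention(set_a, set_b))
```

Running the same function with different `dist` arguments gives one retention value per distance measure, and the measure with the higher value would be preferred under the rule described above.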

Combinations of the following were evaluated by the procedure above, and the optimal combination of “cluster analysis” techniques was selected.
Four row-to-row distance matrix calculation methods usable for calculating the inter-vector “distance”:
・euclidean
・manhattan
・maximum
・canberra
Six linkage methods used in the “clustering” step based on the calculated inter-vector “distance”:
・average
・centroid
・complete
・mcquitty
・single
・ward
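For orientation, the four distance measures named above can be written out with their common textbook definitions. This is a sketch in plain Python rather than the R routines used in the source; in particular, skipping 0/0 terms in the canberra sum is an assumption made here and may differ from how a given statistics package treats them.

```python
import math


def euclidean(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def maximum(u, v):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def canberra(u, v):
    """Sum of |a - b| / (|a| + |b|); terms with a zero denominator are skipped."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


u, v = [1.0, 2.0, 3.0], [2.0, 4.0, 3.0]
for name, dist in (("euclidean", euclidean), ("manhattan", manhattan),
                   ("maximum", maximum), ("canberra", canberra)):
    print(name, dist(u, v))
```

The manhattan measure weights every time point equally and linearly, whereas euclidean emphasises the largest differences and maximum considers only the single largest one.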

As shown in Table 1 above, the combination of the manhattan method as the row-to-row distance matrix calculation method and the ward method as the linkage method is selected as the optimal “cluster analysis” technique.

The statistical analysis software R (http://www.r-project.org/) was used for the hierarchical grouping in the cluster analysis based on the distances between individuals.

Besides the methods selected above, the cluster analysis may, as appropriate, use euclidean / maximum / canberra / binary / minkowski as the row-to-row distance matrix calculation method and single / complete / average / mcquitty / median / centroid as the linkage method.

This “cluster analysis” method was applied to the data shown in graph [B2] of FIG. 6: the normalized temporal changes in “luciferase enzyme activity” of 16 plants (samples) grown from the T2 seeds of a “line” whose plants grown from T1 seeds had been confirmed, by a separate genotype test, to show a “heterozygous” genotype with the chromosome constitution (a^*·a) or (a·a^*). The “clustering” yields the hierarchical clustering shown in the dendrogram of graph [B3] of FIG. 7. Ultimately, the 13 plants (samples) judged to express the “specific gene” are classified into two “groups”. The joining distance between these two groups, as a threshold, is 7.6 × 10⁵, and the two groups contain approximately equal numbers of plants (samples).

These 13 plants (samples) judged to express the “specific gene” are thought to be classified into the “homozygous” genotype (a^*·a^*) and the “heterozygous” genotype (a^*·a) or (a·a^*).

As explained earlier, even when the genotype is “heterozygous”, (a^*·a) or (a·a^*), several “genotypes” exist depending on the combination of alleles present on the homologous genes. Even if all the plants (samples) of the “line” judged to express the “specific gene” were “heterozygous”, (a^*·a) or (a·a^*), “clustering” could still occur according to these several “genotypes” and could end in a classification into two “groups”. In that case the joining distance between the two groups would be relatively short, and the threshold correspondingly low.

In practice, to judge that the samples divide significantly into two groups, the following two requirements must be satisfied.

(i) In the dendrogram produced by the hierarchical “clustering”, the samples split into two top-level groups at a threshold at or above a certain level.

(ii) Each of the two groups contains at least two samples.

For example, if one group contains only a single sample, the probability that the set of 2⁴ (16) samples contains no other sample within a “distance” at or below the threshold from that sample is statistically quite low. Conversely, if the threshold separating the top-level groups does not reach a certain level, the probability that a statistically significant difference can be found between the two groups is quite low.
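The two requirements amount to a simple check, sketched below. The text does not give a numerical value for the required level of the threshold, so `min_join_distance` is a hypothetical parameter that the user would have to supply; the function name is likewise illustrative.

```python
def is_significant_split(group_a, group_b, join_distance,
                         min_join_distance, min_size=2):
    """Return True if the top-level split meets requirements (i) and (ii).

    (i)  the joining distance of the two top-level groups is at least
         `min_join_distance` (a user-chosen level, not given in the source);
    (ii) each group contains at least `min_size` samples.
    """
    meets_threshold = join_distance >= min_join_distance
    meets_size = len(group_a) >= min_size and len(group_b) >= min_size
    return meets_threshold and meets_size


# Groups of 7 and 6 samples joined at 7.6e5, tested against an assumed level of 1e5.
print(is_significant_split(list(range(7)), list(range(6)), 7.6e5, 1e5))
```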

(6) Creation of “model waveforms” representing the trend of the temporal change in “luciferase enzyme activity” in each of the two “groups” distinguished by the “cluster analysis”, based on the temporal change in “luciferase enzyme activity” of each plant judged to show “gene expression” within the “line” under study
(6-1) “Time-axis unification” operation
For each plant (sample) classified into a group, the normalized temporal change in “luciferase enzyme activity” is the time series (t_m[i], P_{lum.corrected.-m}(t_m[i])/∫P_{lum.corrected.-m}(t)dt), but the measurement times t_m[i] differ slightly from plant (sample) to plant.

The normalized data for each plant (sample) are therefore converted into “time-axis-unified” normalized data on the temporal change in “luciferase enzyme activity”, that is, the values that would presumably have been measured had every plant been measured at the same measurement times t_{LINE}[i].

Specifically, the measurement time t_m[i] of each plant (sample) falls close to the unified time t_{LINE}[i], and at least the relation t_m[i−1] < t_{LINE}[i] < t_m[i+1] holds. On this basis, the normalized “luciferase enzyme activity” at the unified time t_{LINE}[i], {P_{lum.corrected.-m}(t_{LINE}[i])/∫P_{lum.corrected.-m}(t)dt}, is estimated by interpolation from the waveform data at times t_m[i−1], t_m[i], and t_m[i+1].

Specifically, when the waveform data at times t_m[i−1], t_m[i], and t_m[i+1] of the normalized “luciferase enzyme activity” waveform are monotonically increasing, monotonically decreasing, or approximately constant, the three points are approximated by a single straight line, and the value on that line at time t_{LINE}[i] is adopted as the estimate. That is, the least-squares method is applied to obtain the “approximating line” that fits the three points of the waveform data at times t_m[i−1], t_m[i], and t_m[i+1].

On the other hand, when the waveform data at times t_m[i−1], t_m[i], and t_m[i+1] correspond to the top of a clear peak, that is, when time t_m[i] is the maximum point of the peak, the estimate is obtained by interpolation using two points. Specifically, if t_m[i−1] < t_{LINE}[i] < t_m[i], the estimate is the value at time t_{LINE}[i] on the straight line joining the two points at t_m[i−1] and t_m[i]. If t_m[i] < t_{LINE}[i] < t_m[i+1], the estimate is the value at time t_{LINE}[i] on the straight line joining the two points at t_m[i] and t_m[i+1].
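The two interpolation cases can be sketched as a single function. It treats the middle point as a peak when it is strictly larger than both neighbours and otherwise fits a least-squares line through all three points; handling a local minimum with the line fit is an assumption made here, since the text does not address that case. The function name is hypothetical.

```python
def estimate_at(times3, values3, t_line):
    """Estimate the normalized activity at the unified time `t_line`.

    times3  : (t_m[i-1], t_m[i], t_m[i+1]) with t_m[i-1] < t_line < t_m[i+1]
    values3 : normalized activities at those three times
    """
    t0, t1, t2 = times3
    p0, p1, p2 = values3
    if p1 > p0 and p1 > p2:
        # Peak maximum at t_m[i]: interpolate on the side that contains t_line.
        if t_line <= t1:
            ta, pa, tb, pb = t0, p0, t1, p1
        else:
            ta, pa, tb, pb = t1, p1, t2, p2
        return pa + (pb - pa) * (t_line - ta) / (tb - ta)
    # Monotonic or roughly constant: least-squares line through the three points.
    t_mean = (t0 + t1 + t2) / 3.0
    p_mean = (p0 + p1 + p2) / 3.0
    sxx = sum((t - t_mean) ** 2 for t in times3)
    sxy = sum((t - t_mean) * (p - p_mean) for t, p in zip(times3, values3))
    slope = sxy / sxx
    return p_mean + slope * (t_line - t_mean)


print(estimate_at((0.0, 2.0, 4.0), (1.0, 2.0, 3.0), 1.0))  # collinear points
print(estimate_at((0.0, 2.0, 4.0), (1.0, 5.0, 1.0), 3.0))  # peak at the middle point
```

Using two-point interpolation at a maximum avoids flattening the peak, which a straight-line fit through all three points would do.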

(6-2) “Model waveform creation” operation representing the trend of the temporal change in “luciferase enzyme activity” in each of the two “groups” distinguished by the “cluster analysis”
From the “time-axis-unified”, normalized data on the temporal change in “luciferase enzyme activity” of the plants (samples) of each “line”, divided into two groups by “clustering”, a “model waveform” representing the trend of the temporal change in “luciferase enzyme activity” in each “group” is created.

Ordinarily, the “model waveform” representing the commonality (trend) of the “waveforms” in a “group” is a “waveform” that can itself be classified into the group and whose similarity index “distance” to every one of the “waveforms” in the group is approximately equal. That is, using the vectors {…, P_{lum.corrected.-m}(t_{LINE}[i])/∫P_{lum.corrected.-m}(t)dt, …} of estimated normalized “luciferase enzyme activity” arranged in time order at the unified times t_{LINE}[i], the average over the plants (samples) belonging to the group generally serves as the “model waveform” representing the commonality (trend) of the “waveforms” in that “group”.

However, because the original data P_{lum.corrected.-m}(t_m[i]) themselves carry statistical fluctuation (scatter), the values P_{lum.corrected.-m}(t_{LINE}[i]) carry fluctuation of the same order. For example, if P_{lum.corrected.-m}(t_{LINE}[i]) has Gaussian-type scatter, its standard deviation is about {P_{lum.corrected.-m}(t_{LINE}[i])}^{1/2}. The relative scatter is therefore {P_{lum.corrected.-m}(t_{LINE}[i])}^{1/2}/P_{lum.corrected.-m}(t_{LINE}[i]), which grows as the value P_{lum.corrected.-m}(t_{LINE}[i]) becomes smaller. Moreover, after “normalization”, the relative scatter in {P_{lum.corrected.-m}(t_{LINE}[i])/∫P_{lum.corrected.-m}(t)dt} becomes still larger as ∫P_{lum.corrected.-m}(t)dt becomes smaller.

Consequently, the simple “average” of the normalized “luciferase enzyme activity” is easily affected by “scatter” arising from the normalized “data” of plants (samples) with small ∫P_{lum.corrected.-m}(t)dt. In the present invention, therefore, the “median” of the “data” of the plants (samples) belonging to the group is selected, instead of the simple “average”, as the value representing the commonality (trend) of the “waveforms” in the “group”. This “median” is the central value when the normalized “luciferase enzyme activity” values {P_{lum.corrected.-m}(t_{LINE}[i])/∫P_{lum.corrected.-m}(t)dt} of the plants (samples) at the unified time t_{LINE}[i] are arranged in order of magnitude: when the group contains 2n+1 plants (samples), it is the (n+1)-th value, and when it contains 2n, it is the simple “average” of the n-th and (n+1)-th values.
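A pointwise median of this kind can be sketched with the Python standard library, whose `statistics.median` returns the middle value for an odd count and the mean of the two middle values for an even count, matching the rule stated above. The function name and the sample numbers are illustrative only.

```python
from statistics import mean, median


def model_waveform(waveforms):
    """Pointwise median over the samples of one group.

    waveforms : list of equal-length sequences, one per plant (sample),
                all sampled on the common time axis t_LINE[i]
    """
    return [median(values_at_t) for values_at_t in zip(*waveforms)]


group = [
    [0.10, 0.30, 0.10],
    [0.12, 0.28, 0.11],
    [0.11, 0.90, 0.09],  # one sample with a large excursion at the second time point
]
print(model_waveform(group))
print([mean(values_at_t) for values_at_t in zip(*group)])  # simple average, for comparison
```

In this illustration the excursion in the third sample pulls the simple average at the second time point well above the other two samples, while the median stays at 0.30.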

Graph [B4-1] in FIG. 8 shows the time-series representative data (t_LINE[i], P_{lum.center-Gr1}(t_LINE[i])), built from the "median" P_{lum.center-Gr1}(t_LINE[i]) at each unified time t_LINE[i] according to the above procedure, for the group of seven plants (samples) in the dendrogram of graph [B3]. For comparison, graph [B4-1] in FIG. 8 also shows the time-series data built from the simple "average".

Likewise, graph [B4-2] in FIG. 9 shows the time-series representative data (t_LINE[i], P_{lum.center-Gr1}(t_LINE[i])), built from the "median" P_{lum.center-Gr1}(t_LINE[i]) at each unified time t_LINE[i] according to the above procedure, for the group of six plants (samples) in the dendrogram of graph [B2].

The "model waveform" whose representative values are these "medians" shows a maximum in each time zone in which the normalized "luciferase enzyme activity" of the plants (samples) in the group is commonly high. In practice, the time at which the normalized "luciferase enzyme activity" of each plant (sample) reaches its maximum shifts slightly from plant to plant, and the maximum values themselves are also scattered, so the "model waveform" corresponds to an "averaging" over both kinds of scatter.

FIG. 10 shows, for all plants (samples) in the two groups of the dendrogram in graph [B2], the waveforms of the temporal change in normalized "luciferase enzyme activity" on the "unified time axis". It also shows the "model waveform" based on the "median" and the time-series data based on the simple "average" for the first group (FIG. 8), and the "model waveform" based on the "median" for the second group (FIG. 9). The "model waveforms" of the first and second groups show their maxima in largely similar time zones, but they can be regarded as differing in the magnitude of the "maximum value" at the individual peaks.

Looking at the time zones of the maxima, a 24-hour periodicity is inferred overall, and each 24-hour period appears to contain a pair of peaks separated by about 12 hours. That is, the plants (samples) under measurement are grown on the same 96-well plate and kept in the same environment; they show a "24-hour periodicity" resembling the so-called "circadian rhythm", and the timing of expression of the "specific gene" within each period is also aligned among them.

The created "model waveform" is stored in the model waveform table together with the result of the "cluster analysis" that defines the group, that is, the analysis result that gives the dendrogram in "automatic genotype discrimination".

In this model waveform creation, the "cluster analysis" uses the temporal change in "luciferase enzyme activity" of each plant judged to show "gene expression" within the target "line". Consequently, if the number of plants (samples) judged to show "gene expression" within a "line" does not exceed 4, the samples cannot be divided into two groups. Likewise, if the criterion for judging that the samples are divided into two groups is not met, a "model waveform" cannot be created for each group. In such cases, a new set of 2⁴ (16) samples is normally measured for that "line", and the result is analyzed by the same procedure.

If neither of the two independent sample sets can be divided into two groups, the two sets are merged and the same analysis is carried out. If, in the top-level grouping of the merged set, one group again contains only a single sample, that sample is excluded, "cluster analysis" is performed on the remaining samples, and it is judged whether the criterion for division into two groups is met. If the remaining samples are judged to be divided into two groups, a "model waveform" is created for each of the two groups.

If, on the other hand, the samples are not judged to be divided into two groups, the "line" is regarded as a single group, and a "model waveform" is created in the same way.

(7) Waveform analysis aimed at "grasping the features of a line", based on the "model waveforms" that represent the trend of the temporal change in "luciferase enzyme activity" in the two "groups" into which the target "line" is divided by "cluster analysis"
(7-1) "Waveform decomposition" into a superposition of multiple "waveform functions" reflecting the waveform features of the "model waveform"
The "model waveform" created for each group contains information on the time zones in which the "luciferase enzyme activity" commonly shows a maximum among the plants (samples) classified into that group. It also contains information on the representative "maximum value" in each of those time zones.

Meanwhile, the temporal change in "luciferase enzyme activity" observed for a whole plant is, by nature, a superposition. The number of cells N_total in the cell population can be written as the sum of the cell numbers N_subgr-i of several subsets, N_total = ΣN_subgr-i, and the observed change is the superposition of the temporal changes in "luciferase enzyme activity" shown by the cells of each subset. A peak that at first glance looks broad with a flattened top can therefore be interpreted as several overlapping temporal changes in "luciferase enzyme activity" that peak at different times.

That is, the temporal change in "luciferase enzyme activity" in each subset of cells can be approximated by a unimodal waveform function with a similar half-width, and the waveform of the temporal change observed for the whole plant can be interpreted as the result of several such unimodal waveform functions overlapping.

The unimodal waveform function that approximately represents the temporal change in "luciferase enzyme activity" in each subset of cells is a twice-differentiable continuous function, as exemplified above. It can be approximated by, for example, a Lorentz function with peak height h_peak, peak position t_peak and full width at half maximum 2Δt_half:
f(t) = h_peak × {1 + (t − t_peak)² / (Δt_half)²}^{−1}
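As a minimal sketch, the Lorentz function and a superposition of several such functions can be written as follows. The function names `lorentz` and `composite` and the numeric parameters are illustrative assumptions, not taken from the source.

```python
def lorentz(t, h_peak, t_peak, dt_half):
    """Lorentz function: height h_peak at t_peak, full width at half maximum 2*dt_half."""
    return h_peak / (1.0 + ((t - t_peak) / dt_half) ** 2)

def composite(t, peaks):
    """Superposition of Lorentz functions; peaks is a list of (h_peak, t_peak, dt_half)."""
    return sum(lorentz(t, h, tp, w) for h, tp, w in peaks)

# Hypothetical example: a large peak at 12 h and a smaller one at 36 h.
peaks = [(2.0, 12.0, 3.0), (1.0, 36.0, 3.0)]
value_at_peak = lorentz(12.0, 2.0, 12.0, 3.0)       # equals h_peak
value_at_half_width = lorentz(15.0, 2.0, 12.0, 3.0)  # equals h_peak / 2
```

At a distance Δt_half from the peak position the function falls to half its height, which is why 2Δt_half is the full width at half maximum.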

The created "model waveform" takes the form of time-series representative data (t_LINE[i], P_{lum.center-Gr1}(t_LINE[i])). Overall it reflects a gradual temporal trend, but microscopically it is not smooth enough for numerical differentiation. In principle, the temporal change in the normalized "luciferase enzyme activity" of an individual plant is a continuous function that is at least twice differentiable. The measurements, however, contain small "noise" arising from measurement error, that is, fine scatter, so the created "model waveform" also contains some small "noise".

With this in mind, the whole "model waveform" is smoothed by a moving-average method so that it becomes a waveform function continuous enough to be numerically differentiated at least twice. Various moving-average methods have been proposed that remove a small "noise" component from a waveform while limiting distortion of the original waveform shape; here the polynomial fitting method is adopted. The number of smoothing points was examined with the additional goal of obtaining a waveform function that can be numerically differentiated at least twice. For the "model waveforms" shown in FIG. 8 and FIG. 9, for example, it was confirmed that choosing 5 to 9 smoothing points removes the small "noise" component and yields such a waveform function.
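The following sketch illustrates polynomial-fit smoothing with five points. It assumes that the "polynomial fitting method" corresponds to what is commonly called Savitzky-Golay smoothing, and it uses the standard 5-point quadratic/cubic weights (−3, 12, 17, 12, −3)/35. The function name `smooth5` and the choice to leave the two points at each end unchanged are illustrative assumptions.

```python
# Standard 5-point quadratic/cubic least-squares smoothing weights (sum = 35).
SG5 = (-3.0, 12.0, 17.0, 12.0, -3.0)

def smooth5(y):
    """Smooth a uniformly sampled series with a 5-point polynomial fit.

    The first and last two samples are copied unchanged (assumption).
    """
    out = list(y)
    for i in range(2, len(y) - 2):
        window = y[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG5, window)) / 35.0
    return out

# A quadratic is reproduced exactly, so the smoothing does not distort
# waveform shapes that are locally well described by a low-order polynomial.
quadratic = [float(t * t) for t in range(10)]
smoothed_quadratic = smooth5(quadratic)

# An isolated spike is reduced in height and spread over its neighbours.
spike = [0.0, 0.0, 0.0, 35.0, 0.0, 0.0, 0.0]
smoothed_spike = smooth5(spike)
```

Wider windows (7 or 9 points) follow the same pattern with different weights and remove more noise at the cost of broadening narrow peaks.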

Because the smoothed "model waveform" is obtained by the polynomial fitting method with five or more smoothing points, it can be numerically differentiated up to the third order. As described above, the smoothed "model waveform" can be approximated by a superposition of multiple twice-differentiable unimodal waveform functions, and it is presumed to show a maximum peak or a shoulder peak at each position corresponding to the peak position of one of those functions.

At a maximum peak, the first derivative changes as "positive → zero → negative", and the second derivative shows a "large → small → large" change. At a distinct shoulder peak as well, the first derivative changes as "positive → zero → negative → zero → positive" or "negative → zero → positive → zero → negative", and the second derivative shows a "large → small → large" change. Around a typical shoulder peak, the first derivative shows a "large → small → large" change while remaining positive, or while remaining negative. Accordingly, a time at which the second derivative has a local minimum and the third derivative is zero is identified as the time of a maximum peak or shoulder peak of the smoothed "model waveform".
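A rough sketch of this second-derivative criterion is given below for a uniformly sampled waveform. The function names and the test waveform are hypothetical. One extra condition, that the second difference be negative at the detected point, is an assumption added here to skip the shallow local minima that occur in the flat regions between peaks.

```python
def second_difference(y):
    """Central second difference for uniformly spaced samples (step = 1)."""
    return [y[i - 1] - 2.0 * y[i] + y[i + 1] for i in range(1, len(y) - 1)]

def find_peaks_and_shoulders(y):
    """Return sample indices where the second difference has a negative local minimum."""
    d2 = second_difference(y)  # d2[j] belongs to sample index j + 1
    hits = []
    for j in range(1, len(d2) - 1):
        is_local_min = d2[j] < d2[j - 1] and d2[j] < d2[j + 1]
        if is_local_min and d2[j] < 0.0:  # negativity filter is an assumption
            hits.append(j + 1)
    return hits

def lorentz(t, h_peak, t_peak, dt_half):
    return h_peak / (1.0 + ((t - t_peak) / dt_half) ** 2)

# Hypothetical waveform sampled hourly: a main peak at 24 h with a weaker
# component at 30 h that appears only as a shoulder on the falling edge.
wave = [lorentz(t, 1.0, 24.0, 3.0) + lorentz(t, 0.4, 30.0, 3.0) for t in range(49)]
peaks = find_peaks_and_shoulders(wave)
```

In this example the waveform decreases monotonically through 30 h, so a first-derivative test alone would miss the second component, while the second-difference minimum still marks it.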

Using the times of these maximum peaks or shoulder peaks and the estimate of their total number, the Davidon-Fletcher-Powell (DFP) method is applied to create the "composite waveform" that best approximates the smoothed "model waveform". Here the Lorentz function is adopted as the unimodal waveform function used to build the "composite waveform". The maximum of one unimodal waveform function is assigned to each time that shows a maximum peak or shoulder peak, and the peak height h_peak and full width at half maximum 2Δt_half of the function at that peak position t_peak are varied to obtain the optimal "composite waveform". The optimization minimizes the residual sum of squares between the smoothed "model waveform" and the "composite waveform".
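The quantity being minimized can be sketched as follows. This sketch shows only the objective function, the residual sum of squares between a model waveform and a superposition of Lorentz functions; the DFP optimizer itself is not implemented here. All names and numeric values are illustrative assumptions.

```python
def lorentz(t, h_peak, t_peak, dt_half):
    return h_peak / (1.0 + ((t - t_peak) / dt_half) ** 2)

def composite(t, peaks):
    """Superposition of Lorentz functions; peaks is a list of (h_peak, t_peak, dt_half)."""
    return sum(lorentz(t, h, tp, w) for h, tp, w in peaks)

def residual_sum_of_squares(times, model, peaks):
    """Objective for the fit: sum of squared differences between model and composite."""
    return sum((m - composite(t, peaks)) ** 2 for t, m in zip(times, model))

# Hypothetical model waveform generated from two known components.
times = [float(t) for t in range(49)]
true_peaks = [(1.0, 12.0, 3.0), (0.6, 36.0, 4.0)]
model = [composite(t, true_peaks) for t in times]

# Peak positions are kept at the detected times; heights and widths are varied.
trial_peaks = [(0.8, 12.0, 3.0), (0.6, 36.0, 5.0)]
rss_true = residual_sum_of_squares(times, model, true_peaks)
rss_trial = residual_sum_of_squares(times, model, trial_peaks)
```

A quasi-Newton minimizer such as DFP would repeatedly evaluate this objective (and its gradient with respect to the heights and widths) until the residual stops decreasing.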

Graph [B6-1] in FIG. 11 shows the result of smoothing the "model waveform" of FIG. 8, together with the approximate waveform subsequently created as a "composite waveform", a superposition of multiple Lorentz functions placed at the positions of the maximum peaks or shoulder peaks of the smoothed "model waveform". Graph [B6-2] in FIG. 12 shows the corresponding results for the "model waveform" of FIG. 9.

(7-2) Profile based on the result of "waveform decomposition"
The "composite waveform", a superposition of multiple "unimodal waveform functions", that is, Lorentz-type "decomposition curves", corresponds to a "waveform decomposition" of the smoothed "model waveform". The analysis thus yields the features of the smoothed "model waveform": the total number of peaks; the position, height and half-width of each peak; the integrated area of the whole "composite waveform"; and the integrated area of the smoothed "model waveform".

These analysis results are stored as a profile in the profile table 34, and their graph representation is stored in the file list table 32.

The profile results in graph [B6-1] of FIG. 11 and graph [B6-2] of FIG. 12 were obtained under the following growth conditions, in particular light conditions: four days (96 hours) of a 12-hour/12-hour light/dark cycle followed by three days of continuous light, seven days (168 hours) in total. During the four days (96 hours) of the light/dark cycle, peaks with a 24-hour period corresponding to the "circadian rhythm" are confirmed, and a 24-hour periodicity is also found during the subsequent three days of continuous light.

(8) "Waveform comparison/classification" to verify whether the temporal change in expression of the target "specific gene" is similar between different "lines"
For each "line", the "model waveform" created from the measured temporal change in "luciferase enzyme activity" under each growth condition can also be used in a "cluster analysis" to verify whether there is "similarity" between different "lines".

In doing so, the "model waveform" of each line is divided by its integrated area to obtain a normalized "model waveform", and "cluster analysis" is then performed by the same procedure as the "cluster analysis" within a "line" described above.
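A minimal sketch of this area normalization is shown below. The source does not state how the integral is evaluated; the trapezoidal rule is assumed here, and the function names and sample values are hypothetical.

```python
def trapezoid_area(times, values):
    """Integrated area under the waveform by the trapezoidal rule (assumption)."""
    return sum(
        (times[i + 1] - times[i]) * (values[i] + values[i + 1]) / 2.0
        for i in range(len(times) - 1)
    )

def normalize_by_area(times, values):
    """Divide every value by the integrated area so that the area becomes 1."""
    area = trapezoid_area(times, values)
    if area <= 0.0:
        raise ValueError("waveform area must be positive")
    return [v / area for v in values]

# Hypothetical model waveform: area = 1 + 2 + 1 = 4.
times = [0.0, 1.0, 2.0, 3.0]
values = [0.0, 2.0, 2.0, 0.0]
normalized = normalize_by_area(times, values)
```

After normalization, waveforms from lines with very different overall expression levels can be compared by shape alone.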

Graph [B7] in FIG. 13 shows an example of this "waveform comparison/classification", which verifies whether the temporal change in expression of the target "specific gene" is similar between different "lines". To show the degree of similarity, the normalized "model waveforms" are displayed as a "heat-map" together with the dendrogram.

The configuration and functions of the analysis apparatus according to the present invention are described in detail below.

The specific example illustrating an embodiment of the present invention not only analyzes the "waveform" corresponding to the temporal change in "luciferase enzyme activity". It is also a system in which a hierarchical database can be built by attaching an index to the "data" of each "individual plant", so that, as a very large amount of "data" containing similar measurement and analysis results accumulates later, the analyzed information for any individual plant can be retrieved immediately.

Referring to FIG. 1, this specific example consists of an input device 1 such as a keyboard, a data processing device 2 that operates under program control, a main storage device (database; DB) 3 that stores information, and an output device 4 such as a display device.

The main storage device 3 comprises a line table 31, a file list table 32, a waveform analysis list table 33, a profile table 34 and a model waveform table 35.

The line table 31 issues an ID based on the line name in the line information 11 entered from the input device 1. The table holds information that identifies the measurement target of each measurement result, such as the line name, observed part, growth conditions and phenotype.

The file list table 32 manages all text files and image files created for each line. Based on the ID issued when the line table 31 is created, it also holds an ID that is unique within the file list.

The waveform analysis list table 33 joins the line table 31, the profile table 34 and the model waveform table 35 by a unique ID. It also holds the total area and genotype information of the original waveform before waveform decomposition 20.

The profile table 34 stores, for each decomposition curve obtained by waveform decomposition 20, its position (time), height (measured value) and half-width. This information is managed by the ID issued in the waveform analysis list table 33.

The model waveform table 35 stores the model waveform information (time, measured value) of each line after "model waveform creation" 19. This information is managed by the ID issued in the waveform analysis list table 33.

The computer (central processing unit; processor; data processing device) 2 comprises DB registration/update 12, data processing 6, calculation of expression level 15, presence/absence of expression 16, model waveform creation 7, line feature grasping 8, graph creation 22 and waveform comparison/classification 23.

Each of these means operates roughly as follows.

For each experiment, DB registration/update 12 receives the data 10 output from the measuring instrument and the line information 11 that the experimenter enters separately for the measurement target. It then searches the line table 31, in which the measured "data 10" and "line information 11" of past experiments are accumulated, and updates the existing information if any exists for the same "line". If there is no "line information 11" for the target "line", it registers a new entry.

The "data processing" 6 stage includes the numerical data processing operations "elapsed time conversion" 13 and "measurement error correction" 14, and processes the "data 10" output from the measuring instrument (which records time and measured value, i.e. gene expression level).

The "graph creation" 22 function creates graphs from the "primary processed data" obtained by applying the above processing to the measured "data 10" in the "data processing" 6 stage.

"Calculation of expression level" 15 determines whether a gene is expressed. The result obtained for each plant (sample) under measurement (presence or absence of gene expression = phenotype) is stored in the line table 31.

"Presence/absence of expression" 16 branches according to the presence or absence of gene expression determined in "calculation of expression level" 15. Only samples judged as "expressed" proceed to "model waveform creation" 7 and "line feature grasping" 8.

The "model waveform creation" 7 stage includes the numerical data processing operations "automatic genotype discrimination" 17, which is based on multiple-type waveform discrimination, "time axis unification" 18 and "model waveform creation" 19. First, "automatic genotype discrimination" 17 uses the multiple-type waveform discrimination operation to judge whether the "expressed" samples (plants) in a line show different waveforms depending on genotype. If they are judged to "show different waveforms depending on genotype", the samples (plants) are divided by "genotype" and a model waveform is created for each type of waveform. If they are judged "not to show different waveforms depending on genotype" (that is, the waveforms are the same), a single model waveform is created. Here, the "model waveform" is the waveform that best reflects the features of the line when the samples (plants) of the line trace different waveforms.

The "line feature grasping" 8 stage includes the analytical operations "waveform decomposition" 20 and "profile creation" 21. It extracts the features of the "model waveform" created in "model waveform creation" 7 and creates a "waveform profile".

The "waveform comparison/classification" 23 stage compares and classifies the model waveforms of the lines based on the data created in "line feature grasping" 8, in particular the "waveform profile".

The concrete operations of this analysis apparatus, such as the exchange and storage of data, are described next.

Data [A1] and data [B1] (part of the actual data), which are the data 10 output from the measuring instrument, and the line information 11 (line: L00001, part: Seedling, generation: T2, Condition: 4LD3Wc, Genotype: mixture, etc.) are supplied from the input device 1 and passed to DB registration/update 12.

DB registration/update 12 searches the DB using this line information 11. If the same data exists, it updates the entry; otherwise it registers a new entry and issues an ID.

Next, "elapsed time conversion" 13 formats the time values of the data output from the measuring instrument. The instrument-specific time display is unified into HH:MM format (with MM converted from a 0 to 59 scale to a 0 to 99 scale), and each measurement time is converted into the time elapsed since the start of measurement.
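One possible reading of this conversion is sketched below: minutes on the 0 to 59 scale are turned into hundredths of an hour, and the start time is subtracted. The input format (cumulative "HH:MM" strings whose hour field may exceed 24), the function names and the sample values are all assumptions made for illustration; the source does not specify the instrument's format.

```python
def to_decimal_hours(hhmm):
    """Convert an 'HH:MM' string to decimal hours (minutes become 0-99 hundredths)."""
    hours, minutes = hhmm.split(":")
    return int(hours) + int(minutes) / 60.0

def elapsed_hours(timestamps):
    """Return the time elapsed since the first timestamp, in decimal hours."""
    start = to_decimal_hours(timestamps[0])
    return [round(to_decimal_hours(ts) - start, 2) for ts in timestamps]

# Hypothetical instrument timestamps; the hour field is assumed to keep
# counting past 24 rather than wrapping at midnight.
stamps = ["09:30", "10:00", "10:45", "33:30"]
elapsed = elapsed_hours(stamps)
```

If the instrument reports wall-clock times that wrap at midnight, a date field or a day counter would be needed in addition to this conversion.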

"Measurement error correction" 14 automatically corrects erroneous measured values output from the measuring instrument. A measuring instrument may report wrong values for various reasons. Such false-positive values (here, values that make a gene appear to be expressed when it is not; they can arise, for example, from static electricity generated during measurement) are detected and corrected. The upper limit i used for correction should be chosen according to the target, and the user of the analysis system can change it through a "user-specified" setting.

Conditions for correction:
(Formula 1-1) |X[n−1] − X[n+1]| < 20
(Formula 1-2) |(X[n] − X[n−1]) + (X[n] − X[n+1])| / 2 > |i|
Value after correction:
(Formula 1-3) y = (X[n−1] + X[n+1]) / 2
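Formulas 1-1 to 1-3 can be sketched as a simple spike filter. The function name, the parameter names and the sample values are hypothetical. Evaluating the conditions on the original, uncorrected neighbours is a design assumption of this sketch.

```python
def correct_spikes(x, i_limit=100.0, neighbor_tol=20.0):
    """Replace isolated spikes by the mean of their two neighbours.

    Returns the corrected series and the indices that were corrected.
    """
    y = list(x)
    corrected = []
    for n in range(1, len(x) - 1):
        # Formula 1-1: the two neighbours agree with each other.
        neighbors_agree = abs(x[n - 1] - x[n + 1]) < neighbor_tol
        # Formula 1-2: the point deviates strongly from both neighbours.
        deviation = abs((x[n] - x[n - 1]) + (x[n] - x[n + 1])) / 2.0
        if neighbors_agree and deviation > abs(i_limit):
            # Formula 1-3: corrected value is the mean of the neighbours.
            y[n] = (x[n - 1] + x[n + 1]) / 2.0
            corrected.append(n)
    return y, corrected

# Hypothetical series with one spike (500) between two similar values.
spiky, spike_idx = correct_spikes([10, 12, 500, 14, 11])
# A genuine, gradual peak does not satisfy Formula 1-2 and is left alone.
smooth, smooth_idx = correct_spikes([10, 60, 120, 70, 15])
```

The list of corrected indices could be used to flag corrected outliers in the output data.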

Graph [A3] shows the data before data processing 6 (data [A1], data [B1]), after it (data [A2], data [B2]), and the sample data after "measurement error correction" 14 (data [A3]). With the upper limit for correction set to i = 100, data [A] satisfies the conditions of "measurement error correction" 14 and is corrected; data [B] does not satisfy them. For explanation, the graphs corresponding to the data before measurement error correction 14 (data [A2], data [B2]) are shown as graph [A2] and graph [B2], and the graph corresponding to the data after correction (data [A3]) is shown as graph [A3]. (The only graphs this system actually creates are graph [B3], graph [B5] and graph [B6], created by graph creation 22, and graph [B7], created by waveform comparison/classification 23.) In each graph, the X axis is time, the Y axis is gene activity, and each waveform is one of the samples (1 to 16) in a line. To make genes with a circadian rhythm or similar pattern easier to recognize, the time axis is divided by broken lines every 6 hours and by solid lines every 24 hours. An asterisk (*) in data [A3] marks a corrected outlier.

「測定誤差補正」１４を行うことにより、誤った活性の値（＝擬陽性）を排除することができる。また、これにより相対的に小さな値として見落とされていた波形を検出することができる。 By performing “measurement error correction” 14, an erroneous activity value (= false positive) can be eliminated. Further, it is possible to detect a waveform that has been overlooked as a relatively small value.

In the following, the worked example of the operations after correction deals only with data [B].

“Expression level calculation” 15 decides whether gene expression is present from the measured values and timing of data [B3]. The criterion differs from user to user. For example, if “gene expression present” is defined as “a measured value of 30 or more observed at 3 or more consecutive points”, data [B3] satisfies the criterion and is judged “gene expression present”. The result (presence or absence of gene expression = phenotype) is stored in the “line table” 31. Setting the threshold for the presence or absence of gene expression inside the program prevents oversights and wrong decisions caused by individual subjectivity.

“Presence of expression” 16 branches on the presence or absence of gene expression decided in “expression level calculation” 15. If even one of the samples in a line is judged “expression present”, the line is treated as expressed.
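The threshold rule of “expression level calculation” 15 and the line-level branch of “presence of expression” 16 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the defaults (30, 3 consecutive points) simply repeat the example criterion from the text.

```python
def has_expression(values, threshold=30.0, min_run=3):
    """Return True if `values` contains at least `min_run` consecutive
    measurements that are >= `threshold` (example criterion from the text)."""
    run = 0
    for v in values:
        run = run + 1 if v >= threshold else 0  # reset on any low value
        if run >= min_run:
            return True
    return False


def line_has_expression(samples, threshold=30.0, min_run=3):
    """A line counts as expressed if any one of its samples is expressed."""
    return any(has_expression(s, threshold, min_run) for s in samples)
```

Keeping the threshold and run length as parameters mirrors the statement that the criterion differs from user to user.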

“Multiple-type waveform determination” 17 determines whether the samples included in the analysis of one line draw different waveforms (gene activity) depending on genotype. There are three genotypes: homozygous (2n), heterozygous (n), and none. Put simply, a heterozygous sample shows half the gene expression (= gene activity, that is, the value that appears as the waveform) of a homozygous one. However, for self-repressing genes and the like, expression is not simply halved and a distinct waveform may be drawn. Therefore, this function is first used to sort the samples into groups of the same genotype, and “model waveform creation” 19, described below, is then carried out for each group.

Specifically, cluster analysis (*) is performed on the samples in which activity was observed (homozygous and heterozygous), and the line is judged to contain different types of waveform, depending on genotype or similar factors, when both of the following conditions are met:
1. When the dendrogram is cut at the specified threshold, it separates into two groups.
2. Each group contains two or more samples.
Regarding condition 1: if the samples do not separate into groups when the dendrogram is cut at a given threshold (= distance, a measure of the difference between groups), the similarity between groups is fairly high and the waveforms cannot be said to differ. If they separate into three or more groups, the similarity between groups is likewise high, or mismeasurement or contamination is possible. Regarding condition 2: if a group contains only one sample, that sample is judged, on probabilistic grounds, likely to be contamination. In multiple-type waveform determination 17, what matters for deciding whether waveforms differ by genotype is the “shape” of the wave, that is, the timing of expression rather than the amount (some genes strongly activate other genes in small amounts, while others have little effect on the organism even in large amounts). The sample data are therefore weighted so that the total waveform area is equal for all samples, and the weighted data are analyzed. This places samples with similar time-series expression in the same cluster, with the emphasis on timing. The calculation method for cluster analysis depends on the purpose and application. When sample data whose genotype was already known from biological experiments (hereinafter, experimental data) were clustered, the manhattan method for the distance matrix and the ward method for linkage gave the highest classification sensitivity and classified the genotypes correctly, so these are used. The threshold was determined empirically by clustering the experimental data and choosing a value that classifies them reasonably. The result (presence or absence of multiple waveform types = phenotype) is stored in the “line table” 31.
In this specific example, the samples separate into two groups at a linkage distance of 7.6e+05 between groups, and each group contains two or more samples, so the waveforms are judged to differ by genotype.
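The decision rule above can be sketched in plain Python as follows. This is an illustration under assumptions, not the patent's R procedure: the function names are hypothetical, area weighting is implemented as division by the sum of the values, and “ward” is implemented as the Ward form of the Lance-Williams update applied directly to Manhattan distances. Merge heights from this sketch are therefore not comparable to the 7.6e+05 value quoted in the text.

```python
def normalize_area(values):
    """Scale a waveform so that its total area (sum of values) is 1."""
    total = sum(values)
    return [v / total for v in values]


def manhattan(a, b):
    """Manhattan (city-block) distance between two waveforms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_merges(waves):
    """Agglomerative clustering; returns (members_a, members_b, height)
    for every merge, in merge order."""
    clusters = {i: [i] for i in range(len(waves))}
    dist = {(i, j): manhattan(waves[i], waves[j])
            for i in clusters for j in clusters if i < j}
    merges, next_id = [], len(waves)
    while len(clusters) > 1:
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        merges.append((clusters[i], clusters[j], d_ij))
        updated = {}
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Ward form of the Lance-Williams recurrence
            updated[k] = ((ni + nk) * d_ik + (nj + nk) * d_jk
                          - nk * d_ij) / (ni + nj + nk)
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        for k, d in updated.items():
            dist[(k, next_id)] = d  # next_id is always the larger id
        next_id += 1
    return merges


def cut_tree(n, merges, cut):
    """Groups left when every merge above height `cut` is undone."""
    groups = [[i] for i in range(n)]
    for a, b, height in merges:
        if height > cut:
            break
        groups = [g for g in groups if g != a and g != b] + [a + b]
    return groups


def has_distinct_waveforms(waves, cut):
    """True if cutting at `cut` gives exactly two groups of >= 2 samples."""
    norm = [normalize_area(w) for w in waves]
    groups = cut_tree(len(norm), ward_merges(norm), cut)
    return len(groups) == 2 and all(len(g) >= 2 for g in groups)
```

For example, two early-peaking and two late-peaking samples of different amplitude are split into two groups of two, because the area weighting removes the amplitude difference before distances are computed.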

(*) Cluster analysis
Cluster analysis is a general term for methods that classify objects by gathering mutually similar items from a heterogeneous mixture into clusters; the grouping is done statistically. The result is expressed as a dendrogram (tree diagram) in which similar data are joined in order. Various methods (distance-matrix calculation methods and linkage methods) have been proposed for different purposes and applications; here, the calculation method was fixed and an analysis procedure using the statistical analysis software R was established. Cluster analysis is currently applied in many fields, such as classifying diseases by symptoms and test values, classifying companies by financial indicators, and classifying bacteria by shape and properties.

“Time axis unification” 18 makes the interval between measurement times uniform across lines, using the times formatted by “elapsed time conversion” 13. Because similarity is later compared between different lines (waveform comparison/classification 23), observation times that differ between lines are aligned approximately. The least squares method (*) is used to unify the time axis, which keeps the error between the data before and after unification to a minimum. The time step can be changed by user setting.
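One way to picture time axis unification is the sketch below. It is an assumption-laden illustration: the text does not say how least squares is applied, so this version fits a least-squares line through the few observations nearest each target time and evaluates it there. The names `fit_line`, `unify_time_axis`, `step`, and `window` are hypothetical, and the times are assumed to be elapsed hours in ascending order with distinct values.

```python
def fit_line(points):
    """Least-squares line y = a*x + b through a list of (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


def unify_time_axis(times, values, step, window=4):
    """Estimate values on the regular grid 0, step, 2*step, ... up to the
    last observed time, using a local least-squares line at each grid time."""
    points = list(zip(times, values))
    grid = []
    k = 0
    while k * step <= times[-1]:
        t = k * step
        # the `window` observations closest in time to the grid point
        nearest = sorted(points, key=lambda p: abs(p[0] - t))[:window]
        slope, intercept = fit_line(nearest)
        grid.append((t, slope * t + intercept))
        k += 1
    return grid
```

Once every line is resampled with the same `step`, waveforms from different lines can be compared point by point.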

(*) Method of least squares
A method for obtaining an approximating straight line: a mathematical procedure that fits, to a set of points (x, y), the straight line that best represents their trend. Given n data points (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ), the coefficients a and b of the line y = ax + b that best represents the trend of the set are given by Expression 2-1 and Expression 2-2.
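Expressions 2-1 and 2-2 are not reproduced in this text. For reference, the standard closed-form least-squares solution for the line y = ax + b is given below; that the patent's expressions use this same arrangement is an assumption.

```latex
a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
         {n \sum_{i=1}^{n} x_i^{2} - \left( \sum_{i=1}^{n} x_i \right)^{2}},
\qquad
b = \frac{\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i}{n}
```

These values minimize the sum of squared residuals \sum_{i} (y_i - a x_i - b)^2.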

“Model waveform creation” 19 uses the genotype grouping from “multiple-type waveform determination” 17 and the output of “time axis unification” 18 to create a model waveform for each genotype group from data on a unified time axis. For the samples judged “expression present” in expression level calculation 15, the median of the measured values at each time on the unified axis is taken, because the median is relatively stable against outliers (abnormally distant values; repeated measurements of the same material normally scatter around some value, but distant values occasionally occur).
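The pointwise median can be sketched as below. It is a minimal illustration assuming that all samples in a group already share one unified time axis and have passed the expression check; the function name is hypothetical.

```python
from statistics import median


def model_waveform(samples):
    """Pointwise median across samples that share one time axis."""
    return [median(column) for column in zip(*samples)]
```

With three samples `[10, 20, 30]`, `[12, 22, 500]`, and `[11, 21, 31]`, the outlier 500 at the last time point does not pull the model waveform away from the other two samples, which is the stated reason for using the median.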

Graph [B3] shows the cluster analysis produced by “multiple-type waveform determination” 17, and graphs [B4-1] and [B4-2] show the data with the model waveform added by “model waveform creation” 19. In graph [B3], the horizontal axis is time and the vertical axis shows each sample together with a dendrogram (tree diagram) of the similarity between samples. This cluster analysis uses a plotting method (heatmap) that combines the dendrogram joining samples by similarity with a pseudo-image, so that the timing at which a gene was up-regulated (activated) or down-regulated (inactivated) can be seen at a glance. The image represents the relative strength of each waveform by color; here, activity becomes stronger from blue through yellow to red. In graph [B4], creating the “model waveform” brings out waveform properties that are easily overlooked. In this example, the waveform was found to follow a “circadian rhythm” with a 12-hour period. These waveform features can be observed still more clearly with “waveform decomposition” 20, described below.

“Waveform decomposition” 20 decomposes and analyzes the “model waveform” created in “model waveform creation” 19. When features (number, position, height, half-width, and area of peaks, among others) are taken from a waveform, it is difficult to measure them under uniform conditions, because the waveforms of such experimental data are not regular like sound waves but have complicated shapes. For example, when taking a half-width, the boundary of how far the width extends around a peak is ambiguous. To express a complicated waveform as simpler ones, the waveform is therefore regarded as a composite of several known curves, and such curves are fitted to it. In other words, a single “model waveform” (a complicated curve) is decomposed, peak by peak, into known functions (simple curves such as a normal distribution).

The waveform to be decomposed is the “model waveform” created in “model waveform creation” 19. The analysis proceeds as follows. First, the waveform is smoothed by a moving average method, which suppresses the detection of spurious peaks. Next, the smoothed waveform is differentiated to locate the peaks. Using the peak data, a known curve is fitted to each peak with, for example, the DFP (Davidon-Fletcher-Powell) method, one of the gradient methods. Finally, the features (values) are taken from the individual decomposed curves.
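The first two steps of this flow (smoothing, then locating peaks from derivatives) can be sketched as follows. This is an illustration under assumptions rather than the patent's code: smoothing uses the 5-point quadratic polynomial (Savitzky-Golay) weights, peaks are taken as negative local minima of the discrete second difference, and the function names are hypothetical. The curve-fitting step is not shown.

```python
def smooth(values):
    """5-point quadratic polynomial smoothing (weights -3, 12, 17, 12, -3
    over 35). The two points at each end are left unchanged."""
    weights = (-3, 12, 17, 12, -3)
    out = list(values)
    for i in range(2, len(values) - 2):
        out[i] = sum(w * values[i + k - 2] for k, w in enumerate(weights)) / 35
    return out


def find_peaks(values):
    """Indices where the discrete second difference is negative and at a
    local minimum (that is, where the third difference changes sign)."""
    # d2[j] is the second difference centered on index j + 1
    d2 = [values[i - 1] - 2 * values[i] + values[i + 1]
          for i in range(1, len(values) - 1)]
    peaks = []
    for j in range(1, len(d2) - 1):
        if d2[j] < 0 and d2[j] < d2[j - 1] and d2[j] <= d2[j + 1]:
            peaks.append(j + 1)
    return peaks
```

In use, `find_peaks(smooth(model))` gives the candidate peak positions that seed the curve fit. The smoothing window is fixed at 5 points here for brevity.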

These steps are described in detail below. First, because actual observed waveforms contain sharply jagged parts that correspond to noise, a moving average method is used to remove (correct) the noisy values. Large noise has already been removed by measurement error correction 14; this step targets finer noise, such as that caused by the measurement environment or the measuring instrument. The moving average method is the most commonly used noise-removal approach and several variants have been proposed; of these, the polynomial fitting method is used. The number of smoothing points depends on the data, but 5 to 9 points gave good results. Next, the position and number of peaks are determined by differentiating the waveform. The waveform after noise removal by the moving average method (hereinafter, the original waveform) is differentiated twice and three times. A peak of the original waveform lies where the second-derivative curve has a local minimum and the third-derivative curve is zero, which fixes the positions and the number of peaks. Finally, each peak of the waveform is assumed to be expressible by a specific curve function (here a Lorentzian waveform, chosen from the properties of the observed waveforms), and this curve is fitted. The Lorentzian waveform is given by Equation 3.
Here, ν is the wavenumber, h is the peak height, u is the wavenumber of the peak, and w is the half-width. Using the peak information obtained above, Lorentzian waveforms are fitted to the original waveform, and the Lorentzian waveforms (hereinafter, decomposition curves) that minimize the error between the two curves are calculated.
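Equation 3 is not reproduced in this text. A common form of the Lorentzian that is consistent with the symbol definitions above is given below. It assumes that the half-width w means the full width at half maximum; the patent's own expression may be parameterized differently.

```latex
f(\nu) = \frac{h}{1 + \dfrac{4(\nu - u)^{2}}{w^{2}}},
\qquad
f(u) = h,
\qquad
f\!\left(u \pm \tfrac{w}{2}\right) = \tfrac{h}{2}
```

Under the same assumption, fitting K decomposition curves to the original waveform y means choosing (h_k, u_k, w_k) to minimize \sum_{i} \bigl( y(\nu_i) - \sum_{k=1}^{K} f_k(\nu_i) \bigr)^{2}.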

Finally, the required information (number, position, height, half-width, and area of peaks, among others) is taken from the individual decomposition curves. With “waveform decomposition” 20, every peak-related value is obtained under the same conditions and the peaks are represented clearly, so waveform properties that were previously overlooked can be found.

“Profile creation” 21 stores the per-line features identified by waveform decomposition 20 as a profile in the profile table 34, keyed to the line table 31. At present, the data stored as a profile are: ID, number of peaks, peak positions (times) of the original waveform, peak values of the original waveform, peak positions (times) of the decomposition curves, peak values of the decomposition curves, half-widths, area of the original waveform, and total area of the decomposition curves (= the composite waveform, the sum of the decomposition curves).
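The part of a profile record that comes from the decomposition curves can be sketched as below. The field names and the `Peak` type are hypothetical, the Lorentzian is written with w taken as the full width at half maximum (an assumption), and the area uses the closed form π·h·w/2 for a Lorentzian integrated over the whole axis rather than over the measured time range.

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    position: float  # time of the peak (u)
    height: float    # peak height (h)
    width: float     # full width at half maximum (w), an assumption


def lorentz(t, peak):
    """Value of one decomposition curve at time t."""
    return peak.height / (1.0 + 4.0 * (t - peak.position) ** 2 / peak.width ** 2)


def composite(t, peaks):
    """Composite waveform: the sum of all decomposition curves at time t."""
    return sum(lorentz(t, p) for p in peaks)


def build_profile(line_id, peaks):
    """Collect per-line features of the decomposition curves."""
    return {
        "id": line_id,
        "peak_count": len(peaks),
        "positions": [p.position for p in peaks],
        "heights": [p.height for p in peaks],
        "half_widths": [p.width for p in peaks],
        # area of each Lorentzian over the whole axis is pi * h * w / 2
        "total_area": sum(math.pi * p.height * p.width / 2 for p in peaks),
    }
```

The features of the original waveform (its own peak positions, peak values, and area) would be added to the same record from the smoothed data.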

“Graph creation” 22 receives the above results and creates graphs. To the graph of the data after data processing 6, it adds the model waveform and decomposition curves: for each of the multiple waveforms when multiple-type waveform determination 17 found them, and otherwise for the original waveform. For a line in which no expression was found by presence of expression 16, only the graph of the data after data processing 6 is created. The display is divided at fixed time intervals so that growth conditions can be checked and gene expression following a circadian rhythm can be observed easily. The created graphs are stored in the file list table 32, keyed to the information already registered in the line table 31.

The graphs after “waveform decomposition” 20 are shown as graphs [B6-1] and [B6-2] (there are two because “multiple-type waveform determination” 17 found that the waveform differs by genotype). The X axis is time and the Y axis is gene activity. Among the curves, Original (thick red or blue line) is the model waveform, the Separate curves are the decomposition curves of that model waveform, and Sum is the curve obtained by recombining the decomposition curves (their sum). The more closely the Original and Separate waveforms resemble each other, the better the accuracy of the decomposition curves. “Waveform decomposition” 20 not only gives accurate half-widths; in this example it also shows that the “circadian rhythm” continues beyond 96 hours, which had previously been overlooked.

“Waveform comparison/classification” 23 performs cluster analysis across different lines to compare similarity and classify the lines. The lines to be compared can be selected by condition using the information in the line table 31 (for example, “lines whose light condition is 4 days of light/dark cycles followed by 3 days of continuous light”). Specifically, the line table 31 and the model waveform table 35 are used to retrieve the “model data” matching the conditions for each line. As in “multiple-type waveform determination” 17, the timing of expression matters more than the amount when comparing lines, so the “model data” are weighted to make the total waveform area equal, and the weighted data are analyzed. The cluster analysis uses a heatmap, and the calculation methods and all other settings are the same as in “multiple-type waveform determination” 17.

The graph after “waveform comparison/classification” 23 is shown as graph [B7]. The horizontal axis is time, and the vertical axis shows the “model data” of each line together with a dendrogram (tree diagram) of their similarity. This makes the relationships between genes easier to infer. In this example, the genes fall broadly into those that are “activated once for a certain period and inactive thereafter” and those that “repeat activation and inactivation at fixed intervals”. On closer inspection, differences in timing may also make it possible to find genes that promote the activation or inactivation of another gene.

“Waveform comparison/classification” 24 is a branch that selects between an ordinary search and waveform comparison/classification 23. In an ordinary search, the results produced by waveform analysis 5 as described above are retrieved from the DB. The search covers all of the line table 31, the file list table 32, the waveform analysis list table 33, the profile table 34, and the model waveform table 35, and the level of information provided, from summary to detail, can be chosen according to the user's needs. The lines to be viewed can also be narrowed down by condition. To perform waveform comparison and classification, processing proceeds to waveform comparison/classification 23.

“Search” 25 selects the operation to be performed. The user chooses whether to run a search or “waveform comparison/classification” 23 and sets the detailed conditions.

“Screen output” 26 presents the information analyzed so far. The lines shown, the level of detail, and the retrieval of graphs can all be chosen by the user.

The advantages of using the analysis system according to the present invention are described below.

With conventional analysis methods, all that could be done was to organize and compute the numerical data from the measuring instrument by hand in spreadsheet software such as Excel and to draw graphs. When features are extracted from a graph by hand, visual errors and missed detections occur, and each operator's subjectivity enters the data. When such data are acquired in large quantities, manual processing has the following disadvantages:
・It takes time
・It takes labor
・Errors arise in the numerical values
・The measurer's subjectivity enters
・Graph features are missed
・Graph features cannot be quantified
As a result, there were the following problems:
・Grasping the features of each line is difficult (or impossible)
・Comparing and classifying lines is difficult (or impossible)
・Processing large amounts of experimental data is difficult (or impossible)

In the analysis system according to the present invention, all of the above functions are systematized, which addresses the problems of the conventional analysis method as follows:
・It takes time → greatly reduced
・It takes labor → greatly reduced
・Errors arise in the numerical values → greatly reduced
・The measurer's subjectivity enters → fully objective
・Features are missed → no missed detections
・Features cannot be quantified → fully quantified (quantitative)
In this way, the problems can be solved all at once.

In addition, the profile captures the features of each line quantitatively. This enables comparison and classification between lines, and also makes it possible to infer the relationships between the genes responsible for these expression patterns.

Applying the analysis system according to the present invention also provides the following advantages:
・When a large amount of time-series expression data is obtained, lines can be compared and classified using objectively analyzed, quantitative data.
・Because waveform decomposition views one waveform as several waveforms, features stand out more clearly. In practice, it produced results such as the discovery of “circadian gene expression” that could not be found by conventional methods.
・It can also be used when observing gene expression over time with microarrays, cell arrays, and the like.
・The order and mode of expression between genes can be inferred, which contributes to the life sciences.

The analysis method of the present invention does not depend on the experimental technique. It can be applied to detailed statistical analysis aimed at verifying similarity for any subject that draws a continuous waveform, such as time-series gene expression analysis.

A diagram schematically showing the configuration of the analysis system according to the present invention.
A diagram explaining the processing A1 → A2 → A3 of “data 10” output from the measuring instrument (containing time and measured value = gene expression level) by the numerical data processing operations “elapsed time conversion” 13 and “measurement error correction” 14 in the “data processing” 6 stage of the analysis method of the present invention.
A diagram explaining “data 10” (B1), which is processed by the numerical data processing operations “elapsed time conversion” 13 and “measurement error correction” 14 in the “data processing” 6 stage of the analysis method of the present invention, and the “primary processed data” (B2) used after processing for “expression level calculation” 15.
Graph [A2], which plots together the change over time of the “luciferase enzyme activity” of each sample after “data 10” output from the measuring instrument has undergone “elapsed time conversion” 13.
Graph [A3], which plots together the “primary processed data” of each sample, obtained by processing “data 10” output from the measuring instrument with the numerical data processing operations “elapsed time conversion” 13 and “measurement error correction” 14 in the “data processing” 6 stage.
Graph [B2], which plots together the change over time of the “luciferase enzyme activity” of each sample, normalized by the integral of that sample's “luciferase enzyme activity”, as used for “cluster analysis”.
Graph [B3], which shows the dendrogram produced by “cluster analysis” according to the classification, together with the change over time of the normalized “luciferase enzyme activity” of each sample in heatmap form.
Graph [B4-1], which plots the “model waveform” created for the samples belonging to the upper of the two groups into which the samples were broadly divided by the “cluster analysis” shown in graph [B3].
Graph [B4-2], which plots the “model waveform” created for the samples belonging to the lower of the two groups into which the samples were broadly divided by the “cluster analysis” shown in graph [B3].
Graph [B5], which plots side by side the “model waveform” created for the upper group shown in graph [B4-1] and the “model waveform” created for the lower group shown in graph [B4-2].
Graph [B6-1], which plots side by side, for the upper group shown in graph [B4-1], the smoothed “model waveform”, the multiple “decomposition curves” obtained by “waveform decomposition”, and the “composite waveform” obtained by combining those decomposition curves.
Graph [B6-2], which plots side by side, for the lower group shown in graph [B4-2], the smoothed “model waveform”, the multiple “decomposition curves” obtained by “waveform decomposition”, and the “composite waveform” obtained by combining those decomposition curves.
Graph [B7], which shows the dendrogram resulting from “waveform comparison/classification” (clustering) across multiple “lines”, using the “model waveform” of the change over time of “luciferase enzyme activity” created for each “line”, together with those model waveforms in heatmap form.

Claims

An analysis method for feature extraction from time-series gene expression level data in individual organisms derived from the same "line", and for comparative classification between individuals based on the features of those data, wherein
the time-series gene expression level data for each individual organism are data observed as the change over time in "luciferase enzyme activity", using a luciferase gene as a "reporter gene" whose expression level can be monitored non-destructively, and
the analysis method comprises:
a "measurement error correction" step of determining whether "measurement error" data caused by the measuring instrument are mixed into the time-series numerical data of "luciferase enzyme activity" obtained from the measuring instrument and, where they are, correcting the "measurement error" data to an "estimated value" based on the numerical data before and after them;
a "multiple-type waveform discrimination" step of comparing the numerical data of the time-series changes in "luciferase enzyme activity" for multiple individual organisms derived from the same "line", performing hierarchical clustering according to the degree of similarity, and performing a "cluster analysis" operation that forms at least one or two groups each containing multiple individuals;
a "time axis unification" step of converting, on the basis of the numerical data of the time-series change in "luciferase enzyme activity" for each individual organism derived from the same "line", those data into numerical data of the time-series change in the "luciferase enzyme activity" predicted to be observed in that individual at desired elapsed times measured from the measurement start time, thereby unifying the time axis across the numerical data of the individuals;
a "model waveform creation" step of using the time-axis-unified numerical data of each individual to create, for a group of multiple individuals determined by the "multiple-type waveform discrimination" to belong to the same group, a "model waveform" showing the time-series change in a representative value of "luciferase enzyme activity" that reflects the similarity of the time-series changes within that group; and
a "waveform decomposition" step of superimposing, on the basis of the "model waveform" showing the time-series change in the representative value of "luciferase enzyme activity" for the group of multiple individuals determined to belong to the same group, a plurality of "unimodal waveform functions" to create a "composite waveform" that approximately reproduces the waveform features of the "model waveform", and determining the peak position, peak height, and half width of each "unimodal waveform function".
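
As a non-limiting illustration of the "measurement error correction" and "time axis unification" steps, the following Python sketch may help. The claim does not fix how errors are detected or how predicted values are computed, so several points here are assumptions: erroneous samples are taken to be already flagged, the "estimated value" is taken as the mean of the nearest valid samples before and after, and prediction at the desired elapsed times uses linear interpolation. The function names `correct_measurement_errors` and `unify_time_axis` are hypothetical.

```python
from bisect import bisect_left


def correct_measurement_errors(values, is_error):
    """Replace flagged samples with the mean of the nearest valid neighbours.

    Assumption: `is_error` already marks the instrument-caused samples.
    """
    corrected = list(values)
    n = len(values)
    for i in range(n):
        if not is_error[i]:
            continue
        # Nearest unflagged sample before and after index i (if any).
        prev_i = next((j for j in range(i - 1, -1, -1) if not is_error[j]), None)
        next_i = next((j for j in range(i + 1, n) if not is_error[j]), None)
        neighbours = [values[j] for j in (prev_i, next_i) if j is not None]
        if neighbours:
            corrected[i] = sum(neighbours) / len(neighbours)
    return corrected


def unify_time_axis(times, values, grid):
    """Linearly interpolate one individual's series onto a shared time grid.

    Assumption: `times` is strictly increasing and measured from the
    measurement start time; grid points outside the range are clamped.
    """
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            k = bisect_left(times, t)  # times[k - 1] < t <= times[k]
            t0, t1 = times[k - 1], times[k]
            v0, v1 = values[k - 1], values[k]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


raw = [10.0, 12.0, 500.0, 16.0]
flags = [False, False, True, False]
clean = correct_measurement_errors(raw, flags)
on_grid = unify_time_axis([0.0, 0.5, 1.0, 1.5], clean, [0.0, 0.75, 1.5])
```

Resampling every individual onto the same grid is what allows values from different individuals to be compared time point by time point in the later steps.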

The method for analyzing time-series gene expression level data according to claim 1, further comprising a "waveform comparison / classification" step of using the "model waveform" created in the "model waveform creation" step for the group of organisms derived from each "line" to compare the "model waveforms" of the organism groups derived from multiple "lines", performing hierarchical clustering according to the degree of similarity, and performing a "cluster analysis" operation that forms at least one or two groups each containing multiple "lines".

The method for analyzing time-series gene expression level data according to claim 1, wherein the "unimodal waveform function" used in the "waveform decomposition" step is a Lorentzian-type waveform function.
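
For illustration only, a Lorentzian-type unimodal waveform function and the "composite waveform" formed by superimposing several of them can be sketched as follows. This sketch assumes that "half width" means the full width at half maximum; the function names and the example peak parameters are hypothetical, and the fitting procedure that determines the parameters is not shown.

```python
def lorentzian(t, position, height, half_width):
    """Single-peaked Lorentzian evaluated at time t.

    Assumption: `half_width` is the full width at half maximum, so the
    value falls to height / 2 at position +/- half_width / 2.
    """
    gamma = half_width / 2.0
    return height / (1.0 + ((t - position) / gamma) ** 2)


def composite_waveform(t, peaks):
    """Sum of Lorentzians; `peaks` is a list of (position, height, half_width)."""
    return sum(lorentzian(t, p, h, w) for (p, h, w) in peaks)


# Hypothetical parameters: two peaks, 24 time units apart.
peaks = [(24.0, 100.0, 8.0), (48.0, 60.0, 8.0)]
at_first_peak = composite_waveform(24.0, peaks)
```

Each peak is fully described by the three parameters named in claim 1 (peak position, peak height, half width), which is why these values can serve as features of the "model waveform".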

The method for analyzing time-series gene expression level data according to claim 1, wherein, in the "model waveform creation" step, for a group consisting of multiple individuals determined to belong to the same group, the central value (median) of the "luciferase enzyme activity" at each time point within the group is selected as the representative value of "luciferase enzyme activity" that reflects the similarity of the time-series changes in "luciferase enzyme activity" within that group.
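
A minimal sketch of this selection, assuming the "central value" is the statistical median and that every individual's series has already been placed on the same time axis, might look as follows. The function name `model_waveform` and the sample values are hypothetical.

```python
from statistics import median


def model_waveform(group):
    """Per-time-point median across the individuals of one group.

    `group` is a list of series, one per individual, all sampled on the
    same unified time axis (an assumption of this sketch).
    """
    if len({len(series) for series in group}) != 1:
        raise ValueError("all series must share the same time axis")
    # zip(*group) yields, for each time point, the values of all individuals.
    return [median(values_at_t) for values_at_t in zip(*group)]


# Three hypothetical individuals; the third time point contains an outlier.
group = [
    [1, 5, 9],
    [2, 4, 100],
    [3, 6, 8],
]
waveform = model_waveform(group)
```

Unlike a mean, the median at the third time point is not pulled toward the outlying value 100, which is one plausible reason for choosing it as the representative value.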

The method for analyzing time-series gene expression level data according to claim 1, wherein, in the "multiple-type waveform discrimination" step, when the numerical data of the time-series changes in "luciferase enzyme activity" for multiple individual organisms derived from the same "line" are compared and hierarchically clustered according to the degree of similarity, the similarity between the numerical data of any two individuals is expressed as the "distance" between those data, calculated using an inter-row matrix calculation method.

The method for analyzing time-series gene expression level data according to claim 6, wherein the Manhattan method is used as the inter-row matrix calculation method, and the Ward method is used as the linkage method for clustering based on the "distance" between the numerical data of the time-series changes in "luciferase enzyme activity".
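
By way of a hedged example, the combination of Manhattan distance and Ward linkage can be sketched in plain Python as below. The sketch assumes that the Ward update (in its Lance-Williams form) is applied directly to the Manhattan distances and that clustering stops at a requested number of groups; the claims do not specify either detail. The function names are hypothetical.

```python
def manhattan(a, b):
    """Manhattan (city-block) distance between two equally long series."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_clusters(series, n_groups):
    """Merge rows agglomeratively until `n_groups` clusters remain.

    Returns the clusters as sorted lists of row indices.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    clusters = {i: [i] for i in range(len(series))}
    # Distance table keyed by (smaller id, larger id).
    dist = {(i, j): manhattan(series[i], series[j])
            for i in clusters for j in clusters if i < j}
    next_id = len(series)
    while len(clusters) > n_groups:
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        new_dist = {}
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            d_ki = dist[(min(k, i), max(k, i))]
            d_kj = dist[(min(k, j), max(k, j))]
            # Lance-Williams update for Ward linkage.
            new_dist[(k, next_id)] = (
                (ni + nk) * d_ki + (nj + nk) * d_kj - nk * d_ij
            ) / (ni + nj + nk)
        # Drop distances involving the merged clusters, add the new ones.
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Four hypothetical individuals: two low-activity and two high-activity series.
series = [[1, 2, 3], [1, 2, 4], [10, 10, 10], [11, 10, 9]]
groups = ward_clusters(series, 2)
```

In this example the two low-activity series and the two high-activity series end up in separate groups, corresponding to the formation of "one or two groups" each containing multiple individuals.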

An analysis system usable for feature extraction from time-series gene expression level data in individual organisms derived from the same "line", and for comparative classification between individuals based on the features of those data, wherein
the analysis system comprises, as mechanisms for performing analysis in accordance with the method for analyzing time-series gene expression level data according to claim 1:
a "measurement error correction" mechanism that determines whether "measurement error" data caused by the measuring instrument are mixed into the time-series numerical data of "luciferase enzyme activity" obtained from the measuring instrument and, where they are, corrects the "measurement error" data to an "estimated value" based on the numerical data before and after them;
a "multiple-type waveform discrimination" mechanism that compares the numerical data of the time-series changes in "luciferase enzyme activity" for multiple individual organisms derived from the same "line", performs hierarchical clustering according to the degree of similarity, and performs a "cluster analysis" operation that forms at least one or two groups each containing multiple individuals;
a "time axis unification" mechanism that converts, on the basis of the numerical data of the time-series change in "luciferase enzyme activity" for each individual organism derived from the same "line", those data into numerical data of the time-series change in the "luciferase enzyme activity" predicted to be observed in that individual at desired elapsed times measured from the measurement start time, thereby unifying the time axis across the numerical data of the individuals;
a "model waveform creation" mechanism that uses the time-axis-unified numerical data of each individual to create, for a group of multiple individuals determined by the "multiple-type waveform discrimination" to belong to the same group, a "model waveform" showing the time-series change in a representative value of "luciferase enzyme activity" that reflects the similarity of the time-series changes within that group; and
a "waveform decomposition" mechanism that superimposes, on the basis of the "model waveform" showing the time-series change in the representative value of "luciferase enzyme activity" for the group of multiple individuals determined to belong to the same group, a plurality of "unimodal waveform functions" to create a "composite waveform" that approximately reproduces the waveform features of the "model waveform", and determines the peak position, peak height, and half width of each "unimodal waveform function".

The system for analyzing time-series gene expression level data according to claim 7, further comprising a "waveform comparison / classification" mechanism that uses the "model waveform" created by "model waveform creation" for the group of organisms derived from each "line" to compare the "model waveforms" of the organism groups derived from multiple "lines", performs hierarchical clustering according to the degree of similarity, and performs a "cluster analysis" operation that forms at least one or two groups each containing multiple "lines".