JP2005149037A

JP2005149037A - Method, apparatus and program for estimating gene expression action

Info

Publication number: JP2005149037A
Application number: JP2003384457A
Authority: JP
Inventors: Hiroya Nobori; 博也昇; Yoshiki Miura; 佳樹三浦; Shigeyuki Mitsui; 重之三井
Original assignee: Mitsubishi Space Software Co Ltd
Current assignee: Mitsubishi Space Software Co Ltd
Priority date: 2003-11-14
Filing date: 2003-11-14
Publication date: 2005-06-09
Anticipated expiration: 2023-11-14
Also published as: JP4459597B2

Abstract

<P>PROBLEM TO BE SOLVED: To obtain a method for estimating interaction of gene expression in which an initial value of a suitable coefficient can be easily obtained and the volume of calculation is reduced. <P>SOLUTION: The method for preparing a gene model and estimating gene expression action on the basis of time-sequence data of a wild strain composed of a gene group which is not operated and destroyed strain obtained by destroying specific genes is provided with individual gene expression coefficient estimation steps (S17-S20) in which a loop for repeating a procedure for estimating a generation coefficient for simultaneously satisfying the time-sequence data of the wild strain and the specific destroyed strain in the gene model by the number of individual genes to be estimated which are included in the wild strain is included, and gene expression coefficient matrix estimation steps (S16-S20) in which a repeating loop for successively performing the individual gene expression coefficient estimation steps by extracting succeeding destroyed strain after finishing the estimation of the specific destroyed strain by the prescribed number of destroyed strains is included. An initial value for estimating a gene expression coefficient matrix is obtained through the two steps. <P>COPYRIGHT: (C)2005,JPO&NCIPI

Description

本発明は、複数の遺伝子間において、個々の遺伝子が他の遺伝子から時間的に影響を受けて変化する、その変化作用量を推測する方法に関するものである。 The present invention relates to a method for estimating the amount of change action, in which each gene changes by being influenced by time from other genes among a plurality of genes.

従来、こうした複数の遺伝子群において、個々の遺伝子が他の遺伝子から影響を受ける様子、つまり遺伝子相互作用を推定する方法として、閾値検定、統計的手法を用いたＢａｙｅｓｉａｎネットワーク、二項関係を用いたブーリアンネットワーク、更には微分方程式を用いる方法等、各種の技術がある。
なかでも同値類のフィードバック制御が同定可能な微分方程式モデルを使用した推定方法がよく知られている。 Conventionally, in such a plurality of gene groups, individual genes are influenced by other genes, that is, as a method for estimating gene interaction, a threshold test, a Bayesian network using a statistical method, and a binomial relationship are used. There are various technologies such as a Boolean network and a method using a differential equation.
In particular, an estimation method using a differential equation model that can identify feedback control of an equivalence class is well known.

なお、出願人は、出願時点で先行技術文献を知得していない。 The applicant has not obtained prior art documents at the time of filing.

従来の方法は、推定のために各遺伝子の作用係数に初期値を与えて推定を開始するが、その初期値の与え方に関して適切な方法を示していないので、最適方法を推定するまでに非常に時間がかかるという課題がある。
また計算量が膨大になり、または繰返し推定の中で数値の発散を防ぐために制限を加える必要がある、そのため遺伝子数を制限する必要がある、などの課題もある。 In the conventional method, the initial value is given to the coefficient of action of each gene for estimation, but the estimation is started. There is a problem that it takes time.
In addition, there is a problem that the amount of calculation becomes enormous, or that it is necessary to add a restriction in order to prevent the divergence of numerical values in iterative estimation, and therefore it is necessary to limit the number of genes.

この発明は上記の課題を解決するためになされたもので、適切な係数の初期値が容易に得られ、かつ計算量を抑えた遺伝子発現相互作用推定方法を得る。 The present invention has been made in order to solve the above-described problems, and provides a gene expression interaction estimation method in which an initial value of an appropriate coefficient can be easily obtained and a calculation amount is suppressed.

この発明に係る遺伝子発現作用推定方法は、遺伝子モデルを作成し、操作されていない遺伝子群からなる野生株と特定遺伝子を破壊した破壊株の時間列データを基に遺伝子発現作用を推定する方法において、
遺伝子モデルに対し、野生株と特定の破壊株との時間列データを同時に満足する発現係数を推定する手順を野生株に含まれる推定対象個別遺伝子の数だけ繰返すループとして持つ個別遺伝子への発現係数推定ステップと、
特定の破壊株との推定を終えると、次の破壊株を取上げて個別遺伝子への発現係数推定ステップを繰返すことを、所定数の破壊株について順次行う、繰り返しループとして持つ遺伝子発現係数行列推定ステップとを備えて、
上記２つのステップを経て、遺伝子発現係数行列の推定初期値を得るようにした。 The gene expression action estimation method according to the present invention is a method of creating a gene model and estimating a gene expression action based on time series data of a wild strain consisting of an unmanipulated gene group and a disrupted strain in which a specific gene is destroyed. ,
Expression coefficient for individual genes with a loop that repeats the number of individual genes to be estimated included in the wild strain for the gene model to estimate the expression coefficient that simultaneously satisfies the time series data of the wild strain and the specific disrupted strain An estimation step;
When the estimation with a specific disrupted strain is completed, the next step is to take the next disrupted strain and repeat the expression coefficient estimation step for individual genes. And with
Through the above two steps, an estimated initial value of the gene expression coefficient matrix was obtained.

野生株と破壊株とのタイムコース値の組合わせから遺伝子間相互作用行列を初期値推定するようにしたので、推定収束までの時間が短縮され、かつ安定で妥当な推定ができる効果がある。 Since the initial value of the gene interaction matrix is estimated from the combination of the time course values of the wild strain and the disrupted strain, the time until the convergence of the estimation is shortened and there is an effect that stable and reasonable estimation can be performed.

実施の形態１．
遺伝子の相互作用、または他の遺伝子から受ける影響度を推定するために、最小二乗法による準最適化手法を適用して初期値を得て、またその初期値に対して線形微分方程式モデルによる準最適化を適用して、制約を少なくして推定所要時間を短縮する方法を説明する。
本実施の形態において、線形微分方程式モデルとして次の（１）式を設定する。 Embodiment 1 FIG.
In order to estimate gene interaction or the degree of influence from other genes, a semi-optimization method using the least square method is applied to obtain an initial value, and a quasi-optimization using a linear differential equation model is applied to the initial value. A method for reducing the constraints and reducing the estimated time by applying optimization will be described.
In the present embodiment, the following equation (1) is set as a linear differential equation model.

ここでｘは遺伝子発現量ベクトル、Ａは各遺伝子間の相互作用を示す係数行列、ｗは不確定要素によるバイアス係数ベクトルである。不確定要素に関しては、時間の多項式関数とする場合もあるが、ここではバイアスモデルとして記述する。また上記（１）式から、最小二乗法では遺伝子発現量ベクトルを（１）式の両辺を積分した次の（２）式とする。 Here, x is a gene expression level vector, A is a coefficient matrix indicating an interaction between genes, and w is a bias coefficient vector based on uncertain elements. The uncertain element may be a polynomial function of time, but is described here as a bias model. Further, from the above equation (1), in the least square method, the gene expression level vector is represented by the following equation (2) obtained by integrating both sides of the equation (1).

ｘ₀ は遺伝子発現量初期値（初期値ベクトル）、ｔは時間を示す。
また最小二乗法は、遺伝子発現量ベクトル（遺伝子ベクトル）ｘの一つの遺伝子に対し、時間で変化するタイムコースデータを基に次の（３）式で各遺伝子Ｙが変化すると設定する。そうしてその時間変化において、他の遺伝子からの相互作用を示す行列Ｂと、Ｗの推定を行う。
Ｙ＝Ｙ₀ ＋Ｂ・Ｘ＋Ｗ・ｔ（３）
ここでＹは対象遺伝子群中における一つの遺伝子のタイムコースデータであり、Ｘは個別の遺伝子Ｙの集合体としての遺伝子群で、遺伝子発現量ｘの時間積分値である。またＹ₀ はその初期値である。
こうして設定したモデルに基づく推定方法をフローにしたものを図１と図２に示す。またこの動作を展開する計算回路構成の例を図３に示す。図３において、推定を行うための回路構成として、実際のデータを読込む入力装置５１、本装置の解析結果である、遺伝子間の影響度をパスウェイとして得て、このパスウェイを表示する表示装置５２と、この入出力間で本処理を行うハードウェア構成の遺伝子発現作用推定装置からなる。この遺伝子発現作用推定装置は、中央処理装置を有する計算機で実現できるものである。遺伝子発現作用推定装置の詳細構成として、以下の構成要素を持っている。入力側から順に、タイムコース・データ入力処理部４１、これで得たデータを補間する補間処理部４２、補間結果から正規化する正規化処理部４３、モデルの係数初期値を推定する初期値推定部４４、得られた初期値に基づいて相互作用係数を推定する相互作用係数推定処理部４５、更に最適化処理をする係数最適化処理部４６、入力のタイムコース・データを記憶するタイムコース・データ記憶部４８、モデルの数式を表現する遺伝子発現モデルを記憶する解析手法記憶部４７、得られた遺伝子間の影響を図式化するパスウェイ描画処理部４９で、それぞれ対応プログラムを持ったハードウェアである。あるいは、上記各部は、プログラムのみにより実現されてもよいし、ハードウェアのみにより実現されてもよいし、プログラムとハードウェアとの組み合わせにより実現されてもよい。 x ₀ is the gene expression amount initial value (initial value vector), t denotes time.
The least square method is set so that each gene Y changes in the following equation (3) based on the time course data that changes with time for one gene of the gene expression level vector (gene vector) x. Then, in the time change, the matrix B indicating the interaction from other genes and W are estimated.
Y = Y ₀ + B · X + W · t (3)
Here, Y is time course data of one gene in the target gene group, and X is a gene group as an aggregate of individual genes Y, and is a time integral value of the gene expression level x. Y ₀ is the initial value.
FIGS. 1 and 2 show a flow of the estimation method based on the model set in this way. FIG. 3 shows an example of a calculation circuit configuration for developing this operation. In FIG. 3, as a circuit configuration for estimation, an input device 51 that reads actual data, a display device 52 that obtains the degree of influence between genes, which is an analysis result of this device, as a pathway, and displays this pathway. And a gene expression action estimation device having a hardware configuration for performing this processing between the input and output. This gene expression action estimation apparatus can be realized by a computer having a central processing unit. The detailed configuration of the gene expression action estimation apparatus has the following components. In order from the input side, a time course / data input processing unit 41, an interpolation processing unit 42 for interpolating the data obtained thereby, a normalization processing unit 43 for normalizing from the interpolation result, an initial value estimation for estimating the coefficient initial value of the model Unit 44, an interaction coefficient estimation processing unit 45 that estimates an interaction coefficient based on the obtained initial value, a coefficient optimization processing unit 46 that performs further optimization processing, and a time course that stores input time course data. A data storage unit 48, an analysis method storage unit 47 that stores a gene expression model that expresses a mathematical expression of the model, and a pathway drawing processing unit 49 that graphically illustrates the effect between the obtained genes, each with hardware having a corresponding program is there. Or each said part may be implement | achieved only by a program, may be implement | achieved only by hardware, and may be implement | achieved by the combination of a program and hardware.

この計算機による推定方法を図１と図２に基づいて説明する。
先ずタイムコースデータである実験値により着目する個別の遺伝子Ｙの時間変化データを読み込む。図４は実験等で得られる観測値であり、図において、左端の列は説明のためのインデックスであり、図では１から２７まで記されている。またこの図では説明を簡単にするために遺伝子を４個（Ｇ０１，Ｇ０２，Ｇ０３，Ｇ０４）の例を示している。インデックス５ないし７は操作されていない自然の個別遺伝子からなる遺伝子群を野生株として、その野生株の時間変化データを示している。そして左端の列が時刻、この例では３つの時刻（１．１，１．２，１．３）を表し、次の列（１．１１，１．２１，１．３１）は、遺伝子Ｇ０１の各左側時刻におけるタイムコース値で、次の１．１２ないし１．３２は、遺伝子Ｇ０２の同じくサンプル時刻のタイムコース値、同様に１．１３ないし１．３３はＧ０３のタイムコース値、１．１４ないし１．３４はＧ０４のタイムコース値である。更にインデックス１０ないし１２は、インデックス８に示すように、Ｇ０１遺伝子を破壊した破壊株の３つの時刻（１．１，１．２，１．３）におけるタイムコース値で、２．１１，２．１２，２．１３，２．１４の各下への列が、それぞれ遺伝子Ｇ０１，Ｇ０２，Ｇ０３，Ｇ０４の時刻変化値である。以下、インデックス１５ないし１７は遺伝子Ｇ０２を破壊した破壊株、２０ないし２２はＧ０３を破壊した破壊株、２５ないし２７はＧ０４を破壊した破壊株のタイムコース値である。これをタイムコース・データ入力処理部４１がＳ１１で行う。 An estimation method using this computer will be described with reference to FIGS.
First, the time change data of the individual gene Y of interest is read from the experimental value which is time course data. FIG. 4 shows observed values obtained by experiments and the like. In the figure, the leftmost column is an index for explanation, and 1 to 27 are shown in the figure. Further, in this figure, an example of four genes (G01, G02, G03, G04) is shown for easy explanation. Indexes 5 to 7 show time-change data of wild strains, where a gene group composed of natural individual genes that have not been manipulated is a wild strain. The leftmost column represents the time, in this example, three times (1.1, 1.2, 1.3), and the next column (1.11, 1.21, 1.31) represents the gene G01. The time course values at the left time are the following 1.12 to 1.32, the time course value of the sample time of the gene G02, 1.13 to 1.33 are the time course values of G03, 1.14 Thru 1.34 is the time course value of G04. Furthermore, as shown in index 8, indexes 10 to 12 are time course values at three times (1.1, 1.2, 1.3) of the disrupted strain in which the G01 gene is disrupted. The lower columns of 12, 2.13, and 2.14 are time change values of the genes G01, G02, G03, and G04, respectively. In the following, indexes 15 to 17 are the time course values of the disrupted strain in which the gene G02 is disrupted, 20 to 22 are the disrupted strain in which the G03 is disrupted, and 25 to 27 are the disrupted strains in which the G04 is disrupted. The time course / data input processing unit 41 performs this in S11.

Ｓ１２において、計算機はタイムコースデータを計算機処理のために絶対値が大きいデータで正規化し、補間処理部４２が行うＳ１３で、対数値を基に図４の経過時間の中間値をスプライン（平滑化）補間処理する。正規化処理部４３が行うＳ１４で、補間後の対数値データを元の物理量に戻す。
Ｓ１５では、サンプル時刻毎に各遺伝子のそれまでのデータを積分して行列Ｘと、またベクトルＹを生成する。Ｓ１５ではこの他に、生成された行列Ｘにおいて、推定しない相互作用の要素はゼロに固定し、またＸ行列、Ｙベクトルの有効桁を計算の精度と安定化のために、調整する。 In S12, the computer normalizes the time course data with data having a large absolute value for computer processing, and in S13 performed by the interpolation processing unit 42, the intermediate value of the elapsed time in FIG. 4 is splined (smoothed) based on the logarithmic value. ) Interpolate. In S14 performed by the normalization processing unit 43, the interpolated logarithmic value data is returned to the original physical quantity.
In S15, the matrix X and the vector Y are generated by integrating the data so far for each gene at each sampling time. In S15, in addition to this, in the generated matrix X, the elements of the interaction that are not estimated are fixed to zero, and the significant digits of the X matrix and the Y vector are adjusted for calculation accuracy and stabilization.

本実施の形態で特徴的なループを説明する。Ｓ１６では、Ｓ２１で囲まれる大きなループを形成している。この大きなループは、野生株の時間変化と、１つの破壊株（図４では、例えばインデックス８に注記したＧ０１を破壊した破壊株で、そのタイムコース値はインデックス１０ないし１２に示される）をループの１つとし、これを野生株と次の破壊株（Ｇ０２を破壊した破壊株）の組み、というように全組合わせによるＢとＷを推定して行くことを表している。
Ｓ１７とＳ２０で囲まれる小さなループは、例えばＧ０１を破壊した破壊株と野生株のタイムコース値からＢとＷを得ることを表している。つまりＳ１７で、先ず最初の野生株の遺伝子Ｇ０１を取り上げて、図４のインデックス５ないし７のタイムコース値と１０ないし１２のタイムコース値とから、Ｇ０１に注目して、Ｓ１８により最小二乗法により暫定的に（３）式の行列ＢとベクトルＷ内の遺伝子Ｇ０１に関する相互作用の係数を推定する。こうして図４の例では、インデックス５ないし７のタイムコース値と１０ないし１２のタイムコース値による推定では、遺伝子数が４つあるので４つのＢとＷが推定される。つまりこの小さなループでは、図４のインデックス１０ないし１２を用いた推定で、一回目のループではＧ０１に着目した推定をし、次の小さなループの２回目ではＧ０２に着目した推定を行い、三回目の小さなループにおいてはＧ０３に着目して推定し、最後に四回目の小さなループではｇ０４に着目した推定を行う。その結果、Ｓ１９で、得られたＢとＷ内の計数を用いて、（１）式の行列Ａとｗの各値を暫定的に推定する。 A loop characteristic of this embodiment will be described. In S16, a large loop surrounded by S21 is formed. This large loop loops over time of the wild strain and one disrupted strain (in FIG. 4, for example, a disrupted strain that has disrupted G01 noted in index 8 and its time course value is shown in indexes 10 to 12). This represents the estimation of B and W for all combinations, such as a combination of a wild strain and the next disrupted strain (destroyed strain that destroyed G02).
The small loop surrounded by S17 and S20 represents that B and W are obtained from the time course values of the disrupted strain and the wild strain that destroyed G01, for example. That is, in S17, the first wild-type gene G01 is picked up, and from the time course values of indexes 5 to 7 and 10 to 12 in FIG. 4, paying attention to G01, the least square method is obtained in S18. Provisionally, the coefficient of interaction regarding the gene G01 in the matrix B and the vector W in the equation (3) is estimated. Thus, in the example of FIG. 4, in the estimation using the time course values of the indexes 5 to 7 and the time course values of 10 to 12, there are four genes, so four B and W are estimated. That is, in this small loop, the estimation using the indexes 10 to 12 in FIG. 4 is performed. In the first loop, the estimation focusing on G01 is performed, and in the second small loop, the estimation focusing on G02 is performed. In the small loop, estimation is performed by paying attention to G03. Finally, in the fourth small loop, estimation is performed focusing on g04. As a result, in S19, using the obtained counts in B and W, the values of the matrices A and w in the equation (1) are temporarily estimated.

次いでＳ１６に戻って、タイムコースデータ中のその他の破壊株、図４の例ではインデックス１３，１８，２３に記された他の破壊株について、Ｓ１７ないしＳ１９のループを繰り返す。即ち大きなループでは、順次他の破壊株と野生株とを用いた推定に移ることを表している。この大きなループを含む二重ループの処理を行うのが初期値推定部４４である。
こうして順次破壊株を取り出して、それぞれ破壊株と野生株と同時に満足するＢとＷの推定を行う。その結果、全破壊株が全遺伝子に与える相互作用が推定できる。このとき遺伝子Ｙは、ループ中に含まれる遺伝子の発現量の時系列データを格納したベクトルとなる。またＸは遺伝子Ｙベクトルのタグ時刻における発現量の積分値を格納している。
Ｓ２１で全破壊株が及ぼす遺伝子のＢとＷの係数の推定が終わると、Ｓ２３で相互作用係数推定処理部４５が、破壊株の影響がない係数について、残りのＡ，ｗの係数を野生株のタイムコースデータを入力して最小二乗法により推定する。こうして係数行列Ａとベクトルｗが暫定的に準最適化用の初期値として確定する。 Next, the process returns to S16, and the loop of S17 to S19 is repeated for the other disrupted strains in the time course data (in the example of FIG. 4, the other disrupted strains indicated by indexes 13, 18, and 23). In other words, a large loop represents a shift to estimation using other disrupted strains and wild strains. The initial value estimation unit 44 performs processing of a double loop including this large loop.
In this way, the disrupted strains are sequentially taken out, and B and W that are satisfied simultaneously with the disrupted strain and the wild strain are estimated. As a result, the interaction that all the disrupted strains give to all genes can be estimated. At this time, the gene Y is a vector that stores time-series data of the expression level of the gene included in the loop. X stores the integrated value of the expression level of the gene Y vector at the tag time.
When the estimation of the B and W coefficients of the genes affected by all the disrupted strains is completed in S21, the interaction coefficient estimation processing unit 45 uses the remaining A and w coefficients as wild strains for the coefficients that are not affected by the disrupted strains in S23. The time course data is input and estimated by the method of least squares. In this way, the coefficient matrix A and the vector w are provisionally determined as initial values for semi-optimization.

Ｓ２４以降は、係数最適化処理部４６が最終的な係数を確定するための処理を行う。Ｓ２４では、Ｓ２３で暫定的に確定したｘのモデルについて、Ａとｗの各要素の係数を微小変動させた、２個以上の幾つかのモデル、例えば５個のモデルを線形微分方程式モデルとして生成する。
Ｓ２５では、確定までの演算時間を制限するために、最大試行回数を設定し、Ｓ２６では、Ｓ２４で生成した５個のモデルのうち、最初のモデルを取り出して、Ｓ２７とＳ２８の処理をする。Ｓ２７では、モデルの微分方程式を数値積分して、各個体の各遺伝子のタイムコース値を求める。
Ｓ２８で、上記積分で求めたタイムコース値と、対応するＳ１１での入力実験値とを比較して、推定相対誤差を求める。
Ｓ２９からＳ２６に戻って、Ｓ２４で生成した５個全てのモデルについて、Ｓ２７とＳ２８の処理を繰り返し、各モデルの推定相対誤差を求める。 After S24, the coefficient optimization processing unit 46 performs a process for determining the final coefficient. In S24, two or more models, for example, five models are generated as linear differential equation models by slightly varying the coefficients of the elements of A and w for the model of x tentatively determined in S23. To do.
In S25, the maximum number of trials is set in order to limit the calculation time until confirmation, and in S26, the first model is extracted from the five models generated in S24, and the processes of S27 and S28 are performed. In S27, the differential equation of the model is numerically integrated to obtain the time course value of each gene of each individual.
In S28, the time course value obtained by the integration is compared with the corresponding input experimental value in S11 to obtain an estimated relative error.
Returning from S29 to S26, the process of S27 and S28 is repeated for all five models generated in S24, and the estimated relative error of each model is obtained.

Ｓ３０で所定の誤差以下のモデルがあれば、そのモデルのＡ推定行列とｗベクトルであるとして推定を終える。
Ｓ３０でどのモデルも所定の誤差以下でなければ、Ｓ３１で５個のモデルのうち、最も誤差の少ないモデルを個体として選択し、再び５個のモデルを生成する。そしてＳ３２でその５個のモデルについてそれぞれ行列Ａとベクトルｗの異なる係数を微小変動させる。Ｓ３３では、最大試行回数内で所定の回数Ｓ３０で誤差が閾値以下にならなかった場合に、Ｓ３２の微小変動を小さくする。
Ｓ２５とＳ３４とで囲まれる試行ループにより、Ｓ３０で所定の誤差以下のモデルが得られる可能性が高く、誤差が少ないモデルが定まれば、そのモデルが表す行列Ａとベクトルｗが最終推定結果となる。 If there is a model having a predetermined error or less in S30, the estimation is finished assuming that the model is an A estimation matrix and a w vector.
If none of the models is equal to or smaller than the predetermined error in S30, the model having the smallest error among the five models is selected as an individual in S31, and five models are generated again. In S32, the different coefficients of the matrix A and the vector w are slightly changed for the five models. In S33, if the error does not become the threshold value or less in the predetermined number of times S30 within the maximum number of trials, the minute fluctuation in S32 is reduced.
If a model having a predetermined error or less is likely to be obtained in S30 by the trial loop surrounded by S25 and S34, and a model with a small error is determined, the matrix A and the vector w represented by the model are determined as the final estimation result. Become.

図５は、このように推定してＳ３０で誤差が少ないとして得た最終のモデルを推定値で表し、図４のタイムコース実験値とを比較した図である。
また図６は、推定の結果得られる、遺伝子間の相互作用、つまり他の遺伝子への影響関係を図式化したものである。この図はパスウェイ描画処理部４９で得られる。図１と図２による推定の結果、Ｂが定まり、遺伝子Ｃ３は、遺伝子Ｂ６からプラスの相互作用を受け、Ｂ７からマイナスの作用を受け、Ｃ２と組になってＣ１にプラス作用を与えることが推定される。
こうして遺伝子を微分方程式でモデル化し、野生株と破壊株とのタイムコース値の組合わせから微分方程式モデルの遺伝子間相互作用行列を初期値推定するようにしたので、推定収束までのシミュレーション回数が短縮され、かつ安定で妥当な推定ができる。
なお、図２と図３の各ステップを手順として持つプログラムを用意して、記録媒体またはオンラインロードにより他の計算機にインストールし、そのプログラムを計算機の中央処理装置により実行させて上記解析を行わせるようにしてもよい。 FIG. 5 is a diagram in which the final model obtained in this way and obtained with a small error in S30 is represented by an estimated value and compared with the time course experimental value of FIG.
FIG. 6 is a schematic diagram showing the interaction between genes, that is, the influence relationship to other genes, obtained as a result of estimation. This figure is obtained by the pathway drawing processing unit 49. As a result of estimation according to FIGS. 1 and 2, B is determined, and gene C3 receives a positive interaction from gene B6, receives a negative action from B7, and forms a positive action on C1 in combination with C2. Presumed.
In this way, the gene is modeled with a differential equation, and the initial value of the gene interaction matrix of the differential equation model is estimated from the combination of the time course values of the wild strain and the disrupted strain, so the number of simulations until the estimated convergence is shortened And a stable and reasonable estimate.
It should be noted that a program having the steps of FIGS. 2 and 3 as a procedure is prepared, installed in another computer by a recording medium or online loading, and the program is executed by the central processing unit of the computer to perform the above analysis. You may do it.

この発明の実施の形態１における行列とベクトル推定処理フローを示す図である。It is a figure which shows the matrix and vector estimation processing flow in Embodiment 1 of this invention. 実施の形態１における行列とベクトル推定処理フローの続きを示す図である。FIG. 10 is a diagram showing a continuation of the matrix and vector estimation processing flow in the first embodiment. 実施の形態１における行列とベクトル推定処理を行う回路構成を示す図である。3 is a diagram showing a circuit configuration for performing matrix and vector estimation processing in Embodiment 1. FIG. 実施の形態１における実験で得られるタイムコース値の例を示す図である。6 is a diagram illustrating an example of a time course value obtained by an experiment in Embodiment 1. FIG. 実験値のタイムコース値と実施の形態１で得られた推定値に基づいたタイムコース値との比較を示す図である。It is a figure which shows the comparison with the time course value based on the estimated value obtained in Embodiment 1 and the time course value of an experimental value. 実施の形態１での推定結果による遺伝子間の相互作用を示すパスウェイ図である。3 is a pathway diagram showing an interaction between genes based on an estimation result in Embodiment 1. FIG.

Explanation of symbols

Ｓ１６特定破壊株を取上げるステップ、Ｓ１７推定対象の個別遺伝子を選択するステップ、Ｓ１８最小二乗法による行列Ｂ、ベクトルＷの係数推定ステップ、Ｓ１９行列Ａ、ベクトルｗ内の要素への登録ステップ、Ｓ２０特定破壊株の中で未推定の遺伝子があるかを調べるステップ、Ｓ２１未推定の破壊株があるかを調べるステップ、Ｓ１７，Ｓ２０特定破壊株の中で個別遺伝子の係数を順次遺伝子毎に推定する小さなループ、Ｓ１６，Ｓ２０破壊株毎に順次推定する大きなループ、Ｓ２２全破壊株について推定が終わったかを確認するステップ、Ｓ２３行列Ａとｗの未推定係数を推定するステップ、Ｓ２４推定対象の複数の線形微分方程式モデル作成ステップ、Ｓ２６Ｓ２４で作成した一つの線形微分方程式モデルを選択するステップ、Ｓ２７モデルの積分により各遺伝子の時間列値を得るステップ、Ｓ２８時間積分値と入力遺伝子時間列値との誤差算出ステップ、Ｓ２９Ｓ２４で作成した線形微分方程式モデルを順次取上げて行き、全モデルが終わったかを確認するステップ、Ｓ３０Ｓ２４で作成したモデルの中で所定の誤差以下のモデルがあるかを調べるステップ、Ｓ３１誤差最小のモデルに対してＳ２４で作成した数のモデル数を用意するステップ、Ｓ３２行列Ａ、ベクトルｗの要素の値をモデル毎に違えて変化させるステップ、Ｓ３３変動率を小さくするステップ、Ｓ３４設定した最大試行回数を超えていないかを確認するステップ、４１タイムコース・データ入力処理部、４２補間処理部、４３正規化処理部、４４初期値推定部、４５相互作用係数推定処理部、４６係数最適化処理部、４７解析手法記憶部、４８タイムコース・データ記憶部、４９パスウェイ描画処理部。 S16 Step of picking up specific disrupted strains, S17 Step of selecting individual genes to be estimated, S18 Step of estimating matrix B by coefficient of least squares, vector W, S19 Step of registering elements in matrix A and vector w, S20 specification A step of checking whether there is an unestimated gene in the disrupted strain, a step of checking whether there is an S21 unestimated disrupted strain, S17, S20 A small one that sequentially estimates the coefficient of each individual gene among the specific disrupted strains Loop, S16, S20 large loop for estimating each disrupted strain sequentially, S22 step for confirming whether estimation has been completed for all disrupted strains, S23 step for estimating unestimated coefficients of matrices A and w, S24 a plurality of linears to be estimated Differential equation model creation step, S26 Select one linear differential equation model created in S24 Step S27, obtaining the time series value of each gene by integrating the model, S28 calculating the error between the time integral value and the input gene time series value, and taking the linear differential equation model created in S29 S24 in order, A step of confirming whether the model has been completed, a step of checking whether there is a model having a predetermined error or less in the models created in S30 and S24, and a step S31 for the number of models created in S24 with respect to the model having the smallest error. Step, step S32, changing the values of the elements of the matrix A and the vector w differently for each model, step S33, reducing the variation rate, step S34, checking whether the set maximum number of trials has been exceeded, step 41 Data input processing unit, 42 Interpolation processing unit, 43 Normalization processing unit, 44 Initial value estimation unit, 4 Interaction coefficient estimation processing unit, 46 coefficient optimization processing unit, 47 analysis method storage unit, 48 Time-course data storage unit, 49 pathway drawing processing unit.

Claims

In a method for estimating a gene expression action based on time series data of a wild strain consisting of a gene group that has not been manipulated and a disrupted strain in which a specific gene has been destroyed,
To individual genes having a loop that repeats the number of the estimation target individual genes included in the wild strain for the gene model to estimate the expression coefficient that simultaneously satisfies the time series data of the wild strain and the specific disrupted strain An expression coefficient estimation step of
When the estimation with the specific disrupted strain is completed, the gene expression coefficient matrix having a repeated loop is sequentially performed for a predetermined number of disrupted strains by taking the next disrupted strain and repeating the expression coefficient estimation step for the individual gene. An estimation step,
A gene expression action estimation method characterized in that an estimated initial value of a gene expression coefficient matrix is obtained through the above two steps.

The gene expression action estimation method according to claim 1, wherein the gene model is estimated by approximating the gene expression with a time integration of a gene expression level having a gene expression coefficient matrix.

2. The gene expression action estimation method according to claim 1, wherein, in the expression coefficient estimation step for an individual gene, a least square method is applied to estimate the expression coefficient.

In a device that creates a gene model and estimates the gene expression effect based on time series data of a wild strain consisting of a group of genes that has not been manipulated and a disrupted strain that has destroyed a specific gene
With respect to the gene model, a procedure for estimating an expression coefficient that simultaneously satisfies the time series data of the wild strain and a specific disrupted strain is calculated as a loop that repeats the number of individual genes to be estimated included in the wild strain, When the estimation with a specific disrupted strain is completed, an initial coefficient estimator is provided that sequentially takes out a predetermined number of disrupted strains by taking the next disrupted strain and repeating the expression coefficient estimation calculation for the individual genes. Characteristic gene expression action estimation device.

5. The gene expression action estimation apparatus according to claim 4, wherein the coefficient initial value estimation unit estimates the gene model by approximating the gene expression by time integration of the gene expression level having the gene expression coefficient matrix.

The gene expression action estimation device according to claim 4, wherein the coefficient initial value estimation unit applies a least square method to the estimation of the expression coefficient.

In a program that makes a computer create a gene model and estimate the gene expression effect based on time series data of a wild strain consisting of a group of genes that have not been manipulated and a disrupted strain that has destroyed a specific gene,
To individual genes having a loop that repeats the number of the estimation target individual genes included in the wild strain for the gene model to estimate the expression coefficient that simultaneously satisfies the time series data of the wild strain and the specific disrupted strain An expression coefficient estimation procedure for
When the estimation with the specific disrupted strain is finished, the next disrupted strain is taken and the expression coefficient estimation procedure for the individual gene is repeated, sequentially for a predetermined number of disrupted strains, and a gene expression coefficient matrix having an iterative loop A gene expression action estimation program comprising a procedure for obtaining an estimated initial value of a gene expression coefficient matrix using an estimation procedure.

8. The gene expression action estimation program according to claim 7, wherein the expression coefficient estimation procedure estimates a gene model by approximating the gene expression level having a gene expression coefficient matrix by time integration.

8. The gene expression action estimation program according to claim 7, wherein the expression coefficient estimation procedure applies a least square method to the expression coefficient estimation.