JP2006072861A

JP2006072861A - Transcription factor analysis program and transcription factor analysis method

Info

Publication number: JP2006072861A
Application number: JP2004257713A
Authority: JP
Inventors: Masaru Watanabe; 賢渡辺; Motohiko Yano; 元彦谷野
Original assignee: Fujitsu Ltd; Japan Biological Informatics Consortium
Current assignee: Fujitsu Ltd; Japan Biological Informatics Consortium
Priority date: 2004-09-03
Filing date: 2004-09-03
Publication date: 2006-03-16

Abstract

PROBLEM TO BE SOLVED: To extract, from a large number of transcription factor combinations, those expected to be closely related to gene expression, so that they can be investigated efficiently.

SOLUTION: The calculated mutual information is plotted on coordinates whose horizontal axis is the magnitude of the two-event mutual information, which represents the co-occurrence of a first transcription factor and a second transcription factor, and whose vertical axis is the magnitude of the three-event mutual information, which represents the tissue specificity of the first and second transcription factors. Visualizing both values together allows co-occurrence and tissue specificity to be analyzed simultaneously.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a transcription factor analysis program and a transcription factor analysis method for analyzing the relationship between gene expression and transcription factors, and in particular to a program and method that can efficiently extract, from a large number of transcription factor combinations, those expected to be closely related to gene expression.

Every organism carries genetic information unique to its species. This information resides in the genome, and every cell stores the same information in the form of chromosomes. Recent research has shown that the human genome consists of about 3 billion base pairs.

Genes are scattered along the DNA that makes up the chromosomes and hold control information for protein production. Humans are said to have about 30,000 genes, but not every gene is expressed to produce protein in any one cell. In heart cells, for example, only the genes that produce the proteins needed for cardiac activity are active. In this way, genes are regulated so that only a subset functions in each biological tissue.

The mechanism that regulates gene activity in each tissue has not yet been fully elucidated. The prevailing view is that substances called transcription factors are involved, and which transcription factor associates with which gene to produce which protein is an active research topic. A transcription factor is a substance that causes a gene to be expressed by binding to the region upstream of the gene; a fair number of transcription factors that act alone have already been identified.

TRANSFAC, a commercial product introduced in Non-Patent Document 1, can predict to some extent which genes' upstream regions a known transcription factor binds to. Using this product, one can predict which transcription factors bind upstream of a given gene.

"Mitsui Information Development Co., Ltd. (三井情報開発株式会社): Product Introduction: 'TRANSFAC', a Database of Transcription Factors and Transcription Factor Binding Sites Essential for Gene Expression Research", [online], August 26, 2004, [retrieved August 26, 2004], Internet <URL: http://bio.mki.co.jp/product/transfac/>

However, merely predicting which transcription factors bind upstream of a gene is not enough to advance research on the relationship between gene expression and transcription factors efficiently.

Most genes are known to be expressed through a combination of transcription factors rather than a single one. Once the transcription factors that bind upstream of a gene are known, the next task is therefore to identify which combinations of those factors are actually involved in expression. Because the number of possible combinations is large, this task takes a great deal of time and effort and slows research progress.

The present invention has been made to solve the above problems of the prior art. Its object is to provide a transcription factor analysis program and a transcription factor analysis method that extract, from a large number of transcription factor combinations, those expected to be closely related to gene expression, enabling efficient investigation.

To solve the above problems and achieve the object, a transcription factor analysis program according to the present invention is a program for analyzing the relationship between gene expression and transcription factors, and causes a computer to execute: a joint probability calculation procedure that calculates the joint probability of a first transcription factor, a second transcription factor, and a biological tissue in gene expression; a two-event mutual information calculation procedure that calculates, from the joint probability obtained by the joint probability calculation procedure, the two-event mutual information representing the co-occurrence of the first and second transcription factors; a three-event mutual information calculation procedure that calculates, from the same joint probability, the three-event mutual information representing the tissue specificity of the first and second transcription factors; and an analysis result display procedure that generates a chart in which the mutual information values calculated by the two-event and three-event mutual information calculation procedures are plotted on two-dimensional coordinates with co-occurrence on one axis and tissue specificity on the other, and displays the chart on a predetermined display means.

A transcription factor analysis method according to the present invention is a method for analyzing the relationship between gene expression and transcription factors, and comprises: a joint probability calculation step of calculating the joint probability of a first transcription factor, a second transcription factor, and a biological tissue in gene expression; a two-event mutual information calculation step of calculating, from the joint probability obtained in the joint probability calculation step, the two-event mutual information representing the co-occurrence of the first and second transcription factors; a three-event mutual information calculation step of calculating, from the same joint probability, the three-event mutual information representing the tissue specificity of the first and second transcription factors; and an analysis result display step of generating a chart in which the mutual information values calculated in the two-event and three-event mutual information calculation steps are plotted on two-dimensional coordinates with co-occurrence on one axis and tissue specificity on the other, and displaying the chart on a predetermined display means.

According to this invention, the two-event mutual information representing the co-occurrence of a transcription factor combination and the three-event mutual information representing the tissue specificity of the combination are visualized together on two-dimensional coordinates, so combinations of transcription factors closely related to tissue-specific gene expression can be extracted easily.

In the transcription factor analysis program according to the present invention, in the above invention, when one of the mutual information points on the chart displayed on the display means is selected, the analysis result display procedure displays detailed information on that mutual information on the display means.

According to this invention, detailed information such as the per-tissue probability distribution of the selected transcription factor combination is displayed on the display means, so it is easy to analyze which tissue's gene expression is actually associated with a combination extracted as closely related to tissue-specific gene expression.

The transcription factor analysis program according to the present invention, in the above invention, further causes the computer to execute an unknown transcription factor registration procedure that accepts registration of the name and base sequence information of an unknown transcription factor, and the joint probability calculation procedure further calculates joint probabilities with the transcription factor registered by the unknown transcription factor registration procedure assigned to one or both of the first and second transcription factors.

According to this invention, information on an unknown transcription factor can be registered and analyzed in the same way as a known transcription factor, so it is easy to analyze whether the registered candidate is related to tissue-specific gene expression, and thereby to predict whether it actually is a transcription factor.

The transcription factor analysis program according to the present invention, in the above invention, further causes the computer to execute an unknown transcription factor generation procedure that generates base sequence information for unknown transcription factors from genome information, and the joint probability calculation procedure further calculates joint probabilities with the transcription factor generated by the unknown transcription factor generation procedure assigned to one or both of the first and second transcription factors.

According to this invention, information on unknown transcription factors is generated automatically and can be analyzed in the same way as known transcription factors, so it is easy to analyze whether a generated candidate is related to tissue-specific gene expression, and thereby to predict whether it actually is a transcription factor.

The present invention visualizes, together on two-dimensional coordinates, the two-event mutual information representing the co-occurrence of a transcription factor combination and the three-event mutual information representing the tissue specificity of the combination, and therefore has the effect that combinations of transcription factors closely related to tissue-specific gene expression can be extracted easily.

The present invention also displays detailed information such as the per-tissue probability distribution of the selected transcription factor combination on the display means, and therefore has the effect that it is easy to analyze which tissue's gene expression is actually associated with a combination extracted as closely related to tissue-specific gene expression.

The present invention also allows information on an unknown transcription factor to be registered and analyzed in the same way as a known transcription factor, and therefore has the effect that it is easy to analyze whether the registered candidate is related to tissue-specific gene expression, and thereby to predict whether it actually is a transcription factor.

The present invention also generates information on unknown transcription factors automatically so that they can be analyzed in the same way as known transcription factors, and therefore has the effect that it is easy to analyze whether a generated candidate is related to tissue-specific gene expression, and thereby to predict whether it actually is a transcription factor.

Preferred embodiments of the transcription factor analysis program and the transcription factor analysis method according to the present invention are described in detail below with reference to the accompanying drawings. The description covers the case where the program and method are used to analyze transcription factors related to human gene expression, but they can also be used to analyze transcription factors related to gene expression in other organisms.

First, the information theory underlying the transcription factor analysis scheme of this embodiment is described. The scheme evaluates two kinds of mutual information: two-event mutual information and three-event mutual information.

Two-event mutual information represents the co-occurrence of two factors and has been studied since the middle of the last century. For example, it is used to measure the co-occurrence of words in text and is applied to tasks such as ranking conversion candidates in kanji conversion systems.

Specifically, the two-event mutual information I(X;Y) is given by Formula 1 below.
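Formula 1 is an image in the original publication and is not reproduced in this text. As a hedged reconstruction, the standard entropy form of two-event mutual information is shown here; it is assumed, not confirmed, to match Formula 1. The joint entropy H(X,Y) is a symbol introduced for this sketch.

```latex
I(X;Y) = H(X) + H(Y) - H(X,Y)
```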

Here, H(X) is called the Shannon information (entropy) and is given by Formula 2 below.
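Formula 2 is likewise not reproduced in this text. The standard definition of Shannon entropy is sketched here on the assumption that it is what Formula 2 shows; the base-2 logarithm, which gives values in bits, is a choice made for this sketch.

```latex
H(X) = -\sum_{x} P(x) \log_2 P(x)
```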

The three-event mutual information, on the other hand, is calculated by Formula 3 below.
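Formula 3 is also missing from this text. One common definition of three-event (triple) mutual information, built from the entropies of X, Y, Z and their joint distributions, is sketched here. Whether the publication uses this sign convention is an assumption; some authors define the quantity with the opposite sign.

```latex
I(X;Y;Z) = H(X) + H(Y) + H(Z) - H(X,Y) - H(Y,Z) - H(Z,X) + H(X,Y,Z)
```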

It has been suggested that three-event mutual information could be used to analyze so-called complex systems, but it has rarely been used in practice. The transcription factor analysis scheme of this embodiment focuses on the fact that three-event mutual information expresses the specificity of the relationship among factors, and uses it for analysis together with two-event mutual information.
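The two quantities can be illustrated with a minimal Python sketch, assuming the standard entropy-based definitions and a joint distribution P(X, Y, Z) supplied as a dictionary. The function names and the dictionary layout are illustrative assumptions, not part of the publication, and the sign convention of the three-event value may differ from the one used in Formula 3.

```python
from collections import defaultdict
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a distribution {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, axes):
    """Sum a joint distribution {(x, y, z): p} down to the given axis indices."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in axes)] += p
    return dict(out)


def mutual_information_2(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with Z summed out of P(X,Y,Z)."""
    def h(axes):
        return entropy(marginal(joint, axes))
    return h((0,)) + h((1,)) - h((0, 1))


def mutual_information_3(joint):
    """I(X;Y;Z) under the inclusion-exclusion convention (assumed here)."""
    def h(axes):
        return entropy(marginal(joint, axes))
    return (h((0,)) + h((1,)) + h((2,))
            - h((0, 1)) - h((1, 2)) - h((0, 2))
            + h((0, 1, 2)))


# Toy distribution: X and Y are independent fair bits and Z = X xor Y.
xor_joint = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
print(mutual_information_2(xor_joint), mutual_information_3(xor_joint))
```

In the toy distribution, X and Y show no pairwise co-occurrence, yet the three-event value is negative with a large magnitude because the relationship only appears once Z is taken into account.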

FIG. 1 is a sample diagram showing an application of the transcription factor analysis scheme of this embodiment. As the figure shows, the scheme plots combinations of transcription factors on coordinates with two-event mutual information (co-occurrence) on one axis and three-event mutual information (specificity) on the other.

The two-event mutual information represents how strongly a combination of two transcription factors co-occurs. The value increases the more often the two transcription factors both bind (or both fail to bind) upstream of the same genes.

The three-event mutual information, on the other hand, represents how specific the combination of two transcription factors and a biological tissue is. In general it can take either positive or negative values; the example shows a case with negative values. It has been suggested mathematically that this value reflects various states of the relationship among the three factors, but in every case a larger absolute value is understood to mean a larger influence of the third factor.

For example, if the three-event mutual information is 0 and the co-occurrence of the first and second factors is high, the third factor has almost no influence. Conversely, a large three-event mutual information value indicates that the third factor has a correspondingly large influence, although the nature of that influence is not known. Because this embodiment uses the biological tissue as the third factor, the three-event mutual information serves as an index of tissue specificity, and the magnitude of its absolute value is understood to indicate the degree of specificity.

The following is a reference on three-event mutual information.
[Reference 1] Tsujishita, T., "On Triple Mutual Information", Advances in Applied Mathematics, 1995: p.269-274

A combination of transcription factors with high co-occurrence and low specificity, like those plotted in region 1 of FIG. 1, indicates that the same relationship holds between the transcription factors for every gene, regardless of the tissue in which the gene is expressed. A combination with such a general relationship can be judged unlikely to be related to the expression of a specific gene, and is more likely to reflect sequence homology. This is because binding to a gene is determined separately for each transcription factor: when two factors have highly homologous sequences, they tend to receive the same binding determinations, which results in high co-occurrence and low specificity.

In contrast, a combination with high specificity, like those plotted in region 2 of FIG. 1, is likely to have a specific relationship with genes expressed in a particular biological tissue. Such a combination is likely to be related to the expression of specific genes, and it is appropriate to give it higher priority as a research target. In particular, a combination with low co-occurrence and high specificity is free of the co-occurrence effect and is therefore expected to be very likely related to the expression of specific genes.

By evaluating co-occurrence and specificity together and plotting the results in a two-dimensional coordinate space with the two as axes, combinations that have a specific causal relationship with a particular biological tissue can be extracted from a large number of transcription factor combinations, which provides a useful guide for making research more efficient.
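The reading of the two regions described above can be sketched as a simple ranking, shown here in Python. The publication gives no numeric boundaries for regions 1 and 2, so the two cut-off parameters, the function name, and the sample pair names and values are all hypothetical.

```python
def classify_pairs(pairs, cooccurrence_cut, specificity_cut):
    """Label and rank transcription factor pairs.

    pairs: iterable of (name, i2, i3), where i2 is the two-event and i3 the
    three-event mutual information. Both cut-offs are hypothetical thresholds.
    """
    ranked = []
    for name, i2, i3 in pairs:
        specificity = abs(i3)  # the text treats the absolute value as specificity
        if specificity >= specificity_cut:
            region = "region 2"  # high specificity: candidate for follow-up
        elif i2 >= cooccurrence_cut:
            region = "region 1"  # high co-occurrence, low specificity
        else:
            region = "other"
        ranked.append((name, region, i2, specificity))
    # Highest specificity first; among equals, lower co-occurrence first.
    ranked.sort(key=lambda r: (-r[3], r[2]))
    return ranked


# Hypothetical values for three pairs that share the fixed factor XXX1.
sample = [
    ("XXX1+YYY1", 0.40, -0.02),
    ("XXX1+YYY2", 0.05, -0.30),
    ("XXX1+YYY3", 0.35, -0.30),
]
for row in classify_pairs(sample, cooccurrence_cut=0.2, specificity_cut=0.1):
    print(row)
```

Sorting low co-occurrence ahead of high co-occurrence at equal specificity follows the statement that low co-occurrence with high specificity is the most promising case.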

Next, the configuration of the transcription factor analysis apparatus of this embodiment is described. FIG. 2 is a functional block diagram showing its configuration. As the figure shows, the transcription factor analysis apparatus 200 is connected to a DB server 100 via a network or the like.

The DB server 100 stores the information needed for transcription factor analysis and has a transcription factor sequence information DB 110, a gene expression information DB 120, a genome sequence information DB 130, and a gene mapping information DB 140.

The transcription factor sequence information DB 110 stores base sequence information for known transcription factors. FIG. 3 is a sample diagram showing an example of the data structure of the transcription factor sequence information DB 110 shown in FIG. 2; it shows the base sequence information of a transcription factor named XXX1.

The figure shows that the transcription factor XXX1 is a sequence of three bases and that the probability of each base appearing at each position is as follows.
First element: cytosine (C) 100%
Second element: guanine (G) 50%, adenine (A) 50%
Third element: guanine (G) 10%, adenine (A) 10%, thymine (T) 80%

The second and third elements are not fixed at a single base because the base sequence of a transcription factor is subject to variation.
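The per-position probabilities for XXX1 form a small position probability matrix. The Python sketch below scores candidate sites against it, assuming that a site's score is the product of the per-position probabilities and that a site counts as a match when the score reaches a threshold. The publication does not state how binding is determined, so the scoring rule, the threshold, the function names, and the sample upstream sequence are assumptions.

```python
# Per-position base probabilities for XXX1, taken from the FIG. 3 description.
XXX1 = [
    {"C": 1.0},
    {"G": 0.5, "A": 0.5},
    {"G": 0.1, "A": 0.1, "T": 0.8},
]


def match_probability(matrix, site):
    """Product of per-position base probabilities for a site of matrix length."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal matrix length")
    prob = 1.0
    for column, base in zip(matrix, site):
        prob *= column.get(base, 0.0)  # bases not listed have probability 0
    return prob


def find_sites(matrix, upstream, threshold):
    """Return (offset, site, probability) for windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(upstream) - width + 1):
        site = upstream[i:i + width]
        p = match_probability(matrix, site)
        if p >= threshold:
            hits.append((i, site, p))
    return hits


# Hypothetical upstream sequence and threshold.
print(find_sites(XXX1, "TTCATGCGG", threshold=0.3))
```

Here the window CAT passes the threshold, while CGG, which ends in the less likely guanine, does not.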

The gene expression information DB 120 stores gene expression information measured for each biological tissue by experiment or other means. FIG. 4 is a sample diagram showing an example of the data structure of the gene expression information DB 120 shown in FIG. 2.

In this example, genes are identified by HIT, which is a cDNA ID, but they may instead be identified by gene locus. Likewise, expression information is given here as a ratio, which is a relative value, but it may instead be given as an absolute value.

The genome sequence information DB 130 stores the base sequence information of a genome (the human genome in this embodiment). The gene mapping information DB 140 stores which gene is located at which position on the genome.

Although FIG. 2 shows all of the DBs stored on a single DB server, the DBs may be distributed across multiple DB servers. Alternatively, the transcription factor analysis apparatus 200 may itself contain any or all of the DBs.

The transcription factor analysis apparatus 200 analyzes the relationship between transcription factor combinations and biological tissues according to conditions specified by the user and presents the results to the user. It has an input unit 210, a display unit 220, an interface unit 230, a control unit 240, and a storage unit 250.

The input unit 210 is a device that accepts user input and consists of a keyboard and a mouse. The display unit 220 is a device that displays image data, text data, and the like on a screen, such as a liquid crystal display. The interface unit 230 is an interface for exchanging data with the DB server 100 over a network or the like.

The control unit 240 controls the transcription factor analysis apparatus 200 as a whole and has a condition input receiving unit 240a, a binding determination processing unit 240b, a discretization processing unit 240c, a probability processing unit 240d, a joint probability calculation unit 240e, a two-event mutual information calculation unit 240f, a three-event mutual information calculation unit 240g, an analysis result display processing unit 240h, and an information acquisition unit 240i.

The condition input receiving unit 240a displays a condition input screen on the display unit 220 and accepts the various analysis conditions entered by the user. In the transcription factor analysis of this embodiment, one member of each transcription factor combination is held fixed, and the condition input receiving unit 240a accepts, as one of the conditions, the specification of which transcription factor is to be held fixed.

The binding determination processing unit 240b determines whether a transcription factor binds upstream of each of the genes stored in the gene mapping information DB 140 and stores the results in the binding determination result storage unit 250a of the storage unit 250.

FIG. 5 is a sample diagram showing an example of the data structure of the binding determination result storage unit 250a shown in FIG. 2. As the figure shows, when the binding determination processing unit 240b determines that a specified transcription factor binds upstream of a gene, it stores the transcription factor and the gene here.

The discretization processing unit 240c discretizes the expression information in the gene expression information DB 120 into class values. Expression information is generally given as a relative value, a ratio to a standard marker, and is used to compare the expression level of one gene across biological tissues. It is therefore generally difficult to compare expression levels between different genes. The discretization processing unit 240c discretizes the gene expression information so that expression levels can be compared between different genes, and stores the result in the discretization result storage unit 250b of the storage unit 250.

図６は、図２に示した離散化結果記憶部２５０ｂのデータ構成の一例を示すサンプル図である。この例では、図４で示した発現情報を３０％をしきい値として、それ以上を１へ、それ未満を０へと２値化している。なお、しきい値は、任意の大きさに設定することができる。また、複数のしきい値を設けて、３値以上に離散化をおこなうこともできる。 FIG. 6 is a sample diagram illustrating an example of the data configuration of the discretization result storage unit 250b illustrated in FIG. 2. In this example, the expression information shown in FIG. 4 is binarized with 30% as the threshold: values at or above the threshold become 1 and values below it become 0. The threshold can be set to any value. It is also possible to provide multiple thresholds and discretize into three or more values.
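
The binarization described above can be sketched as follows. This is a minimal Python sketch, assuming that expression values are relative values expressed in percent; the function name `discretize` and the sample values are hypothetical and do not appear in the specification.

```python
from bisect import bisect_right

def discretize(value, thresholds=(30.0,)):
    """Map an expression value to a class index.

    With a single threshold t, values >= t map to 1 and values < t map to 0.
    With k thresholds, the result is a class index from 0 to k.
    """
    return bisect_right(sorted(thresholds), value)

# Hypothetical expression values (percent) for one gene in three tissues.
expression = {"tissue A": 45.0, "tissue B": 30.0, "tissue C": 12.5}
binary = {tissue: discretize(v) for tissue, v in expression.items()}
```

Passing more than one threshold, for example `thresholds=(10.0, 30.0)`, gives the discretization into three or more values mentioned in the text.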

確率化処理部２４０ｄは、結合判定処理部２４０ｂの処理結果と離散化処理部２４０ｃを組み合わせて、個々の因子の確率量を算出し、記憶部２５０の確率化結果記憶部２５０ｃへ記憶させる処理部である。具体的には、離散化処理部２４０ｃが離散化して作成された生体組織別の遺伝子発現数を、結合判定処理部２４０ｂで判明した結合パターン毎に分類して集計し、確率を計算する処理をおこなう。 The probability processing unit 240d is a processing unit that combines the results of the binding determination processing unit 240b and the discretization processing unit 240c, calculates the probability values of the individual factors, and stores them in the probability result storage unit 250c of the storage unit 250. Specifically, it classifies and totals the per-tissue gene expression counts produced by the discretization processing unit 240c for each binding pattern identified by the binding determination processing unit 240b, and calculates probabilities from the totals.
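
One possible reading of this tallying step is sketched below in Python. The specification does not state how the totals are normalized, so the normalization within each binding pattern (yielding P(Z | X, Y)) is an assumption, as are the function names and the sample data.

```python
from collections import Counter

def tally_by_pattern(binds_x, binds_y, expressed):
    """Count expressed genes per (x, y, tissue).

    binds_x, binds_y: sets of gene IDs bound by each transcription factor.
    expressed: dict of gene ID -> dict of tissue -> 0/1 (binarized expression).
    """
    counts = Counter()
    for gene, by_tissue in expressed.items():
        pattern = (int(gene in binds_x), int(gene in binds_y))
        for tissue, flag in by_tissue.items():
            counts[pattern + (tissue,)] += flag
    return counts

def conditional_tissue_probability(counts):
    """Normalize counts within each binding pattern (assumed form of P(Z | X, Y))."""
    totals = Counter()
    for (x, y, _tissue), n in counts.items():
        totals[(x, y)] += n
    # Patterns with no expressed gene are skipped to avoid division by zero.
    return {k: n / totals[k[:2]] for k, n in counts.items() if totals[k[:2]]}

# Hypothetical data: four genes, two tissues.
binds_x = {"g1", "g2"}
binds_y = {"g1", "g3"}
expressed = {
    "g1": {"A": 1, "B": 0},
    "g2": {"A": 1, "B": 1},
    "g3": {"A": 0, "B": 1},
    "g4": {"A": 0, "B": 0},
}
counts = tally_by_pattern(binds_x, binds_y, expressed)
cond = conditional_tissue_probability(counts)
```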

図７は、図２に示した確率化結果記憶部２５０ｃのデータ構成の一例を示すサンプル図である。ここでは、固定して分析することを指定されたＸＸＸ１という転写因子と、これと組み合わされて分析される転写因子の一つであるＹＹＹ１の確率量を格納した部分を抽出している。 FIG. 7 is a sample diagram illustrating an example of the data configuration of the probability result storage unit 250c illustrated in FIG. 2. It shows an excerpt storing the probability values for the transcription factor XXX1, which was designated as the fixed factor, and YYY1, one of the transcription factors analyzed in combination with it.

結合確率計算部２４０ｅは、相互情報量の計算に必要な結合確率を算出し、記憶部２５０の結合確率記憶部２５０ｄへ記憶させる処理部である。具体的には、条件付確率の一般式である。
Ｐ（Ｘ，Ｙ，Ｚ）＝Ｐ（Ｚ｜Ｘ，Ｙ）×Ｐ（Ｘ，Ｙ）
から、Ｐ（Ｘ，Ｙ，Ｚ）を求め、このＰ（Ｘ，Ｙ，Ｚ）からＰ（Ｙ，Ｚ）およびＰ（Ｚ，Ｘ）を算出する。 The joint probability calculation unit 240e is a processing unit that calculates the joint probabilities necessary for calculating the mutual information and stores them in the joint probability storage unit 250d of the storage unit 250. Specifically, using the general formula for conditional probability,
P(X, Y, Z) = P(Z | X, Y) × P(X, Y)
it obtains P(X, Y, Z), and from this P(X, Y, Z) it calculates P(Y, Z) and P(Z, X).
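
The calculation of P(X, Y, Z) and of the pairwise probabilities derived from it can be illustrated with the following Python sketch. It assumes that P(Y, Z) and P(Z, X) are obtained by summing the joint distribution over the remaining variable (marginalization); the function names and the probability values are hypothetical.

```python
from collections import defaultdict

def joint_from_conditional(p_z_given_xy, p_xy):
    """P(X, Y, Z) = P(Z | X, Y) * P(X, Y)."""
    return {(x, y, z): p * p_xy[(x, y)] for (x, y, z), p in p_z_given_xy.items()}

def marginal(joint, keep):
    """Sum the joint distribution over the variables not listed in `keep`.

    keep: positions to retain, e.g. (1, 2) for P(Y, Z) or (2, 0) for P(Z, X).
    """
    out = defaultdict(float)
    for state, p in joint.items():
        out[tuple(state[i] for i in keep)] += p
    return dict(out)

# Hypothetical inputs: uniform P(X, Y) and two tissues "A" and "B".
states = [(x, y) for x in (0, 1) for y in (0, 1)]
p_xy = {s: 0.25 for s in states}
p_z_given_xy = {(x, y, z): 0.5 for (x, y) in states for z in ("A", "B")}
p_z_given_xy[(1, 1, "A")], p_z_given_xy[(1, 1, "B")] = 0.8, 0.2

joint = joint_from_conditional(p_z_given_xy, p_xy)
p_yz = marginal(joint, keep=(1, 2))
p_zx = marginal(joint, keep=(2, 0))
```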

図８は、図２に示した結合確率記憶部２５０ｄのデータ構成の一例を示すサンプル図である。ここでは、固定して分析することを指定されたＸＸＸ１という転写因子と、これと組み合わされて分析される転写因子の一つであるＹＹＹ１と、組織Ａ〜Ｚの結合確率を格納した部分を抽出している。 FIG. 8 is a sample diagram showing an example of the data configuration of the joint probability storage unit 250d shown in FIG. 2. It shows an excerpt storing the joint probabilities of the transcription factor XXX1, which was designated as the fixed factor, YYY1, one of the transcription factors analyzed in combination with it, and tissues A to Z.

２事象相互情報量計算部２４０ｆは、結合確率計算部２４０ｅが算出した情報を基にして２つの転写因子の２事象の相互情報量を計算し、記憶部２５０の相互情報量記憶部２５０ｅへ記憶させる処理部である。また、３事象相互情報量計算部２４０ｇは、結合確率計算部２４０ｅが算出した情報を基にして２つの転写因子と生体組織の３事象の相互情報量を計算し、記憶部２５０の相互情報量記憶部２５０ｅへ記憶させる処理部である。 The two-event mutual information calculation unit 240f is a processing unit that calculates the two-event mutual information of the two transcription factors based on the information calculated by the joint probability calculation unit 240e, and stores it in the mutual information storage unit 250e of the storage unit 250. Likewise, the three-event mutual information calculation unit 240g is a processing unit that calculates the three-event mutual information of the two transcription factors and the biological tissue based on the information calculated by the joint probability calculation unit 240e, and stores it in the mutual information storage unit 250e of the storage unit 250.

相互情報量を計算するには、既に説明したように数式１〜３を用いる。相互情報量を求めるには、数式２を用いてシャノンの情報量を求める必要がある。以下、Ｘ：ＸＸＸ１、Ｙ：ＹＹＹ１、Ｚ：組織として、図８で示した結合確率を使って相互情報量を算出してみることにする。 The mutual information is calculated using Formulas 1 to 3, as already described. To obtain the mutual information, the Shannon information (entropy) must first be obtained using Formula 2. In the following, the mutual information is calculated using the joint probabilities shown in FIG. 8, with X: XXX1, Y: YYY1, and Z: tissue.

Ｘは、｛０，１｝２つの値をとるので、Ｈ（Ｘ）は、
Ｈ（Ｘ）＝−｛０．５×Ｌｏｇ（０．５）＋０．５×Ｌｏｇ（０．５）｝
＝−｛−０．３４６５７３５９−０．３４６５７３５９｝
＝０．６９３１４７１８
と求められる。 Since X takes the two values {0, 1}, H(X) is obtained as
H(X) = −{0.5 × Log(0.5) + 0.5 × Log(0.5)}
= −{−0.34657359 − 0.34657359}
= 0.69314718
where Log denotes the natural logarithm.
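
The entropy calculation above can be checked with a short Python sketch. The function name `shannon_entropy` is hypothetical; the natural logarithm is used because it reproduces the value 0.69314718 given in the text.

```python
import math

def shannon_entropy(probabilities):
    """H = -sum(p * ln p), skipping zero-probability states."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

# X takes the values {0, 1} with probability 0.5 each.
h_x = shannon_entropy([0.5, 0.5])

# Four equally likely states, as for (X, Y) in the worked example.
h_xy = shannon_entropy([0.25, 0.25, 0.25, 0.25])
```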

Ｈ（Ｙ）は、Ｈ（Ｘ）と同様に求められ、０．６９３１４７１８となる。Ｚは、｛組織Ａ，組織Ｂ，組織Ｃ，組織Ｄ｝という４つの値をとるので、Ｈ（Ｚ）は、同様の計算により、１．２７５０３８０４９となる。 H(Y) is obtained in the same manner as H(X) and is 0.69314718. Since Z takes the four values {tissue A, tissue B, tissue C, tissue D}, H(Z) is 1.275038049 by the same calculation.

（Ｘ，Ｙ）は｛（１，１），（１，０），（０，１），（０，０）｝の４つの状態をとり、Ｈ（Ｘ，Ｙ）は、Ｈ（Ｘ）と同様の計算により、１．３８６２９４３６１となる。（Ｙ，Ｚ）は、Ｙ＝｛０，１｝とＺ＝｛Ａ，Ｂ，Ｃ，Ｄ｝を組み合わせた８つの状態をとり、Ｈ（Ｙ，Ｚ）は、１．９００２２５６１５となる。Ｈ（Ｚ，Ｘ）は、Ｈ（Ｙ，Ｚ）と同様に、１．９００２２５６１５となる。 (X, Y) takes the four states {(1, 1), (1, 0), (0, 1), (0, 0)}, and H(X, Y) is 1.386294361 by the same calculation as for H(X). (Y, Z) takes eight states combining Y = {0, 1} and Z = {A, B, C, D}, and H(Y, Z) is 1.900225615. H(Z, X) is likewise 1.900225615.

また、（Ｘ，Ｙ，Ｚ）は、図８に示した１６の状態があり、Ｈ（Ｘ，Ｙ，Ｚ）を計算すると、２．３１９５７４９９７となる。 Further, (X, Y, Z) has the 16 states shown in FIG. 8, and calculating H(X, Y, Z) gives 2.319574997.

上記のシャノンの情報量を数式１および数式３に当てはめることにより、２事象の相互情報量と３事象の相互情報量が下記のように求められる。
Ｉ（Ｘ；Ｙ）＝０
Ｉ（Ｘ；Ｙ；Ｚ）＝−０．２０５８３８１８４ Applying the above Shannon information values to Formulas 1 and 3 gives the two-event mutual information and the three-event mutual information as follows.
I(X; Y) = 0
I(X; Y; Z) = −0.205838184
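
The arithmetic of this worked example can be reproduced with the Python sketch below. Formulas 1 and 3 are not restated in this passage, so the forms used here, I(X;Y) = H(X) + H(Y) − H(X,Y) and I(X;Y;Z) = H(X) + H(Y) + H(Z) − H(X,Y) − H(Y,Z) − H(Z,X) + H(X,Y,Z), are assumptions; they are chosen because they yield I(X;Y) = 0 and I(X;Y;Z) ≈ −0.2058 from the entropy values of the example. The function names are hypothetical.

```python
def mutual_information_2(h_x, h_y, h_xy):
    """Assumed form of Formula 1: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return h_x + h_y - h_xy

def mutual_information_3(h_x, h_y, h_z, h_xy, h_yz, h_zx, h_xyz):
    """Assumed form of Formula 3 (inclusion-exclusion over the entropies)."""
    return h_x + h_y + h_z - h_xy - h_yz - h_zx + h_xyz

# Entropy values from the worked example.
i_xy = mutual_information_2(0.69314718, 0.69314718, 1.386294361)
i_xyz = mutual_information_3(
    0.69314718, 0.69314718, 1.275038049,
    1.386294361, 1.900225615, 1.900225615,
    2.319574997,
)
```

Because the inputs are rounded to nine significant digits, `i_xy` comes out as a value of order 1e-9 rather than exactly zero.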

図９は、図２に示した相互情報量記憶部２５０ｅのデータ構成の一例を示すサンプル図である。同図に示すように、相互情報量記憶部２５０ｅには、固定された転写因子の種別と、これと組み合わされた転写因子の種別と、これらの転写因子の関係から求められた２事象の相互情報量と、３事象の相互情報量とが記憶される。 FIG. 9 is a sample diagram illustrating an example of the data configuration of the mutual information storage unit 250e illustrated in FIG. 2. As shown in the figure, the mutual information storage unit 250e stores the type of the fixed transcription factor, the type of the transcription factor combined with it, and the two-event mutual information and three-event mutual information obtained from the relationship between these transcription factors.

分析結果表示処理部２４０ｈは、算出した相互情報量や、相互情報量を算出する過程で得られた各種情報を分析結果として表示部２２０に表示する処理部である。分析結果表示処理部２４０ｈが表示する画面については、後述することとする。 The analysis result display processing unit 240h is a processing unit that displays the calculated mutual information and various information obtained in the process of calculating the mutual information on the display unit 220 as analysis results. The screen displayed by the analysis result display processing unit 240h will be described later.

情報取得部２４０ｉは、インターフェース部２３０を介して、ＤＢサーバ１００から各種情報を取得する処理部である。 The information acquisition unit 240 i is a processing unit that acquires various types of information from the DB server 100 via the interface unit 230.

記憶部２５０は、結合判定結果記憶部２５０ａと、離散化結果記憶部２５０ｂと、確率化結果記憶部２５０ｃと、結合確率記憶部２５０ｄと、相互情報量記憶部２５０ｅとを有する。これらの記憶部については、既に説明済みなので、ここでは説明を省略する。 The storage unit 250 includes a binding determination result storage unit 250a, a discretization result storage unit 250b, a probability result storage unit 250c, a joint probability storage unit 250d, and a mutual information storage unit 250e. Since these storage units have already been described, their description is omitted here.

次に、条件入力受付部２４０ａおよび分析結果表示処理部２４０ｈが表示部２２０に表示する画面について説明する。図１０は、条件入力画面の一例を示すサンプル図である。この画面は、条件入力受付部２４０ａが利用者から分析のための各種条件の入力を受け付けるための画面である。 Next, screens displayed on the display unit 220 by the condition input receiving unit 240a and the analysis result display processing unit 240h will be described. FIG. 10 is a sample diagram showing an example of the condition input screen. This screen is a screen for the condition input receiving unit 240a to receive input of various conditions for analysis from the user.

一番下にある「固定する転写因子の指定」の項目では、転写因子を一つ選択する。本実施例に係る転写因子分析方式では、転写因子の組み合わせの一方を固定して分析をおこなうが、ここで選択した転写因子が、固定して分析をおこなう転写因子となる。 In the "Specify transcription factor to be fixed" item at the bottom, select one transcription factor. In the transcription factor analysis method according to the present embodiment, analysis is performed with one of the combinations of transcription factors fixed, and the transcription factor selected here becomes the transcription factor to be fixed and analyzed.

図１１は、分析結果表示画面の一例を示すサンプル図である。この画面は、２事象相互情報量計算部２４０ｆおよび３事象相互情報量計算部２４０ｇによって算出された相互情報量を、分析結果表示処理部２４０ｈが視覚化して表示する画面である。 FIG. 11 is a sample diagram showing an example of the analysis result display screen. This screen is a screen on which the analysis result display processing unit 240h visualizes and displays the mutual information calculated by the two-event mutual information calculation unit 240f and the three-event mutual information calculation unit 240g.

同図に示すように、画面の左上には、画面の大部分を占めるグラフデータの表示領域が存在する。この領域では、共起性（２事象の相互情報量）が横軸にとられ、組織特異性（３事象の相互情報量）が縦軸にとられた座標上に、相互情報量記憶部２５０ｅに記憶された相互情報量がプロットされる。 As shown in the figure, a graph data display area occupying most of the screen is located at the upper left of the screen. In this area, the mutual information stored in the mutual information storage unit 250e is plotted on coordinates whose horizontal axis is co-occurrence (two-event mutual information) and whose vertical axis is tissue specificity (three-event mutual information).

利用者がプロットされた点の一つを選択すると、グラフデータの表示領域の右の領域に当該の転写因子の詳細情報が表示されるようになっている。グラフデータの表示領域の下の領域には、これらの相互情報量を求めるために指定された各種条件が表示される領域が設けられている。 When the user selects one of the plotted points, the detailed information of the transcription factor is displayed in the area to the right of the graph data display area. In the area below the graph data display area, there is provided an area in which various conditions designated for obtaining the mutual information are displayed.

分析結果表示画面で「分布グラフ表示」ボタンを押下すると、分布グラフ表示画面が表示される。図１２は、分布グラフ表示画面の一例を示すサンプル図である。同図に示すように、この画面では、分析結果表示画面で選択されていた転写因子と固定された転写因子の組織別の確率分布がグラフとして表示される。 When the “distribution graph display” button is pressed on the analysis result display screen, the distribution graph display screen is displayed. FIG. 12 is a sample diagram illustrating an example of a distribution graph display screen. As shown in the figure, on this screen, the probability distribution for each tissue of the transcription factor selected on the analysis result display screen and the fixed transcription factor is displayed as a graph.

画面上には、２つの転写因子が無相関な場合の期待値が点線で表示され、この点線とグラフとを比較することにより、どの組織において２つの転写因子の間に特異な関連があるのかを容易に判別できるようになっている。また、この画面では、「離散化数」ボタンを押下することにより、確率分布のグラフの代わりに離散化数のグラフを表示させることもできる。 On the screen, the expected value for the case where the two transcription factors are uncorrelated is displayed as a dotted line; comparing this dotted line with the graph makes it easy to determine in which tissues there is a specific relationship between the two transcription factors. On this screen, pressing the “discretization number” button displays a graph of the discretization numbers instead of the probability distribution graph.

分析結果表示画面で「遺伝子リスト表示」ボタンを押下すると、遺伝子リスト表示画面が表示される。図１３は、遺伝子リスト表示画面の一例を示すサンプル図である。同図に示すように、この画面では、分析結果表示画面で選択されていた転写因子と固定された転写因子の組織別の離散化数が一覧表として表示される。 When the “gene list display” button is pressed on the analysis result display screen, the gene list display screen is displayed. FIG. 13 is a sample diagram showing an example of the gene list display screen. As shown in the figure, this screen lists the per-tissue discretization numbers of the transcription factor selected on the analysis result display screen and the fixed transcription factor.

分析結果表示画面で「一覧データ表示」ボタンを押下すると、一覧データ表示画面が表示される。図１４は、一覧データ表示画面の一例を示すサンプル図である。同図に示すように、この画面では、全ての転写因子の組み合わせの相互情報量と離散化数が表示される。 When the “display list data” button is pressed on the analysis result display screen, the list data display screen is displayed. FIG. 14 is a sample diagram showing an example of a list data display screen. As shown in the figure, on this screen, the mutual information amount and the discretization number of all transcription factor combinations are displayed.

次に、図２に示した転写因子分析装置２００の処理手順について説明する。図１５は、図２に示した転写因子分析装置２００の処理手順を示すフローチャートである。 Next, the processing procedure of the transcription factor analyzer 200 shown in FIG. 2 will be described. FIG. 15 is a flowchart showing a processing procedure of the transcription factor analysis apparatus 200 shown in FIG.

同図に示すように、分析のための各種条件の入力を受け付けたならば（ステップＳ１０１）、全ての遺伝子と転写因子の結合判定をおこない（ステップＳ１０２）、遺伝子の発現データの離散化をおこなう（ステップＳ１０３）。 As shown in the figure, when input of various conditions for analysis is accepted (step S101), the binding determination of all genes and transcription factors is performed (step S102), and the expression data of the genes is discretized. (Step S103).

続いて、固定すると指定された転写因子と組み合わせる転写因子を一つ選択する（ステップＳ１０４）。ここで、転写因子配列情報ＤＢ１１０に登録された全ての転写因子を選択済である場合には（ステップＳ１０５肯定）、ステップＳ１１０へ遷移する。 Subsequently, one transcription factor to be combined with the designated transcription factor is selected (step S104). If all the transcription factors registered in the transcription factor sequence information DB 110 have been selected (Yes at Step S105), the process proceeds to Step S110.

転写因子配列情報ＤＢ１１０に登録された全ての転写因子を選択済でない場合には（ステップＳ１０５否定）、固定すると指定された転写因子と選択された転写因子の確率化処理をおこない、転写因子の状態別・組織別の確率分布を求め（ステップＳ１０６）、これを基にして結合確率を算出する（ステップＳ１０７）。 If not all of the transcription factors registered in the transcription factor sequence information DB 110 have been selected (No at step S105), probability processing is performed on the transcription factor designated as fixed and the selected transcription factor to obtain the probability distribution for each transcription factor state and each tissue (step S106), and the joint probabilities are calculated from it (step S107).

結合確率が算出されたならば、これに基づいて２事象の相互情報量を算出し（ステップＳ１０８）、さらに３事象の相互情報量の算出をおこなう（ステップ１０９）。このようにして、固定すると指定された転写因子と選択された転写因子の相互情報量が求められたならば、ステップＳ１０４から処理を再開する。 Once the joint probabilities have been calculated, the two-event mutual information is calculated from them (step S108), followed by the three-event mutual information (step S109). When the mutual information between the transcription factor designated as fixed and the selected transcription factor has been obtained in this way, processing resumes from step S104.

ステップＳ１０４で、転写因子配列情報ＤＢ１１０に登録された全ての転写因子を選択済である場合には（ステップＳ１０５肯定）、求められた全ての相互情報量を分析結果画面上にプロットして視覚化し、この画面を利用者に提示する（ステップＳ１１０）。 If all the transcription factors registered in the transcription factor sequence information DB 110 have been selected in step S104 (Yes in step S105), all the obtained mutual information amounts are plotted and visualized on the analysis result screen. This screen is presented to the user (step S110).
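
The loop formed by steps S104 to S110 can be summarized by the following Python sketch. It is only an outline under stated assumptions: the three callables stand in for the probability, joint probability, and mutual information processing described above, and the function and parameter names are hypothetical. Steps S101 to S103 (condition input, binding determination, and discretization) are assumed to have been completed before the loop starts.

```python
def run_analysis(fixed_tf, candidates, joint_probability, mi_two, mi_three):
    """Pair the fixed transcription factor with each candidate in turn.

    Returns one (candidate, two-event MI, three-event MI) tuple per candidate,
    i.e. the points to be plotted with co-occurrence on the horizontal axis
    and tissue specificity on the vertical axis (step S110).
    """
    points = []
    for tf in candidates:                        # S104, S105: next factor until exhausted
        joint = joint_probability(fixed_tf, tf)  # S106, S107: probabilities, joint probabilities
        points.append((tf, mi_two(joint), mi_three(joint)))  # S108, S109
    return points

# Usage with stub callables in place of the real processing.
points = run_analysis(
    "XXX1",
    ["YYY1", "YYY2"],
    joint_probability=lambda fixed, tf: (fixed, tf),
    mi_two=lambda joint: 0.0,
    mi_three=lambda joint: -0.2,
)
```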

上述してきたように、本実施例１では、転写因子の組み合わせの共起性を一方の軸にとり、組織特異性をもう一方の軸にとった座標空間上に相互情報量をプロットして視覚化し、転写因子の組み合わせの共起性と組織特異性を同時に分析できるように構成したので、特定の組織の遺伝子発現と関係が深いと予測される転写因子の組み合わせを容易に識別することができ、転写因子研究の効率を向上させることができる。 As described above, in Example 1, the co-occurrence of transcription factor combinations is taken as one axis, and the mutual information is plotted and visualized on a coordinate space where the tissue specificity is taken as the other axis. Because it is configured to simultaneously analyze the co-occurrence and tissue specificity of transcription factor combinations, transcription factor combinations that are predicted to be closely related to gene expression in specific tissues can be easily identified, The efficiency of transcription factor research can be improved.

実施例１では、既知の転写因子の組み合わせと遺伝子発現の関連を分析する例について説明したが、この発明に係る転写因子分析プログラムおよび転写因子分析方法は、未知の転写因子を予測するためにも利用できる。本実施例２では、この発明に係る転写因子分析プログラムおよび転写因子分析方法を未知の転写因子を予測するために利用する場合について説明する。 In Example 1, although the example which analyzes the relationship between the combination of a known transcription factor and gene expression was demonstrated, the transcription factor analysis program and transcription factor analysis method concerning this invention are also in order to predict an unknown transcription factor. Available. In Example 2, a case where the transcription factor analysis program and the transcription factor analysis method according to the present invention are used for predicting an unknown transcription factor will be described.

遺伝子の発現に関連する転写因子はすべて明らかになっているわけではなく、未知のものが存在する。もしも、この発明に係る転写因子分析プログラムおよび転写因子分析方法を用いて、未知の転写因子同士の組み合わせ、もしくは未知の転写因子と既知の転写因子の組み合わせが特定の生体組織における遺伝子の発現と特異的な関連があると判明したとすれば、その未知の転写因子は実際に転写因子である可能性が高い。 Not all transcription factors associated with gene expression have been identified; unknown ones exist. If the transcription factor analysis program and transcription factor analysis method according to the present invention reveal that a combination of unknown transcription factors, or a combination of an unknown transcription factor and a known transcription factor, has a specific relationship with gene expression in a particular biological tissue, then that unknown transcription factor is likely to be an actual transcription factor.

このようにして、未知の転写因子を容易に予測することができれば、転写因子の研究の効率を大幅に向上させることができる。 Thus, if an unknown transcription factor can be easily predicted, the efficiency of transcription factor research can be greatly improved.

実施例１で説明した転写因子分析方式は、未知の転写因子の予測に対しても大きな変更を加えることなく対応できる。遺伝子上流部との転写因子の結合判定をおこなうには、転写因子の塩基配列情報さえあればよく、転写因子の塩基配列情報は、未知のものであっても問題ない。その他の処理においても、転写因子が既知のものでなければならない要因はない。 The transcription factor analysis method described in Example 1 can also handle the prediction of unknown transcription factors without major changes. Determining whether a transcription factor binds to the upstream region of a gene requires only the base sequence information of the transcription factor, and it does not matter if that base sequence belongs to an unknown transcription factor. Nothing in the other processing requires the transcription factor to be known either.

そこで、本実施例２では、実施例１との相違点についてのみ説明することとする。図１６は、本実施例に係る転写因子分析装置の構成を示す機能ブロック図である。 Therefore, in the second embodiment, only differences from the first embodiment will be described. FIG. 16 is a functional block diagram showing the configuration of the transcription factor analyzer according to the present embodiment.

同図に示すように、制御部２４０は、未知転写因子登録部２４０ｊをさらに有する。未知転写因子登録部２４０ｊは、未知転写因子登録画面を表示部２２０に表示し、利用者に未知の転写因子の名前と塩基配列情報を入力させ、入力された情報を記憶部２５０の未知転写因子記憶部ｆに記憶させる処理部である。 As shown in the figure, the control unit 240 further includes an unknown transcription factor registration unit 240j. The unknown transcription factor registration unit 240j displays an unknown transcription factor registration screen on the display unit 220, causes the user to input the name and base sequence information of the unknown transcription factor, and inputs the input information to the unknown transcription factor in the storage unit 250. The processing unit is stored in the storage unit f.

図１７は、未知転写因子登録画面の一例を示すサンプル図である。同図に示すように、未知の転写因子の名前と、各塩基配列における塩基の出現確率を登録できるようになっている。 FIG. 17 is a sample diagram showing an example of the unknown transcription factor registration screen. As shown in the figure, it is possible to register the name of an unknown transcription factor and the appearance probability of a base in each base sequence.

記憶部２５０は、未知転写因子記憶部ｆをさらに有する。未知転写因子記憶部ｆのデータ構成は、図３で説明した転写因子配列情報ＤＢ１１０のデータ構成と同様である。 The storage unit 250 further includes an unknown transcription factor storage unit f. The data configuration of the unknown transcription factor storage unit f is the same as the data configuration of the transcription factor sequence information DB 110 described in FIG.

なお、本実施例では、未知の転写因子の塩基配列を利用者が登録するようにしているが、未知の転写因子の塩基配列をプログラムが自動的に生成するように構成することもできる。ゲノム情報から遺伝子の上流部の塩基配列は明らかになっており、転写因子の塩基配列は、この一部を反転させたものになる。 In this embodiment, the user registers the base sequence of an unknown transcription factor. However, the program may automatically generate the base sequence of an unknown transcription factor. From the genome information, the base sequence of the upstream part of the gene has been clarified, and the base sequence of the transcription factor is an inversion of this part.

次に、図１６に示した転写因子分析装置２０１の処理手順について説明する。図１８は、図１６に示した転写因子分析装置２０１の処理手順を示すフローチャートである。 Next, the processing procedure of the transcription factor analyzer 201 shown in FIG. 16 will be described. FIG. 18 is a flowchart showing a processing procedure of the transcription factor analyzer 201 shown in FIG.

ここでも、実施例１との相違点のみ説明することとする。分析の各種条件の入力を受け付ける前に、未知の転写因子の登録を受け付ける（ステップＳ２０１）。そして、各種条件の入力を受け付ける際には、固定する側の転写因子を既知の転写因子からだけではなく、ステップＳ２０１で入力された転写因子からも選択できるようにする。以降の処理においても、ステップＳ２０１で入力された転写因子を既知の転写因子と同様に扱う。 Here, only differences from the first embodiment will be described. Before accepting input of various analysis conditions, registration of an unknown transcription factor is accepted (step S201). When receiving input of various conditions, the transcription factor to be fixed can be selected not only from the known transcription factor but also from the transcription factor input in step S201. Also in the subsequent processing, the transcription factor input in step S201 is handled in the same manner as the known transcription factor.

上述してきたように、本実施例２では、未知の転写因子の塩基配列情報を登録（もしくは、生成）して、この未知の転写因子と遺伝子の発現との関連を分析できるように構成したので、未知の転写因子のなかから実際に転写因子として機能する転写因子を効率的に選別することができる。 As described above, in Example 2, the base sequence information of an unknown transcription factor is registered (or generated), and the relationship between the unknown transcription factor and gene expression can be analyzed. Thus, transcription factors that actually function as transcription factors can be efficiently selected from unknown transcription factors.

なお、実施例１および実施例２において、２事象の相互情報量と３事象の相互情報量を同時に視覚化して分析する方式を、転写因子の分析のために用いた例を説明したが、この分析方法の用途はこれに限定されるものではなく、様々な分析に用いることができる。特に、複数の因子が複雑に関連した事象の中から特異な関係を見出すことが目的の場合に、この分析方式は有用である。 In Example 1 and Example 2, the example in which the method of simultaneously visualizing and analyzing the mutual information amount of two events and the mutual information amount of three events was used for the analysis of transcription factors was described. The use of the analysis method is not limited to this, and can be used for various analyses. This analysis method is particularly useful when the purpose is to find a unique relationship among events in which a plurality of factors are complicatedly related.

（付記１）遺伝子の発現と転写因子との関連を分析する転写因子分析プログラムであって、
遺伝子の発現における第１の転写因子と、第２の転写因子と、生体組織との結合確率を計算する結合確率計算手順と、
前記結合確率計算手順により算出された結合確率に基づいて第１の転写因子と、第２の転写因子の共起性を表す２事象の相互情報量を計算する２事象相互情報量計算手順手段と、
前記結合確率計算手順により算出された結合確率に基づいて第１の転写因子と、第２の転写因子の組織特異性を表す３事象の相互情報量を計算する３事象相互情報量計算手順手段と、
共起性を一方の軸にとり、組織特異性をもう一方の軸にとった２次元の座標上に前記２事象相互情報量計算手順手段と前記３事象相互情報量計算手順手段により算出された相互情報量情報をプロットした図表を生成し、この図表を所定の表示手段に表示する分析結果表示処理手順と
をコンピュータに実行させることを特徴とする転写因子分析プログラム。 (Appendix 1) A transcription factor analysis program for analyzing the relationship between gene expression and transcription factors, the program causing a computer to execute:
a joint probability calculation procedure for calculating the joint probability of a first transcription factor, a second transcription factor, and a biological tissue in gene expression;
a two-event mutual information calculation procedure means for calculating, based on the joint probability calculated by the joint probability calculation procedure, the two-event mutual information representing the co-occurrence of the first transcription factor and the second transcription factor;
a three-event mutual information calculation procedure means for calculating, based on the joint probability calculated by the joint probability calculation procedure, the three-event mutual information representing the tissue specificity of the first transcription factor and the second transcription factor; and
an analysis result display processing procedure for generating a chart in which the mutual information calculated by the two-event mutual information calculation procedure means and the three-event mutual information calculation procedure means is plotted on two-dimensional coordinates taking co-occurrence on one axis and tissue specificity on the other axis, and displaying the chart on a predetermined display means.

（付記２）前記分析結果表示処理手順は、表示手段に表示した図表上の相互情報量の一つが選択されたならば、その相互情報量に係る詳細情報を表示手段に表示することを特徴とする付記１に記載の転写因子分析プログラム。 (Appendix 2) The analysis result display processing procedure is characterized in that if one of the mutual information amounts on the chart displayed on the display means is selected, the detailed information on the mutual information amount is displayed on the display means. The transcription factor analysis program according to appendix 1.

（付記３）未知の転写因子の名称および塩基配列情報の登録を受け付ける未知転写因子登録手順をさらにコンピュータに実行させ、
前記結合確率計算手順は、前記未知転写因子登録手順により登録された転写因子を第１の転写因子と第２の転写因子の一方、もしくは両方に当てはめた結合確率をさらに計算することを特徴とする付記１または２に記載の転写因子分析プログラム。 (Appendix 3) The transcription factor analysis program according to appendix 1 or 2, further causing the computer to execute an unknown transcription factor registration procedure for accepting registration of the name and base sequence information of an unknown transcription factor, wherein the joint probability calculation procedure further calculates joint probabilities in which the transcription factor registered by the unknown transcription factor registration procedure is assigned to one or both of the first transcription factor and the second transcription factor.

（付記４）ゲノム情報から未知の転写因子の塩基配列情報を生成する未知転写因子生成手順をさらにコンピュータに実行させ、
前記結合確率計算手順は、前記未知転写因子生成手順により生成された転写因子を第１の転写因子と第２の転写因子の一方、もしくは両方に当てはめた結合確率をさらに計算することを特徴とする付記１または２に記載の転写因子分析プログラム。 (Appendix 4) The transcription factor analysis program according to appendix 1 or 2, further causing the computer to execute an unknown transcription factor generation procedure for generating base sequence information of an unknown transcription factor from genome information, wherein the joint probability calculation procedure further calculates joint probabilities in which the transcription factor generated by the unknown transcription factor generation procedure is assigned to one or both of the first transcription factor and the second transcription factor.

（付記５）遺伝子の発現と転写因子との関連を分析する転写因子分析方法であって、
遺伝子の発現における第１の転写因子と、第２の転写因子と、生体組織との結合確率を計算する結合確率計算工程と、
前記結合確率計算工程により算出された結合確率に基づいて第１の転写因子と、第２の転写因子の共起性を表す２事象の相互情報量を計算する２事象相互情報量計算工程手段と、
前記結合確率計算工程により算出された結合確率に基づいて第１の転写因子と、第２の転写因子の組織特異性を表す３事象の相互情報量を計算する３事象相互情報量計算工程手段と、
共起性を一方の軸にとり、組織特異性をもう一方の軸にとった２次元の座標上に前記２事象相互情報量計算工程手段と前記３事象相互情報量計算工程手段により算出された相互情報量情報をプロットした図表を生成し、この図表を所定の表示手段に表示する分析結果表示処理工程と
を有したことを特徴とする転写因子分析方法。 (Appendix 5) A transcription factor analysis method for analyzing the relationship between gene expression and transcription factors, the method comprising:
a joint probability calculation step of calculating the joint probability of a first transcription factor, a second transcription factor, and a biological tissue in gene expression;
a two-event mutual information calculation step means for calculating, based on the joint probability calculated in the joint probability calculation step, the two-event mutual information representing the co-occurrence of the first transcription factor and the second transcription factor;
a three-event mutual information calculation step means for calculating, based on the joint probability calculated in the joint probability calculation step, the three-event mutual information representing the tissue specificity of the first transcription factor and the second transcription factor; and
an analysis result display processing step of generating a chart in which the mutual information calculated by the two-event mutual information calculation step means and the three-event mutual information calculation step means is plotted on two-dimensional coordinates taking co-occurrence on one axis and tissue specificity on the other axis, and displaying the chart on a predetermined display means.

（付記６）前記分析結果表示処理工程は、表示手段に表示した図表上の相互情報量の一つが選択されたならば、その相互情報量に係る詳細情報を表示手段に表示することを特徴とする付記５に記載の転写因子分析方法。 (Appendix 6) The analysis result display processing step is characterized in that, when one of the mutual information amounts on the chart displayed on the display means is selected, detailed information relating to the mutual information amount is displayed on the display means. The transcription factor analysis method according to appendix 5.

（付記７）未知の転写因子の名称および塩基配列情報の登録を受け付ける未知転写因子登録工程をさらに有し、
前記結合確率計算工程は、前記未知転写因子登録工程により登録された転写因子を第１の転写因子と第２の転写因子の一方、もしくは両方に当てはめた結合確率をさらに計算することを特徴とする付記５または６に記載の転写因子分析方法。 (Appendix 7) The transcription factor analysis method according to appendix 5 or 6, further comprising an unknown transcription factor registration step of accepting registration of the name and base sequence information of an unknown transcription factor, wherein the joint probability calculation step further calculates joint probabilities in which the transcription factor registered in the unknown transcription factor registration step is assigned to one or both of the first transcription factor and the second transcription factor.

（付記８）ゲノム情報から未知の転写因子の塩基配列情報を生成する未知転写因子生成工程をさらに有し、
前記結合確率計算工程は、前記未知転写因子生成工程により生成された転写因子を第１の転写因子と第２の転写因子の一方、もしくは両方に当てはめた結合確率をさらに計算することを特徴とする付記５または６に記載の転写因子分析方法。 (Appendix 8) The transcription factor analysis method according to appendix 5 or 6, further comprising an unknown transcription factor generation step of generating base sequence information of an unknown transcription factor from genome information, wherein the joint probability calculation step further calculates joint probabilities in which the transcription factor generated in the unknown transcription factor generation step is assigned to one or both of the first transcription factor and the second transcription factor.

（付記９）遺伝子の発現と転写因子との関連を分析する転写因子分析装置であって、
遺伝子の発現における第１の転写因子と、第２の転写因子と、生体組織との結合確率を計算する結合確率計算手段と、
前記結合確率計算手段により算出された結合確率に基づいて第１の転写因子と、第２の転写因子の共起性を表す２事象の相互情報量を計算する２事象相互情報量計算手段手段と、
前記結合確率計算手段により算出された結合確率に基づいて第１の転写因子と、第２の転写因子の組織特異性を表す３事象の相互情報量を計算する３事象相互情報量計算手段手段と、
共起性を一方の軸にとり、組織特異性をもう一方の軸にとった２次元の座標上に前記２事象相互情報量計算手段手段と前記３事象相互情報量計算手段手段により算出された相互情報量情報をプロットした図表を生成し、この図表を所定の表示手段に表示する分析結果表示処理手段と
を備えたことを特徴とする転写因子分析装置。 (Appendix 9) A transcription factor analysis apparatus for analyzing the relationship between gene expression and transcription factors, the apparatus comprising:
a joint probability calculation means for calculating the joint probability of a first transcription factor, a second transcription factor, and a biological tissue in gene expression;
a two-event mutual information calculation means for calculating, based on the joint probability calculated by the joint probability calculation means, the two-event mutual information representing the co-occurrence of the first transcription factor and the second transcription factor;
a three-event mutual information calculation means for calculating, based on the joint probability calculated by the joint probability calculation means, the three-event mutual information representing the tissue specificity of the first transcription factor and the second transcription factor; and
an analysis result display processing means for generating a chart in which the mutual information calculated by the two-event mutual information calculation means and the three-event mutual information calculation means is plotted on two-dimensional coordinates taking co-occurrence on one axis and tissue specificity on the other axis, and displaying the chart on a predetermined display means.

(Appendix 10) The transcription factor analyzer according to Appendix 9, wherein, when one of the mutual information amounts on the chart displayed on the display means is selected, the analysis result display processing means displays detailed information relating to that mutual information amount on the display means.
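The selection behaviour in Appendix 10 could be sketched as a nearest-point lookup. This is an illustration only: the record layout (`cooccurrence`, `tissue_specificity`, `pair`), the function name `pick_point`, and the distance tolerance are all hypothetical and are not taken from the specification.

```python
import math


def pick_point(points, click_x, click_y, tolerance=0.05):
    """Return the plotted record nearest to the click, or None if none is within `tolerance`.

    `points` is a list of dicts, each holding the chart coordinates of one
    transcription factor pair plus whatever detail fields are to be displayed.
    """
    best = None
    best_dist = tolerance
    for record in points:
        dist = math.hypot(
            record["cooccurrence"] - click_x,
            record["tissue_specificity"] - click_y,
        )
        if dist <= best_dist:
            best, best_dist = record, dist
    return best
```

The returned record would then feed the detailed-information display; a click that is not close to any plotted point returns `None`, so nothing is shown.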

(Appendix 11) The transcription factor analyzer according to Appendix 9 or 10, further comprising unknown transcription factor registration means for accepting registration of the name and base sequence information of an unknown transcription factor, wherein the binding probability calculation means further calculates a binding probability in which the transcription factor registered by the unknown transcription factor registration means is applied to one or both of the first transcription factor and the second transcription factor.

(Appendix 12) The transcription factor analyzer according to Appendix 9 or 10, further comprising unknown transcription factor generation means for generating base sequence information of an unknown transcription factor from genome information, wherein the binding probability calculation means further calculates a binding probability in which the transcription factor generated by the unknown transcription factor generation means is applied to one or both of the first transcription factor and the second transcription factor.
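Appendices 11 and 12 apply an unknown transcription factor to one or both positions of the pair. A minimal sketch of how the additional pairs might be enumerated follows. It assumes that pairs are unordered, since swapping the first and second factor does not change a mutual information value. The function name and the factor names in the usage example are hypothetical.

```python
from itertools import combinations


def pairs_with_unknown(known, unknown):
    """List unordered factor pairs in which at least one member is an unknown factor.

    Pairs made only of known factors are skipped, on the assumption that they
    have already been analysed without the unknown factors.
    """
    unknown_set = set(unknown)
    all_factors = list(known) + list(unknown)
    return [
        (a, b)
        for a, b in combinations(all_factors, 2)
        if a in unknown_set or b in unknown_set
    ]
```

For example, `pairs_with_unknown(["A", "B"], ["U1", "U2"])` yields the four known-unknown pairs plus the single unknown-unknown pair `("U1", "U2")`.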

As described above, the transcription factor analysis program and the transcription factor analysis method according to the present invention are useful for analyzing transcription factors. They are particularly suited to cases where combinations expected to be closely related to gene expression must be extracted from a large number of transcription factor combinations and investigated efficiently.

A sample diagram showing an application example of the transcription factor analysis method according to the present embodiment.
A functional block diagram showing the configuration of the transcription factor analyzer according to the present embodiment.
A sample diagram showing an example of the data structure of the transcription factor sequence information DB shown in FIG. 2.
A sample diagram showing an example of the data structure of the gene expression information DB shown in FIG. 2.
A sample diagram showing an example of the data structure of the binding determination result storage unit shown in FIG. 2.
A sample diagram showing an example of the data structure of the discretization result storage unit shown in FIG. 2.
A sample diagram showing an example of the data structure of the probabilization result storage unit shown in FIG. 2.
A sample diagram showing an example of the data structure of the binding probability storage unit shown in FIG. 2.
A sample diagram showing an example of the data structure of the mutual information storage unit shown in FIG. 2.
A sample diagram showing an example of the condition input screen.
A sample diagram showing an example of the analysis result display screen.
A sample diagram showing an example of the distribution graph display screen.
A sample diagram showing an example of the gene list display screen.
A sample diagram showing an example of the list data display screen.
A flowchart showing the processing procedure of the transcription factor analyzer shown in FIG. 2.
A functional block diagram showing the configuration of the transcription factor analyzer according to the present embodiment.
A sample diagram showing an example of the unknown transcription factor registration screen.
A flowchart showing the processing procedure of the transcription factor analyzer shown in FIG. 16.

Explanation of symbols

100 DB server
110 Transcription factor sequence information DB
120 Gene expression information DB
130 Genome sequence information DB
140 Gene mapping information DB
200 Transcription factor analyzer
201 Transcription factor analyzer
210 Input unit
220 Display unit
230 Interface unit
240 Control unit
240a Condition input reception unit
240b Binding determination processing unit
240c Discretization processing unit
240d Probabilization processing unit
240e Binding probability calculation unit
240f Two-event mutual information calculation unit
240g Three-event mutual information calculation unit
240h Analysis result display processing unit
240i Information acquisition unit
240j Unknown transcription factor registration unit
250 Storage unit
250a Binding determination result storage unit
250b Discretization result storage unit
250c Probabilization result storage unit
250d Binding probability storage unit
250e Mutual information storage unit
250f Unknown transcription factor storage unit

Claims

A transcription factor analysis program for analyzing the relationship between gene expression and transcription factors, the program causing a computer to execute:
a binding probability calculation procedure for calculating a binding probability of a first transcription factor, a second transcription factor, and a biological tissue in gene expression;
a two-event mutual information calculation procedure for calculating, based on the binding probability calculated by the binding probability calculation procedure, a two-event mutual information amount representing the co-occurrence of the first transcription factor and the second transcription factor;
a three-event mutual information calculation procedure for calculating, based on the binding probability calculated by the binding probability calculation procedure, a three-event mutual information amount representing the tissue specificity of the first transcription factor and the second transcription factor; and
an analysis result display processing procedure for generating a chart in which the mutual information amounts calculated by the two-event mutual information calculation procedure and the three-event mutual information calculation procedure are plotted on two-dimensional coordinates having co-occurrence on one axis and tissue specificity on the other axis, and for displaying the chart on a predetermined display means.

The transcription factor analysis program according to claim 1, wherein, when one of the mutual information amounts on the chart displayed on the display means is selected, the analysis result display processing procedure displays detailed information relating to that mutual information amount on the display means.

The transcription factor analysis program according to claim 1 or 2, further causing the computer to execute an unknown transcription factor registration procedure for accepting registration of the name and base sequence information of an unknown transcription factor, wherein the binding probability calculation procedure further calculates a binding probability in which the transcription factor registered by the unknown transcription factor registration procedure is applied to one or both of the first transcription factor and the second transcription factor.

The transcription factor analysis program according to claim 1 or 2, further causing the computer to execute an unknown transcription factor generation procedure for generating base sequence information of an unknown transcription factor from genome information, wherein the binding probability calculation procedure further calculates a binding probability in which the transcription factor generated by the unknown transcription factor generation procedure is applied to one or both of the first transcription factor and the second transcription factor.

A transcription factor analysis method for analyzing the relationship between gene expression and transcription factors, comprising:
a binding probability calculation step of calculating a binding probability of a first transcription factor, a second transcription factor, and a biological tissue in gene expression;
a two-event mutual information calculation step of calculating, based on the binding probability calculated in the binding probability calculation step, a two-event mutual information amount representing the co-occurrence of the first transcription factor and the second transcription factor;
a three-event mutual information calculation step of calculating, based on the binding probability calculated in the binding probability calculation step, a three-event mutual information amount representing the tissue specificity of the first transcription factor and the second transcription factor; and
an analysis result display processing step of generating a chart in which the mutual information amounts calculated in the two-event mutual information calculation step and the three-event mutual information calculation step are plotted on two-dimensional coordinates having co-occurrence on one axis and tissue specificity on the other axis, and displaying the chart on a predetermined display means.