JP4477894B2

JP4477894B2 - Genetic analysis system

Info

Publication number: JP4477894B2
Application number: JP2004037606A
Authority: JP
Inventors: 弘三川原; 光輝高津
Original assignee: 株式会社ワールドフュージョン
Priority date: 2004-02-16
Filing date: 2004-02-16
Publication date: 2010-06-09
Anticipated expiration: 2024-02-16
Also published as: JP2005228171A

Description

The present invention relates to a computer-based system for identifying individual differences at the gene level for tailor-made medicine. In particular, it relates to a gene analysis system that maps expression experiment data onto the genome so that the expression status of neighboring genes on a chromosome can be grasped easily, and that can detect candidate genes likely to be strongly associated with individual differences.

There are various methods for identifying individual differences at the gene level for tailor-made medicine. In one approach, SNPs of candidate genes related to disease, drug sensitivity, and the like are used to determine the genotype of each patient, and the patients are grouped based on the resulting data. At the same time, the behavior of various cells in the same patient group is measured by expression experiments and associated with each genotype and phenotype.

In particular, expression experiments are analyzed in order to identify, from the expression experiment data, the genes whose expression actually changes in the experiment. Specifically, genes are clustered by their expression patterns, and gene groups with specific expression patterns are identified.

FIG. 18 shows the processing performed by conventional expression analysis software. The genes used in an expression experiment can be clustered, but the software does not show where the clustered genes lie on the genome. For example, it is not possible to see whether clustered genes with similar expression patterns are located close together on the genome or far apart.

Therefore, even if a gene group can be identified with the expression analysis software, the chromosomal positions of those genes and the comprehensive information related to them currently cannot be obtained without searching a genome database each time.

FIG. 19 shows an example in which a gene group used in an expression experiment was searched for in a genome database, making it possible to confirm where the target genes are located on the chromosomes. The circled portions are the searched genes, found at six locations across all chromosomes. However, the comprehensive information (annotations) around each gene must be extracted by following further links.

In the conventional gene analysis method described above, after expression patterns have been searched and clustered with the expression analysis software, the genomic positions of the genes and the information on the genes must be searched comprehensively in a separate step, which is very laborious.

Moreover, even if the relationship between the positions of genes scattered across the genome and their expression patterns could be determined automatically and the results listed numerically, information would be overlooked if only numerical data were available.

Therefore, even if a perfect database of expression information and genome information were completed, it would be impossible to advance research on gene function determination and on relationships with diseases, therapeutic drugs, and the like by automatic computer judgment alone. A step in which researchers determine gene function while checking the data with their own eyes is essential.

For example, graphically displaying the relationship between gene positions and expression patterns in a form that includes comprehensive gene information may reveal connections to biological phenomena that a computer alone could not discover.

The present invention was devised to solve the problems described above. Its object is to provide a gene analysis system that can extract, in one operation, the gene groups identified by expression analysis, the genomic positions of those genes, and the comprehensive information related to them, and that has a graphical display interface which draws fully on researchers' intuition and knowledge and makes it easier for them to discover relationships with various categories such as gene function, diseases, and therapeutic drugs.

To achieve the above object, the invention according to claim 1 is a gene analysis system that reads gene information and expression data obtained in an expression experiment and merges them with internal data, the system comprising: genome position specifying means for specifying, in the course of reading the information and expression data obtained in the expression experiment into an internal database, for each probe used in the expression experiment, the gene for which that probe was used and the position of that gene on the genome; registration means for storing data in which the position information specified by the genome position specifying means is linked to information on the gene; first selection means for selecting, on a chromosome, a range of genes to be browsed; extraction means for extracting, from the gene range selected by the first selection means, a first gene region for which expression data exists in the registration means; first display means for displaying annotations for the gene range selected by the first selection means; second display means for displaying, together with the expression data, the positional relationship between the gene range for which annotations are displayed by the first display means and the first gene region, contrasted gene by gene based on the data of the registration means; second selection means for selecting a second gene region from within the first gene region based on the display by the second display means; and third display means for acquiring data from the registration means, arranging the genes of the second gene region in columns or rows and the experimental conditions of the expression experiment in rows or columns, and displaying information on the expression value for each experimental condition of each gene.

The invention according to claim 2 is the gene analysis system according to claim 1, wherein the genome position specifying means comprises: external data search means for searching publicly available external information for genome position information using any one of Locus ID, gene name, gene symbol, UniGene ID, nm ID, and Accession ID as a key; first internal data search means for searching internal data using the Accession ID as a key when the external data search means cannot find the data; and second internal data search means for searching internal data based on nucleic acid or amino acid sequence information when none of the above means can find the data, or when none of the keys Locus ID, gene name, gene symbol, UniGene ID, nm ID, and Accession ID is available.

The invention according to claim 3 is the gene analysis system according to claim 1 or claim 2, wherein the second display means contrasts the positional relationship by coloring the gene range for which annotations are displayed and the first gene region.

According to the present invention, the various chromosomal annotations required for genome research and the expression experiment data can be referenced at the same time. It is therefore possible to search for characteristic gene expression patterns and to refer comprehensively to the genome information in the regions surrounding the genes found.

In addition, by displaying the positional relationship between the gene regions in the annotation display and the gene regions with expression data in a gene-by-gene contrast, and by coloring them, phenomena can be conveyed visually and researchers' intuition and knowledge can be put to full use.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a flowchart of the process, from registration through display, for information on the genes used in an expression experiment.

The gene expression level is data on the amount of mRNA produced from the DNA base sequence of the region in which a gene is encoded, and it changes when the gene performs its function. To analyze gene function based on the expression level, each facility measures expression levels experimentally.

The expression data obtained in these expression experiments are accumulated in an external database or the like, for example in the form shown in FIG. 2. The data consist of experimental categories such as cell name, sample, drug, disease, and patient; a management ID for each facility; the Locus ID (shown as Gene Symbol in the figure) to which the probe set of each gene corresponds; the Accession ID used for the probe set of each gene (the ID assigned to each entry sequence in nucleic acid and amino acid sequence databases); and expression values such as Fold_change, P-Value, and Ratio. Although not shown in the figure, data such as gene names, gene symbols, and nucleic acid or amino acid sequences may also be registered. The Locus ID is an identifier used by NCBI (the U.S. National Center for Biotechnology Information).

Data are imported from such an external database into the gene analysis system of the present invention. The items required when reading expression data are the expression values and, for each expression value, an item that identifies the gene: a gene name, gene symbol, Accession ID, Locus ID, or nucleic acid or amino acid sequence.

FIG. 3 schematically shows the process of importing these data into the internal database of the gene analysis system. Referring also to the flowchart of FIG. 1, data such as those shown in FIG. 2 are first read from the external database 1, in which they are accumulated, into the internal database 2 of the gene analysis system of the present invention (S1). Next, for each probe set, the gene on the genome for which the probe set was used and the position of that gene on the chromosome are specified (S2), and the result is stored in the internal database 2 (S3). The data stored in the internal database are merged with other required genome information and delivered to the browser 3 over the Internet or an intranet (S4).

The internal database 2 provides the various table structures required. FIG. 4 shows the table structure used to import the expression data and register each item of information. It is a relational database structure consisting of a table for each chromosome number (chr(n)_expData); a table of experimental categories (Category) such as cell, sample, drug, disease, and patient; and a table (geneAcc) that maps the Locus ID used to specify a gene's chromosomal position, together with that position information, to the Accession ID of the probe set.

Here, category_name is the category name, locus is the Locus ID, gsymbol is the gene symbol or gene name, txStart is the start position on the chromosome, txEnd is the end position on the chromosome, chrom is the chromosome number, accession is the Accession ID, category_id is the ID of the Category table, gaid is the ID of the geneAcc table, fchange is fold_change (an expression value), dfchange is an expression value, pvalue is the p-value (an expression value), dpvalue is an expression value, and tcourse is the time course.
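The table structure described above can be sketched as a small relational schema. The following is a minimal illustration using SQLite, not the schema of the actual system: the column types, the primary key named id, the foreign-key constraints, and the concrete table names chr1_expData, chrX_expData (standing in for chr(n)_expData) are assumptions, as is the helper name create_schema.

```python
import sqlite3


def create_schema(conn, chromosomes=("1",)):
    """Create Category, geneAcc, and one chr<n>_expData table per chromosome.

    Column names follow the field list in the description; the column types
    and key constraints are assumptions made for this sketch.
    """
    cur = conn.cursor()
    cur.execute(
        """CREATE TABLE Category (
               category_id   INTEGER PRIMARY KEY,
               category_name TEXT NOT NULL)"""
    )
    cur.execute(
        """CREATE TABLE geneAcc (
               gaid      INTEGER PRIMARY KEY,
               locus     TEXT,
               gsymbol   TEXT,
               chrom     TEXT,
               txStart   INTEGER,
               txEnd     INTEGER,
               accession TEXT)"""
    )
    for n in chromosomes:
        # One expression-data table per chromosome number, as in FIG. 4.
        cur.execute(
            f"""CREATE TABLE chr{n}_expData (
                    id          INTEGER PRIMARY KEY,
                    category_id INTEGER REFERENCES Category(category_id),
                    gaid        INTEGER REFERENCES geneAcc(gaid),
                    fchange     REAL,
                    dfchange    REAL,
                    pvalue      REAL,
                    dpvalue     REAL,
                    tcourse     TEXT)"""
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn, chromosomes=("1", "X"))
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    print([t[0] for t in tables])
```

Keeping one expression table per chromosome mirrors the description; a single table with a chrom column would be an equally plausible design in a modern database.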

In addition to the tables of FIG. 4, the internal database 2 also has a table structure like that shown in FIG. 5. The tables of FIG. 5 store mined data imported in advance from published information and public databases. Each annotation table in the figure is a table of annotations that have an Accession ID. These include the estHuman table; EST tables for mouse, rat, bostaurus, susscrofa, and the like; the mrna table; the cds table; the unigene table; the sts table; the snp table; and user-defined annotation tables.

Each of these tables stores gene-related peripheral information specific to that type of table. A Table1 is created for each annotation table, and each annotation table is linked to its Table1, which contains the Accession ID, through the annot_id (annotation table ID) of Table1. Table1 consists mainly of the elements shown in FIG. 5: name is the Accession ID, chrom is the chromosome number, chromStart is the start position on the chromosome, and chromEnd is the end position on the chromosome.

Next, the processing performed in step S2 of FIG. 1 when external data are imported into the internal database 2 is described in detail. When the expression data and gene list corresponding to each probe set used in an expression experiment are imported into the internal database 2, the gene name and its chromosomal position must be specified from the input data. FIG. 6 shows this processing.

The keys used to specify the location on the chromosome are taken from the input data: Locus Link ID, Accession ID, gene name, gene symbol, UniGene ID, nm ID, and nucleic acid or amino acid sequence. Which column of the input data, or which field of the database, holds each of these items is configured in the internal database in advance, and the processing of FIG. 6 then proceeds automatically or manually.

First, the Locus Link ID, Accession ID, gene name, gene symbol, UniGene ID, nm ID, and nucleic acid or amino acid sequence data are extracted from the fields that serve as search keys (T1). A search is possible if any one of them is present. UniGene is part of the public genome databases maintained by NCBI, and a UniGene ID identifies the EST sequences managed for each gene. Similarly, the nm ID is the ID of the mRNA database maintained by NCBI. The Locus ID is the ID of an NCBI database that holds link information to various databases, such as UniGene, mRNA, literature, and disease databases, in relation to chromosomal position information.

If a key listed in the Locus Link database (Locus Link ID, Accession ID, gene name, gene symbol, UniGene ID, or nm ID) can be extracted (T2 YES), the Locus Link database or an equivalent external database is searched. If matching data exist (T3 YES), the position on the chromosome is specified from the Locus Link database or the equivalent external database (T4), and the position information is registered in the table of the internal database shown in FIG. 4 (T5).

If, on the other hand, none of the search keys Locus Link ID, Accession ID, gene name, gene symbol, UniGene ID, and nm ID is present at T2 (NO), the process proceeds to T12. If the search key data are not found in the Locus Link database or the equivalent external database at T3 (NO), the process proceeds to T6.

At T6, the system checks whether the search keys extracted from the input data at T1 include Accession ID data. If they do (T6 YES), the data stored in the internal database, such as those shown in FIG. 5, are searched using that Accession ID. If the search finds matching data (T7 YES), the position on the chromosome is specified from information such as that shown in FIG. 5 (T8), and the position information is registered in the table shown in FIG. 4 (T5).

If the Accession ID data are not present (T6 NO), the process proceeds to T12. If no matching Accession ID data exist in the internal database (T7 NO), the process proceeds to T9.

At T9, NCBI (the Entrez system) is searched over the Internet using a search key other than the Accession ID data to obtain a nucleic acid or amino acid sequence. The obtained sequence is used to run a homology search against the internal database or an external amino acid database linked to the internal database (T10). The result of the homology search is used to determine the position on the chromosome (T11), which is registered in the table of FIG. 4 in the internal database (T5).

At T12, the nucleic acid or amino acid sequence data contained in the input data extracted at T1 are obtained. The obtained sequence is used to run a homology search against the internal database or an external amino acid database linked to the internal database (T10). The result of the homology search is used to determine the position on the chromosome (T11), which is registered in the table of FIG. 4 in the internal database (T5).
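The fallback order of steps T1 to T12 can be summarized as a short control-flow sketch. Everything below is illustrative rather than the system's actual code: the record field names in LOCUS_KEYS, the dictionary stand-ins for the external and internal databases, and the fetch_sequence and homology_search callables are hypothetical placeholders for the Locus Link lookup, the FIG. 5 tables, the Entrez query, and the homology search, respectively.

```python
# Hypothetical field names for the keys listed at T1 (assumption).
LOCUS_KEYS = ("locus", "accession", "gene_name", "gsymbol", "unigene", "nm")


def resolve_position(record, external_db, internal_acc_db,
                     fetch_sequence, homology_search):
    """Return (chrom, start, end, step) for one probe-set record, or None.

    external_db:     dict {(key_name, value): (chrom, start, end)}, standing in
                     for the Locus Link database or an equivalent.
    internal_acc_db: dict {accession: (chrom, start, end)}, standing in for
                     the FIG. 5 tables.
    fetch_sequence:  callable(keys) -> sequence or None (stand-in for T9).
    homology_search: callable(sequence) -> (chrom, start, end) or None.
    """
    keys = {k: record[k] for k in LOCUS_KEYS if record.get(k)}  # T1

    if keys:                                        # T2 YES
        for name, value in keys.items():
            hit = external_db.get((name, value))    # T3
            if hit:
                return hit + ("T4",)                # T4, then register (T5)
        accession = keys.get("accession")           # T6
        if accession is None:
            sequence = record.get("sequence")       # T6 NO -> T12
        else:
            hit = internal_acc_db.get(accession)    # T7
            if hit:
                return hit + ("T8",)                # T8, then register (T5)
            # T7 NO -> T9: use keys other than the Accession ID.
            others = {k: v for k, v in keys.items() if k != "accession"}
            sequence = fetch_sequence(others)
    else:
        sequence = record.get("sequence")           # T2 NO -> T12

    if not sequence:
        return None                                 # nothing left to search
    hit = homology_search(sequence)                 # T10
    return hit + ("T11",) if hit else None          # T11, then register (T5)
```

The returned step label is only there to make the chosen branch visible when experimenting; a real implementation would write the position to the FIG. 4 tables at T5 instead.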

Next, the genome display unit of the gene analysis system of the present invention is described.
The main screen 11 of the genome display unit shown in FIG. 7 consists of a gene display section 12 that displays genes on the genome, an annotation display section 13 that displays other annotations, an expression data display section 14 that displays expression data, and other elements. In this way, the required genome annotations and the genes subjected to expression analysis are displayed simultaneously. The example of FIG. 7 shows chromosome information for the human genome: 15 is the chromosome and 16 is the chromosome number, with chromosome 1 currently selected. Within chromosome 1, annotations are displayed for the portion selected at 17. Reference numeral 18 is a gene display showing chromosome 15 in finer detail, and 19 shows the annotation display portion selected in the same way as at 17.

The required annotations are displayed in the lower half of the main screen 11. On a selection screen such as that of FIG. 8, the user selects the required annotations from the items indicated by 32; selects how each is displayed, such as detailed display (Full), coarse display (Dense), or hidden (Hide), in the display method item 31; and selects the annotation color with the Color item 33. The portion 34 corresponds to the expression data. In this example, Expression, testexp1, and testexp2 are displayed in FIG. 7 as expression data. Expression, testexp1, and testexp2 are separate experimental units, which can be freely selected and added when expression data are registered.

The relationship between a gene and a probe set is normally as shown in FIG. 9. In the display of gene 41, the thick portions are exons and the thin portions are introns. Each probe of the probe set 42 is designed from part of the gene, as shown in the figure, and expression data are generated for each probe. Strictly speaking, the expression data should be mapped to the same position as each probe. To convey phenomena more visually, however, the positional relationship between the gene regions in the annotation display and the gene regions with expression data is presented in a gene-by-gene contrast, as in the portions enclosed by broken lines at 20 and 21 in FIG. 7.

In addition, for genes that have a probe set used in the expression experiment, the region matching the gene region can be displayed in a single color, or the average of the expression values can be converted to a color, making the display even easier to understand visually. In the example of FIG. 7, the data from the expression experiment on gene 20, shown under Genes, are drawn as block 21 with a fixed width and in the single color (for example, red) specified by the Color item 33 of FIG. 8.
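The description does not say how an averaged expression value is converted to a color. As one possible reading, the sketch below maps the mean linearly onto a blue-to-red scale; the function name mean_to_rgb, the value range of -2.0 to 2.0, and the blue-to-red scale itself are all assumptions chosen for illustration.

```python
def mean_to_rgb(values, vmin=-2.0, vmax=2.0):
    """Map the mean of a gene's expression values to an (R, G, B) tuple.

    The mean is clamped to [vmin, vmax] and mapped linearly so that vmin is
    pure blue and vmax is pure red. Both the range and the color scale are
    assumptions; the source only states that the average is converted to a
    color.
    """
    if not values:
        raise ValueError("at least one expression value is required")
    mean = sum(values) / len(values)
    clamped = min(max(mean, vmin), vmax)
    t = (clamped - vmin) / (vmax - vmin)  # 0.0 at vmin, 1.0 at vmax
    red = round(255 * t)
    return (red, 0, 255 - red)


if __name__ == "__main__":
    # Fold-change values for the probe sets of one gene (made-up numbers).
    print(mean_to_rgb([1.5, 2.5]))
```

A gene-level color like this complements the fixed single-color mode described above, where every gene with expression data is drawn in the color chosen on the selection screen.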

Thus, although the probe set used for a gene covers only part of that gene's region, the data are easier to understand when viewed gene by gene. With this main screen as the starting point, the details of the expression data are presented by the following method.

The detailed display lists the data for the genes in the range selected in FIG. 7, as shown in FIG. 10. The vertical axis 51 represents the experimental conditions, such as cells, drugs that stimulate the cells, samples, and other conditions; a series of experiments using the same gene produces a large amount of experimental data. The horizontal axis represents the genes in the range selected in FIG. 7. The portion 53 shows the actual expression values; within it, the vertical axis shows the results of multiple experiments, such as a time course, under each experimental condition 51. Along the horizontal axis, when a single gene has multiple probe sets, the display is divided by the number of probe sets. The display order is either Accession ID order or the order of genomic position within the selected gene range.

The procedure is described below following the flowchart of the detailed display method in FIG. 11. First, on the main screen, the user selects the required position on the chromosome that contains the genes to be browsed (a search may also be used) (D1). The genomic position information of the selected range is read, and the corresponding genes are obtained from the table containing the expression data (in this example, the chr(n)_expData table created for each chromosome in FIG. 4) (D2). The genes that have the expression data specified in FIG. 8 are then displayed on the main screen, together with the annotations specified in FIG. 8, according to the color settings specified in FIG. 8 (D3). The user then makes a second selection of the genes whose expression data are to be browsed in detail (D4).
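Step D2 amounts to a range query: find the genes on the selected chromosome that intersect the selected interval and have expression data. The sketch below shows one way to express that in plain Python. The row layout reuses the geneAcc field names, while the function name genes_in_selection, the use of a set of gaid values to represent "has expression data", and the rule that partial overlap counts as a match are assumptions.

```python
def genes_in_selection(gene_rows, exp_gaids, chrom, sel_start, sel_end):
    """Return genes with expression data that overlap the selected range.

    gene_rows: iterable of dicts with keys gaid, gsymbol, chrom, txStart, txEnd.
    exp_gaids: set of gaid values that have rows in the expression table.
    A gene is kept when its [txStart, txEnd] interval intersects
    [sel_start, sel_end]; treating partial overlap as a match is an assumption.
    """
    hits = [
        g for g in gene_rows
        if g["chrom"] == chrom
        and g["txStart"] <= sel_end
        and g["txEnd"] >= sel_start
        and g["gaid"] in exp_gaids
    ]
    # Sort by genomic position so the result can be drawn left to right.
    return sorted(hits, key=lambda g: g["txStart"])


if __name__ == "__main__":
    genes = [
        {"gaid": 1, "gsymbol": "G1", "chrom": "1", "txStart": 100, "txEnd": 200},
        {"gaid": 2, "gsymbol": "G2", "chrom": "1", "txStart": 150, "txEnd": 400},
        {"gaid": 3, "gsymbol": "G3", "chrom": "2", "txStart": 120, "txEnd": 180},
    ]
    selected = genes_in_selection(genes, {1, 3}, "1", 50, 300)
    print([g["gsymbol"] for g in selected])
```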

The range information of this second selection is then obtained, and the Accession IDs of the sequences corresponding to the probe sets, together with the genes that contain them, are obtained from the geneAcc table of FIG. 4 (D5). Similarly, the Accession ID corresponding to each probe set and the associated expression value information, such as fold change and P value, are obtained from the chr(n)_expData table (D6). On the detailed display screen, genes are laid out as columns and experimental conditions such as cells, drugs, and samples as rows, with each combination displayed as one block (the portion corresponding to 53 in FIG. 10) (D7). If multiple probe sets exist on the same gene, the horizontal axis of the block (the portion corresponding to 53 in FIG. 10) is divided equally by the number of probe sets. The vertical axis within a block shows fine-grained experimental units, such as a time course, together with expression values such as fold change and P value (D8). Graph 54 of FIG. 10 shows an example of the result displayed at D8.
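The column layout of step D7 can be illustrated with a small helper that assigns each probe set a horizontal span inside its gene's block. This is a sketch under assumptions: the function name layout_columns, the fixed block width of 60 units, and the choice of Accession ID order (one of the two orders the description allows) are illustrative.

```python
def layout_columns(genes, block_width=60.0):
    """Compute horizontal spans for probe sets in the detailed display.

    genes: list of (gene_symbol, [accession, ...]) in display order.
    Each gene gets one block of block_width; when a gene has several probe
    sets, the block is split into equal-width sub-columns, ordered here by
    Accession ID.
    Returns a list of (gene_symbol, accession, x_left, x_right).
    """
    cells = []
    for i, (symbol, accessions) in enumerate(genes):
        if not accessions:
            continue  # no probe set, nothing to draw for this gene
        left = i * block_width
        width = block_width / len(accessions)
        for j, accession in enumerate(sorted(accessions)):
            cells.append((symbol, accession,
                          left + j * width, left + (j + 1) * width))
    return cells


if __name__ == "__main__":
    for cell in layout_columns([("G1", ["B2", "A1"]), ("G2", ["C3"])]):
        print(cell)
```

The same spans would be reused for every experimental-condition row, so that a probe set occupies one vertical strip across all blocks of its gene.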

The display methods described above make the registered genes and expression data easier to understand, but a data search method is also needed. Searching the expression data together with the genome information makes it possible to browse comprehensively the induced expression of genes, the relationships within gene networks, and the associations with genome information such as SNPs.

The distinguishing feature of the search is that expression data and genome annotations can be searched at the same time. For example, for genes that contain a specific SNP, one can search for data that include given conditions, such as the cells or drugs used in the experiment, and that have expression values within a specific range, or search by the rate of change or the pattern of change of the expression values over a time course. Conventionally, these search methods have been used separately. Searching genome information and expression information simultaneously, and displaying the results mapped onto the chromosome on a single screen, makes it easier to grasp induced expression phenomena that were previously hard to see. In particular, the co-expression of genes located near one another on a chromosome is significant.

An example follows. FIG. 12 shows an example of search condition settings. 61 is the project selection, which determines which of the expression experiment projects is searched. At 62, narrowing conditions for the expression data are set by position on the chromosome, gene name, cell, and so on. Although this example is labeled Cell, experimental conditions other than cells, such as drugs and samples, are also included.

Furthermore, to confirm the induction of neighboring genes, searching by a range of positions on the chromosome is more effective. In particular, once a main gene has been identified, searching for data with a specific expression pattern or expression value within a certain range of that gene is a search that would be impossible if the expression experiment data and the genome database were searched separately, as was done conventionally.
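A search of this kind, combining a positional window around a main gene with an expression condition, might look like the following Python sketch. The record layout (`GeneRecord`), the function `genes_near`, and the use of a simple fold-change threshold are assumptions for illustration; the patent does not specify the query implementation.

```python
from typing import List, NamedTuple


class GeneRecord(NamedTuple):
    symbol: str
    chromosome: str
    start: int          # start position on the chromosome (bp)
    end: int            # end position on the chromosome (bp)
    fold_change: float  # one expression value per gene, for simplicity


def genes_near(records: List[GeneRecord], main_symbol: str,
               window_bp: int, min_fold_change: float) -> List[GeneRecord]:
    """Return genes within window_bp of the main gene that meet the expression threshold."""
    main = next((r for r in records if r.symbol == main_symbol), None)
    if main is None:
        raise ValueError(f"unknown gene: {main_symbol}")
    lo, hi = main.start - window_bp, main.end + window_bp
    return [
        r for r in records
        if r.symbol != main_symbol
        and r.chromosome == main.chromosome
        and r.end >= lo and r.start <= hi      # overlaps the positional window
        and r.fold_change >= min_fold_change   # satisfies the expression condition
    ]
```

Both conditions are evaluated in a single pass over one combined record set, which is the point the paragraph above makes about searching genome and expression data together.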

63 is the expression data search, where the upper and lower limits of the expression value (data values used in expression analysis, such as Fold_change, Ratio, and P-Value) are set. Next, at 64, conditions for searching on the rate of change of the expression data value are set; for example, the upper and lower limits of the rate of change are specified. The data are compared across a time course or the like. This example uses the rate of change of the expression value, but combining it with the expression change pattern is more effective. 65 is a title used to browse the search results later, because the search takes time and is therefore run as a background process.
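The conditions set at 63 and 64 can be sketched in Python as follows. The patent does not define how the rate of change is computed, so this sketch assumes the relative change between consecutive time points; the function names, and the choice to require all values to lie in range while any one rate suffices, are likewise assumptions.

```python
from typing import List


def change_rates(values: List[float]) -> List[float]:
    """Relative change between consecutive time points (assumed definition).

    Pairs whose earlier value is zero are skipped to avoid division by zero.
    """
    return [(b - a) / abs(a) for a, b in zip(values, values[1:]) if a != 0]


def matches(values: List[float], value_min: float, value_max: float,
            rate_min: float, rate_max: float) -> bool:
    """Check a time course against expression-value and change-rate limits."""
    # Condition 63: every expression value lies within the set limits.
    in_range = all(value_min <= v <= value_max for v in values)
    # Condition 64: at least one step has a change rate within the set limits.
    # A single time point yields no rates, so it never satisfies this condition.
    rate_ok = any(rate_min <= r <= rate_max for r in change_rates(values))
    return in_range and rate_ok
```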

The expression data found by the search are grouped as follows. For each expression data item found, its position on the genome is obtained, and the expression data of the genes located immediately before and after it are checked against the search conditions. If they match, the expression data of the next gene along are checked in turn, and this is repeated until a gene no longer matches. Because these expression data are expressed in the same region, they are treated as one group. Groups are likewise formed from expression data located in the same band region of the genome.
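The grouping step described above can be sketched in Python. Extending from each hit to its neighbors until a gene fails the search conditions gives the same result as scanning the genes in chromosomal order and collecting maximal runs of consecutive matching genes, which is what this sketch does. The tuple layout and the function name `group_adjacent_hits` are assumptions for illustration, and grouping by band region is not shown.

```python
from itertools import groupby
from typing import Callable, List, Tuple

Gene = Tuple[str, int, str]  # (chromosome, start position, gene symbol)


def group_adjacent_hits(genes: List[Gene],
                        is_hit: Callable[[str], bool]) -> List[List[str]]:
    """Group neighboring genes whose expression data match the search conditions."""
    groups: List[List[str]] = []
    ordered = sorted(genes)  # order by chromosome, then by start position
    for _, chrom_genes in groupby(ordered, key=lambda g: g[0]):
        current: List[str] = []
        for _, _, symbol in chrom_genes:
            if is_hit(symbol):
                current.append(symbol)   # extend the run to the next gene
            elif current:
                groups.append(current)   # a non-matching gene closes the run
                current = []
        if current:
            groups.append(current)       # runs never span two chromosomes
    return groups
```

Here `is_hit` stands in for the check against the search conditions, for example membership in the set of genes returned by the search.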

FIG. 13 shows an example in which the grouped expression data are displayed as a list for clarity. 71 and 72 show the data collected by cell, which is the sample in this example. Each group or each expression data item carries a link 73 to the genome viewer, so that the corresponding location can be displayed there. This makes induced gene expression and the like apparent at a glance, and other annotations can be checked at the same time.

Next, the effect of displaying literature data mining information is described. As explained above, displaying expression information on the genome database is effective because the results of actual experiments can be associated directly with genome information. However, information about genes is not limited to the results of one's own experiments; literature reporting accepted results of research already carried out around the world must also be consulted. The volume of that literature has become so large that reading papers one by one and compiling the information relevant to each gene is nearly impossible. Nevertheless, much can be learned from the literature, which reports the results of existing research, including information more important than expression experiments, such as associations between genes and diseases. In particular, there are reports of disease-related genes that were found to be highly expressed in expression experiments on that disease.

Accordingly, the method here replaces the expression information described for the present invention with the strength of association with each gene, calculated by the literature information mining method of JP-A-2003-44481 invented by the present inventors, and displays that strength in place of the expression value.

For example, considering gene-disease associations, a single gene is associated with multiple diseases. Conversely, when expressed per disease, a ranking of the relevance of the genes associated with a specific disease can be obtained by the method of JP-A-2003-44481. Taking the strength of the association as the association strength, that value is regarded as the expression value of each gene and represented in the same manner as in the present invention. As a result, for a specific disease or therapeutic drug, the extent to which the related genes have been studied becomes apparent at a glance, with the genome as the reference.

To produce such a display, the genes associated with each disease, compound, or therapeutic drug, or with both a disease and a compound, are ranked in advance, and the number indicating the strength of association is regarded as an expression value and displayed on the map. As shown in the flowchart of FIG. 14, the category corresponding to 51 in FIG. 10 is first determined, for example as disease, compound, or common to both disease and compound (K1). In each of the subsequent processing steps of K2, taking the case of a disease, the genes associated with the specific disease are ranked.

Ranking yields a list of genes associated with a specific disease, as shown in FIG. 15. The table in FIG. 15 shows the result of searching for genes associated with "Hypertension". The Freq and Ratio values in FIG. 15 are regarded as expression values; after normalization, the position information of each gene is obtained and the values are stored in the internal database. Here, Rank is the ranking of genes associated with hypertension, Gene Symbol is the gene symbol, and Gene Name is the commonly used gene name corresponding to the gene symbol. Freq is the number of papers in which the gene appears together with Hypertension. Ratio is the Freq value expressed as a proportion of the number of documents in which Hypertension alone appears.
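The text does not say which normalization is applied before the Freq and Ratio values are stored as expression values. As one possibility only, the following Python sketch assumes min-max scaling to the range 0 to 1; the function name `min_max_normalize` is hypothetical.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly to [0, 1] (assumed normalization, not from the patent)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so no ordering information remains.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

With such a scaling, the gene with the highest Freq for a disease maps to 1 and the lowest to 0, so literature-derived association strengths could share the color scale used for expression values.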

Next, the display is produced in the same manner as described above. In the example of FIG. 16, therapeutic drugs are shown as the category in order to obtain the association between genes and therapeutic drugs related to hypertension. 81 shows the therapeutic drug category, 82 the associated genes, and 83 the strength of association with the disease as a color display. 84 is an example in which Freq and Ratio are displayed numerically. In this example, the category that corresponds to the time course in the expression information is not used.

Using disease as the category, together with therapeutic drug, is also more effective. Furthermore, it is also effective to display the result of the time-series data display method described in JP-A-2003-44481, as in FIG. 17.

As described above, displaying comprehensive gene information and expression information simultaneously allows a variety of information to be mined. Consider, for example, research into individual differences in side effects when a therapeutic drug is administered (drug sensitivity). The patient group is divided into several categories according to the strength of drug sensitivity upon administration, the relationship between these categories and the gene clusters or individual genes obtained from gene expression experiments is studied, and the genes that affect drug sensitivity and the polymorphism information, such as SNPs, that differs between individuals are identified. The identified polymorphism information is then used to examine a patient's drug sensitivity before the drug is administered, with the aim of providing treatment free of side effects.

Within this research workflow, there is a step in which the patient groups divided into several categories are used to study the relationship with the gene clusters or individual genes obtained from gene expression experiments, and to identify the genes that affect drug sensitivity and the polymorphism information that differs between individuals. In that step, the genes scattered across the genome and their surrounding information, the expression experiment data, and the classified polymorphism information can all be displayed at once.

As described above, displaying such information at once makes the polymorphism classification of the patient population with strong drug sensitivity, together with its expression information, polymorphism information, and the genes and their surroundings, apparent at a glance. Moreover, the gene activity triggered by the stimulus of drug administration is not caused by a single gene; multiple genes are involved simultaneously. Because the induced expression brought about by multiple genes at the same time can also be searched and displayed, research efficiency improves markedly.

As described in the examples, using this display method to present the results of literature data mining also makes it possible to grasp the chromosomal positions of genes associated with a specific disease, compound, or gene, which were difficult to find by conventional search alone.

Consequently, the relationship between the association strength of disease-related genes and gene annotations, information around the genes, and polymorphism information such as SNPs from public databases becomes apparent at a glance. By simultaneously displaying the patient groups categorized at a research facility by drug sensitivity or the like, as described above, together with polymorphism information such as their genotypes and the expression information described above, the research theme, the literature, and the genome information can be viewed together. This draws efficiently on the researcher's knowledge and intuition, so research efficiency is far better than when the results of computational data mining are viewed as numbers.

FIG. 1 is a diagram showing the processing up to data registration and display in the gene analysis system of the present invention.
FIG. 2 is a diagram showing a configuration example of the external data input to the gene analysis system of the present invention.
FIG. 3 is a schematic diagram showing the processing up to data registration and display in the gene analysis system of the present invention.
FIG. 4 is a diagram showing an example of the database structure of the gene analysis system of the present invention.
FIG. 5 is a diagram showing an example of the database structure of the gene analysis system of the present invention.
FIG. 6 is a diagram showing the process of specifying the position on the genome of a gene used in an expression experiment.
FIG. 7 is a diagram showing the main genome display screen of the gene analysis system of the present invention.
FIG. 8 is a diagram showing the screen for selecting the display format of FIG. 7.
FIG. 9 is a diagram showing the relationship between genes and probe sets.
FIG. 10 is a diagram showing the detailed display of expression data.
FIG. 11 is a diagram showing the method for the detailed display of expression data.
FIG. 12 is a diagram showing an example of search condition settings.
FIG. 13 is a diagram showing an example in which grouped expression data are displayed as a list.
FIG. 14 is a diagram showing a method of producing the detailed display using the literature information mining method.
FIG. 15 is a diagram showing an example of a ranking list of genes associated with hypertension.
FIG. 16 is a diagram showing the relationship between genes and therapeutic drugs related to hypertension.
FIG. 17 is a diagram showing the change over time in the number of papers on genes associated with hypertension.
FIG. 18 is a diagram showing an example of processing by conventional expression analysis software.
FIG. 19 is a diagram showing a conventional example in which the gene group used in an expression experiment is searched using a database.

Explanation of symbols

1 External database
2 Internal database
3 Browser

Claims

A gene analysis system that reads gene information and expression data obtained in expression experiments and combines them with internal data, the system comprising:
genome position specifying means for specifying, for each probe used in the expression experiment, the gene used by that probe and the position of that gene on the genome, in the process of reading the information and expression data obtained in the expression experiment into an internal database;
registration means for storing data by linking the position information specified by the genome position specifying means with information on the gene;
first selection means capable of selecting a range of genes to be browsed on a chromosome;
extraction means for extracting, from the gene range selected by the first selection means, a first gene region for which expression data is present in the registration means;
first display means for displaying annotations for the gene range selected by the first selection means;
second display means for displaying, gene by gene, the positional relationship between the gene range whose annotations are displayed by the first display means and the first gene region, based on the data of the registration means and the expression data;
second selection means for selecting a second gene region from within the first gene region based on the display of the second display means; and
third display means for acquiring data from the registration means and displaying expression value information for each experimental condition of each gene, with the genes of the second gene region arranged in columns or rows and the experimental conditions of the expression experiment arranged in rows or columns.

The gene analysis system according to claim 1, wherein the genome position specifying means uses:
external data search means for searching published external information for genome position information using any one of Locus ID, gene name, gene symbol, UniGene ID, nm ID, and Accession ID as a key;
first internal data search means for searching internal data using the Accession ID as a key when the external data search means cannot detect data; and
second internal data search means for searching internal data based on nucleic acid or amino acid sequence information when data cannot be detected by any of the above means, or when none of the keys Locus ID, gene name, gene symbol, UniGene ID, nm ID, and Accession ID can be obtained.

The gene analysis system according to claim 1 or 2, wherein the second display means displays the positional relationship for comparison by coloring the gene range whose annotations are displayed and the first gene region, respectively.