JP2006191922A

JP2006191922A - System for estimating strain of microorganism and method for the same

Info

Publication number: JP2006191922A
Application number: JP2005216546A
Authority: JP
Inventors: Koretsugu Ogata; 是嗣緒方; Tomoko Inagaki; 知子稲垣
Original assignee: WORLD FUSION CO Ltd; Shimadzu Corp
Current assignee: WORLD FUSION CO Ltd; Shimadzu Corp
Priority date: 2004-12-15
Filing date: 2005-07-26
Publication date: 2006-07-27
Anticipated expiration: 2025-07-26
Also published as: JP5565991B2

Abstract

PROBLEM TO BE SOLVED: To provide a system that estimates a microbial species simply and with high accuracy.

SOLUTION: The system estimates the species of a test bacterium from the homology observed between a base sequence derived from the test bacterium and base sequences derived from known microorganisms. It comprises a sequence database holding sequence data for highly conserved genes of known microorganisms, each record including at least the base sequence of the gene and the name of the species from which the gene is derived, and a homology search means that compares the base sequence of a highly conserved gene of the test bacterium with the base sequences in the sequence database and retrieves from the database the sequence data highly homologous to the base sequence of the test bacterium.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a system that estimates the species of a microorganism by homology comparison and phylogenetic analysis based on base sequences, and to a species estimation method that uses the system.

In recent years, the following approach has come into wide use for estimating the species of unknown microorganisms (bacteria and fungi). The sequence of the ribosomal RNA (rRNA) gene of the test bacterium, or of its neighboring regions, is determined (in the present invention these are collectively called rRNA gene-related regions, or ribosomal RNA gene-related regions), and a homology search is run against public databases such as GenBank, EMBL, and DDBJ. The species from which the highly homologous sequences derive is judged to be the same species as the test bacterium or a closely related one. A phylogenetic tree is then built from the base sequences of the test bacterium and its close relatives, based on the homology search results, to estimate the phylogenetic position of the test bacterium (Non-Patent Document 1).

These rRNA gene-related regions are highly conserved across all organisms, but the region used for species estimation depends on the level of detail required. In general, for eukaryotes such as molds, 18S rRNA is considered effective for classification at the class and order level, while D2 rRNA (a sequence within the 28S rRNA gene) and the ITS (Internal Transcribed Spacer) region (the sequence between the 18S and 26S (28S) rRNA genes) are considered effective at the genus and species level below that. The level to be analyzed is decided in advance, the appropriate rRNA region is sequenced, and homology search and phylogenetic analysis are performed on the base sequence of that region.

Kits and systems that estimate bacterial and fungal species and perform phylogenetic analysis by sequencing and homology search of rRNA gene-related regions, as described above, are also commercially available.

Non-Patent Document 1: Yoshifumi Shinoda, Nobuo Kato, Naoki Morita, "Phylogenetic classification of bacteria by 16S rRNA gene analysis", Shimadzu Review, vol. 57, No. 1-2, pp. 121-132 (2000)

When microorganisms are identified by homology search against public databases of this kind, the databases hold an enormous number of sequence records, some of which have low analytical accuracy or come from organisms that have not been adequately identified. Moreover, because the search also runs over a huge volume of data that includes sequences from organisms other than microorganisms, highly accurate search results are not always obtained.

For each sequence returned by the homology search, it was therefore necessary to check the reliability of the registered data one record at a time, and then to collect data on closely related microorganisms from microbial classification tables and the like in order to build a phylogenetic tree, which was very laborious. In addition, the resulting tree often contradicted the existing classification system, which made the species difficult to determine.

Furthermore, when the species of a eukaryote is estimated, the class, order, and so on are narrowed down in advance from other factors such as morphological characteristics, and a phylogenetic tree is then built from the homology search results to determine the species. This is laborious and leads to large differences between operators.

In addition, the commercially available microbial species estimation systems mentioned above analyze only short sequences, so sufficient analytical accuracy is sometimes not obtained.

Furthermore, homology comparison using the ITS region or D2 rRNA alone cannot resolve every species, and the short target sequences limit accuracy, so phylogenetically inconsistent trees are often produced. Also, when the 18S rRNA, D2 rRNA, and ITS regions are each sequenced and searched against existing databases, the highly homologous sequences returned for each region often derive from different species, which makes the species difficult to determine.

The problem to be solved by the present invention is therefore to provide a species estimation system and a species estimation method that allow highly accurate species estimation to be performed simply.

The microbial species estimation system according to the present invention, made to solve the above problem, estimates the species of a test bacterium from the homology observed between a base sequence derived from the test bacterium and base sequences derived from known microorganisms, and is characterized by comprising:
a) a sequence database holding sequence data for highly conserved genes of known microorganisms, including at least the base sequence and the name of the species of origin; and
b) a homology search means that compares the base sequence of a highly conserved gene of the test bacterium with the base sequences in the sequence database, and retrieves from the sequence database the sequence data highly homologous to the base sequence of the test bacterium.
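Elements a) and b) can be illustrated with a minimal sketch. The record layout, the accession strings, the toy sequences, and the k-mer Jaccard score are all assumptions made for illustration; the score is a simple stand-in for a real homology search and is not the search method of the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    accession: str  # hypothetical identifier
    species: str    # name of the species of origin
    gene: str       # which conserved gene or region the sequence covers
    sequence: str   # base sequence


def kmers(seq, k=4):
    """Return the set of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def homology_search(query, database, k=4, top=5):
    """Rank database records by k-mer Jaccard similarity to the query."""
    q = kmers(query, k)
    scored = []
    for rec in database:
        r = kmers(rec.sequence, k)
        union = q | r
        score = len(q & r) / len(union) if union else 0.0
        scored.append((score, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]


# Toy database with invented species names and sequences.
DB = [
    SequenceRecord("X1", "Species alpha", "18S", "ACGTACGTGGCCTTAAGGCC"),
    SequenceRecord("X2", "Species beta", "18S", "TTTTGGGGCCCCAAAATTTT"),
]
hits = homology_search("ACGTACGTGGCCTTAAGGCA", DB)
```

Each hit pairs a score with a record, so the species name travels with the sequence as required by element a).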

Preferably, the highly conserved gene is one or more selected from ribosomal RNA gene-related regions, such as the 16S, 18S, 5S, 5.8S, 23S, 25S, 26S, and 28S ribosomal RNA genes mentioned above and the spacer regions lying between ribosomal RNA genes on the genome (ITS region or IGS region), and from mitochondrial DNA, the gyrB gene, the chitin synthase (CHS) gene, the cytochrome b gene, the recA gene, the elongation factor 1A gene, the tubulin gene, the rpoB gene, the pks gene, the actin gene, and the fus gene.

Preferably, the microbial species estimation system of the present invention further comprises:
c) a classification database holding the species names of known microorganisms and their classification information,
with the sequence data in the sequence database and the classification information in the classification database associated with each other.

Preferably, the microbial species estimation system of the present invention further comprises:
d) a taxon designation means that designates an arbitrary taxon from the classification information in the classification database;
e) a sub-database creation means that extracts from the sequence database the sequence data derived from species belonging to the taxon designated by the taxon designation means, and creates a sub-database; and
f) a sub-database search means that runs a homology search of the sub-database using the base sequence of a highly conserved gene derived from the test bacterium.

Here, the taxon designation means may automatically designate the taxon found by the homology search means to be highly homologous to the test bacterium, or the operator may designate a suitable taxon through a given input means, with reference to the results of the homology search. In this way, for example, a sub-database can be created from the results of the homology search means and then searched again to narrow down the species, which enables highly accurate species estimation.

In a species estimation system equipped with the sub-database search means and related means described above, it is more preferable that the homology search means uses the base sequence of the 18S ribosomal RNA gene as the highly conserved gene sequence of the test bacterium, and that the sub-database search means uses the base sequence of the ITS region or the IGS (InterGenic Spacer) region.

In this way, a sub-database is created from the results of the homology search with the 18S ribosomal RNA gene and is then searched with the base sequence of the ITS or IGS region, which is more variable than the 18S ribosomal RNA gene. This narrows down the species and enables more detailed species estimation.

Furthermore, the microbial species estimation system of the present invention may comprise a phylogenetic tree creation means that builds a phylogenetic tree from the base sequences in the sub-database. In that case, it is more preferable that the phylogenetic tree creation means can build both a tree that includes the base sequence derived from the test bacterium and a tree that does not.

The species estimation method of the present invention, which uses the species estimation system described above, estimates the species of a test bacterium from the homology observed between a base sequence derived from the test bacterium and base sequences derived from known microorganisms, and is characterized by comprising the steps of:
a) running a homology search, using the base sequence of a highly conserved gene of the test bacterium, against a sequence database holding the base sequences of highly conserved genes of known microorganisms, to estimate the broad taxon to which the test bacterium belongs;
b) designating the taxon estimated by that homology search in a classification database holding the species names of known microorganisms and their classification information, and creating a search sub-database by extracting the sequence data for that taxon from the sequence database; and
c) running a homology search of the search sub-database, using the base sequence of a different highly conserved gene derived from the same test bacterium, to narrow down the species.
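Steps a) to c) can be sketched as a two-stage search. Everything named here is hypothetical: the species and genus names, the toy sequences, and the use of `difflib.SequenceMatcher` as a crude similarity measure in place of a real homology search program. The region labels "18S" and "ITS" follow the preferred embodiment described in the text.

```python
from difflib import SequenceMatcher

# Toy sequence database: (species, region, sequence). All values are invented.
RECORDS = [
    ("Genus_a species_1", "18S", "ACGTACGTACGTTTGA"),
    ("Genus_a species_2", "18S", "ACGTACGTACGTTTGC"),
    ("Genus_b species_3", "18S", "TTGGCCAATTGGCCAA"),
    ("Genus_a species_1", "ITS", "GGGATTCCCATAGG"),
    ("Genus_a species_2", "ITS", "GGCATACGCTTAGC"),
    ("Genus_b species_3", "ITS", "CCTTAAGGCCTTAA"),
]
# Toy classification database: species -> genus.
GENUS_OF = {
    "Genus_a species_1": "Genus_a",
    "Genus_a species_2": "Genus_a",
    "Genus_b species_3": "Genus_b",
}


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for a homology score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def best_hit(query, records):
    """Return the record whose sequence is most similar to the query."""
    return max(records, key=lambda rec: similarity(query, rec[2]))


def estimate_species(query_18s, query_its):
    # Step a: search the full database with the conserved 18S sequence.
    stage1 = [r for r in RECORDS if r[1] == "18S"]
    genus = GENUS_OF[best_hit(query_18s, stage1)[0]]
    # Step b: build the search sub-database for the estimated taxon.
    subdb = [r for r in RECORDS if r[1] == "ITS" and GENUS_OF[r[0]] == genus]
    # Step c: search the sub-database with the more variable ITS sequence.
    return best_hit(query_its, subdb)[0]


estimate = estimate_species("ACGTACGTACGTTTGA", "GGGATTCCCATAGG")
```

The second search only ever sees records from the genus chosen in the first, which is the narrowing effect the method relies on.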

In the species estimation method of the present invention described above, it is preferable that the test bacterium is a eukaryote, that the homology search of the sequence database uses the base sequence of the 18S ribosomal RNA gene of the test bacterium, and that the homology search of the search sub-database uses the base sequence of the ITS region or IGS region of the test bacterium.

Preferably, the species estimation method of the present invention further comprises the steps of:
d) designating a suitable taxon in the classification database, based on the results of the homology search of the search sub-database, and creating a phylogenetic tree sub-database by extracting the sequence data for that taxon from the sequence database; and
e) using the base sequences in the phylogenetic tree sub-database to build a phylogenetic tree that includes the base sequence of the test bacterium and a phylogenetic tree that does not, and checking that the two do not contradict each other.

With the species estimation system of the present invention, the homology search uses a database of sequence data for highly conserved microbial genes together with a database describing the taxonomic relationships among the microorganisms. This allows more accurate species estimation, removes the work of collecting supporting data such as classification tables, and makes it simpler to analyze the phylogenetic position of the test bacterium. In addition, because longer sequences can be analyzed than with conventional commercially available species estimation systems, the accuracy of the homology search is improved.

The microbial species estimation system of the present invention, and the species estimation method that uses it, are described below with reference to an example.

[Example 1]
FIG. 2 shows the schematic configuration of the microbial species estimation system of this example. The system consists of a storage unit 11, which stores a sequence database 11a and a classification database 11d, and a control unit 12, which executes homology searches, phylogenetic tree creation, and so on. An input unit 13 such as a keyboard and mouse and an output unit 14 such as a monitor are connected to the control unit 12.

The sequence database 11a holds sequence data comprising the base sequences of ribosomal RNA gene-related regions of microbial origin, such as the 16S, 18S, 5S, 5.8S, 23S, 25S, 26S, and 28S ribosomal RNA genes and the spacer regions lying between ribosomal RNA genes on the genome (ITS and IGS regions), together with information related to each sequence (the name of the species from which it derives, biological features of the sequence, gene function, and so on). It consists of a bacterial database 11b holding sequence data of bacterial origin and a fungal database 11c holding sequence data of fungal origin.

Such a sequence database 11a can be created, for example, by extracting information on ribosomal RNA gene-related regions of bacterial and fungal origin from the public databases mentioned above. When the sequence database 11a of the present invention is created, records whose base sequences were read with low accuracy, and records whose species of origin cannot be adequately identified, are excluded, so that the database consists only of reliable data.
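The curation rule above can be sketched as a record filter. The text does not state concrete criteria, so the two tests used here are assumptions for illustration only: the fraction of ambiguous (non-ACGT) bases as a proxy for read accuracy, and a few name patterns as a proxy for an unidentified species of origin.

```python
def is_reliable(species, sequence, max_ambiguous=0.01):
    """Return True if a record passes both illustrative curation checks.

    The checks and the 1% cutoff are assumptions, not criteria from the text.
    """
    name = species.strip().lower()
    # Reject records whose species of origin is not specified.
    if not name or any(tag in name for tag in ("unidentified", "uncultured", " sp.")):
        return False
    seq = sequence.upper()
    if not seq:
        return False
    # Reject records with too many ambiguous base calls.
    ambiguous = sum(base not in "ACGT" for base in seq)
    return ambiguous / len(seq) <= max_ambiguous


# Toy input records with invented names and sequences.
raw = [
    ("Genus_a species_1", "ACGTACGT"),
    ("uncultured fungus", "ACGTACGT"),
    ("Genus_a species_2", "ACGTNNNN"),
]
curated = [rec for rec in raw if is_reliable(*rec)]
```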

The classification database 11d holds the name of the species from which each sequence record in the sequence database 11a derives, together with the classification information for each species. Like the sequence database 11a, it can be created, for example, by extracting information on bacteria and fungi from a public database (for example, the Unified Taxonomy Database). The sequence data in the sequence database 11a and the classification information in the classification database 11d are associated with each other. Selecting a particular sequence record in the sequence database 11a allows the classification information of its species of origin to be read from the classification database 11d. Conversely, selecting a particular taxon from the classification information in the classification database 11d allows the sequence data of the ribosomal RNA gene-related regions of the species belonging to that taxon to be extracted from the sequence database 11a.
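The two-way association between the databases can be sketched with the species name as the shared key. The record layout, accession strings, and taxon names below are hypothetical; the text does not specify a schema.

```python
# Toy sequence database: (accession, species, region, sequence).
SEQUENCE_DB = [
    ("ACC001", "Genus_a species_1", "18S", "ACGTACGT"),
    ("ACC002", "Genus_a species_2", "ITS", "GGGTTTAA"),
    ("ACC003", "Genus_b species_3", "18S", "TTGGCCAA"),
]
# Toy classification database keyed by species name.
TAXONOMY_DB = {
    "Genus_a species_1": {"family": "Family_x", "genus": "Genus_a"},
    "Genus_a species_2": {"family": "Family_x", "genus": "Genus_a"},
    "Genus_b species_3": {"family": "Family_y", "genus": "Genus_b"},
}


def classification_of(accession):
    """Sequence record -> classification of its species of origin."""
    for acc, species, _region, _seq in SEQUENCE_DB:
        if acc == accession:
            return TAXONOMY_DB[species]
    raise KeyError(accession)


def records_in_taxon(rank, name):
    """Taxon -> all sequence records from species belonging to it."""
    return [rec for rec in SEQUENCE_DB if TAXONOMY_DB[rec[1]][rank] == name]
```

For example, `records_in_taxon("genus", "Genus_a")` returns the first two records, which is the extraction a sub-database would be built from.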

The control unit 12 comprises a homology search unit 12a, a taxon designation unit 12b, a sub-database creation unit 12c, and a phylogenetic tree creation unit 12d. The homology search unit 12a compares the base sequences in the sequence database 11a, or in the sub-database 11e described below, with the base sequence derived from the test bacterium in order to find sequences highly homologous to the test bacterium's sequence. It can use, for example, an existing analysis program such as BLAST or FASTA, an improved version of such a program, or a program based on an equivalent algorithm.

The taxon designation unit 12b designates an arbitrary taxon from the classification information in the classification database 11d. As the target for the sub-database creation described below, it designates either the taxon containing the species from which a sequence highly homologous to the test bacterium derives, as found by the homology search unit 12a, or a taxon that the operator designates through the input unit 13. Highly homologous here means, for example, a homology score of 250 or more, or a continuously matched stretch of 100 base pairs or more with an identity of about 90% or more; more preferably, it means a homology score of 300 or more, or a continuously matched stretch of 150 base pairs or more with an identity of about 95% or more. The sub-database creation unit 12c extracts from the sequence database 11a the sequence data derived from species belonging to the taxon designated by the taxon designation unit 12b, and creates the search sub-database 11e or the phylogenetic tree sub-database 11f.
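The example thresholds can be written as a single predicate. The numbers come from the text; the `Hit` type, its field names, and the `preferred` flag are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    score: float           # homology score reported by the search
    longest_match_bp: int  # length of the continuously matched stretch
    identity_pct: float    # identity over that stretch, in percent


def is_highly_homologous(hit, preferred=False):
    """Apply the example thresholds (250 / 100 bp / 90%) or,
    with preferred=True, the stricter ones (300 / 150 bp / 95%)."""
    if preferred:
        min_score, min_len, min_identity = 300, 150, 95.0
    else:
        min_score, min_len, min_identity = 250, 100, 90.0
    return hit.score >= min_score or (
        hit.longest_match_bp >= min_len and hit.identity_pct >= min_identity
    )
```

A hit with score 260 passes the basic thresholds but not the preferred ones, while a 160 bp match at 96% identity passes both even with a low score.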

The phylogenetic tree creation unit 12d performs multiple alignment on the sequence records in the phylogenetic tree sub-database 11f and builds a phylogenetic tree from the result. It can use, for example, an existing analysis program such as CLUSTAL W, an improved version of such a program, or a program based on an equivalent algorithm. The phylogenetic tree creation unit 12d builds both a tree that includes the test bacterium and a tree that does not, so that the branch lengths of the two can be compared to confirm that the analysis results are free of contradictions.
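The with-and-without comparison can be sketched on a toy scale. This sketch makes several substitutions that should be read as assumptions: the sequences are taken as already aligned, trees are built by plain UPGMA clustering on p-distances rather than by CLUSTAL W, and consistency is judged by comparing the groupings of the reference sequences rather than branch lengths. The names and sequences are invented.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two equal-length aligned sequences."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_clades(aligned):
    """Cluster by UPGMA and return the set of clades (frozensets of names)."""
    clusters = [frozenset([name]) for name in aligned]
    clades = set()

    def dist(c1, c2):
        # Average pairwise distance between the members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(p_distance(aligned[a], aligned[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        clades.add(merged)
    return clades


def trees_consistent(aligned, query_name):
    """True if adding the query leaves the reference groupings unchanged."""
    with_query = upgma_clades(aligned)
    without_query = upgma_clades(
        {n: s for n, s in aligned.items() if n != query_name}
    )
    pruned = {c - {query_name} for c in with_query}
    pruned = {c for c in pruned if len(c) > 1}
    return pruned == without_query


ALIGNED = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "AAAAATTTTT",
    "D": "AAAACTTTTT",
    "Q": "AAAAAAAAAA",  # toy test-bacterium sequence
}
ok = trees_consistent(ALIGNED, "Q")
```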

Next, the species estimation method using the species estimation system of this example is described. FIG. 1 is a flowchart showing the procedure. As an example, the description covers species estimation for a fungus using the base sequence of the 18S rRNA gene and the base sequence of a spacer region (ITS or IGS). As shown in FIG. 15, the ITS and IGS regions lie between rRNA genes on the genome and are said to be more variable than rRNA genes such as 18S rRNA, whose sequences are highly conserved. Using the ITS or IGS region therefore makes detailed species estimation possible, such as estimation between varieties or at the strain level.

First, the 18S rRNA gene and the ITS region of the microorganism whose species is to be identified (the test bacterium) are sequenced in advance to obtain the base sequences of these regions. It is desirable to sequence as wide a range as possible: about 1800 bases for 18S rRNA and about 250-800 bases for the ITS region.

When the operator starts the species estimation system of this example, a homology search setting screen 30 such as that in FIG. 3 is displayed. A suitable name is entered in the query name input field 31, and the 18S rRNA sequence of the test bacterium obtained above is entered in the query sequence input field 32. The database selection field 33 selects either the bacterial database 11b or the fungal database 11c as the database to be searched; here the fungal database 11c is selected. The search parameter setting field 34 sets parameters for the homology search, such as Expect (expectation value), WordSize (word length), and Number of hits (number of records to display). Once the query sequence has been entered and the search parameters set (S1), the operator clicks the OK button 35 and the homology search is run against the sequence database 11a (here, its fungal database 11c) (S2).
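The three search parameters named above can be carried in a small settings object. The parameter names follow the text; the default values, the hit dictionary keys, and the `report` helper are assumptions for illustration, and WordSize is stored but not used by this toy reporting step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    expect: float = 10.0       # Expect: upper bound on the E-value of reported hits
    word_size: int = 11        # WordSize: seed word length used by the search
    number_of_hits: int = 50   # Number of hits: how many records to display

    def __post_init__(self):
        if self.expect <= 0:
            raise ValueError("expect must be positive")
        if self.word_size < 1 or self.number_of_hits < 1:
            raise ValueError("word_size and number_of_hits must be at least 1")


def report(hits, params):
    """Keep hits within the Expect cutoff, best score first, capped in number."""
    kept = [h for h in hits if h["e_value"] <= params.expect]
    kept.sort(key=lambda h: h["score"], reverse=True)
    return kept[:params.number_of_hits]


# Toy hits with invented accessions and values.
hits = [
    {"accession": "ACC001", "score": 310.0, "e_value": 1e-80},
    {"accession": "ACC002", "score": 120.0, "e_value": 0.5},
    {"accession": "ACC003", "score": 40.0, "e_value": 25.0},
]
shown = report(hits, SearchParams(expect=1.0, number_of_hits=2))
```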

When the homology search is complete, a sub-database selection screen 40 such as that in FIG. 4 is displayed. The sub-database selection screen 40 consists of a search result field 42, which shows the search results, and a sub-database selection field 41, which is used to select the sub-database to be searched next on the basis of those results. The search result field 42 consists of a list display field 43, which lists the sequences highly homologous to the query sequence, and an alignment display field 44, which shows the alignment between the query sequence and each listed sequence. For each sequence record, the list display field 43 shows the accession number (registration number), the Definition (sequence name and other information), a score indicating the degree of homology, and an E-value indicating the statistical significance of the hit.

The sub-database selection field 41 consists of a "family" selection field 41a, a "genus" selection field 41b, and a selected data display field 41c. The "family" selection field 41a is used to choose, from the list of family names of the microorganisms (here, fungi) in the classification database 11d, the family whose genera are shown in the "genus" selection field 41b. By default, the family containing the species of origin of the sequence record with the highest homology score in the search is selected. When a family name is selected in the "family" selection field 41a, the list of genus names in that family is shown in the "genus" selection field 41b. Clicking a genus name in the "genus" selection field 41b and pressing the select button 41d adds that genus name to the selected data display field 41c. Clicking a data name in the selected data display field 41c and pressing the deselect button 41e removes it from the selected data display field 41c. Once the genera have been selected (S3), pressing the OK button 45 extracts the sequence data of the species in the selected genera from the sequence database 11a (here, the fungal database 11c) and creates the search sub-database 11e (S4).
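The state behind the family and genus selection fields can be sketched as a small class. The class, its method names, the taxon names, and the record layout are hypothetical; only the behavior (default family, select, deselect, extraction on OK) follows the text.

```python
class GenusSelection:
    """Tracks the chosen family and the list of selected genera."""

    def __init__(self, genera_by_family, default_family):
        self.genera_by_family = genera_by_family
        self.family = default_family  # default: family of the top-scoring hit
        self.selected = []

    def choose_family(self, family):
        """Switch family and return the genera to list in the genus field."""
        self.family = family
        return self.genera_by_family[family]

    def select(self, genus):
        """Add a genus of the current family to the selected list."""
        if genus in self.genera_by_family[self.family] and genus not in self.selected:
            self.selected.append(genus)

    def deselect(self, genus):
        """Remove a genus from the selected list."""
        if genus in self.selected:
            self.selected.remove(genus)


def build_search_subdatabase(records, selected_genera):
    """Extract the records whose genus is among the selected genera."""
    return [r for r in records if r["genus"] in selected_genera]


# Toy data with invented taxon names.
GENERA = {"Family_x": ["Genus_a", "Genus_b"], "Family_y": ["Genus_c"]}
RECORDS = [
    {"accession": "ACC001", "genus": "Genus_a"},
    {"accession": "ACC002", "genus": "Genus_b"},
    {"accession": "ACC003", "genus": "Genus_c"},
]
ui = GenusSelection(GENERA, default_family="Family_x")
ui.select("Genus_a")
ui.select("Genus_b")
ui.deselect("Genus_b")
ui.choose_family("Family_y")
ui.select("Genus_c")
subdb = build_search_subdatabase(RECORDS, ui.selected)
```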

Next, a sub-database search setting screen 50 such as that in FIG. 5 is displayed. The sequence data of the ITS region of the test bacterium is entered as the query sequence, and the search parameters are set (S5). Clicking the OK button 54 runs a homology search against the search sub-database 11e created above (S6).

Thus, in the species estimation method of this example, the species is first estimated roughly using the base sequence of the 18S rRNA gene, and a homology search using the base sequence of the ITS region is then run against the search sub-database created from that result to narrow down the species. Besides the ITS region, the base sequence of the IGS region may also be used for this narrowing.
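The two-stage narrowing can be sketched as follows. This is a toy illustration under explicit assumptions: `difflib.SequenceMatcher` stands in for a real homology search tool such as BLAST, and the record layout, function names, and short placeholder sequences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for a homology score; a real system would run a search tool
    # such as BLAST rather than a generic string-similarity ratio.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def best_hit(query, records, marker):
    """Return the record whose `marker` sequence is most similar to the query."""
    return max(records, key=lambda rec: similarity(query, rec[marker]))


def two_stage_estimate(query_18s, query_its, records):
    # Stage 1: rough estimate from the 18S rRNA gene sequence.
    coarse = best_hit(query_18s, records, "18S")
    # Build the search sub-database from the genus found in stage 1.
    sub_db = [rec for rec in records if rec["genus"] == coarse["genus"]]
    # Stage 2: narrow down within the sub-database using the ITS region.
    fine = best_hit(query_its, sub_db, "ITS")
    return coarse["genus"], fine["species"]


# Placeholder records; the two GenusA species share an identical 18S sequence
# and can only be told apart by their ITS sequences.
records = [
    {"species": "Species A1", "genus": "GenusA",
     "18S": "ACGTACGTAC", "ITS": "TTGACCGTAA"},
    {"species": "Species A2", "genus": "GenusA",
     "18S": "ACGTACGTAC", "ITS": "TTGTCCGAAA"},
    {"species": "Species B1", "genus": "GenusB",
     "18S": "ACGGACTTAC", "ITS": "GGCTCCGAAT"},
]

genus, species = two_stage_estimate("ACGTACGTAC", "TTGTCCGAAA", records)
```

In this toy data the 18S query identifies only the genus, while the ITS query separates the two candidate species, which mirrors the rationale given above for searching the ITS (or IGS) region second.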

When the homology search against the search sub-database 11e is complete, the multiple alignment setting screen 60 shown in FIG. 6 is displayed. Like the sub-database selection screen 40, the multiple alignment setting screen 60 consists of a search result display field 62, which shows the results of the homology search, and a sub-database selection field 61. The sub-database selection field 61 is used to select the data for multiple alignment: choosing an appropriate genus based on the homology search results shown in the search result display field 62 (S7) makes the data belonging to that genus available for multiple alignment. Sequences hit in the homology search can also be added to the multiple alignment by ticking the check box 63a in front of each sequence in the list display field 63. Pressing the settings button 65 opens a window (not shown) for setting the threshold used to select the data for multiple alignment. In this window, a threshold is set for the homology score or the identity with the sequence of the test bacterium, and the check box 67 selects whether the query sequence (the sequence of the test bacterium) is used in the multiple alignment (S8). When the OK button 66 is then pressed, the data belonging to the genus selected in the "genus" selection field 61b and the data selected from the search results in the list display field 63 are extracted from the sequence database 11a to create the phylogenetic tree sub-database 11f (S9). The data at or above the threshold are then selected from this sub-database, and multiple alignment is executed (S10). If the "use query sequence" check box 67 is checked, the multiple alignment includes the data of the test bacterium; if it is not checked, the multiple alignment excludes the data of the test bacterium.
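The selection logic of steps S8 to S10 can be illustrated with a minimal sketch. The hit layout (`name`, `sequence`, `score`, `identity`), the function name, and the numbers are assumptions made for illustration only; the alignment itself is left to an external tool.

```python
def select_for_alignment(hits, query, threshold,
                         use_identity=True, include_query=True):
    """Pick the sequences passed on to multiple alignment.

    hits: list of dicts with 'name', 'sequence', 'score', 'identity'
          (assumed layout, not specified in the source).
    threshold: minimum identity (or homology score if use_identity is False).
    include_query: mirrors the "use query sequence" check box.
    """
    key = "identity" if use_identity else "score"
    selected = [(hit["name"], hit["sequence"])
                for hit in hits if hit[key] >= threshold]
    if include_query:
        selected.insert(0, ("query", query))
    return selected


# Placeholder hits with made-up scores.
hits = [
    {"name": "hit1", "sequence": "ACGT", "score": 200.0, "identity": 99.0},
    {"name": "hit2", "sequence": "ACGA", "score": 150.0, "identity": 95.0},
    {"name": "hit3", "sequence": "ACCA", "score": 90.0, "identity": 88.0},
]

with_query = select_for_alignment(hits, "ACGT", threshold=95.0)
without_query = select_for_alignment(hits, "ACGT", threshold=95.0,
                                     include_query=False)
```

Running the selection twice, once with and once without the query, yields the two input sets needed for the pair of phylogenetic trees compared later.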

When the multiple alignment is complete, the multiple alignment result display screen 70 shown in FIG. 7 is displayed. Pressing the save button 71 creates and saves a file of the sequence data used for the multiple alignment and an alignment result file. A phylogenetic tree based on the multiple alignment result can also be displayed by performing a predetermined operation on the input unit 13. By creating two phylogenetic trees, one including the query sequence and one excluding it, the two can be compared to confirm that there is no phylogenetic inconsistency (S11).

[Example 2]
To demonstrate the effectiveness of the microbial species estimation system of the present invention, a homology search based on the base sequences of the 18S rDNA and the ITS region, followed by phylogenetic tree construction, was performed on a microorganism of unknown species.

DNA was extracted by a standard method from a sample thought to be a mold and used as the PCR template for amplifying the 18S rDNA and the ITS region. PCR was performed with EX Taq (Takara) using the primers shown in FIG. 8: E21f and Ef4 (18S rRNA), and ITS1 and ITS4 (ITS). After each target product was confirmed by agarose gel electrophoresis, sequencing was performed using the respective primers and Fung5. The sequencing reactions were carried out with BigDye Terminator Ver 3.1 and run on an ABI3730.

First, the three sequencing reads of the 18S rDNA PCR product were aligned to determine the full-length sequence (FIG. 9). A BLAST homology search with this sequence gave the results shown in FIG. 10. The taxonomic position of the top hit, Neocosmospora vasinfecta, was as follows:
Lineage (full): root; cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Ascomycota; Pezizomycotina; Sordariomycetes; Hypocreomycetidae; Hypocreales; Nectriaceae; Nectria
At this stage, no classification finer than Nectria could be inferred.

Next, a BLAST search (FIG. 12) was performed with the sequence data determined using ITS1 and ITS4 (FIG. 11). The taxonomic position of the top hit, Fusarium solani, was as follows:
Lineage (full): root; cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Ascomycota; Pezizomycotina; Sordariomycetes; Hypocreomycetidae; Hypocreales; Nectriaceae; Nectria; Nectria haematococca; mitosporic Nectria haematococca; Fusarium solani complex
This revealed a taxon finer than the Nectria level obtained above.
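The gain from the ITS search can be made explicit by comparing the two lineage strings reported above. The sketch below is an illustrative helper, not part of the described system; the function names are assumptions, and the stray `class"` token in the 18S lineage is treated as extraction debris and omitted.

```python
def parse_lineage(text):
    """Split a semicolon-separated lineage string into a list of taxa."""
    return [taxon.strip() for taxon in text.split(";") if taxon.strip()]


def shared_prefix(a, b):
    """Return the leading taxa that two lineages have in common."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


# Lineage of the top 18S hit (Neocosmospora vasinfecta), as reported above.
lineage_18s = parse_lineage(
    "root; cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; "
    "Ascomycota; Pezizomycotina; Sordariomycetes; Hypocreomycetidae; "
    "Hypocreales; Nectriaceae; Nectria"
)
# Lineage of the top ITS hit (Fusarium solani), as reported above.
lineage_its = parse_lineage(
    "root; cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; "
    "Ascomycota; Pezizomycotina; Sordariomycetes; Hypocreomycetidae; "
    "Hypocreales; Nectriaceae; Nectria; Nectria haematococca; "
    "mitosporic Nectria haematococca; Fusarium solani complex"
)

common = shared_prefix(lineage_18s, lineage_its)
finer_levels = lineage_its[len(common):]
```

The two lineages agree down to Nectria, and `finer_levels` holds the three additional levels that only the ITS result supplies.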

Based on the results of the homology searches using the 18S rDNA and ITS sequence data, phylogenetic trees including and excluding the data of the test bacterium were constructed for each (FIGS. 13 and 14). When the sequences ranked at the top of the BLAST search with the ITS sequence data were collected and a tree was constructed, Nectria haematococca was estimated to be the closest species (FIG. 14a). This differed from the BLAST search result. When a phylogenetic analysis was run without the sequence data of the sample and compared with the tree that included it, differences were seen between the two (indicated by the arrows in FIG. 14). This confirmed that the phylogenetic relationships among the strains of Fusarium solani and Nectria haematococca are not completely separated and that their classification at the molecular level has not yet been sorted out.
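The source describes this check as a comparison of two displayed trees. One assumed way to automate it is to prune the query from the tree that contains it and compare the sets of clades in the two rooted trees, as in the sketch below. The nested-tuple tree representation, the function names, and the toy trees are hypothetical, and a production tool would more likely use an established tree-distance measure.

```python
def clades(tree, drop=None):
    """Collect the leaf sets of all internal nodes of a rooted tree.

    tree: nested tuples of leaf names (assumed representation).
    drop: a leaf to ignore, used to prune the query sequence.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset() if node == drop else frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        if len(members) > 1:  # skip groups that collapse to a single leaf
            found.add(members)
        return members

    walk(tree)
    return found


def topology_differences(tree_with_query, tree_without_query, query="query"):
    """Clades present in only one of the two trees once the query is pruned."""
    return clades(tree_with_query, drop=query) ^ clades(tree_without_query)


# Toy trees with placeholder leaf names.
tree_with_query = ((("query", "A"), "B"), ("C", "D"))
consistent_tree = (("A", "B"), ("C", "D"))
conflicting_tree = (("A", "C"), ("B", "D"))

no_conflict = topology_differences(tree_with_query, consistent_tree)
conflict = topology_differences(tree_with_query, conflicting_tree)
```

An empty result means the grouping of the reference sequences is unchanged by adding the query; a non-empty result lists the groupings that differ, which corresponds to the kind of discrepancy marked by the arrows in FIG. 14.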

From the 18S rDNA homology search results and the phylogenetic trees built with the sub-database and the ITS sequence data, the sample was estimated to be closely related to Fusarium solani and Nectria haematococca. Conventionally, the sample would have been assigned either to the broad taxon of the genus Nectria or to Fusarium solani, the top hit of the ITS BLAST search. With the present approach, by contrast, the sample is estimated to be one of the two species Fusarium solani and Nectria haematococca, which is a more detailed result. This confirms the usefulness of the species estimation method of the present invention, in which 18S rDNA is used to estimate the rough classification of the test bacterium and ITS information is then used to narrow down the result.

FIG. 1 is a flowchart showing the method for estimating the species of a microorganism with the species estimation system according to an example of the present invention.
FIG. 2 is a block diagram showing the schematic configuration of the species estimation system of the example.
FIG. 3 shows the homology search setting screen of the species estimation system of the example.
FIG. 4 shows the sub-database selection screen of the species estimation system of the example.
FIG. 5 shows the sub-database search setting screen of the species estimation system of the example.
FIG. 6 shows the multiple alignment setting screen of the species estimation system of the example.
FIG. 7 shows the multiple alignment result display screen of the species estimation system of the example.
FIG. 8 shows the sequences of the primers used for the sequence analysis of the test bacterium.
FIG. 9 shows the base sequence of the 18S rRNA gene of the test bacterium.
FIG. 10 shows the results of the homology search using the base sequence of the 18S rRNA gene.
FIG. 11 shows the base sequence of the ITS region of the test bacterium.
FIG. 12 shows the results of the homology search using the base sequence of the ITS region.
FIG. 13 shows phylogenetic trees based on the results of the homology search using the 18S rRNA gene: (a) including the sequence derived from the test bacterium, (b) excluding the sequence derived from the test bacterium.
FIG. 14 shows phylogenetic trees based on the results of the homology search using the ITS region: (a) including the sequence derived from the test bacterium, (b) excluding the sequence derived from the test bacterium.
FIG. 15 is a schematic diagram showing examples of the positional relationship between rRNA genes and the ITS or IGS region: (a) the 18S rRNA, 5.8S rRNA, and 25S rRNA of Arxula adeninivorans and the ITS1 and ITS2 between them; (b) the 25S rRNA, 5S rRNA, and IGS of Tricholoma matsutake; (c) the 5S rRNA of Encephalitozoon cuniculi, the two IGS regions flanking it, and part of the guanylyltransferase gene.

Explanation of symbols

11 ... Storage unit
11a ... Sequence database
11b ... Bacterial database
11c ... Fungal database
11d ... Classification database
11e ... Search sub-database
11f ... Phylogenetic tree sub-database
12 ... Control unit
12a ... Homology search unit
12b ... Classification group designation unit
12c ... Sub-database creation unit
12d ... Phylogenetic tree creation unit
13 ... Input unit
14 ... Output unit
30 ... Homology search setting screen
40 ... Sub-database selection screen
50 ... Sub-database search setting screen
60 ... Multiple alignment setting screen
70 ... Multiple alignment result display screen

Claims

A microbial species estimation system for estimating the species of a test bacterium from the homology found between a base sequence derived from the test bacterium and a base sequence derived from a known microorganism, the system comprising:
a) a sequence database describing, for a highly conserved gene of a known microorganism, sequence data including at least the base sequence of the gene and the name of the species of origin; and
b) homology search means for comparing the base sequence of the highly conserved gene of the test bacterium with the base sequences described in the sequence database and searching the sequence database for sequence data highly homologous to the base sequence of the test bacterium.

The microbial species estimation system according to claim 1, wherein the highly conserved gene is one or more selected from the 16S, 18S, 5S, 5.8S, 23S, 25S, 26S, and 28S ribosomal RNA genes of known microorganisms, the spacer regions (ITS region or IGS region) located between ribosomal RNA genes on the genome, mitochondrial DNA, the gyrB gene, the chitin synthase (CHS) gene, the cytochrome b gene, the recA gene, the elongation factor 1A gene, the tubulin gene, the rpoB gene, the pks gene, the actin gene, and the fus gene.

The microbial species estimation system according to claim 1 or 2, further comprising:
c) a classification database describing the species names of known microorganisms and their classification information,
wherein the sequence data described in the sequence database and the classification information described in the classification database are associated with each other.

The microbial species estimation system according to any one of claims 1 to 3, further comprising:
d) classification group designating means for designating an arbitrary classification group from the classification information described in the classification database;
e) sub-database creation means for creating a sub-database by extracting, from the sequence database, sequence data derived from species belonging to the classification group designated by the classification group designating means; and
f) sub-database search means for performing a homology search on the sub-database using a base sequence of a highly conserved gene derived from the test bacterium.

The microbial species estimation system according to claim 4, wherein the homology search means uses the base sequence of the 18S ribosomal RNA gene as the base sequence of the highly conserved gene of the test bacterium, and the sub-database search means uses the base sequence of an ITS region or an IGS region as the base sequence of the highly conserved gene derived from the test bacterium.

The microbial species estimation system according to claim 4 or 5, further comprising:
g) phylogenetic tree creation means for creating a phylogenetic tree based on the base sequences described in the sub-database.

The microbial species estimation system according to claim 6, wherein the phylogenetic tree creation means can create both a phylogenetic tree including a base sequence derived from the test bacterium and a phylogenetic tree not including a base sequence derived from the test bacterium.

A microbial species estimation method for estimating the species of a test bacterium from the homology found between a base sequence derived from the test bacterium and a base sequence derived from a known microorganism, the method comprising the steps of:
a) performing a homology search, using the base sequence of a highly conserved gene of the test bacterium, on a sequence database describing the base sequences of highly conserved genes of known microorganisms, and estimating the rough classification group to which the test bacterium belongs;
b) designating the classification group estimated by the homology search from a classification database describing the species names of known microorganisms and their classification information, and creating a search sub-database by extracting the sequence data relating to the classification group from the sequence database; and
c) performing a homology search on the search sub-database, using the base sequence of a highly conserved gene that is derived from the same test bacterium and differs from the gene used in step a), to narrow down the species.

The method according to claim 8, wherein the test bacterium is a eukaryote, the homology search on the sequence database is performed using the base sequence of the 18S ribosomal RNA gene of the test bacterium, and the homology search on the search sub-database is performed using the base sequence of the ITS region or IGS region of the test bacterium.

The method according to claim 8 or 9, further comprising the steps of:
d) designating an appropriate classification group from the classification database based on the result of the homology search on the search sub-database, and creating a sub-database for phylogenetic tree creation by extracting the sequence data relating to the classification group from the sequence database; and
e) creating, from the base sequences described in the sub-database for phylogenetic tree creation, a phylogenetic tree including the base sequence of the test bacterium and a phylogenetic tree not including the base sequence of the test bacterium, and checking whether there is any inconsistency between the two.