JP4490302B2

JP4490302B2 - Prediction program for heat-resistant protein and recording medium thereof

Info

Publication number: JP4490302B2
Application number: JP2005032297A
Authority: JP
Inventors: 英人高見; 郁夫内山; 志軍李; 弘毅掘越
Original assignee: Japan Agency for Marine Earth Science and Technology
Current assignee: Japan Agency for Marine Earth Science and Technology
Priority date: 2004-03-08
Filing date: 2005-02-08
Publication date: 2010-06-23
Anticipated expiration: 2025-02-08
Also published as: JP2005293551A

Description

The present invention relates to a program that determines whether a protein produced by an organism is heat-resistant by calculating a characteristic value related to heat resistance from its amino acid sequence or base sequence data, and to a recording medium on which the program is recorded. More specifically, the present invention relates to a program for determining whether a test protein is heat-resistant, the program causing a computer to execute processing that calculates an analysis value specific to the test protein by principal component analysis based on the amino acid composition of the protein and compares that value with the analysis value of the corresponding protein of a heat-resistant organism, and to a computer-readable recording medium on which the program is recorded.

Thermostable enzymes are widely used in industry and in research and development as enzymes that retain their activity even at high temperatures. Examples include enzymes used in the enzymatic reaction step of hydrolyzing saccharides such as starch (see Patent Documents 1 and 2), enzymes used in enzymatic reactions for treating food waste and producing compost from it (see Patent Documents 3 and 4), and enzymes used in enzymatic reactions for producing useful substances such as trehalose (see Patent Documents 5 and 6).
Thermostable enzymes are thus very important industrially. In recent years, the development of thermostable DNA polymerases for use in the PCR method (see Patent Document 7) and in replicating-RNA-based amplification systems (see Patent Documents 8 and 9) has come to be regarded as important, and many thermostable DNA polymerases have been isolated, mainly from thermophilic microorganisms. DNA polymerase has become one of the key tools of genetic engineering. It is important not only for gene cloning and sequencing but also as an enzyme for gene amplification when detecting and identifying trace amounts of genes.

At present, the thermophilic DNA polymerases mainly used for these purposes are derived from the genus Thermus, such as Taq polymerase from Thermus aquaticus (T. aquaticus). Interest in discovering novel polymerases with more suitable properties and activities is growing. Reported examples from outside the genus Thermus include a method using a DNA polymerase from Anaerocellum thermophilum (see Patent Document 10) and a method using one from the sulfur-metabolizing thermophilic archaeon Pyrococcus horikoshii (see Patent Document 11).

The importance of thermostable enzymes thus keeps increasing. In most cases, however, the search for them targets thermophilic or heat-tolerant bacteria: enzyme-producing strains must be screened from nature, culture conditions must be examined, and the heat resistance of each enzyme produced must be confirmed one at a time by heat treatment. This demands enormous labor and time, and the outcome often depends on chance. Moreover, screening has been limited to thermophilic, heat-tolerant, or mesophilic bacteria, and because thermophilic and heat-tolerant bacteria make up only a very small fraction of all microbial species, the diversity of thermostable enzymes has been limited.
Rather than relying on chance discovery alone, a systematic and labor-saving method for searching for industrially useful thermostable enzymes has been sought, and the present inventors have filed a patent application for such a method (see Patent Document 12).
A computer-processable program for carrying out that method easily has therefore been sought.

Patent Document 1: Japanese Translation of PCT Application (Tokuhyo) No. H10-506524
Patent Document 2: JP 2000-50870 A
Patent Document 3: JP 2001-61474 A
Patent Document 4: JP 2003-219864 A
Patent Document 5: JP H08-336388 A
Patent Document 6: JP H08-149980 A
Patent Document 7: Japanese Examined Patent Publication (Tokko) No. H04-67957
Patent Document 8: JP H02-5864 A
Patent Document 9: JP H02-500565 A
Patent Document 10: Japanese Translation of PCT Application (Tokuhyo) No. 2001-502169
Patent Document 11: JP 2000-41668 A
Patent Document 12: Japanese Patent Application No. 2004-046880

Conventionally, searches for heat-resistant proteins such as thermostable enzymes have in most cases screened thermophilic and heat-tolerant bacteria. Because thermophilic bacteria make up only a very small fraction of all microbial species, however, little diversity of thermostable enzymes could be expected. On the other hand, quite a few industrially used thermostable enzymes have been isolated from so-called mesophilic bacteria, but the enzyme-producing strains must be screened from nature, the culture conditions examined, and the heat resistance of each enzyme produced confirmed one at a time by experiment, which required enormous labor and time.
To improve on the conventional trial-and-error approach to finding thermostable enzymes, the present inventors have provided a novel method that can determine, by a simple procedure based on data such as the amino acid sequence or base sequence of a protein, whether the protein is heat-resistant (Japanese Patent Application No. 2004-046880). That method, however, requires complex statistical processing, and its computerization has been sought.
The present invention likewise aims to improve on the conventional trial-and-error approach to finding thermostable enzymes. It provides a computer program that determines, by a simple procedure based on data such as the amino acid sequence or base sequence of a protein, whether the protein is heat-resistant, together with the data for that program and a recording medium for it.

The present inventors performed principal component analysis using the amino acid compositions of the proteins predicted in the genomes of 120 microorganisms whose whole genome sequences are known, calculated the principal component score of each individual protein from the eigenvector of the second principal component (the weighting coefficients for the amino acids), and examined the correlation between this value and the heat resistance of the protein. They found that this value correlates very strongly with the value calculated for the corresponding protein of a thermophile, and have already filed a patent application (Japanese Patent Application No. 2004-046880). This method, however, requires a large amount of data processing: searching for orthologous proteins, calculating the specific analysis value (vector value) of the test protein by principal component analysis, and comparing it with known proteins. Computerization of these calculations and searches was therefore desired, and the present inventors have now completed it.

That is, the present invention relates to a program to be executed by a computer for determining whether a test protein is heat-resistant, the program causing the computer to execute processing for determining the heat resistance of the protein, the processing comprising calculating an analysis value specific to the test protein by principal component analysis based on the amino acid composition of the protein, and comparing that analysis value with the analysis value of the protein of a heat-resistant organism that corresponds to the test protein.
The present invention also relates to a program for causing a computer to execute the following steps to determine whether a test protein is heat-resistant:
(1) a step of inputting the amino acid sequence of the test protein;
(2) a protein search step of searching other organisms for known proteins that are the corresponding protein of the test protein, that is, the protein corresponding to the test protein that is produced by a species different from that of the test protein (hereinafter referred to as the "corresponding protein");
(3) a step of calculating a specific analysis value by principal component analysis based on the amino acid composition of the test protein;
(4) a step of obtaining the specific analysis value of the corresponding protein retrieved in step (2) and the specific analysis value of the test protein calculated in step (3), and calculating the difference between the two;
(5) a step of determining, based on the difference calculated in step (4), whether the test protein is similar to the corresponding protein retrieved in step (2); and
(6) a step of displaying the corresponding protein retrieved in step (2) and the result determined in step (5).
Through these steps, the program determines whether the test protein is heat-resistant by comparing the specific analysis value based on the amino acid composition of the test protein with the specific analysis value of the known corresponding protein.
Furthermore, the present invention relates to a computer-readable recording medium on which the above program to be executed by a computer is recorded.
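Steps (1) to (6) above can be sketched roughly as follows. This is a minimal illustration, not the patented program: the function names, the toy `weights` vector, and the use of a plain dictionary in place of the protein search of step (2) are all assumptions made for the example. The sign convention (test value minus corresponding value) and the default threshold of -0.015 follow the way differences are judged in the description.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

def analysis_value(sequence, weights):
    """Step (3): weighted sum of the composition (a principal component score)."""
    composition = amino_acid_composition(sequence)
    return sum(w * f for w, f in zip(weights, composition))

def predict_heat_resistance(test_seq, corresponding, weights, threshold=-0.015):
    """Steps (4) and (5) for corresponding proteins that were already retrieved.

    `corresponding` maps a protein name to its sequence. It stands in for the
    result of the protein search of step (2), which is not implemented here.
    """
    test_value = analysis_value(test_seq, weights)
    results = []
    for name, ref_seq in corresponding.items():
        difference = test_value - analysis_value(ref_seq, weights)
        results.append((name, difference, difference > threshold))
    return results

def display(results):
    """Step (6): print each corresponding protein with the verdict."""
    for name, difference, resistant in results:
        verdict = "heat-resistant" if resistant else "not heat-resistant"
        print(f"{name}\tdifference={difference:+.4f}\t{verdict}")
```

In real use the `weights` would be the 20 coefficients of the second principal component, and `corresponding` would hold the orthologs found in thermophilic organisms.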

First, the method of determining whether a test protein is heat-resistant, which is the basis of the program of the present invention, is described.
It is known that the amino acid composition of all proteins inferred from the genome sequence of a fully sequenced microorganism correlates with the growth temperature of that microorganism. The correlation is known to be especially pronounced among hyperthermophilic archaea, which grow above 80 °C, and some bacteria. However, because most hyperthermophiles whose whole genomes have been sequenced are archaea, whether this correlation is peculiar to archaea or is a feature of thermophiles in general has not been examined in detail. Likewise, for the few thermophilic bacteria whose genomes have been sequenced, closely related non-thermophilic, non-heat-tolerant bacteria either do not exist or have no genome sequence information. It has therefore been difficult to determine whether the amino acid composition features seen in thermophilic bacteria are truly specific to thermophily or merely reflect species specificity.

The present inventors therefore focused on Bacillus-related species, which include, within the same genus and within very closely related genera, both thermophiles with an upper growth temperature limit of about 70 °C and non-thermophiles with various upper growth temperature limits, and decided to examine the correlation between genome sequence information and heat resistance. Among Bacillus-related species, the whole genome sequences of four non-thermophiles had been determined, namely Bacillus subtilis (B. subtilis), Bacillus halodurans (B. halodurans), Oceanobacillus iheyensis, and Bacillus cereus (B. cereus), but no whole-genome information had been analyzed for a thermophilic Bacillus-related species.
The inventors therefore decided to analyze the genome of Geobacillus kaustophilus HTA426 (GK), a strain of the thermophile Geobacillus kaustophilus. This microorganism was obtained from the Mariana Trench in the deep sea, and its upper growth temperature limit is 74 °C.
Figure 1 shows a phylogenetic tree of these Bacillus-related species constructed by the neighbor-joining method from 16S rDNA. The bar at the lower left of Figure 1 indicates 0.01 Knuc unit. The portion marked by the line at the bottom (red in the original figure) indicates thermophilic bacteria. The five microorganisms used in the analyses below are marked with an asterisk at the upper right of their names. From the top they are Bacillus halodurans C-125 (hereinafter BH), Bacillus subtilis 168 (hereinafter BS), Bacillus cereus ATCC14579 (hereinafter BC), Oceanobacillus iheyensis HTE831 (hereinafter OI), and Geobacillus kaustophilus HTA426 (hereinafter GK).

First, the present inventors determined the complete base sequence of the genome of the thermophile Geobacillus kaustophilus. Next, the amino acid compositions of the proteins of these five microorganisms, including Geobacillus kaustophilus (GK), and of the 120 microorganisms whose whole genome sequences had been determined were analyzed by principal component analysis (PCA). As previously known, the first principal component (PC1) correlated strongly overall with GC content, and the second principal component (PC2) correlated strongly with the upper growth temperature limit.
Figure 2 shows this result. The original of Figure 2 is a color graph. The horizontal axis shows the PC1 value (GC content) and the vertical axis shows the PC2 value (upper growth temperature limit). The principal component analysis (PCA) used here follows the standard statistical method. Red squares (black in the monochrome figure) indicate thermophilic bacteria, blue (black in the monochrome figure) indicates Gram-positive bacteria with low GC content, and green (slightly gray in the monochrome figure) indicates Gram-positive bacteria with high GC content. The line at 0.0152 on PC2 in Figure 2 marks the boundary between thermophilic bacteria (above) and non-thermophilic bacteria (below).
A correlation between the second principal component score and the upper growth temperature limit was also seen when the comparison was restricted to Bacillus-related species. This result, however, uses the mean amino acid composition of each whole organism. For individual proteins the scatter is large, so the correlation is much less clear.

The present inventors therefore first calculated a heat-resistance index for each protein of the five Bacillus-related microorganisms used in the analysis, by applying the eigenvector corresponding to the second principal component to the amino acid composition as weighting coefficients.
The principal component analysis based on amino acid composition was carried out as follows. Genome data for 119 microorganisms were obtained from the public NCBI database and combined with the genome of Geobacillus kaustophilus HTA426 determined in the present invention, and the protein sequences identified in these 120 genomes were used. Sequences shorter than 50 amino acids were removed, as were proteins predicted by the PSORT program to have two or more transmembrane regions. From the remaining sequences, the mean amino acid composition was calculated for each species, and a matrix with species as rows and amino acids as columns was input to the princomp function of the statistical analysis package R to perform the principal component analysis.
Next, based on this result, the difference between the principal component score of each protein of the thermophile Geobacillus kaustophilus and the score of the corresponding protein in each of the four non-thermophiles was calculated. Proteins were matched according to ortholog relationships inferred from homology search results, and only proteins with a 1:1 correspondence were analyzed. (Kreil D.P. and Ouzounis, C. A. (2001) Identification of thermophilic species by the amino acid compositions deduced from their genome. Nucleic Acids Res. 29, 1608-1615)
The 965 selected proteins are proteins that do not have two or more transmembrane regions. The 965 proteins common to the five species were extracted from the genome server of Geobacillus stearothermophilus (hereinafter GS), which has nearly the same upper growth temperature limit. Whether a protein has two or more transmembrane regions was judged with the PSORT program (Nakai, K. & Horton, P., PSORT: Trends Biochem. Sci., 24, 34-36 (1999)).
The resulting "principal component score" values are shown in Tables 1 to 18 below. The columns of these tables are, from the left, "GK ID", an identifier based on GK; "Category", the classification of each protein; "Annotation", the name and other details of each protein; and then the identifier and "principal component score" of each of the five microorganisms, in the order GK, BC, BH, BS, and OI. The color applied to each microorganism's identifier encodes the difference from the corresponding GK "principal component score" (difference from GK = (each value) - (GK value)): red indicates a difference of -0.005 or less, blue a difference of -0.010 or less, and green a difference of -0.015 or less, while uncolored entries indicate a difference exceeding -0.015.
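The way the second-principal-component weights are derived from the species-by-amino-acid matrix can be illustrated as follows. The source uses R's princomp function. This sketch swaps in a plain power iteration with deflation on the covariance matrix (divisor N, as princomp uses) so that it runs with the standard library only, and all function names are hypothetical. Note that the sign of an eigenvector is arbitrary, so in practice the PC2 vector would have to be oriented so that thermophiles score high.

```python
import math

def covariance_matrix(rows):
    """Covariance of the columns of `rows`, using divisor N."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(p)]
            for i in range(p)]

def mat_vec(matrix, vec):
    """Multiply a square matrix by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vec)) for row in matrix]

def dominant_eigenpair(matrix, iterations=1000):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(matrix)
    # Arbitrary, slightly asymmetric start vector.
    vec = [1.0 / math.sqrt(p) + 0.01 * i for i in range(p)]
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec

def first_two_components(rows):
    """Return the PC1 and PC2 eigenvectors of the column covariance."""
    cov = covariance_matrix(rows)
    val1, pc1 = dominant_eigenpair(cov)
    p = len(cov)
    # Deflation: subtract the PC1 contribution, then iterate again for PC2.
    deflated = [[cov[i][j] - val1 * pc1[i] * pc1[j] for j in range(p)]
                for i in range(p)]
    _, pc2 = dominant_eigenpair(deflated)
    return pc1, pc2
```

With a 120 x 20 matrix of mean amino acid compositions as `rows`, the 20 entries of `pc2` would play the role of the weighting coefficients that are applied to the composition of an individual protein.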

As these results show, even when no clear correlation is seen for a microorganism as a whole, comparing individual corresponding proteins can reveal a clear correlation. To make this clearer, the correlations between GK and the other microorganisms were plotted.
For the plots, orthologs were first matched between GK and Geobacillus stearothermophilus (GS), which has nearly the same upper growth temperature limit as GK, as follows. The draft genome sequence of GS was obtained from the FTP site of the University of Oklahoma. Each translated sequence of GK was used as a query to search these contig sequences for similar sequences with the TBLASTN program, and a top-scoring hit was taken as an ortholog if it showed at least 70% identity over a region covering at least 70% of the query length. Figure 3 shows the resulting correlation plot of the "principal component score" of each protein for GK and GS. The horizontal axis of Figure 3 shows the GK "principal component score" and the vertical axis shows the GS "principal component score". The solid line in the graph marks where the two values are equal, and the dashed lines mark the range of ±0.01 from the solid line. When proteins are compared in this way between two thermophilic bacteria, the "principal component score" values of the individual proteins correlate very strongly. Figure 4 shows the same kind of plot against GK for each of the non-thermophilic bacteria BC, BH, BS, and OI. The upper left panel (a) of Figure 4 shows the correlation between GK and BC, the lower left panel (b) between GK and BH, the upper right panel (c) between GK and BS, and the lower right panel between GK and OI. In each graph the horizontal axis shows the GK "principal component score" and the vertical axis shows the "principal component score" of the non-thermophilic bacterium. These graphs show that, for the non-thermophilic bacteria, some proteins correlate well with GK while others take completely different values, depending on the protein.
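The ortholog criterion described above (only the top-scoring hit is considered, and it must show at least 70% identity over at least 70% of the query length) can be written as a small filter. This is a sketch under assumptions: the `Hit` record and its field names are hypothetical and do not reproduce the output format of any particular TBLASTN version, and coverage is measured here from the aligned query coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Hit:
    """One search hit for a query protein (hypothetical record layout)."""
    subject_id: str
    bit_score: float
    percent_identity: float  # identity within the aligned region, in percent
    query_start: int         # 1-based, inclusive, in amino acids
    query_end: int           # 1-based, inclusive, in amino acids

def pick_ortholog(hits: List[Hit], query_length: int,
                  min_identity: float = 70.0,
                  min_coverage: float = 0.70) -> Optional[Hit]:
    """Return the top-scoring hit if it passes both 70% cut-offs, else None."""
    if not hits:
        return None
    best = max(hits, key=lambda h: h.bit_score)
    coverage = (best.query_end - best.query_start + 1) / query_length
    if best.percent_identity >= min_identity and coverage >= min_coverage:
        return best
    return None
```

A lower-scoring hit is never promoted when the best hit fails, which matches the wording that the highest-scoring hit itself must satisfy the criterion.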

In the GK-GS comparison above, a strong correlation was found for almost all proteins, whereas in the comparisons with non-thermophilic bacteria some proteins show almost no correlation. This may be because those proteins are not heat-resistant. Conversely, although a non-thermophilic bacterium as a whole is not heat-resistant, this need not mean that every protein it produces lacks heat resistance. It may be that only some of its proteins lack heat resistance. If a protein that has lost heat resistance is essential for sustaining life, the microorganism as a whole is no longer heat-resistant even if all its other proteins are.
This is a new finding of the present inventors. Conventionally, heat-resistant proteins have been sought by screening heat-resistant microorganisms, because a heat-resistant organism must have heat-resistant proteins or it could not sustain life at high temperature. It does not follow, however, that every protein produced by a non-thermophilic bacterium must be non-heat-resistant. A non-thermophilic bacterium that produces heat-resistant proteins does not necessarily have any difficulty sustaining life. It is quite conceivable that a non-thermophilic bacterium is non-heat-resistant not because all its proteins have become non-heat-resistant, but because proteins essential for sustaining life have lost their heat resistance.
The results in Tables 1 to 18 and Figure 4 above indicate that even non-thermophilic bacteria may produce heat-resistant proteins similar to those produced by thermophilic bacteria.

Figure 5 summarizes these results in relation to the upper growth temperature limit of each microorganism. The horizontal axis of Figure 5 shows temperature and the vertical axis shows percentage (%). In the graph, filled squares (■) indicate GK, filled circles (●) BH, open circles (○) BS, open triangles (△) BC, and open squares (□) OI. Each microorganism is plotted on the horizontal axis at its upper growth temperature limit. On the vertical axis, the upper line (green in the original figure) plots, for each microorganism, the percentage of the 965 proteins whose "principal component score" differs from that of GK by more than -0.015. The middle line (blue in the original figure) likewise plots the percentage of proteins whose difference exceeds -0.010, and the lower line (red in the original figure) plots the percentage of proteins whose difference exceeds -0.005. Taking as heat-resistant proteins those whose difference in "principal component score" exceeds -0.015 (that is, is -0.015 or greater: smaller in absolute value but, being negative, larger in value; the same applies hereinafter), Table 19 below summarizes the number and proportion of such proteins for BC, BH, BS, and OI.

Table 19 summarizes the results of predicting heat-resistant proteins in each mesophilic Bacillus species by the principal component analysis method described above. In Table 19, − denotes proteins whose difference in PC2 value from the GK protein is smaller than −0.015 and which were therefore judged not heat-resistant; +, ++, and +++ denote proteins judged heat-resistant, with a difference greater than −0.015, greater than −0.01, and greater than −0.005, respectively. The entry "++, +++" gives the sum of the ++ and +++ classes, and the entry "+, ++, +++" gives the sum of the +, ++, and +++ classes. The analysis is based on 965 orthologs that have a 1:1 correspondence among the five Bacillus species.
As shown in Table 19, 83.5% of the 965 proteins in BC, 79.1% in BH, 72.1% in BS, and 59.5% in OI were predicted to be heat-resistant. In the graph, BC (△) shows a somewhat anomalous value, but all three categories of proteins show the same trend. That is, the more proteins a microorganism produces whose "principal component score" is comparable to that of the proteins of the thermophilic bacterium, the higher its maximum growth temperature. For example, OI, which produces the fewest such proteins, has the lowest maximum growth temperature among these microorganisms. BC (△), on the other hand, has the most such proteins among the four non-thermophilic bacteria, yet its maximum growth temperature is anomalously low. This is thought to be the result of a protein essential for sustaining life having happened to lose its heat resistance.

To verify this, proteins were isolated from these microorganisms and their native PAGE patterns were examined after each of the following treatments: (1) no heating, (2) heating at 60°C for 10 minutes, and (3) heating at 70°C for 10 minutes. The results are shown in FIG. 6 as a photograph in place of a drawing. The left panel (FIG. 6a) shows total protein stained with Coomassie brilliant blue. In the thermophilic bacterium GK, almost all protein bands remain visible even after heat treatment (lanes 2 and 3), whereas in the other four bacteria, which are non-thermophilic, many protein bands disappear on heating. The important point, however, is that not all of the bands disappear: some protein bands remain even after heat treatment. This demonstrates the point made above, namely that not every protein produced by a non-thermophilic bacterium is heat-labile.
The right panel (FIG. 6b) shows the total protein of each microorganism separated by native PAGE as in FIG. 6a, with the bands of proteins having esterase (EC 3.1.1.1) activity detected by activity staining. In each panel, BC, BH, BS, GK, and OI denote the respective microorganisms, and for each microorganism lane 1 is unheated, lane 2 is 60°C for 10 minutes, and lane 3 is 70°C for 10 minutes. For the esterase (FIG. 6b), the OI band disappears after heat treatment at 60°C (lane 2). In BC, the main band, which stained most strongly in the untreated sample, likewise disappears after heat treatment at 60°C. In BS, however, the band does not disappear on heating, and in BH the band, although faint, also persists after heat treatment at 60°C.
In addition, heat resistance was verified in the same way for GroES, one of the proteins essential for growth in Bacillus subtilis, and for Hag (flagellin), a representative protein conserved across Bacillus-related species. The genes encoding these proteins were amplified from each microorganism by PCR and cloned in Escherichia coli. Each cloned and purified protein was subjected to (1) no heating, (2) heating at 60°C for 10 minutes, or (3) heating at 70°C for 10 minutes, and its native PAGE pattern was examined. The results are shown in FIG. 7 as a photograph in place of a drawing. The left panel (FIG. 7a) shows Hag and the right panel (FIG. 7b) shows GroES, each separated by native PAGE and stained with Coomassie brilliant blue. In each panel, BC, BH, BS, GK, and OI denote the respective microorganisms, and for each microorganism lane 1 is unheated, lane 2 is 60°C for 10 minutes, and lane 3 is 70°C for 10 minutes.

The question then is which proteins are heat-resistant. With esterase, several proteins may be stained by the staining method, and it proved difficult in practice to determine unambiguously which band corresponds to which protein; attention was therefore turned to the proteins Hag and GroES. The results for Hag in each microorganism are shown in FIG. 7a and those for GroES in FIG. 7b. In the tables above, these proteins carry the GK identifiers GK3131 (Hag) and GK0248 (GroES).
The principal component scores of these proteins in each microorganism are as follows. Hag: GK, −0.0513; BC, −0.0622; BH, −0.0567; BS, −0.0578; OI, −0.0528 (see Table 17). GroES: GK, 0.1018; BC, 0.0826; BH, 0.1012; BS, 0.094; OI, 0.0988 (see Table 3). These values are summarized in Table 20.
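The scores quoted above can be turned into the −, +, ++, +++ classes of the Table 19 legend with a few lines of code. The following is a minimal sketch, not the program of the invention: it assumes that the difference is taken as (protein score minus GK score) and that a value exactly on a threshold counts as exceeding it. The names `SCORES`, `THRESHOLDS`, `classify`, and `classify_all` are illustrative and do not appear in the source.

```python
# Principal component scores quoted in the text (Hag: Table 17, GroES: Table 3).
SCORES = {
    "Hag": {"GK": -0.0513, "BC": -0.0622, "BH": -0.0567, "BS": -0.0578, "OI": -0.0528},
    "GroES": {"GK": 0.1018, "BC": 0.0826, "BH": 0.1012, "BS": 0.094, "OI": 0.0988},
}

# Thresholds from the Table 19 legend, strictest first.
THRESHOLDS = [(-0.005, "+++"), (-0.010, "++"), (-0.015, "+")]


def classify(score, reference):
    """Return '-', '+', '++' or '+++' for a score relative to the reference score.

    Assumption: the difference is (score - reference), and a difference equal
    to a threshold is treated as exceeding it.
    """
    diff = score - reference
    for cutoff, symbol in THRESHOLDS:
        if diff >= cutoff:
            return symbol
    return "-"


def classify_all(scores=SCORES, reference_species="GK"):
    """Classify every non-reference species for each protein."""
    result = {}
    for protein, by_species in scores.items():
        ref = by_species[reference_species]
        result[protein] = {
            species: classify(value, ref)
            for species, value in by_species.items()
            if species != reference_species
        }
    return result


if __name__ == "__main__":
    for protein, calls in classify_all().items():
        print(protein, calls)
```

Under these assumptions, the only protein in this small set that falls below the −0.015 threshold is GroES of BC.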

For the Hag protein, the band was maintained after heat treatment at 70°C almost as in the untreated sample in every microorganism except BC. For the GroES protein, only a faint band was observed for BC and BS after heat treatment at 60°C or above, which suggests that the protein was degraded by the heat treatment. For the others, GK, BH, and OI, the band was maintained almost as in the untreated sample. This shows that, among non-thermophilic bacteria, some retain heat resistance and others do not, depending on the species.
Next, the protein bands in FIG. 6a that remained after heat treatment at 70°C for 10 minutes were excised from the gel in order from the top, and the proteins contained in each band were identified by LC/MS/MS. The results are shown in Tables 21 to 25.

Tables 21 to 25 list the proteins whose heat resistance was confirmed in the heat treatment experiments: those from Bacillus subtilis (BS) in Tables 21 and 22, those from Bacillus halodurans (BH) in Table 23, those from Oceanobacillus iheyensis (OI) in Table 24, and those from Bacillus cereus (BC) in Table 25. The columns of each table give, from the left, the BS gene name, its product name, the corresponding GK gene name, the prediction result, the number of amino acids in the product, and the difference between the GK and BS principal component scores. In the "prediction result" column, − indicates that the difference from the GK PC2 value is smaller than −0.015 and the protein was judged not heat-resistant; +, ++, and +++ indicate proteins judged heat-resistant, with a difference greater than −0.015, greater than −0.01, and greater than −0.005, respectively.
As shown in Tables 21 to 25, at least 38 heat-resistant proteins were identified in each of BC and BH, 117 in BS, and 52 in OI. It was then examined how the heat resistance of these experimentally confirmed proteins had been predicted by comparing principal component scores between GK and the other Bacillus-related species. Taking a difference of −0.015 or more between the principal component score of each Bacillus-related species and that of GK as indicating heat resistance, as described above, the method of the present invention had predicted heat resistance for 34 of the 38 proteins in BC listed in Table 25 (89.5%), 36 of the 38 in BH listed in Table 23 (94.7%), 103 of the 117 in BS listed in Tables 21 and 22 (88.0%), and 46 of the 52 in OI listed in Table 24 (88.5%).
These results are summarized in Table 26, in which the symbols have the same meaning as in Table 19. The method of the present invention, which determines the heat resistance of a protein produced by a non-thermophilic bacterium by calculating its relationship to the corresponding protein of a thermophilic bacterium, therefore does reflect the heat resistance of that protein.
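The agreement rates reported above follow directly from the counts given in the text. As a quick arithmetic check, the sketch below recomputes them; the names `CONFIRMED` and `concordance` are illustrative only.

```python
# (predicted heat-resistant, experimentally confirmed heat-resistant) counts
# as stated in the text for each species.
CONFIRMED = {
    "BC": (34, 38),
    "BH": (36, 38),
    "BS": (103, 117),
    "OI": (46, 52),
}


def concordance(counts=CONFIRMED):
    """Percentage of experimentally confirmed proteins that were also predicted heat-resistant."""
    return {
        species: round(100.0 * predicted / confirmed, 1)
        for species, (predicted, confirmed) in counts.items()
    }


if __name__ == "__main__":
    for species, percent in concordance().items():
        print(f"{species}: {percent}%")
```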

Although the method of the present invention has been described with reference to Bacillus-related species, it is not limited to the species exemplified; those skilled in the art will readily understand that it is applicable to any species, provided that a heat-resistant protein corresponding to the test protein exists.
A "heat-resistant organism" in the present invention may be any organism able to sustain life at or above the temperature at which humans can sustain life; specifically, it means an organism able to sustain life in an environment of about 50°C or higher, preferably 60°C or higher, and more preferably 65°C or higher. Examples include thermophilic bacteria and hot-spring organisms. The heat-resistant organism used in the method of the present invention is preferably one related to the organism that produces the test protein. Examples of such "relatedness" include proximity in biological classification, genetic proximity in developmental terms, and functional proximity of the test protein.
A "protein possessed by a heat-resistant organism" in the present invention is a protein produced by that organism; any protein that the organism produces qualifies, whether or not it is necessary for sustaining life.
The "protein corresponding to the test protein possessed by a heat-resistant organism" in the present invention may be any protein whose function is of the same kind as, and preferably equivalent to, that of the test protein. A biological or developmental relationship is not required but is preferred. Examples include correspondence based on orthologous genes, as in the examples above, and the relationship between proteins with the same kind of function in organisms of the same genus or species.
The "protein corresponding to the test protein possessed by a heat-resistant organism" in the method of the present invention need not be a single protein and may be two or more proteins. Where two or more such proteins can be selected, the test protein can be compared with each of them and an overall judgment made.

As a technique for calculating the "heat resistance index of the test protein based on amino acid composition" in the method of the present invention, a method based on principal component analysis is effective (Kreil D.P. and Ouzounis, C.A. (2001) Identification of thermophilic species by the amino acid compositions deduced from their genome. Nucleic Acids Res. 29, 1608-1615). In this method, proteins are extracted on the basis of the protein-coding genes identified in the genomes of the organisms exemplified above; proteins shorter than 50 amino acids are removed; proteins for which the PSORT program predicts two or more transmembrane regions are also removed; the average amino acid composition is calculated for each species from the amino acid sequences of the remaining proteins; and a matrix with species as rows and amino acids as columns is supplied to the princomp function of the statistical analysis package R. The technique is not limited to this method, however: once the number of proteins whose heat resistance has been verified experimentally reaches a certain level, that knowledge can be incorporated through techniques such as discriminant analysis or regression analysis to improve the index.
Although it is preferable in the present invention to use the whole protein (its full length), the analysis can also be carried out on individual domains or partial lengths of the protein.
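The composition-matrix and principal component steps described above can be sketched as follows. This is an illustrative stand-in written with NumPy, not a reimplementation of R's princomp and not the program of the invention. It assumes that the "average amino acid composition" of a species is the mean of the per-protein compositions, it applies only the 50-residue length filter (the transmembrane filter needs an external predictor and is omitted), and all function names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Fraction of each of the 20 standard amino acids in one sequence."""
    counts = np.array([sequence.count(aa) for aa in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return counts / total


def mean_composition(sequences, min_length=50):
    """Mean composition over the proteins of one species, skipping short ones."""
    kept = [composition(s) for s in sequences if len(s) >= min_length]
    if not kept:
        raise ValueError("no sequence passes the length filter")
    return np.mean(kept, axis=0)


def pca(matrix):
    """PCA of a (rows x amino acids) matrix via its covariance matrix.

    Returns (loadings, scores); column k of each belongs to component k + 1.
    The sign of each component is arbitrary.
    """
    x = np.asarray(matrix, dtype=float)
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / x.shape[0]
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    loadings = eigenvectors[:, order]
    scores = centered @ loadings
    return loadings, scores
```

With one row per species built by `mean_composition`, the second column of `scores` plays the role of the second principal component (PC2) referred to in the text.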

As the "comparison of analysis values" in the method of the present invention, taking the difference between the two values, as in the examples above, is simple and preferred, but the comparison is not limited to this. Once a large amount of data has been accumulated, the comparison can also be based on statistically processed values, such as the difference from the overall mean or a deviation.
The criterion applied in the comparison can be set within any range in which the test protein can be confirmed actually to be heat-resistant. In the examples above, a protein can be judged heat-resistant when its principal component score, calculated from the eigenvector values (the weighting coefficient of each amino acid) and the amino acid counts of the protein, falls below that of the reference protein by no more than about 0.005 to 0.015. Such a judgment need not be merely a yes-or-no result; it can also be expressed as a percentage (%) indicating the likelihood that the protein is heat-resistant.

In the method of the present invention, the data used to calculate the specific analysis value by principal component analysis based on the amino acid composition of the test protein include, but are not limited to, the amino acid sequence and/or base sequence of the protein; in some cases the amino acid composition alone may suffice. Data carrying more information are preferable for improving the accuracy of the judgment, and the base sequence encoding the protein, as in the examples above, is a simple and preferred choice. Three-dimensional structural data for the protein can also be added, but which data are needed depends heavily not only on the desired accuracy of the judgment but also on the method used to process the data.

Specifically, the method of the present invention comprises the following steps (1) to (6).
(1) Obtaining the amino acid sequence of a protein and/or the base sequence encoding it.
(2) Calculating the "specific analysis value" of the protein from the amino acid sequence and/or base sequence data.
(3) Taking the protein as the test protein, selecting the "protein corresponding to the test protein possessed by a heat-resistant organism".
(4) Obtaining the analysis value data of the selected "corresponding protein".
(5) Comparing the two.
(6) Making a judgment based on the result of the comparison.
Of these steps, everything other than determining the sequence in (1) and making the selection in (3) can have its processing defined in advance and can therefore be carried out by a computer. Step (3) can also be automated: if the stored data are classified in advance, for example by enzyme classification, the "corresponding protein" can be selected from the accumulated data. Every step other than step (1) can then be carried out by a computer.
That is, the present invention provides a computer-based processing method in which the method described above is programmed for execution on a computer, the processing comprising: (a) calculating the specific analysis value by computer from the amino acid sequence or base sequence data of the protein entered into the program; (b) extracting the "corresponding protein" from the accumulated data on the basis of the classification symbol, functional data, origin data, and the like of the protein; (c) calculating the specific analysis value of the "corresponding protein" extracted in step (b), or looking it up as a value in the accumulated data; (d) comparing the specific analysis value of the protein with that of the "corresponding protein"; and (e) displaying (outputting) the result of the comparison.

In the computer processing described above, the analysis value of the protein in the heat-resistant organism that corresponds to the test protein can be calculated each time it is needed. Alternatively, the values calculated by principal component analysis based on amino acid composition can be classified by protein type, listed, and kept as accumulated data. Such accumulated data can be stored on a computer-readable recording medium so that they are available to the computer for processing. Examples of such recording media include hard disks, DVDs, CD-ROMs, MO disks, and flexible disks.

Accordingly, an example of the program of the present invention, which causes a computer to execute the following steps in order to determine whether a test protein is heat-resistant, is a program comprising:
(1) a step of inputting the amino acid sequence of the test protein;
(2) a protein search step of searching for a known protein that stands in the relationship of a corresponding protein, that is, a counterpart produced by a species different from the one that produces the test protein;
(3) a step of calculating the specific analysis value of the test protein by principal component analysis based on its amino acid composition;
(4) a step of obtaining the specific analysis value of the corresponding protein found in step (2) and that of the test protein calculated in step (3), and calculating the difference between the two;
(5) a step of determining, on the basis of the difference calculated in step (4), whether the test protein is similar to the corresponding protein found in step (2); and
(6) a step of displaying the corresponding protein found in step (2) and the result determined in step (5);
whereby the specific analysis value based on the amino acid composition of the test protein is compared with that of the known corresponding protein to determine whether the test protein is heat-resistant.
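Steps (1) to (6) above can be outlined in code as follows. This is a hedged sketch under several assumptions, not the claimed program: the analysis value is computed as a weighted sum of amino acid fractions, with `weights` standing in for the loadings of the chosen principal component; the stored value of the corresponding protein is assumed to have been computed in the same way, so that any centering term cancels in the difference; and the search in step (2) is delegated to a caller-supplied function. The names `Verdict`, `analysis_value`, `judge`, and `find_counterpart` are hypothetical.

```python
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass
class Verdict:
    counterpart_id: str
    difference: float
    heat_resistant: bool


def analysis_value(sequence, weights):
    """Step (3): weighted sum of amino acid fractions.

    weights maps residue letters to hypothetical per-residue loadings;
    residues without a weight contribute zero.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    n = len(residues)
    return sum(weights.get(aa, 0.0) * residues.count(aa) / n for aa in set(residues))


def judge(sequence, weights, find_counterpart, threshold=-0.015):
    """Run steps (1) to (6) for one test protein.

    find_counterpart(sequence) must return (identifier, stored analysis value)
    or None when no corresponding protein is found.
    """
    hit = find_counterpart(sequence)  # step (2)
    if hit is None:
        return None
    counterpart_id, counterpart_value = hit
    value = analysis_value(sequence, weights)  # step (3)
    difference = value - counterpart_value  # step (4)
    # Steps (5) and (6): decide and return the result for display.
    return Verdict(counterpart_id, difference, difference >= threshold)
```

Passing the search as a function keeps the sketch independent of how corresponding proteins are stored or looked up.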

The program of the present invention is described below. In this description, orthologs are used as the corresponding proteins, and the value of the second principal component from the principal component analysis is used as the specific analysis value based on amino acid composition. The program can also run stand-alone, but a server-type program is described here as an example. The following description merely illustrates the present invention, which is not limited to these examples.
FIG. 8 shows the execution flowchart of the client. On start-up, a start-up screen is displayed and the data needed for the ortholog search and other functions are loaded. Once start-up is complete, an input screen for entry from a terminal or the like appears. The test protein can be entered by any method, such as typing, FD, CD, or online transfer; the amino acid sequence and/or base sequence of the test protein and its origin are entered. This is shown as "variate plot" in FIG. 8.
When input is complete, the ortholog search is carried out. As an option, the program is designed so that the organisms to be searched can be selected. Selecting species is not mandatory; all accumulated species can be searched. Orthologs are then searched for in the specified species.
Various parameters of the test protein, such as its amino acid sequence, amino acid composition, function, origin, and the organ in which it is expressed, can be used for the ortholog search; in this example, orthologs are found by a homology search based on the amino acid sequence. A protein that is highly homologous to the test protein, for example with homology of 70% or more, 80% or more, or 85% or more, and that originates from a species different from that of the test protein is selected as an ortholog candidate.

If the ortholog search finds no protein that qualifies as an ortholog ("no" at "no ortholog list" in FIG. 8), no comparison can be made and processing ends.
If one or more orthologs are found ("yes" at "no ortholog list" in FIG. 8), an ortholog list is created and the value (score) of the second principal component based on amino acid composition is calculated for each ortholog. The second principal component value (score) of the test protein is then compared with that of the corresponding protein, the difference is calculated, and the determination is made.
For this determination, three levels of difference, ±0.005, ±0.010, and ±0.015, are set as defaults in this example, but the user can also set them as an option. The result is output in Excel format or displayed on screen.
The result is displayed basically as a table, but it can also be displayed as a user-specified graph based on that table. Entries may also be colored according to the size of the difference; red, blue, green, and the like may be set as the default colors.

FIG. 9 shows the server-side flowchart up to data input. When a start request arrives from the client, the program starts and the data are read. Once the data format check is complete, all characters other than amino acid characters are deleted and the amino acids are recognized. The number of amino acids in the input sequence is counted, and if it is less than 50, processing ends ("Minimum Length check" in FIG. 9). Next, the "SOSUI" program, which infers transmembrane domains from the sequence of hydrophobic amino acids, is used to test whether the input protein has transmembrane domains. If SOSUI finds two or more transmembrane domains, the input protein is treated as a membrane protein, falls outside the scope of this program, and processing ends.
If the input protein has 50 or more amino acids and fewer than two predicted transmembrane domains, its amino acid composition is analyzed. The content of each of the 20 amino acids in the translated region (ORF) is calculated as a percentage. Principal component analysis is then performed; a commercially available statistical program can be used for this. The value of the second principal component is calculated by the principal component analysis, and the data input processing ends.
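The input checks and the composition calculation of FIG. 9 might look like the sketch below. It is illustrative only: the number of transmembrane domains is taken as an input that the caller obtains from an external predictor such as SOSUI (the sketch does not predict it), any header or annotation lines are assumed to have been removed beforehand, and the names `clean_sequence`, `composition_percent`, and `prepare` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MIN_LENGTH = 50  # the "Minimum Length check" of FIG. 9


def clean_sequence(raw):
    """Delete every character that is not one of the 20 standard residue letters.

    Assumes header lines have already been stripped; letters in a header
    would otherwise be mistaken for residues.
    """
    return "".join(ch for ch in raw.upper() if ch in AMINO_ACIDS)


def composition_percent(sequence):
    """Content of each amino acid as a percentage of the sequence length."""
    n = len(sequence)
    return {aa: 100.0 * sequence.count(aa) / n for aa in AMINO_ACIDS}


def prepare(raw, transmembrane_domains=0):
    """Return composition percentages, or None if the protein is out of scope.

    transmembrane_domains is the count reported by an external predictor.
    """
    sequence = clean_sequence(raw)
    if len(sequence) < MIN_LENGTH:
        return None  # too short
    if transmembrane_domains >= 2:
        return None  # treated as a membrane protein
    return composition_percent(sequence)
```

Returning `None` for rejected inputs mirrors the "processing ends" branches of the flowchart.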

The corresponding protein is found by searching for homology from the N-terminal and C-terminal sides of the protein, for example using BLASTP, and the protein with the highest homology is taken as the corresponding protein. By default, if only proteins with a homology of 70% or less are found, the corresponding protein is treated as not found; the user can also set the minimum homology.
The program of the present invention refers to a database of known proteins in order to search for the corresponding protein. Data on known proteins therefore need to be accumulated, and the program may include a step for inputting such data. Sources of these data include results published in academic journals and databases available on the Internet. For an Internet database, the program can also be configured to access it automatically, at regular or irregular intervals, and download new information. Data obtained in this way can be stored in the reference file of the program as accumulated data, following a flow similar to that shown in FIG. 9.
The program of the present invention also refers to a database of second principal component values from principal component analysis based on the amino acid composition of known proteins. Because this value can be calculated from the amino acid sequences in the database of known proteins used for the corresponding protein search, it can be calculated and stored as accumulated data when a known protein is entered.
The program of the present invention may further include such input steps.
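The selection rule for the corresponding protein (highest homology, rejected at 70% or less by default, threshold adjustable by the user) can be sketched as below. The `Hit` record and `pick_corresponding` are hypothetical names introduced for illustration, and the sketch assumes the homology search (for example BLASTP) has already been run and its percent identities parsed.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    subject_id: str   # identifier of the known protein in the database
    identity: float   # percent homology reported by the search


def pick_corresponding(hits: list[Hit], min_identity: float = 70.0) -> Optional[Hit]:
    """Return the most homologous hit, or None when no hit exceeds min_identity.

    The default of 70 mirrors the text: hits at 70% or below do not count.
    """
    usable = [h for h in hits if h.identity > min_identity]
    return max(usable, key=lambda h: h.identity) if usable else None
```

Returning `None` rather than raising keeps the "no corresponding protein found" case an ordinary result that the caller can report to the user.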

The main functions of the program of the present invention are summarized as follows.
The amino acid sequences of a heat-resistant microorganism and of several closely related species that are not heat-resistant are taken as input, and the result of principal component analysis based on amino acid composition is displayed as a scatter diagram. This per-organism principal component score scatter diagram makes it possible to estimate which microorganisms are heat-resistant, and also to identify candidate exceptional microorganisms whose actual growth temperature does not match the estimate from the principal component analysis.
The function that compares the second principal component of orthologous proteins between thermophilic and non-thermophilic bacteria makes it possible to estimate which proteins in a non-thermophilic bacterium are heat-resistant. For an exceptional microorganism whose actual growth temperature does not match the estimate from the principal component analysis, genes deeply involved in heat resistance can also be estimated.
The program of the present invention consists of two steps: a calculation step that creates the data set needed for heat resistance prediction, and a step that predicts heat resistance from the amino acid composition of the protein whose heat resistance is to be determined. The processing contents and order are as follows.
(1) Processing contents and order of the data set creation step
(a) Minimum sequence exclusion: amino acid sequences shorter than a given length are deleted.
(b) Hydrophobic region exclusion: sequences that SOSUI (a program for estimating transmembrane domains) reports as containing two or more transmembrane regions are deleted.
(c) Determination of orthologs between species: BLASTP is run all-against-all, and the hit with the best result is taken one at a time. A FASTA file containing only the resulting orthologs is created (W1).
(d) Calculation of amino acid composition: the amino acid composition is calculated per ortholog from file W1 and averaged per species. The per-ortholog amino acid composition is saved as file W2.
(e) Principal component analysis: principal component scores are calculated from the amino acid composition. Based on the per-species principal component scores, the display data for the principal component score scatter diagram are output to file W3. A score is calculated for each ortholog using the eigenvector of the second principal component and file W2, and is saved in file W4.
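Step (e) can be illustrated with a short NumPy sketch. It is an assumption-laden stand-in rather than the program's actual code: the principal components are obtained here by eigen-decomposition of the covariance matrix of the species-by-amino-acid matrix, the function names are hypothetical, and the files W2 to W4 are represented by in-memory arrays.

```python
import numpy as np


def second_pc_eigenvector(species_by_aa: np.ndarray) -> np.ndarray:
    """Eigenvector of the second principal component.

    species_by_aa has one row per species and one column per amino acid
    (the per-species average composition).
    """
    cov = np.cov(species_by_aa, rowvar=False)   # np.cov centers the columns
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # largest variance first
    return eigvecs[:, order[1]]                 # column of the second largest


def ortholog_scores(compositions: np.ndarray, pc2: np.ndarray) -> np.ndarray:
    """Score of each ortholog: inner product of its composition with pc2."""
    return compositions @ pc2
```

The sign of an eigenvector is arbitrary, so scores from different runs or different software are comparable only after fixing a sign convention, for example by requiring that a known thermophile receives a positive score.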

(2) Processing contents and order of the heat resistance prediction step
(a) Overall distribution map display: files I2, I3 and W3 are read, and a scatter diagram is created and displayed for each species (each element can be selected, among other operations). From this overall distribution map, the reference organism (the organism having the corresponding protein) and one or more species to compare can be selected.
(b) The scores of orthologous proteins are compared across all proteins of the organism having the corresponding protein and the organism having the test protein.
(c) Files W1 and W4 are read for the selected species to acquire the information.
(d) The differences in score between orthologous proteins, across all proteins of the organism having the corresponding protein and of each comparison species, are displayed as a list (color-coded in three classes according to the size of the difference).
(e) The score differences of one-to-one orthologs between the organism having the corresponding protein and a comparison organism are also displayed as a scatter diagram.
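Steps (b) to (d) amount to subtracting scores of matched orthologs and sorting the differences into three display classes. The sketch below shows one way to do this. The text does not state the class boundaries, so the thresholds 0.005 and 0.015 are hypothetical placeholders, as are the function names and the direction of the subtraction (comparison species minus reference).

```python
def score_differences(reference: dict[str, float], target: dict[str, float]) -> dict[str, float]:
    """Target score minus reference score for each ortholog group found in both."""
    return {group: target[group] - reference[group] for group in reference if group in target}


def colour_class(diff: float, small: float = 0.005, large: float = 0.015) -> str:
    """Sort a difference into one of three classes by magnitude.

    The two thresholds are illustrative assumptions, not values from the text.
    """
    size = abs(diff)
    if size < small:
        return "small"
    if size < large:
        return "medium"
    return "large"
```

Keying both score tables by ortholog group means that groups missing from either organism are dropped, which matches the one-to-one ortholog comparison in step (e).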
Table 27 below summarizes the main functions of the program of the present invention.

The program of the present invention can be recorded on a computer-readable recording medium so that a computer can execute it. Examples of such recording media include hard disks, DVDs, CD-ROMs, MO disks and flexible disks.
The present invention therefore also provides a computer-readable recording medium on which the program of the present invention is recorded.
The recording medium of the present invention may additionally hold the databases that the program refers to: the database of known proteins used to search for the corresponding protein, and the database of specific analysis values based on the amino acid composition of known proteins.

The method of the present invention predicts the heat resistance of a protein on a computer by calculating a specific analysis value for the protein, without any experiment, and is therefore very fast and inexpensive. In addition, because the determination is made per protein produced by an organism rather than per organism, the search for thermostable enzymes, which previously had to rely on thermophilic bacteria as the resource, can be extended to proteins produced by mesophilic bacteria, widening the screening range for heat-resistant proteins. Screening mesophilic bacteria for thermostable enzymes has until now required a great deal of time and labor; because candidates can now be narrowed down in advance, thermostable enzymes suited to a variety of processes can be searched for easily.
Furthermore, by using the computer program developed in the present invention to predict heat-resistant proteins in non-thermophilic bacteria that have thermophilic relatives, the time needed to search for thermostable enzymes of unprecedented diversity can be greatly reduced. Because the program predicts the heat resistance of a protein of interest on a personal computer running Windows (registered trademark), it can easily be used even by people with no knowledge of computer languages.

The experimental results underlying the program of the present invention are described more specifically below by way of examples; the present invention is not limited in any way by these examples.

Calculation of data for 120 species of microorganisms
Genome data for 119 microorganisms were obtained from the database published by NCBI and combined with the genome of Geobacillus kaustophilus HTA426, determined in the present invention; the sequences of the proteins identified in these 120 genomes were used for the analysis. Proteins shorter than 50 amino acids were removed, and proteins predicted by the PSORT program (K. Nakai, P. Horton, Trends Biochem. Sci., 24, 34-6, 1999) to have two or more transmembrane regions were also removed. From the remaining protein sequences, the average amino acid composition was calculated for each species, and a matrix with species as rows and amino acids as columns was subjected to principal component analysis according to the method of Kreil and Ouzounis (D. Kreil, C. Ouzounis, Nucleic Acids Res, 29, 1608-15, 2001). The princomp function of the statistical analysis package R was used for the analysis.

Calculation of the specific analysis values of 965 proteins
Orthologs among the five species GK, BC, BH, BS and OI were assigned using the clustering program on the MBGD server of Uchiyama (I. Uchiyama, Nucleic Acids Res 31, 58-62, 2003). Only ortholog groups present in all five species with a one-to-one correspondence were used for the analysis. Groups containing four or more proteins predicted by PSORT to have two or more transmembrane regions were excluded. Using the eigenvector of the second principal component obtained by the principal component analysis described in Example 1, the heat resistance index of each protein was calculated as the inner product of its amino acid composition vector and the eigenvector.
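Written out as a formula, the heat resistance index is a weighted sum over the 20 amino acids. The symbols below are introduced here for illustration only and do not appear in the source: $c_{p,a}$ is the content of amino acid $a$ in protein $p$, $v_a$ is the component of the second principal component eigenvector for amino acid $a$, and $t_p$ is the resulting index. The second line gives the difference between a protein and its ortholog in a reference organism, with the direction of subtraction chosen here as an assumption.

```latex
\begin{align}
t_p &= \mathbf{c}_p \cdot \mathbf{v} = \sum_{a=1}^{20} c_{p,a}\, v_a \\
\Delta_p &= t_p - t_{p}^{\mathrm{ref}}
\end{align}
```

Because $\mathbf{v}$ is fixed once by the species-level analysis, computing $t_p$ for a new protein needs only its amino acid composition.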

Analysis of total protein
GK, BS and BC were grown in LB medium (pH 7), and BH and OI in Horikoshi II medium (pH 9.5) (Takami, H., Kobayashi, T., Aono, R., and Horikoshi, K., Appl. Microbiol. Biotechnol. 38, 101-108, 1992), in aerobic liquid culture for 18 hours. The culture temperature was 37 °C, except for GK, which was grown at 55 °C. The cells were collected by centrifugation, washed with 50 mM phosphate buffer, and resuspended in the same buffer to give a cell suspension. This suspension was disrupted in a French press, the lysate was centrifuged, and the lysate cleared of cell debris was used as the protein solution for total protein analysis. Portions of this protein solution were heat-treated at 60 °C or 70 °C for 10 minutes and then rapidly cooled to give heat-treated protein solutions. Total protein was analyzed by native gel electrophoresis on 12.5% gels. After electrophoresis, the gels were stained with Coomassie brilliant blue.

Identification of heat-resistant proteins
The protein solutions of each organism prepared as described in Example 3 were separated by native gel electrophoresis and stained with Coomassie brilliant blue in the same manner as in Example 3. From lane 3 of FIG. 6, in which the protein solutions of the four species other than GK had been electrophoresed after heat treatment at 70 °C for 10 minutes, the protein bands that had not disappeared after heat treatment were excised from the gel at 3 mm intervals. The proteins were digested in-gel with trypsin by a conventional method, the peptides were fractionated using an LC/MS/MS system, and their masses were determined. The mass spectra were analyzed with the Bioworks 3.1 and Xcalibur system from Thermo Electron and matched against the protein database of each Bacillus-related species to identify the proteins contained in each band.
The results are shown in Tables 21-25.

Analysis of esterase
The protein solutions of each organism prepared as described in Example 3 were likewise separated by native gel electrophoresis, and only the bands with esterase activity were detected by the following method.
2 ml of 1% α-naphthyl acetate dissolved in 50% acetone and 100 mg of Fast Blue BB salt were added to 100 ml of 0.05 M Tris-HCl buffer (pH 7.4) and stirred. The mixture was transferred to a plastic container, and the gel was immersed in it after electrophoresis, shielded from light, and incubated at 37 °C for 10 minutes. Once bands with esterase activity appeared, the solution was discarded and the gel was washed with distilled water.
For each organism, the untreated solution and the solutions heat-treated at 60 °C or 70 °C for 10 minutes and then rapidly cooled were subjected to native gel electrophoresis on 12.5% gels.

Analysis of flagellin
The hag gene was amplified by PCR using primer sets designed from the nucleotide sequences of the hag genes of the five strains. The PCR products were then ligated into a TA cloning plasmid vector carrying an N-terminal His-tag (pCRT7TOPOTA) and transformed into E. coli BL21 DE3. The transformed E. coli was cultured until the OD600 reached 0.6, 0.5 mM IPTG was added, and expression was carried out at 30 °C for 3 to 5 hours. The cells were disrupted in a French press as before, and the lysate was applied to a TALON Metal Affinity column, which allows simple purification of His-tagged proteins only. After the protein had bound to the column, the target protein was purified with 150 mM imidazole, 50 mM sodium phosphate and 300 mM NaCl. The purity of the purified protein was confirmed by SDS-PAGE.
The purified protein was heat-treated according to the method of Example 4, separated by native gel electrophoresis, and stained with Coomassie brilliant blue.

Analysis of GroES
The groES gene was amplified by PCR using primer sets designed from the nucleotide sequences of the groES genes of the five strains, and the protein was purified in the same manner as in Example 5. The purified protein was then heat-treated and electrophoresed in the same way to analyze GroES.

Heat-resistant proteins such as thermostable enzymes are used in many industrial fields, including the sugar, protein and fertilizer industries, and are of great importance. Thermostable enzymes such as DNA polymerase are also indispensable in genetic engineering.
The program of the present invention provides a new and simple way to search for heat-resistant proteins such as thermostable enzymes, and is therefore highly useful industrially. The method of the present invention also shows that the search range for heat-resistant proteins can be extended beyond proteins derived from thermophilic bacteria, which is a significant industrial contribution.

FIG. 1 shows a phylogenetic tree, constructed by the neighbor-joining method from 16S rDNA, of the Bacillus-related species used to illustrate the method of the present invention.
FIG. 2 is a color graph showing the result of principal component analysis (PCA) of the amino acid composition of the proteins of the 120 microorganisms whose whole genome sequences have been determined to date, with the first principal component (PC1) taken as GC content and the second principal component (PC2) as the upper growth temperature limit.
FIG. 3 shows a correlation diagram, based on the "principal component score" of each protein, for Geobacillus stearothermophilus (GS), which has almost the same upper growth temperature limit as the thermophile Geobacillus kaustophilus (GK). The horizontal axis shows the "principal component score" of GK and the vertical axis that of GS.
FIG. 4 graphs the correlation with the thermophile Geobacillus kaustophilus (GK) for each of the non-thermophilic bacteria BC, BH, BS and OI. The upper left panel shows GK against BC, the lower left GK against BH, the upper right GK against BS, and the lower right GK against OI. In each graph the horizontal axis shows the "principal component score" of GK and the vertical axis that of the non-thermophilic bacterium.
FIG. 5 is a graph summarizing, for GK (black squares (■)), BC (white triangles (△)), BH (black circles (●)), BS (white circles (○)) and OI (white squares (□)), the relationship between the upper growth temperature limit of each microorganism and the proportion of the 965 proteins whose "principal component score" differs from that of GK. The horizontal axis shows temperature, on which the upper growth temperature limit of each microorganism is plotted, and the vertical axis shows percentage (%). The upper line (green in the original figure) plots, for each microorganism, the percentage of the 965 proteins whose difference in "principal component score" from GK exceeds -0.015; the middle line (blue in the original figure) likewise plots the percentage exceeding -0.01; and the lower line (red in the original figure) plots the percentage exceeding -0.005.
FIG. 6 is a color photograph, in place of a drawing, showing the native PAGE patterns obtained after separating the proteins of GK, BC, BH, BS and OI. The left side (FIG. 6a) shows total protein, and the right side (FIG. 6b) shows the bands with esterase activity for each microorganism. BC, BH, BS, GK and OI in each panel denote the respective microorganisms; for each microorganism, lane 1 is unheated, lane 2 is 60 °C for 10 minutes, and lane 3 is 70 °C for 10 minutes.
FIG. 7 is a color photograph, in place of a drawing, showing the native PAGE patterns obtained after separating the Hag and GroES proteins of GK, BC, BH, BS and OI. The left side (FIG. 7a) shows Hag, and the right side (FIG. 7b) shows the isolated GroES of each microorganism. BC, BH, BS, GK and OI in each panel denote the respective microorganisms; for each microorganism, lane 1 is unheated, lane 2 is 60 °C for 10 minutes, and lane 3 is 70 °C for 10 minutes.
FIG. 8 is a flowchart showing the processing of the program of the present invention from the user side.
FIG. 9 is a flowchart of the input processing of the program of the present invention on the server side.

Claims

A program for causing a computer to execute the following steps in order to determine whether or not a test protein has heat resistance:
inputting the amino acid sequence of the test protein;
searching, among known proteins of biological species different from the species producing the test protein, for a protein that is produced by a species different from the species producing the test protein and that is in an ortholog relationship with the test protein (corresponding protein search step);
calculating, for each of the test protein and the corresponding protein, a specific analysis value by principal component analysis based on amino acid composition;
calculating the difference between the specific analysis value of the corresponding protein and the specific analysis value of the test protein;
determining, based on the difference, whether the test protein is similar to the corresponding protein; and
displaying the corresponding protein and the result determined in the determining step;
wherein the step of calculating, for each of the test protein and the corresponding protein, a specific analysis value by principal component analysis based on amino acid composition includes:
calculating an average amino acid composition for each species from genome data obtained from a database;
calculating the eigenvector of the second principal component by principal component analysis of the matrix of the calculated average amino acid compositions for each species;
calculating the specific analysis value (heat resistance index) of the test protein as the inner product of the amino acid composition vector of the test protein and the eigenvector of the second principal component; and
calculating the specific analysis value (heat resistance index) of the corresponding protein as the inner product of the amino acid composition vector of the corresponding protein and the eigenvector of the second principal component;
the program thereby discriminating whether or not the test protein has heat resistance by comparing the specific analysis value based on the amino acid composition of the test protein with the specific analysis value of the corresponding protein.

The program according to claim 1, wherein the biological species different from the biological species producing the test protein is a thermostable organism.

The program according to claim 1, wherein the amino acid composition of the test protein is calculated from the amino acid sequence of the test protein.

The program according to any one of claims 1 to 3, wherein the amino acid composition of the corresponding protein is calculated from the amino acid sequence of the corresponding protein.

The program according to any one of claims 1 to 4, wherein a known protein in a biological species different from the biological species producing the test protein is searched from a database of known proteins.

The program according to any one of claims 1 to 5, wherein the amino acid sequence of the corresponding protein is obtained from a database of known proteins.

The program according to claim 6, wherein the database of known proteins is a database of genomic data.

The program according to claim 6, wherein the database of known proteins is a database of microbial genome data.

The program according to any one of the preceding claims, wherein the step of calculating the average amino acid composition for each species from the genome data acquired from the database is performed by removing protein sequences shorter than 50 amino acids, further removing proteins predicted to have two or more transmembrane regions, and using the remaining protein sequences.

The program according to any one of claims 1 to 9, wherein the genome data acquired from the database is microbial genome data.

The program according to any one of claims 1 to 10, wherein the matrix of the calculated average amino acid composition for each biological species is a matrix having the biological species as rows and the average amino acid composition as columns.

The program according to any one of claims 1 to 11, wherein the step of determining, based on the difference, whether the test protein is similar to the corresponding protein is a step of determining that the test protein is heat-resistant when the difference is within a predetermined range.

The program according to any one of claims 1 to 12, comprising a step of checking whether or not the protein is a processing target.

The program according to any one of claims 1 to 13, wherein a protein that is a processing target has 50 or more amino acids and at most one transmembrane domain.

The program according to claim 1, comprising a step for inputting data of a known protein for searching for a corresponding protein.

The program according to any one of claims 1 to 15, further comprising a step for inputting a unique analysis value based on an amino acid composition of a known protein.

A computer-readable recording medium on which the program according to claim 1, to be executed by a computer, is recorded.

The recording medium according to claim 17, wherein the recording medium also records data of known proteins necessary for searching for the corresponding protein.

The recording medium according to claim 17 or 18, wherein the recording medium also records data of a specific analysis value based on an amino acid composition of a known protein.