JP2007061092A

JP2007061092A - Method of determining biospecies

Info

Publication number: JP2007061092A
Application number: JP2006215083A
Authority: JP
Inventors: Hiroto Yoshii; 裕人吉井
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2005-08-05
Filing date: 2006-08-07
Publication date: 2007-03-15

Abstract

PROBLEM TO BE SOLVED: To provide a biological species determination method that uses pattern recognition and that, when an unknown sample contains an organism whose species does not correspond to any of the predetermined categories, reduces as far as possible the risk of an erroneous determination, in accordance with the biological characteristics of each species.
SOLUTION: A specific indeterminable threshold is provided for each known organism. The threshold is used to decide whether species determination should be carried out on an unknown sample, and the determination is carried out only when that decision is "yes".
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a method for determining biological species using pattern recognition. In particular, it is suitably applicable to a nucleic acid sequence analysis system that uses a DNA microarray as the analysis method, and it is especially effective when used to determine the type of a microorganism.

Conventionally, DNA microarrays, in which nucleic acid fragments called probes are immobilized at defined positions on a substrate made of glass or the like, have been widely used to analyze unknown nucleic acid samples. DNA microarrays have also been used to analyze an unknown nucleic acid fragment sample (hereinafter, an unknown sample) and determine which biological species it derives from. This method relies on a base-pairing reaction between nucleic acids called a hybridization reaction.

The hybridization reaction is explained as follows. In living organisms, DNA is in most cases a double helix, and the two strands are held together by hydrogen bonds between bases. RNA, by contrast, often exists as a single strand. DNA has four bases (A, T, G and C) and RNA has four bases (A, U, G and C); the base pairs that can form hydrogen bonds are A-T (A-U in RNA) and G-C. A hybridization reaction is one in which two single-stranded nucleic acid molecules react under appropriate conditions and join into one through the base sequences they contain.

On this basis, a conventional method for determining biological species is described below. Under appropriate reaction conditions, a probe immobilized on the substrate and a nucleic acid fragment having a complementary base sequence capable of pairing with that probe undergo a hybridization reaction, and the probe and the fragment bind. If the immobilized probe has a base sequence corresponding to a specific organism, and binding of the probe and the fragment through hybridization can be detected, the species corresponding to the nucleic acid fragment can be judged to be the same as the species corresponding to the probe. In other words, the species corresponding to the unknown sample can be determined.

For example, labeling the nucleic acid fragments with a fluorescent substance makes it possible to detect optically whether a hybridization reaction has occurred. If fluorescence arises from a probe immobilized on the substrate, a hybridization reaction is recognized to have occurred and a probe-fragment complex to have formed; on that basis, the nucleic acid fragment is judged to belong to the species corresponding to the probe. If no fluorescence arises from the probe, it is recognized that no hybridization reaction occurred and no complex formed, and the fragment is judged not to belong to the species corresponding to that probe. With this approach, a single hybridization reaction can test one unknown sample against a plurality of species at once. Specifically, a plurality of probes whose corresponding species are known are prepared and immobilized at defined positions on the substrate, and the DNA microarray so prepared is hybridized with the unknown sample under appropriate reaction conditions. The species is then identified from the position on the substrate, and the presence or absence of fluorescence shows whether the sample corresponds to that species. That is, the species of the unknown sample can be determined by checking which probe positions on the substrate fluoresce.
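The conventional presence/absence approach described above can be sketched as follows. This is a minimal illustration of the prior-art idea only, not the method of the present invention; the spot layout (a mapping from substrate position to species), the fluorescence cutoff, and the function name `call_species_by_fluorescence` are all hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) of a probe spot on the substrate


def call_species_by_fluorescence(
    intensity: Dict[Position, float],
    position_to_species: Dict[Position, str],
    cutoff: float,
) -> List[str]:
    """Conventional presence/absence call (hypothetical helper).

    The spot position identifies the species; fluorescence above `cutoff`
    is taken to mean that species is present in the sample.
    """
    called = []
    for pos, species in sorted(position_to_species.items()):
        if intensity.get(pos, 0.0) > cutoff:
            called.append(species)
    return called


layout = {(0, 0): "A", (0, 1): "B", (1, 0): "C"}
spots = {(0, 0): 950.0, (0, 1): 30.0, (1, 0): 12.0}
print(call_species_by_fluorescence(spots, layout, cutoff=100.0))  # ['A']
```

The cutoff turns a continuous signal into a yes/no answer per position, which is exactly the two-piece information (position, presence of fluorescence) that the next paragraphs show to be insufficient.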

In practice, however, hybridizing an unknown sample with the probes does not cause fluorescence only at the probe corresponding to a single species. In many cases, even when the unknown sample is known in advance to correspond to only one species, fluorescence also appears at other probes in addition to the probe that identifies that species. This happens because a nucleic acid molecule can bind partially to other probes through part of its base sequence, a phenomenon called cross-hybridization. Because of cross-hybridization, the two pieces of information described above, the position on the substrate and the presence or absence of fluorescence, are often insufficient to determine the species corresponding to an unknown sample. For example, if an unknown sample is hybridized with a DNA microarray carrying probes for a plurality of species and fluorescence appears at the probes for organisms A and B, the following possibilities must be considered.

That is, taking cross-hybridization into account, the unknown sample may contain only organism A, only organism B, or both organisms A and B, among other possibilities, so the species contained in the unknown sample cannot be determined uniquely.
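The ambiguity can be made concrete by enumerating every sample composition consistent with the fluorescing probes. The sketch assumes, for illustration only, that each lit probe may be explained either by its own species being present or by cross-hybridization from another species that is present, so every non-empty subset of the lit species is a candidate. It ignores species with no probe on the array, and `candidate_compositions` is a hypothetical name.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence


def candidate_compositions(lit_species: Sequence[str]) -> List[FrozenSet[str]]:
    """Every non-empty subset of the species whose probes fluoresced.

    Assumption: cross-hybridization from a present species can light the
    probe of an absent one, so presence/absence alone cannot separate these.
    """
    species = sorted(set(lit_species))
    return [
        frozenset(subset)
        for size in range(1, len(species) + 1)
        for subset in combinations(species, size)
    ]


print([sorted(c) for c in candidate_compositions(["A", "B"])])
# [['A'], ['B'], ['A', 'B']]
```

With n lit probes there are 2^n - 1 candidates, which is why additional information such as signal intensity is needed.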

As a general tendency, for a given probe, the fluorescence from nucleic acid fragments that hybridize and bind almost completely is stronger than the fluorescence from fragments that cross-hybridize and bind only partially. Therefore, when an unknown sample is analyzed with a DNA microarray to determine its species, a method should be chosen that determines the species by combining the probe position information with signal intensity information, typically the fluorescence intensity.

The fluorescence intensities measured after hybridizing the DNA microarray with the unknown sample can be stored in a storage means as vector data ordered by probe position and used in that form.
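Storing the intensities as vector data ordered by probe position might look like the following sketch. Row-major ordering of spot coordinates and the name `intensity_vector` are assumptions; any fixed ordering works as long as the same one is used for every sample.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) of a probe spot


def intensity_vector(
    intensity: Dict[Position, float], probe_positions: List[Position]
) -> List[float]:
    """Fluorescence intensities ordered by probe position (row-major).

    Spots missing from `intensity` are recorded as 0.0 so that every sample
    yields a vector of the same length and element order.
    """
    return [intensity.get(pos, 0.0) for pos in sorted(probe_positions)]


positions = [(1, 0), (0, 1), (0, 0)]
print(intensity_vector({(0, 0): 950.0, (1, 0): 12.0}, positions))
# [950.0, 0.0, 12.0]
```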

JP 2002-533699 A discloses a method that uses a DNA microarray to analyze vector data obtained from an unknown sample and searches for the known vector data most similar to the vector data from that sample. This kind of information processing, retrieving the most similar known vector data, is known as pattern recognition and is very common. Pattern recognition is the process of assigning an observed pattern to one of a plurality of predetermined "categories". To illustrate the term "category": in the field called OCR (Optical Character Recognition), a character printed or handwritten on paper is recognized as a single pattern. If the recognition targets are digits, the task is to find, by comparison with known vector data, which of the digits 0 to 9 the written character most closely resembles. In this pattern recognition problem, the ten digits 0 to 9 to be recognized are the "categories".
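Retrieving the most similar known vector amounts to a nearest-neighbour search. The minimal sketch below assumes Euclidean distance and an identification dictionary that maps each category to its known vectors; both are assumptions for illustration, and the cited publication may use a different similarity measure. The function name is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def nearest_category(
    dictionary: Dict[str, List[Sequence[float]]], sample: Sequence[float]
) -> Tuple[str, float]:
    """Return (category, distance) of the known vector closest to `sample`."""
    if not any(dictionary.values()):
        raise ValueError("empty identification dictionary")
    best_category, best_distance = "", math.inf
    for category, vectors in dictionary.items():
        for known in vectors:
            d = math.dist(known, sample)  # Euclidean distance (Python 3.8+)
            if d < best_distance:
                best_category, best_distance = category, d
    return best_category, best_distance


known = {"A": [[900.0, 50.0], [880.0, 60.0]], "B": [[40.0, 910.0]]}
category, distance = nearest_category(known, [870.0, 70.0])
print(category, round(distance, 2))  # A 14.14
```

Such a search always returns some category, however far the sample lies from every known vector; this is the forced assignment that the patent identifies as a problem for species determination.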

Usually, in a pattern recognition problem, the number and kinds of categories to be recognized are fixed in advance. In the example above, the digits are 0 to 9; likewise, Japanese involves about 3,000 kanji and the English alphabet has 26 letters, so the number and kinds of categories are predetermined.
Patent Document 1: JP 2002-533699 A (published Japanese translation of a PCT application)

However, when pattern recognition is performed on vector data obtained by hybridizing a DNA microarray with a nucleic acid fragment sample of unknown species, the categories often cannot be anticipated in advance. For example, when a DNA microarray is used to determine whether certain bacteria are present in an unknown sample, the bacterial species corresponding to the nucleic acid fragments deployed as probes are decided beforehand. It is unlikely, however, that the organisms actually present in the unknown sample fall within the species covered by the probes. The total number of biological species is overwhelmingly larger than the 10 categories of digits 0 to 9 in OCR described above, the 26 categories of letters A to Z, or the roughly 3,000 categories of kanji. Even if the species to be determined are limited to bacteria, the number of categories to be anticipated is enormous, and defining categories for every bacterial species in advance is practically impossible. It is therefore necessary to define the categories by restricting, to some extent, the kinds of organisms assumed to be present in the unknown sample.

For this reason, applying conventional methods used for character recognition such as OCR directly to species determination is problematic. Specifically, if an unknown sample contains an organism from a category that was not defined in advance, the organism is forcibly assigned to one of the predefined categories, producing an erroneous determination.

An object of the present invention is, in species determination using pattern recognition, to reduce the possibility of an erroneous determination, in accordance with the biological characteristics of each species, when an unknown sample contains an organism that does not correspond to any of the predetermined categories.

The biological species determination method of the present invention analyzes a substance assumed to contain material derived from an organism and determines the kind of the corresponding organism. The method comprises:
analyzing, by a biological species analysis method, a plurality of known samples whose corresponding species are known, to obtain a plurality of analysis data;
setting, on the basis of the plurality of analysis data obtained from the known samples, an indeterminable threshold for the species corresponding to the known samples;
analyzing an unknown sample whose corresponding species is unknown by the same species analysis method, to obtain analysis data for identifying the species corresponding to the unknown sample;
deciding, on the basis of the indeterminable threshold, whether to determine the species corresponding to the unknown sample or to treat it as indeterminable; and
if it is decided to make a determination, determining the species of the unknown sample on the basis of the plurality of analysis data.
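The claimed flow, first deciding whether to determine at all and only then determining, can be sketched with a reject option. The decision rule here is an assumption for illustration: the sample is treated as indeterminable when its distance to the nearest known sample exceeds the indeterminable threshold of that known sample's species. The function name, the Euclidean distance, and the threshold values are hypothetical, and how the thresholds are obtained is left to the threshold-setting step.

```python
import math
from typing import Dict, List, Optional, Sequence


def determine_species(
    dictionary: Dict[str, List[Sequence[float]]],
    thresholds: Dict[str, float],
    sample: Sequence[float],
) -> Optional[str]:
    """Return the species of `sample`, or None if it is indeterminable.

    Step 1 finds the closest known sample; step 2 compares that distance
    with the per-species indeterminable threshold before committing.
    """
    best_species, best_distance = None, math.inf
    for species, vectors in dictionary.items():
        for known in vectors:
            d = math.dist(known, sample)
            if d < best_distance:
                best_species, best_distance = species, d
    if best_species is None or best_distance > thresholds[best_species]:
        return None  # indeterminable: outside every known category
    return best_species


known = {"A": [[900.0, 50.0], [880.0, 60.0]], "B": [[40.0, 910.0]]}
limits = {"A": 40.0, "B": 60.0}  # hypothetical per-species thresholds
print(determine_species(known, limits, [870.0, 70.0]))   # A
print(determine_species(known, limits, [500.0, 500.0]))  # None
```

Keeping one threshold per species, rather than a single global cutoff, is what allows the decision to reflect how much the known individuals of each species vary.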

Another aspect of the biological species determination method of the present invention analyzes, by a biological species analysis method, a substance assumed to contain material derived from an organism, and determines the kind of the corresponding organism. The method comprises:
(1) selecting the biological species assumed as possible determination results for the unknown sample;
(2) obtaining, from each of known samples taken from a plurality of individuals known to belong to the selected species, an image data group consisting of a plurality of image data that are characteristic of the species and usable for pattern recognition;
(3) selecting image data from the image data group and setting an indeterminable threshold using its relationship to the remaining image data;
(4) obtaining image data from the unknown sample;
(5) deciding, on the basis of the indeterminable threshold, whether the species corresponding to the unknown sample can be determined from its image data or is indeterminable; and
(6) if it is decided in (5) to make a determination, determining the species using an identification dictionary made up of the image data group.

The information processing apparatus for biological species determination of the present invention comprises: a memory that stores a plurality of analysis data obtained by analyzing, by a biological analysis method, a plurality of known samples whose corresponding species are known, together with an indeterminable threshold set on the basis of those analysis data; and a processing unit that decides, on the basis of the indeterminable threshold stored in the memory, whether the species corresponding to an unknown sample can be determined and, if so, determines that species on the basis of the plurality of analysis data stored in the memory.

Another aspect of the information processing apparatus for biological species determination of the present invention is an apparatus that analyzes a substance assumed to contain material derived from an organism and determines the corresponding species. The apparatus comprises:
known-sample image data input means for inputting image data characteristic of a species, obtained by analyzing a plurality of known samples whose corresponding species are known;
unknown-sample image data input means for inputting image data obtained by analyzing an unknown sample in the same manner as the known samples;
storage means for storing the input image data;
means for setting, on the basis of the plurality of analysis data obtained from the known samples, an indeterminable threshold for the species corresponding to the known samples;
species determination means for deciding, on the basis of the indeterminable threshold, whether or not to perform determination on the image data from the unknown sample and, if determination is to be performed, determining the species corresponding to the unknown sample;
storage means for storing the determination result of the determination means; and
output means for outputting the determination result stored in the storage means.

The program for biological species determination of the present invention causes a computer to determine the species corresponding to an unknown sample, and comprises the steps of:
(1) retrieving a plurality of known-sample image data from storage means that stores image data characteristic of an assumed species, obtained by analyzing known samples from a plurality of different individuals belonging to that species, the species being assumed as a possible determination result for the unknown sample;
(2) reading the unknown-sample image data from storage means that stores the image data obtained by analyzing the unknown sample in the same manner as the known samples;
(3) selecting one of the known-sample image data and setting an indeterminable threshold using the relationship between the selected data and the remaining image data;
(4) processing the unknown-sample image data on the basis of the indeterminable threshold to determine the kind of organism corresponding to the unknown sample;
(5) storing the determination result obtained in the determination step in storage means; and
(6) outputting the determination result stored in the storage means.

The recording medium for biological species determination of the present invention is a computer-readable recording medium on which a program for executing biological species determination on a computer is recorded, the program being the biological species determination program configured as described above.

In another aspect of the biological species determination method of the present invention, a plurality of analysis data obtained by analyzing, by a biological analysis method, a plurality of known samples whose corresponding species are known, and an indeterminable threshold set on the basis of those analysis data, are used. The method first decides, on the basis of the indeterminable threshold, whether the species corresponding to an unknown sample can be determined and, if so, determines that species on the basis of the plurality of analysis data.

According to the present invention, when an unknown sample contains an organism that does not correspond to any predetermined category, the sample can be judged indeterminable, so an appropriate species determination result is obtained. In addition, because the parameter for the indeterminability judgment can be set separately for each category corresponding to a species, an optimal determination result that reflects the biological characteristics of each species can be obtained.

The species determination method according to the present invention includes analyzing vector data to create an identification dictionary and to set an indeterminable threshold. In this method, vector data obtained by first analyzing nucleic acid fragment samples extracted from organisms of known species (hereinafter, known samples) are stored in an external storage means. Because the species of an unknown sample is determined by consulting these vector data like a dictionary, the whole set of species-determination vector data obtained from the known samples and stored in the external storage means is called the identification dictionary.
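The identification dictionary can be pictured as all known-sample vectors grouped by species. The sketch below builds one from (species, vector) records; the record format and the name `build_identification_dictionary` are assumptions, and persistence to the external storage means is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def build_identification_dictionary(
    records: Iterable[Tuple[str, Sequence[float]]]
) -> Dict[str, List[List[float]]]:
    """Group known-sample vectors by species, checking they share one length."""
    dictionary: Dict[str, List[List[float]]] = defaultdict(list)
    length = None
    for species, vector in records:
        if length is None:
            length = len(vector)
        elif len(vector) != length:
            raise ValueError("all vectors must be ordered over the same probes")
        dictionary[species].append(list(vector))
    return dict(dictionary)


records = [("A", [900.0, 50.0]), ("B", [40.0, 910.0]), ("A", [880.0, 60.0])]
print(build_identification_dictionary(records))
# {'A': [[900.0, 50.0], [880.0, 60.0]], 'B': [[40.0, 910.0]]}
```

The length check guards against mixing vectors from differently laid-out arrays, which would make distances between them meaningless.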

Next, the indeterminable threshold is set using the vector data stored as the identification dictionary; the method of setting it is described in detail later. According to the set threshold, it is decided whether the species of an unknown sample can be determined with the created identification dictionary or cannot be determined (indeterminable).

Below, the determination method of the present invention is described for the case in which the results of analyzing the known and unknown samples are obtained as image data.

In selecting known samples for creating the identification dictionary, the kinds of organisms to which the organism to be determined is assumed to belong are selected first. For example, if bacteria may be present in an unknown sample and species determination of those bacteria is desired, known bacterial species are selected in advance. The selected species correspond to the categories in the pattern-recognition-based species determination method. As already noted, the total number of species is overwhelmingly larger than the number of digits or letters, so the categories must be restricted to some extent.

Next, individuals of each selected species are prepared, and nucleic acid fragment samples extracted from those individuals are obtained. These serve as the known samples, and an analysis method for obtaining image data from the known and unknown samples is selected. The analysis method is chosen from methods that allow species to be determined by pattern recognition; for example, an analysis method that uses a DNA microarray or the like and treats the resulting image data as vector data can be suitably used.

How image data are obtained with a DNA microarray is explained next. Probes are prepared for each species and, as described above, immobilized at predetermined positions on the substrate (that is, positions at which it is decided in advance which species' probe is placed where). By labeling the nucleic acid fragments with, for example, a fluorescent substance, it is possible to detect optically whether a hybridization reaction occurred when the DNA microarray and the fragments were reacted under appropriate conditions.
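Turning a scanned microarray image into vector data requires reading the fluorescence at each predetermined probe position. A minimal sketch, assuming a grayscale image given as a list of pixel rows, square spots of a known half-width centred on known pixel coordinates, and the mean pixel value as the spot intensity; these are simplifications, and real image-analysis software typically adds spot finding and background correction. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale pixels, image[row][col]


def spot_intensities(
    image: Image, spot_centres: List[Tuple[int, int]], half_width: int
) -> List[float]:
    """Mean pixel value in a square window around each spot centre.

    `spot_centres` is listed in probe order, so the result is already the
    per-sample vector. Windows are clipped at the image border.
    """
    n_rows, n_cols = len(image), len(image[0])
    vector = []
    for r, c in spot_centres:
        window = [
            image[i][j]
            for i in range(max(0, r - half_width), min(n_rows, r + half_width + 1))
            for j in range(max(0, c - half_width), min(n_cols, c + half_width + 1))
        ]
        vector.append(sum(window) / len(window))
    return vector


image = [
    [100, 100, 0, 0],
    [100, 100, 0, 0],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
print(spot_intensities(image, [(0, 0), (3, 3)], half_width=1))  # [100.0, 10.0]
```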

In the present invention, rather than creating the identification dictionary for each category from the image data of a single known sample, the indeterminable threshold is set on the basis of image data obtained from two or more known samples per category (two or more different individuals of the same species), and the kind of organism is then determined.

If the organisms to be determined are microorganisms, the microbial "species" can be selected as the kind of organism; needless to say, the present invention is also applicable to many other organisms.

An example of the present invention is described below with reference to the drawings.

FIG. 1 is a flowchart illustrating the processing procedure in an example of the biological species determination method of the present invention. This method determines whether an unknown sample contains a substance derived from a target species, that is, a substance that allows that species to be identified, and, if so, to which kind the source organism belongs. Rejection in this method means determining that no substance derived from the species selected as targets is present in the unknown sample. The present invention is described below mainly in terms of species determination using genome analysis of microorganisms and the like. However, the technique of the present invention can also be applied to, for example, test systems using antigen-antibody reactions, and to systems that analyze individual-identification genomic regions such as the MHC.

Broadly, the species determination process for an unknown sample in the present invention consists of a learning phase, in which an identification dictionary is created from known samples, and a determination phase, in which the unknown sample is determined. In FIG. 1, steps 101 to 104 constitute the learning phase and steps 105 to 108 the determination phase.

The learning phase is described first. In step 101, known samples containing nucleic acid fragments extracted from organisms of known kinds are prepared; for example, a solution containing the genome of a microbe whose species has been identified corresponds to a known sample. A series of hybridization experiments 102 is performed with these known samples to obtain data. As detailed later, when a DNA microarray is used, for example, the nucleic acid fragments in a known sample are first amplified by PCR and labeled with a fluorescent substance. They are then hybridized with the DNA microarray, and the fluorescence intensity of each spot is captured as an image and stored in the external storage means. From these image data, the indeterminable threshold and the identification dictionary are created in threshold setting step 103 and dictionary creation step 104, respectively.

Next, the determination phase is described. An unknown sample is prepared (step 105), and hybridization experiment 106 is carried out by exactly the same procedure as step 102. The species of the unknown sample is determined by comparing the resulting image data with the indeterminable threshold and the identification dictionary obtained in the learning phase (step 107). Determination result 108 may be, for example, "the unknown sample corresponds to species A", "the unknown sample contains substances derived from species A to C", "the unknown sample contains a substance derived from an organism that belongs to organism group α but is none of species A to Z", or "unknown sample 105 cannot be determined (indeterminable)".
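One possible reading of step 107 is sketched below, under stated assumptions: the pattern recognition algorithm is 1-nearest-neighbor with the Euclidean distance as a dissimilarity (the pairing the text later uses for method (2)), the identification dictionary is simply the stored known-sample vectors of each category, and a category is reported when its distance does not exceed that category's own threshold. The function name, data, and thresholds are hypothetical, and how multiple matches are combined is not specified in the source.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def determine(unknown: Vector,
              dictionary: Dict[str, List[Vector]],
              thresholds: Dict[str, float]) -> List[str]:
    """Return every category whose nearest known vector lies within that
    category's indeterminable threshold (1-NN, Euclidean distance).
    An empty list stands for the result "indeterminable"."""
    matches = []
    for category, known_vectors in dictionary.items():
        # Determination index: distance to the nearest known vector (a dissimilarity).
        index = min(math.dist(unknown, v) for v in known_vectors)
        if index <= thresholds[category]:
            matches.append(category)
    return matches


# Hypothetical two-probe vectors and per-category thresholds, for illustration only.
dictionary = {"A": [[1.0, 0.0], [1.2, 0.1]], "B": [[0.0, 1.0], [0.1, 1.1]]}
thresholds = {"A": 0.5, "B": 0.5}

for sample in ([1.1, 0.0], [5.0, 5.0]):
    result = determine(sample, dictionary, thresholds)
    print(sample, result if result else "indeterminable")
```

Because each category keeps its own threshold, a sample far from every category falls through to "indeterminable" instead of being forced into the nearest one.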

Two different methods for setting the indeterminable threshold in the learning phase described above are explained in detail below.

First, known samples obtained from different individuals of the same species are prepared and hybridized with a DNA microarray to obtain image data. Either of the following methods can preferably be used to set the indeterminable threshold.
Method (1): From the image data of three or more known samples, one is selected and excluded, an identification dictionary is created from the image data of the remaining known samples, and this dictionary is used to set the indeterminable threshold.
Method (2): The indeterminable threshold is set using the distances, computed by the pattern recognition algorithm, for every pair selected from the image data of three or more known samples.

First, method (1) for setting the indeterminable threshold is described. A flowchart of this procedure is shown in FIG. 8. First, n different species (S1 to Sn, n ≥ 2) that may result from the species determination of an unknown sample, that is, the target categories, are selected (step 802). Next, processing is performed to obtain an indeterminable threshold specific to each selected target category. Known samples of each category selected as a target category in step 802 are prepared and hybridized to obtain image data, which is stored in the external storage means for creating the identification dictionary. The whole of this image data is called the learning data. Method (1) is described below, taking the species of target category S1 as an example.

First, m individuals S1-X (1 ≤ X ≤ m, m ≥ 3) belonging to target category S1 are prepared. Nucleic acid fragments are extracted from each individual to obtain m known samples. These m known samples are hybridized with the DNA microarray under appropriate conditions to obtain a group of m image data (Ps1-1 to Ps1-m). Next, in learning data division step 803, one of these image data is selected and removed from the learning data. Then, in dictionary creation step 805, an identification dictionary 806 is created from the remaining m−1 items of learning data 804. The dictionary is created according to the pattern recognition algorithm adopted.
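Learning data division step 803 is a leave-one-out split. A minimal sketch, using the image data labels from the text as placeholders for the actual data, pairs each held-out item with the m−1 items that would form the learning data for its dictionary:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_one_out_splits(samples: Sequence[T]) -> List[Tuple[T, List[T]]]:
    """Return one (held_out, remaining) pair per sample, as in step 803."""
    if len(samples) < 3:
        raise ValueError("method (1) assumes m >= 3 known samples")
    splits = []
    for i, held_out in enumerate(samples):
        remaining = [s for j, s in enumerate(samples) if j != i]
        splits.append((held_out, remaining))
    return splits


splits = leave_one_out_splits(["Ps1-1", "Ps1-2", "Ps1-3"])
for held_out, remaining in splits:
    print(held_out, remaining)
```

Each of the m splits yields one dictionary and one determination index, which is why m ≥ 3 leaves at least two items in every dictionary.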

Any method selected from known methods can be used to classify an unknown pattern by pattern recognition. Methods for determination and classification by pattern recognition are reviewed, for example, in Anil K. Jain, Robert P. W. Duin, and Jianchang Mao, "Statistical Pattern Recognition: A Review," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, January 2000, pp. 4-37. Specifically, pattern recognition techniques such as the k-nearest neighbor method, classification trees, support vector machines, Bayesian classification, boosting, and neural networks can be used.

For example, if a neural network is adopted as the pattern recognition algorithm, the set of network weight parameters is learned as the identification dictionary. If a support vector machine is adopted, representative sample vectors called support vectors and their weights are learned as the identification dictionary. In the present invention, "learning" an identification dictionary is synonymous with creating an identification dictionary from learning data.

Next, the one item of image data removed from the learning data is classified using identification dictionary 806 (step 808). Suppose, for example, that image data Ps1-1 corresponding to individual S1-1 was removed from the learning data. Note that identification dictionary 806 does not include the image data removed at 807; with respect to identification dictionary 806, individual S1-1 is therefore an unknown sample. When classifying with an identification dictionary, a norm, typified by the Euclidean norm, must be defined in advance in order to compare the vector data stored in the external storage means. The case in which the Euclidean norm is used in the species determination method of the present invention is described later, but various other common norms may of course be used. Through the above steps, determination index 809 is obtained (step 809).

In general, the result of a pattern recognition algorithm is numerical data, such as a classification probability, a similarity, or simply a distance between vectors. Determination index 809 thus means the numerical data obtained as a determination result calculated with the predefined norm.
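As an illustration of a determination index that is "simply a distance between vectors", here is a sketch assuming the 1-nearest-neighbor method with the Euclidean norm. For 1-NN the identification dictionary is just the stored learning vectors, and the index is the distance from the held-out vector to the closest of them, so larger values mean less similar. The vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def determination_index(held_out: Vector, dictionary: List[Vector]) -> float:
    """1-NN determination index: Euclidean distance from the held-out vector
    to the nearest vector stored in the identification dictionary.
    Larger values mean less similar (a dissimilarity)."""
    if not dictionary:
        raise ValueError("the identification dictionary is empty")
    return min(math.dist(held_out, v) for v in dictionary)


# For 1-NN the "dictionary" is simply the stored learning vectors.
dictionary = [[3.0, 4.0], [10.0, 10.0]]
print(determination_index([0.0, 0.0], dictionary))
```

With another algorithm (for example a classifier that outputs a probability), the index would instead be a similarity, and the threshold comparison would be reversed.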

In this way, one determination index A1-1 for target category S1 is obtained using the identification dictionary created from the learning data with one of the m image data removed.

Next, an image data item of target category S1 that has not yet been removed in learning data division step 803 is newly selected from S1-X (1 ≤ X ≤ m, m ≥ 3) and removed in the same way. Suppose the image data corresponding to S1-2 is selected. The same processing yields determination index A1-2 for target category S1. By performing this operation for each image data item of target category S1 in turn, a determination index set {A1} consisting of m determination indices is obtained. The elements of {A1} are the m determination indices A1-1, A1-2, ..., A1-m.
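Putting the leave-one-out loop together, the sketch below computes a determination index set {A1} under the same assumptions as above (1-nearest-neighbor classifier, Euclidean distance as the index, hypothetical two-probe vectors). Each held-out vector is scored against a dictionary built from the other m−1 vectors:

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def loo_index_set(vectors: List[Vector]) -> List[float]:
    """Determination index set for one target category: hold out each vector,
    use the remaining m-1 vectors as a 1-NN dictionary, and record the
    distance from the held-out vector to its nearest neighbour."""
    m = len(vectors)
    if m < 3:
        raise ValueError("m >= 3 known samples are required")
    indices = []
    for i in range(m):
        dictionary = vectors[:i] + vectors[i + 1:]  # the m-1 remaining items
        indices.append(min(math.dist(vectors[i], v) for v in dictionary))
    return indices


s1 = [[0.0, 0.0], [3.0, 4.0], [3.0, 5.0]]  # hypothetical vectors for S1-1..S1-3
a1 = loo_index_set(s1)
print(a1)
```

Here the first individual lies farther from the other two, so its index (5.0) stands out from the others (1.0); a histogram of such values is what the threshold rules below are applied to.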

The indeterminable threshold for target category S1 can then be set from the determination index set {A1} obtained in this way. The above explains how the determination index set is obtained for target category S1; for each of the other initially selected n target categories, a determination index set is obtained in the same way. As a result, n determination index sets are obtained.

FIG. 9 shows an example in which one of the n determination index sets is selected and the distribution of that determination index set 810 is displayed as a histogram. If determination index 809 represents a similarity, the indeterminable threshold may be set, for example, to α times the minimum of the set (α < 1), or to β times the mean or median of the set (β > 0). Conversely, if determination index 809 represents a dissimilarity, the threshold may be set, for example, to α times the maximum of the set (α > 1), or to β times the mean or median of the set (β > 0). How the threshold is derived from the determination index set can be chosen for each target category according to the type of organism being tested, the type of pattern-recognition-based analysis, the required determination accuracy, and so on. One way to check whether a threshold set in this way is appropriate is to use, as unknown sample 105, a sample already known not to belong to the selected target categories. The species determination process for unknown samples described with reference to FIG. 1 is executed, and whether this sample yields the result "indeterminable" is tested to confirm that the threshold is set correctly.
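The threshold rules in the preceding paragraph can be written out directly. A minimal sketch, with hypothetical function names and a hypothetical index set; which rule and which α or β to use is left to the caller, as the text leaves it to the target category:

```python
import statistics
from typing import Sequence


def threshold_from_extreme(indices: Sequence[float], alpha: float,
                           index_is_similarity: bool) -> float:
    """alpha * min (similarity, alpha < 1) or alpha * max (dissimilarity, alpha > 1)."""
    if index_is_similarity:
        if not alpha < 1:
            raise ValueError("for a similarity index, alpha must be < 1")
        return alpha * min(indices)
    if not alpha > 1:
        raise ValueError("for a dissimilarity index, alpha must be > 1")
    return alpha * max(indices)


def threshold_from_center(indices: Sequence[float], beta: float,
                          use_median: bool = True) -> float:
    """beta * median (or mean) of the determination index set, with beta > 0."""
    if not beta > 0:
        raise ValueError("beta must be > 0")
    center = statistics.median(indices) if use_median else statistics.mean(indices)
    return beta * center


def is_indeterminable(index: float, threshold: float, index_is_similarity: bool) -> bool:
    """A similarity below, or a dissimilarity above, the threshold means 'indeterminable'."""
    return index < threshold if index_is_similarity else index > threshold


a1 = [5.0, 1.0, 1.0]  # hypothetical dissimilarity indices for one target category
t = threshold_from_extreme(a1, alpha=1.2, index_is_similarity=False)
print(t, is_indeterminable(7.5, t, index_is_similarity=False))
```

The extreme-value rule accepts every known sample by construction, while the center-based rule can be stricter when one individual (here the index 5.0) is atypical.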

Next, method (2) for setting the indeterminable threshold is described. FIG. 10 shows another example of setting the indeterminable threshold. The following describes the setting method when the k-nearest neighbor method (in particular k = 1) is chosen as the pattern recognition algorithm and the Euclidean norm is adopted as the norm. In this case, if the determination index calculated from the image data of an unknown sample represents a dissimilarity, the result "indeterminable" is output when the index is larger than the predetermined indeterminable threshold. To set the threshold, one target category S1 is first chosen, all known samples belonging to S1 are hybridized, and the resulting image data is stored in the external storage means. From all the stored image data, any two image data belonging to S1 are selected, and the Euclidean distance is calculated between their vector data, each consisting of fluorescence intensities ordered by the probe positions recognized from the image data.
Next, a new pair of image data not selected before is chosen, and the Euclidean distance is calculated from the newly selected pair in the same way. By this procedure, a Euclidean distance is calculated for every pair in the group of image data that belongs to S1 and is stored in the external storage means. FIG. 7 shows the case in which six known samples belonging to the target category are prepared. In this case, the number of Euclidean distances calculated from pairs of image data is ₆C₂ = 15.
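The pair enumeration in method (2) is a loop over all unordered pairs. A minimal sketch, assuming each image has already been reduced to an intensity vector (the six one-dimensional vectors below are hypothetical stand-ins for the six samples of FIG. 7):

```python
import itertools
import math
from typing import List, Sequence

Vector = Sequence[float]


def pairwise_distance_set(vectors: List[Vector]) -> List[float]:
    """Euclidean distance for every unordered pair of known-sample vectors."""
    if len(vectors) < 3:
        raise ValueError("method (2) assumes three or more known samples")
    return [math.dist(a, b) for a, b in itertools.combinations(vectors, 2)]


# Six hypothetical one-dimensional vectors standing in for six known samples.
six = [[float(i)] for i in range(6)]
distances = pairwise_distance_set(six)
print(len(distances), math.comb(6, 2))
```

`itertools.combinations` visits each pair exactly once, so the distance set always has ₆C₂ = 15 elements for six samples, matching the count in the text.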

FIG. 10 is a histogram of the Euclidean distances calculated for all pairs of image data belonging to target category S1, with the determination index on the x-axis. In FIG. 10 the distance distribution has two peaks, which means that the sample vectors belonging to the category are localized in two regions. Because the properties of target category S1 can be checked from the histogram in this way, an appropriate method of setting the indeterminable threshold can be chosen for each target category.
For example, a statistical representative value of the distance set, such as its mean or median, can be used as the indeterminable threshold.
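Taking a representative value of the distance set is a one-line step with Python's `statistics` module. The sketch below uses a hypothetical distance set for four samples that fall into two clusters (two small within-cluster distances, four large between-cluster ones), the kind of two-peak shape described for FIG. 10:

```python
import statistics
from typing import Sequence


def threshold_from_distance_set(distances: Sequence[float], statistic: str = "median") -> float:
    """Use a representative value of the distance set as the indeterminable threshold."""
    if statistic == "median":
        return statistics.median(distances)
    if statistic == "mean":
        return statistics.mean(distances)
    raise ValueError(f"unsupported statistic: {statistic!r}")


# Hypothetical distance set: 4 samples in two clusters gives 4C2 = 6 distances.
distances = [1.0, 1.2, 7.9, 8.0, 8.2, 8.5]
print(threshold_from_distance_set(distances, "median"))
print(threshold_from_distance_set(distances, "mean"))
```

With 1-NN and a dissimilarity index, an unknown sample whose nearest-neighbor distance exceeds the chosen value would be reported as "indeterminable". For a two-peak distribution like this one, the median and mean differ noticeably, which is why inspecting the histogram first helps in choosing the rule.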

Next, whether the indeterminable threshold obtained in this way is appropriate is checked in the same way as for method (1), by executing the species determination process for unknown samples described with reference to FIG. 1. A sample already known not to belong to the selected target categories is used as unknown sample 105, and whether this sample yields the result "indeterminable" is tested to confirm that the threshold is set correctly.

Next, the elements that can be used in the above species determination method are described: a computer system serving as an information processing apparatus, a program, and an analysis method using image recognition.

The species determination described above can be automated by processing on a computer according to a previously created program. The information processing apparatus for species determination according to the present invention analyzes a material assumed to contain a substance derived from an organism and determines the corresponding species. The apparatus can be configured with at least the following means.
(1) Known-sample image data input means for inputting image data characteristic of a species, obtained by analyzing a plurality of known samples whose species is known.
(2) Unknown-sample image data input means for inputting image data obtained by analyzing an unknown sample in the same way as the known samples.
(3) Storage means for storing the input image data.
(4) Means for setting an indeterminable threshold for the species corresponding to the known samples, based on the plurality of analysis data obtained from the known samples.
(5) Species determination means for processing the image data of the unknown sample based on the indeterminable threshold and determining the species of the organism from which the unknown sample was derived.
(6) Storage means for storing the determination result of the determination means.
(7) Output means for outputting the determination result stored in the storage means.

The indeterminable threshold is preferably set by storing image data from three or more individuals in the storage means and executing a program having the following steps.
(a) Selecting and excluding one of the three or more image data, creating an identification dictionary from the remaining image data, and classifying the excluded image data with the resulting dictionary to obtain a determination index; this is done for each image data item to obtain a determination index set consisting of three or more determination indices.
(b) Setting the indeterminable threshold from the determination index set.

Alternatively, the indeterminable threshold is preferably set as follows. Image data from three or more individuals is stored in the storage means, and the threshold is set by a program that executes at least the following steps.
(A) Calculating the distance between the two image data for every pair selected from the three or more image data, to obtain a distance set.
(B) Determining the indeterminable threshold from the distance set.

The program for species determination according to the present invention causes a computer to determine the species of organism corresponding to an unknown sample, and executes at least the following steps.
(1) Reading a plurality of known-sample image data from storage means that stores image data characteristic of a species assumed as a possible determination result for the unknown sample, the image data having been obtained by analyzing known samples from a plurality of different individuals of that species.
(2) Reading the unknown-sample image data, obtained by analyzing the unknown sample in the same way as the known samples, from the storage means in which it is stored.
(3) Selecting one item of the known-sample image data and setting an indeterminable threshold using the relationship between the selected item and the remaining image data.
(4) Processing the unknown-sample image data based on the indeterminable threshold and determining the species of organism corresponding to the unknown sample.
(5) Storing the determination result obtained in the determination step in storage means.
(6) Outputting the determination result stored in the storage means.

The threshold setting step above is preferably performed, with three or more individuals and their image data stored in the storage means, by at least the following steps.
(a) Selecting and excluding one of the three or more image data, creating an identification dictionary from the remaining image data, and classifying the excluded image data with the resulting dictionary to obtain a determination index; this is done for each image data item to obtain a determination index set consisting of three or more determination indices.
(b) Determining the indeterminable threshold from the determination index set.

Alternatively, the threshold setting step is preferably performed, with three or more individuals and their image data stored in the storage means, by at least the following steps.
(A) Calculating the distance between the two image data for every pair selected from the three or more image data, to obtain a distance set.
(B) Determining the indeterminable threshold from the distance set.

It is advisable to store the indeterminable thresholds of many known species in the storage means and to add to the program a step that, depending on the type of unknown sample, selects only the necessary number of categories whose organism-derived substances are expected to be present in the sample. This effectively reduces the number of categories that must be checked for indeterminability and makes the determination more efficient.
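The category preselection step could look like the sketch below. The mapping from sample type to candidate categories, the category names, and the threshold values are all hypothetical; the source does not say how the expected categories are chosen for a given sample type.

```python
from typing import Dict, Iterable

# Hypothetical mapping from sample type to the target categories worth checking.
CANDIDATES_BY_SAMPLE_TYPE = {
    "sputum": ["S1", "S2"],
    "drinking_water": ["S2", "S3"],
}


def select_categories(all_thresholds: Dict[str, float],
                      expected: Iterable[str]) -> Dict[str, float]:
    """Keep only the stored thresholds for the categories expected for this sample type."""
    wanted = set(expected)
    return {category: t for category, t in all_thresholds.items() if category in wanted}


stored = {"S1": 4.2, "S2": 3.1, "S3": 5.0, "S4": 2.7}  # hypothetical stored thresholds
print(select_categories(stored, CANDIDATES_BY_SAMPLE_TYPE["sputum"]))
```

Only the selected categories are then compared against the unknown sample, which is where the efficiency gain described above comes from.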

The program may be held in the storage means of a computer system, stored on a recording medium for distribution to users, or distributed via a network system.

FIG. 2 is a block diagram of an example configuration of an information processing apparatus that uses a computer system capable of executing the species determination method. The apparatus includes at least an external storage device 201, a central processing unit (CPU) 202, a memory 203, and an input/output device 204. The external storage device 201 holds the species determination program configured as described above and the image data obtained by hybridization-based analysis of known and unknown samples; it also holds the results of determinations made using the indeterminable threshold. The CPU 202 executes the species determination program and controls all devices. The memory 203 temporarily holds the programs, subroutines, and data used by the CPU 202. The input/output device 204 handles interaction with the user: in many cases the user triggers program execution through this device, and the user also views results and controls program parameters through it.

FIG. 3 shows hybridization on the DNA microarray. In living organisms, DNA in most cases has a double-helix structure in which the two strands are held together by hydrogen bonds between bases, whereas RNA often exists as a single strand. DNA has four bases (A, T, G, C) and RNA has four (A, U, G, C); the base pairs that form hydrogen bonds are A-T (A-U in RNA) and G-C. In general, a hybridization reaction is the state in which single-stranded nucleic acid molecules partially bind to each other through partial base sequences. In the example of FIG. 3, the nucleic acid molecule (probe) attached to the substrate at the top of the figure is shorter than the nucleic acid molecule in the sample below it. If a nucleic acid molecule in the sample contains the sequence corresponding (complementary) to the probe's base sequence, the hybridization succeeds and the sample molecule is trapped on the DNA microarray.
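The base-pairing rule above can be illustrated with an idealized string check. This is a simplification assumed for illustration only: it treats hybridization as an exact match between the target strand and the reverse complement of the whole probe (antiparallel A-T and G-C pairing, DNA alphabet only), and ignores mismatches, partial binding, and hybridization conditions. The sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A-T, G-C pairing, antiparallel strands)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def can_hybridize(probe: str, target: str) -> bool:
    """Idealized check: the target strand contains a stretch that is exactly
    complementary to the whole probe."""
    return reverse_complement(probe) in target.upper()


probe = "ACGTTG"          # hypothetical probe sequence
target = "GGCAACGTAA"     # hypothetical sample strand
print(reverse_complement(probe), can_hybridize(probe, target))
```

Real probe design, as described later for 16S rRNA probes, has to account for partial matches and binding strength, which this exact-substring test does not model.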

Next, the overall experimental procedure for obtaining image data with a DNA microarray is described with reference to FIG. 4. The "sample" 401 is a liquid or solid that contains, or is assumed to contain, the target organism-derived substance, for example nucleic acid (including nucleic acid contained in cells). For example, when the present invention is applied to identifying the causative bacteria of an infectious disease, the source of unknown sample 401 can be anything in which microorganisms such as bacteria, or substances derived from them, are thought to be present: body fluids such as blood, sputum, gastric juice, vaginal secretions, and oral mucus from humans or animals such as livestock; excreta such as urine and feces; tissue pieces collected from humans or animals; and so on. Media that may be contaminated by bacteria, such as food involved in food poisoning or contamination, drinking water, and environmental water such as hot-spring water, may also serve as sources of unknown samples. Animals and plants subject to quarantine at import or export can likewise be specimens. A known sample is a sample prepared from a microorganism or other organism whose species is known.

Next, if necessary, the nucleic acid in sample 401 is amplified by "biochemical amplification" method 402. For example, when the present invention is applied to identifying the causative bacteria of an infectious disease, the target nucleic acid may be amplified by PCR using primers designed for 16S rRNA detection, or prepared by a further PCR reaction or the like using the PCR product as template. Amplification methods other than PCR, such as the LAMP method, may also be used.

The amplified sample, or sample 401 itself, is then labeled for visualization by one of various labeling methods. Fluorescent substances such as Cy3, Cy5, and rhodamine are usually used as the label. The label molecules may also be incorporated during the biochemical amplification procedure 402.

The labeled nucleic acid is then hybridized (405) with DNA microarray 404 in FIG. 4, as illustrated in FIG. 3. For example, when the present invention is applied to identifying the causative bacteria of an infectious disease, DNA microarray 404 carries probes specific to each bacterium immobilized on a substrate. The probes for each bacterium are designed, for example from the genomic region encoding 16S rRNA, to be highly specific to that bacterium and to give sufficient hybridization sensitivity with as little variation as possible among the probe sequences. The carrier (substrate) on which the probes of DNA microarray 404 are immobilized may be a flat substrate such as a glass substrate, a plastic substrate, or a silicon wafer. Three-dimensional structures with surface relief, spherical carriers such as beads, and rod-, string-, or thread-shaped carriers may also be used without affecting the embodiments or effects of the present invention.

Usually, the substrate surface is treated so that the probe DNA can be immobilized on it. A surface into which chemically reactive functional groups have been introduced is particularly preferable for reproducibility, because the probes then remain stably bound during the hybridization reaction. One immobilization method uses the combination of a maleimide group and a thiol (-SH) group: a thiol group is attached to the end of the nucleic acid probe and the solid-phase surface is treated so that it carries maleimide groups, so that the thiol group of a probe supplied to the surface reacts with a maleimide group and immobilizes the probe. Maleimide groups can be introduced onto a glass substrate by first reacting the glass with an aminosilane coupling agent and then reacting the resulting amino groups with the EMCS reagent (N-(6-maleimidocaproyloxy)succinimide, Dojindo). SH groups can be introduced into DNA by using 5'-Thiol-Modifier C6 (Glen Research) on an automated DNA synthesizer.
Besides the thiol/maleimide pair, other combinations of functional groups can be used for immobilization, for example an epoxy group on the solid phase and an amino group at the end of the nucleic acid probe. Surface treatment with various silane coupling agents is also effective; in that case, oligonucleotides carrying a functional group that can react with the group introduced by the silane coupling agent are used. A method of coating the surface with a resin that has functional groups can also be used.

After the hybridization reaction, the surface of the DNA microarray 404 is washed to remove nucleic acid that has not bound to the probes, usually dried, and then the fluorescence (405) is measured. Specifically, the substrate of the DNA microarray is irradiated with excitation light, and an image (406) of the measured fluorescence intensity is obtained. This image (406) serves as the image data. FIG. 6 shows an example of the image data: 601 and 602 are different images obtained from different known samples.

Next, the principle of the DNA microarray used to identify infectious bacteria is explained with reference to FIG. 5. The DNA microarray shown in FIG. 5 is designed, for example, to identify Staphylococcus aureus. The left column shows the processing sequence for a wild-type S. aureus strain, and the right column shows that for a wild-type Escherichia coli strain. For example, the left column can be regarded as processing the blood of a patient infected with S. aureus, and the right column as processing the blood of a patient infected with E. coli.

Both columns basically follow the same processing. First, DNA is extracted from, for example, the blood or sputum of a patient with a bacterial infection. In general, the extract may also contain human DNA derived from the patient's somatic cells.

When the amount of extracted DNA is small, it is amplified by a method such as PCR. In this case, a fluorescent substance, or a substance to which a fluorescent substance can be bound, is generally incorporated as a label. When no amplification is performed, such a label is either incorporated while a complementary strand is synthesized from the extracted DNA, or attached directly to the extracted DNA.

When PCR amplification is performed to identify the bacteria causing an infection, the usual approach is to amplify part of the sequence encoding ribosomal RNA, namely the so-called 16S rRNA. In this case, nearly the same PCR primers are used for S. aureus (left) and E. coli (right). More specifically, multiplex PCR is performed with a primer set that can amplify the 16S rRNA coding region of any bacterium.

If a DNA microarray designed to detect S. aureus works correctly, its spots react positively with the hybridization solution on the left and negatively with the hybridization solution on the right. Likewise, if a DNA microarray designed to detect E. coli works correctly, its spots react negatively with the hybridization solution on the left and positively with the hybridization solution on the right.

Image data are obtained by measuring the fluorescence intensity of the positively reacting spots and applying the scan-image processing shown in FIG. 4. If samples from different individuals of the same species, analyzed under the same conditions, always gave the same fluorescence intensities, those intensities could simply be used as a dictionary. In practice, however, the fluorescence intensity varies, so it can be difficult to obtain a clear criterion for deciding whether image data from an unknown sample fall within this range of variation or lie outside it and should be judged as not belonging to any known category. Furthermore, as shown in the examples described later, some probes cause cross-hybridization. Therefore, as shown in FIG. 8, the present invention creates an identification dictionary from samples of many different individuals of the same species and sets an indeterminable threshold for each category, which gives a clear criterion, category by category, for whether to make a determination for an unknown sample.
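The determination procedure itself is defined with FIG. 8, which is not part of this excerpt. As a hedged illustration of the general idea of a per-category "cannot determine" threshold, the Python sketch below uses a nearest-centroid rule: each category's dictionary entry is the mean of its known-sample vectors, and its threshold is a margin times the largest training distance from that mean. The centroid dictionary, the Euclidean distance, and the `margin` parameter are assumptions for illustration, not the patented method.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def centroid(samples: List[Vector]) -> List[float]:
    """Element-wise mean of the known-sample vectors of one category."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]


def build_dictionary(training: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Hypothetical identification dictionary: one centroid per category."""
    return {name: centroid(samples) for name, samples in training.items()}


def category_thresholds(training: Dict[str, List[Vector]],
                        dictionary: Dict[str, List[float]],
                        margin: float = 1.2) -> Dict[str, float]:
    """Per-category threshold: margin x the largest training distance to the centroid."""
    return {
        name: margin * max(math.dist(s, dictionary[name]) for s in samples)
        for name, samples in training.items()
    }


def determine(sample: Vector,
              dictionary: Dict[str, List[float]],
              thresholds: Dict[str, float]) -> Optional[str]:
    """Return the nearest category, or None ("cannot determine") when the
    sample lies beyond that category's own threshold."""
    nearest = min(dictionary, key=lambda name: math.dist(sample, dictionary[name]))
    if math.dist(sample, dictionary[nearest]) > thresholds[nearest]:
        return None
    return nearest
```

With toy three-probe vectors, a sample close to the S. aureus training data is assigned to S. aureus, while a sample far from every category returns None instead of being forced into the nearest class. Because each category has its own threshold, a category whose known samples scatter widely tolerates more deviation than a tightly clustered one.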

Specific examples of methods for acquiring analysis data usable in the biological species determination method of the present invention are given below. The present invention is not limited to identifying the causative bacteria of infectious diseases, as described below; it may also be used for determining human constitution, such as MHC typing, and for analyzing DNA and RNA related to diseases such as cancer.

Example 1
<Preparation of probe DNA>
The following nucleic acid sequences (I-n, where n is a number) were designed as probes for detecting Enterobacter cloacae. Specifically, the probe sequences shown below were selected from the genomic region encoding 16S rRNA. These probe sequences are designed to be highly specific to the bacterium and to provide sufficient hybridization sensitivity that varies as little as possible among the individual probe sequences.
I-1: CAgAgAgCTTgCTCTCgggTgA
I-2: gggAggAAggTgTTgTggTTAATAAC
I-3: ggTgTTgTggTTAATAACCACAgCAA
I-4: gCggTCTgTCAAgTCggATgTg
I-5: ATTCgAAACTggCAggCTAgAgTCT
I-6: TAACCACAgCAATTgACgTTACCCg
I-7: gCAATTgACgTTACCCgCAgAAgA
After synthesis, a thiol group was introduced at the 5' end of each probe by a conventional method, as the functional group for immobilization on the DNA microarray. After introduction of the functional group, the probes were purified and freeze-dried. The freeze-dried internal-standard probes were stored in a freezer at -30 °C.

In the same way, the following probe sets were designed for Staphylococcus aureus (A-n), Staphylococcus epidermidis (B-n), Escherichia coli (C-n), Klebsiella pneumoniae (D-n), Pseudomonas aeruginosa (E-n), Serratia (F-n), Streptococcus pneumoniae (G-n), Haemophilus influenzae (H-n), and Enterococcus faecalis (J-n), where n is a number.
A-1: gAACCgCATggTTCAAAAgTgAAAgA
A-2: CACTTATAgATggATCCgCgCTgC
A-3: TgCACATCTTgACggTACCTAATCAg
A-4: CCCCTTAgTgCTgCAgCTAACg
A-5: AATACAAAgggCAgCgAAACCgC
A-6: CCggTggAgTAACCTTTTAggAgCT
A-7: TAACCTTTTAggAgCTAgCCgTCgA
A-8: TTTAggAgCTAgCCgTCgAAggT
A-9: TAgCCgTCgAAggTgggACAAAT
B-1: gAACAgACgAggAgCTTgCTCC
B-2: TAgTgAAAgACggTTTTgCTgTCACT
B-3: TAAgTAACTATgCACgTCTTgACggT
B-4: gACCCCTCTAgAgATAgAgTTTTCCC
B-5: AgTAACCATTTggAgCTAgCCgTC
B-6: gAgCTTgCTCCTCTgACgTTAgC
B-7: AgCCggTggAgTAACCATTTgg
C-1: CTCTTgCCATCggATgTgCCCA
C-2: ATACCTTTgCTCATTgACgTTACCCg
C-3: TTTgCTCATTgACgTTACCCgCAg
C-4: ACTggCAAgCTTgAgTCTCgTAgA
C-5: ATACAAAgAgAAgCgACCTCgCg
C-6: CggACCTCATAAAgTgCgTCgTAgT
C-7: gCggggAggAAgggAgTAAAgTTAAT
D-1: TAgCACAgAgAgCTTgCTCTCgg
D-2: TCATgCCATCAgATgTgCCCAgA
D-3: CggggAggAAggCgATAAggTTAAT
D-4: TTCgATTgACgTTACCCgCAgAAgA
D-5: ggTCTgTCAAgTCggATgTgAAATCC
D-6: gCAggCTAgAgTCTTgTAgAgggg
E-1: TgAgggAgAAAgTgggggATCTTC
E-2: TCAgATgAgCCTAggTCggATTAgC
E-3: gAgCTAgAgTACggTAgAgggTgg
E-4: gTACggTAgAgggTggTggAATTTC
E-5: gACCACCTggACTgATACTgACAC
E-6: TggCCTTgACATgCTgAgAACTTTC
E-7: TTAgTTACCAgCACCTCgggTgg
E-8: TAgTCTAACCgCAAgggggACg
F-1: TAgCACAgggAgCTTgCTCCCT
F-2: AggTggTgAgCTTAATACgCTCATC
F-3: TCATCAATTgACgTTACTCgCAgAAg
F-4: ACTgCATTTgAAACTggCAAgCTAgA
F-5: TTATCCTTTgTTgCAgCTTCggCC
F-6: ACTTTCAgCgAggAggAAggTgg
G-1: AgTAgAACgCTgAAggAggAgCTTg
G-2: CTTgCATCACTACCAgATggACCTg
G-3: TgAgAgTggAAAgTTCACACTgTgAC
G-4: gCTgTggCTTAACCATAgTAggCTTT
G-5: AAgCggCTCTCTggCTTgTAACT
G-6: TAgACCCTTTCCggggTTTAgTgC
G-7: gACggCAAgCTAATCTCTTAAAgCCA
H-1: gCTTgggAATCTggCTTATggAgg
H-2: TgCCATAggATgAgCCCAAgTgg
H-3: CTTgggAATgTACTgACgCTCATgTg
H-4: ggATTgggCTTAgAgCTTggTgC
H-5: TACAgAgggAAgCgAAgCTgCg
H-6: ggCgTTTACCACggTATgATTCATgA
H-7: AATgCCTACCAAgCCTgCgATCT
H-8: TATCggAAgATgAAAgTgCgggACT
J-1: TTCTTTCCTCCCgAgTgCTTgCA
J-2: AACACgTgggTAACCTACCCATCAg
J-3: ATggCATAAgAgTgAAAggCgCTT
J-4: gACCCgCggTgCATTAgCTAgT
J-5: ggACgTTAgTAACTgAACgTCCCCT
J-6: CTCAACCggggAgggTCATTgg
J-7: TTggAgggTTTCCgCCCTTCAg
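The design goal stated above, hybridization sensitivity that varies as little as possible among probe sequences, can be screened roughly by comparing GC content and an estimated melting temperature across a probe set. The Python sketch below uses the Wallace rule (Tm ≈ 2(A+T) + 4(G+C) °C), which is only a crude estimate for 22- to 26-mers like these; nearest-neighbor thermodynamic models are more accurate. The choice of rule and the helper names are assumptions; the sequences are copied from the lists above, with the lowercase g normalized to G.

```python
from typing import Dict, Tuple

VALID_BASES = set("ACGT")


def _normalize(seq: str) -> str:
    """Upper-case a probe sequence (the lists write G as lowercase g) and validate it."""
    s = seq.upper()
    if not s or set(s) - VALID_BASES:
        raise ValueError(f"not a DNA sequence: {seq!r}")
    return s


def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    s = _normalize(seq)
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting temperature estimate in degrees C (rough for long oligos)."""
    s = _normalize(seq)
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def tm_range(probes: Dict[str, str]) -> Tuple[int, int]:
    """Lowest and highest estimated Tm in a probe set; a narrow range suggests uniform probes."""
    tms = [wallace_tm(seq) for seq in probes.values()]
    return min(tms), max(tms)


if __name__ == "__main__":
    enterobacter = {
        "I-1": "CAgAgAgCTTgCTCTCgggTgA",
        "I-2": "gggAggAAggTgTTgTggTTAATAAC",
        "I-3": "ggTgTTgTggTTAATAACCACAgCAA",
    }
    for name, seq in enterobacter.items():
        print(name, len(seq), round(gc_content(seq), 2), wallace_tm(seq))
    print("Tm range:", tm_range(enterobacter))
```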
<Preparation of PCR Primer for sample amplification>
The nucleic acid sequences shown in Table 1 were designed as PCR primers for amplifying 16S rRNA nucleic acid (the target nucleic acid) to detect pathogenic bacteria. Specifically, a primer set that specifically amplifies the genomic region encoding 16S rRNA was designed: primers located at both ends of the approximately 1500-base 16S rRNA coding region, with melting temperatures matched as closely as possible. Multiple primers were designed so that mutant strains and the multiple 16S rRNA coding regions present on a genome can be amplified simultaneously.

After synthesis, the primers shown in the table were purified by high-performance liquid chromatography (HPLC). The three forward primers and the three reverse primers were each mixed and dissolved in TE buffer so that each primer had a final concentration of 10 pmol/μl.

<Extraction of Enterobacter cloacae genomic DNA (model specimen)>
(Microbial culture and pretreatment for genomic DNA extraction)
First, a standard strain of Enterobacter cloacae was cultured by a conventional method. 1.0 ml of the culture (OD600 = 0.7) was placed in a 1.5 ml microtube, and the cells were collected by centrifugation (8500 rpm, 5 min, 4 °C). After the supernatant was discarded, 300 μl of Enzyme Buffer (50 mM Tris-HCl, pH 8.0; 25 mM EDTA) was added and the cells were resuspended using a mixer. The cells were again collected from the resuspension by centrifugation (8500 rpm, 5 min, 4 °C). After the supernatant was discarded, the following enzyme solutions were added to the collected cells, which were resuspended using a mixer.
Lysozyme: 50 μl (20 mg/ml in Enzyme Buffer)
N-Acetylmuramidase SG: 50 μl (0.2 mg/ml in Enzyme Buffer)
Next, the cell suspension containing the enzyme solutions was left to stand for 30 minutes in an incubator at 37 °C to lyse the cell walls.

(Genomic DNA extraction)
Genomic DNA was extracted from the microorganism using a nucleic acid purification kit (MagExtractor -Genome-, TOYOBO). Specifically, 750 μl of lysis/binding solution and 40 μl of magnetic beads were first added to the pretreated cell suspension, and the mixture was stirred vigorously for 10 minutes with a tube mixer (step 1). Next, the microtube was placed on a magnetic separation stand (Magical Trapper) and left for 30 seconds to collect the magnetic particles on the tube wall, and the supernatant was discarded with the tube still on the stand (step 2). Next, 900 μl of washing solution was added, and the particles were resuspended by stirring for about 5 seconds with a mixer (step 3). The microtube was again placed on the separation stand for 30 seconds to collect the magnetic particles on the tube wall, and the supernatant was discarded with the tube still on the stand (step 4). Steps 3 and 4 were repeated for a second wash (step 5), after which 900 μl of 70% ethanol was added and the particles were resuspended by stirring for about 5 seconds with a mixer (step 6). The microtube was then placed on the separation stand for 30 seconds to collect the magnetic particles on the tube wall, and the supernatant was discarded with the tube still on the stand (step 7). Steps 6 and 7 were repeated for a second 70% ethanol wash (step 8). Then 100 μl of pure water was added to the collected magnetic particles, and the mixture was stirred for 10 minutes with a tube mixer.

Next, the microtube was placed on the separation stand (Magical Trapper) and left for 30 seconds to collect the magnetic particles on the tube wall, and the supernatant was transferred to a new tube with the tube still on the stand.

(Quality check of the recovered genomic DNA)
The recovered genomic DNA of the microorganism (Enterobacter cloacae strain) was examined by agarose gel electrophoresis and by absorbance measurement at 260/280 nm according to conventional methods, to assess its quality (contamination with low-molecular-weight nucleic acids and degree of degradation) and the amount recovered. In this example, about 10 μg of genomic DNA was recovered, and neither degradation of the genomic DNA nor rRNA contamination was observed. The recovered genomic DNA was dissolved in TE buffer to a final concentration of 50 ng/μl and used in the following steps.
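The quantity checks in this step reduce to simple arithmetic. The sketch below assumes the widely used conversion that an A260 of 1.0 (1 cm path) corresponds to about 50 μg/ml (50 ng/μl) of double-stranded DNA, and that an A260/A280 ratio near 1.8 is conventionally read as relatively protein-free DNA. It also shows that about 10 μg of DNA at the stated 50 ng/μl corresponds to a final volume of about 200 μl. The absorbance readings in the usage are hypothetical.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional: A260 = 1.0 (1 cm) ~ 50 ng/ul dsDNA


def dsdna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration from A260, correcting for any dilution before reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Purity ratio; values near 1.8 are conventionally taken as relatively pure DNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def final_volume_ul(total_dna_ng: float, target_ng_per_ul: float = 50.0) -> float:
    """Volume in which a given DNA mass gives the target concentration."""
    return total_dna_ng / target_ng_per_ul


if __name__ == "__main__":
    # Hypothetical readings taken on a 1:10 dilution of the eluate.
    conc = dsdna_concentration_ng_per_ul(0.2, dilution_factor=10)
    print(f"{conc:.0f} ng/ul, A260/A280 = {a260_a280_ratio(0.2, 0.11):.2f}")
    print(f"10 ug at 50 ng/ul -> {final_volume_ul(10_000):.0f} ul")
```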

<Preparation of DNA microarray>
[1] Cleaning of the glass substrate
A synthetic quartz glass substrate (25 mm × 75 mm × 1 mm, Iiyama Special Glass Co., Ltd.) was placed in a heat- and alkali-resistant rack and immersed in an ultrasonic cleaning solution prepared at a predetermined concentration. After soaking in the cleaning solution overnight, the substrate was cleaned ultrasonically for 20 minutes. The substrate was then taken out, rinsed lightly with pure water, and cleaned ultrasonically in ultrapure water for 20 minutes. Next, the substrate was immersed for 10 minutes in 1 N aqueous sodium hydroxide heated to 80 °C. After another rinse with pure water and ultrapure water, a quartz glass substrate for the DNA chip was obtained.

[2] Surface treatment
The silane coupling agent KBM-603 (Shin-Etsu Silicone) was dissolved in pure water to a concentration of 1% and stirred at room temperature for 2 hours. The cleaned glass substrate was then immersed in the aqueous silane coupling agent solution and left at room temperature for 20 minutes. The substrate was pulled out, its surface was rinsed lightly with pure water, and it was dried by blowing nitrogen gas onto both sides. The dried substrate was then baked for 1 hour in an oven heated to 120 °C to complete the coupling-agent treatment, introducing amino groups onto the substrate surface. Next, an EMCS solution was prepared by dissolving N-(6-maleimidocaproyloxy)succinimide (Dojindo Laboratories; hereinafter abbreviated as EMCS) in a 1:1 mixture of dimethyl sulfoxide and ethanol to a final concentration of 0.3 mg/ml. The baked glass substrate was allowed to cool and immersed in the EMCS solution at room temperature for 2 hours. In this treatment, the amino groups introduced onto the surface by the silane coupling agent reacted with the succinimide groups of EMCS, introducing maleimide groups onto the glass substrate surface. The glass substrate pulled out of the EMCS solution was washed with the same mixed solvent used to dissolve the EMCS, washed further with ethanol, and dried under a nitrogen atmosphere.

[3] Probe DNA
Each of the microorganism-detection probes prepared above was dissolved in pure water, dispensed in an amount giving a final concentration of 10 μM when redissolved in the ink, and then freeze-dried to remove the water.

[4] DNA ejection with a Bubble Jet printer and binding to the substrate
An aqueous solution containing 7.5 wt% glycerin, 7.5 wt% thiodiglycol, 7.5 wt% urea, and 1.0 wt% Acetylenol EH (Kawaken Fine Chemicals Co., Ltd.) was prepared. The seven types of probes prepared above (Table 1) were then dissolved in this mixed solvent at the specified concentration. The resulting DNA solution was filled into an ink tank for a Bubble Jet printer (trade name BJF-850, Canon Inc.), and the tank was mounted on the print head.

The Bubble Jet printer used here was modified so that it could print on flat plates. By entering a print pattern according to a predetermined file-creation method, this printer can spot about 5 picoliters of DNA solution at a pitch of about 120 micrometers. Using this modified printer, a printing operation was performed on one glass substrate to produce the array. After confirming that printing had been performed reliably, the substrate was left in a humidified chamber for 30 minutes so that the maleimide groups on the glass surface reacted with the thiol groups at the ends of the nucleic acid probes.
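The spotting geometry can be checked with a little arithmetic. The sketch below assumes, hypothetically, that each probe set (A to J, including I) occupies one row and each probe of the set one column, loosely mirroring the letter-labeled rows of FIG. 6, with spot centers on the stated pitch of about 120 μm. The layout, the origin at (0, 0), and the function names are illustrative assumptions; the printer's actual print-pattern file format is not described in the source.

```python
from typing import Dict, Tuple

PITCH_UM = 120.0  # approximate spot pitch stated for the modified printer

# Number of probes per set, counted from the probe lists in Example 1 (72 in total).
PROBE_SET_SIZES: Dict[str, int] = {
    "A": 9, "B": 7, "C": 7, "D": 6, "E": 8,
    "F": 6, "G": 7, "H": 8, "I": 7, "J": 7,
}


def spot_layout(sizes: Dict[str, int],
                pitch_um: float = PITCH_UM) -> Dict[str, Tuple[float, float]]:
    """Map probe IDs such as 'A-1' to (x, y) spot centers in micrometres, one row per set."""
    layout = {}
    for row, set_name in enumerate(sorted(sizes)):
        for col in range(sizes[set_name]):
            layout[f"{set_name}-{col + 1}"] = (col * pitch_um, row * pitch_um)
    return layout


def footprint_um(layout: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Width and height spanned by the spot centers."""
    xs = [x for x, _ in layout.values()]
    ys = [y for _, y in layout.values()]
    return max(xs) - min(xs), max(ys) - min(ys)


if __name__ == "__main__":
    layout = spot_layout(PROBE_SET_SIZES)
    print(len(layout), "spots; footprint (um):", footprint_um(layout))
```

Under these assumptions the 72 spot centers span roughly 1 mm × 1.1 mm, a small fraction of the 25 mm × 75 mm substrate.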

[5] Washing
After the reaction for 30 minutes, the DNA solution remaining on the surface was washed away with 10 mM phosphate buffer (pH 7.0) containing 100 mM NaCl to obtain a DNA microarray in which single-stranded DNA was immobilized on the surface of the glass substrate.

<Amplification and labeling of specimen (PCR amplification & incorporation of fluorescent label)>
The reaction mixture for amplifying and labeling the microbial DNA of the specimen was as follows.
Premix PCR reagent (TAKARA ExTaq): 25 μl
Template genomic DNA: 2 μl (100 ng)
Forward primer mix: 2 μl (20 pmol/tube each)
Reverse primer mix: 2 μl (20 pmol/tube each)
Cy3-dUTP (1 mM): 2 μl (2 nmol/tube)
H₂O: 17 μl
(Total: 50 μl)
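The reaction composition above can be checked for consistency: the volumes sum to 50 μl, 2 μl of each primer mix at the 10 pmol/μl stock concentration gives 20 pmol per primer, and 2 μl of 1 mM Cy3-dUTP gives 2 nmol. The sketch below converts these amounts to final concentrations using the identity 1 pmol/μl = 1 μM; the dictionary keys are only labels.

```python
from typing import Dict

RECIPE_UL: Dict[str, float] = {
    "Premix PCR reagent (TAKARA ExTaq)": 25,
    "Template genomic DNA": 2,
    "Forward primer mix": 2,
    "Reverse primer mix": 2,
    "Cy3-dUTP (1 mM)": 2,
    "H2O": 17,
}


def total_volume_ul(recipe: Dict[str, float]) -> float:
    """Sum of all component volumes."""
    return sum(recipe.values())


def amount_pmol(stock_pmol_per_ul: float, volume_ul: float) -> float:
    """Amount delivered from a stock (note 1 mM = 1000 pmol/ul)."""
    return stock_pmol_per_ul * volume_ul


def final_concentration_um(pmol: float, reaction_volume_ul: float) -> float:
    """Final concentration; pmol per ul equals micromolar."""
    return pmol / reaction_volume_ul


if __name__ == "__main__":
    vol = total_volume_ul(RECIPE_UL)
    primer = amount_pmol(10, 2)     # 10 pmol/ul primer stock
    dutp = amount_pmol(1000, 2)     # 1 mM Cy3-dUTP stock
    print(f"total {vol} ul")
    print(f"each primer {final_concentration_um(primer, vol)} uM")
    print(f"Cy3-dUTP {final_concentration_um(dutp, vol)} uM")
    print(f"template {100 / vol} ng/ul")
```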
The reaction solution having the above composition was subjected to an amplification reaction using a commercially available thermal cycler according to the following protocol.
(Step 1) 95 °C, 10 min
(Step 2) 92 °C, 45 sec
(Step 3) 55 °C, 45 sec
(Step 4) 72 °C, 45 sec
(Step 5) 72 °C, 10 min
(Steps 2 to 4 were repeated 35 times.)
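The total hold time of this program follows directly from the step list: two 10-minute holds plus 35 cycles of three 45-second steps, or 5925 s (about 99 minutes). The sketch below encodes the program as data; it ignores ramp times between temperatures, which depend on the thermal cycler, so an actual run takes longer.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str
    temp_c: float
    hold_s: int
    repeats: int


PROGRAM: List[Step] = [
    Step("initial denaturation", 95, 600, 1),
    Step("denaturation", 92, 45, 35),
    Step("annealing", 55, 45, 35),
    Step("extension", 72, 45, 35),
    Step("final extension", 72, 600, 1),
]


def total_hold_time_s(program: List[Step]) -> int:
    """Sum of hold times, counting each cycled step once per cycle (ramps ignored)."""
    return sum(step.hold_s * step.repeats for step in program)


if __name__ == "__main__":
    seconds = total_hold_time_s(PROGRAM)
    print(f"{seconds} s = {seconds / 60:.2f} min of holds")
```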
After the reaction, the primers were removed with a purification column (QIAGEN QIAquick PCR Purification Kit), and the amplification product was quantified to give the labeled specimen.

<Hybridization>
A detection reaction was performed using the DNA microarray prepared in <Preparation of DNA microarray> and the labeled specimen prepared in <Amplification and labeling of specimen (PCR amplification & incorporation of fluorescent label)>.

(Blocking of the DNA microarray)
BSA (bovine serum albumin, Fraction V; Sigma) was dissolved at 1 wt% in 100 mM NaCl / 10 mM phosphate buffer. The DNA microarray prepared in <Preparation of DNA microarray> was immersed in this solution at room temperature for 2 hours for blocking. After blocking, the array was washed with 2× SSC solution (300 mM NaCl, 30 mM sodium citrate (trisodium citrate dihydrate, C₆H₅Na₃O₇·2H₂O), pH 7.0) containing 0.1 wt% SDS (sodium dodecyl sulfate). It was then rinsed with pure water and dried with a spin dryer.

(Hybridization)
The drained DNA microarray was set in a hybridization apparatus (Genomic Solutions Inc. Hybridization Station), and a hybridization reaction was performed with the following hybridization solution and conditions.

<Hybridization solution>
6× SSPE / 10% formamide / Target (entire 2nd PCR product)
(6× SSPE: 900 mM NaCl, 60 mM NaH₂PO₄·H₂O, 6 mM EDTA, pH 7.4)
<Hybridization conditions>
65 °C, 3 min → 92 °C, 2 min → 45 °C, 3 hr → Wash in 2× SSC / 0.1% SDS at 25 °C → Wash in 2× SSC at 20 °C → Rinse with H₂O (manual) → Spin dry
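The buffer compositions used in this example follow from the standard 1× definitions: 1× SSC is 150 mM NaCl and 15 mM sodium citrate, and 1× SSPE is 150 mM NaCl, 10 mM sodium phosphate, and 1 mM EDTA, so 2× SSC and 6× SSPE give the concentrations listed above. The sketch below scales those definitions and applies C1V1 = C2V2 to find how much of a concentrated stock to use; the 20× stock and the 100 μl final volume in the usage are assumptions.

```python
from typing import Dict

# Component concentrations (mM) of the 1x buffers; pH is not scaled.
ONE_X_MM: Dict[str, Dict[str, float]] = {
    "SSC": {"NaCl": 150.0, "sodium citrate": 15.0},
    "SSPE": {"NaCl": 150.0, "NaH2PO4": 10.0, "EDTA": 1.0},
}


def concentrations_mm(buffer: str, strength_x: float) -> Dict[str, float]:
    """Component concentrations of an n-x buffer."""
    return {name: mm * strength_x for name, mm in ONE_X_MM[buffer].items()}


def stock_volume_ul(final_x: float, stock_x: float, final_volume_ul: float) -> float:
    """C1 V1 = C2 V2: volume of stock needed for the final strength and volume."""
    if stock_x < final_x:
        raise ValueError("stock must be at least as concentrated as the target")
    return final_x * final_volume_ul / stock_x


if __name__ == "__main__":
    print("6x SSPE:", concentrations_mm("SSPE", 6))
    print("2x SSC:", concentrations_mm("SSC", 2))
    print("20x SSPE needed for 100 ul of 6x:", stock_volume_ul(6, 20, 100), "ul")
```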
<Detection of microorganisms (fluorescence measurement)>
After completion of the hybridization reaction, the DNA microarray was subjected to fluorescence measurement using a fluorescence detector for DNA microarray (Axon, GenePix 4000B).

FIG. 6 shows examples of the resulting images (image data). In FIG. 6, probes with higher fluorescence intensity are shown in darker shades. Image 601 was obtained by reacting the DNA microarray with a sample containing the Staphylococcus aureus genome, and image 602 by reacting it with a sample containing the E. coli genome. The letters on the left of the figure identify the probe sets; the probes of sets A to J are designed to bind specifically to the following bacteria, respectively.
(A) Staphylococcus aureus.
(B) Staphylococcus epidermidis.
(C) E. coli.
(D) Klebsiella pneumoniae.
(E) Pseudomonas aeruginosa.
(F) Serratia bacteria.
(G) Streptococcus pneumoniae.
(H) Haemophilus influenzae.
(I) Enterobacter cloacae.
(J) Enterococcus faecalis.

Ideally, only the probes in row A of 601, and only the probes in row C of 602, would show high fluorescence intensity. This ideal result for 601 is the same as the example experimental result shown in FIG. 5.

However, as FIG. 6 shows, the actual results are not ideal. So-called cross-hybridization occurs: in 601, probes in rows other than A also show high fluorescence intensity, and in 602, probes in rows other than C also show high fluorescence intensity. Furthermore, in 602 some probes in row C itself show weak fluorescence intensity.

FIG. 7 illustrates this situation with a three-probe system. Using a DNA microarray carrying three types of probes, for S. aureus, S. epidermidis, and E. coli, six known samples of each bacterium were tested. In general, when there are N probes, the experimental data form an N-dimensional vector. In FIG. 6 there are 72 probes in total, so the experimental data are 72-dimensional vectors; in FIG. 7 there are three probes, so the experimental data are three-dimensional vectors.

In the lower part of FIG. 7, six samples of each of the three bacteria (18 data points in total) are plotted in three-dimensional coordinates. When the three probes are, ideally, highly specific to the three respective bacteria, the vector data concentrate around the respective axes, as shown in the lower part of FIG. 7. However, the data fluctuate and do not converge on a single point. In the example of FIG. 7, the extent of the region occupied by the data differs among the three bacteria; in descending order it is E. coli, S. epidermidis, and S. aureus.
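The observation that each bacterium occupies a differently sized region can be made concrete by measuring, for each category, how far its samples lie from their own centroid. The sketch below is illustrative only: the three-probe intensity values are invented stand-ins, not the data of FIG. 7, and the spread measure (largest distance to the category centroid) is an assumption chosen for simplicity.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def spread(vectors):
    """Largest Euclidean distance from the category centroid to any of its samples."""
    c = centroid(vectors)
    return max(math.dist(v, c) for v in vectors)


# Hypothetical intensities for three probes (S. aureus, S. epidermidis, E. coli);
# these are invented values, not the FIG. 7 data.
samples = {
    "S. aureus": [[0.95, 0.05, 0.02], [0.97, 0.03, 0.04], [0.96, 0.06, 0.03]],
    "S. epidermidis": [[0.10, 0.90, 0.05], [0.02, 0.97, 0.08], [0.15, 0.85, 0.02]],
    "E. coli": [[0.05, 0.10, 0.90], [0.20, 0.05, 0.75], [0.02, 0.25, 0.95]],
}
spreads = {name: round(spread(v), 3) for name, v in samples.items()}
print(spreads)
```

With spreads this different, a single global threshold would tend to be too strict for a widely scattered category and too loose for a tight one, which is consistent with setting one threshold per category.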

Further, for each bacterium, a determination index set is derived according to the method of FIGS. 1 and 8 described above and an indeterminable threshold is set; the threshold so set is then used to decide whether or not to determine the bacterium from which an unknown sample originates.
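The leave-one-out procedure referred to above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of FIGS. 1 and 8 itself: the identification dictionary is taken to be simply the remaining known vectors, the determination index is assumed to be the distance to the nearest dictionary entry, and the threshold is assumed to be the largest index, optionally multiplied by a margin. All function names are hypothetical.

```python
import math


def nn_distance(x, dictionary):
    """Distance from x to its nearest dictionary entry (assumed determination index)."""
    return min(math.dist(x, d) for d in dictionary)


def leave_one_out_indices(known):
    """Exclude each known sample in turn, use the rest as the dictionary, score the excluded one."""
    if len(known) < 3:
        raise ValueError("three or more known samples are required")
    return [nn_distance(x, known[:i] + known[i + 1:]) for i, x in enumerate(known)]


def indeterminable_threshold(known, margin=1.0):
    """Assumed rule: largest leave-one-out index, optionally scaled by a safety margin."""
    return margin * max(leave_one_out_indices(known))


def can_determine(unknown, known, threshold):
    """True when the unknown sample is close enough to this species' dictionary to be judged."""
    return nn_distance(unknown, known) <= threshold


# Hypothetical 2-D vectors for four known samples of one species.
known = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
t = indeterminable_threshold(known)
print(can_determine([0.05, 0.05], known, t))  # True
print(can_determine([1.0, 1.0], known, t))    # False
```

Under this rule, an unknown sample that lies farther from every known sample of the species than any known sample lies from its own nearest neighbour is reported as indeterminable instead of being forced into that species.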

Example 2
The following shows DNA microarray experimental data for Klebsiella pneumoniae and Serratia. Probes for detecting each of the following bacteria were used:
S. aureus (A-n).
S. epidermidis (B-n).
E. coli (C-n).
K. pneumoniae (D-n).
P. aeruginosa (E-n).
S. marcescens (F-n).
S. pneumoniae (G-n).
H. influenzae (H-n).
Enterobacter cloacae (I-n).
E. faecalis (J-n).

Here, n in the parentheses takes the values n = 1 to 6 described above, so the total number of probes is 10 × 6 = 60.
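The probe numbering can be reproduced with a short loop. The resulting order (A-1, A-2, ..., J-6) is the left-to-right order in which the probes are plotted in the experimental data figures, so the position of a label in this list is also its index in the 60-dimensional data vector. The constant names are illustrative.

```python
SPECIES_ROWS = "ABCDEFGHIJ"  # one row per target bacterium, (A) S. aureus to (J) E. faecalis
PROBES_PER_SPECIES = 6       # n = 1 to 6

# Row-major order: A-1, A-2, ..., A-6, B-1, ..., J-6
probe_labels = [f"{row}-{n}" for row in SPECIES_ROWS for n in range(1, PROBES_PER_SPECIES + 1)]
probe_index = {label: i for i, label in enumerate(probe_labels)}

print(len(probe_labels))   # 60
print(probe_index["D-1"])  # 18
```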

First, DNA microarray experimental data for 10 different Klebsiella pneumoniae samples are shown in FIGS. 11 to 20. In each figure, the probes are arranged from left to right in the order A-1, A-2, ..., J-5, J-6. As shown, the experimental data of the DNA microarray are obtained as 60 fluorescence intensity values, that is, as a 60-dimensional vector. To define a distance between arbitrary vectors, each vector is normalized by dividing each of its elements by the norm of the vector:

$$y_i = \frac{x_i}{\lVert \mathbf{x} \rVert} \quad (i = 1, \dots, n)$$

where $\mathbf{x}$ is the original vector and $\mathbf{y}$ is the normalized vector. A vector normalized in this way always has a norm of 1. Here, the norm (Euclidean norm) of the n-dimensional vector $\mathbf{x}$ is defined as

$$\lVert \mathbf{x} \rVert = \sqrt{\sum_{i=1}^{n} x_i^{2}}$$
Then, the distance between two normalized vectors, vector a and vector b, is defined by the following equation.

In this embodiment, the distance in the k-th nearest neighbor matching algorithm is defined as above. FIG. 21 shows a histogram of the distances computed for every pair of the 10 samples; the number of data points is ₁₀C₂ = 45. From this figure, if Klebsiella pneumoniae is to be determined by applying the above k-th nearest neighbor algorithm, one candidate for its indeterminable threshold is the maximum distance, 0.057. To allow some margin, 1.5 or 2 times this value may be used instead.
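The threshold-setting step for a single species can be sketched as follows: normalize every sample by its Euclidean norm, compute the distance for every unordered pair, optionally bin the distances into a histogram, and take the maximum, scaled by an optional margin such as 1.5 or 2. The sketch assumes the ordinary Euclidean distance between the normalized vectors as the distance definition; that metric, the bin width, and the sample values are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def normalize(x):
    """Divide each element by the Euclidean norm sqrt(sum x_i^2), giving a vector of norm 1."""
    n = math.sqrt(sum(v * v for v in x))
    if n == 0:
        raise ValueError("an all-zero intensity vector cannot be normalized")
    return [v / n for v in x]


def distance(a, b):
    """Assumed metric: Euclidean distance between the two normalized vectors."""
    return math.dist(normalize(a), normalize(b))


def distance_set(samples):
    """Distances for every unordered pair of samples; 10 samples give 10C2 = 45 values."""
    unit = [normalize(s) for s in samples]  # normalize each sample once
    return [math.dist(a, b) for a, b in combinations(unit, 2)]


def histogram(values, bin_width):
    """Count values per bin [k*w, (k+1)*w), keyed by the bin's lower edge."""
    counts = Counter(int(v // bin_width) for v in values)
    return {round(k * bin_width, 10): counts[k] for k in sorted(counts)}


def indeterminable_threshold(distances, margin=1.0):
    """Candidate threshold: the largest within-species distance, optionally enlarged."""
    return margin * max(distances)


# Hypothetical 60-dimensional intensity vectors for 10 samples of one species.
samples = [[1.0 + 0.01 * i * (j % 3) for j in range(60)] for i in range(10)]
d = distance_set(samples)
print(len(d))                              # 45
print(distance([3.0, 4.0], [6.0, 8.0]))    # 0.0
print(histogram(d, bin_width=0.005))
print(indeterminable_threshold(d, margin=1.5))
```

Because every vector is scaled to norm 1 first, two samples whose intensity profiles differ only by an overall scale factor are at distance 0, so the threshold reflects differences in the shape of the probe pattern rather than in overall brightness.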

Next, experimental data for 10 Serratia samples are shown in FIGS. 22 to 31. FIG. 32 shows a histogram of the distances between every pair of these 10 samples, computed with the same normalization and distance used for Klebsiella pneumoniae. The shape of this distribution is entirely different from that of the Klebsiella pneumoniae histogram. The presence of two large peaks suggests that the 10 vectors form two clusters. Indeed, the fluorescence intensity graphs of the 10 samples shown above fall broadly into two types of pattern. From this figure, if Serratia is to be determined by applying the above k-th nearest neighbor algorithm, one candidate for its indeterminable (rejection) threshold is 0.090, the maximum of the first peak.
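For a two-peaked distribution such as the Serratia one, one simple way to pick "the maximum of the first peak" automatically is to sort the pairwise distances and cut at the widest gap between consecutive values. This gap rule and its `min_gap` parameter are hypothetical and are not taken from the description; with only ten samples, any gap-based split should be checked against the histogram by eye.

```python
def first_peak_max(distances, min_gap):
    """Largest distance below the widest gap in the sorted values.

    If no gap between consecutive sorted distances reaches min_gap, the
    distribution is treated as a single peak and the overall maximum is returned.
    """
    d = sorted(distances)
    if len(d) < 2:
        return d[0]
    gaps = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    widest = max(gaps)
    if widest < min_gap:
        return d[-1]  # single peak: fall back to the overall maximum
    return d[gaps.index(widest)]  # last value before the widest gap


# Hypothetical distance sets (not the values behind FIG. 21 or FIG. 32).
two_peaks = [0.02, 0.03, 0.05, 0.09, 0.30, 0.35, 0.40]
one_peak = [0.01, 0.02, 0.03, 0.05]
print(first_peak_max(two_peaks, min_gap=0.1))  # 0.09
print(first_peak_max(one_peak, min_gap=0.1))   # 0.05
```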

FIG. 1 is a diagram showing an example of the biological species determination method of the present invention.
FIG. 2 is a block diagram showing the configuration of an information processing apparatus for executing the biological species determination method of the present invention.
FIG. 3 is a diagram explaining the hybridization reaction.
FIG. 4 is a diagram showing an experimental procedure using a DNA microarray.
FIG. 5 is a diagram showing the experimental procedure of a DNA microarray for determining infectious diseases.
FIG. 6 is a diagram showing an example of an image consisting of fluorescence intensities after the hybridization reaction.
FIG. 7 is a diagram showing an example distribution of vector data.
FIG. 8 is a diagram explaining the indeterminable value setting step.
FIG. 9 is a diagram showing an example distribution of a determination index set.
FIG. 10 is a diagram showing an example of the distance set for arbitrary pairs of samples within the same category.
FIGS. 11 to 20 are diagrams showing DNA microarray experimental data for Klebsiella pneumoniae samples.
FIG. 21 is a histogram of the distances between every pair of the 10 Klebsiella pneumoniae samples.
FIGS. 22 to 31 are diagrams showing DNA microarray experimental data for Serratia samples.
FIG. 32 is a histogram of the distances between every pair of the 10 Serratia samples.

Claims

A biological species determination method for analyzing a substance assumed to contain a substance derived from an organism and determining the species of the corresponding organism, the method comprising:
obtaining a plurality of analysis data by analyzing, with a species analysis method, a plurality of known samples whose corresponding species are known;
setting an indeterminable threshold for the species corresponding to the known samples based on the plurality of analysis data obtained from the known samples;
obtaining analysis data for identifying the species corresponding to an unknown sample, whose corresponding species is unknown, by analyzing the unknown sample with the same species analysis method;
deciding, based on the indeterminable threshold, whether to determine the species corresponding to the unknown sample or to treat it as indeterminable; and
when it is decided to make a determination, determining the species of the unknown sample based on the plurality of analysis data.

The biological species determination method according to claim 1, wherein the indeterminable threshold is set by excluding any one item of analysis data from the plurality of analysis data, stored in a storage means, that were obtained from the known samples of one species; creating an identification dictionary by learning from the remaining analysis data; determining the excluded analysis data using the identification dictionary to derive a determination index; and setting the threshold based on the determination indices so derived.

A biological species determination method for analyzing, with a biological species analysis method, a substance assumed to contain a substance derived from an organism and determining the corresponding biological species, the method comprising:
(1) selecting a biological species assumed as the determination result for an unknown sample;
(2) obtaining, from known samples taken from a plurality of individuals known to belong to the selected organism, an image data group consisting of a plurality of image data that are characteristic of the species and usable for pattern recognition;
(3) selecting image data from the image data group and setting an indeterminable threshold using its relationship with the remaining image data;
(4) obtaining image data from the unknown sample;
(5) deciding, based on the indeterminable threshold, whether the species corresponding to the unknown sample can be determined from its image data; and
(6) when it is decided in (5) to perform the determination, determining the biological species using an identification dictionary that includes the image data group.

The biological species determination method according to claim 3, wherein the indeterminable threshold is set by a method comprising:
(1) selecting three or more different individuals as the individuals and obtaining an image data group consisting of the three or more image data obtained;
(2) selecting and excluding one image data from the image data group, creating a dictionary from the remaining image data, and determining the excluded image data using the obtained dictionary to obtain a determination index, this process being performed for each image data to obtain a determination index set consisting of m determination indices; and
(3) determining the indeterminable threshold from the determination index set.

The biological species determination method according to claim 3, wherein the indeterminable threshold is set by a method comprising:
(1) selecting three or more different individuals as the individuals and obtaining an image data group consisting of the three or more image data obtained;
(2) obtaining a distance set by computing the distance between the two image data for every combination of two image data selected from the image data group; and
(3) determining the indeterminable threshold from the distance set.

The biological species determination method according to claim 3, wherein the image data are obtained by reacting a nucleic acid sample, serving as the known or unknown sample, with a probe-immobilized carrier in which probes capable of specifically binding to a target nucleic acid having a base sequence characteristic of the selected organism are positioned and immobilized on a substrate, and optically detecting the target nucleic acid/probe conjugates formed on the substrate.

The determination method according to claim 6, wherein the conjugates are optically detected using fluorescence from a fluorescent label attached to the conjugates.

An information processing apparatus for biological species determination, comprising: a memory storing a plurality of analysis data obtained by analyzing, with a biological analysis method, a plurality of known samples whose corresponding species are known, and an indeterminable threshold set based on the plurality of analysis data; and a processing unit that decides, based on the indeterminable threshold stored in the memory, whether the biological species corresponding to an unknown sample can be determined and, when it can, determines the biological species corresponding to the unknown sample based on the plurality of analysis data.

An information processing apparatus for biological species determination, which analyzes a substance assumed to contain a substance derived from an organism and determines the corresponding biological species, the apparatus comprising:
known sample image data input means for inputting image data characteristic of the biological species, obtained by analyzing a plurality of known samples whose corresponding species are known;
unknown sample image data input means for inputting image data obtained by analyzing an unknown sample in the same manner as the known samples;
storage means for storing the input image data;
means for setting an indeterminable threshold for the species corresponding to the known samples based on the plurality of analysis data obtained from the known samples;
biological species determining means for deciding, based on the indeterminable threshold, whether to perform a determination on the image data from the unknown sample, and for determining the biological species corresponding to the unknown sample;
storage means for storing the determination result of the determining means; and
output means for outputting the determination result stored in the storage means.

The information processing apparatus according to claim 9, wherein the number of individuals is three or more, image data from these individuals are stored in the storage means, and the indeterminable threshold is set by executing a program comprising:
(A) selecting and excluding one image data from the three or more image data, creating an identification dictionary from the remaining image data, and determining the excluded image data using the obtained identification dictionary to obtain a determination index, this process being performed for each image data to obtain a determination index set consisting of three or more determination indices; and
(B) setting the indeterminable threshold from the determination index set.

The information processing apparatus according to claim 9, wherein the number of individuals is three or more, image data from these individuals are stored in the storage means, and the indeterminable threshold is set by executing a program comprising:
(A) obtaining a distance set by computing the distance between the two image data for every combination of two image data selected from the three or more image data; and
(B) determining the indeterminable threshold from the distance set.

A biological species determination program for causing a computer to determine the species corresponding to an unknown sample, the program comprising:
(1) reading a plurality of known sample image data from storage means in which they are stored, the known sample image data being image data characteristic of a species assumed as the determination result for the unknown sample and obtained by analyzing known samples from a plurality of different individuals belonging to that species;
(2) reading unknown sample image data from storage means storing image data obtained by analyzing the unknown sample in the same manner as the known samples;
(3) selecting one of the known sample image data and setting an indeterminable threshold using the relationship between the selected image data and the remaining image data;
(4) processing the unknown sample image data based on the indeterminable threshold and determining the species corresponding to the unknown sample;
(5) storing the determination result obtained in the determining step in storage means; and
(6) outputting the determination result stored in the storage means.

The biological species determination program according to claim 10, wherein the number of individuals is three or more, image data from these individuals are stored in the storage means, and the indeterminable threshold is set by:
(A) selecting and excluding one image data from the three or more image data, creating an identification dictionary from the remaining image data, and determining the excluded image data using the obtained identification dictionary to obtain a determination index, this process being performed for each image data to obtain a determination index set consisting of three or more determination indices; and
(B) determining the indeterminable threshold from the determination index set.

The biological species determination program according to claim 11, wherein the number of individuals is three or more, image data from these individuals are stored in the storage means, and the indeterminable threshold is set by:
(A) obtaining a distance set by computing the distance between the two image data for every combination of two image data selected from the three or more image data; and
(B) determining the indeterminable threshold from the distance set.

A computer-readable recording medium on which a program for causing a computer to execute species determination is recorded, wherein the program is the program according to claim 11.

A biological species determination method comprising: using a plurality of analysis data obtained by analyzing, with a biological analysis method, a plurality of known samples whose corresponding species are known, and an indeterminable threshold set based on the plurality of analysis data, deciding based on the indeterminable threshold whether the species corresponding to an unknown sample can be determined; and, when it is decided that the species can be determined, determining the species corresponding to the unknown sample based on the plurality of analysis data.