JP2006227942A

JP2006227942A - Extraction system of combination set of clinical test data, determination system of neoplasm progress using the same and clinical diagnosis support system

Info

Publication number: JP2006227942A
Application number: JP2005041389A
Authority: JP
Inventors: Rumiko Matsuoka; 瑠美子松岡; Tetsuo Tsukahara; 哲夫塚原; Yoshiyuki Furuya; 喜幸古谷; Michiko Furuya; 道子古谷; Norihiro Hagita; 紀博萩田; Akinori Abe; 明典阿部; Futoshi Naya; 太納谷
Original assignee: Individual
Current assignee: Individual
Priority date: 2005-02-17
Filing date: 2005-02-17
Publication date: 2006-08-31

Abstract

PROBLEM TO BE SOLVED: To provide a clinical diagnosis support system that estimates the degree of progression of tissue lesions, such as tumors, from clinical test data. SOLUTION: A computer mechanically extracts combination sets of clinical test data closely related to tumor progression, from tumor progression determination data (determined using pathological findings and the like) and clinical test data including tumor markers, by generating a set of decision tree rules. The extraction system has computational processes that (1) find clusters of sub-data sets that can be explained using test items common among all test items and create sub-decision trees, and (2) prune from the sub-decision trees of (1) those that include missing or unknown data, producing sub-decision trees as clusters free of missing or unknown data. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a system for extracting combination sets of clinical test data, a system for determining the degree of tumor progression using it, and a clinical diagnosis support system. More specifically, it relates to a new clinical diagnosis support system capable of estimating, from clinical test data, the degree of progression of tissue lesions such as tumors.

Regarding tumor progression, Kobayashi et al. (Non-Patent Document 1) proposed a method for empirically diagnosing the degree of tumor progression from the relationship between clinical test data and tumor size. That method targets 12 tumor markers; for the more than 70 clinical test items in actual use, no method for deriving the degree of tumor progression experimentally has yet been studied.

Under these circumstances, establishing a method for inferring the degree of tumor progression from actual clinical test data, and thereby supporting clinical diagnosis, would undoubtedly be a major contribution.

Moreover, being able to estimate the degree of progression of lesions in human tissue, not only tumors, from a large body of clinical test data would be a breakthrough for clinical medicine as a whole.

Machine learning inference is an effective way to make such estimation of lesion progression possible from clinical test data. Which machine learning method to apply, however, depends on the nature of the data. Some methods learn from a data set alone, while others require background knowledge (reference rules) in advance. A reference rule is a rule template given in a form such as "(A < a) ∧ (B > b) ... → S = z". No such reference rules have been established for inference from clinical test data, so their existence cannot be assumed; a learning method that acquires new rules about the relationships among clinical test data is needed. In addition, clinical test data mix continuous-valued and discrete-valued items, so the learning method must handle both at once. One machine learning system that meets these constraints is C4.5 (Non-Patent Document 2), which generates a set of rules called a decision tree.
Non-Patent Document 1: Tsuneo Kobayashi and Tomoko Kawakubo, Prospective Investigation of Tumor Markers and Risk Assessment in Early Cancer Screening, CANCER, Vol. 37, No. 7, 1994.
Non-Patent Document 2: Quinlan J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.

Against this background, an object of the present invention is to provide a new clinical diagnosis support system that can estimate the degree of progression of tissue lesions such as tumors from clinical test data.

To solve the above problem, the inventors studied the C4.5 decision tree learning method of Quinlan described above, with the following basic aims.

<A> A method for constructing a clinical diagnosis system that estimates the degree of progression of tissue lesions such as tumors from clinical test data. That is, using progression determination data for tumors and the like, determined from pathological findings and similar evidence, together with the clinical test data (including tumor markers) for each case, combinations of clinical test data closely related to tumor progression are extracted efficiently.

<B> The combinations of clinical test data extracted in <A> are applied as conditions to the clinical test data of a patient whose degree of tumor progression is unknown, so that the patient's degree of tumor progression is determined efficiently.

<C> The relationships between the combinations of clinical test data extracted in <A> and the degree of tumor progression are incorporated into a database as knowledge.

More specifically, the invention of this application has the following features.

First: A system for extracting combination sets of clinical test data, in which a computer mechanically extracts, by creating a set of decision tree rules, combination sets of clinical test data closely related to tumor progression from progression determination data for tumors and the like, determined using pathological findings and similar evidence, and clinical test data including tumor markers, the system having computational processes that
1) find clusters of sub-data sets that can be explained using test items common among all test items, and create sub-decision trees; and
2) prune from the sub-decision trees of 1) those that include missing or unknown data, creating sub-decision trees as clusters free of missing or unknown data.

Second: The first system, further having a computational process that
3) extracts and reviews the missing or unknown data not included in the sub-decision trees in 2), and creates sub-decision trees that incorporate all the data.

Third: The first or second system, in which the creation of sub-decision trees and the pruning of those that include missing or unknown data are performed through a verification process that uses a knowledge database on the strength of relationships between test items.

Fourth: A system for determining the degree of tumor progression, having a database of tree-structured decision tree clusters extracted by any of the first to third systems, in which the degree of tumor progression of a patient for whom it is unknown can be determined mechanically by computing how well combinations of that patient's clinical test data, selected by condition, match the decision tree clusters.

Fifth: The fourth system, in which the degree of tumor progression is predicted from the tree-structure path of the decision tree cluster, based on the match computed for the combination of selected conditions.

Sixth: A clinical diagnosis support system that includes any of the first to fifth systems as at least part of its configuration.

The present invention as described above is configured as a mechanical information system running on a computer, and has means such as a data input unit, a computation unit, and an output/display unit. For example, in the first, second, and third inventions, the computation unit creates the decision tree clusters, which have a tree structure of sub-decision trees, and performs the pruning related to missing or unknown data. In the fourth and fifth inventions, the computation unit performs the determination and prediction, and the output/display unit outputs and displays the result.

According to the present invention, the degree of progression of diseased tissue such as tumors can be estimated from clinical test data, which had never been achieved before, providing substantial support for clinical diagnosis and for the measures taken on the basis of it.

The present invention is described in detail below.

<A> Clinical diagnosis system for estimating the degree of progression of tissue lesions such as tumors from clinical test data
This clinical diagnosis system is the means for constructing the set of decision trees that forms the core of the diagnosis system. Unlike experimental data, a patient's clinical test data generally do not cover every required test item, and many test values are missing (see, for example, Table 1). The present invention proposes a method for finding the relationship between diagnosis results and clinical test data in such incomplete data.

(1) First, the clinical test data (D₀) from which the set of decision trees is built, structured for example as in Table 1, have the following properties.
・D₀ associates the clinicopathologically determined degree of tumor progression of each patient with the clinical test data examined at the time that determination was made.
・The degree of tumor progression is classified into five stages, 1 to 5, based on Kobayashi et al. (CANCER, 1994), as in Table 1 for example.
・The clinical test data in D₀ do not always include the test items that are important for determining the degree of tumor progression; some are missing because the test was not performed.
・Not every test item is meaningful for determining the degree of tumor progression; D₀ also contains much clinical test data only weakly related to it.
(2) For clinical test data (D₀) with these properties, it is difficult to create a decision tree that explains all the data by a single pass of learning with Quinlan's C4.5.

C4.5 decision tree learning by Quinlan has the following properties.

・C4.5 examines the input data (clinical test data plus degree of tumor progression), computes the conditions that lead to each decision value (here, the degree of tumor progression), and outputs a decision tree expressed as combinations of ranges of clinical test data.

・When missing data are not allowed, exactly one leaf (terminal node of the decision tree) corresponds to a given patient's combination of clinical test values. (A single leaf may correspond to several patients.)
・When missing data are allowed, several leaves corresponding to the same patient can appear in one decision tree. This is because a missing value is treated as a condition that may take any value (don't care: the value is not considered), so several conditions can hold side by side. The present invention does not allow missing data.
(3) Therefore, using test items common among all test items, several clusters of sub-data sets (D_k in FIG. 1) are obtained by learning, and a sub-decision tree DT_k (the k-th decision tree) is created for each.

FIG. 2 shows the clustering procedure for creating the sub-decision trees DT_k. D_k is the subset of clinical test data used to create the sub-decision tree DT_k at step k, and Ds_k is the patient data covered by the sub-decision tree DT_k obtained from D_k. Because D_k contains missing data, the decision tree DT_k obtained from D_k by process <1> of FIG. 2 contains unnecessary branches that cannot explain D_k. Process <2> of FIG. 2 removes these unnecessary branches and outputs the result as the k-th sub-decision tree DT_k. In this way, learning proceeds so that the patient data of the original data set foo.data are each covered by one of the sub-decision trees. Repeating this process builds a decision tree that can explain the patient data as a set of sub-decision trees {DT_k} (k = 1, ..., K).

In this iteration, when a patient's data lack values for several test items, the ambiguity means each of those items may take any value, even though the purpose of the decision tree is to decide by the values and ranges of those very items, and a sub-decision tree can no longer be generated. Ds_k is therefore obtained under the condition that, in the sub-decision tree DT_k produced by process <1> of FIG. 2, any patient data in D_k that supports DT_k appears at exactly one leaf of DT_k. This means the decision tree is generated under the condition that missing (don't care) test values are not allowed. It is possible in principle to build a decision tree while allowing don't care, but the resulting sub-decision tree would rest on the assumption that each test item may take any value and would therefore over-fit (infer too much) on unknown data. For this reason, the method of obtaining Ds_k while allowing don't care is not adopted.
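The strict (no don't care) condition can be illustrated with a minimal Python sketch. It assumes that each root-to-leaf path is a list of `(item, operator, threshold)` conditions and that a missing test value is stored as `None`; the item names `CEA` and `AFP` and the thresholds are hypothetical and do not come from the patent's tables.

```python
from typing import Dict, List, Optional, Tuple

# A condition is (test item, operator, threshold); a path is the AND of its conditions.
Condition = Tuple[str, str, float]
Record = Dict[str, Optional[float]]  # None marks a missing (don't care) value


def satisfies(path: List[Condition], record: Record) -> bool:
    """True only if every item on the path is present and meets its condition."""
    for item, op, threshold in path:
        value = record.get(item)
        if value is None:  # a missing value disqualifies the leaf in strict mode
            return False
        if op == "<=":
            ok = value <= threshold
        elif op == ">":
            ok = value > threshold
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


def supporting_leaves(paths: List[List[Condition]], record: Record) -> List[int]:
    """Indices of the leaves (paths) that the record supports without don't care."""
    return [i for i, path in enumerate(paths) if satisfies(path, record)]


# Hypothetical tree with three leaves.
paths = [
    [("CEA", "<=", 5.0), ("AFP", "<=", 10.0)],
    [("CEA", "<=", 5.0), ("AFP", ">", 10.0)],
    [("CEA", ">", 5.0)],
]
complete = {"CEA": 3.2, "AFP": 12.0}
partial = {"CEA": 3.2, "AFP": None}
print(supporting_leaves(paths, complete))  # [1]
print(supporting_leaves(paths, partial))   # []
```

A complete record lands on exactly one leaf, while the record with a missing `AFP` value supports no leaf and would stay out of Ds_k under this reading.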

The procedure for obtaining the sub-decision tree cluster is described in more detail below, following FIG. 2.
[1] C4.5 decision tree learning is performed with all the data D₀ as input to obtain DT₀.

Here, a decision tree for the entire input data is created.
[2] DT₀ is examined, and the clinical test data of the patients that satisfy the conditions down to each leaf are extracted as Ds₀.

That is, leaves that are not don't care are sought in the created decision tree, and the patient data supported by each leaf's conditions are identified. A patient's data qualify only if none of the clinical test items that appear in the conditions down to the leaf is missing.
[3] With Ds₀ as input, a sub-decision tree DT₀ that has the elements of Ds₀ at its leaves is obtained and stored.
[4] The patient data Ds₀ already captured by a sub-decision tree are removed from all the data D₀, and the remainder is evaluated again by C4.5 decision tree learning. That is, Ds₀ is removed from D₀ (D₀ − Ds₀) to give D₁.
[5] C4.5 decision tree learning is performed with D₁ as input (repeating [1], [2], [3], and [4]) to obtain a new sub-decision tree DT₁. This is repeated until D_k is empty or operation [2] yields no patient data that satisfy the conditions.

Removing, in every cycle, the patient data already evaluated as a sub-decision tree from the full data D₀ yields a cluster of sub-decision trees built from data without missing values. If all the remaining patient data are don't care, they are not forced through evaluation but are flagged as data requiring a check. Patient data flagged in this way can be re-evaluated after, for example, further clinical test data are added.
[6] The K sub-decision trees DT₀, DT₁, DT₂, ..., DT_K stored by the above series of operations form the decision tree cluster.
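Steps [1] to [6] can be sketched as a loop in Python. This is only an illustration under stated assumptions: `learn_stump` is a toy single-split learner standing in for C4.5 (it is not C4.5), and the record layout (`id`, `stage`, `tests`) and the item names `A` and `B` are hypothetical.

```python
from collections import Counter
from statistics import median


def satisfies(path, tests):
    """Strict match: every item on the path must be present (no don't care)."""
    for item, op, threshold in path:
        value = tests.get(item)
        if value is None:
            return False
        # op is "<=" or ">"; the condition fails when the two sides disagree.
        if (op == "<=") != (value <= threshold):
            return False
    return True


def learn_stump(records):
    """Toy stand-in for C4.5: one split on the most frequently observed item."""
    counts = Counter(k for r in records for k, v in r["tests"].items() if v is not None)
    if not counts:
        return []  # nothing observed: no tree can be built
    item = counts.most_common(1)[0][0]
    cut = median(r["tests"][item] for r in records if r["tests"].get(item) is not None)
    leaves = []
    for op in ("<=", ">"):
        path = [(item, op, cut)]
        stages = [r["stage"] for r in records if satisfies(path, r["tests"])]
        stage = Counter(stages).most_common(1)[0][0] if stages else None
        leaves.append((path, stage))
    return leaves


def build_cluster(records, learn=learn_stump):
    """Repeat: learn DT_k on D_k, keep strictly covered Ds_k, set D_{k+1} = D_k - Ds_k."""
    remaining = list(records)  # D_k
    cluster = []               # stored sub-decision trees
    while remaining:
        tree = learn(remaining)
        covered_ids, kept = set(), []
        for path, stage in tree:
            ids = {r["id"] for r in remaining if satisfies(path, r["tests"])}
            if ids:            # prune leaves that no patient supports
                kept.append((path, stage))
                covered_ids |= ids
        if not covered_ids:    # everything left is don't care
            break
        cluster.append(kept)
        remaining = [r for r in remaining if r["id"] not in covered_ids]
    return cluster, remaining  # remaining = data requiring a check


data = [
    {"id": 0, "stage": 2, "tests": {"A": 1.0, "B": None}},
    {"id": 1, "stage": 4, "tests": {"A": 5.0, "B": None}},
    {"id": 2, "stage": 3, "tests": {"A": None, "B": 2.0}},
    {"id": 3, "stage": 1, "tests": {"A": None, "B": None}},
]
cluster, needs_check = build_cluster(data)
print(len(cluster), [r["id"] for r in needs_check])  # 2 [3]
```

In this toy run the first tree covers patients 0 and 1, the second covers patient 2, and patient 3, who has no observed values, is left as data requiring a check. A real C4.5 implementation could be passed in through the `learn` parameter.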

Collecting the stored sub-decision trees gives a cluster of sub-decision trees built from data without missing values.

FIGS. 3 and 4 are conceptual diagrams showing the relationship between the full data and the sub-decision trees in the above procedure.

To efficiently extract, from all the data, the cases in which the relationship between clinical test data and tumor progression is clear, the branches containing don't-care leaves are pruned from the decision tree obtained by C4.5 decision tree learning. This extracts the subset in which the relationship between clinical test data and tumor progression is clear.

As a result, decision trees (DT₁ to DT_K) that contain no don't care are obtained from all the data. Data not included in DT₁ to DT₅ contain missing values, so no clear decision tree can be obtained for them (FIG. 5).
(4) The process of adding new clinical test data to data that were left out of the sub-decision tree clustering as undecidable, and then adding them to the decision tree cluster, is described below following FIG. 6.
[1] A decision tree cluster (DTC₀) is created from the original data (D₀), in which clinical test data and degree of tumor progression are associated for each patient.
[2] The data (DU₀) that were not included in the decision tree cluster because they became don't care are extracted from D₀.
[3] For the undecidable data (DU₀), the missing clinical test values are filled in, for example by re-examination.
[4] A decision tree cluster (DUC₀) is created from the reviewed DU₀.

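After the data are reviewed, data that were previously don't care become included in the decision tree.
[5] The new decision tree cluster (DUC₀) created from DU₀ is incorporated into the decision tree cluster (DTC₀) created from D₀ to form a new decision tree cluster (DTC₁).
[6] The undecidable data that became don't care in process [4] are extracted (DU₁), and process [3] onward is repeated to create decision tree clusters.
[7] The process ends when, even after the data are reviewed, all the data remain don't care, or when all the data have been incorporated into the decision tree cluster.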
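The review loop of steps [1] to [7] can be sketched in Python as follows. The sketch assumes two caller-supplied functions that are not defined in the patent: `build_cluster(records)`, returning a pair of (trees, undecidable records), and `review(records)`, representing re-examination that may fill in missing values. The toy versions below are placeholders used only to make the example runnable, and `max_rounds` is an added safety limit.

```python
def refine_cluster(d0, build_cluster, review, max_rounds=10):
    """Build DTC_0 from D_0, then fold in clusters built from the reviewed DU_k."""
    cluster, undecided = build_cluster(d0)                 # DTC_0 and DU_0
    for _ in range(max_rounds):
        if not undecided:                                  # all data incorporated
            break
        reviewed = review(undecided)                       # fill gaps where possible
        extra, still_undecided = build_cluster(reviewed)   # DUC_k
        undecided = still_undecided                        # DU_{k+1}
        if not extra:                                      # review did not help: stop
            break
        cluster = cluster + extra                          # DTC_{k+1}
    return cluster, undecided


def toy_build(records):
    """Hypothetical stand-in: one 'tree' per complete record, the rest undecidable."""
    done = [r for r in records if None not in r["tests"].values()]
    rest = [r for r in records if None in r["tests"].values()]
    return [("tree", r["id"]) for r in done], rest


def toy_review(records):
    """Hypothetical re-examination: supplies test 'A' for patient 1 only."""
    out = []
    for r in records:
        tests = dict(r["tests"])
        if r["id"] == 1:
            tests["A"] = 4.2
        out.append({"id": r["id"], "tests": tests})
    return out


d0 = [
    {"id": 0, "tests": {"A": 1.0}},
    {"id": 1, "tests": {"A": None}},
    {"id": 2, "tests": {"A": None}},
]
cluster, undecided = refine_cluster(d0, toy_build, toy_review)
print(cluster, [r["id"] for r in undecided])
```

The loop stops either when no undecidable data remain or when a review round adds no new tree, which mirrors the two end conditions in step [7].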
<B> System for determining a patient's degree of tumor progression
Next, the L sub-decision trees (DTC₀ to DTC_L) produced as described above are used to determine the degree of tumor progression for unknown patient data.
(1) First, the decision tree cluster created in <A> is described. As illustrated in FIG. 7, each sub-decision tree in the cluster can be regarded as a set of conditions branching from a single root, and within a sub-decision tree the path from the root to each leaf can be regarded as a set of conditions joined by AND. Each leaf carries one degree of tumor progression (1 to 5), and the same degree may appear at more than one leaf, so several paths (conditions) can lead to the same degree of tumor progression. Any given path from root to leaf, however, occurs only once in the whole decision tree cluster. Each leaf is also annotated with the patient numbers of the data from which the sub-decision tree was built; these serve to verify the sub-decision tree and have no meaning for unknown data.

For example, the illustrated sub-decision tree shows three conditions that give tumor progression = 4, and tumor progression = 4 is determined when any one of the three holds. The conditional expression for tumor progression = 4 is expressed as follows. It follows from this expression that the order of the elements within a condition has no bearing on the determination of tumor progression.
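As a sketch of the general shape such an expression takes, each of the three paths can be written as a conjunction of its node conditions and the three paths joined by OR. The symbols S, P_i and c_{i,j} are illustrative placeholders and are not taken from the patent's figure.

```latex
\[
  S = 4 \;\Longleftarrow\; P_1 \lor P_2 \lor P_3,
  \qquad
  P_i = c_{i,1} \land c_{i,2} \land \dots \land c_{i,n_i}
\]
```

Because the logical AND is commutative and associative, reordering the conditions c_{i,j} inside a path P_i leaves its truth value unchanged, which is why the order of the elements does not affect the determination.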

Data with an unknown degree of tumor progression are applied to this decision tree cluster, and the combination of clinical test data in the unknown data is checked to see which leaf in the cluster it matches, which predicts the degree of tumor progression of the unknown data.
(2) The flow for applying unknown data to the decision tree cluster is described below following FIGS. 8 and 9.

The degree of tumor progression of unknown data is determined using the generated decision tree cluster (DT₁ to DT_K). The unknown clinical test data are first checked against the conditions of DT₁; if no matching condition is found in DT₁, they are checked against the conditions of DT₂, and so on. The conditions in the decision tree cluster are examined in this way, and the degree of tumor progression is determined once a matching condition is found.

If nothing in the decision tree cluster completely matches the combination of unknown clinical test data (that is, matches the combination of test data from the root to a leaf), the data are output as undecidable. If several paths that completely contain the combination of unknown clinical test data are found, the data are likewise output as undecidable.
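A minimal Python sketch of this lookup follows. It assumes that a cluster is a list of trees, each tree a list of `(path, stage)` leaves, with a path being a list of `(item, operator, threshold)` conditions; the items `A` and `B` and their thresholds are hypothetical. It scans every tree so that multiple matching paths can be detected, which is one reading of the rule above.

```python
def satisfies(path, tests):
    """Strict match: every item on the path must be present and meet its condition."""
    for item, op, threshold in path:
        value = tests.get(item)
        if value is None:
            return False
        # op is "<=" or ">"; the condition fails when the two sides disagree.
        if (op == "<=") != (value <= threshold):
            return False
    return True


def classify(cluster, tests):
    """Return the stage if exactly one root-to-leaf path matches, else None (undecidable)."""
    matches = [stage for tree in cluster for path, stage in tree if satisfies(path, tests)]
    return matches[0] if len(matches) == 1 else None


# Hypothetical cluster with two sub-decision trees.
cluster = [
    [([("A", "<=", 3.0)], 2), ([("A", ">", 3.0), ("B", ">", 1.0)], 4)],
    [([("B", "<=", 1.0)], 3)],
]
print(classify(cluster, {"A": 5.0, "B": 2.0}))   # 4
print(classify(cluster, {"A": None, "B": 0.5}))  # 3
print(classify(cluster, {"A": 1.0, "B": 0.5}))   # None (two paths match)
print(classify(cluster, {"A": 5.0, "B": None}))  # None (no complete match)
```

Records that come back as `None` correspond to the undecidable output and would be candidates for additional testing.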

Undecidable data that have been output can be submitted again to determination by the decision tree cluster after, for example, clinical test data are added.
(3) FIG. 10 illustrates the effects obtained by applying unknown data to the decision tree cluster.

[1] If a combination of clinical test data extracted from the original data (D₀) as a condition for tumor determination is present in the unknown data, the degree of tumor progression can be determined.

[2] Unknown data judged undecidable can be given additional clinical test data and evaluated again with the decision tree cluster to determine the degree of tumor progression.

[3] By accumulating degrees of tumor progression determined from pathological findings together with the clinical test data, and updating the decision tree cluster with the accumulated data as needed, the accuracy of determining the degree of tumor progression for unknown data can be improved.
(4) In practice, two methods are conceivable for the above determination system: (a) forcibly determining the degree of tumor progression whatever value the missing (don't care) test items in the unknown patient data might take, and (b) determining strictly, using only test items with no missing (don't care) values.

Using the two methods together, when their tumor progression results differ, makes it possible to offer the physician diagnostic support such as suggesting that the missing don't-care items be tested again.
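One way to picture the combined use of methods (a) and (b) is the following Python sketch. It is an interpretation, not the patent's procedure: the lenient mode treats a missing value as satisfying any condition and collects every stage reachable that way, and the cluster layout and item names are hypothetical.

```python
def match(path, tests, lenient):
    """Check a path; in lenient mode a missing value counts as satisfied (don't care)."""
    for item, op, threshold in path:
        value = tests.get(item)
        if value is None:
            if lenient:
                continue
            return False
        # op is "<=" or ">"; the condition fails when the two sides disagree.
        if (op == "<=") != (value <= threshold):
            return False
    return True


def stages(cluster, tests, lenient):
    """Set of stages reachable under the chosen matching mode."""
    return {stage for tree in cluster for path, stage in tree if match(path, tests, lenient)}


def suggest_retests(cluster, tests):
    """Items to re-examine when modes (a) and (b) disagree; empty list if they agree."""
    strict = stages(cluster, tests, lenient=False)
    forced = stages(cluster, tests, lenient=True)
    if strict == forced:
        return []
    return sorted({
        item
        for tree in cluster
        for path, _stage in tree
        if match(path, tests, True)
        for item, _op, _thr in path
        if tests.get(item) is None
    })


cluster = [
    [([("A", "<=", 3.0)], 2), ([("A", ">", 3.0), ("B", ">", 1.0)], 4)],
    [([("B", "<=", 1.0)], 3)],
]
print(suggest_retests(cluster, {"A": 5.0, "B": None}))  # ['B']
print(suggest_retests(cluster, {"A": 5.0, "B": 2.0}))   # []
```

In the first call the strict mode finds no leaf while the lenient mode reaches stages 3 and 4, so the missing item `B` is reported as worth re-testing; in the second call both modes agree and nothing is suggested.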

The case of (b) above is therefore evaluated.
<Experiment details>
Using several sets of clinical test data with overlapping test items, such as tumor marker tests, inference learning by decision tree analysis was performed on the test items of interest, and the method of obtaining clinically meaningful related information between test items from the results was evaluated. The method of extracting meaningful related information, as general trend data, from the conditions of the nodes in the upper layers of the resulting decision tree was evaluated, as was the method of narrowing down possible diseases from clinical test data.
<Evaluation method of experimental results>
The inference learning results were examined, and the method of obtaining clinically meaningful related information from the decision tree was evaluated. The method of obtaining clinically meaningful general trends from the conditions of the nodes in the upper layers of the resulting decision tree was evaluated, as was the method of narrowing down possible diseases from clinical test data.
<Experimental data>
Clinical test data centered on tumor markers, with the degree of tumor progression determined by a physician (51 test items, 432 patients).
<Experimental results>
(1) Evaluation of the method for extracting general trends from clinical test data by inference learning
In general, the items tested in a clinical examination are chosen according to the patient's condition. Consequently, when the clinical test data of many patients are collected, it is never the case that every test item has been examined without omission; in almost all cases some items are untested. A way of applying inference learning to such a collection of clinical test data to find general trends was studied. In the experiment, the condition imposed on the decision tree DT_k built from the clinical test data set D_k was that patient data supported by a leaf of the tree exist and that none of the test items in the conditions down to that leaf is missing (the condition that does not allow missing values). In the decision trees shown below, each leaf displays the degree of tumor progression, followed on its right by the patient data numbers (0 to 431) and the number of patient data.

The first [12] sub-decision tree DT₁, obtained from the clinical laboratory data set D₀, is shown in Table 2. This decision tree explains the tumor progression of 40 patients ([12] + [26] + [2]).

Sub-decision tree DT₁ (explains 40 patients)

Next, sub-decision tree DT₂ was obtained by the same method from the data set D₂ of the remaining 432 − 40 = 392 patients (Table 3). It explains the tumor progression of 6 patients ([2] + [4]).

Sub-decision tree DT₂ (explains 6 patients)

Subsequently, the remaining sub-decision trees (Tables 4 to 12) were created.

Sub-decision tree DT₃ (explains 4 patients)

Sub-decision tree DT₄ (explains 1 patient)

Sub-decision tree DT₅ (explains 4 patients)

Sub-decision tree DT₆ (explains 9 patients)

Sub-decision tree DT₇ (explains 91 patients)

Sub-decision tree DT₈ (explains 28 patients)

Sub-decision tree DT₉ (explains 4 patients)

Sub-decision tree DT₁₀ (explains 1 patient)

Sub-decision tree DT₁₁ (explains 1 patient)

Sub-decision trees DT₁ through DT₁₁ were created, giving a set of decision trees that explains the tumor progression of 189 of the 432 patients in the data set. Because the decision trees obtained here can classify patients who have not yet been assigned a tumor progression, they can support diagnosis by physicians.
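The coverage figures can be cross-checked with a few lines of arithmetic. The sketch below only sums the per-tree patient counts given in the captions of DT₁ through DT₁₁; the variable names are illustrative and not part of the described system.

```python
# Patients explained by each sub-decision tree DT1..DT11, as given in the captions.
explained_per_tree = [40, 6, 4, 1, 4, 9, 91, 28, 4, 1, 1]
total_patients = 432

explained_total = sum(explained_per_tree)        # patients covered by the tree set
unexplained = total_patients - explained_total   # patients left without an explanation
coverage = explained_total / total_patients      # fraction of the data set covered

print(explained_total, unexplained)  # prints: 189 243
```

The totals agree with the 189 explained patients stated above and with the 243 patients reported as unexplained.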

The 243 patients who could not be explained here remain unexplained because of missing clinical laboratory data and similar effects; they can become explainable once additional clinical test items are available for those patients.
<Evaluation of experimental results>
Clustering that generates sub-decision trees sequentially under the condition that tolerates no missing values was performed, and sub-decision trees explaining tumor progression were obtained from the sample clinical laboratory data set.
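The sequential procedure can be illustrated with a minimal sketch. It assumes the sub-decision trees are already available as lists of root-to-leaf paths; in the method described here each tree is learned from the data still unexplained, and that learning step is not shown. The names `leaf_explains` and `peel`, the test items `item_a` and `item_b`, and all values are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]               # test item -> value, None if not tested
Condition = Tuple[str, Callable[[float], bool]]   # (test item, predicate on its value)
Path = Tuple[List[Condition], int]                # (conditions to a leaf, tumor progression)


def leaf_explains(conditions: List[Condition], record: Record) -> bool:
    """True only if every item on the path was tested and satisfies its condition."""
    for item, predicate in conditions:
        value = record.get(item)
        if value is None:          # missing item on the path: the leaf may not be used
            return False
        if not predicate(value):
            return False
    return True


def peel(records: Dict[int, Record], trees: List[List[Path]]):
    """Apply the trees in order, removing each patient once a leaf explains them."""
    remaining = dict(records)
    explained: Dict[int, Tuple[int, int]] = {}    # patient id -> (tree index, progression)
    for k, tree in enumerate(trees, start=1):
        for pid in list(remaining):
            for conditions, progression in tree:
                if leaf_explains(conditions, remaining[pid]):
                    explained[pid] = (k, progression)
                    del remaining[pid]
                    break
    return explained, remaining


# Hypothetical data: four patients, two test items, two pre-built sub-decision trees.
records = {
    0: {"item_a": 6.0, "item_b": None},
    1: {"item_a": 2.0, "item_b": 40.0},
    2: {"item_a": None, "item_b": 40.0},
    3: {"item_a": None, "item_b": None},
}
trees = [
    [([("item_a", lambda v: v > 5.0)], 4),
     ([("item_a", lambda v: v <= 5.0), ("item_b", lambda v: v > 30.0)], 2)],
    [([("item_b", lambda v: v > 30.0)], 3)],
]
explained, remaining = peel(records, trees)
```

In this toy data, patients 0 and 1 are explained by the first tree, patient 2 only by the second tree, and patient 3, who has no test values at all, stays unexplained, mirroring the patients left over in the experiment.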

It was confirmed that at least one patient record matches the condition of each leaf of the obtained sub-decision trees, and that those conditions are largely free of clinical contradiction.
<C> Knowledge database system for the relationship between clinical laboratory data and tumor progression
Applying a machine inference learning technique to clinical data yields decision trees that explain the combinations of patient test data contained in the clinical data. The information in a decision tree is a mixture of information that can be clinically affirmed and information that cannot. For this reason, a decision tree obtained by machine inference learning cannot be used directly as a knowledge element in practice. Therefore, by examining an interface that mediates between the information contained in the decision tree and existing knowledge elements, a method is derived for selecting the necessary information from the decision tree.

The decision tree output by inference learning is, as in the example of FIG. 11, entirely a text file: it has an 11-line header, and the body of the binary decision tree begins on line 12.

The total number of data records supporting the conditions up to each node of the decision tree is obtained by a post-processing filter (FIG. 12).

In FIG. 12, the part enclosed in the box has the following meaning.

For one of the conditions under which tumor progression = 4, the patients matching the condition are numbers 1, 3, 6, 12, 17, 20, 22, 23, 33, ..., for a total of 104 patients.

When a decision tree is imported into the clinical diagnosis support system, the class item on which the tree was built is read from the header information, and the decision tree is interpreted as conditional expressions assigned to each category of that class item.

Importing into the clinical diagnosis support system using the decision tree follows the procedure shown in FIG. 13.

The following description takes as an example HGM-aid (Japanese Patent Application No. 2003-320449), an existing system developed and built by the present inventors.

HGM-aid can be called a comprehensive gene medical support system, and has the following structural features.

First, it is a medical knowledge database support system comprising: a knowledge classification database for accumulating clinical data from clinical medicine and research data from basic research; a term definition database for accumulating definitions (knowledge elements) of the meaning and scope of terms contained in the clinical data and research data, as well as in weather data and data on living environment, preferences and the like; a domain-mediating database for accumulating the strength of the relations between the knowledge elements that make up the data accumulated in the knowledge field database; and a user management database. The system has normalized discretized-data processing means that generates discretized data by comparing clinical data against reference values, thereby compressing the amount of information.
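As an illustration of discretization against reference values, the following sketch maps a numeric test value to one of five grades. The grade names follow the HGM-aid grade classification mentioned in step [6] of the import procedure in this description; the function name `discretize`, the cutoff values, and the rule that a value equal to a cutoff falls into the higher grade are assumptions made for the example only.

```python
from bisect import bisect_right
from typing import Sequence

# Grade names as listed for the HGM-aid grade classification.
GRADES = ["abnormally low", "low tendency", "normal range",
          "high tendency", "abnormally high"]


def discretize(value: float, cutoffs: Sequence[float]) -> str:
    """Map a test value to a grade using four ascending cutoffs (hypothetical).

    A value equal to a cutoff is assigned to the higher grade; this boundary
    rule is an assumption, not something the description specifies.
    """
    if len(cutoffs) != len(GRADES) - 1 or list(cutoffs) != sorted(cutoffs):
        raise ValueError("expected four cutoffs in ascending order")
    return GRADES[bisect_right(cutoffs, value)]


# Hypothetical reference cutoffs for one test item.
cutoffs = [1.0, 2.0, 5.0, 8.0]
grade_low = discretize(0.5, cutoffs)     # "abnormally low"
grade_mid = discretize(3.0, cutoffs)     # "normal range"
grade_high = discretize(9.0, cutoffs)    # "abnormally high"
```

Storing the grade label instead of the raw value is one way to read the "compression of the amount of information" described above: many distinct numeric values collapse into five symbols per test item.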

Second, it is a medical knowledge database support system comprising: a knowledge classification database for accumulating clinical data from clinical medicine and research data from basic research; a term definition database for accumulating definitions of the meaning and scope of terms contained in the clinical data and research data; a domain-mediating database for accumulating the strength of the relations between the knowledge elements that make up the data accumulated in the knowledge field database; and a user management database, the system being characterized by having search means that uses the plural pieces of relational information defined between related knowledge elements. Third, the system is characterized by having search means that limits the search by thresholds on the "depth", which concerns the search order from the base element, and on the "weight" of the relation between elements.
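A search bounded by depth and weight thresholds can be sketched as a breadth-first traversal. The description does not specify the algorithm, so breadth-first order, the interpretation of the weight threshold as a minimum link strength, and the names `bounded_search` and `links` are assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple


def bounded_search(links: Dict[str, List[Tuple[str, float]]],
                   start: str, max_depth: int, min_weight: float) -> Dict[str, int]:
    """Return the knowledge elements reachable from `start` and their depths.

    Links weaker than `min_weight` are ignored, and the search does not
    expand beyond `max_depth` steps from the base element.
    """
    reached = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = reached[node]
        if depth == max_depth:          # depth threshold: do not expand further
            continue
        for neighbor, weight in links.get(node, []):
            if weight >= min_weight and neighbor not in reached:
                reached[neighbor] = depth + 1
                queue.append(neighbor)
    return reached


# Hypothetical relation graph: element -> [(related element, relation weight)].
links = {
    "A": [("B", 0.9), ("C", 0.2)],
    "B": [("D", 0.8)],
    "D": [("E", 0.9)],
}
result = bounded_search(links, "A", max_depth=2, min_weight=0.5)
print(result)  # prints: {'A': 0, 'B': 1, 'D': 2}
```

Here element C is dropped by the weight threshold and element E by the depth threshold, which keeps the set of presented relations small even when the relation graph is large.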

Fourth, it is a medical knowledge database support system comprising: a knowledge classification database for accumulating clinical data from clinical medicine and research data from basic research; a term definition database for accumulating definitions of the meaning and scope of terms contained in the clinical data and research data; a domain-mediating database for accumulating the strength of the links between the records that make up the data accumulated in the knowledge field database; and a user management database, the system being characterized by having search graph display means in which primary elements, namely clinical laboratory data, are arranged on concentric circles, secondary elements derived from a primary element by relation search are arranged on a straight line or radially from that primary element, and elements related to one another among the secondary elements are arranged radially from the element at which they branch. Fifth, the system is characterized in that the clinical laboratory data are arranged on concentric circles.

By searching for relations among databased data from the clinical medicine and basic research fields, it becomes possible to discover relationships, previously unknown, between multiple clinical symptoms and genes or proteins. Completing and operating HGM-aid promotes mutual exchange of knowledge between basic research and clinical medicine and makes it possible to accumulate knowledge for understanding diseases comprehensively.

In the present invention, the interface for importing decision tree clusters into the HGM-aid database described above operates by the following procedure, following FIG. 14. Through this procedure, the clusters can be imported into the system as part of the association information between knowledge elements accumulated in the HGM-aid database. Using the association information between knowledge elements as a clue, HGM-aid searches for the knowledge information associated with a patient's clinical laboratory data and can thereby present to the user information that supports assessment of the patient's condition and diagnosis (including tumor progression).
[1] The decision tree cluster data created in <A> above is read into the system and displayed. The decision tree cluster data is passed as a file or as information in memory.
[2] The tree-structured decision tree cluster is interpreted, and every path from the root of the decision tree to a leaf is extracted. One path is written per record. Each node on a path consists of a (test item + range) pair, and the leaf holds the judgment value (here, the tumor progression) (for example, FIGS. 15 and 16).
[3] If the same test item appears more than once in a record, the inclusion relation of its ranges is examined and the range data is rewritten, so that the data is consolidated and each test item appears only once in a record.
[4] All consolidated records are examined, and all test items contained in them are extracted and arranged in a table (for example, FIG. 17).
[5] Sorting the arranged table by tumor progression (leaf) yields a table that describes, for each tumor progression, the conditions leading to it.
[6] The range for each test item forming a node of a record is the range determined when the decision tree was created, so it does not necessarily coincide with the abnormal-value range of that test item. In HGM-aid, each test item is graded into ranges such as abnormally low, low tendency, normal range, high tendency, and abnormally high. The inclusion relation between the grade classification ranges and the test value ranges from the decision tree is examined (for example, FIGS. 18 and 19).
[7] As the condition for diagnosing tumor progression, the combination of HGM-aid grade classifications that contains the test value ranges combined by the decision tree is selected.
[8] HGM-aid stores, as a database, information in which multiple related knowledge elements are linked. The grade classifications of the test items forming the combination selected in [7] that leads to a tumor progression are normalized as knowledge elements and registered in the database as related information (for example, FIG. 20).
[9] The test value ranges from the decision tree could also be analyzed and used as information for updating the grade ranges.
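Steps [2], [3] and [5] above can be sketched as follows. The nested-dictionary tree format, the convention that the low branch means "value ≤ threshold" and the high branch "value > threshold", the representation of a range as a (lower bound exclusive, upper bound inclusive) pair, and the names `paths`, `consolidate` and `to_table` are all assumptions for illustration; the actual file format read in step [1] is not reproduced here.

```python
import math
from typing import Dict, Iterator, List, Tuple

RangeCond = Tuple[str, float, float]   # (test item, lower bound exclusive, upper bound inclusive)


def paths(node: dict, conds: Tuple[RangeCond, ...] = ()) -> Iterator[Tuple[List[RangeCond], int]]:
    """Step [2]: yield every root-to-leaf path with the leaf's tumor progression."""
    if "progression" in node:                       # leaf node
        yield list(conds), node["progression"]
        return
    item, t = node["item"], node["threshold"]
    yield from paths(node["low"], conds + ((item, -math.inf, t),))
    yield from paths(node["high"], conds + ((item, t, math.inf),))


def consolidate(conds: List[RangeCond]) -> Dict[str, Tuple[float, float]]:
    """Step [3]: intersect repeated ranges so each test item appears once."""
    merged: Dict[str, Tuple[float, float]] = {}
    for item, lo, hi in conds:
        old_lo, old_hi = merged.get(item, (-math.inf, math.inf))
        merged[item] = (max(old_lo, lo), min(old_hi, hi))
    return merged


def to_table(tree: dict) -> List[Tuple[int, Dict[str, Tuple[float, float]]]]:
    """Step [5]: one row per path, sorted by tumor progression."""
    rows = [(progression, consolidate(conds)) for conds, progression in paths(tree)]
    rows.sort(key=lambda row: row[0])
    return rows


# Hypothetical binary tree in which item_a is tested twice on one branch.
tree = {
    "item": "item_a", "threshold": 5.0,
    "low": {"progression": 3},
    "high": {
        "item": "item_b", "threshold": 30.0,
        "low": {"progression": 1},
        "high": {"item": "item_a", "threshold": 9.0,
                 "low": {"progression": 4}, "high": {"progression": 2}},
    },
}
rows = to_table(tree)
```

In the row for tumor progression 4, the two conditions on `item_a` (greater than 5.0, at most 9.0) collapse into the single range (5.0, 9.0), which is the consolidation described in step [3]. Comparing such ranges with the grade classification ranges, as in steps [6] and [7], is not covered by this sketch.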

FIG. 1 is a conceptual diagram showing the relationship between the whole data set D₀ and the sub-decision trees DT_k.
FIG. 2 is a flowchart of the clustering used to create the sub-decision trees DT_k.
FIG. 3 is the first half of a conceptual diagram of the relationship between the clinical laboratory data used to create a decision tree cluster and the sub-decision trees.
FIG. 4 is the second half of the diagram, continuing from FIG. 3.
FIG. 5 is a conceptual diagram showing the relationship between the sub-decision trees and the set of don't care data.
FIG. 6 is a conceptual diagram explaining the post-processing of data that could not be judged because of the appearance of don't care.
FIG. 7 is a conceptual diagram explaining the elements that make up a sub-decision tree.
FIG. 8 is a flowchart for determining the tumor progression of unknown clinical laboratory data with a decision tree cluster.
FIG. 9 is a diagram showing the flow of data when unknown clinical data is judged with a decision tree cluster.
FIG. 10 is a diagram explaining the effect of applying a decision tree cluster.
FIG. 11 is a diagram showing an example (partial) of decision tree output by C4.5.
FIG. 12 is a diagram showing example data in which patient numbers have been added to a C4.5 decision tree by post-processing.
FIG. 13 is a diagram showing the flow for importing the results of inference learning.
FIG. 14 is a diagram explaining the procedure for importing a decision tree cluster into the HGM-aid database.
FIG. 15 is a diagram showing an example of reading and displaying a decision tree cluster.
FIG. 16 is a diagram showing an example different from FIG. 15.
FIG. 17 is a diagram showing a display example in which the decision tree cluster that was read has been consolidated and arranged.
FIG. 18 is a diagram showing a display example comparing the condition ranges from a decision tree cluster with the HGM-aid grade classification.
FIG. 19 is a diagram showing an example different from FIG. 18.
FIG. 20 is a diagram showing an example of registering and displaying conditions given by a combination of grade classifications.

Claims

1. A system for extracting a combination set of clinical laboratory data, in which a computer mechanically extracts, by a method of creating a decision tree rule set, a combination set of clinical laboratory data closely related to tumor progression from tumor progression judgment data determined using pathological findings and from clinical laboratory data including tumor markers, the system being characterized by having arithmetic processes that:
1) take the test items with no missing values among all the test items, obtain clusters of sub-data sets, and create sub-decision trees; and
2) prune from the sub-decision trees of 1) above those containing missing/unknown data, and create sub-decision trees as clusters having no missing/unknown data.

2. The system for extracting a combination set of clinical laboratory data according to claim 1, further characterized by having an arithmetic process that:
3) extracts and reviews the missing/unknown data not included in the sub-decision trees of 2) above, and creates sub-decision trees into which all the data are incorporated.

3. The system for extracting a combination set of clinical laboratory data according to claim 1 or 2, wherein the creation of sub-decision trees and the pruning of data containing missing/unknown data are performed through a verification process using a knowledge database on the strength of the relations between test items.

4. A tumor progression determination system comprising a database of tree-structured decision tree clusters extracted by the system according to any one of claims 1 to 3, characterized in that tumor progression is determined mechanically by computing whether the combination of condition selections of the clinical laboratory data of a patient whose tumor progression is unknown matches a decision tree cluster.

5. The tumor progression determination system according to claim 4, wherein the tumor progression is predicted from the tree-structure path of the decision tree cluster based on the computation of matching by the combination of condition selections.

6. A clinical diagnosis support system comprising the system according to any one of claims 1 to 5 as at least a part of its configuration.