JPH0261769A

JPH0261769A - Generating device for classification determining tree

Info

Publication number: JPH0261769A
Application number: JP63214175A
Authority: JP
Inventors: Hiroyuki Izumi; 泉　寛幸; Hideki Sato; 秀樹佐藤; Yasuhiro Iijima; 飯島　泰裕; Fumiyo Suenaga; 末永　富美代; Ryoichi Narita; 成田　良一
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1988-08-29
Filing date: 1988-08-29
Publication date: 1990-03-01

Abstract

PURPOSE: To generate a classification decision tree that leads to more correct inference results by setting suitable attribute thresholds for an attribute that takes a plurality of orderable values, so that the attribute range is divided appropriately. CONSTITUTION: First, for all elements of the training example set, all values of the attributes that take continuous values are extracted and sorted in ascending order for each attribute. The midpoint θi of each pair of adjacent attribute values is obtained and taken as an attribute threshold, and a threshold contingency table is prepared. Next, the mutual information I(θi) is calculated from the threshold contingency table. Then the attribute thresholds θn' at which the mutual information I(θi) takes a maximum are obtained, and a contingency table describing the class states of the training examples partitioned by these thresholds θn' is prepared. The mutual information is calculated for each attribute, and the attribute showing the largest mutual information is allocated to the next lower node. The classification decision tree is generated in this way.

Description

[Detailed Description of the Invention]

[Summary]

The invention relates to a classification decision tree generation device that generates a classification decision tree from a set of training examples, and aims to automatically generate a classification decision tree that leads to more correct inference results. The device comprises: attribute threshold selection means that, when an attribute of the training examples takes a plurality of orderable values, selects a plurality of attribute thresholds from the range of that attribute's values; threshold contingency table generation means that generates, for each selected attribute threshold, a threshold contingency table describing the class states of the training examples partitioned by that threshold; mutual information calculation means that uses the generated threshold contingency tables to calculate, for each selected attribute threshold, the mutual information between the attribute and the class when the attribute is partitioned at that threshold; and attribute partition determination means that partitions the attribute according to the attribute thresholds at which the calculated mutual information takes significant local maxima. The device calculates the degree of correlation between the attribute so partitioned and the class, and, when an attribute that takes a plurality of orderable values is selected as a lower node of the classification decision tree, allocates the attribute as partitioned by the attribute partition determination means to that lower node.

[Industrial Application Field]

The present invention relates to a classification decision tree generation device for generating the classification decision tree on which the inference processing of a classification decision device is based.

Classification decision devices are coming into wide use in various diagnostic fields, for example disease diagnosis, which identifies the disease a patient is suffering from based on the patient's symptoms and test results, and fault diagnosis, which estimates the cause of a machine malfunction from abnormal meter readings, sounds, movements, and the like. Such a device stores classification knowledge in a storage device called a knowledge base and, when a diagnostic case is input, infers the class to which the case belongs based on the knowledge in this knowledge base.

The most widely used conventional classification decision device is the knowledge-based inference device. As shown in Fig. 11, this device consists of: a knowledge base 1 that stores experts' empirical knowledge in the form of inference rules, such as "If the starter does not rotate and the light does not come on, the battery is faulty." and "If the starter does not rotate and the light comes on, the cable connection is faulty."; a diagnostic case input device 2 for inputting the attributes of the diagnostic case to be classified; a class inference device 3 that infers the class to which the input diagnostic case belongs based on its attributes and the inference rules in the knowledge base 1; and a classification result output device 4 that outputs the class obtained as the inference result.

Fig. 12 shows an example of the inference rules stored in the knowledge base 1. Classification knowledge with this kind of tree structure is called a classification decision tree. As the figure shows, a classification decision tree is a tree of nodes, each associated with an attribute. Starting from the root node, which has no incoming branch from any other node, it branches to lower nodes according to attribute values, and by repeating this branching it eventually reaches leaf nodes, which specify the class of the diagnostic case. The class inference device 3 starts from the root node and follows the tree according to the attributes of the diagnostic case, thereby inferring the class to which the case belongs.
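The traversal performed by the class inference device 3 can be sketched as follows. This is a minimal illustration only: the `Node` structure, the attribute names, and the "other" leaf are assumptions made for the example and are not taken from Fig. 12.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical tree node: a branch node tests an attribute, a leaf holds a class."""
    attribute: Optional[str] = None            # attribute tested at a branch node
    label: Optional[str] = None                # class label at a leaf node
    children: Dict[str, "Node"] = field(default_factory=dict)


def infer_class(node: Node, case: Dict[str, str]) -> str:
    """Follow branches from the root according to the case's attribute values."""
    while node.label is None:
        node = node.children[case[node.attribute]]
    return node.label


# Tree encoding the two starter/light rules quoted in the text.
# The "other" leaf is an assumption added so that every branch ends in a leaf.
tree = Node(attribute="starter_rotates", children={
    "yes": Node(label="other"),
    "no": Node(attribute="light_on", children={
        "no": Node(label="battery fault"),
        "yes": Node(label="cable connection fault"),
    }),
})
```

Calling `infer_class(tree, {"starter_rotates": "no", "light_on": "no"})` walks two branches and stops at the "battery fault" leaf.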

However, in a knowledge-based inference device configured in this way, the empirical knowledge about classification must be extracted from experts, so building the knowledge base 1 takes a great deal of effort. Therefore, as shown in Fig. 13, a configuration is adopted in which the knowledge base 1 can be built automatically. It provides a training example input device 5 for inputting a set of training examples that represent previously established correlations between attributes and classes, such as medical records for disease diagnosis, together with a classification decision tree generation device 6 that automatically generates a classification decision tree according to the strength of the correlation between the attributes and the classes of the training examples input from the training example input device 5.

Specifically, the set of training examples is a collection of training examples each described as "Case $E_i$ has $V_{i1}$ as the value of attribute $A_1$, $V_{i2}$ as the value of attribute $A_2$, ..., $V_{ik}$ as the value of attribute $A_k$, and belongs to class $C_j$." Fig. 14 shows an example of such a set. Each training example in it indicates whether or not a particular abnormal condition is present, for example whether or not the patient has heart disease. The tests that serve as attributes are various examinations such as body temperature, blood tests, and electrocardiograms, and the results of these tests are the attribute values. The classification decision tree generation device 6 automatically generates a classification decision tree from such a set of diagnostic cases.

As is clear from the explanation so far, for a classification decision device configured in this way to produce correct inference results, the classification decision tree generation device 6 must be designed so that it can build a more correct classification decision tree.

[Conventional technology]

Next, the conventional technology for the classification decision tree generation process executed by the classification decision tree generation device 6 is described.

Given a set S of training examples from the training example input device 5, the conventional classification decision tree generation device 6 generated a classification decision tree by the following procedure.

(st1) Create a root node for the set S of training examples.

(st2) If S is an empty set, make the corresponding node a leaf node of undetermined class.

(st3) If all the cases assigned to the node belong to the same class, determine the class from those cases, label the node with it, make the node a leaf node, and end the procedure.

(st4) If they do not all belong to the same class, select from the attribute set T of the training examples the one attribute with the highest degree of correlation with the class, and store that attribute in the node, making it a branch node.

(st5) Divide the case set into a finite number of mutually exclusive subsets according to the values of that attribute, and create and assign a lower node for each subset.

(st6) Repeat from st2 for each lower node.

In this way, a classification decision tree is generated automatically with a structure that branches to successive lower nodes according to the degree of correlation between attributes and classes. Measures such as "mutual information" are used to calculate this degree of correlation.
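The recursive procedure st1 to st6 can be sketched as below for attributes whose values cannot be ordered. This is an illustrative sketch, not the device's actual implementation: the dictionary-based tree, the function names, and the majority-class fallback when no attributes remain are assumptions. The correlation measure is the textbook mutual information between an attribute and the class.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(examples, attr):
    """I(attribute; class) = H(class) - H(class | attribute)."""
    groups = defaultdict(list)
    for attrs, cls in examples:
        groups[attrs[attr]].append(cls)
    conditional = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy([cls for _, cls in examples]) - conditional


def build_tree(examples, attributes):
    """examples: list of (attribute dict, class); attributes: candidate attribute names."""
    if not examples:                                   # st2: empty set, class undetermined
        return {"leaf": None}
    classes = {cls for _, cls in examples}
    if len(classes) == 1:                              # st3: all cases share one class
        return {"leaf": classes.pop()}
    if not attributes:                                 # assumption: majority class fallback
        return {"leaf": Counter(c for _, c in examples).most_common(1)[0][0]}
    best = max(attributes, key=lambda a: mutual_information(examples, a))  # st4
    subsets = defaultdict(list)                        # st5: split by attribute value
    for attrs, cls in examples:
        subsets[attrs[best]].append((attrs, cls))
    rest = [a for a in attributes if a != best]
    return {"attribute": best,                         # st6: recurse on each lower node
            "children": {v: build_tree(s, rest) for v, s in subsets.items()}}
```

With four cases built from the starter and light rules, the attribute that separates the classes best is chosen at the root and the other attribute appears one level below.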

In the process of (st5) above, the set of training examples belonging to the upper node is divided according to attribute values in order to branch from the upper node to the lower nodes. The conventional technology for this division process is described next.

When an attribute takes values that cannot be ordered, such as "YES" or "NO", there is no freedom in this division process, and the set is simply divided according to the number of attribute values. On the other hand, an attribute may take a plurality of orderable values, either continuous values, as with "weight", or multiple discrete values, as with "visual acuity". In that case there is no problem when the attribute values of the different classes do not overlap, as shown in Fig. 15. But when the attribute values of the classes overlap, as shown in Fig. 16, the degree of correlation between the attribute and the class varies greatly depending on how the attribute range is divided. If this division is not done appropriately, an attribute that belongs at an upper node may be placed at a lower node, or the number of lower nodes to branch to may be set inappropriately, which degrades the classification ability of the classification decision device.

The conventional classification decision tree generation device 6 therefore divided such attributes, which take a plurality of orderable values, either by the empirical judgment of the person creating the classification decision tree or by a division method such as the "boundary value division method" or the "intermediate value division method". As shown in Fig. 17, the "boundary value division method" finds the maximum and minimum attribute values of the training examples belonging to each class and arranges them in ascending order to obtain the attribute thresholds that partition the attribute range. As shown in Fig. 18, the "intermediate value division method" finds the mean attribute value of the training examples belonging to each class and uses the midpoints between these means as the attribute thresholds that partition the attribute range.
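The two conventional division methods can be sketched as follows. The function names are hypothetical, and treating every per-class minimum and maximum (including the overall extremes) as a threshold is an assumption, since the text does not say whether the outermost values are used.

```python
from collections import defaultdict


def group_by_class(values, classes):
    """Collect the attribute values of the training examples of each class."""
    groups = defaultdict(list)
    for value, cls in zip(values, classes):
        groups[cls].append(value)
    return groups


def boundary_value_thresholds(values, classes):
    """Per-class minima and maxima, arranged in ascending order."""
    groups = group_by_class(values, classes)
    bounds = {b for vs in groups.values() for b in (min(vs), max(vs))}
    return sorted(bounds)


def intermediate_value_thresholds(values, classes):
    """Midpoints between adjacent per-class means."""
    groups = group_by_class(values, classes)
    means = sorted(sum(vs) / len(vs) for vs in groups.values())
    return [(a + b) / 2 for a, b in zip(means, means[1:])]
```

The boundary value method depends only on the extreme examples of each class, and the intermediate value method ignores how many examples each class contributes, which is the weakness the following section criticizes.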

[Problem to be solved by the invention]

However, the conventional technique that relies on the empirical judgment of the classification decision tree creator depends on that person's know-how, so the classification decision tree cannot be generated automatically. In the conventional technique based on the "boundary value division method", the maximum and minimum values that serve as attribute thresholds depend only on particular elements, so they cannot be said to properly reflect the attribute values of the training examples belonging to the class. The conventional technique based on the "intermediate value division method" does not depend on particular elements in this way, but, as shown in Fig. 19, it does not yield appropriate attribute thresholds when the two classes that supply the means contain very different numbers of training examples. In other words, the conventional techniques could not be said to generate classification decision trees capable of achieving a high rate of correct classification.

The present invention has been made in view of these circumstances. Its object is to provide a classification decision tree generation device that can set appropriate attribute thresholds for attributes taking a plurality of orderable values, thereby dividing the attribute range appropriately and generating a classification decision tree that leads to more correct inference results.

[Means to solve the problem]

Fig. 1 is a diagram showing the principle configuration of the present invention.

In the figure, 1 is a knowledge base that stores the classification decision tree, 5 is a training example input device that performs input processing for a set of training examples, and 6 is a classification decision tree generation device that automatically generates a classification decision tree according to the magnitude of the correlation between the attributes and the classes of the training examples input from the training example input device 5. The classification decision tree generation device 6 includes attribute mode determination means 60, attribute threshold selection means 61, threshold contingency table generation means 62, mutual information calculation means 63, attribute partition determination means 64, contingency table generation means 65, correlation degree calculation means 66, maximum correlation degree detection means 67, and lower node allocation means 68.

The attribute mode determination means 60 determines whether an attribute of the training examples input from the training example input device 5 is an attribute that takes a plurality of orderable values or an attribute that takes values that cannot be ordered. When the attribute mode determination means 60 determines that the attribute takes a plurality of orderable values, the attribute threshold selection means 61 selects a plurality of attribute thresholds from the range of the attribute's values. The threshold contingency table generation means 62 generates, for each selected attribute threshold, a threshold contingency table describing the class states of the training examples partitioned by that threshold. The mutual information calculation means 63 uses the threshold contingency tables generated by the threshold contingency table generation means 62 to calculate, for each selected attribute threshold and according to a predetermined mutual information formula, the mutual information between the attribute and the class when the attribute is partitioned at that threshold. The attribute partition determination means 64 partitions the attribute according to the attribute thresholds at which the mutual information calculated by the mutual information calculation means 63 takes significant local maxima.

The contingency table generation means 65 generates a contingency table describing the class states of the training examples partitioned by the attribute partition determination means 64 and, when the attribute mode determination means 60 determines that an attribute takes values that cannot be ordered, generates a contingency table describing the class states of the training examples partitioned by those values. The correlation degree calculation means 66 calculates the degree of correlation between the attribute and the class from the contingency table generated by the contingency table generation means 65. The maximum correlation degree detection means 67 detects the largest of the correlation degrees calculated by the correlation degree calculation means 66, thereby identifying the attribute to be allocated to the lower node. When the degree of correlation causes an attribute that takes a plurality of orderable values to be selected as a lower node of the classification decision tree, the lower node allocation means 68 allocates the partitions of the attribute determined by the attribute partition determination means 64 as lower nodes; when it causes an attribute that takes values that cannot be ordered to be selected, the lower node allocation means 68 allocates the attribute as partitioned by its values.

[Effect]

The classification decision tree generation device 6 selects attributes, from a set of training examples representing correlations between attributes and classes, according to the degree of correlation between each attribute and the class, and allocates the attributes so selected to lower nodes, thereby generating a classification decision tree that branches from upper nodes to lower nodes. For an attribute that takes a plurality of orderable values, the classification decision tree generation device 6 of the present invention partitions the attribute according to the attribute thresholds at which the mutual information calculated by the mutual information calculation means 63 takes significant local maxima, and uses the degree of correlation between the attribute so partitioned and the class as the criterion for selecting attributes when generating the classification decision tree.

An attribute partitioned by the attribute thresholds obtained from this mutual information carries the maximum amount of information about the class. Moreover, the partition is uniquely determined by calculation. Thus, according to the present invention, a classification decision tree with high classification ability is generated objectively and automatically.

[Example]

The present invention is described in detail below with reference to an embodiment.

Fig. 2 shows a flowchart of the classification decision tree generation process executed by the classification decision tree generation device 6 of the present invention. As the flowchart shows, the present invention, like the prior art, first determines whether an attribute A input from the training example input device 5 takes a plurality of orderable values (hereinafter, continuous values) or values that cannot be ordered (hereinafter, discontinuous values). When the attribute takes discontinuous values, a contingency table is created that describes the class states of the training examples partitioned by those values. When the attribute takes continuous values, attribute thresholds that partition the attribute are set first, and then a contingency table is created that describes the class states of the training examples partitioned by those thresholds. Once contingency tables have been created for all attributes, the priority of each attribute is calculated by computing the degree of correlation between the attribute and the class from its contingency table. The attribute with the highest priority, that is, the highest degree of correlation, is selected and allocated to the lower node, and the classification decision tree is generated in this way.

The present invention proposes to generate classification decision trees with high classification ability by determining the attribute thresholds for continuous-valued attributes according to mutual information.

Fig. 3 shows the flowchart, executed by the classification decision tree generation device 6 of the present invention, for determining these attribute thresholds.

This flowchart is described next. As explained in the [Conventional technology] section, the algorithm for generating a classification decision tree is fundamentally recursive. Therefore, apart from the difference in the training examples being classified, the processing described below applies in exactly the same way not only when branching from the root node to lower nodes but also when branching from an intermediate node to the next lower nodes.

First, as shown in step 1, for all elements of the training example set, all values of the attributes that take continuous values are extracted and sorted in ascending order for each attribute. For convenience of explanation, the sorted values of a given attribute are written ($V_1 \le V_2 \le V_3 \le \dots \le V_k$). Assuming a set of training examples like that in Fig. 4, which indicates whether or not a diabetes test is needed, this process yields, for each continuous-valued attribute such as "weight", "height", and "visual acuity", a sequence of attribute values sorted in ascending order. For "weight", for example, a sequence such as (54.3, 54.5, 54.6, ..., 102.1) is obtained.

Next, in step 21 of step 2,

$$\theta_i = \frac{V_i + V_{i+1}}{2}$$

is calculated first. This gives the midpoint of each pair of adjacent attribute values obtained in step 1. In the "weight" example, 54.4 kg is obtained as the midpoint of 54.3 kg and 54.5 kg. Step 21 then takes each $\theta_i$ calculated in this way as an attribute threshold and creates a threshold contingency table.
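Steps 1 and 21 (sorting the attribute values and taking midpoints of adjacent values as candidate thresholds) can be sketched as follows. The function name is hypothetical, and removing duplicate values before taking midpoints is an assumption: the text does not say how repeated values are handled, but a midpoint between two equal values would not separate any examples.

```python
def candidate_thresholds(values):
    """Midpoints of adjacent attribute values after sorting in ascending order."""
    ordered = sorted(set(values))      # assumption: repeated values are collapsed
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
```

For k distinct attribute values this yields k - 1 candidate thresholds, each of which gets its own threshold contingency table.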

This threshold contingency table describes the class states of the training examples partitioned by an attribute threshold. Fig. 5 shows examples of the tables created. Fig. 5(A) is an example of a threshold contingency table with $\theta_i$ = 65.4 kg as the attribute threshold, and Fig. 5(B) is an example with $\theta_i$ = 89.8 kg. In this way, step 21 creates a threshold contingency table for each calculated attribute threshold. When there are multiple classes to be classified, the threshold contingency table is expressed in the more general form shown in Fig. 6. In Fig. 6, $N_j$ is the number of diagnostic cases belonging to class $C_j$, $F_j(\theta)$ is the number of diagnostic cases in class $C_j$ whose attribute value is smaller than the attribute threshold $\theta$, and $N_j - F_j(\theta)$ is the number of diagnostic cases in class $C_j$ whose attribute value is larger than $\theta$, where $1 \le j \le m$.
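A threshold contingency table in the general form described for Fig. 6 can be sketched as a mapping from each class to the pair of counts below and above the threshold. The function name and the dictionary layout are assumptions made for illustration; the sample data are invented and are not the values of Fig. 4 or Fig. 5.

```python
from collections import Counter


def threshold_contingency_table(values, classes, theta):
    """Map each class C_j to (F_j(theta), N_j - F_j(theta))."""
    totals = Counter(classes)                                          # N_j
    below = Counter(c for v, c in zip(values, classes) if v < theta)   # F_j(theta)
    return {c: (below[c], totals[c] - below[c]) for c in totals}
```

Because the thresholds are midpoints between distinct observed values, no attribute value is equal to a threshold, so every example falls strictly on one side.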

When the processing of step 21 is finished, step 22 uses the threshold contingency table to calculate the mutual information $I(\theta_i)$ according to the following equation.

$$I(\theta_i) = \frac{1}{N_0}\Bigl(\sum_j F_j(\theta_i)\log_2 F_j(\theta_i) + \sum_j \bigl(N_j - F_j(\theta_i)\bigr)\log_2\bigl(N_j - F_j(\theta_i)\bigr) - \sum_j N_j \log_2 N_j + N_0 \log_2 N_0\Bigr) \qquad (1)$$

Here $N_0$ is the total number of training examples, and $0 \cdot \log_2 0 = 0$.

The mutual information $I(\theta_i)$ defined by equation (1) represents the amount of information about the class to which a case belongs. Consequently, if the attribute is partitioned at attribute thresholds $\theta_i$ for which $I(\theta_i)$ is large, much information is available for inferring the class, and a classification tree with high classification ability can be built. Mutual information itself was already used in the prior art as one of the evaluation formulas for calculating the degree of correlation between attributes and classes. Unlike the prior art, however, the present invention uses mutual information to evaluate how the range of a continuous-valued attribute is partitioned.
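As a hedged illustration of step 22, the sketch below scores a threshold contingency table with the textbook mutual information between the two-way split and the class, I = H(class) - H(class | side of threshold). This is an assumption chosen for clarity and is not guaranteed to match the exact form or normalization of equation (1) as printed. The table layout (class mapped to the pair of counts below and above the threshold) and the function names are likewise hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (bits) of a list of counts, using 0 * log2(0) = 0."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def split_mutual_information(table):
    """table maps each class to (count below threshold, count above threshold)."""
    class_totals = [lo + hi for lo, hi in table.values()]   # N_j
    below = [lo for lo, _ in table.values()]                # F_j(theta)
    above = [hi for _, hi in table.values()]                # N_j - F_j(theta)
    n0 = sum(class_totals)                                  # N_0
    conditional = sum(sum(side) / n0 * entropy(side)
                      for side in (below, above) if sum(side) > 0)
    return entropy(class_totals) - conditional
```

A threshold that separates two equally sized classes perfectly scores 1 bit, while a threshold that leaves the class mix identical on both sides scores 0.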

Steps 21 and 22 are executed, attribute by attribute, for all of the values V_i obtained in step 1, so that the mutual information I(θ_i) is calculated for every attribute threshold. In the subsequent step 3, the attribute thresholds θ_i that satisfy both I(θ_{i-1}) ≤ I(θ_i) and I(θ_i) ≥ I(θ_{i+1}) are detected for each attribute.

In other words, the attribute thresholds at which the mutual information I(θ_i) takes a local maximum are found. The attribute thresholds that give these local maxima are denoted [θ_1', θ_2', θ_3', ..., θ_n'].
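The local-maximum test can be sketched in a few lines of Python. The function name `local_maxima` is hypothetical, and the treatment of the first and last threshold (each compared with its single neighbour only) is an assumption, since the text does not say how the ends of the sequence are handled.

```python
def local_maxima(thetas, scores):
    """Thresholds whose score is >= the scores of both neighbours.

    Endpoints are compared only against the one neighbour they have;
    this boundary handling is an assumption, not stated in the text.
    """
    picked = []
    for i, (theta, s) in enumerate(zip(thetas, scores)):
        left_ok = i == 0 or scores[i - 1] <= s
        right_ok = i == len(scores) - 1 or s >= scores[i + 1]
        if left_ok and right_ok:
            picked.append(theta)
    return picked

# Hypothetical thresholds and mutual-information values.
peaks = local_maxima([1, 2, 3, 4, 5], [0.1, 0.5, 0.2, 0.3, 0.9])
```

Because both inequalities are non-strict, every point of a flat plateau qualifies; a later filtering step would be needed if only one threshold per plateau is wanted.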

In the present invention, the thresholds θ_i' obtained in this way are used to divide the attribute's value range [minA, maxA] into a finite number of mutually exclusive intervals [minA, θ_1'), [θ_1', θ_2'), ..., [θ_n', maxA]. Continuous-valued attributes are partitioned at the thresholds that give local maxima so that the regions carrying the most information for inferring the class are put to effective use. When the distance between adjacent thresholds θ_i' is too small to constitute a significant difference, the interval between them is not significant in terms of classification ability, so it is also reasonable to discard one of the two thresholds.
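The interval construction, including the optional discarding of thresholds that lie too close together, might look as follows in Python. The `min_gap` parameter and the rule of keeping the higher-scoring of two close thresholds are assumptions of this sketch; the text only says that one of the two may be discarded. The numeric values reuse the weight example from the description.

```python
def merge_close(thetas, scores, min_gap):
    """Drop one of any two adjacent thresholds closer than min_gap.

    Keeping the higher-scoring threshold is an assumed tie-break rule.
    """
    kept = []  # (theta, score) pairs, ascending in theta
    for theta, s in sorted(zip(thetas, scores)):
        if kept and theta - kept[-1][0] < min_gap:
            if s > kept[-1][1]:
                kept[-1] = (theta, s)
        else:
            kept.append((theta, s))
    return [t for t, _ in kept]

def intervals(min_a, max_a, thetas):
    """(lower, upper) pairs; each is [lower, upper) except the last, which is closed."""
    edges = [min_a, *sorted(thetas), max_a]
    return list(zip(edges, edges[1:]))

def interval_index(value, thetas):
    """Index of the interval containing value (left-closed, right-open)."""
    return sum(value >= t for t in sorted(thetas))

kept = merge_close([65.4, 66.0, 89.8], [0.4, 0.3, 0.5], min_gap=2.0)
parts = intervals(54.3, 102.1, kept)
```

A value equal to a threshold falls into the interval that starts at that threshold, which matches the half-open notation used in the text.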

Once step 3 has found the attribute thresholds θ_i' that give local maxima of the mutual information and has thereby fixed the partition of the attribute, the classification decision tree generation device 6 of the present invention creates a contingency table describing the class states of the training examples partitioned by these thresholds θ_i', in order to calculate the degree of correlation between the continuous-valued attribute and the class. If the attribute thresholds 65.4 kg and 89.8 kg illustrated in FIG. 5 are the thresholds θ_i' that give local maxima of the mutual information for the attribute "weight", this processing produces the contingency table shown in FIG. 7. The contingency table is created for every continuous-valued attribute of the training examples. For the set of training examples shown in FIG. 4, contingency tables are therefore also created for "height" and "visual acuity" by executing steps 1 to 3. When there are multiple classes to be classified, the contingency table takes the more general form shown in FIG. 8.

For attributes that take discrete values, such as "urinalysis" in the training examples shown in FIG. 4, the classification decision tree generation device 6 of the present invention proceeds exactly as in the prior art and directly creates a contingency table describing the class states of the training examples partitioned by those discrete values. FIG. 9(A) shows a specific example of the contingency table for "urinalysis". This contingency table is likewise created for every discrete-valued attribute of the training examples. FIG. 9(B) shows the general form of the contingency table when there are multiple classes to be classified. As the figure makes clear, the contingency tables for continuous-valued and discrete-valued attributes have exactly the same format, apart from how the attribute values are written.

Once contingency tables have been created for all attributes of the training examples, the classification decision tree generation device 6 of the present invention uses them, as in the prior art, to obtain the degree of correlation between each attribute and the class. Various evaluation formulas have been proposed for calculating this degree of correlation; the explanation here uses mutual information, the same measure that was used to obtain the contingency tables of the continuous-valued attributes.

Once a contingency table such as that of FIG. 9(B) is obtained, the mutual information I(A) is calculated for each attribute according to the following formula.

$$
I(A) = \frac{1}{N_0} \times \Bigl[ \;\cdots\; - \sum_j N_j \log_2 N_j + N_0 \log_2 N_0 \Bigr]
$$

Here, N_0 is the total number of diagnostic cases, N_j is the number of diagnostic cases belonging to class C_j, X_ij is the number of diagnostic cases that belong to class C_j and have the value V_i (for a continuous-valued attribute, the value region), and X_i is the number of diagnostic cases that have the value V_i (for a continuous-valued attribute, the value region).
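The quantities X_ij, X_i, N_j and N_0 are exactly those that appear in the textbook count-based expression for the mutual information between an attribute and the class. As a reference point only, and not as a transcription of the patent's own formula, that expression can be written as:

```latex
\[
I(A) = \frac{1}{N_0}\Bigl[\,\sum_i \sum_j X_{ij}\log_2 X_{ij}
       \;-\; \sum_i X_i \log_2 X_i
       \;-\; \sum_j N_j \log_2 N_j
       \;+\; N_0 \log_2 N_0 \Bigr]
\]
```

It follows from I(A) = Σ_i Σ_j p_ij log2(p_ij / (p_i p_j)) with p_ij = X_ij / N_0, p_i = X_i / N_0 and p_j = N_j / N_0. For a single threshold θ there are two regions, and the cell counts X_ij become F_j(θ) and N_j − F_j(θ).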

The larger the mutual information I(A), the greater the degree of correlation between the attribute and the class. The attribute with the largest mutual information I(A) is therefore assigned to the next lower node, and the classification tree is generated in this way.

Finally, the processing of the present invention is described concretely, following the classification decision tree generation procedure explained in [Prior Art].

Following step (st4) of the classification decision tree generation procedure, the degree of correlation between each attribute and the class is calculated. If "weight", partitioned as shown in FIG. 7, is judged to be the attribute with the highest degree of correlation, then, as shown in FIG. 10, "weight" is assigned to the root node, and the intervals [54.3, 65.4), [65.4, 89.8) and [89.8, 102.1] are assigned as labels to the three branches leaving the root node. The set of training examples is then divided into three subsets according to which of the three intervals each weight value falls into, and the classification decision tree generation procedure is applied recursively to each subset. Here 54.3 kg is the minimum weight among the diagnostic cases and 102.1 kg is the maximum.

Following step (st3) of the procedure, when all elements of the set of training examples whose weight falls in the interval [54.3, 65.4) are found to belong to the class "examination not required", the node assigned to that branch is set to a leaf node labeled "not required", as shown in FIG. 10. Likewise, when all elements of the set of training examples whose weight falls in the interval [89.8, 102.1] are found to belong to the class "examination required", the node assigned to that branch is set to a leaf node labeled "required", as shown in FIG. 10.

The elements of the set of training examples whose weight falls in the interval [65.4, 89.8) do not all belong to the same class. Following step (st4) of the procedure, attribute thresholds are therefore determined as necessary for the attributes other than "weight", contingency tables are created, and the degree of correlation between each attribute and the class is calculated. If this calculation shows "urinalysis" to be the attribute with the highest degree of correlation, the set of training examples whose weight falls in [65.4, 89.8) is divided further according to the urinalysis values "○" and "×".
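The recursive procedure walked through above can be sketched in Python as follows. This is a simplified illustration under several assumptions: continuous attributes are taken to be already replaced by their interval index, mutual information is computed in the textbook form H(class) + H(attribute) − H(attribute, class), and a majority-class fallback is used when no attributes remain. The names and the toy data, including the outcomes under the urinalysis node, are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy in bits of a sequence of hashable items."""
    n = len(items)
    return -sum(c / n * log2(c / n) for c in Counter(items).values())

def build_tree(cases, attributes):
    """cases: list of (attribute_dict, class_label). Returns a nested dict or a label."""
    labels = [label for _, label in cases]
    if len(set(labels)) == 1:            # all cases share one class: leaf node
        return labels[0]
    if not attributes:                   # assumed fallback: majority class
        return Counter(labels).most_common(1)[0][0]

    def score(attr):                     # mutual information between attr and class
        vals = [row[attr] for row, _ in cases]
        return entropy(labels) + entropy(vals) - entropy(list(zip(vals, labels)))

    best = max(attributes, key=score)    # attribute with the highest correlation
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row, _ in cases}):
        subset = [(row, label) for row, label in cases if row[best] == value]
        branches[value] = build_tree(subset, rest)
    return {best: branches}

# Hypothetical cases: weight_band is the index of the weight interval.
cases = [
    ({"weight_band": 0, "urine": "o"}, "not required"),
    ({"weight_band": 0, "urine": "x"}, "not required"),
    ({"weight_band": 1, "urine": "o"}, "not required"),
    ({"weight_band": 1, "urine": "x"}, "required"),
    ({"weight_band": 2, "urine": "o"}, "required"),
    ({"weight_band": 2, "urine": "x"}, "required"),
]
tree = build_tree(cases, ["weight_band", "urine"])
```

On this toy data the weight band is chosen for the root, its two outer branches become leaves, and only the middle band is split again on the urinalysis value, which mirrors the shape of the tree described for FIG. 10.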

The present invention has been described in detail above with reference to the illustrated embodiment, but it is not limited to that embodiment. For example, it is not restricted to cases with two classes such as "examination required" and "examination not required"; it applies unchanged to cases with more classes. The generation of classification knowledge in IF-THEN form that corresponds one-to-one to the classification decision tree also falls within the scope of the present invention.
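One way to picture the correspondence between a decision tree and IF-THEN knowledge is to emit one rule per root-to-leaf path. The Python sketch below does this for a tree stored as nested dictionaries; the data layout and the rule syntax are assumptions of this example, since the patent does not specify either. Only the two branches whose leaf labels are stated in the description are included.

```python
def tree_to_rules(tree, conditions=()):
    """Yield one IF-THEN rule string per root-to-leaf path."""
    if not isinstance(tree, dict):               # leaf: emit the accumulated path
        premise = " AND ".join(conditions) or "TRUE"
        yield f"IF {premise} THEN class = {tree}"
        return
    (attribute, branches), = tree.items()        # one attribute per internal node
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + (f"{attribute} in {value}",))

# Hypothetical encoding; the intervals and leaf labels follow the weight example.
example_tree = {
    "weight": {
        "[54.3, 65.4)": "not required",
        "[89.8, 102.1]": "required",
    }
}
rules = list(tree_to_rules(example_tree))
```

Each rule's premise is the conjunction of the branch conditions along one path, so the rule set and the tree carry the same information.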

[Effects of the Invention]

As described above, according to the present invention, the thresholds at which an attribute taking a plurality of orderable values should be partitioned can be determined objectively.

Moreover, because the thresholds obtained in this way carry more information for inferring the class than other thresholds do, a classification decision tree that yields more accurate inference results can be generated.

[Brief Description of the Drawings]

FIG. 1 is a diagram of the principle configuration of the present invention; FIGS. 2 and 3 are flowcharts executed by the present invention; FIG. 4 is an example of a set of training examples; FIG. 5 is an example of the threshold contingency table created by the processing of step 21 in FIG. 3; FIG. 6 is an explanatory diagram of the threshold contingency table; FIG. 7 is an example of a contingency table for a continuous-valued attribute; FIG. 8 is an explanatory diagram of the contingency table for a continuous-valued attribute; FIG. 9 is an explanatory diagram of the contingency table for a discrete-valued attribute; FIG. 10 is an example of a generated classification decision tree; FIG. 11 is a configuration diagram of the classification decision device; FIG. 12 is an explanatory diagram of a classification decision tree; FIG. 13 is a configuration diagram of the classification decision device; FIG. 14 is an explanatory diagram of a set of training examples; FIGS. 15 and 16 are explanatory diagrams of the overlap between attributes and classes; FIGS. 17 and 18 are explanatory diagrams of the prior art; and FIG. 19 is an explanatory diagram of the problems of the prior art.

In the figures, 1 is a knowledge base, 2 is a diagnostic case input device, 3 is a class inference device, 4 is a classification result output device, 5 is a training example input device, 6 is a classification decision tree generation device, 60 is attribute mode judgment means, 61 is attribute threshold selection means, 62 is threshold contingency table generation means, 63 is mutual information calculation means, 64 is attribute partition determination means, 65 is contingency table generation means, 66 is correlation degree calculation means, 67 is maximum correlation degree detection means, and 68 is lower node allocation means.

Claims

[Claims] A classification decision tree generation device (6) that, from a set of training examples representing the relationship between attributes and classes, selects an attribute according to the degree of correlation between the attribute and the class and assigns the selected attribute to a lower node, thereby generating a classification decision tree that branches from higher nodes to lower nodes, the device comprising: attribute threshold selection means (61) for selecting a plurality of attribute thresholds from the value range of an attribute when the attribute of the training examples takes a plurality of values that can be ordered; threshold contingency table generation means (62) for generating, for each selected attribute threshold, a threshold contingency table describing the class states of the training examples partitioned by that attribute threshold; mutual information calculation means (63) for calculating, for each selected attribute threshold and according to a predetermined mutual information formula, the mutual information between the attribute and the class when partitioned by that attribute threshold, using the threshold contingency table generated by the threshold contingency table generation means (62); and attribute partition determination means (64) for partitioning the attribute according to the attribute thresholds at which the mutual information calculated by the mutual information calculation means (63) takes a significant local maximum; wherein the degree of correlation between the attribute partitioned by the attribute partition determination means (64) and the class is calculated, and, when an attribute taking a plurality of orderable values is selected as a lower node of the classification decision tree on the basis of the calculated degree of correlation, the attribute as partitioned by the attribute partition determination means (64) is assigned as that lower node.