JPH11250084A

JPH11250084A - Data mining device

Info

Publication number: JPH11250084A
Application number: JP10049739A
Authority: JP
Inventors: Susumu Shiraishi; 將白石; Hidetoshi Tanaka; 秀俊田中
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1998-03-02
Filing date: 1998-03-02
Publication date: 1999-09-17

Abstract

PROBLEM TO BE SOLVED: To allow a rich variety of preprocessing procedures to be set, to make it easy to repeat trial and error under gradually varied conditions in the mining process, and to simplify the various settings, by designating the procedure files to be used by a preprocessing execution means and the order in which they are applied. SOLUTION: In a preprocessing setting part 24, the user creates procedure files 312 and selects the procedure files 312 to be used. Next, mining execution 22 extracts association rules from an item database 304. The mining parameters and the items designated for the condition part and the conclusion part of the association rules are set by the user in advance through a mining setting part 25. Finally, result display 23 shows the extracted association rules on the screen. At this point, the analysis is advanced by extracting only the association rules that include a certain item, or by referring to the data to which a certain association rule applies. When the result is not satisfactory, the settings 24 and 25 are redone and execution is repeated.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a data mining apparatus that extracts association rules from a relational database.

[0002]

[Description of the Related Art] One data mining method for discovering regularities in a large amount of data is the extraction of association rules. An association rule is a rule of the form "A → B", meaning that "records in the database that contain A often also contain B". Hereinafter, A is called the condition part and B is called the conclusion part.
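As a minimal illustration of what "A → B" expresses, the sketch below counts how often records that contain A also contain B. The measures named `support` and `confidence` are the conventional ones from the association rule literature and are an assumption here, since the text above does not define them; the sample records and item names are hypothetical.

```python
def support(records, items):
    """Fraction of all records that contain every item in `items`."""
    items = set(items)
    return sum(1 for r in records if items <= set(r)) / len(records)


def confidence(records, condition, conclusion):
    """Among records containing the condition part, the fraction that
    also contain the conclusion part."""
    condition, conclusion = set(condition), set(conclusion)
    with_condition = [r for r in records if condition <= set(r)]
    if not with_condition:
        return 0.0
    both = condition | conclusion
    return sum(1 for r in with_condition if both <= set(r)) / len(with_condition)


# Hypothetical records, each a set of items.
records = [
    {"obesity:high", "blood_pressure:high"},
    {"obesity:high", "blood_pressure:high"},
    {"obesity:high", "blood_pressure:normal"},
    {"obesity:low", "blood_pressure:normal"},
]

print(support(records, {"obesity:high", "blood_pressure:high"}))       # 0.5
print(confidence(records, {"obesity:high"}, {"blood_pressure:high"}))  # 2 of 3 records
```

A rule is usually reported only when both measures exceed thresholds chosen by the analyst, which is one reason the frequency of each item in the data matters for what can be extracted.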

[0003] To process a database with a mining system, preprocessing such as discretization, which converts numerical attributes into symbolic values, is required. A method of configuring a data mining apparatus that includes such preprocessing is disclosed in Japanese Unexamined Patent Publication No. H8-77010, "Data analysis method and apparatus" (hereinafter referred to as the prior art).

[0004]

[Problems to be Solved by the Invention] When performing data mining, it is rare that all useful rules can be extracted in a single run; in many cases trial and error must be repeated many times while changing the preprocessing and mining settings.

[0005] The prior art describes an integrated data mining apparatus that includes preprocessing. However, apart from setting constraints on the rules to be generated, the only preprocessing it performs is discretization of numerical attributes and handling of missing values; it has no functions such as grouping of attribute values or selection of records that satisfy specific conditions. Fine-grained preprocessing is therefore impossible, and consequently there is no function for efficiently carrying out trial and error in applying preprocessing with a rich set of preprocessing procedures.

[0006] A first object of the present invention is to solve the above problems and to provide an apparatus in which a rich variety of preprocessing procedures can be set, in which trial and error can easily be repeated in the mining process while changing the conditions little by little, and in which the various settings can be made easily.

[0007] When the association rules obtained by mining are displayed, the user often extracts and views only the association rules that include a certain item. After doing so, however, it frequently turns out that the extracted association rules are too numerous to examine one by one by hand, and the processing must be redone. The prior art has no function for displaying in advance, for each item, information about the association rules that include that item.

[0008] A second object of the present invention is to solve the above problem and to provide an apparatus that makes it easy for the user to decide, in the association rule display, which item to use when extracting association rules for viewing.

[0009]

[Means for Solving the Problems] According to a first invention, a data mining apparatus comprises: preprocessing execution means that takes a relational database as input and, in accordance with the user's specification, outputs an item database composed of a plurality of records, each consisting of a set of items; mining execution means that takes the item database as input, extracts association rules between the items of the item database, and outputs the result as a result rule file; and result display means that displays the contents of the result rule file. The apparatus further comprises preprocessing setting means having procedure file editing means and procedure file application setting means. The procedure file editing means takes as input a procedure file that records procedures for converting the relational database into the item database, and edits the procedure file by adding, deleting, or changing any of the following procedures: a procedure for discretizing numerical attributes, a procedure for grouping attribute values, a procedure for replacing attribute values with a null value, a procedure for deleting attributes, a procedure for selecting records that satisfy specific conditions, an itemization procedure that attaches attribute information to attribute values, and a procedure for grouping items. The procedure file application setting means designates the one or more procedure files to be used by the preprocessing execution means and the order in which they are applied.

[0010] According to a second invention, the data mining apparatus of the present invention further comprises: database analysis means that takes the relational database as input and outputs a data dictionary describing, for each attribute, the attribute names and attribute value information in the relational database; data dictionary conversion means that takes as input the data dictionary and the one or more ordered, already edited procedure files designated by the procedure file application setting means, and outputs an intermediate data dictionary that is the result of applying the procedure files to the data dictionary; and display attribute hierarchy generation means that takes as input the data dictionary or the intermediate data dictionary together with attribute hierarchy information, which describes the relationships among the attributes of the relational database as a hierarchy, deletes from the attribute hierarchy the attributes not contained in the intermediate data dictionary, and outputs a hierarchy in which the attribute values are attached below each attribute. The preprocessing setting means further has preprocessing-time attribute hierarchy display means that displays the hierarchy output by the display attribute hierarchy generation means.
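One way to picture the display attribute hierarchy generation means is as a recursive prune-and-attach pass over a tree. The sketch below is only an illustration under stated assumptions: the nested-dict encoding of the attribute hierarchy, the dict-of-lists encoding of the data dictionary, the function name `build_display_hierarchy`, the sample attribute names, and the choice to drop groups that end up empty are all hypothetical and are not specified in the text.

```python
def build_display_hierarchy(hierarchy, data_dictionary):
    """Prune attributes missing from the dictionary and attach attribute values.

    hierarchy: nested dict; a group maps to a dict, an attribute maps to None.
    data_dictionary: dict mapping attribute name -> list of attribute values.
    """
    result = {}
    for name, child in hierarchy.items():
        if child is None:
            # Attribute node: keep it only if the dictionary still contains it.
            if name in data_dictionary:
                result[name] = list(data_dictionary[name])
        else:
            # Group node: recurse; dropping empty groups is an assumption.
            subtree = build_display_hierarchy(child, data_dictionary)
            if subtree:
                result[name] = subtree
    return result


# Hypothetical attribute hierarchy for a health examination database.
hierarchy = {
    "general": {"sex": None, "age": None},
    "body measurements": {"height": None, "grip_right": None},
}

# Hypothetical intermediate data dictionary: ages and heights have been
# discretized and the grip strength attribute has been deleted.
intermediate_dictionary = {
    "sex": ["male", "female"],
    "age": ["30 or less", "30-40", "40-50", "50 or more"],
    "height": ["low", "medium", "high"],
}

display = build_display_hierarchy(hierarchy, intermediate_dictionary)
print(display["body measurements"])  # {'height': ['low', 'medium', 'high']}
```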

[0011] According to a third invention, the preprocessing-time attribute hierarchy display means of the data mining apparatus of the present invention is configured, when displaying the hierarchy, to display attribute groups, attributes, and attribute values in mutually different forms so that the user can tell them apart.

[0012] According to a fourth invention, the procedure file editing means of the data mining apparatus of the present invention is configured to allow editing of the procedures relating to the node that the user has selected in the preprocessing-time attribute hierarchy display means.

[0013] According to a fifth invention, the procedure file editing means of the data mining apparatus of the present invention is configured so that, when the user selects a node in the preprocessing-time attribute hierarchy display means and adds a procedure for discretizing numerical attributes or for deleting attributes, the same procedure is automatically added for all attribute nodes below that node.

[0014] According to a sixth invention, the preprocessing-time attribute hierarchy display means of the data mining apparatus of the present invention is configured, when displaying the hierarchy, to highlight the nodes that correspond to the procedures in the procedure file being edited.

[0015] According to a seventh invention, the preprocessing setting means of the data mining apparatus of the present invention is configured to create a first procedure file that contains procedures for discretizing numerical attributes; a second procedure file that contains procedures for grouping attribute values, replacing attribute values with a null value, deleting attributes, and selecting records that satisfy specific conditions, and that depends on the first procedure file; and a third procedure file that contains procedures for itemization, which attaches attribute information to attribute values, and for grouping items, and that depends on the first and second procedure files. It is further configured to guarantee that the first, second, and third procedure files are applied to the relational database in that order.

[0016] According to an eighth invention, the preprocessing setting means of the data mining apparatus of the present invention comprises procedure file application setting means that allows only sets of procedure files with correct dependency relationships to be selected from among the previously edited first, second, and third procedure files.

[0017] According to a ninth invention, the data mining apparatus of the present invention comprises mining setting means having mining-time attribute hierarchy display means and condition/conclusion part setting means. The mining-time attribute hierarchy display means displays the hierarchy output by the display attribute hierarchy generation means. The condition/conclusion part setting means designates, for each item in the item database, one of four modes: the item may appear only in the condition part of an association rule, may appear only in the conclusion part, may appear in either the condition part or the conclusion part, or must not appear in any association rule. The condition/conclusion part setting means is configured so that, when a node of the mining-time attribute hierarchy display means is selected and a mode is designated, all nodes below that node are automatically assigned the same mode.

[0018] According to a tenth invention, the mining-time attribute hierarchy display means of the data mining apparatus of the present invention is configured to vary the display form of each node according to the mode designated by the condition/conclusion part setting means, so that the user can tell the modes apart.

[0019] According to an eleventh invention, the mining-time attribute hierarchy display means of the data mining apparatus of the present invention is configured to highlight a node when, as a result of the mode designation by the condition/conclusion part setting means, that node and all nodes below it have the same mode.
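The mode propagation of the ninth invention and the highlight condition of the eleventh invention can be sketched as two small recursive functions over a tree. This is a minimal sketch under assumptions: the `Node` class, the `Mode` names, the default mode, and the sample tree are hypothetical, and the text does not prescribe any particular data structure.

```python
from enum import Enum


class Mode(Enum):
    CONDITION_ONLY = "may appear only in the condition part"
    CONCLUSION_ONLY = "may appear only in the conclusion part"
    BOTH = "may appear in either part"
    EXCLUDED = "must not appear in any rule"


class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.mode = Mode.BOTH  # assumed default; the text does not state one


def set_mode(node, mode):
    """Assign `mode` to the selected node and to every node below it."""
    node.mode = mode
    for child in node.children:
        set_mode(child, mode)


def is_uniform(node):
    """True when the node and all nodes below it share one mode
    (the condition under which the node would be highlighted)."""
    return all(
        child.mode == node.mode and is_uniform(child) for child in node.children
    )


# Hypothetical hierarchy: a group, two attributes, and attribute values.
height = Node("height", [Node("low"), Node("medium"), Node("high")])
grip = Node("grip_right")
root = Node("body measurements", [height, grip])

set_mode(height, Mode.CONDITION_ONLY)
print(is_uniform(height))  # True: height and its values share one mode
print(is_uniform(root))    # False: the group still differs from height
```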

[0020] According to a twelfth invention, the result display screen of the data mining apparatus of the present invention is configured to display all items together with the number of association rules that include each item.
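The per-item rule count described for the result display can be computed in a single pass over the extracted rules. In the sketch below, the representation of a rule as a `(condition, conclusion)` pair of item sets, the function name `rules_per_item`, and the sample rules are assumptions made for illustration only.

```python
from collections import Counter


def rules_per_item(rules):
    """Count, for each item, how many rules mention it in either part."""
    counts = Counter()
    for condition, conclusion in rules:
        # Use the union so an item is counted once per rule.
        for item in set(condition) | set(conclusion):
            counts[item] += 1
    return counts


# Hypothetical extracted rules: (condition part, conclusion part).
rules = [
    ({"obesity:high"}, {"blood_pressure:high"}),
    ({"obesity:high", "sex:male"}, {"blood_pressure:high"}),
    ({"sex:male"}, {"height:high"}),
]

counts = rules_per_item(rules)
for item, n in sorted(counts.items()):
    print(item, n)
```

Showing these counts before the user filters by item lets the user see in advance whether a filter would return too many rules to review by hand.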

[0021]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. This embodiment describes the case of extracting association rules from a health examination database.

[0022] FIG. 1 shows an example of a health examination database. This database consists of records for a number of people. Each person's record contains general information such as name ID, sex, and age, as well as body measurement data such as height, medical questionnaire response data, detailed examination data, and so on. For example, in the record of the person with ID 0001, the sex is "male", the age is "22", the height is "159", the grip strength (right) is "44", and the stiff shoulders field of the questionnaire response data holds "always". By processing the data in such a database with a data mining apparatus, the user can extract relationships between lifestyle habits and health status and use them for disease prevention. There are various data mining methods; the present data mining apparatus extracts association rules. From the health examination database, association rules such as "high obesity → high blood pressure" are extracted, for example. This association rule means that people with a high degree of obesity often also have high blood pressure.

[0023] FIG. 2 shows the flow of data mining processing. First, preprocessing execution 21 creates an item database 304 from the relational database 301 in which the records to be analyzed are stored. Unlike the relational database 301, the item database 304 is made up of a plurality of variable-length records, each consisting of a set of items. Preprocessing performs processing such as discretization of continuous values, narrowing down of the data, and itemization, which attaches attribute information to attribute values; the details are described later. The content of the preprocessing is set in advance by the user through preprocessing setting 24, in which the user creates procedure files and selects the procedure files to be used. Next, mining execution 22 extracts association rules from the item database 304. The mining parameters and the items designated for the condition part and the conclusion part of the association rules are set in advance by the user through mining setting 25. Finally, result display 23 displays the extracted association rules on the screen. At this point, the user advances the analysis by extracting and viewing only the association rules that include a certain item, or by referring to the data to which a certain association rule applies. If the results are not satisfactory, the user redoes preprocessing setting 24 or mining setting 25 and repeats the execution.

[0024] FIG. 3 is a detailed block diagram of the entire data mining apparatus shown in FIG. 2. FIG. 4 is a detailed block diagram of the preprocessing setting means 311 of FIG. 3. FIG. 5 is a detailed block diagram of the mining setting means 313 of FIG. 3.

Preprocessing execution

[0025] Preprocessing execution is described with reference to FIGS. 2 and 3. Each procedure file stored in the procedure files 312 consists of one or more procedures. Each procedure is a character string that describes a transformation to be applied to the database. There are three types of procedure file: the first procedure file, the second procedure file, and the third procedure file.

[0026] The first procedure file contains procedures for discretizing numerical attributes. Discretizing a numerical attribute means dividing the range that the attribute takes into several regions by boundary values and giving each region a distinct name, which becomes a new attribute value. If the number of records containing a given item is small relative to the whole, as tends to be the case for individual raw numerical values, then by the nature of the association rule extraction algorithm no association rule containing that item is extracted. The discretization procedure is therefore necessary.

[0027] The second procedure file contains procedures for narrowing down the data, namely grouping attribute values, replacing attribute values with a null value, deleting attributes, and selecting records that satisfy specific conditions. Grouping attribute values means merging several attribute values into a single attribute value. Replacing an attribute value with a null value means deleting that attribute value. If the records containing a given item make up a large proportion of the whole, then by the nature of the association rule extraction algorithm an enormous number of association rules containing that item are generated; the attribute value is deleted, for example, to prevent this.

[0028] The third procedure file contains procedures for itemization, which attaches attribute information to attribute values, and for grouping items. Without itemization the attribute information is not reflected, so that, for example, an association rule that should read "high obesity → high blood pressure" would be output in the form "high → high". Itemization is therefore essential for outputting meaningful association rules. Applying the itemization procedures to the database produces an item database made up of a plurality of variable-length records.

[0029] In FIG. 3, when the user 318 issues an instruction to execute preprocessing, the preprocessing execution means 302 applies the first, second, and third procedure files, selected in advance by the procedure file application setting means 44 shown in FIG. 4, to the relational database 301 sequentially in that order and generates the item database 304.

[0030] FIG. 6 shows the flow of preprocessing execution in this embodiment. In FIG. 6, step 61 reads data from the relational database 301, step 62 discretizes the data that was read, step 63 narrows down the data, step 64 itemizes the data, and step 65 generates the item database 304. The apparatus can also be set to save not only the item database 304 obtained after all procedure files have been applied but also an intermediate database 303 after each procedure file is applied. In that case, step 66 generates an intermediate database 303 from the data discretized in step 62, and step 67 generates an intermediate database 303 from the data narrowed down in step 63. These intermediate databases 303 have the same structure as the relational database 301.
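The step sequence of FIG. 6 can be sketched as a pipeline that applies three stages in a fixed order and optionally keeps the data after each non-final stage. This is a sketch under assumptions, not the apparatus itself: the function names, the list-of-dicts encoding of the database, the `attribute:value` item format, and the toy stage bodies (including the hypothetical label "over 30") are all illustrative.

```python
def discretize(rows):
    # Stage 1 (step 62): turn the numeric age into a symbolic value.
    return [{**r, "age": "30 or less" if r["age"] <= 30 else "over 30"} for r in rows]


def narrow(rows):
    # Stage 2 (step 63): delete the ID attribute.
    return [{k: v for k, v in r.items() if k != "id"} for r in rows]


def itemize(rows):
    # Stage 3 (step 64): each record becomes a set of "attribute:value" items.
    return [{f"{k}:{v}" for k, v in r.items()} for r in rows]


def run_preprocessing(database, stages, save_intermediate=False):
    """Apply the stages in order (steps 62 to 65); optionally keep the data
    after each non-final stage (steps 66 and 67)."""
    data = database
    intermediates = []
    for index, stage in enumerate(stages):
        data = stage(data)
        if save_intermediate and index < len(stages) - 1:
            intermediates.append(data)
    return data, intermediates


relational_db = [{"id": "0001", "sex": "male", "age": 22}]
item_db, saved = run_preprocessing(
    relational_db, [discretize, narrow, itemize], save_intermediate=True
)
print(item_db)

# Restarting from a saved intermediate database skips the stages already run.
resumed, _ = run_preprocessing(saved[0], [narrow, itemize])
print(resumed == item_db)  # True
```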

[0031] With this kind of preprocessing execution, when preprocessing is rerun under slightly different conditions, it is enough to edit the existing procedure files and apply them again, so trial and error is easy. In addition, providing a data mining apparatus with a framework for sequentially applying the three types of procedure file described above makes it possible to perform necessary and sufficient preprocessing.

[0032] Furthermore, by saving the intermediate database 303 after each procedure file is applied, the intermediate database 303 can be given to the preprocessing execution means 302 as input in place of the relational database 301, which saves the effort of repeating processing that has already been executed.

Examples of procedure files

[0033] FIG. 7 shows an example of the first procedure file. In FIG. 7, each line corresponds to one procedure. For example, procedure 71 on the first line of the first procedure file 70 is defined as "Discretize attribute <age> at boundary values <30> <40> <50> into attribute values <30 or less> <30-40> <40-50> <50 or more>". Executing this procedure discretizes the attribute <age>, with 30, 40, and 50 as boundary points, into data groups whose attribute values are <30 or less>, <30-40>, <40-50>, and <50 or more>. The other procedure lines work in the same way. As in procedures 72 and 73, discretization procedures whose boundary values differ according to the value of another attribute can also be supported. Procedure 72 on the second line is defined, for example, as "Under the condition <sex = male>, discretize attribute <height> at boundary values <160> <180> into attribute values <low> <medium> <high>". Executing this procedure discretizes the attribute <height>, under the condition <sex = male> and with 160 and 180 as boundary points, into the attribute values <low>, <medium>, and <high>. Procedure 73 on the third line is defined, for example, as "Under the condition <sex = female>, discretize attribute <height> at boundary values <150> <170> into attribute values <low> <medium> <high>". Executing this procedure discretizes the attribute <height>, under the condition <sex = female> and with 150 and 170 as boundary points, into the attribute values <low>, <medium>, and <high>. Procedure 74 on the fourth line is defined, for example, as "Discretize attribute <grip strength (right)> at boundary values <30> <40> <50> <60> into attribute values <30 or less> <30-40> <40-50> <50-60> <60 or more>". Executing this procedure discretizes the attribute <grip strength (right)>, with 30, 40, 50, and 60 as boundary points, into <30 or less>, <30-40>, <40-50>, <50-60>, and <60 or more>.
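The four procedures of the first procedure file, including the two whose boundaries depend on another attribute, can be sketched as a small rule table applied to each record. The rule encoding, the function name `apply_discretization`, the English attribute names, and in particular the treatment of a value that equals a boundary (here assigned to the lower interval) are assumptions; the text does not say which interval a boundary value belongs to.

```python
from bisect import bisect_left

# Each rule: (condition on other attributes or None, attribute, boundaries, labels).
# The values mirror procedures 71 to 74; the encoding itself is hypothetical.
RULES = [
    (None, "age", [30, 40, 50], ["30 or less", "30-40", "40-50", "50 or more"]),
    ({"sex": "male"}, "height", [160, 180], ["low", "medium", "high"]),
    ({"sex": "female"}, "height", [150, 170], ["low", "medium", "high"]),
    (None, "grip_right", [30, 40, 50, 60],
     ["30 or less", "30-40", "40-50", "50-60", "60 or more"]),
]


def apply_discretization(record, rules):
    """Return a copy of `record` with numeric attributes replaced by labels."""
    result = dict(record)
    for condition, attribute, boundaries, labels in rules:
        if condition and any(record.get(k) != v for k, v in condition.items()):
            continue  # conditional rule does not apply to this record
        # Assumption: a value equal to a boundary falls in the lower interval.
        result[attribute] = labels[bisect_left(boundaries, record[attribute])]
    return result


# The person with ID 0001 from the FIG. 1 example.
person = {"id": "0001", "sex": "male", "age": 22, "height": 159, "grip_right": 44}
discretized = apply_discretization(person, RULES)
print(discretized)
```

Under these assumptions a height of 159 is labeled "low" for a male record but "medium" for a female record, which is the effect the conditional procedures 72 and 73 are meant to achieve.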

[0034] FIG. 8 shows the intermediate database obtained by applying the first procedure file of FIG. 7 to the relational database of FIG. 1. FIG. 8 shows only the results: procedure 71 on the first line of FIG. 7 has discretized the attribute <age> into <30 or less>, <30-40>, <40-50>, and <50 or more>; procedures 72 and 73 on the second and third lines have discretized the attribute <height> into <low>, <medium>, and <high>; and procedure 74 on the fourth line has discretized the attribute <grip strength (right)> into <30 or less>, <30-40>, <40-50>, <50-60>, and <60 or more>.

[0035] FIG. 9 shows an example of the second procedure file. In FIG. 9, procedure 91 on the first line is an example of an attribute deletion procedure; it is defined as "Delete attribute <ID>". Procedure 92 on the second line is an example of a procedure for selecting records that satisfy a specific condition; it is defined as "Select attribute values <low> <high> of attribute <height>". Procedure 93 on the third line is an example of an attribute deletion procedure; it is defined as "Delete attribute <grip strength (right)>". Procedure 94 on the fourth line is an example of a procedure for grouping attribute values; it is defined as "Group attribute values <always> <sometimes> of attribute <stiff shoulders> into <yes>". Procedure 95 on the fifth line is an example of a procedure for replacing an attribute value with a null value; it is defined as "Delete attribute value <no> of attribute <stiff shoulders>".

[0036] FIG. 10 shows the intermediate database obtained by applying the second procedure file of FIG. 9 to the intermediate database of FIG. 8. FIG. 10 shows only the results: procedure 91 on the first line of FIG. 9 has deleted every field of the attribute <ID>; procedure 92 on the second line has selected only the attribute values <low> and <high> of the attribute <height>; procedure 93 on the third line has deleted the attribute <grip strength (right)>; procedure 94 on the fourth line has grouped the attribute values <always> and <sometimes> of the attribute <stiff shoulders> into <yes>; and procedure 95 on the fifth line has deleted the attribute value <no> of the attribute <stiff shoulders>.
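The five procedures of the second procedure file can be read as successive transformations of a list of records. The sketch below applies them in the order of procedures 91 to 95. The function name `apply_second_file`, the English attribute names, the use of `None` as the null value, and the sample rows other than ID 0001 are assumptions made for illustration.

```python
def apply_second_file(rows):
    """Apply procedures 91 to 95 in order to a list of discretized records."""
    # 91: delete attribute <ID>.
    rows = [{k: v for k, v in r.items() if k != "id"} for r in rows]
    # 92: select records whose <height> is <low> or <high>.
    rows = [r for r in rows if r["height"] in ("low", "high")]
    # 93: delete attribute <grip strength (right)>.
    rows = [{k: v for k, v in r.items() if k != "grip_right"} for r in rows]
    # 94: group <always> and <sometimes> of <stiff shoulders> into <yes>.
    grouping = {"always": "yes", "sometimes": "yes"}
    rows = [
        {**r, "stiff_shoulders": grouping.get(r["stiff_shoulders"], r["stiff_shoulders"])}
        for r in rows
    ]
    # 95: replace <no> of <stiff shoulders> with a null value (None here).
    rows = [
        {**r, "stiff_shoulders": None if r["stiff_shoulders"] == "no" else r["stiff_shoulders"]}
        for r in rows
    ]
    return rows


# Hypothetical discretized records in the style of FIG. 8.
rows = [
    {"id": "0001", "sex": "male", "height": "low",
     "grip_right": "40-50", "stiff_shoulders": "always"},
    {"id": "0002", "sex": "female", "height": "medium",
     "grip_right": "30 or less", "stiff_shoulders": "no"},
    {"id": "0003", "sex": "female", "height": "high",
     "grip_right": "30-40", "stiff_shoulders": "no"},
]

narrowed = apply_second_file(rows)
for row in narrowed:
    print(row)
```

The output keeps the same fixed-column shape as its input, which matches the statement that intermediate databases have the same structure as the relational database.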

[0037] FIG. 11 shows an example of the third procedure file. In FIG. 11, procedure 111 on the first line is defined as "Itemize attribute value <male> of attribute <sex> as <sex: male>". Procedure 112 on the second line is defined as "Itemize attribute value <female> of attribute <sex> as <sex: female>". Procedure 113 on the third line is defined as "Itemize attribute value <male> of attribute <sex> together with attribute value <30 or less> of attribute <age> as <youth>". Procedure 114 on the fourth line is defined as "Itemize attribute value <30 or less> of attribute <age> as <age: 30 or less>".

【００３８】図１０の中間データベスに図１１の第３手
続きファイルを適用した結果得られる中間データベース
を図１２に示す。図１２においては、上記の図１１の第
１行の手続き１１１において、属性＜性別＞の属性値＜
男＞が＜性別：男＞に項目化され、第２行の手続き１１
２において、属性＜性別＞の属性値＜女＞が＜性別：女
＞に項目化され、第３行の手続き１１３において、属性
＜性別＞の属性値＜男＞かつ属性＜年齢＞の属性値＜３
０以下＞が＜青年＞に項目化され、第４行の手続き１１
４おいて、属性＜年齢＞の属性値＜３０以下＞が＜年
齢：３０以下＞に項目化された結果のみが表示されてい
ることが分かる。FIG. 12 shows the intermediate database obtained by applying the third procedure file of FIG. 11 to the intermediate database of FIG. 10. FIG. 12 displays only the result of the following operations: the procedure 111 on the first line of FIG. 11 itemizes the attribute value <male> of the attribute <sex> as <sex: male>; the procedure 112 on the second line itemizes the attribute value <female> of the attribute <sex> as <sex: female>; the procedure 113 on the third line itemizes the attribute value <male> of the attribute <sex> together with the attribute value <30 or less> of the attribute <age> as <youth>; and the procedure 114 on the fourth line itemizes the attribute value <30 or less> of the attribute <age> as <age: 30 or less>.
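Itemization as described for FIG. 11 maps one attribute value, or a conjunction of attribute values, to an item. The sketch below assumes a hypothetical representation in which each procedure is a pair of a condition dictionary and an item name; the patent does not specify a file format, so this layout is illustrative only.

```python
def itemize(record, procedures):
    """Return the set of items whose conditions all hold for `record`."""
    return {item for conditions, item in procedures
            if all(record.get(attr) == value for attr, value in conditions.items())}

# Procedures 111 to 114 of FIG. 11 in the assumed (conditions, item) form.
procedures = [
    ({"sex": "male"}, "sex: male"),
    ({"sex": "female"}, "sex: female"),
    ({"sex": "male", "age": "30 or less"}, "youth"),
    ({"age": "30 or less"}, "age: 30 or less"),
]

row = {"sex": "male", "age": "30 or less"}
items = itemize(row, procedures)
```

A record can therefore yield several items at once, including a combined item such as <youth> alongside the single-attribute items.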

【００３９】手続きファイルの選択図２の前処理設定２４において、使用手続きファイルを
選択する処理について説明する。基本的には手続きファ
イルは前段の手続きファイルに依存する。つまり、第２
手続きファイルは第１手続きファイルに依存し、また第
３手続きファイルは第２手続きファイルに依存してい
る。例えば、第１手続きファイルで属性値＜年齢＞を属
性値＜２０以下＞と属性値＜２０以上＞に離散化した場
合、第２手続きファイルで「属性＜年齢＞の属性値＜３
０以上＞を削除します」という手続きがあってもこれは
正常に適用されない。第１手続きで属性値＜２０以下＞
と属性値＜２０以上＞に離散化されたために、＜３０以
上＞の属性値は存在しないからである。
Selection of Procedure File
The process of selecting the procedure files to be used in the preprocessing setting 24 of FIG. 2 will now be described. Basically, each procedure file depends on the procedure file of the preceding stage: the second procedure file depends on the first procedure file, and the third procedure file depends on the second procedure file. For example, suppose the first procedure file discretizes the attribute <age> into the attribute values <20 or less> and <20 or more>. Then a procedure in the second procedure file such as “delete the attribute value <30 or more> of the attribute <age>” is not applied correctly, because after that discretization no attribute value <30 or more> exists.

【００４０】手続きファイル間の依存関係の例を図１３
に示す。図１３において、第３手続きファイルＡＡＡと
ＡＡＢが第２手続きファイルＡＡに依存し、第３手続き
ファイルＡＢＡとＡＢＢが第２手続きファイルＡＢに依
存し、第３手続きファイルＢＡＡが第２手続きファイル
ＢＡに依存し、第２手続きファイルＡＡとＡＢが第１手
続きファイルＡに依存し、第２手続きファイルＢＡが第
１手続きファイルＢに依存していることを表す。このよ
うなファイル依存関係は、図３のファイル管理手段３１
７で管理される。FIG. 13 shows an example of dependencies between procedure files. In FIG. 13, the third procedure files AAA and AAB depend on the second procedure file AA, the third procedure files ABA and ABB depend on the second procedure file AB, and the third procedure file BAA depends on the second procedure file BA; the second procedure files AA and AB depend on the first procedure file A, and the second procedure file BA depends on the first procedure file B. These file dependencies are managed by the file management means 317 of FIG. 3.
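The dependencies of FIG. 13 form a forest in which each procedure file has at most one parent. One way such a relation could be recorded is sketched below; the `PARENT` mapping and `is_valid_chain` are hypothetical names used for illustration, not the internal structure of the file management means 317.

```python
# Child file -> the file it depends on, following FIG. 13.
PARENT = {
    "AA": "A", "AB": "A", "BA": "B",
    "AAA": "AA", "AAB": "AA", "ABA": "AB", "ABB": "AB", "BAA": "BA",
}

def is_valid_chain(first, second, third):
    """True if `second` depends on `first` and `third` depends on `second`."""
    return PARENT.get(second) == first and PARENT.get(third) == second
```

With this mapping, a combination such as A, AB, AAA is rejected because AAA depends on AA rather than AB.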

【００４１】図１４は、使用手続きファイルを選択する
ための選択画面を示す図である。図１４（ａ）では、第
１手続きファイルリストボックス１４１でファイルＡが
選択され、第２手続きファイルリストボックス１４２で
は、ファイルＡと依存関係があるファイルＡＡとＡＢが
表示され、さらにファイルＡＡが選択されると、第３手
続きファイルリストボックス１４３ではファイルＡＡと
依存関係があるファイルＡＡＡとＡＡＢが表示される。FIG. 14 is a diagram showing the selection screen for selecting the procedure files to be used. In FIG. 14(a), file A is selected in the first procedure file list box 141, and the second procedure file list box 142 displays files AA and AB, which depend on file A. When file AA is then selected, the third procedure file list box 143 displays files AAA and AAB, which depend on file AA.

【００４２】図１４（ａ）の第２手続きファイルリスト
ボックスでファイルＡＢを選択した場合を図１４（ｂ）
に示す。図１４（ｂ）では、第１手続きファイルリスト
ボックス１４１でファイルＡが選択され、第２手続きフ
ァイルリストボックス１４２では、ファイルＡと依存関
係があるファイルＡＡとＡＢが表示され、さらにファイ
ルＡＢが選択されると、第３手続きファイルリストボッ
クス１４３ではファイルＡＢと依存関係があるファイル
ＡＢＡとＡＢＢが表示される。FIG. 14(b) shows the case where file AB is selected in the second procedure file list box of FIG. 14(a). In FIG. 14(b), file A is selected in the first procedure file list box 141, and the second procedure file list box 142 displays files AA and AB, which depend on file A. When file AB is then selected, the third procedure file list box 143 displays files ABA and ABB, which depend on file AB.

【００４３】また、図１４（ａ）の第１手続きファイル
リストボックスでファイルＢを選択した場合を図１４
（ｃ）に示す。図１４（ｃ）では、第１手続きファイル
リストボックス１４１でファイルＢが選択され、第２手
続きファイルリストボックス１４２では、ファイルＢと
依存関係があるファイルＢＡが表示され、さらにファイ
ルＢＡが選択されると、第３手続きファイルリストボッ
クス１４３ではファイルＢＡと依存関係があるファイル
ＢＡＡが表示される。以上のような方式を取ることによ
り、正しい依存関係にある手続きファイルの組みの適用
を保障することができる。FIG. 14(c) shows the case where file B is selected in the first procedure file list box of FIG. 14(a). In FIG. 14(c), file B is selected in the first procedure file list box 141, and the second procedure file list box 142 displays file BA, which depends on file B. When file BA is then selected, the third procedure file list box 143 displays file BAA, which depends on file BA. This scheme guarantees that only a set of procedure files with correct dependencies is applied.
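The cascading list boxes of FIG. 14 amount to listing, for the currently selected file, only the files that depend on it. A minimal sketch follows, assuming the dependencies are held in a hypothetical child-to-parent dictionary; the function name `dependents` is likewise an assumption.

```python
# Child file -> the file it depends on, following FIG. 13 (assumed layout).
parent_of = {
    "AA": "A", "AB": "A", "BA": "B",
    "AAA": "AA", "AAB": "AA", "ABA": "AB", "ABB": "AB", "BAA": "BA",
}

def dependents(parent_of, selected):
    """Files to show in the next list box once `selected` is chosen."""
    return sorted(name for name, parent in parent_of.items() if parent == selected)

second_box = dependents(parent_of, "A")   # contents of list box 142 in FIG. 14(a)
third_box = dependents(parent_of, "AB")   # contents of list box 143 in FIG. 14(b)
```

Because each list box offers only dependents of the previous selection, the user cannot assemble an inconsistent set of procedure files.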

【００４４】マイニング実行およびマイニング設定図２の前処理実行２１が終了してマイニングの対象とな
る項目データベース３０４が生成されると、次にマイニ
ング実行２２を行う。マイニング実行に先立って、利用
者は事前にマイニング設定２５を行う。図５はマイニン
グ設定の詳細を示す図である。図５において、パラメー
タ設定手段５３がマイニングのパラメータを設定し、ま
た条件部結論部設定手段５２が相関ルールの条件部と結
論部の項目を設定する。マイニング時属性階層表示手段
５１は表示用属性階層生成手段３１４からの指示によっ
て、利用者３１８のためにマイニング時の属性階層を表
示する。なお、マイニングのパラメータは用いるアルゴ
リズムに依存する。
Mining Execution and Mining Setting
When the preprocessing execution 21 of FIG. 2 is completed and the item database 304 to be mined is generated, the mining execution 22 is performed next. Prior to the execution of mining, the user performs the mining setting 25 in advance. FIG. 5 is a diagram showing details of the mining setting. In FIG. 5, the parameter setting means 53 sets the mining parameters, and the condition part / conclusion part setting means 52 sets the items of the condition part and the conclusion part of the association rule. The mining attribute hierarchy display means 51 displays the attribute hierarchy at mining time for the user 318 in response to an instruction from the display attribute hierarchy generating means 314. The mining parameters depend on the algorithm used.

【００４５】利用者３１８がマイニング実行の指示を出
すと、図３のマイニング実行手段３０５は項目データベ
ース３０４を入力として、相関ルール抽出処理を実行
し、その結果を結果ルールファイル３０６として出力す
る。相関ルール抽出処理としては、公知の従来技術によ
って実行できる。図１２の項目データベース３０４か
ら、例えば、「年齢：５０以上→肩がこる：はい」とい
うような相関ルールが抽出される。When the user 318 issues an instruction to execute mining, the mining execution means 305 of FIG. 3 takes the item database 304 as input, executes the association rule extraction processing, and outputs the result as the result rule file 306. The association rule extraction processing can be carried out by known conventional techniques. For example, an association rule such as “age: 50 or more → stiff shoulders: yes” is extracted from the item database 304 of FIG. 12.
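The passage leaves the extraction algorithm to known techniques. As a rough illustration of how a candidate rule such as “age: 50 or more → stiff shoulders: yes” might be scored, the sketch below computes the commonly used support and confidence measures. Treating these two measures as the mining parameters is an assumption made here, and the transactions are invented sample data.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, condition, conclusion):
    """Support of condition and conclusion together, divided by support of condition."""
    both = set(condition) | set(conclusion)
    return support(transactions, both) / support(transactions, condition)

# Each transaction is the set of items of one record (sample data).
transactions = [
    {"age: 50 or more", "stiff shoulders: yes"},
    {"age: 50 or more", "stiff shoulders: yes"},
    {"age: 50 or more"},
    {"age: 30 or less"},
]

s = support(transactions, ["age: 50 or more"])
c = confidence(transactions, ["age: 50 or more"], ["stiff shoulders: yes"])
```

Note that `confidence` divides by the support of the condition part, so it is only meaningful for condition parts that occur at least once.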

【００４６】結果表示図１５は、図３の結果表示手段３１６によって表示され
る結果表示画面を示す。ここで相関ルールリストボック
ス１５１内の表示は省略しているが、膨大な数の相関ル
ールが表示されているものとする。項目リストボックス
１５２で項目を選択した上で、ラジオボタン１５３、１
５４、１５５で条件部、結論部、条件部、または結論部
のどれかを選択して絞り込みボタン１５６をクリックす
ることにより、該項目を含む相関ルールが相関ルールリ
ストボックス１５１内に表示される。ここで項目リスト
ボックス１５２の、各項目に対応する条件部欄１５７に
は該項目を条件部に含むような相関ルールの数が、また
結論部欄１５８には該項目を結論部に含むような相関ル
ールの数が表示されている。各項目に対応する相関ルー
ル数は、結果表示画面立ち上げ時に図３の結果ルールフ
ァイル３０６の内容を読み込む際に算出しておく。図１
５からは、例えば項目＜性別：男＞を条件部に含むよう
な相関ルールの数は２５８個、また結論部に含むような
相関ルールの数は１３０個あることがわかる。このよう
に各項目を含む相関ルールの数が表示されているので、
利用者３１８はそれを参照して実際に絞り込みをするか
どうかの選択をすることができる、という効果がある。
Result Display
FIG. 15 shows the result display screen displayed by the result display means 316 of FIG. 3. The contents of the association rule list box 151 are omitted here, but an enormous number of association rules are assumed to be displayed in it. When the user selects an item in the item list box 152, selects one of “condition part”, “conclusion part”, or “condition part or conclusion part” with the radio buttons 153, 154, and 155, and clicks the narrow-down button 156, the association rules that include the item are displayed in the association rule list box 151. For each item in the item list box 152, the condition part column 157 shows the number of association rules that include the item in the condition part, and the conclusion part column 158 shows the number of association rules that include the item in the conclusion part. These per-item rule counts are calculated when the contents of the result rule file 306 of FIG. 3 are read at start-up of the result display screen. FIG. 15 shows, for example, that 258 association rules include the item <sex: male> in the condition part and 130 include it in the conclusion part. Because the number of association rules including each item is displayed, the user 318 can refer to it when deciding whether to actually narrow down the rules.
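The per-item counts and the narrow-down operation described for FIG. 15 can be sketched as follows. The rule representation (a pair of item sets), the function names, and the sample rules are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Each rule is (condition items, conclusion items); invented sample rules.
rules = [
    ({"sex: male"}, {"stiff shoulders: yes"}),
    ({"sex: male", "age: 30 or less"}, {"youth"}),
    ({"stiff shoulders: yes"}, {"sex: male"}),
]

def count_rules(rules):
    """Count, per item, the rules containing it in the condition / conclusion part."""
    in_condition, in_conclusion = Counter(), Counter()
    for condition, conclusion in rules:
        in_condition.update(condition)
        in_conclusion.update(conclusion)
    return in_condition, in_conclusion

def narrow(rules, item, where):
    """Keep rules containing `item`; `where` is 'condition', 'conclusion', or 'either'."""
    def hit(rule):
        condition, conclusion = rule
        if where == "condition":
            return item in condition
        if where == "conclusion":
            return item in conclusion
        return item in condition or item in conclusion
    return [rule for rule in rules if hit(rule)]

in_condition, in_conclusion = count_rules(rules)
```

Computing the counts once while loading the rules, as the description suggests, lets the screen show them without rescanning the rule file for each item.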

【００４７】前処理設定次に、図２の前処理設定２４において、手続きファイル
を作成する処理について説明する。図１６は、第１手続
きファイル編集画面を示す。ここでは図１の関係データ
ベース３０１に対する第１手続きファイル編集を行う場
合について示している。属性階層表示部分１６１には、
関係データベース３０１内の属性が、利用者が事前に設
定した属性階層に従って表示されている。
Preprocessing Setting
Next, the process of creating a procedure file in the preprocessing setting 24 of FIG. 2 will be described. FIG. 16 shows the first procedure file editing screen. Here, the case of editing the first procedure file for the relational database 301 of FIG. 1 is shown. In the attribute hierarchy display portion 161, the attributes in the relational database 301 are displayed according to an attribute hierarchy set in advance by the user.

【００４８】図１７は、各アイコンの意味を示す図であ
る。ここで属性グループは、属性の上位概念である。こ
のように、本実施の形態においては、属性値グループ、
属性、属性値に対応する各ノードの表示形態を変えるこ
とができるので、利用者にとってわかりやすい、という
効果がある。FIG. 17 is a diagram showing the meaning of each icon. Here, an attribute group is a superordinate concept of attributes. Thus, in the present embodiment, the nodes corresponding to attribute value groups, attributes, and attribute values can each be displayed in a different form, which makes the display easy for the user to understand.

【００４９】図１６において、チェックボックス１６２
は各属性ノードに付与されており、該属性が離散化手続
き作成の対象となるかどうかを指定するものである。チ
ェックされていれば離散化手続き作成対象であることを
意味し、チェックされていなければ離散化手続き作成対
象外であることを意味する。非数値属性は自動的に離散
化手続き作成対象外と見做し、チェックボックスのチェ
ックを不可とする。数値属性はデフォルトでは全て離散
化手続き作成対象とするが、利用者が変更する事もでき
る。In FIG. 16, a check box 162 is attached to each attribute node and specifies whether the attribute is a target of discretization procedure creation. A checked box means the attribute is a target; an unchecked box means it is not. Non-numeric attributes are automatically regarded as non-targets, and their check boxes cannot be checked. By default, all numeric attributes are targets of discretization procedure creation, but the user can change this.

【００５０】次に、手続きの編集方法について説明す
る。第１手続きファイル編集で対象となる手続きは離散
化手続きである。図４で示したように、手続きファイル
生成手段４１において、空の第１手続きファイルが作成
され、手続ファイル３１２に保存される。図１６の属性
階層表示部分１６１において、利用者３１８はまずマウ
ス等の入力手段でノードを選択する。その後離散化ボタ
ン１６３をクリックすると、離散化方法を設定するため
のダイアログが出現する。離散化の方法としては、属性
値の上限と下限を用いてその間を等分割する方法や、頻
度を数えて等頻度になるように境界値を設定する方法
や、利用者が境界値のリストを与える方法などがサポー
トされる。また、例えば図７の手続き７１、７２のよう
に条件部を持つような離散化手続きも設定可能である。
属性ノードを選択して離散化手続き作成を行った場合
は、該属性に対する離散化手続きが作成され、また、属
性グループノードを選択して離散化手続きを行った場合
は、該属性グループの下位にある全ての離散化手続き作
成対象の属性に対する離散化手続きが作成される。な
お、既に離散化手続きが存在する属性に対して新たな離
散化手続きを追加した場合は、古い離散化手続きが新た
な離散化手続きに置き換えられる。Next, the procedure editing method will be described. The procedures handled in first procedure file editing are discretization procedures. As shown in FIG. 4, the procedure file generating means 41 creates an empty first procedure file and stores it in the procedure file 312. In the attribute hierarchy display portion 161 of FIG. 16, the user 318 first selects a node using input means such as a mouse. When the user then clicks the discretization button 163, a dialog for setting the discretization method appears. The supported discretization methods include dividing the range between the upper and lower limits of the attribute values into equal parts, counting frequencies and setting boundary values so that the intervals have equal frequency, and having the user supply a list of boundary values. A discretization procedure that has a condition part, such as the procedures 71 and 72 of FIG. 7, can also be set. When an attribute node is selected and a discretization procedure is created, a discretization procedure for that attribute is created. When an attribute group node is selected, discretization procedures are created for all attributes below that attribute group that are targets of discretization procedure creation. If a new discretization procedure is added for an attribute that already has one, the old discretization procedure is replaced by the new one.
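Two of the discretization methods named above, equal division of the value range and equal-frequency boundaries, can be sketched as functions that return a boundary list. The function names and the exact tie-breaking rule for equal frequency are assumptions; the patent does not specify them.

```python
def equal_width_boundaries(values, bins):
    """Boundaries that split [min, max] of `values` into `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]

def equal_frequency_boundaries(values, bins):
    """Boundaries chosen so that each interval holds roughly the same number of values.

    Assumes a value equal to a boundary belongs to the upper interval.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[n * i // bins] for i in range(1, bins)]

width = equal_width_boundaries([150, 160, 170, 180], 3)
freq = equal_frequency_boundaries([1, 2, 3, 4, 5, 6], 3)
```

The third method, a user-supplied boundary list, needs no computation: the list entered in the dialog would be used directly in place of these results.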

【００５１】以上のように、本実施の形態においては、
ノードを選択して手続きを生成するので、手続きの編集
を簡単に行うことができるという効果がある。さらに、
属性グループを指定して複数の属性に対して一括して手
続きを作成することが可能なので、特に属性の数が多い
場合に手続き作成の労力を大幅に軽減することができる
という効果がある。As described above, in the present embodiment a procedure is generated by selecting a node, so procedures can be edited easily. Furthermore, because procedures can be created for a plurality of attributes at once by specifying an attribute group, the labor of creating procedures is greatly reduced, especially when the number of attributes is large.

【００５２】図１６の属性階層表示部分１６１において
離散化手続きが存在する属性を選択すると、該属性に関
する離散化手続きが手続きリストボックス１６４に表示
される。手続きリストボックス１６４内の離散化手続き
を選択した上で、削除ボタン１６５をクリックして該手
続きの削除を、また変更ボタン１６６をクリックして該
手続きの変更を行うことができる。また、離散化手続き
が作成された属性は属性階層表示部分１６１で強調表示
される。従って、編集状態がわかりやすい、という効果
がある。When an attribute having a discretization procedure is selected in the attribute hierarchy display portion 161 of FIG. 16, a discretization procedure relating to the attribute is displayed in a procedure list box 164. After selecting the discretization procedure in the procedure list box 164, the user can click the delete button 165 to delete the procedure, and click the change button 166 to change the procedure. The attribute for which the discretization procedure has been created is highlighted in the attribute hierarchy display portion 161. Therefore, there is an effect that the editing state is easy to understand.

【００５３】例えば、図１６で属性＜身長＞に対応する
ノードを選択して手続きを作成した後の状態は図１８の
ようになる。図１８においては、属性＜身長＞に対応す
るノードが選択されたので、該ノード１８２が強調表示
されると共に、手続きリストボックス１８３内に、たと
えば、「属性＜身長＞を境界値＜１６０＞＜１８０＞で
属性値＜低＞＜中＞＜高＞に離散化する」なる手続きが
表示される。For example, FIG. 18 shows the state after the node corresponding to the attribute <height> has been selected in FIG. 16 and a procedure has been created. In FIG. 18, because the node corresponding to the attribute <height> has been selected, that node 182 is highlighted, and a procedure such as “discretize the attribute <height> at the boundary values <160> and <180> into the attribute values <low>, <medium>, and <high>” is displayed in the procedure list box 183.
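Applying a boundary-based discretization such as the <height> example can be sketched with a binary search over the boundary list. Whether a value exactly equal to a boundary (for example 160) falls into the lower or the upper interval is not stated in the text; the sketch assumes the upper interval, and `discretize` is a hypothetical name.

```python
from bisect import bisect_right

def discretize(value, boundaries, labels):
    """Map a numeric value to a label; expects len(labels) == len(boundaries) + 1."""
    return labels[bisect_right(boundaries, value)]

boundaries = [160, 180]
labels = ["low", "medium", "high"]
heights = [155, 160, 172, 185]
discretized = [discretize(h, boundaries, labels) for h in heights]
```

Swapping `bisect_right` for `bisect_left` would instead place boundary values in the lower interval.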

【００５４】次に、図１６の属性階層表示部分１６１の
表示方法について説明する。図３のデータベース解析手
段３０７は、データ辞書３０８が存在しない場合に限
り、関係データベース３０１を入力としてデータ辞書３
０８を作成する。データ辞書３０８は、関係データベー
ス３０１の属性名と、該属性に対する属性値情報を属性
ごとに記述したファイルである。属性値情報としては、
関係データベース３０１に出現する属性値のリスト、属
性値の型などが必要である。一方、利用者３１８は関係
データベース３０１内の全属性間の関係を木構造として
表現した属性階層構造情報３１５を事前に作成してお
く。属性階層構造情報３１５の作成に際しては、データ
辞書３０８に記載されている属性情報を利用することも
可能である。Next, the method of displaying the attribute hierarchy display portion 161 of FIG. 16 will be described. Only when the data dictionary 308 does not exist, the database analysis means 307 of FIG. 3 creates the data dictionary 308 from the relational database 301. The data dictionary 308 is a file that describes, for each attribute, the attribute name in the relational database 301 and the attribute value information for that attribute. The attribute value information must include the list of attribute values appearing in the relational database 301, the type of the attribute values, and the like. Meanwhile, the user 318 creates in advance the attribute hierarchical structure information 315, which expresses the relationships among all attributes in the relational database 301 as a tree structure. The attribute information described in the data dictionary 308 can be used when creating the attribute hierarchical structure information 315.
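A data dictionary of the kind described, holding each attribute's observed values and value type, could be built in one pass over the records. The dictionary layout below (a `values` set and a `numeric` flag per attribute) is an assumption for illustration and is not the file format of the data dictionary 308.

```python
def build_data_dictionary(records):
    """Collect, per attribute, the set of observed values and whether all are numeric."""
    dictionary = {}
    for record in records:
        for attr, value in record.items():
            entry = dictionary.setdefault(attr, {"values": set(), "numeric": True})
            entry["values"].add(value)
            if not isinstance(value, (int, float)):
                entry["numeric"] = False
    return dictionary

# Invented sample rows.
records = [
    {"height": 172, "sex": "male"},
    {"height": 158, "sex": "female"},
]
data_dictionary = build_data_dictionary(records)
```

The `numeric` flag is the kind of type information that would let an editing screen offer discretization only for numeric attributes.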

【００５５】図１９は、属性階層構造情報３１５の例を
示す図である。図１９において、全データは「問診」、
「身体検査」、・・・・等に分類され、さらに、「問
診」は「肩がこる」、・・・等に分類され、「身体検
査」は「身長」、「握力（右）」、・・・等に分類され
た属性階層構造が生成される。FIG. 19 is a diagram showing an example of the attribute hierarchical structure information 315. In FIG. 19, an attribute hierarchy is generated in which all data are classified into “interview”, “physical examination”, and so on; “interview” is further classified into “stiff shoulders” and so on; and “physical examination” is classified into “height”, “grip strength (right)”, and so on.

【００５６】図３の表示用属性階層生成手段３１４は、
属性階層構造情報３１５をそのまま出力する。そして図
４の前処理時属性階層表示手段４３が、図３の表示用属
性階層生成手段３１４が出力した木構造を表示する。図
１６の属性階層表示部分１６１はその表示例である。図
１６において、利用者が操作した結果が図４の手続きフ
ァイル編集手段４２によって解釈され、実際に手続きフ
ァイル３１２が編集される。The display attribute hierarchy generating means 314 of FIG. 3 outputs the attribute hierarchical structure information 315 as it is. The preprocessing attribute hierarchy display means 43 of FIG. 4 then displays the tree structure output by the display attribute hierarchy generating means 314 of FIG. 3. The attribute hierarchy display portion 161 of FIG. 16 is an example of this display. The result of the user's operations in FIG. 16 is interpreted by the procedure file editing means 42 of FIG. 4, and the procedure file 312 is actually edited.

【００５７】図２０は、第２手続きファイル編集画面を
示す。第２手続きファイルの内容は第１手続きファイル
の内容に依存するので、事前に第１手続きファイルを選
択しておく必要がある。ここでは図７の第１手続きファ
イルを選択した場合について示している。図２０の属性
階層表示部分２０１には、関係データベース３０１内の
属性が、利用者３１８が事前に設定した属性階層に従っ
て表示されている。また、各属性の下には該属性に対す
る属性値が表示されている。FIG. 20 shows a second procedure file editing screen. Since the contents of the second procedure file depend on the contents of the first procedure file, it is necessary to select the first procedure file in advance. Here, the case where the first procedure file of FIG. 7 is selected is shown. In the attribute hierarchy display portion 201 of FIG. 20, the attributes in the relational database 301 are displayed according to the attribute hierarchy set by the user 318 in advance. The attribute value for each attribute is displayed below each attribute.

【００５８】手続きの編集は、前述の離散化手続き編集
と同様に、属性階層表示部分２０１で一つまたは複数の
ノードを選択して削除ボタン２０２、選択ボタン２０
３、属性値グルーピングボタン２０４のいずれかをクリ
ックすることにより行う。但し、属性グループノードを
選択した場合にクリックできるのは、削除ボタン２０２
だけである。属性グループノードを選択して削除ボタン
２０２をクリックした場合、該ノードの下位にある全て
の属性に対して属性削除手続きが生成される。また、既
に存在する手続きの削除や変更も、前述の離散化手続き
編集と同様に行うことができる。As with the discretization procedure editing described above, a procedure is edited by selecting one or more nodes in the attribute hierarchy display portion 201 and clicking the delete button 202, the select button 203, or the attribute value grouping button 204. However, when an attribute group node is selected, only the delete button 202 can be clicked. When an attribute group node is selected and the delete button 202 is clicked, an attribute deletion procedure is generated for every attribute below that node. Existing procedures can also be deleted or changed in the same way as in discretization procedure editing.

【００５９】ここで図２０の属性階層表示部分２０１の
表示方法について説明する。図３のデータ辞書変換手段
３０９は、第１手続きファイルを参照することにより、
データ辞書３０８に各離散化対象数値属性に対する属性
値リストを付加して中間データ辞書３１０を作成する。
なお、既に第１手続きファイル適用後の中間データ辞書
３１０が存在する場合は以上の処理は行わない。Here, the method of displaying the attribute hierarchy display portion 201 of FIG. 20 will be described. By referring to the first procedure file, the data dictionary conversion means 309 of FIG. 3 adds to the data dictionary 308 an attribute value list for each numeric attribute to be discretized, and thereby creates the intermediate data dictionary 310. If the intermediate data dictionary 310 resulting from application of the first procedure file already exists, this processing is not performed.

【００６０】図３の表示用属性階層生成手段３１４は、
中間データ辞書３１０と属性階層構造情報３１５とを入
力とし、属性階層構造の各属性の下位に該属性に関する
属性値を付加した形式の木構造を作成する。そして図４
の前処理時属性階層表示手段４３が、図３の表示用属性
階層生成手段３１４が出力した木構造を表示する。図２
０の属性階層表示部分２０１は、その表示用属性階層生
成手段３１４が出力した木構造の表示例である。The display attribute hierarchy generating means 314 of FIG. 3 takes the intermediate data dictionary 310 and the attribute hierarchical structure information 315 as input, and creates a tree structure in which the attribute values of each attribute are added below that attribute in the attribute hierarchy. The preprocessing attribute hierarchy display means 43 of FIG. 4 then displays the tree structure output by the display attribute hierarchy generating means 314 of FIG. 3. The attribute hierarchy display portion 201 of FIG. 20 is a display example of the tree structure output by the display attribute hierarchy generating means 314.
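Building the display tree by hanging each attribute's values under its node can be sketched recursively. The tuple-based tree, the rule that a node without children is an attribute, and the function name `attach_values` are all assumptions made for this illustration.

```python
def attach_values(node, value_lists):
    """Return a copy of the hierarchy with attribute values added below each attribute.

    `node` is a (name, children) tuple; a node with no children is treated as an attribute.
    """
    name, children = node
    if not children:
        return (name, [(value, []) for value in value_lists.get(name, [])])
    return (name, [attach_values(child, value_lists) for child in children])

# Attribute hierarchy in the spirit of FIG. 19.
hierarchy = ("all data", [
    ("interview", [("stiff shoulders", [])]),
    ("physical examination", [("height", []), ("grip strength (right)", [])]),
])

# Attribute value lists as they might appear in an intermediate data dictionary.
value_lists = {
    "stiff shoulders": ["always", "sometimes", "no"],
    "height": ["low", "medium", "high"],
}

tree = attach_values(hierarchy, value_lists)
```

An attribute with no value list, such as a numeric attribute that was not discretized, simply keeps an empty child list in this sketch.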

【００６１】図２１は、第３手続きファイル編集画面を
示す。第３手続きファイルの内容は第１手続きファイル
と第２手続きファイルの内容に依存するので、事前に第
１手続きファイルと第２手続きファイルを選択しておく
必要がある。ここでは、図２１には、図７の第１手続き
ファイルと図９の第２手続きファイルを選択した場合が
示される。図２１の属性階層表示部分２１１には、関係
データベース３０１に対して第１手続きファイル７０と
第２手続きファイル９０を適用した後のデータベースに
関する属性と属性値が、利用者が事前に設定した属性階
層に従って表示されている。FIG. 21 shows the third procedure file editing screen. Since the contents of the third procedure file depend on the contents of the first and second procedure files, the first and second procedure files must be selected in advance. FIG. 21 shows the case where the first procedure file of FIG. 7 and the second procedure file of FIG. 9 are selected. In the attribute hierarchy display portion 211 of FIG. 21, the attributes and attribute values of the database obtained by applying the first procedure file 70 and the second procedure file 90 to the relational database 301 are displayed according to the attribute hierarchy set in advance by the user.

【００６２】項目化手続きは全ての属性値に対して必要
であるので、第３手続きファイル生成の際に、デフォル
トの項目化手続きを全ての属性値に対して生成しておく
ものとする。これは図４の手続きファイル生成手段４１
が中間データ辞書３１０を参照することによって行う。
この場合、図２１の属性階層表示部分２１１において、
デフォルトの項目化手続きから変更された手続きに対応
するノードを強調表示する。Since an itemization procedure is required for every attribute value, a default itemization procedure is generated for every attribute value when the third procedure file is generated. The procedure file generating means 41 of FIG. 4 does this by referring to the intermediate data dictionary 310. In this case, in the attribute hierarchy display portion 211 of FIG. 21, the nodes corresponding to procedures that have been changed from the default itemization procedure are highlighted.
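Generating default itemization procedures and detecting which ones the user has changed can be sketched as below. The default item name “attribute: value” is inferred from items such as <sex: male> in FIG. 11 and is an assumption, as are the function names and the dictionary layout.

```python
def default_itemization(intermediate_dictionary):
    """One default item per (attribute, value) pair, named 'attribute: value'."""
    return {(attr, value): f"{attr}: {value}"
            for attr, values in intermediate_dictionary.items()
            for value in values}

def changed_nodes(current, defaults):
    """(attribute, value) pairs whose item differs from the default; these would be highlighted."""
    return {key for key, item in current.items() if defaults.get(key) != item}

defaults = default_itemization({"sex": ["male", "female"], "age": ["30 or less"]})

# Simulate a user edit that renames one item.
current = dict(defaults)
current[("sex", "male")] = "man"
highlighted = changed_nodes(current, defaults)
```

Keeping the defaults alongside the edited procedures makes the highlight decision a simple comparison per node.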

【００６３】手続きの編集は、前述の離散化手続き編集
と同様に、図２１の属性階層表示部分２１１で一つまた
は複数のノードを選択して項目化ボタン２１２、項目化
グルーピングボタン２１３のいずれかをクリックするこ
とにより行う。また、既に存在する手続きの削除や変更
も、前述の離散化手続き編集と同様に行うことができ
る。As with the discretization procedure editing described above, a procedure is edited by selecting one or more nodes in the attribute hierarchy display portion 211 of FIG. 21 and clicking either the itemization button 212 or the itemization grouping button 213. Existing procedures can also be deleted or changed in the same way as in discretization procedure editing.

【００６４】ここで図２１の属性階層表示部分２１１の
表示方法について説明する。図３のデータ辞書変換手段
３０９は、第１手続きファイルを参照することにより、
データ辞書３０８に各離散化対象数値属性の属性値リス
トを付加、また第２手続きファイルを参照することによ
り、属性や属性値の削除、属性値の変換等の処理を施し
て中間データ辞書３１０を作成する。なお、第１手続き
ファイル７０適用後の中間データ辞書３１０が既に存在
する場合は、これに第２手続きファイル９０を適用する
形で中間データ辞書３１０を作成しても良い。また、既
に第１、第２手続きファイル適用後の中間データ辞書３
１０が存在する場合は、以上の処理は行わない。Here, the method of displaying the attribute hierarchy display portion 211 of FIG. 21 will be described. The data dictionary conversion means 309 of FIG. 3 creates the intermediate data dictionary 310 by referring to the first procedure file to add to the data dictionary 308 an attribute value list for each numeric attribute to be discretized, and by referring to the second procedure file to perform processing such as deletion of attributes and attribute values and conversion of attribute values. If the intermediate data dictionary 310 resulting from application of the first procedure file 70 already exists, the intermediate data dictionary 310 may instead be created by applying the second procedure file 90 to it. If the intermediate data dictionary 310 resulting from application of both the first and second procedure files already exists, this processing is not performed.

【００６５】図３の表示用属性階層生成手段３１４は、
中間データ辞書３１０と属性階層構造情報３１５とを入
力とし、属性階層構造から中間データ辞書３１０に含ま
れない属性を削除、また各属性の下位に属性値を付加し
た形式の木構造を出力する。そして図４の前処理時属性
階層表示手段４３が、図３の表示用属性階層生成手段３
１４が出力した木構造を表示する。図２１の属性階層表
示部分２１１は、表示用属性階層生成手段３１４が出力
した木構造の表示例である。The display attribute hierarchy generating means 314 of FIG. 3 takes the intermediate data dictionary 310 and the attribute hierarchical structure information 315 as input, deletes from the attribute hierarchy any attribute not included in the intermediate data dictionary 310, and outputs a tree structure in which attribute values are added below each attribute. The preprocessing attribute hierarchy display means 43 of FIG. 4 then displays the tree structure output by the display attribute hierarchy generating means 314 of FIG. 3. The attribute hierarchy display portion 211 of FIG. 21 is a display example of that tree structure.

【００６６】マイニング設定次に、マイニング設定の条件部と結論部の項目指定に関
連する部分について説明する。図２２は、条件部結論部
項目指定画面を示す。条件部と結論部の指定は手続きフ
ァイルの内容に依存するので、事前に使用する手続きフ
ァイルを選択しておく必要がある。ここでは第１手続き
ファイルとして図７を、第２手続きファイルとして図９
を、また第３手続きファイルとして図１１を選択した場
合について説明する。図２２の属性階層表示部分２２１
には、関係データベース３０１に第１、第２、第３手続
きファイルを適用した後のデータベースに関する属性と
属性値が、利用者が事前に設定した属性階層に従って表
示されている。
Mining Setting
Next, the part of the mining setting concerned with designating items for the condition part and the conclusion part will be described. FIG. 22 shows the condition part / conclusion part item designation screen. Since the designation of the condition part and the conclusion part depends on the contents of the procedure files, the procedure files to be used must be selected in advance. Here, the case is described in which FIG. 7 is selected as the first procedure file, FIG. 9 as the second procedure file, and FIG. 11 as the third procedure file. In the attribute hierarchy display portion 221 of FIG. 22, the attributes and attribute values of the database obtained by applying the first, second, and third procedure files to the relational database 301 are displayed according to the attribute hierarchy set in advance by the user.

【００６７】ここで、条件部と結論部の項目指定に関す
る各アイコンの意味は図２３の通りである。図２３にお
いては、たとえば、（１）相関ルールの条件部にのみ現
れてよい、（２）相関ルールの結論部にのみ現れてよ
い、（３）相関ルールの条件部結論部いずれに現れても
よい、（４）相関ルールに現れてはいけない、の４つの
モードが用意される。図２２においては、例えば、項目
＜身長：低＞と項目＜身長：高＞は相関ルールに現れて
はいけない、という指定になっている。ここで、項目以
外の属性や属性グループのノードの設定自体はマイニン
グ実行に影響しない。Here, the meaning of each icon relating to item designation for the condition part and the conclusion part is as shown in FIG. 23. In FIG. 23, four modes are provided, for example: (1) the item may appear only in the condition part of an association rule, (2) the item may appear only in the conclusion part of an association rule, (3) the item may appear in either the condition part or the conclusion part of an association rule, and (4) the item must not appear in an association rule. In FIG. 22, for example, the item <height: low> and the item <height: high> are designated as items that must not appear in association rules. Note that the settings of attribute and attribute group nodes, as opposed to item nodes, do not themselves affect mining execution.

【００６８】項目が相関ルールの条件部のみに現れても
良い、または結論部のみに現れても良い、または条件部
と結論部のどちらに現れても良い、または条件部と結論
部のいずれに現れてもいけない、の４種類のモードを、
異なる形態で表示するので、設定の状態が利用者にとっ
てわかりやすい、という効果がある。The four modes, namely that an item may appear only in the condition part of an association rule, may appear only in the conclusion part, may appear in either the condition part or the conclusion part, or must not appear in either, are displayed in different forms, so the setting state is easy for the user to understand.

【００６９】また、薄く表示してあるアイコンも指定の
意味は同じであるが、該ノードの指定と該ノードの下位
にあるノードの指定が全て同じであることはない、とい
うことを表す。逆に、表示が薄くない、つまり強調表示
となっているアイコンは、該ノードとその下位にあるノ
ードの指定が全て同じであることを表す。従って、条件
部と結論部の項目指定に関するアイコンの表示が薄くな
い場合、その下位にある項目は展開表示しなくても設定
状態がわかる、という効果がある。A dimmed icon carries the same designation, but indicates that the designation of the node and the designations of the nodes below it are not all the same. Conversely, an icon that is not dimmed, that is, a highlighted icon, indicates that the node and all nodes below it have the same designation. Therefore, when the icon for condition part / conclusion part item designation is not dimmed, the setting state of the items below it can be known without expanding them.

【００７０】さて、図２２の属性階層表示部分２２１に
おいて、あるノードを指定して条件部チェックボックス
２２２と結論部チェックボックス２２３を設定して適用
ボタン２２４をクリックすると、該ノードと、該ノード
の下位にあるノードが全て同一モードとして指定され
る。例えば属性グループ＜身体計測＞に対応するノード
を選択して、条件部チェックボックス２２２のみをチェ
ックして適用ボタン２２４をクリックすると、図２４の
属性階層表示部分２４１のように、＜身体計測＞ノード
とその下位にあるノードが全て「相関ルールの条件部の
みに現れても良い」指定となる。また、指定が全て同じ
なので、＜身体計測＞ノードにおけるアイコンの表示も
薄い表示ではなく、強調表示となる。このような指定方
式を取るので、利用者の設定の負荷を大幅に軽減するこ
とができる、という効果がある。Now, in the attribute hierarchy display portion 221 of FIG. 22, when a node is selected, the condition part check box 222 and the conclusion part check box 223 are set, and the apply button 224 is clicked, that node and all nodes below it are designated with the same mode. For example, when the node corresponding to the attribute group <body measurement> is selected, only the condition part check box 222 is checked, and the apply button 224 is clicked, the <body measurement> node and all nodes below it are designated “may appear only in the condition part of an association rule”, as in the attribute hierarchy display portion 241 of FIG. 24. Since all the designations are then the same, the icon of the <body measurement> node is also displayed highlighted rather than dimmed. This designation method greatly reduces the user's setting burden.
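The designation scheme above, four modes per node, propagation to all lower nodes on apply, and a dimmed icon when the subtree is not uniform, can be sketched with a small tree class. The `Node` class, the pair-of-booleans encoding of the mode, and the default mode are assumptions made for illustration.

```python
class Node:
    """A node of the attribute hierarchy (attribute group, attribute, or item)."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        # (allowed in condition part, allowed in conclusion part); default is assumed.
        self.mode = (True, True)

def apply_mode(node, in_condition, in_conclusion):
    """Set the same mode on `node` and on every node below it."""
    node.mode = (in_condition, in_conclusion)
    for child in node.children:
        apply_mode(child, in_condition, in_conclusion)

def is_uniform(node):
    """True if every node below `node` has the same mode; False would mean a dimmed icon."""
    return all(child.mode == node.mode and is_uniform(child) for child in node.children)

height_low = Node("height: low")
height = Node("height", [height_low, Node("height: high")])
body = Node("body measurement", [height])

# Check only the condition part box and apply to the attribute group.
apply_mode(body, True, False)
uniform_after_apply = is_uniform(body)

# Forbid one item individually; the group icon would now be dimmed.
apply_mode(height_low, False, False)
uniform_after_edit = is_uniform(body)
```

The pair `(False, False)` encodes the fourth mode, an item that must not appear in any association rule.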

【００７１】ここで図２２の属性階層表示部分２２１の
表示方法について説明する。図３のデータ辞書変換手段
３０９は、第１手続きファイル７０を参照することによ
り、データ辞書に各離散化対象数値属性の属性値リスト
を付加、また第２手続きファイル９０を参照することに
より、属性や属性値の削除、属性値の変換等の処理を施
し、更に第３手続きファイル１１０を参照して属性値の
項目への変換や項目グルーピングの処理を施すことによ
り、中間データ辞書３１０を作成する。なお、第１手続
きファイル適用後の中間データ辞書３１０が既に存在す
る場合は、これに第２、第３手続きファイルを適用する
形で中間データ辞書３１０を作成しても良い。同様に、
第１、第２手続きファイル適用後の中間データ辞書３１
０が既に存在する場合は、これに第３手続きファイルを
適用する形で中間データ辞書３１０を作成しても良い。
また、既に第１、第２、第３手続きファイル適用後の中
間データ辞書３１０が存在する場合は、以上の処理は行
わない。Here, the method of displaying the attribute hierarchy display portion 221 of FIG. 22 will be described. The data dictionary conversion means 309 of FIG. 3 creates the intermediate data dictionary 310 by referring to the first procedure file 70 to add to the data dictionary an attribute value list for each numeric attribute to be discretized, by referring to the second procedure file 90 to perform processing such as deletion of attributes and attribute values and conversion of attribute values, and by further referring to the third procedure file 110 to convert attribute values into items and perform item grouping. If the intermediate data dictionary 310 resulting from application of the first procedure file already exists, the intermediate data dictionary 310 may be created by applying the second and third procedure files to it. Similarly, if the intermediate data dictionary 310 resulting from application of the first and second procedure files already exists, the intermediate data dictionary 310 may be created by applying the third procedure file to it. If the intermediate data dictionary 310 resulting from application of the first, second, and third procedure files already exists, this processing is not performed.

[0072] The display attribute hierarchy generation means 314 of FIG. 3 takes the intermediate data dictionary and the attribute hierarchical structure information 315 as input, and outputs a tree structure in which attributes not included in the intermediate data dictionary 310 are deleted from the attribute hierarchical structure and items are added below each attribute. The mining attribute hierarchy display means 51 of FIG. 5 then displays the tree structure output by the display attribute hierarchy generation means 314 of FIG. 3. The attribute hierarchy display portion 221 of FIG. 22 is a display example of the tree structure output by the display attribute hierarchy generation means 314. The result of the user's operations in FIG. 22 is interpreted by the condition part conclusion part setting means of FIG. 5, and the actual settings are made.
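
The generation of the displayed tree can be sketched as follows. The representation is an assumption made for illustration: the attribute hierarchy is a nested mapping in which an attribute is a key whose value is `None`, and the intermediate data dictionary maps each surviving attribute to its items. The function name `build_display_tree` and the sample data are hypothetical, and the choice to keep a group that ends up empty is also an assumption, since the text does not say how such groups are handled.

```python
from typing import Dict, List, Optional, Union

Hierarchy = Dict[str, Optional["Hierarchy"]]
DisplayTree = Dict[str, Union["DisplayTree", List[str]]]


def build_display_tree(
    hierarchy: Hierarchy, intermediate_dictionary: Dict[str, List[str]]
) -> DisplayTree:
    """Drop attributes missing from the dictionary and hang items below the rest."""
    tree: DisplayTree = {}
    for name, sub in hierarchy.items():
        if sub is None:
            # Attribute node: keep it only if it survived preprocessing.
            if name in intermediate_dictionary:
                tree[name] = list(intermediate_dictionary[name])
        else:
            # Attribute group node: recurse into its members.
            tree[name] = build_display_tree(sub, intermediate_dictionary)
    return tree


hierarchy = {
    "health checkup": {
        "body measurement": {"height": None, "weight": None},
        "personal": {"name": None},
    }
}
intermediate = {
    "height": ["height=short", "height=tall"],
    "weight": ["weight=light", "weight=heavy"],
}
display_tree = build_display_tree(hierarchy, intermediate)
```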

[0073]

[Effects of the Invention] As described above, according to the data mining apparatus of the first invention, as preprocessing for mining, procedure files are prepared, each consisting of a plurality of procedures that describe conversion methods for the relational database such as discretization, attribute deletion, and itemization, and one or more procedure files are applied to the relational database sequentially. Consequently, different preprocessing runs can easily be repeated by changing the contents of a procedure file and applying it to the relational database.
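
Sequential application of procedure files to records can be sketched as follows. This is a simplified illustration, not the format of the procedure files shown in the figures: a procedure file is assumed to be a list of `(operation, arguments)` pairs, only three operations are modelled, and the function names, boundaries, labels, and sample records are all hypothetical.

```python
from bisect import bisect_right


def discretize(records, attribute, boundaries, labels):
    """Replace a numeric value by the label of the interval it falls in."""
    return [
        {**r, attribute: labels[bisect_right(boundaries, r[attribute])]}
        for r in records
    ]


def delete_attribute(records, attribute):
    """Remove one attribute from every record."""
    return [{k: v for k, v in r.items() if k != attribute} for r in records]


def itemize(records):
    """Turn each record into a set of 'attribute=value' items."""
    return [{f"{k}={v}" for k, v in r.items()} for r in records]


OPERATIONS = {
    "discretize": discretize,
    "delete_attribute": delete_attribute,
    "itemize": itemize,
}


def apply_procedure_files(records, procedure_files):
    """Apply every procedure of every file, in the given order."""
    for procedure_file in procedure_files:
        for operation, arguments in procedure_file:
            records = OPERATIONS[operation](records, **arguments)
    return records


first_file = [("discretize", {"attribute": "height",
                              "boundaries": [160, 175],
                              "labels": ["short", "medium", "tall"]})]
second_file = [("delete_attribute", {"attribute": "name"})]
third_file = [("itemize", {})]

records = [{"name": "A", "height": 158}, {"name": "B", "height": 180}]
item_database = apply_procedure_files(records, [first_file, second_file, third_file])
```

Trying a different discretization then only requires editing `first_file` and calling `apply_procedure_files` again, which is the kind of repeated trial the paragraph describes.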

[0074] According to the data mining apparatus of the second invention, a data dictionary in which attribute names and attribute value information in the relational database are described for each attribute is generated, and a procedure file is applied to the data dictionary to generate an intermediate data dictionary. The attribute hierarchical structure information, which describes the relationships between attributes of the relational database as a hierarchical structure, is then converted on the basis of the contents of the intermediate data dictionary, and the resulting hierarchical structure can be displayed. Consequently, the structure of the data obtained by applying the procedure file to the relational database can be grasped easily.

[0075] According to the data mining apparatus of the third invention, when the hierarchical structure is displayed, nodes are shown in different forms according to their type, that is, attribute group, attribute, or attribute value, which makes the display easy for the user to understand.

[0076] According to the data mining apparatus of the fourth invention, a node in the displayed hierarchical structure can be selected and the procedures relating to that node can be edited, so the procedure file can be edited easily.

[0077] According to the data mining apparatus of the fifth invention, when procedures are created, performing the procedure creation operation on a node of the attribute hierarchy display screen creates procedures for all attributes below that node at once, which reduces the user's workload.

[0078] According to the data mining apparatus of the sixth invention, when a procedure file is edited, the nodes corresponding to the procedures in the procedure file are highlighted in the display of the hierarchical structure, so the editing state of the procedures is easy to understand.

[0079] According to the data mining apparatus of the seventh invention, the preprocessing procedure files are created as three separate types: a first procedure file relating to discretization of numerical attributes; a second procedure file relating to grouping of attribute values, replacement of attribute values with null values, deletion of attributes, and deletion of records satisfying specific conditions; and a third procedure file relating to itemization, which adds attribute information to attribute values, and grouping of items. Providing a framework in which these files are applied to the relational database sequentially makes it possible to execute the necessary and sufficient preprocessing.

[0080] According to the data mining apparatus of the eighth invention, procedure file application setting means is provided that allows only sets of procedure files with correct dependency relationships to be selected from among the first, second, and third procedure files edited in advance, which makes it possible to execute the necessary and sufficient preprocessing.
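
Restricting the selection to consistent sets of procedure files can be sketched as follows. How dependencies are actually recorded is not stated in this section, so the sketch assumes that each second procedure file names the first procedure file it was edited against, and each third procedure file names its second procedure file. The class `ProcedureFile`, the field `based_on`, and the file names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ProcedureFile:
    name: str
    kind: int                       # 1, 2, or 3
    based_on: Optional[str] = None  # assumed: name of the file this one depends on


def selectable_sets(files: List[ProcedureFile]) -> List[Tuple[str, str, str]]:
    """Return every (first, second, third) triple whose dependency chain is intact."""
    firsts = [f for f in files if f.kind == 1]
    seconds = [f for f in files if f.kind == 2]
    thirds = [f for f in files if f.kind == 3]
    return [
        (a.name, b.name, c.name)
        for a, b, c in product(firsts, seconds, thirds)
        if b.based_on == a.name and c.based_on == b.name
    ]


files = [
    ProcedureFile("discretize_a", 1),
    ProcedureFile("discretize_b", 1),
    ProcedureFile("clean_a", 2, based_on="discretize_a"),
    ProcedureFile("items_a", 3, based_on="clean_a"),
    ProcedureFile("items_orphan", 3, based_on="clean_missing"),
]
choices = selectable_sets(files)
```

A selection screen that offers only the triples in `choices` cannot combine, for example, a third procedure file with a first procedure file whose discretized values it does not know about.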

[0081] According to the data mining apparatus of the ninth invention, when the condition part and conclusion part are designated for mining, the setting of a node on the attribute hierarchy display screen propagates to all nodes below that node, which reduces the user's workload.

[0082] According to the data mining apparatus of the tenth invention, when the condition part and conclusion part are designated for mining, the display form of each node on the mining attribute hierarchy display means changes according to the node's condition part and conclusion part designation mode, so the state of the settings is easy for the user to understand.

[0083] According to the data mining apparatus of the eleventh invention, when the condition part and conclusion part are designated for mining, each node on the mining attribute hierarchy display means is highlighted if the node and all nodes below it have the same mode. Consequently, displaying the attribute hierarchy expanded to an appropriate level reveals the condition part and conclusion part designations of all nodes.
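
The highlighting test can be sketched as follows. This is an illustrative sketch only: nodes are assumed to be plain dictionaries with `name`, `mode`, and optional `children` keys, the mode strings are placeholders, and the function names and sample tree are hypothetical.

```python
def subtree_modes(node):
    """Collect the set of modes used by a node and everything below it."""
    modes = {node["mode"]}
    for child in node.get("children", []):
        modes |= subtree_modes(child)
    return modes


def should_highlight(node):
    """A node is highlighted only when its whole subtree shares one mode."""
    return len(subtree_modes(node)) == 1


body = {
    "name": "body measurement",
    "mode": "condition",
    "children": [
        {"name": "height", "mode": "condition"},
        {"name": "weight", "mode": "condition"},
    ],
}
blood = {
    "name": "blood test",
    "mode": "both",
    "children": [{"name": "glucose", "mode": "conclusion"}],
}
root = {"name": "health checkup", "mode": "both", "children": [body, blood]}
```

In this example `body` would be highlighted and could stay collapsed, while `blood` would be shown faint, signalling that it must be expanded to see the differing designations below it.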

[0084] According to the data mining apparatus of the twelfth invention, the result display screen shows all items together with the number of association rules that include each item, which gives the user guidance when specifying an item in order to display only the association rules that include it.
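
Counting the rules per item and filtering by a chosen item can be sketched as follows. The rule representation is an assumption for illustration: each association rule is a pair of item sets (condition part, conclusion part). The function names and the sample rules are hypothetical and are not results from the embodiment.

```python
from collections import Counter


def count_rules_per_item(rules, all_items):
    """For every item, count the rules whose condition or conclusion contains it."""
    counts = Counter()
    for condition, conclusion in rules:
        for item in set(condition) | set(conclusion):
            counts[item] += 1
    # Items that appear in no rule are listed with a count of zero.
    return {item: counts[item] for item in all_items}


def rules_containing(rules, item):
    """Select only the rules that include the given item."""
    return [r for r in rules if item in r[0] or item in r[1]]


rules = [
    ({"height=tall"}, {"weight=heavy"}),
    ({"height=tall", "sex=male"}, {"weight=heavy"}),
    ({"sex=female"}, {"height=short"}),
]
all_items = ["height=short", "height=tall", "sex=female",
             "sex=male", "weight=heavy", "weight=light"]
counts = count_rules_per_item(rules, all_items)
```

Showing a zero next to `weight=light` tells the user in advance that filtering on that item would return nothing.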

[Brief description of the drawings]

FIG. 1 is a diagram showing a health checkup database as an example of a relational database.

FIG. 2 is a diagram showing the flow of data mining processing in the present embodiment.

FIG. 3 is a detailed block diagram of the entire data mining apparatus in the present embodiment.

FIG. 4 is a detailed block diagram of the preprocessing setting means shown in FIG. 3.

FIG. 5 is a detailed block diagram of the mining setting means shown in FIG. 3.

FIG. 6 is a diagram showing the flow of preprocessing execution in the present embodiment.

FIG. 7 is a diagram showing an example of the first procedure file in the present embodiment.

FIG. 8 is a diagram showing the intermediate database after the first procedure file of FIG. 7 is applied to the health checkup database of FIG. 1.

FIG. 9 is a diagram showing an example of the second procedure file in the present embodiment.

FIG. 10 is a diagram showing the intermediate database after the second procedure file of FIG. 9 is applied to the intermediate database of FIG. 8.

FIG. 11 is a diagram showing an example of the third procedure file in the present embodiment.

FIG. 12 is a diagram showing the intermediate database after the third procedure file of FIG. 11 is applied to the intermediate database of FIG. 10.

FIG. 13 is a diagram showing an example of the dependency relationships between procedure files.

FIG. 14 is a diagram showing the screen for selecting the procedure files to be used in the present embodiment.

FIG. 15 is a diagram showing the result display screen in the present embodiment.

FIG. 16 is a diagram showing the first procedure file editing screen in the present embodiment.

FIG. 17 is a diagram showing the meaning of the icons in the procedure file editing screen in the present embodiment.

FIG. 18 is a diagram showing the screen after a new procedure is added to the attribute <height> in FIG. 16.

FIG. 19 is a diagram showing an example of an attribute hierarchical structure.

FIG. 20 is a diagram showing the second procedure file editing screen in the present embodiment.

FIG. 21 is a diagram showing the third procedure file editing screen in the present embodiment.

FIG. 22 is a diagram showing the condition part and conclusion part item designation screen in the present embodiment.

FIG. 23 is a diagram showing the meaning of the icons in the condition part and conclusion part item designation screen in the present embodiment.

FIG. 24 is a diagram showing the screen after the attribute group <body measurement> is selected and set in FIG. 22.

[Explanation of symbols]

301 relational database, 302 preprocessing execution means, 303 intermediate database, 304 item database, 305 mining execution means, 306 result rule file, 307 database analysis means, 308 data dictionary, 309 data dictionary conversion means, 310 intermediate data dictionary, 311 preprocessing setting means, 312 procedure file, 313 mining setting means, 314 display attribute hierarchy generation means, 315 attribute hierarchical structure information, 316 result display means, 317 file management means, 318 user

Claims

[Claims]

1. A data mining apparatus comprising: preprocessing execution means for taking a relational database as input and, in accordance with a user's specification, outputting an item database constituted by a plurality of records each consisting of a set of items; mining execution means for taking the item database as input, extracting association rules between the items, and outputting the result as a result rule file; and result display means for displaying the contents of the result rule file, the data mining apparatus being characterized by further comprising preprocessing setting means having: procedure file editing means for taking as input a procedure file that records procedures for converting the relational database into the item database, and for adding, deleting, or changing any of the following procedures in the procedure file: a procedure relating to discretization of numerical attributes, a procedure relating to grouping of attribute values, a procedure relating to replacement of attribute values with null values, a procedure relating to deletion of attributes, a procedure relating to selection of records that satisfy specific conditions, a procedure relating to itemization that adds attribute information to attribute values, and a procedure relating to grouping of items; and procedure file application setting means for designating one or more procedure files to be used by the preprocessing execution means and the order in which they are applied.

2. The data mining apparatus according to claim 1, further comprising: database analysis means for taking the relational database as input and outputting a data dictionary in which attribute names and attribute value information in the relational database are described for each attribute; data dictionary conversion means for taking as input the data dictionary and one or more ordered, already edited procedure files designated by the procedure file application setting means, and outputting an intermediate data dictionary as the result of applying the procedure files to the data dictionary; and display attribute hierarchy generation means for taking as input the data dictionary or the intermediate data dictionary, together with attribute hierarchical structure information describing the relationships between attributes of the relational database as a hierarchical structure, and outputting a hierarchical structure in which attributes not included in the intermediate data dictionary are deleted from the attribute hierarchical structure and attribute values are added below each attribute, wherein the preprocessing setting means further comprises preprocessing attribute hierarchy display means for displaying the hierarchical structure output by the display attribute hierarchy generation means.

3. The data mining apparatus according to claim 2, wherein the preprocessing attribute hierarchy display means, when displaying the hierarchical structure, displays attribute groups, attributes, and attribute values in different forms so that the user can distinguish them.

4. The data mining apparatus according to claim 2, wherein the procedure file editing means enables editing of the procedures relating to a node selected by a user on the preprocessing attribute hierarchy display means.

5. The data mining apparatus according to claim 4, wherein, when a user selects a node on the preprocessing attribute hierarchy display means and adds a procedure for discretizing numerical attributes or for deleting attributes, the procedure file editing means automatically adds a similar procedure to all attribute nodes below that node.

6. The data mining apparatus according to claim 4, wherein the preprocessing attribute hierarchy display means, when displaying the hierarchical structure, highlights the nodes corresponding to the procedures in the procedure file being edited.

7. The data mining apparatus according to claim 1, wherein the following are created: a first procedure file that includes procedures relating to discretization of numerical attributes; a second procedure file that includes procedures relating to grouping of attribute values, replacement of attribute values with null values, deletion of attributes, and selection of records satisfying specific conditions, and that depends on the first procedure file; and a third procedure file that includes procedures relating to itemization for adding attribute information to attribute values and grouping of items, and that depends on the first procedure file and the second procedure file, and wherein application to the relational database is guaranteed to take place in the order of the first procedure file, the second procedure file, and the third procedure file.

8. The data mining apparatus according to claim 2, comprising procedure file application setting means that allows only a set of procedure files having a correct dependency relationship to be selected from among first, second, and third procedure files edited in advance.

9. The data mining apparatus according to claim 2, further comprising mining setting means having: mining attribute hierarchy display means for displaying the hierarchical structure output by the display attribute hierarchy generation means; and condition part conclusion part setting means for designating, for each item in the item database, one of four modes: may appear only in the condition part of an association rule, may appear only in the conclusion part of an association rule, may appear in both the condition part and the conclusion part of an association rule, or may not appear in an association rule, wherein, when a node on the mining attribute hierarchy display means is selected and a mode is designated, the condition part conclusion part setting means automatically designates all nodes below that node as the same mode.

10. The data mining apparatus according to claim 9, wherein the mining attribute hierarchy display means displays each node in a different form according to the mode designated by the condition part conclusion part setting means, so that the user can distinguish the modes.

11. The data mining apparatus according to claim 10, wherein the mining attribute hierarchy display means highlights a node when, as a result of mode designation by the condition part conclusion part setting means, the node and all nodes below it have the same mode.

12. The data mining apparatus according to claim 1, wherein the result display screen displays all items together with the number of association rules that include each item.