JP2001265596A

JP2001265596A - Device and method for mining data

Info

Publication number: JP2001265596A
Application number: JP2000072295A
Authority: JP
Inventors: Akisumi Mitsuishi; 彰純三石
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2000-03-15
Filing date: 2000-03-15
Publication date: 2001-09-28

Abstract

PROBLEM TO BE SOLVED: To provide correlation rules useful to a user by generating only those correlation rules whose correlation coefficient, computed between attributes included in a database, is equal to or larger than a prescribed value. SOLUTION: An attribute set generating part 21 selects a prescribed number (m) of attributes from the attributes included in a database 1, prepares all attribute sets having m elements, and stores them in an attribute set file 5. The attribute sets are fetched one by one from the file 5, candidates for inter-attribute correlation rules are generated from each attribute set, and the correlation coefficient of each candidate is calculated. A rule whose correlation coefficient is equal to or larger than a prescribed value designated by the user through a user input part 10 is then stored in an inter-attribute correlation rule file 2 as an inter-attribute correlation rule.

Description

DETAILED DESCRIPTION OF THE INVENTION

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention relates to a data mining apparatus and a data mining method for analyzing data correlations in a database.

2. Description of the Related Art

[0002] In the analysis of data correlations in a database, the technique of finding previously undiscovered regularities in the database is called data mining. This description deals in particular with data mining that finds regularities between data items in a large volume of data given on a nominal scale or an ordinal scale. The scales are defined here as in Yutaka Tanaka and Kazumasa Wakimoto, "Multivariate Statistical Analysis", Gendai Sugakusha, pp. 138-139. A nominal scale is a scale on which neither the magnitude of a value nor the order relation between values has any meaning. An ordinal scale is a scale on which the order relation between values is meaningful but their magnitude is not. By contrast, a scale on which the magnitude of a value is meaningful is called a ratio scale if it has an absolute origin and an interval scale if it does not.

[0003] For example, generating association rules is one method of analyzing, from the sales information database of a retail store, which products customers purchase together. An association rule has the form A → B and carries two indices (s, c): support (s) and confidence (c). For example, an association rule A → B (s, c) generated from a sales information database indicates that s% of all customers purchased product A and product B together, and that c% of the customers who purchased product A also purchased product B. A method for generating such association rules efficiently is described in, for example, R. Agrawal and R. Srikant, "Fast algorithms for mining association rules", in Proceedings of the 20th VLDB Conference, 1994 (hereinafter referred to as prior art 1).
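The two indices of an association rule can be illustrated with a short Python sketch. It is only an illustration of the definitions of support and confidence given above, not the algorithm of prior art 1; the function name `support_confidence` and the purchase records are hypothetical.

```python
def support_confidence(transactions, lhs, rhs):
    """Return (support %, confidence %) of the association rule lhs -> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    n_total = len(transactions)
    # Customers who bought every product on the left side.
    n_lhs = sum(1 for t in transactions if lhs <= set(t))
    # Customers who bought every product on both sides.
    n_both = sum(1 for t in transactions if (lhs | rhs) <= set(t))
    support = 100.0 * n_both / n_total if n_total else 0.0
    confidence = 100.0 * n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Hypothetical purchase records: one set of products per customer.
purchases = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A"},
    {"B", "C"},
    {"C"},
]
s, c = support_confidence(purchases, {"A"}, {"B"})
print(f"A -> B (s = {s:.1f}%, c = {c:.1f}%)")
```

In this toy data two of the five customers bought both A and B, and two of the three customers who bought A also bought B.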

[0004] There are also methods that generate rules using indices different from those of association rules. For example, JP-A-11-328186 generates correlation rules using the χ (chi) square value as an index in place of the "confidence" used for association rules. A correlation rule A ⇔ B is output when a test of independence (χ-square test) between the event of purchasing product A and the event of purchasing product B allows independence to be rejected (hereinafter referred to as prior art 2).
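The independence test behind such a correlation rule can be sketched as a χ-square statistic on a 2 × 2 table of purchase counts. This is a minimal sketch of the textbook test, not the procedure of JP-A-11-328186; the function name, the counts, and the choice of the 5% significance level are assumptions made for illustration, and all row and column totals are assumed to be non-zero.

```python
def chi_square_2x2(n_both, n_a_only, n_b_only, n_neither):
    """Chi-square statistic for independence of 'bought A' and 'bought B'."""
    # Rows: A bought / A not bought. Columns: B bought / B not bought.
    observed = [[n_both, n_a_only], [n_b_only, n_neither]]
    row = [sum(r) for r in observed]
    col = [sum(c) for c in zip(*observed)]
    total = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total  # count expected under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Critical value of the chi-square distribution, 1 degree of freedom, 5% level.
CRITICAL_5_PERCENT = 3.841

# Hypothetical counts for 100 customers.
chi2 = chi_square_2x2(n_both=40, n_a_only=10, n_b_only=10, n_neither=40)
independence_rejected = chi2 >= CRITICAL_5_PERCENT
print(chi2, independence_rejected)
```

When `independence_rejected` is true, the pair would be reported as a correlation rule A ⇔ B under the assumed significance level.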

[0005] In general, the number of association rules and correlation rules obtained by data mining is enormous, and the work of finding useful rules among them is left to humans. To reduce this work, there are methods that further analyze the obtained rules to make them easier for humans to understand. For example, JP-A-8-272825 classifies a plurality of rules, generated from a database at the user's instruction, into groups of rules whose matching data have a high degree of commonality, and displays them (hereinafter referred to as prior art 3).

[0006] The prior art described above has the following problems.

[0007] Prior arts 1 and 2 generate rules by focusing on the values of attributes recorded in the records of a database (hereinafter referred to as attribute values). Consider, for example, a database of health examination results such as the one shown in FIG. 10. This database has attributes such as age, height, weight, blood pressure, and visual acuity. Age, height, weight, blood pressure, visual acuity, and so on are the attribute names of the respective attributes, and "teens", "twenties", "low", "high", "light", "heavy", and so on are attribute values. Some of the attributes illustrated here are inherently given on a ratio scale or an interval scale, whereas others, such as gender, occupation, and hobby, are inherently nominal. In order to examine the relationships among attributes of all scales, it is assumed here that the values of ratio-scale and interval-scale attributes have been converted to a nominal or ordinal scale.
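The conversion of ratio-scale or interval-scale values to ordinal labels can be sketched as follows. The text does not state how the values in FIG. 10 were binned, so the cut points, label names, and helper functions below are hypothetical and serve only to show the kind of preprocessing that is assumed.

```python
def age_band(age_years):
    """Map an age in years to an ordinal decade label."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    labels = ["under 10", "teens", "twenties", "thirties",
              "forties", "fifties", "sixties"]
    decade = age_years // 10
    return labels[decade] if decade < len(labels) else "seventies or older"


def two_level(value, cut, low_label="low", high_label="high"):
    """Map a numeric value to one of two ordinal labels around a cut point."""
    return high_label if value >= cut else low_label


# Hypothetical raw record and hypothetical cut points.
record = {"age": 17, "height_cm": 172.0, "weight_kg": 55.0}
converted = {
    "age": age_band(record["age"]),
    "height": two_level(record["height_cm"], cut=165.0),
    "weight": two_level(record["weight_kg"], cut=60.0,
                        low_label="light", high_label="heavy"),
}
print(converted)
```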

[0008] Examples of association rules obtained by applying prior art 1 to the database shown in FIG. 10 are given below. A large number of association rules are generally generated, but only six are shown here for the purpose of explanation.

[0009] Examples of inter-attribute association rules
(1) Age is high → blood pressure is high
(2) Age is low → blood pressure is low
(3) Age is high → height is low
(4) Age is low → height is high
(5) Height is low → blood pressure is high
(6) Height is high → blood pressure is low

[0010] Here, rules (1) and (2), rules (3) and (4), and rules (5) and (6) each form a pair of rules with equivalent meaning.

[0011] Rules (1) to (4) have a reasonable basis, but rules (5) and (6) are groundless rules that merely appear to express a relationship because the attribute "age" mediates between height and blood pressure. That is, causal relationships exist between age and blood pressure and between age and height, but no direct causal relationship exists between height and blood pressure. In other words, rules (5) and (6) are correlation rules that arise derivatively from the chain of two causal relationships, one between age and blood pressure and one between age and height (hereinafter referred to as derived correlation rules or derived association rules). However, because general conventional data mining techniques do not consider the meaning of attributes and attribute values, it has been difficult to automatically exclude derived association rules such as rules (5) and (6).

[0012] As described above, the conventional methods of generating correlation rules between attribute values focus on the relationships between attribute values, so a large number of rules concerning the same attributes are output and many rules without any causal relationship are included. As a result, interpretation of the results is left to humans, which is laborious and carries the risk of misinterpretation.

[0013] Prior art 3 calculates the similarity between a rule designated by the user and a plurality of automatically generated rules, and groups and displays the rules with high similarity so that the display is easy for the user to understand. However, because the validity of the automatically generated rules is not verified, the derived association rules described above remain in the output to the end, and there is a risk of leading the user to a wrong interpretation.

SUMMARY OF THE INVENTION

[0014] The present invention has been made to solve the problems described above, and its object is to obtain a data mining method and a data mining apparatus that raise the level of abstraction of the correlation rules extracted from a database and can thereby provide correlation rules that are easier for humans to understand.

[0015] The present invention is a data mining apparatus comprising: attribute set generating means for selecting two or more attributes from a database storing a plurality of records composed of attribute values of attributes given on a nominal or ordinal scale, and generating attribute sets; inter-attribute correlation rule candidate generating means for dividing an attribute set generated by the attribute set generating means into two non-empty sets, generating a rule with one set as the left side and the other as the right side, and outputting the rule as a candidate for an inter-attribute correlation rule; and inter-attribute correlation rule testing means for calculating a correlation coefficient for each candidate and extracting, as inter-attribute correlation rules, those candidates whose correlation coefficient is equal to or greater than a predetermined threshold.

[0016] The apparatus further comprises input means for designating the maximum number of elements of an attribute set, and the attribute set generating means generates all attribute sets whose number of elements ranges from 2 to that maximum.

[0017] The attribute set generating means comprises: an attribute value set generating unit that selects one or more attribute values from the database and generates attribute value sets; an attribute value correlation rule generating unit that selects two sets from the attribute value sets generated by the attribute value set generating unit, generates a correlation rule candidate with one set as the condition and the other as the result, tests the generated candidate for independence, and outputs each candidate for which independence can be rejected as an attribute value correlation rule; and an attribute set generating unit that converts the attribute values on the left and right sides of each attribute value correlation rule output from the attribute value correlation rule generating unit into the attribute names of the corresponding attributes, and generates an attribute set composed of those attribute names.
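The conversion from attribute values to attribute names can be sketched as follows. The representation of a rule as two lists of (attribute name, attribute value) pairs and the function name `attribute_set_from_rule` are assumptions made for this illustration.

```python
def attribute_set_from_rule(lhs_values, rhs_values):
    """Replace each attribute value in a rule by the name of its attribute.

    lhs_values and rhs_values are iterables of
    (attribute_name, attribute_value) pairs.
    """
    return frozenset(name for name, _ in list(lhs_values) + list(rhs_values))


# Hypothetical attribute value correlation rules (left side, right side).
rules = [
    ([("age", "high")], [("blood pressure", "high")]),
    ([("age", "low")], [("blood pressure", "low")]),
    ([("age", "high")], [("height", "low")]),
]
attribute_sets = {attribute_set_from_rule(lhs, rhs) for lhs, rhs in rules}
print(attribute_sets)
```

The first two rules differ only in their attribute values, so they collapse into the single attribute set {age, blood pressure}.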

[0018] The apparatus further comprises: cyclic set detecting means that receives the inter-attribute correlation rules extracted by the inter-attribute correlation rule testing means and detects sets of inter-attribute correlation rules that are in a cyclic relationship; and derived rule testing means that, for every inter-attribute correlation rule making up a cycle in a set obtained by the cyclic set detecting means, tests whether that rule is derived through the chain of the remaining inter-attribute correlation rules or represents a genuine correlation.
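One way to picture a set of rules in a cyclic relationship is a triangle of pairwise rules such as age ⇔ blood pressure, age ⇔ height, and height ⇔ blood pressure. The sketch below detects such triangles only. It assumes that every rule relates exactly two single attributes and that only cycles of length three are of interest; it does not reproduce the detection or the derived rule test of the invention, and the function name is hypothetical.

```python
from itertools import combinations


def find_three_cycles(rules):
    """Return every triple of pairwise rules that forms a cycle A-B, B-C, A-C.

    rules is an iterable of 2-element collections of attribute names.
    """
    edges = {frozenset(rule) for rule in rules}
    nodes = sorted(set().union(*edges)) if edges else []
    cycles = []
    for a, b, c in combinations(nodes, 3):
        trio = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(edge in edges for edge in trio):
            cycles.append(trio)
    return cycles


# Hypothetical inter-attribute correlation rules.
rules = [
    ("age", "blood pressure"),
    ("age", "height"),
    ("height", "blood pressure"),
    ("age", "weight"),
]
cycles = find_three_cycles(rules)
print(cycles)
```

Each rule in a detected cycle is then a candidate for the derived rule test, because it may follow from the chain of the other two.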

[0019] The present invention also comprises: an attribute set generating step of selecting two or more attributes from a database storing a plurality of records composed of attribute values of attributes given on a nominal or ordinal scale, and generating attribute sets; an inter-attribute correlation rule candidate generating step of dividing an attribute set generated in the attribute set generating step into two non-empty sets, generating a rule with one set as the left side and the other as the right side, and outputting the rule as a candidate for an inter-attribute correlation rule; and an inter-attribute correlation rule testing step of calculating a correlation coefficient for each candidate and extracting, as inter-attribute correlation rules, those candidates whose correlation coefficient is equal to or greater than a predetermined threshold.

[0020] The method further comprises an input step of designating the maximum number of elements of an attribute set, and the attribute set generating step generates all attribute sets whose number of elements ranges from 1 to that maximum.

[0021] The attribute set generating step comprises: an attribute value correlation rule generating step of generating attribute value correlation rules using the attribute values in the database; and an attribute set generating step of converting the attribute values on the left and right sides of each attribute value correlation rule into the attribute names of the corresponding attributes and generating an attribute set composed of those attribute names.

[0022] The method further comprises: a cyclic set detecting step of detecting, from the inter-attribute correlation rules extracted in the inter-attribute correlation rule testing step, sets of inter-attribute correlation rules that are in a cyclic relationship; and a derived rule testing step of testing, for every inter-attribute correlation rule making up a cycle in a set obtained in the cyclic set detecting step, whether that rule is derived through the chain of the remaining inter-attribute correlation rules or represents a genuine correlation.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0023] The present invention is described below together with its embodiments. In all of the drawings used to describe the embodiments, components having the same function are given the same reference numerals, and their description is not repeated.

[0024] Embodiment 1. FIG. 1 is a block diagram showing the configuration of a data mining apparatus for generating inter-attribute correlation rules according to the present invention. In the figure, 1 is a database that is the target of data mining; 2 is an inter-attribute correlation rule file that stores the inter-attribute correlation rules obtained as the mining result; 5 is an attribute set file that stores the attribute sets used to generate candidates for inter-attribute correlation rules; 10 is a user input unit for inputting mining conditions; 20 is a hypothesis generation and verification unit that generates and verifies inter-attribute correlation rules; 21 is an attribute set generating unit that generates all attribute sets; and 22 is an inter-attribute correlation rule verification unit that generates rule candidates from the attribute sets generated by the attribute set generating unit 21 and verifies them.

[0025] Next, the operation is described with reference to FIG. 2, which is a flowchart of the processing performed by the data mining apparatus shown in FIG. 1. First, in step S100, the user input unit 10 obtains from the user the values of the minimum correlation coefficient Rmin and the maximum rule length Lmax as mining conditions. The minimum correlation coefficient Rmin is the threshold for the correlation coefficient between attributes: an inter-attribute correlation rule whose correlation coefficient is equal to or greater than Rmin is judged to be a useful correlation rule. The maximum rule length Lmax is the maximum number of elements of the attribute sets to be generated. Next, in step S101, the attribute set generating unit 21 sets a variable k to 2. In step S102, it selects k attributes from the attributes included in the database 1 (in the example of FIG. 10, age, height, weight, visual acuity, and so on; let their total number be n) and creates all attribute sets having k elements. In this case nCk attribute sets are generated, and they are stored in the attribute set file 5. Next, in step S103, the inter-attribute correlation rule testing unit 22 takes the attribute sets out of the attribute set file 5 one at a time, generates candidates for inter-attribute correlation rules from each attribute set, and outputs each candidate that holds as a rule to the inter-attribute correlation rule file 2. The details of step S103 are shown in FIG. 3.
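Step S102 amounts to enumerating all k-element subsets of the n attributes. A minimal Python sketch, assuming the attribute names are held in a list and using a hypothetical function name, is:

```python
from itertools import combinations
from math import comb


def attribute_sets_of_size(attributes, k):
    """Return every attribute set that has exactly k elements."""
    return [frozenset(c) for c in combinations(attributes, k)]


# Attribute names taken from the health examination example.
attributes = ["age", "height", "weight", "blood pressure", "visual acuity"]
pairs = attribute_sets_of_size(attributes, 2)

# The number of generated sets equals nCk, here 5C2 = 10.
print(len(pairs), comb(len(attributes), 2))
```

Because nCk grows quickly with n and k, the maximum rule length Lmax obtained in step S100 bounds the amount of work.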

[0026] In FIG. 3, the attribute set file 5 is examined in step S1031, and the processing ends if the file is empty. If it is not empty, one attribute set, which is an element of the attribute set file 5, is taken out in step S1032. Next, in step S1033, the attribute set taken out in step S1032 is divided into two non-empty attribute sets, LHS and RHS. The divided attribute sets LHS and RHS are regarded as the left side (condition) and the right side (result) of an inter-attribute correlation rule, and the correlation coefficient between LHS and RHS is calculated in step S1035. For this calculation, the definition and calculation method of the correlation coefficient in Quantification Method III described in Yutaka Tanaka and Kazumasa Wakimoto, "Multivariate Statistical Analysis", Gendai Sugakusha, pp. 161-171, can be used, for example; this is described later. The invention is not limited to this definition, however, and any other definition and calculation method may be used as long as the index expresses the degree of correlation between attribute sets. Next, in step S1036, the calculated correlation coefficient R is compared with the minimum correlation coefficient Rmin input in step S100 of FIG. 2. If R is equal to or greater than Rmin, "LHS ⇔ RHS" (LHS and RHS are not independent but are related in some way) is output to the inter-attribute correlation rule file 2 as an inter-attribute correlation rule in step S1037. If R is less than Rmin, step S1037 is skipped. The processing then returns to step S1033.

[0027] Step S1033 performs a different division each time it is passed through. When no division different from the previous ones remains, the determination in step S1034 finds that the divisions are exhausted, and the processing returns to step S1031. When the attribute set has m elements there are 2^(m-1) - 1 possible divisions, so step S1033 performs a division 2^(m-1) - 1 times, and on the 2^(m-1)-th pass step S1034 returns the processing to step S1031. Through the above processing, all inter-attribute correlation rules that can be generated from one attribute set are generated.
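The count of 2^(m-1) - 1 divisions can be checked with a small enumeration. The sketch below fixes one element on the left side so that each unordered division into two non-empty sets is produced exactly once; the function name and this particular enumeration order are illustrative assumptions.

```python
from itertools import combinations


def two_way_splits(attribute_set):
    """Return all divisions of a set into two non-empty sets (LHS, RHS)."""
    items = sorted(attribute_set)
    if len(items) < 2:
        return []
    first, rest = items[0], items[1:]
    splits = []
    # The first element always goes to the LHS. Moving all of `rest` to the
    # LHS as well would leave the RHS empty, so sizes stop at len(rest) - 1.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            lhs = frozenset((first,) + extra)
            rhs = frozenset(items) - lhs
            splits.append((lhs, rhs))
    return splits


for m in range(2, 6):
    attrs = [f"attr{i}" for i in range(m)]
    print(m, len(two_way_splits(attrs)), 2 ** (m - 1) - 1)
```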

[0028] When the processing of FIG. 3 ends, the processing returns to FIG. 2, and the value of the variable k is incremented by 1 in step S104. In the next step S105, the value of k is compared with the maximum rule length Lmax input in step S100, and the processing ends if k exceeds Lmax. Otherwise, the processing returns to step S102 and inter-attribute correlation rules of length k are generated. In this way, all attribute sets whose number of elements k ranges from 2 to Lmax are generated, the rule test is performed on each, and the correlation rules whose correlation coefficient is equal to or greater than the threshold Rmin are stored in the inter-attribute correlation rule file 2.
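The overall flow of FIG. 2 and FIG. 3 can be summarized in one loop. In this sketch the correlation coefficient is supplied by the caller as a function, the attribute set file and the rule file are replaced by in-memory lists, and the toy correlation values are invented for the example; none of these choices is prescribed by the text.

```python
from itertools import combinations


def mine_rules(attributes, l_max, r_min, correlation):
    """Return (LHS, RHS, R) for every candidate rule with R >= r_min."""
    rules = []
    k = 2                                                # S101
    while k <= l_max:                                    # S105
        for attr_set in combinations(attributes, k):     # S102
            first, rest = attr_set[0], attr_set[1:]
            # Enumerate each division into two non-empty sets once.
            for size in range(len(rest)):
                for extra in combinations(rest, size):
                    lhs = frozenset((first,) + extra)
                    rhs = frozenset(attr_set) - lhs
                    r = correlation(lhs, rhs)
                    if r >= r_min:                       # S1036
                        rules.append((lhs, rhs, r))      # S1037
        k += 1                                           # S104
    return rules


# Hypothetical correlation values, looked up by the union of both sides.
toy = {
    frozenset({"age", "blood pressure"}): 0.7,
    frozenset({"age", "height"}): 0.6,
    frozenset({"height", "blood pressure"}): 0.2,
}


def toy_correlation(lhs, rhs):
    return toy.get(lhs | rhs, 0.0)


found = mine_rules(["age", "height", "blood pressure"],
                   l_max=2, r_min=0.5, correlation=toy_correlation)
print(found)
```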

[0029] As a supplement, the following briefly describes how the definition and calculation method of the correlation coefficient given in the above-mentioned "Multivariate Statistical Analysis" are applied to the present invention. Consider the correlation coefficient between the attribute sets LHS and RHS, and let the values that LHS can take be L1, L2, L3, ..., Lr and the values that RHS can take be R1, R2, R3, ..., Rs. These values are not necessarily numerical. In the data that the present invention addresses, such as the example of FIG. 10, they are non-numerical data such as "high" and "light", so an ordinary correlation coefficient cannot be calculated. To define the degree of correlation between LHS and RHS quantitatively, a quantitative correlation coefficient is therefore defined (conceptually) by the following calculation.

[0030] First, arbitrary numerical values are assigned to L1, L2, L3, ..., Lr and R1, R2, R3, ..., Rs. Next, the correlation coefficient is calculated using the assigned values. The assigned values are then adjusted so that the resulting correlation coefficient is maximized. The maximum correlation coefficient finally obtained is defined as the correlation coefficient between LHS and RHS.
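The conceptual definition can be made concrete by computing an ordinary (Pearson) correlation coefficient for one particular assignment of numbers to the categories. The sketch below does this from a table of co-occurrence counts and compares two assignments; the table, the score values, and the function name are hypothetical, and the search for the best assignment is not implemented here.

```python
from math import sqrt


def correlation_for_scores(table, x_scores, y_scores):
    """Pearson correlation when row i is scored x_scores[i] and
    column j is scored y_scores[j]; table[i][j] is a record count."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    mean_x = sum(x_scores[i] * sum(row) for i, row in enumerate(table)) / n
    mean_y = sum(y_scores[j] * t for j, t in enumerate(col_totals)) / n
    cov = var_x = var_y = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            dx = x_scores[i] - mean_x
            dy = y_scores[j] - mean_y
            cov += f * dx * dy
            var_x += f * dx * dx
            var_y += f * dy * dy
    return cov / sqrt(var_x * var_y)


# Hypothetical counts: rows are three values of LHS, columns three of RHS.
table = [[30, 10, 5],
         [10, 30, 10],
         [5, 10, 30]]
r_ordered = correlation_for_scores(table, [1, 2, 3], [1, 2, 3])
r_scrambled = correlation_for_scores(table, [1, 2, 3], [1, 3, 2])
print(r_ordered, r_scrambled)
```

Different score assignments give different coefficients for the same counts, which is why the definition takes the maximum over all assignments.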

[0031] Although the description above searches for the maximum correlation coefficient starting from arbitrary values, in actual calculation it is preferable to compute the maximum directly by solving an eigenvalue problem. The specific calculation method is described below.

[0032] First, the r × s contingency table shown in Table 1 below is created, with r ≤ s (if this condition is not satisfied, LHS and RHS are simply interchanged).

[0033] The database of FIG. 10 is examined, and the contingency table is completed by counting the number fij of records in which LHS is Li and RHS is Rj.
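Counting the fij values can be sketched with a counter keyed by the pair of observed values. The sketch assumes that the value of an attribute set in a record is the tuple of the values of its attributes, which the text does not state explicitly; the records, the attribute key `bp`, and the function name are hypothetical.

```python
from collections import Counter


def contingency_table(records, lhs_attrs, rhs_attrs):
    """Count records for every observed (LHS value, RHS value) pair."""
    lhs_attrs, rhs_attrs = sorted(lhs_attrs), sorted(rhs_attrs)
    counts = Counter(
        (tuple(rec[a] for a in lhs_attrs), tuple(rec[a] for a in rhs_attrs))
        for rec in records
    )
    lhs_values = sorted({lhs for lhs, _ in counts})
    rhs_values = sorted({rhs for _, rhs in counts})
    # table[i][j] is f_ij: the number of records with LHS = L_i and RHS = R_j.
    table = [[counts[(lhs, rhs)] for rhs in rhs_values] for lhs in lhs_values]
    return lhs_values, rhs_values, table


# Hypothetical records after conversion to ordinal labels.
records = [
    {"age": "high", "bp": "high"},
    {"age": "high", "bp": "high"},
    {"age": "high", "bp": "low"},
    {"age": "low", "bp": "low"},
    {"age": "low", "bp": "low"},
]
lhs_values, rhs_values, table = contingency_table(records, {"age"}, {"bp"})
print(lhs_values, rhs_values, table)
```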

[0034]

[Table 1]

[0035] Next, an s × s square matrix is created, and its element eij in row i and column j is calculated by the following equation (the result is a symmetric matrix).

[0036]

[Equation 1]

[0037] The second-largest eigenvalue of the symmetric matrix created with the above equation is obtained, and its square is taken as the correlation coefficient between LHS and RHS (the largest eigenvalue is 1, which is a meaningless solution, so the second-largest eigenvalue is used).
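The eigenvalue computation can be sketched in Python as follows. Equation 1 is not reproduced in this text, so the matrix used here, e_jk = Σ_i f_ij f_ik / (f_i. sqrt(f_.j f_.k)), is an assumption taken from the standard formulation of Quantification Method III (correspondence analysis). With that matrix the second-largest eigenvalue is the squared maximal correlation, so this sketch returns its square root; a differently scaled Equation 1 would change only that last step. The function name, the use of power iteration with deflation, and the example table are illustrative choices, and all row and column totals are assumed to be non-zero.

```python
from math import sqrt


def max_correlation(table, iterations=500):
    """Largest non-trivial correlation for a contingency table f[i][j]."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    r, s = len(table), len(col_tot)
    # Assumed symmetric s x s matrix (see the lead-in above).
    e = [[sum(table[i][j] * table[i][k] / row_tot[i] for i in range(r))
          / sqrt(col_tot[j] * col_tot[k]) for k in range(s)]
         for j in range(s)]
    # Remove the trivial eigenpair: eigenvalue 1, eigenvector sqrt(f_.j / n).
    v = [sqrt(c / n) for c in col_tot]
    e = [[e[j][k] - v[j] * v[k] for k in range(s)] for j in range(s)]
    # Power iteration on the deflated matrix finds its largest eigenvalue.
    x = [float(j + 1) for j in range(s)]
    norm = sqrt(sum(t * t for t in x))
    x = [t / norm for t in x]
    eigenvalue = 0.0
    for _ in range(iterations):
        y = [sum(e[j][k] * x[k] for k in range(s)) for j in range(s)]
        eigenvalue = sum(a * b for a, b in zip(x, y))  # Rayleigh quotient
        norm = sqrt(sum(t * t for t in y))
        if norm < 1e-12:       # nothing left: rows and columns independent
            return 0.0
        x = [t / norm for t in y]
    return sqrt(max(eigenvalue, 0.0))


# Hypothetical 2 x 2 table of record counts.
r_max = max_correlation([[40, 10], [10, 40]])
print(r_max)
```

A production implementation would more likely call a library eigenvalue or singular value routine; the hand-written iteration is used here only to keep the sketch self-contained.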

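[0038] As described above, according to this embodiment, correlation rules whose inter-attribute correlation coefficient is equal to or greater than a predetermined threshold are output as inter-attribute correlation rules on the basis of the correlation coefficients between the attributes included in the database 1. The user can therefore obtain correlation rules that are easy to understand and useful, the interpretation of the rule generation results becomes clear, and the user is prevented from being led to a wrong interpretation.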
Since a correlation rule in which the correlation coefficient between attributes is equal to or greater than a predetermined threshold is output as an inter-attribute correlation rule, a useful and easy-to-understand correlation rule can be obtained, and the interpretation of the correlation rule generation result becomes clear. This has the effect of preventing erroneous interpretations from being given.

[0039] Embodiment 2. Embodiment 1 described an example in which candidates for inter-attribute correlation rules are generated and verified for every combination of the attributes included in the database 1. Although this method can generate all the rules that exist in the database 1, it requires an enormous amount of processing time as the number of attributes to be examined grows. This embodiment therefore generates inter-attribute correlation rules quickly and efficiently by using attribute value correlation rules, which are based on attribute values, as clues.

[0040] FIG. 4 is a block diagram showing the configuration of the data mining apparatus according to this embodiment. In the figure, 30 is an attribute value correlation rule generating unit that generates attribute value correlation rules; 4 is an attribute value correlation rule file that stores the attribute value correlation rules generated by the attribute value correlation rule generating unit 30; 50 is a hypothesis generation and verification unit that generates and verifies candidates for attribute value correlation rules; and 51 is an attribute set generating unit that generates a set of attribute sets from the attribute value correlation rule file 4.

[0041] Next, the operation is described with reference to FIG. 5, which is a flowchart of the processing performed by the data mining apparatus shown in FIG. 4. First, in step S200, the user input unit 10 obtains from the user, as mining conditions, the parameters needed to generate attribute value correlation rules and the value of the minimum correlation coefficient Rmin. The parameters needed to generate attribute value correlation rules include the minimum support, the minimum χ-square value, and the maximum rule length, but they are not essential to the present invention and their description is omitted.

[0042] Next, in step S201, the attribute value correlation rule generation unit 30 creates a set of attribute value correlation rules from the database 1, based on the parameters input in step S200, and stores it in the attribute value correlation rule file 4. This processing is described in Prior Art 2. Briefly, using the example database of FIG. 10: first, candidate attribute value sets are generated, each consisting of one or more attribute values (such as "teens" or "(height is) short") that are candidates for the left and right sides of a correlation rule. Then, the candidate attribute value sets whose number of occurrences in the database 1 is equal to or greater than the predetermined minimum support are selected as large attribute value sets. Next, one large attribute value set of a predetermined length L1 is selected as the left side and a large attribute value set of a predetermined length L2 is selected as the right side, to generate a correlation rule candidate. Then a χ² test is performed using the supports of the left and right sides of the generated candidate, that is, their numbers of occurrences in the database 1. Rules whose value is equal to or greater than the predetermined minimum χ² value, that is, rules for which independence can be rejected, are output to the attribute value correlation rule file 4 as attribute value correlation rules.
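As a non-limiting illustration, the selection of large attribute value sets and the pairing of left and right sides in step S201 might be sketched in Python as follows. The record layout, the function names, and the requirement that the two sides of a candidate use different attributes are assumptions made for this sketch. The brute-force enumeration merely stands in for the method of Prior Art 2, which is not reproduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records in the spirit of FIG. 10; names and values are assumed.
records = [
    {"age": "teens", "height": "short", "blood_pressure": "low"},
    {"age": "teens", "height": "short", "blood_pressure": "low"},
    {"age": "40s", "height": "tall", "blood_pressure": "high"},
    {"age": "40s", "height": "tall", "blood_pressure": "high"},
    {"age": "40s", "height": "short", "blood_pressure": "high"},
]

def large_attribute_value_sets(records, max_len, min_support):
    """Count every set of up to max_len (attribute, value) pairs and keep
    those that occur in at least min_support records."""
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

def rule_candidates(large_sets, len_left, len_right):
    """Pair large sets of lengths L1 and L2 whose attributes do not overlap
    (the non-overlap condition is an assumption of this sketch)."""
    for left in large_sets:
        for right in large_sets:
            if len(left) != len_left or len(right) != len_right:
                continue
            if {a for a, _ in left} & {a for a, _ in right}:
                continue
            yield left, right
```

Each candidate produced this way would then be passed to the independence test described next.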

[0043] Given large attribute value sets A and B, the independence of the correlation rule candidate A → B is tested by performing a χ² test on the hypothesis "being A and being B are independent". Let a be the support of set A, that is, the number of records containing all elements of set A; let b be the support of set B, that is, the number of records containing all elements of set B; let c be the support of sets A and B together, that is, the number of records containing all elements of both set A and set B; and let n be the total number of records in the database. The χ² value is then given by the following equation (2).

【００４４】[0044]

【数２】 (Equation 2)

[0045] If this value exceeds, for example, 3.8 (significance level 5%) or 6.6 (significance level 1%), the above hypothesis is rejected, so it can be said that "being A and being B are not independent; there is some relationship between them". The procedure described above is carried out for all combinations of attribute values among the large attribute value sets, and the combinations for which the hypothesis is rejected are output as attribute value correlation rules. In the present embodiment, to reduce the amount of computation, only the sets whose number of occurrences (support) in the database 1 is equal to or greater than a predetermined count are taken as large attribute value sets, and the χ² test is performed only on them. The test is not limited to this case, however, and may be performed on all combinations of attribute values. In that case the number of "all combinations" becomes enormous, but a method of examining them with a practical amount of computation is shown, for example, in Prior Art 1 above.
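For illustration only, and assuming that equation (2) is the usual Pearson χ² statistic for the 2×2 contingency table formed from a, b, c and n (an assumption of this sketch, since only the definitions of the four quantities are given in the text), the test could be computed as follows. The function name and the constant are hypothetical.

```python
def chi_square_2x2(a, b, c, n):
    """Pearson chi-square statistic for the independence of A and B.

    a = support(A), b = support(B), c = support(A and B), n = total records.
    Assumed closed form for a 2x2 table: n (nc - ab)^2 / (a b (n-a) (n-b)).
    """
    denom = a * b * (n - a) * (n - b)
    if denom == 0:
        # A margin is empty or full, so the test is undefined;
        # this sketch treats that as "no evidence against independence".
        return 0.0
    return n * (n * c - a * b) ** 2 / denom

# Critical value for 1 degree of freedom at the 5% level (about 3.84;
# the text rounds it to 3.8).
CRITICAL_5_PERCENT = 3.84

def independence_rejected(a, b, c, n, critical=CRITICAL_5_PERCENT):
    return chi_square_2x2(a, b, c, n) > critical
```

With a = 30, b = 40, c = 20 and n = 100 the statistic is roughly 12.7, so independence would be rejected at both significance levels mentioned above.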

[0046] In the above, an example was described in which the hypothesis of independence is tested with a χ² test. The test is not limited to this, however; any other test may be used as long as it can test independence, and the same effect is obtained in that case as well.

[0047] Next, in step S202, the attribute set generation unit 51 creates attribute sets from the attribute value correlation rule file 4 and outputs them to the attribute set file 5. The details of step S202 are described with reference to FIG. 6, which is a flowchart showing the flow of the processing that creates the attribute sets.

[0048] In step S2021, the attribute value correlation rule file 4 is examined; if it is empty, the processing ends. If it is not empty, the processing moves to step S2022. In step S2022, one attribute value correlation rule is taken from the attribute value correlation rule file 4. In step S2023, the attribute values making up the left side of the extracted rule are converted into the attribute names of the corresponding attributes. The set of converted attribute names is called attribute set A. That is, for rule (1) above, the left-side attribute value "high" is converted into "age", the attribute name of the corresponding attribute.

[0049] In step S2024, the attribute values making up the right side of the attribute value correlation rule are converted into the attribute names of the corresponding attributes and added to the attribute set A created from the left side. That is, for rule (1) above, the right-side attribute value "high" is converted into "blood pressure", the attribute name of the corresponding attribute, and added to attribute set A. Next, in step S2025, the created attribute set A is output to the attribute set file 5, and the processing returns to step S2021.
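The loop of steps S2021 to S2025 might be sketched as follows. This is a minimal illustration that assumes each rule side is held as a collection of (attribute name, attribute value) pairs, so that the value "high" can be traced back to either "age" or "blood pressure"; the function name and this representation are assumptions, not part of the described device.

```python
def rules_to_attribute_sets(rules):
    """rules: iterable of (left, right); each side is a collection of
    (attribute_name, attribute_value) pairs.
    Returns one attribute-name set per rule."""
    attribute_sets = []
    for left, right in rules:                     # S2021/S2022: take rules until none remain
        attrs = {name for name, _ in left}        # S2023: left-side values -> attribute names
        attrs |= {name for name, _ in right}      # S2024: add right-side attribute names
        attribute_sets.append(frozenset(attrs))   # S2025: output attribute set A
    return attribute_sets

# Rule (1) in this representation: age is "high" -> blood pressure is "high".
rule_1 = ([("age", "high")], [("blood_pressure", "high")])
attribute_sets = rules_to_attribute_sets([rule_1])
```

Several attribute value correlation rules can map to the same attribute set, so an implementation might also remove duplicates before step S103; the text does not say whether it does.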

[0050] When the processing of step S202 is finished, the processing moves to step S103 of FIG. 5, where inter-attribute correlation rules are generated and tested, and those that hold as rules are output to the inter-attribute correlation rule file 2 as inter-attribute correlation rules. The details of step S103 are shown in FIG. 3 and are omitted here.

[0051] As described above, according to the present embodiment, a χ² test is performed on correlation rules between attribute value sets, each consisting of one or more attribute values, and those for which independence can be rejected are extracted as attribute value correlation rules. Candidates for inter-attribute correlation rules are generated from those attribute value correlation rules, the correlation coefficients of the candidates are calculated, and those with a value equal to or greater than a predetermined threshold are output as inter-attribute correlation rules. Although there is no guarantee that all correlation rules are generated, useful inter-attribute correlation rules that are easy for the user to understand can be generated efficiently.

[0052] Embodiment 3. Embodiments 1 and 2 above described methods of generating correlation rules between attributes. The inter-attribute correlation rules generated by those embodiments, however, include rules that arise derivatively from a chain of several inter-attribute correlations. This embodiment automatically detects such derivatively arising inter-attribute correlation rules and deletes them from, or extracts them out of, the inter-attribute correlation rules.

[0053] FIG. 7 is a partial block diagram showing the configuration of the derived correlation rule extraction means in the data mining device of the present invention. In the figure, 60 is a derived correlation rule extraction unit; 61 is a cyclic reference detection unit that detects chains of inter-attribute correlation rules in the inter-attribute correlation rule set 2; 6 is the set of cyclic references output by the cyclic reference detection unit 61; and 62 is a derived rule test unit that, based on the set of cyclic references 6, examines the database 1 to detect derived correlation rules and deletes them from the inter-attribute correlation rule file 2. The configuration shown in FIG. 7 can be applied to either of the configurations of FIG. 1 and FIG. 2 described above.

[0054] Next, the operation will be described with reference to FIG. 8. FIG. 8 is a flowchart showing the flow of the derived correlation rule extraction processing shown in FIG. 7. As shown in the figure, the derived correlation rule extraction processing consists of two steps. First, in step S300, the set of inter-attribute correlation rules is read from the inter-attribute correlation rule file 2, groups of correlation rules in a cyclic relationship are found, and the group of attribute sets making up each cyclic relationship is output to the cyclic reference set file 6. Next, in step S301, the groups of attribute sets making up cyclic references are taken from the cyclic reference set file 6 one at a time, and each inter-attribute correlation rule making up the cyclic reference is tested for whether it has a true correlation. If it does not, that is, if it is a derived correlation rule, the rule is deleted from the inter-attribute correlation rule file 2. If, instead of being deleted from the inter-attribute correlation rule set 2, such rules are output to a separate file, only the derived correlation rules can be extracted.

[0055] Next, the details of the processing in steps S300 and S301 will be described.

[0056] First, the processing of step S300 is described. Consider a database having many attributes A, B, C, D, .... Suppose that when inter-attribute correlation rules are generated from this database, the rules A⇔B, B⇔C, C⇔D and A⇔D are obtained. Lining up these four rules gives A⇔B⇔C⇔D⇔A, which shows that they are in a cyclic relationship. In step S300, such relationships are found in the set of inter-attribute correlation rules, and the attribute sets, arranged in the order in which they form the cycle, as in [A, B, C, D], are output to the cyclic reference set file 6. Cyclic references can be found among a large number of inter-attribute correlation rules by treating attribute sets as nodes and inter-attribute correlation rules as arcs, building an undirected graph, and searching it while marking.
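One possible realization of the marked search over the undirected graph is a depth-first search that reports a cycle whenever it meets an arc leading back to a node on the current path. The following Python sketch is only an assumption about how such a search could look; it reports one cycle per back arc rather than every simple cycle, and it uses plain strings as nodes where the device would use attribute sets.

```python
def find_cycles(rules):
    """rules: iterable of (X, Y) pairs, each an undirected arc X <-> Y.
    Returns a list of cycles, each a list of nodes in cycle order."""
    graph = {}
    for x, y in rules:
        graph.setdefault(x, []).append(y)
        graph.setdefault(y, []).append(x)

    cycles = []
    visited = set()  # the "marking": nodes whose exploration has started

    def dfs(node, parent, path, pos):
        visited.add(node)
        pos[node] = len(path)
        path.append(node)
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in pos:
                # Back arc to a node on the current path closes a cycle.
                cycles.append(path[pos[nxt]:])
            elif nxt not in visited:
                dfs(nxt, node, path, pos)
        path.pop()
        del pos[node]

    for start in graph:
        if start not in visited:
            dfs(start, None, [], {})
    return cycles

cycles = find_cycles([("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")])
```

For the four rules of the example, this search yields the single ordered group [A, B, C, D].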

[0057] Next, the details of the processing in step S301 will be described with reference to FIG. 9. FIG. 9 is a flowchart showing the details of the derived correlation rule test processing, which determines whether a rule is a derived correlation rule and, if so, deletes it.

[0058] In the figure, first, in step S3011, it is checked whether the cyclic reference set file 6 is empty; if it is empty, the processing ends. If it is not empty, one group of attribute sets representing a cyclic reference is taken out. In the following, the number of attribute sets in the extracted group is t, and the attribute sets are assumed to be arranged in the order S_1, S_2, ..., S_t. That is, [S_1, S_2, ..., S_t] means that the t inter-attribute correlation rules S_1⇔S_2, S_2⇔S_3, ..., S_{t-1}⇔S_t, S_t⇔S_1 were contained in the inter-attribute correlation rule file 2.

[0059] In step S3013, the variable i is set to 1. This variable i is used to execute the loop made up of steps S3014 to S3017 t times. In steps S3015 and S3016, attention is paid to the attribute sets S_i and S_{i+1}, and it is checked whether the correlation rule S_i⇔S_{i+1} is a derived correlation rule. As also noted in the figure, when i = t, and only then, the value of i+1 is taken to be 1.

[0060] In step S3015, for the attribute sets other than S_i and S_{i+1}, the direct product of the attribute values that each attribute set can take is computed, and all records in the database 1 are classified by the value of that direct product. As a result, the database 1 is logically divided into multiple databases. In step S3016, the correlation coefficient between the attribute sets S_i and S_{i+1} is calculated for every divided database. If the maximum of these values is below the threshold given by the user, the inter-attribute correlation rule S_i⇔S_{i+1} is judged to be a derived correlation rule and is deleted from the inter-attribute correlation rule file 2.

[0061] The above processing is now supplemented with a concrete example. Suppose the database 1 has many attributes A, B, C, D, E, ..., and the set of inter-attribute correlation rules generated from it contains the four rules A⇔B, B⇔C, C⇔D and A⇔D. Then, as a result of the processing of step S300, the cyclic reference set file 6 contains the group of attribute sets [A, B, C, D]. When this group is taken out in step S3012, the value of t is 4. Let the attribute values that attributes A, B, C and D can take be (a1, a2, a3), (b1, b2), (c1, c2) and (d1, d2, d3), respectively.

[0062] When step S3015 is entered via steps S3013 and S3014, the value of i is 1, and it is determined whether the inter-attribute correlation rule A⇔B is a derived correlation rule. First, in step S3015, the direct product of the attribute values of the attribute sets other than A and B is computed. In this example, (c1, c2) × (d1, d2, d3) gives (c1d1, c1d2, c1d3, c2d1, c2d2, c2d3), and all records of the original database 1 are logically divided by this direct product into six partial databases. That is, the first partial database collects only the records in which the value of attribute C is c1 and the value of attribute D is d1, the second collects only the records in which the value of attribute C is c1 and the value of attribute D is d2, and likewise for the third through sixth partial databases.
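The logical division of step S3015 can be illustrated with `itertools.product`, which enumerates the direct product in the same order as the example (c1d1, c1d2, ..., c2d3). The function name, the dictionary of value domains, and the record layout are assumptions made for this sketch, and each attribute set is simplified to a single attribute.

```python
from itertools import product

def partition_by_other_attributes(records, other_attrs, domains):
    """Split records into one partial database per element of the direct
    product of the value domains of other_attrs."""
    keys = list(product(*(domains[a] for a in other_attrs)))
    parts = {key: [] for key in keys}
    for rec in records:
        parts[tuple(rec[a] for a in other_attrs)].append(rec)
    return parts

# Value domains of the example; only C and D are needed when testing A <-> B.
domains = {"C": ["c1", "c2"], "D": ["d1", "d2", "d3"]}
sample = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a2", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a3", "B": "b2", "C": "c1", "D": "d1"},
]
parts = partition_by_other_attributes(sample, ["C", "D"], domains)
```

Here `parts` has six keys, and the first partial database, keyed by ("c1", "d1"), holds the two sample records whose C is c1 and whose D is d1.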

[0063] Next, in step S3016, the correlation coefficient between the attribute sets A and B is calculated for every partial database produced in step S3015 (six in this case), yielding six correlation coefficients. If the maximum of these six correlation coefficients is below the threshold given by the user, the inter-attribute correlation rule A⇔B is judged to be a derived correlation rule and is deleted from the inter-attribute correlation rule file 2.

[0064] In step S3017, the value of the variable i is incremented by 1, and the processing returns to step S3014. This time the value of i is 2, so whether the inter-attribute correlation rule B⇔C is a derived correlation rule is determined in the same way as for A⇔B. The determination continues through D⇔A, and step S301 then ends.

[0065] In the above description, for convenience of explanation, the original database 1 is divided in step S3015 and the correlation coefficients are then calculated for all of the divisions. In practice, however, it is more practical to determine in step S3015 only the number and criterion of the divisions, that is, the direct product of the attribute values, and then, when all records of the database 1 are examined in step S3016 to calculate the correlation coefficients, to calculate the correlation coefficients for all divisions in a single pass according to the division criterion.
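A single-pass variant of steps S3015 and S3016 might look like the following sketch. The correlation coefficient used by the device is defined elsewhere in the specification; Cramér's V is substituted here purely as an assumed stand-in for nominal data. The function names are hypothetical, and each attribute set is again simplified to a single attribute.

```python
from collections import Counter, defaultdict
from math import sqrt

def cramers_v(table):
    """Cramér's V for a contingency table given as {(x, y): count}."""
    n = sum(table.values())
    rows, cols = Counter(), Counter()
    for (x, y), cnt in table.items():
        rows[x] += cnt
        cols[y] += cnt
    k = min(len(rows), len(cols)) - 1
    if n == 0 or k == 0:
        return 0.0  # one attribute is constant here: treated as uncorrelated
    chi2 = 0.0
    for x in rows:
        for y in cols:
            expected = rows[x] * cols[y] / n
            chi2 += (table.get((x, y), 0) - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))

def is_derived(records, s_i, s_j, other_attrs, threshold):
    """Scan the records once, filling one contingency table per division;
    the rule s_i <-> s_j is judged derived when even the largest
    per-division coefficient stays below the threshold."""
    tables = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[a] for a in other_attrs)   # division criterion
        tables[key][(rec[s_i], rec[s_j])] += 1
    return max((cramers_v(t) for t in tables.values()), default=0.0) < threshold
```

If, for instance, A and B are each fully determined by C, the coefficient of A and B is 0 inside every division by C, so A⇔B would be judged a derived correlation rule even though A and B look perfectly correlated in the undivided database.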

[0066] As described above, according to the data mining method and device of this invention, derived inter-attribute correlation rules that have no causal relationship can be automatically removed or extracted, which reduces the risk that the user misinterprets the correlation rules.

【００６７】[0067]

[Effects of the Invention] The present invention is a data mining device comprising: attribute set generation means that selects two or more attributes from a database storing a plurality of records consisting of attribute values of attributes given on a nominal or ordinal scale, and generates attribute sets; inter-attribute correlation rule candidate generation means that divides an attribute set generated by the attribute set generation means into two non-empty sets, generates a rule with one as the left side and the other as the right side, and outputs it as a candidate for an inter-attribute correlation rule; and inter-attribute correlation rule test means that calculates a correlation coefficient for each candidate and extracts the candidates whose correlation coefficient is equal to or greater than a predetermined threshold as inter-attribute correlation rules. Because the device generates correlation rules whose correlation coefficient is equal to or greater than the threshold, inter-attribute correlation rules that are useful and easy for the user to understand can be obtained.

[0068] Further, input means for designating the maximum number of elements of an attribute set is additionally provided, and the attribute set generation means generates all attribute sets having from 2 up to that maximum number of elements. Candidates for inter-attribute correlation rules are therefore generated and verified for every combination of attributes contained in the database, so that highly reliable inter-attribute correlation rules with no omissions can be obtained.

[0069] Further, the attribute set generation means consists of: an attribute value set generation unit that selects one or more attribute values from the database and generates attribute value sets; an attribute value correlation rule generation unit that selects two sets from the attribute value sets generated by the attribute value set generation unit, generates a correlation rule candidate with one as the condition and the other as the result, tests the generated candidate for independence, and outputs the candidates for which independence can be rejected as attribute value correlation rules; and an attribute set generation unit that converts the attribute values on the left and right sides of the attribute value correlation rules output by the attribute value correlation rule generation unit into the attribute names of the corresponding attributes and generates attribute sets made up of those attribute names. Therefore, although there is no guarantee that all correlation rules are generated, inter-attribute correlation rules can be generated efficiently.

[0070] Further, the device additionally comprises: cyclic set detection means that takes as input the inter-attribute correlation rules extracted by the inter-attribute correlation rule test means and detects sets of inter-attribute correlation rules in a cyclic relationship; and derived rule test means that, for every inter-attribute correlation rule making up a cycle in the sets obtained by the cyclic set detection means, tests whether the rule is derived from a chain of the remaining inter-attribute correlation rules or has a true correlation. Derived inter-attribute correlation rules that have no causal relationship can therefore be removed or extracted automatically, and the risk that the user misinterprets the correlation rules can be reduced.

[0071] Further, the present invention comprises: an attribute set generation step of selecting two or more attributes from a database storing a plurality of records consisting of attribute values of attributes given on a nominal or ordinal scale, and generating attribute sets; an inter-attribute correlation rule candidate generation step of dividing an attribute set generated in the attribute set generation step into two non-empty sets, generating a rule with one as the left side and the other as the right side, and outputting it as a candidate for an inter-attribute correlation rule; and an inter-attribute correlation rule test step of calculating a correlation coefficient for each candidate and extracting the candidates whose correlation coefficient is equal to or greater than a predetermined threshold as inter-attribute correlation rules. Because correlation rules whose correlation coefficient is equal to or greater than the threshold are generated, inter-attribute correlation rules that are useful and easy for the user to understand can be obtained.

[0072] Further, an input step of designating the maximum number of elements of an attribute set is additionally provided, and the attribute set generation step generates all attribute sets having from 2 up to that maximum number of elements. Candidates for inter-attribute correlation rules are therefore generated and verified for every combination of attributes contained in the database, so that highly reliable inter-attribute correlation rules with no omissions can be obtained.

[0073] Further, the attribute set generation step consists of: an attribute value correlation rule generation step of generating attribute value correlation rules using the attribute values of the database; and an attribute set generation step of converting the attribute values on the left and right sides of the attribute value correlation rules into the attribute names of the corresponding attributes and generating attribute sets made up of those attribute names. Therefore, although there is no guarantee that all correlation rules are generated, inter-attribute correlation rules can be generated efficiently.

[0074] Further, the method additionally comprises: a cyclic set detection step of detecting, from the inter-attribute correlation rules extracted in the inter-attribute correlation rule test step, sets of inter-attribute correlation rules in a cyclic relationship; and a derived rule test step of testing, for every inter-attribute correlation rule making up a cycle in the sets obtained in the cyclic set detection step, whether the rule is derived from a chain of the remaining inter-attribute correlation rules or has a true correlation. Derived inter-attribute correlation rules that have no causal relationship can therefore be removed or extracted automatically, and the risk that the user misinterprets the correlation rules can be reduced.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a data mining device according to Embodiment 1 of the present invention.

FIG. 2 is a flowchart showing the flow of the inter-attribute correlation rule generation processing according to Embodiment 1 of the present invention.

FIG. 3 is a flowchart showing the flow of the attribute value correlation rule test processing according to Embodiment 1 of the present invention.

FIG. 4 is a block diagram showing the configuration of a data mining device according to Embodiment 2 of the present invention.

FIG. 5 is a flowchart showing the flow of the inter-attribute correlation rule generation processing using attribute value correlation rules according to Embodiment 2 of the present invention.

FIG. 6 is a flowchart showing the flow of the attribute set generation processing according to Embodiment 2 of the present invention.

FIG. 7 is a partial block diagram showing the components of the derived correlation rule extraction means of a data mining device according to Embodiment 3 of the present invention.

FIG. 8 is a flowchart showing the flow of the derived correlation rule extraction processing according to Embodiment 3 of the present invention.

FIG. 9 is a flowchart showing the flow of the derived correlation rule test processing according to Embodiment 3 of the present invention.

FIG. 10 is an example of a database of health examination results.

[Explanation of symbols]

1: database; 2: inter-attribute correlation rule file; 4: attribute value correlation rule file; 5: attribute set file; 6: cyclic reference set file; 10: user input unit; 20: hypothesis generation/verification unit; 21: attribute set generation unit; 22: inter-attribute correlation rule test unit; 30: attribute value correlation rule generation unit; 50: hypothesis generation/verification unit; 51: attribute set generation unit; 60: derived correlation rule extraction unit; 61: cyclic reference detection unit; 62: derived rule test unit.

Claims

[Claims]

1. A data mining device comprising: attribute set generating means for selecting two or more attributes from a database storing a plurality of records of attribute values of attributes given on a nominal scale or an ordinal scale, and generating an attribute set; inter-attribute correlation rule candidate generating means for dividing the attribute set generated by the attribute set generating means into two non-empty sets, generating a correlation rule with one of them as the left side and the other as the right side, and outputting it as an inter-attribute correlation rule candidate; and inter-attribute correlation rule test means for calculating a correlation coefficient for the inter-attribute correlation rule candidate and outputting, as an inter-attribute correlation rule, each candidate whose correlation coefficient is equal to or greater than a predetermined threshold.

2. The data mining device according to claim 1, further comprising input means for designating a maximum value of the number of elements of the attribute set, wherein the attribute set generating means generates all attribute sets having a number of elements from 2 to the maximum value.

3. The data mining device according to claim 1, wherein the attribute set generating means comprises: an attribute-value correlation rule generation unit that selects one or more attribute values from the database to generate attribute value sets, selects two of the generated attribute value sets, generates a correlation rule candidate with one of them as the left side and the other as the right side, tests the generated correlation rule candidate for independence, and outputs each candidate for which independence can be rejected as an attribute-value correlation rule; and an attribute set generation unit that converts the attribute values on the left and right sides of each attribute-value correlation rule output from the attribute-value correlation rule generation unit into the attribute names of the corresponding attributes, and generates an attribute set consisting of those attribute names.

4. The data mining device according to claim 1, further comprising: cyclic set detecting means for receiving the inter-attribute correlation rules output by the inter-attribute correlation rule test means and detecting a set of inter-attribute correlation rules that are in a cyclic relationship; and derived rule test means for testing, for every inter-attribute correlation rule making up the cycle in the set of inter-attribute correlation rules in a cyclic relationship, whether that rule is one derived by a chain of the remaining inter-attribute correlation rules or is a true correlation.
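The generate-and-test procedure recited in claims 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claims only say "correlation coefficient", so Cramér's V is used here as one common choice for nominal data, and all function names and the record layout (a list of dictionaries keyed by attribute name) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def attribute_sets(attributes, max_size):
    """Yield every attribute set with 2..max_size elements (claim 2)."""
    for m in range(2, max_size + 1):
        yield from combinations(attributes, m)


def rule_candidates(attr_set):
    """Split an attribute set into two non-empty sides (left, right)."""
    for k in range(1, len(attr_set)):
        for left in combinations(attr_set, k):
            right = tuple(a for a in attr_set if a not in left)
            yield left, right


def cramers_v(records, left, right):
    """Cramér's V between the joint values of `left` and of `right`.

    Assumption: Cramér's V stands in for the unspecified coefficient.
    It is symmetric, so (left, right) and (right, left) score the same.
    """
    xs = [tuple(r[a] for a in left) for r in records]
    ys = [tuple(r[a] for a in right) for r in records]
    n = len(records)
    nx, ny, nxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    k = min(len(nx), len(ny)) - 1
    if k == 0:
        return 0.0  # a side with a single value carries no association
    # Pearson chi-square over the full contingency table.
    chi2 = sum(
        (nxy.get((x, y), 0) - nx[x] * ny[y] / n) ** 2 / (nx[x] * ny[y] / n)
        for x in nx
        for y in ny
    )
    return sqrt(chi2 / (n * k))


def mine_rules(records, attributes, max_size, threshold):
    """Keep the candidates whose coefficient is at or above the threshold."""
    rules = []
    for attr_set in attribute_sets(attributes, max_size):
        for left, right in rule_candidates(attr_set):
            v = cramers_v(records, left, right)
            if v >= threshold:
                rules.append((left, right, v))
    return rules
```

Calling `mine_rules(records, ["smoker", "cough", "blood"], 2, 0.5)` on a small hypothetical table would test every pair of attributes in both directions and return only the pairs whose coefficient reaches 0.5. The number of candidates grows quickly with `max_size`, which is one reason a user-specified maximum is useful.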

5. A data mining method comprising: an attribute set generating step of selecting one or more attributes from a database storing a plurality of records of attribute values of attributes given on a nominal scale or an ordinal scale, and generating an attribute set; an inter-attribute correlation rule candidate generating step of dividing the attribute set generated in the attribute set generating step into two non-empty sets, generating a correlation rule with one of them as the left side and the other as the right side, and outputting it as an inter-attribute correlation rule candidate; and an inter-attribute correlation rule test step of calculating a correlation coefficient for the inter-attribute correlation rule candidate and extracting, as an inter-attribute correlation rule, each candidate whose correlation coefficient is equal to or greater than a predetermined threshold.

6. The data mining method according to claim 5, further comprising an input step of designating a maximum value of the number of elements of the attribute set, wherein the attribute set generating step generates all attribute sets having a number of elements from 1 to the maximum value.

7. The data mining method according to claim 5, wherein the attribute set generating step includes: an attribute-value correlation rule generation step of generating attribute-value correlation rules using the attribute values in the database; and a step of converting the attribute values on the left and right sides of each attribute-value correlation rule into the attribute names of the corresponding attributes and generating an attribute set consisting of those attribute names.

8. The data mining method according to claim 5, further comprising: a cyclic set detecting step of detecting, from the inter-attribute correlation rules extracted in the inter-attribute correlation rule test step, a set of inter-attribute correlation rules that are in a cyclic relationship; and a derived rule test step of testing, for every inter-attribute correlation rule making up the cycle in the set of inter-attribute correlation rules in a cyclic relationship, whether that rule is one derived by a chain of the remaining inter-attribute correlation rules or is a true correlation.
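The cyclic set detection recited in claims 4 and 8 can be illustrated with a short sketch. The claims do not define "cyclic relationship" precisely, so this example assumes that each rule is an undirected edge between its left and right sides and that a cyclic set is a simple cycle of such edges. The function name `find_cycles` and the rule representation as `(left, right)` tuples are hypothetical. The derived rule test itself is not shown, because this section does not specify the statistic it uses.

```python
def find_cycles(rules):
    """Return the sets of rules that form a simple cycle.

    Assumption: a rule (left, right) is treated as an undirected edge
    between its two sides; the claims do not spell out the definition.
    Each cycle is returned as a frozenset of edges, and each edge as a
    frozenset holding its two sides.
    """
    graph = {}
    for left, right in rules:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)

    cycles = set()

    def walk(start, node, path):
        for nxt in graph[node]:
            if nxt == start and len(path) >= 3:
                # Store the cycle as a set of edges so that rotations and
                # reversals of the same cycle collapse into one entry.
                edges = frozenset(
                    frozenset((path[i], path[(i + 1) % len(path)]))
                    for i in range(len(path))
                )
                cycles.add(edges)
            elif nxt not in path:
                walk(start, nxt, path + [nxt])

    for start in graph:
        walk(start, start, [start])
    return cycles
```

Given rules linking A and B, B and C, and A and C, the function reports one cyclic set containing all three rules; each rule in that set is then a candidate for the derived rule test. The exhaustive search is exponential in the worst case, so it suits only small rule sets.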