JPH10312288A

JPH10312288A - Combined analysis system

Info

Publication number: JPH10312288A
Application number: JP12049397A
Authority: JP
Inventors: Hideyuki Maki; 牧　　秀行; Akira Maeda; 章前田; Yukiyasu Ito; 幸康伊藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1997-05-12
Filing date: 1997-05-12
Publication date: 1998-11-24

Abstract

PROBLEM TO BE SOLVED: To provide a data analysis method capable of combined analysis of tabular data and data in a different format, such as transaction data. SOLUTION: Case data 101, composed of single-value attributes and multi-value attributes, and attribute information 102 are input. Both are obtained by converting the data to be analyzed, which is in tabular or transaction form. One or more attribute value examples are set as target attribute value examples, explanatory attribute value example groups are generated, and rules are generated from combinations of a target attribute value example and an explanatory attribute value example group. The rules are evaluated and selected, and are then presented to a user or transmitted to an external system.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a data analysis technique for examining relationships between attributes of data.

[0002]

[Description of the Related Art] Many data analysis tools exist. Many of them are spreadsheet tools, that is, tools that handle data in so-called spreadsheet form. These tools perform formula calculations between data items. Data mining is also known as a technique for searching for and discovering combinations of data items that have strong correlations or dependencies. Data mining techniques for tabular data are described in "Knowledge Discovery Technology from Databases" (Maeda, Systems/Control/Information, Vol. 39, No. 4, pp. 185, 1995). This technique automatically extracts combinations of items in the data that have strong dependencies. For time-series analysis, and in particular for combination analysis of transaction data, a technique is described in "Visualization of Association Rules" (Fukuda, Morishita, IEICE Technical Report, Vol. 95, No. 81, 1995). This technique automatically extracts combinations of events that have a high probability of occurring together in a transaction.

[0003]

[Problem to Be Solved by the Invention] Decision making requires data analysis that draws on various kinds of data. For example, analyzing customer purchasing trends in retail uses information on static characteristics of customers (ones that do not change from day to day), such as gender and age, together with customer purchase records (which change over time with every purchase). Because these data take different forms depending on their nature, they are usually analyzed by separate methods. However, discovering latent correlations and dependencies between the data requires a method that analyzes these differently formatted data at the same time. An object of the present invention is to provide a data analysis method that performs analysis by combining tabular data with data in a different format, such as transaction data.

[0004]

[Means for Solving the Problem] To analyze different kinds of data in combination, the invention has means for generating pairs of an attribute and an attribute value from both single-value attribute data and multi-value attribute data. It has means for setting target attributes and explanatory attributes, and means for combining target attributes and explanatory attributes to generate rules. It has means for evaluating each of these rules and means for selecting rules according to the evaluation results. It has means for presenting the selected rules to the user and for transmitting them to an external system. It also has means for converting the data to be analyzed into the single-value attribute data and multi-value attribute data described above.

[0005]

[Embodiments of the Invention] FIG. 1 shows an example of an embodiment of the present invention. The inputs are case data 101 and attribute information 102. The case data 101 is a set of records, each of which corresponds to one of the cases to be analyzed. FIG. 2 shows an example of the case data 101. This is customer data from a retail store, and here one case is one customer, so each record corresponds to one customer. Each record of the case data 101 consists of one or more fields, and each field corresponds to an attribute of the case. In customer data, the fields correspond to items that describe the customer, such as customer number, gender, age, and purchased items. Each field of each record stores the value of the corresponding attribute for the corresponding customer.

The attribute information 102 holds information about the attributes, such as which field corresponds to which attribute and which values each attribute can take. FIG. 3 shows an example of the attribute information, which is a set of information about each attribute. It consists of the "attribute name", the "corresponding field", the "single-value/multi-value distinction", and the "domain". The "attribute name" is the name given to each attribute and distinguishes the attributes from one another. In the following description an attribute is referred to by its attribute name; for example, the attribute named "gender" is called the "gender" attribute. The "corresponding field" is the number of the field in the case data that corresponds to the attribute. The "single-value/multi-value distinction" is a symbol indicating whether the attribute is a single-value attribute or a multi-value attribute; both kinds are explained below. The "domain" is the set of attribute values that the attribute can take. For example, the domain of the "customer number" attribute is the digit strings from 00000 to 99999, and the domain of the "gender" attribute is {male, female}.
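The relationship between case data 101 and attribute information 102 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the class name `AttributeInfo`, the helper names, the field layout, and the sample records are all assumptions, since FIG. 2 and FIG. 3 are not reproduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AttributeInfo:
    name: str                 # "attribute name"
    fields: Tuple[int, ...]   # "corresponding field" numbers in a record
    multi_valued: bool        # "single-value/multi-value distinction"
    domain: FrozenSet[str]    # "domain": values the attribute can take


# Hypothetical attribute information (the customer number is omitted here).
ATTRIBUTE_INFO = [
    AttributeInfo("gender", (1,), False, frozenset({"male", "female"})),
    AttributeInfo("purchased this month", (2,), True,
                  frozenset({"suit", "shoes", "coat"})),
]

# Hypothetical case data: one record per customer.
# Field 0 = customer number, field 1 = gender, field 2 = purchases.
CASE_DATA = [
    ("00001", "male", ("suit", "coat")),
    ("00002", "female", ("shoes",)),
    ("00003", None, ()),          # gender unknown, nothing purchased
]


def cell_values(cell):
    """Return the values stored in one field as a tuple (empty if blank)."""
    if cell is None:
        return ()
    return cell if isinstance(cell, tuple) else (cell,)


def record_is_valid(record, infos):
    """Check a record against the domains and the single/multi distinction."""
    for info in infos:
        values = [v for f in info.fields for v in cell_values(record[f])]
        if not info.multi_valued and len(values) > 1:
            return False
        if any(v not in info.domain for v in values):
            return False
    return True
```

A blank field is modeled as `None` or an empty tuple, which is one possible encoding of the blank fields the description mentions.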

[0006] Single-value and multi-value attributes are now explained. A single-value attribute is an attribute that cannot take two or more values at the same time for one case. FIG. 4 shows an example. For the "gender" attribute, the gender of one case, that is, of one customer, is either "male" or "female" and cannot take two or more values. The gender may, however, be unknown. In that case the gender attribute of that customer takes no value and the field is left blank.

A multi-value attribute, in contrast, is an attribute that can take several values at the same time for one case. FIG. 5 shows an example, in which "suit", "shoes", "coat", and so on are values of the "purchased this month" attribute. One customer may purchase several products in the period, and the number of products a customer purchases in the period is not fixed. The field corresponding to the "purchased this month" attribute therefore stores several attribute values, and their number is not fixed. If the attribute takes no value, the field is left blank.

FIG. 6 shows another form of multi-value attribute. Here every field of every record stores either one attribute value or nothing, but several fields correspond to the "purchased this month" attribute, so the attribute still takes several values at the same time. In this case the "corresponding field" entry of the attribute information shown in FIG. 3 lists the several field numbers that correspond to the "purchased this month" attribute.
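The two multi-value layouts (several values in one field, as in FIG. 5, or one value in each of several fields, as in FIG. 6) can be reduced to the same set of values. The sketch below assumes a record is a tuple indexed by field number and that a blank field is `None` or an empty string; the function name `values_of` and the sample records are hypothetical.

```python
def values_of(record, fields):
    """Collect the values of one attribute from the fields that hold it.

    `fields` lists the field numbers taken from the attribute information.
    A field may hold nothing, one value, or a collection of values.
    """
    values = set()
    for f in fields:
        cell = record[f]
        if cell is None or cell == "":
            continue                      # blank field: no value
        if isinstance(cell, (list, tuple, set, frozenset)):
            values.update(cell)           # FIG. 5 style: many values, one field
        else:
            values.add(cell)              # FIG. 6 style: one value per field
    return frozenset(values)


fig5_style = ("00001", "male", ["suit", "coat"])
fig6_style = ("00001", "male", "suit", "coat", None)
```

With this normalization, `values_of(fig5_style, [2])` and `values_of(fig6_style, [2, 3, 4])` yield the same set, and a single-value attribute simply yields a set with at most one element.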

[0007] FIG. 7 shows the inputs and outputs of the explanatory attribute value example group generating means 103. A pair consisting of one attribute from the attribute information 102 and one attribute value from that attribute's domain is called an attribute value example and is written as an attribute name and an attribute value. For example, (gender, male) is the attribute value example that pairs the attribute "gender" with its value "male". The input of the explanatory attribute value example group generating means 103 is a set of such attribute value examples 701, which is specifically called the combination candidate set 702. Unless a condition is set on the combination candidate set 702, every pair of an attribute in the attribute information 102 and one of its attribute values is an element of the set.

The explanatory attribute value example group generating means 103 combines the input attribute value examples to generate and output attribute value example groups 703, each of which has one or more attribute value examples as elements. Several attribute value examples with the same single-value attribute cannot be included in the same attribute value example group 703. For example, because "gender" is a single-value attribute, (gender, male) and (gender, female) cannot both be in the same group. For a multi-value attribute, on the other hand, several attribute value examples with the same attribute can be included in the same group 703. For example, because "purchased this month" is a multi-value attribute, (purchased this month, suit) and (purchased this month, coat) can be in the same group.

If the combinations are subject to no condition other than this same-attribute condition, the number of possible combinations becomes enormous when the combination candidate set 702 contains many attribute value examples 701. A combination condition 104 is therefore imposed on how combinations are formed, for example by limiting the number of attribute value examples in one group 703 to a relatively small number such as 3. The explanatory attribute value example group generating means 103 generates all attribute value example groups 703 that are possible within the combination condition 104. The attribute value example groups generated by this means are called explanatory attribute value example groups.
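One way to enumerate the groups described above is to take all combinations up to the size limit and discard those that repeat a single-value attribute. The sketch below is an assumption about how this could be coded; the function name `generate_groups` and its parameters are hypothetical, and the size limit stands in for combination condition 104.

```python
from itertools import combinations


def generate_groups(candidates, single_valued, max_size=3):
    """Enumerate attribute value example groups.

    candidates:    iterable of (attribute, value) pairs (the candidate set)
    single_valued: set of attribute names that are single-value attributes
    max_size:      upper limit on the number of examples in one group
    """
    candidates = sorted(set(candidates))
    groups = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            # A single-value attribute may appear at most once in a group.
            attrs = [attr for attr, _ in combo if attr in single_valued]
            if len(attrs) == len(set(attrs)):
                groups.append(frozenset(combo))
    return groups


candidates = [
    ("gender", "male"), ("gender", "female"),
    ("purchased this month", "suit"), ("purchased this month", "coat"),
]
groups = generate_groups(candidates, {"gender"}, max_size=2)
```

Exhaustive enumeration grows combinatorially with the size of the candidate set, which is why the description keeps the size limit small.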

[0008] The target attribute value example setting means 105 takes as input the same kind of set of attribute value examples as the explanatory attribute value example group generating means 103 and selects one or more of them as target attribute value examples. Its output is the set of selected target attribute value examples. Which of the input attribute value examples are selected as target attribute value examples is either determined in advance or specified by the user. Unless otherwise specified, all attribute value examples are selected as target attribute value examples.

[0009] A pair consisting of one explanatory attribute value example group and one target attribute value example is called a rule. The rule generating means 106 takes as input the set of explanatory attribute value example groups 703 output from the explanatory attribute value example group generating means 103 and the set of target attribute value examples 801 output from the target attribute value example setting means 105, and outputs the set of rules generated by combining them. FIG. 8 shows an example of the inputs and outputs of the rule generating means 106. Of all possible combinations of an explanatory attribute value example group and a target attribute value example, the rule generating means 106 generates as rules 802 those that satisfy the conditions on rules, and outputs the set of generated rules 802.

FIG. 9 shows the conditions on rules. For a single-value attribute, an explanatory attribute value example group and a target attribute value example that have the same attribute cannot be combined. For example, the target attribute value example (gender, male) is not combined with an explanatory attribute value example group containing the attribute value example (gender, male) or (gender, female) (901, 902). For a multi-value attribute, an explanatory attribute value example group and a target attribute value example that have the same attribute value example, that is, the same attribute and the same attribute value, cannot be combined. For example, the target attribute value example (purchased this month, suit) is not combined with an explanatory attribute value example group containing (purchased this month, suit) (903), but it can be combined with an explanatory attribute value example group containing (purchased this month, coat), because the attribute values differ (904).
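The rule conditions of FIG. 9 can be expressed as a small predicate. This is a sketch under the assumption that a group is a set of (attribute, value) pairs and a rule is a (group, target) pair; the function names are hypothetical.

```python
def can_combine(group, target, single_valued):
    """Return True if the group and the target may form a rule."""
    target_attr, _ = target
    if target_attr in single_valued:
        # Single-value attribute: the group must not use the attribute at all.
        return all(attr != target_attr for attr, _ in group)
    # Multi-value attribute: only the identical (attribute, value) pair is barred.
    return target not in group


def generate_rules(groups, targets, single_valued):
    """Pair every group with every target that satisfies the conditions."""
    return [(g, t) for g in groups for t in targets
            if can_combine(g, t, single_valued)]


single = {"gender"}
suit = ("purchased this month", "suit")
coat = ("purchased this month", "coat")
```

The four cases 901 to 904 in the description correspond to the four checks exercised with `single`, `suit`, and `coat` above.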

[0010] FIG. 10 shows the inputs and outputs of the rule evaluating means 107. The rule evaluating means 107 takes as input the case data 101, the set of rules 802 generated by the rule generating means 106, and the attribute information 102, calculates evaluation information 1002 for every input rule 802, and attaches the evaluation information 1002 to the rule 802. A rule with its evaluation information attached is called a "rule evaluation". The means outputs the set of rule evaluations 1001.

Suppose that one record R and one attribute value example a: (attr, val) are given. Let VAL(R, attr) denote the set of attribute values stored in the fields of record R that correspond to attribute attr. If attr is a single-value attribute, VAL(R, attr) has 1 or 0 elements. If attr is a multi-value attribute, VAL(R, attr) has 1 or more elements, or 0 elements. R is said to be "missing" with respect to attribute attr if and only if VAL(R, attr) = φ (the empty set). The attribute value example a and the record R are said to "match" each other if and only if val ∈ VAL(R, attr) (FIG. 11). Given an attribute value example group A: {a_i}, the group A and the record R are said to match each other if and only if every attribute value example a_i belonging to A matches R (FIG. 12). Given a rule r, the rule r and the record R are said to match each other if and only if R matches both the target attribute value example and the explanatory attribute value example group that make up r (FIG. 13).

FIG. 14 shows the procedure for obtaining the evaluation information of a rule. Let r be the rule, t the target attribute value example of r, A its explanatory attribute value example group, and { a_1, a_2, ..., a_n } the attribute value examples that make up A, where n is the number of attribute value examples in A. Over all records in the case data 101, process 1401 counts the records that match the explanatory attribute value example group A and calls the count S_A. Process 1402 counts the records that match the target attribute value example t and calls the count S_t. Process 1403 counts the records that match the rule r and calls the count S_r. Process 1404 counts the records that are not missing with respect to any of the attributes of t and a_i (1 ≤ i ≤ n) and calls the count N. Deciding whether a record matches or is missing requires knowing which field corresponds to which attribute, which can be found by referring to the attribute information 102. Process 1405 calculates the evaluation value U of the rule r according to the following equation.
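The definitions of "missing" and "match" and the counts S_A, S_t, S_r, and N from processes 1401 to 1404 can be sketched as below. The sketch assumes each record has already been turned into a mapping from attribute name to its set of values, so the lookup through attribute information 102 is not shown; all function names and the sample records are hypothetical.

```python
def val(record, attr):
    """VAL(R, attr): the set of values the record holds for the attribute."""
    return record.get(attr, frozenset())


def matches_example(record, example):
    attr, value = example
    return value in val(record, attr)


def matches_group(record, group):
    return all(matches_example(record, a) for a in group)


def count_rule(records, group, target):
    """Return the counts S_A, S_t, S_r and N for the rule (group, target)."""
    attrs = {attr for attr, _ in group} | {target[0]}
    s_a = sum(matches_group(r, group) for r in records)
    s_t = sum(matches_example(r, target) for r in records)
    s_r = sum(matches_group(r, group) and matches_example(r, target)
              for r in records)
    # N: records not missing with respect to any attribute used by the rule.
    n = sum(all(val(r, a) for a in attrs) for r in records)
    return {"S_A": s_a, "S_t": s_t, "S_r": s_r, "N": n}


records = [
    {"gender": {"male"}, "purchased": {"suit", "coat"}},
    {"gender": {"male"}, "purchased": {"suit"}},
    {"gender": {"female"}, "purchased": {"coat"}},
    {"gender": set(), "purchased": {"suit", "coat"}},   # gender missing
    {"gender": {"male"}, "purchased": set()},           # purchases missing
]
group = frozenset({("gender", "male"), ("purchased", "suit")})
counts = count_rule(records, group, ("purchased", "coat"))
```

Note that, following the description, S_A, S_t, and S_r are counted over all records, while N excludes records with a missing attribute.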

[0011]

[Equation 1] U = G · log[ P(S_r|S_A) / P(S_A) ]

Here, P(S_A) = S_A / N, P(S_r) = S_r / N, and P(S_r|S_A) = P(S_r) / P(S_A). G is a function of S_A, S_t, S_r, and N; specifically, an equation such as one of the following is used.

[0012]

[Equation 2] G = S_A / N

[0013]

[Equation 3] G = S_r / S_t

[0014] In Equation 1, a power of G may also be used in place of G. Process 1406 attaches U, S_A, S_t, S_r, and N to the rule r as the evaluation information 1002, producing the rule evaluation 1001 of the rule r. The rule evaluating means 107 generates a rule evaluation 1001 for every input rule 802 and outputs the set of rule evaluations 1001.
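The evaluation value of Equation 1, with the weight G taken from Equation 2 or Equation 3 and optionally raised to a power, might be computed as in the sketch below. It follows Equation 1 as printed. The base of the logarithm and the handling of zero counts are not stated in the description, so the natural logarithm and a `None` result are assumptions, as are the function and parameter names.

```python
import math


def evaluation_value(s_a, s_t, s_r, n, weight="eq2", power=1.0):
    """Evaluation value U = G^power * log[P(S_r|S_A) / P(S_A)].

    weight="eq2" uses G = S_A / N; weight="eq3" uses G = S_r / S_t.
    Returns None when a count is zero (assumption: U is left undefined).
    """
    if min(s_a, s_t, s_r, n) <= 0:
        return None
    p_a = s_a / n                 # P(S_A)
    p_r = s_r / n                 # P(S_r)
    p_r_given_a = p_r / p_a       # P(S_r|S_A)
    g = s_a / n if weight == "eq2" else s_r / s_t
    return (g ** power) * math.log(p_r_given_a / p_a)


u_eq2 = evaluation_value(s_a=2, s_t=4, s_r=2, n=8)
u_eq3 = evaluation_value(s_a=2, s_t=4, s_r=2, n=8, weight="eq3")
```

Passing `power=1.0` reproduces Equation 1 unchanged; other values give the variant in which a power of G replaces G.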

[0015] FIG. 15 shows the inputs and outputs of the rule selecting means 108. The rule selecting means 108 takes as input the set of rule evaluations 1001 output from the rule evaluating means 107 and treats them as selection candidates. Based on the rule selection condition 109, it selects some or all of the candidate rule evaluations according to the evaluation information of each rule and outputs the set of selected rule evaluations 1501. FIG. 16 shows an example of a rule selection method. Here the number L of rule evaluations to select is determined in advance by the rule selection condition 109, and L rule evaluations are selected from the input rule evaluations 1001 in descending order of evaluation value.
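The selection method of FIG. 16 amounts to sorting by evaluation value and keeping the first L entries. A minimal sketch, assuming each rule evaluation is a dictionary with a "U" key (a hypothetical representation):

```python
def select_top(rule_evaluations, limit):
    """Return the `limit` rule evaluations with the largest evaluation value."""
    ranked = sorted(rule_evaluations, key=lambda e: e["U"], reverse=True)
    return ranked[:limit]


evaluations = [
    {"rule": "r1", "U": 0.2},
    {"rule": "r2", "U": 0.9},
    {"rule": "r3", "U": 0.5},
]
top_two = select_top(evaluations, 2)
```

If fewer than L rule evaluations are available, the slice simply returns all of them.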

[0016] FIG. 17 shows another example of a rule selection method. When two rules r1 and r2 have the same target attribute value example and the explanatory attribute value example group A1 of r1 is a subset of the explanatory attribute value example group A2 of r2, the rule r2 is said to be "included" in the rule r1 (FIG. 18). In the method of FIG. 17, for any two rules r1 and r2, if r1 includes r2 and the evaluation value of r2 does not exceed that of r1, the rule evaluation of r2 is removed from the selection candidates. From the rule evaluations that remain as candidates, L rule evaluations are then selected in descending order of evaluation value. The value of L is determined in advance by the rule selection condition 109.
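The inclusion-based selection of FIG. 17 can be sketched as a pruning pass followed by a top-L selection. The dictionary keys "group", "target", and "U" and the function names are hypothetical; the sketch also assumes that no two rule evaluations describe the identical rule.

```python
def includes(r1, r2):
    """True if r1 includes r2: same target, and r1's group is a subset of r2's."""
    return r1["target"] == r2["target"] and r1["group"] <= r2["group"]


def select_with_inclusion(evaluations, limit):
    """Drop rules included in a rule that scores at least as well, then take the top `limit`."""
    kept = [
        r2 for r2 in evaluations
        if not any(r1 is not r2 and includes(r1, r2) and r2["U"] <= r1["U"]
                   for r1 in evaluations)
    ]
    return sorted(kept, key=lambda e: e["U"], reverse=True)[:limit]


coat = ("purchased", "coat")
male = ("gender", "male")
evaluations = [
    {"group": frozenset({male}), "target": coat, "U": 0.5},
    {"group": frozenset({male, ("purchased", "suit")}), "target": coat, "U": 0.4},
    {"group": frozenset({male, ("age group", "30s")}), "target": coat, "U": 0.9},
    {"group": frozenset({male}), "target": ("purchased", "shoes"), "U": 0.1},
]
selected = select_with_inclusion(evaluations, 10)
```

In this example the second rule is dropped because the more general first rule scores at least as well, while the third rule survives because its longer explanatory group improves the evaluation value.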

[0017] FIG. 19 shows yet another example of a rule selection method. The input rule evaluations 1001 are first divided into groups by target attribute value example. Rule evaluations are then selected within each group. The selection within each group uses either the simple selection by evaluation value shown in FIG. 16 or the selection that takes rule inclusion into account shown in FIG. 17.
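Grouping by target attribute value example before selecting can be sketched as follows. For brevity the sketch uses the simple evaluation-value ordering of FIG. 16 inside each group; the dictionary keys and the function name are hypothetical.

```python
from collections import defaultdict


def select_per_target(evaluations, limit):
    """Select up to `limit` rule evaluations for each target attribute value example."""
    by_target = defaultdict(list)
    for e in evaluations:
        by_target[e["target"]].append(e)
    return {
        target: sorted(group, key=lambda e: e["U"], reverse=True)[:limit]
        for target, group in by_target.items()
    }


evaluations = [
    {"target": ("purchased", "coat"), "U": 0.2},
    {"target": ("purchased", "coat"), "U": 0.7},
    {"target": ("purchased", "coat"), "U": 0.5},
    {"target": ("purchased", "shoes"), "U": 0.3},
]
selected = select_per_target(evaluations, 2)
```

Selecting per target keeps a target with generally low evaluation values from being crowded out by rules for other targets.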

[0018] FIG. 20 shows the inputs and outputs of the output means 110. The output means takes as input the set of rule evaluations 1501 selected by the rule selecting means 108, shapes it into a form suitable for presentation to the user or for transmission to an external system, and then presents it to the user on the screen 2001 or transmits it to an external system via the storage medium 2002 or the communication path 2003.

[0019] FIG. 21 shows a display example of the screen 2001. It shows a table that lists the input rule evaluations 1501 in order of evaluation value, with one rule evaluation per row. The columns show the target attribute value example, the explanatory attribute value example group, the accuracy of the rule, and the coverage of the rule. In the first row, for example, the target attribute value example is (purchased last month, coat), the explanatory attribute value example group is the set (gender, male), (age group, 30s), (purchased last month, suit), the accuracy of the rule is 60%, and its coverage is 40%. Accuracy and coverage are explained here. Accuracy is given by the following equation.

[0020]

[Equation 4] Accuracy = S_r / S_A

Here, S_r is the number of records that match the rule, and S_A is the number of records that match the explanatory attribute value example group. Coverage is given by the following equation.

[0021]

[Equation 5] Coverage = S_r / S_t

Here, S_r is the number of records that match the rule, and S_t is the number of records that match the target attribute value example. Accuracy indicates how accurately the explanatory attribute value example group explains the target attribute value example. For example, if the rule with the target attribute value example "purchased a coat last month" and the explanatory attribute value example group "male, 30s, purchased a suit last month" has an accuracy of 60%, then 60% of the customers who are "male, in their 30s, and purchased a suit last month" "purchased a coat last month". In other words, the explanation "customers who are male, in their 30s, and purchased a suit last month purchased a coat last month" can be read as being 60% correct.

Coverage indicates how much of the whole target attribute value example the rule explains. For example, if the same rule has a coverage of 40%, the rule explains 40% of the customers who "purchased a coat last month". Looking at the accuracy and the coverage, the user judges how useful the explanation "customers who are male, in their 30s, and purchased a suit last month purchased a coat last month" is, and uses it in analyzing customer trends. Coverage may also be obtained by the following equation.

[0022]

[Equation 6] Coverage = S_A / N

Here, N is the number of records that are not missing with respect to any attribute in the rule. For output to an external system, the selected rule evaluations are stored in a suitable data structure and sent to the external system via the storage medium 2002 or the communication path 2003.
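The accuracy and coverage measures of Equations 4 to 6 are simple ratios of the counts attached to each rule evaluation. The sketch below uses hypothetical counts chosen so that the results reproduce the 60% accuracy and 40% coverage of the display example; the counts themselves do not come from the description.

```python
def accuracy(s_r, s_a):
    """Equation 4: share of records matching the explanatory group that also match the target."""
    return s_r / s_a


def coverage(s_r, s_t):
    """Equation 5: share of records matching the target that the rule explains."""
    return s_r / s_t


def coverage_alt(s_a, n):
    """Equation 6: alternative coverage, relative to records with no missing attribute."""
    return s_a / n


# Hypothetical counts: 50 customers match the explanatory group,
# 75 match the target, 30 match both, and 200 have no missing attribute.
s_a, s_t, s_r, n = 50, 75, 30, 200
row = (accuracy(s_r, s_a), coverage(s_r, s_t), coverage_alt(s_a, n))
```

Equation 5 and Equation 6 measure different things, so a rule can rank differently under each definition of coverage.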

【００２３】前述の説明では、説明属性値例群設定手段
１０３には、属性情報１０２に含まれるすべての属性情
報が入力されるものとしてきた。しかし、属性の中に
は、説明属性値例として使うことに意味のないものもあ
る。例えば、「顧客番号」属性の属性値は、顧客につけ
られた通し番号であり、その値が顧客の動向や売上に関
係しているとは考えないであろう。説明属性値例群を生
成する際には、このような属性は用いない方が良い。図
２２に、説明属性値例群に用いる属性を指定する手段を
示す。説明属性設定手段２２０１では、属性情報１０２
が持つ属性のうち、説明属性値例群生成手段１０３で用
いるべき属性、および属性値を設定し、出力する。この
設定は、あらかじめ定められているか、または、使用者
が必要に応じて行う。説明属性の設定を使用者が行う場
合の操作画面の例を図２３に示す。操作画面２３０１に
は、属性情報１０２が持つ属性の一覧２３０２と、属性
値の一覧２３０３が表示されている。使用者は、属性の
一覧２３０２の中から、説明属性として用いるべき属性
を１つ以上選択し、印をつける。また、選択した属性の
各々について、説明属性値例として用いるべき属性値を
１つ以上選択する。説明属性設定手段２２０１は、選択
された属性と属性値の集合を出力する。例えば、説明属
性の１つとして「年齢層」を選択し、その属性値とし
て、「２０歳代」、「３０歳代」を指定すると、「年齢
層」属性に関しては、「２０歳代」、「３０歳代」の２
つの属性のみが出力される。したがって、説明属性値例
群生成手段１０３においては、（年齢層、２０歳代）、
（年齢層、３０歳代）といった属性値例は用いられる
が、（年齢層、４０歳代）などの属性値例は用いられ
ず、よって、このような属性値例を持つルールも生成さ
れない。また、操作画面２３０１の属性値の一覧２３０
３において、初期設定は「すべての属性値」となってお
り、属性値の選択操作を行わない場合、選択された属性
のすべての属性値が出力される。In the description so far, all of the attributes contained in the attribute information 102 were input to the explanation attribute value example group generation means 103. However, some attributes are not meaningful as explanation attribute value examples. For example, the value of the "customer number" attribute is a serial number assigned to each customer and is unlikely to be related to customer behavior or sales. Such attributes are better left out when generating explanation attribute value example groups. FIG. 22 shows a means for designating the attributes used for the explanation attribute value example groups. The explanation attribute setting means 2201 sets and outputs, from among the attributes in the attribute information 102, the attributes and attribute values to be used by the explanation attribute value example group generation means 103. This setting is either determined in advance or made by the user as needed. FIG. 23 shows an example of the operation screen on which the user sets the explanation attributes. The operation screen 2301 displays a list 2302 of the attributes contained in the attribute information 102 and a list 2303 of attribute values. From the attribute list 2302, the user selects and marks one or more attributes to be used as explanation attributes. For each selected attribute, the user also selects one or more attribute values to be used as explanation attribute value examples. The explanation attribute setting means 2201 outputs the set of selected attributes and attribute values. For example, if "age group" is selected as one of the explanation attributes and "20s" and "30s" are designated as its attribute values, only those two attribute values are output for the "age group" attribute. Consequently, the explanation attribute value example group generation means 103 uses attribute value examples such as (age group, 20s) and (age group, 30s) but not ones such as (age group, 40s), so no rule containing such an attribute value example is generated. In the attribute value list 2303 of the operation screen 2301, the initial setting is "all attribute values"; if no attribute values are selected, all attribute values of the selected attribute are output.
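
The selection step described above can be sketched roughly as follows. This is only an illustration: the function name `select_explanation_examples`, the dictionary layout, and the use of `None` to stand for the "all attribute values" initial setting are assumptions and do not appear in the specification.

```python
def select_explanation_examples(attribute_info, selection):
    """Return the (attribute, value) pairs allowed as explanation attribute value examples.

    attribute_info: dict mapping attribute name -> list of all its values (hypothetical layout).
    selection: dict mapping each selected attribute -> list of chosen values,
               or None for the initial setting "all attribute values".
    """
    examples = []
    for attribute, chosen in selection.items():
        domain = attribute_info[attribute]
        # None means the user did not pick values, so every value is output.
        values = domain if chosen is None else [v for v in domain if v in chosen]
        examples.extend((attribute, v) for v in values)
    return examples


attribute_info = {
    "customer number": ["0001", "0002"],
    "age group": ["20s", "30s", "40s"],
    "gender": ["male", "female"],
}
# "customer number" is simply not selected; "gender" keeps all of its values.
selection = {"age group": ["20s", "30s"], "gender": None}
examples = select_explanation_examples(attribute_info, selection)
```

Because (age group, 40s) never reaches the later stages, no rule containing it can be generated, which matches the behavior described for the means 103.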

【００２４】ここまでは、事例データ１０１、属性情報
１０２がすでに与えられていることを前提にして説明し
たが、実際の応用の場合、分析対象のデータはこのよう
な形式になっていない場合が多い。そこで、分析対象の
データを事例データ１０１、属性情報１０２に変換す
る、データ変換手段について説明する。図２４は、デー
タ変換手段を加えたシステム構成の例である。データ変
換手段２４０２は、分析対象データ２４０１を入力し、
そこから、事例データ１０１と、属性情報１０２を生成
する。ここでは、小売店における顧客動向分析の場合を
例にとり、どのようなデータ変換が行われるかを説明す
る。顧客動向分析では、それぞれの顧客についての性
別、年齢などの顧客属性情報と、それぞれ顧客の顧客が
いつ、何を購入したかという購入情報を用い、両者を組
合わせて分析を行う。分析対象は顧客であり、したがっ
て、顧客属性情報、購入情報の両者を１つのレコードが
１人の顧客に対応する事例データに変換する。The description so far has assumed that the case data 101 and the attribute information 102 are already given. In actual applications, however, the data to be analyzed is often not in this format. A data conversion means that converts the data to be analyzed into the case data 101 and the attribute information 102 is therefore described next. FIG. 24 shows an example of a system configuration to which the data conversion means is added. The data conversion means 2402 receives the analysis target data 2401 and generates from it the case data 101 and the attribute information 102. Taking customer trend analysis in a retail store as an example, the following explains what kind of data conversion is performed. Customer trend analysis uses customer attribute information, such as the gender and age of each customer, together with purchase information indicating when and what each customer purchased, and analyzes the two in combination. The unit of analysis is the customer, so both the customer attribute information and the purchase information are converted into case data in which one record corresponds to one customer.

【００２５】まず、顧客属性情報を事例データに変換す
る例を示す。図２５には、分析対象データのレコードの
１つを示してある。顧客属性情報は、顧客１人１人につ
いての性別、年齢などの属性情報の集合であり、形式の
上では、前述の事例データ１０１の単数値属性と同じで
ある。しかし、データの性質、分析の目的にしたがっ
て、変換が必要となる場合がある。例えば、日本全国の
地域別の顧客動向を分析したい場合は、それぞれの顧客
の住所情報が必要となる。図２５の顧客属性情報２５０
１には住所のフィールドがあるので、これを用いれば良
い。しかし、顧客属性情報２５０１の住所情報をそのま
ま属性値として用いるのは情報が細か過ぎる。日本全国
の地域別動向を調べるには、東北、関東、中部といった
地域レベル、あるいは、都道府県レベルの情報が適して
いるであろう。そこで、顧客属性情報２５０１の住所情
報を地域レベル、都道府県レベルの属性に変換する。ま
た、年齢別の顧客動向を調べたい時には、顧客属性情報
２５０１の年齢情報を用いるだろうが、これも、１歳刻
みで調べるよりも、２０歳代、３０歳代というような、
やや大きな年齢層別に調べたい場合もある。図２５の例
では、顧客属性情報２５０１の年齢情報をそのまま属性
値として用いず、２０歳代、３０歳代というような年齢
層属性に変換する。このような場合、それぞれの属性の
定義域は変換方法から明確にわかる。例えば、地域属性
の定義域は、{北海道、東北、関東、中部、近畿、‥‥}
のようになるであろう。都道府県属性の定義域はすべ
ての都道府県名であり、年齢層属性の定義域は {１０歳
未満、１０歳代、２０歳代、‥‥} のようになる。以上
の変換の結果、顧客属性情報２５０１から、「値域」属
性、「都道府県」属性、「年齢層」属性などが生成さ
れ、これらの属性についての属性情報２５０３が生成さ
れる。また、これらの属性に対応するフィールド、属性
値を持った事例データ２５０２が生成される。First, an example of converting customer attribute information into case data is shown. FIG. 25 shows one record of the analysis target data. The customer attribute information is a collection of attributes such as gender and age for each individual customer, and in format it is the same as the single-valued attributes of the case data 101 described above. Depending on the nature of the data and the purpose of the analysis, however, conversion may still be required. For example, to analyze customer trends by region across Japan, address information is needed for each customer. The customer attribute information 2501 in FIG. 25 has an address field, which can be used for this. Using the address as an attribute value as it stands, however, would be too fine-grained. For examining regional trends across Japan, information at the regional level (Tohoku, Kanto, Chubu and so on) or at the prefectural level would be more suitable. The address information of the customer attribute information 2501 is therefore converted into region-level and prefecture-level attributes. Similarly, to examine customer trends by age, the age information of the customer attribute information 2501 would be used, but here too it is sometimes preferable to examine somewhat broader age groups, such as 20s and 30s, rather than one-year steps. In the example of FIG. 25, the age information of the customer attribute information 2501 is not used directly as an attribute value but is converted into an age group attribute with values such as 20s and 30s. In such cases the domain of each attribute follows directly from the conversion method. For example, the domain of the region attribute would be {Hokkaido, Tohoku, Kanto, Chubu, Kinki, ‥‥}. The domain of the prefecture attribute is the set of all prefecture names, and the domain of the age group attribute is {under 10, 10s, 20s, ‥‥}. As a result of this conversion, a "region" attribute, a "prefecture" attribute, an "age group" attribute and so on are generated from the customer attribute information 2501, and attribute information 2503 describing these attributes is generated. Case data 2502 having fields and attribute values corresponding to these attributes is also generated.
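
A minimal sketch of this conversion is given below. It assumes that the prefecture has already been extracted from the address field, and the mapping table `REGION_OF_PREFECTURE` is deliberately partial; the function names and the record layout are hypothetical and are not taken from the specification.

```python
# Partial, illustrative prefecture -> region table (a real table would list all prefectures).
REGION_OF_PREFECTURE = {
    "Hokkaido": "Hokkaido",
    "Miyagi": "Tohoku",
    "Tokyo": "Kanto",
    "Kanagawa": "Kanto",
    "Aichi": "Chubu",
    "Osaka": "Kinki",
}


def age_group(age):
    """Map an age in years to a ten-year age group such as "20s"."""
    if age < 10:
        return "under 10"
    return f"{(age // 10) * 10}s"


def to_case_record(customer):
    """Convert one customer attribute record into one case data record."""
    prefecture = customer["prefecture"]  # assumed to be parsed from the address beforehand
    return {
        "customer number": customer["customer number"],
        "gender": customer["gender"],
        "region": REGION_OF_PREFECTURE[prefecture],
        "prefecture": prefecture,
        "age group": age_group(customer["age"]),
    }


record = to_case_record(
    {"customer number": "0001", "gender": "female", "prefecture": "Kanagawa", "age": 34}
)
```

All generated attributes are single-valued, and their domains follow from the conversion itself: the values of the mapping table for "region" and the labels produced by `age_group` for "age group".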

【００２６】これら、住所情報から地域属性、都道府県
属性への変換、年齢情報から年齢層属性への変換の方
法、すなわち、変換前の顧客属性情報と変換後の属性値
との対応関係は先験的に明らかである場合が多い。例え
ば、「神奈川県横浜市○○区‥‥」という住所情報は、
都道府県属性に関して、「神奈川県」という属性値に変
換されることは、先験的に明らかである。しかし、変換
方法が先験的に明らかでない場合もある。先の年齢層属
性の例では、「２０歳以上３０歳未満 → ２０歳代」と
いうように、先験的に明らかな変換方法を用いたが、年
齢層に分ける場合、２０歳、３０歳、４０歳という年齢
に区切りを置くべきかどうかが明らかでない場合もあ
る。その場合、顧客属性情報における顧客の年齢分布な
ど、元のデータの性質を用いて変換方法を決定する。こ
の例を図２６に示す。図２６は元の顧客属性情報におけ
る顧客の年齢分布を示すヒストグラム２６０１である。
これによると、顧客の年齢分布は一様ではなく、したが
って、年齢分布によるグループ分けができそうである。
グループ分けの方法は、また種々考えられるが、図２６
の例では、３０歳付近のピークを中心に、ピーク付近、
ピークよりも下、ピークよりも上、の３つのグループを
作り、さらに、４０歳以上のもう１つのピークを１つの
グループとし、合わせて４つのグループを作っている。
これら４つのグループに互いを区別するための名前をつ
け、それらが属性値となる。こうして、年齢層属性につ
いての変換方法２６０２が決定され、これに基づき、顧
客属性情報２６０３から事例データ２６０４へのデータ
変換が行われ、また、属性情報２６０５が生成される。
顧客属性情報に関する属性は、たいていは単数値属性で
ある。The methods for converting address information into the region and prefecture attributes, and age information into the age group attribute, that is, the correspondence between the customer attribute information before conversion and the attribute values after conversion, are in many cases clear a priori. For example, it is clear a priori that the address "○○ Ward, Yokohama City, Kanagawa Prefecture ‥‥" is converted, for the prefecture attribute, to the attribute value "Kanagawa Prefecture". In some cases, however, the conversion method is not clear a priori. The earlier age group example used an a priori conversion such as "20 or older and under 30 → 20s", but when dividing customers into age groups it may not be clear whether the boundaries should be placed at ages 20, 30 and 40. In that case, the conversion method is determined from properties of the original data, such as the age distribution of the customers in the customer attribute information. FIG. 26 shows an example. FIG. 26 contains a histogram 2601 of the customer age distribution in the original customer attribute information. It shows that the age distribution is not uniform, which suggests that customers can be grouped according to the distribution. Various grouping methods are conceivable; in the example of FIG. 26, three groups are formed around the peak near age 30 (near the peak, below the peak, and above the peak), and the other peak at age 40 and above forms one further group, giving four groups in total. Each of the four groups is given a name that distinguishes it from the others, and these names become the attribute values. The conversion method 2602 for the age group attribute is determined in this way; based on it, the customer attribute information 2603 is converted into the case data 2604, and the attribute information 2605 is generated. Attributes derived from customer attribute information are in most cases single-valued attributes.
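
One way to picture this data-driven conversion is sketched below. The specification does not state how the boundaries are found, so the cut points in `BOUNDARIES` and the group names are purely hypothetical values that an analyst might read off a histogram like 2601; only the overall flow (inspect the distribution, fix boundaries, name the groups, convert) follows the text.

```python
from bisect import bisect_right
from collections import Counter


def histogram(ages, bin_width=5):
    """Count customers per age bin, keyed by the lower edge of each bin."""
    counts = Counter((age // bin_width) * bin_width for age in ages)
    return dict(sorted(counts.items()))


# Hypothetical boundaries and names chosen after looking at the histogram.
BOUNDARIES = [27, 33, 40]
GROUP_NAMES = ["below peak", "near peak", "above peak", "40 and over"]


def data_driven_age_group(age):
    """Return the name of the group whose age range contains `age`."""
    return GROUP_NAMES[bisect_right(BOUNDARIES, age)]


ages = [22, 24, 29, 30, 31, 44]
age_histogram = histogram(ages)
groups = [data_driven_age_group(a) for a in ages]
```

The list `GROUP_NAMES` plays the role of the domain recorded in the attribute information, and `data_driven_age_group` plays the role of the conversion method applied to each customer record.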

【００２７】次に、購入情報を事例データに変換する例
を説明する。購入データの例を図２７に示す。１つのレ
コードは、１回の商品の購入に対応しており、購入日
時、購入した顧客の顧客番号、商品名、金額が格納され
ている。購入データは、これらのレコードの系列であ
り、通常、購入の発生順、すなわち購入日時の順に並ん
でいる。このデータは、購入というトランザクションの
系列であると言える。図２５における顧客属性データの
場合、１つのレコードが１人の顧客に対応し、したがっ
て、１人の顧客に関する情報は１つのレコード中にのみ
存在するので、形式上は、そのまま事例データとするこ
とができる。しかし、購入データの場合、１つのレコー
ドは１人の顧客ではなく、１回の商品購入に対応してい
る。当然、１人の顧客が複数の商品を購入する場合があ
るので、１人の顧客に関する購入情報は複数のレコード
に存在する。したがって、レコードが顧客に対応する事
例データに変換するためには、データの形式の変換も必
要となる。また、購入情報のうち、何に着目し、分析に
利用するかによって、様々なデータ変換の方法があり得
る。Next, an example of converting purchase information into case data is described. FIG. 27 shows an example of purchase data. One record corresponds to one purchase of a product and stores the purchase date and time, the customer number of the purchasing customer, the product name, and the amount. The purchase data is a sequence of such records, normally arranged in order of occurrence, that is, by purchase date and time. This data can be regarded as a sequence of purchase transactions. In the customer attribute data of FIG. 25, one record corresponds to one customer, so the information about a customer exists in exactly one record and the data can, in terms of format, be used as case data as it is. In purchase data, however, one record corresponds not to one customer but to one product purchase. A customer may naturally purchase several products, so the purchase information about one customer is spread over multiple records. Converting it into case data in which each record corresponds to a customer therefore also requires converting the data format. In addition, various data conversion methods are possible depending on which aspects of the purchase information are of interest and used in the analysis.

【００２８】図２８に、購入データの変換の一例を示
す。これは、「今月、何を購入したか」にのみ着目した
データ変換である。まず、「今月購入品」という属性を
生成し、事例データには、「今月購入品」属性に対応す
るフィールドを設ける。１人の顧客が複数の商品を購入
することはあり得るので、この属性は複数値属性であ
り、また、定義域はすべての商品の商品名である。次
に、対象となる期間分（この場合は、今月分）の購入デ
ータ２８０１を探索し、それぞれの顧客が購入した商品
の商品名を、事例データの「今月購入品」属性に対応す
るフィールドに格納する。１人の顧客が当該期間中に同
じ商品を２回以上購入することはあり得るが、１つのフ
ィールドに同一の商品名を重複して格納することはしな
い。ここでは「購入したか」にのみ着目しており、回数
（個数）は問題としていないからである。FIG. 28 shows an example of converting purchase data. This conversion looks only at what was purchased this month. First, an attribute "purchased this month" is generated, and a field corresponding to it is provided in the case data. Since one customer may purchase several products, this is a multi-valued attribute, and its domain is the set of all product names. Next, the purchase data 2801 for the period of interest (here, the current month) is searched, and the name of each product a customer purchased is stored in that customer's "purchased this month" field. A customer may purchase the same product two or more times during the period, but the same product name is not stored more than once in a field, because only whether the product was purchased matters here, not how many times (or how many units).
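
The conversion of FIG. 28 can be sketched as a grouping of transaction records by customer. The record keys (`"date"`, `"customer number"`, `"product"`) and the function name are assumptions made for this illustration; using a set per customer is one simple way to keep a product name from being stored twice.

```python
from collections import defaultdict
from datetime import date


def purchased_this_month(purchases, year, month):
    """Build the multi-valued "purchased this month" field for each customer."""
    items = defaultdict(set)
    for purchase in purchases:
        d = purchase["date"]
        if d.year == year and d.month == month:
            # A set ignores repeated purchases of the same product.
            items[purchase["customer number"]].add(purchase["product"])
    return dict(items)


purchases = [
    {"date": date(1997, 4, 28), "customer number": "0001", "product": "shirt"},  # previous month
    {"date": date(1997, 5, 2), "customer number": "0001", "product": "suit"},
    {"date": date(1997, 5, 9), "customer number": "0002", "product": "shirt"},
    {"date": date(1997, 5, 20), "customer number": "0001", "product": "coat"},
    {"date": date(1997, 5, 21), "customer number": "0001", "product": "suit"},  # repeat purchase
]
fields = purchased_this_month(purchases, 1997, 5)
```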

【００２９】図２９に、購入データの変換の別の一例を
示す。図２８の例では、個々の商品について、購入した
かどうかに着目していた。しかし、分析の目的によって
は、個々の商品だけではなく、複数の商品について、時
間関係を含めて扱いたい場合もある。例えば、「１ヶ月
のうちにスーツとコートを購入した」、「スーツ購入の
後でコートを購入した」などの購入パターンに関する分
析である。個々の商品のみを扱う場合は、フィールド内
に商品名を並べるだけでこと足りるが、購入パターンを
扱うためには、個々の商品ではなく、購入パターン自体
を表す属性値を用いる。例えば、「１ヶ月のうちにスー
ツとコートを購入した」と「１年のうちにスーツとコー
トを購入した」というような、期間に関する購入パター
ンについては、「スーツ&コートin１ヶ月」、「スーツ&
コートin１２ヶ月」などの記号を属性値として用いる。
また、「スーツ購入の後でコートを購入した」のよう
に、順序に関する購入パターンについては、「スーツth
enコート」などの記号を属性値として用いる。１人の顧
客がこれらの購入パターンのうちの２つ以上に当てはま
ることはあり得るので、これらの購入パターンに関する
属性は複数値属性となるのが普通である。このように、
表現したい購入パターンに応じて記号を定義すれば、様
々な購入パターンを複数値属性として扱うことができ
る。どのような購入パターンを表現する必要があるか
は、分析の目的によって決まる。変換の手順としては、
まず、どのような購入パターンに着目するかを決める
（２９０２）。それによって、可能な購入パターンの集
合、すなわち、「購入パターン」属性の定義域が決まる
（２９０３）。次に、購入データ２９０１の中から、顧
客のそれぞれについて、当てはまる購入パターンを探索
し（２９０４）、見つかった購入パターンに対応する記
号を事例データ２９０５に格納する。FIG. 29 shows another example of converting purchase data. The example of FIG. 28 looked at whether each individual product was purchased. Depending on the purpose of the analysis, however, one may want to handle several products together, including the temporal relationships between them. An example is the analysis of purchase patterns such as "bought a suit and a coat within one month" or "bought a coat after buying a suit". When only individual products are handled, listing product names in the field is sufficient, but to handle purchase patterns, attribute values that represent the purchase pattern itself, rather than individual products, are used. For example, for period-related purchase patterns such as "bought a suit and a coat within one month" and "bought a suit and a coat within one year", symbols such as "suit & coat in 1 month" and "suit & coat in 12 months" are used as attribute values. For order-related purchase patterns such as "bought a coat after buying a suit", symbols such as "suit then coat" are used as attribute values. One customer may match two or more of these purchase patterns, so attributes for purchase patterns are normally multi-valued. By defining symbols according to the purchase patterns to be expressed, a wide variety of purchase patterns can be handled as multi-valued attributes. Which purchase patterns need to be expressed depends on the purpose of the analysis. The conversion proceeds as follows. First, the kinds of purchase pattern to focus on are decided (2902). This determines the set of possible purchase patterns, that is, the domain of the "purchase pattern" attribute (2903). Next, the purchase data 2901 is searched for the purchase patterns that apply to each customer (2904), and the symbols corresponding to the patterns found are stored in the case data 2905.
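
The search step (2904) for a single pair of products might look like the sketch below. Two details are assumptions rather than part of the specification: "within N months" is approximated as two purchases at most 30 × N days apart, and "A then B" is taken to mean that some purchase of A is dated strictly before some purchase of B. The function name and the history layout are also hypothetical.

```python
from datetime import date, timedelta


def purchase_patterns(history, a, b, months_list=(1, 12)):
    """Return the pattern symbols that one customer's purchase history matches.

    history: list of (purchase date, product name) pairs for a single customer.
    """
    dates_a = [d for d, product in history if product == a]
    dates_b = [d for d, product in history if product == b]
    patterns = set()
    for months in months_list:
        window = timedelta(days=30 * months)  # assumed definition of "within N months"
        if any(abs(da - db) <= window for da in dates_a for db in dates_b):
            unit = "month" if months == 1 else "months"
            patterns.add(f"{a} & {b} in {months} {unit}")
    if any(da < db for da in dates_a for db in dates_b):
        patterns.add(f"{a} then {b}")
    return patterns


history = [(date(1997, 1, 10), "suit"), (date(1997, 3, 20), "coat")]
found = purchase_patterns(history, "suit", "coat")
```

The returned set is stored as the value of the multi-valued "purchase pattern" field for that customer, so a customer who matches several patterns simply carries several symbols.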

【００３０】さらに多くの商品の組や、複雑な時間関
係、順序関係を扱おうとすると、可能な組合せの数が増
加し、それにともない、属性値の定義域も非常に大きな
ものとなる。だが、可能な組合せの数は大きいが、実際
に購入データに出現するのは、そのうちのごく一部であ
ることが多い。そこで、可能な組合せのうち、実際には
購入データに出現していないもの、あるいは出現頻度が
非常に小さいものを属性情報から削除し、定義域を小さ
くすることにより、以後の組合せ生成、評価、選択処理
を効率良く行う。この様子を図３０に示す。ここでは、
「１２ヶ月のうちに購入する商品の組」に着目するとす
る（３００１）。この場合、「Ａ&Ｂ in１２ヶ月」（た
だし、Ａ、Ｂは商品名）という記号を属性値として取
り、その定義域は、可能な２つの商品の組合せ全てであ
る（３００２）。次に、購入データ３００３の中から、
顧客のそれぞれについて、当てはまる購入パターンを探
索し（３００４）、見つかった購入パターンに対応する
記号を事例データ３００５に格納する。その次に、購入
パターンの探索の過程において出現しなかった購入パタ
ーン、あるいは、出現頻度の低い購入パターンを属性値
の定義域から削除し（３００６）、小さい属性情報３０
０７を生成する。出現頻度の低い購入パターンを定義域
から削除した場合、これらは、属性情報３００７の中に
は存在しないが、事例データ３００５の中には存在する
ことになる。必要であれば、これらの購入パターンを事
例データ３００５からも削除する。こうして、属性の定
義域を小さくすることによって、説明属性値例群生成手
段１０３や、ルール生成手段１０６において、無駄な組
合せを生成しないようにすることができる。When larger sets of products or more complex temporal and ordering relationships are handled, the number of possible combinations grows, and the domain of the attribute becomes very large accordingly. Although the number of possible combinations is large, in many cases only a small fraction of them actually appears in the purchase data. Therefore, combinations that do not actually appear in the purchase data, or that appear only very rarely, are deleted from the attribute information to shrink the domain, so that the subsequent combination generation, evaluation, and selection can be performed efficiently. FIG. 30 illustrates this. Here the focus is on "pairs of products purchased within 12 months" (3001). The attribute values are symbols of the form "A & B in 12 months" (where A and B are product names), and the domain is the set of all possible pairs of products (3002). Next, the purchase data 3003 is searched for the purchase patterns that apply to each customer (3004), and the symbols corresponding to the patterns found are stored in the case data 3005. Then the purchase patterns that never appeared during the search, or that appeared only infrequently, are deleted from the domain of the attribute (3006), producing the smaller attribute information 3007. When infrequent purchase patterns are deleted from the domain, they no longer exist in the attribute information 3007 but still exist in the case data 3005. If necessary, these purchase patterns are also deleted from the case data 3005. Shrinking the domain of the attribute in this way keeps the explanation attribute value example group generation means 103 and the rule generation means 106 from generating useless combinations.
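
The pruning step (3006) can be sketched as a frequency count over the case data followed by a filter on the domain. The threshold parameter `min_count`, the flag `prune_cases`, and the representation of a multi-valued field as a Python set are assumptions introduced for this example.

```python
from collections import Counter


def prune_domain(case_data, attribute, domain, min_count=1, prune_cases=False):
    """Drop values that occur in fewer than `min_count` records from the domain.

    case_data: list of records whose `attribute` field is a set of values.
    If prune_cases is True, the dropped values are also removed from the records.
    """
    counts = Counter(value for record in case_data for value in record[attribute])
    kept = [value for value in domain if counts[value] >= min_count]
    if prune_cases:
        kept_set = set(kept)
        for record in case_data:
            record[attribute] = record[attribute] & kept_set
    return kept


domain = [
    "suit & coat in 12 months",
    "shirt & tie in 12 months",
    "coat & tie in 12 months",  # never appears in the data
]
case_data = [
    {"purchase pattern": {"suit & coat in 12 months", "shirt & tie in 12 months"}},
    {"purchase pattern": {"shirt & tie in 12 months"}},
]
small_domain = prune_domain(case_data, "purchase pattern", domain, min_count=2, prune_cases=True)
```

With `prune_cases=False` the rare values stay in the case data but are absent from the domain, which corresponds to the intermediate state described in the text.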

【００３１】以上のように、顧客属性情報と購入情報
を、単数値属性データ、複数値属性データに変換するこ
とにより、顧客属性情報と購入情報を組み合わせた事例
データを作ることができる。この事例データを入力と
し、図１に示す分析システムを用いることにより、顧客
属性情報と購入情報を組み合わせた顧客動向分析が可能
となる。図３１に、顧客動向分析の結果の一例を示す。
例えば、図３１中の説明欄の第１は、「２０歳代前半、
男性で、１ヶ月のうちにスーツとコートを両方購入した
人の40％が、今月の購入総額が２０万以上で、これは今
月の購入総額が２０万以上の人の60.5％である」という
ことを示している。As described above, converting the customer attribute information and the purchase information into single-valued and multi-valued attribute data yields case data that combines the two. Feeding this case data into the analysis system shown in FIG. 1 enables customer trend analysis that combines customer attribute information and purchase information. FIG. 31 shows an example of the result of such an analysis. For example, the first entry in the explanation column of FIG. 31 indicates that "40% of the men in their early 20s who bought both a suit and a coat within one month have total purchases of 200,000 or more this month, and they account for 60.5% of all people whose total purchases this month are 200,000 or more".
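
The two percentages in this example can be read as two ratios over the case data, sketched below. This is an interpretation for illustration only: the names `matches`, `rule_ratios`, `hit_rate` and `coverage` are hypothetical, the matching test (equality for a single-valued field, membership for a multi-valued field) is an assumption, and the evaluation value actually computed by the rule evaluation means is defined elsewhere in the specification.

```python
def matches(record, example):
    """Check whether a record matches one (attribute, value) example."""
    attribute, value = example
    field = record.get(attribute)
    if isinstance(field, (set, frozenset, list, tuple)):
        return value in field  # multi-valued field
    return field == value  # single-valued field


def rule_ratios(case_data, explanation_group, target):
    """Return (hit_rate, coverage) for the rule explanation_group -> target."""
    explained = [r for r in case_data if all(matches(r, e) for e in explanation_group)]
    targeted = [r for r in case_data if matches(r, target)]
    both = [r for r in explained if matches(r, target)]
    hit_rate = len(both) / len(explained) if explained else 0.0  # the "40%" style figure
    coverage = len(both) / len(targeted) if targeted else 0.0  # the "60.5%" style figure
    return hit_rate, coverage


explanation_group = [
    ("age group", "early 20s"),
    ("gender", "male"),
    ("purchase pattern", "suit & coat in 1 month"),
]
target = ("total this month", "200,000 or more")
case_data = [
    {"age group": "early 20s", "gender": "male",
     "purchase pattern": {"suit & coat in 1 month"}, "total this month": "200,000 or more"},
    {"age group": "early 20s", "gender": "male",
     "purchase pattern": {"suit & coat in 1 month"}, "total this month": "under 200,000"},
    {"age group": "30s", "gender": "female",
     "purchase pattern": set(), "total this month": "200,000 or more"},
    {"age group": "30s", "gender": "male",
     "purchase pattern": set(), "total this month": "under 200,000"},
]
ratios = rule_ratios(case_data, explanation_group, target)
```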

【００３２】[0032]

【発明の効果】本発明によれば、同時に単数の値を取る
属性と、複数の値を取る属性が混在したデータを対象
に、データの組合せ分析を行なうことができる。これに
より、形式の異なるデータを一緒に分析することがで
き、別々に分析したのでは得られない、データ間の潜在
的相関、依存関係を発見することができる。また、事例
の属性に関する情報と、事象の系列に関する情報を単数
値属性、複数値属性にデータ変換することにより、両者
を組み合わせて分析することができる。これにより、事
例に関する静的な情報（属性情報）と、動的な情報（事
象系列）の間の相関、依存関係を発見することができ
る。According to the present invention, combination analysis can be performed on data that mixes attributes taking a single value at a time with attributes taking multiple values at a time. This allows data in different formats to be analyzed together and reveals latent correlations and dependencies between the data that separate analyses would not uncover. Furthermore, by converting information about the attributes of a case and information about a sequence of events into single-valued and multi-valued attributes, the two can be analyzed in combination. This makes it possible to discover correlations and dependencies between static information about a case (attribute information) and dynamic information (event sequences).

[Brief description of the drawings]

【図１】本発明の一つの実施形態の構成を示す図であ
る。FIG. 1 is a diagram showing a configuration of one embodiment of the present invention.

【図２】事例データの例を示す図である。FIG. 2 is a diagram showing an example of case data.

【図３】属性情報の例を示す図である。FIG. 3 is a diagram illustrating an example of attribute information.

【図４】単数値属性の例を示す図である。FIG. 4 is a diagram illustrating an example of a single-valued attribute.

【図５】複数値属性の例を示す図である。FIG. 5 is a diagram illustrating an example of a multi-value attribute.

【図６】複数値属性の別の形の例を示す図である。FIG. 6 is a diagram showing another example of a multi-value attribute.

【図７】説明属性値例群生成手段の入出力を説明する図
である。FIG. 7 is a diagram illustrating input / output of an explanation attribute value example group generation unit.

【図８】ルール生成手段の入出力を説明する図である。FIG. 8 is a diagram illustrating input / output of a rule generation unit.

【図９】ルール生成手段における属性値例の組合せの例
を示す図である。FIG. 9 is a diagram illustrating an example of a combination of example attribute values in a rule generation unit.

【図１０】ルール評価手段の入出力を説明する図であ
る。FIG. 10 is a diagram illustrating input / output of a rule evaluation unit.

【図１１】レコードと属性値例の合致の例を示す図であ
る。FIG. 11 is a diagram showing an example of a match between a record and an attribute value example.

【図１２】レコードと属性値例群の合致の例を示す図で
ある。FIG. 12 is a diagram illustrating an example of a match between a record and an attribute value example group.

【図１３】レコードとルールの合致の例を示す図であ
る。FIG. 13 is a diagram showing an example of a match between a record and a rule.

【図１４】ルールの評価値を算出する手順を説明する図
である。FIG. 14 is a diagram illustrating a procedure for calculating an evaluation value of a rule.

【図１５】ルール選択手段の入出力を説明する図であ
る。FIG. 15 is a diagram illustrating input / output of a rule selection unit.

【図１６】評価値の順にルールを選択する方法の例を示
す図である。FIG. 16 is a diagram illustrating an example of a method of selecting rules in the order of evaluation values.

【図１７】他のルールに包含されるルールを候補から外
す例を示す図である。FIG. 17 is a diagram illustrating an example of excluding a rule included in another rule from candidates.

【図１８】ルールの包含関係の例を示す図である。FIG. 18 is a diagram illustrating an example of a rule inclusion relation.

【図１９】目的属性値例によるグループ別にルールを選
択する方法の例を示す図である。FIG. 19 is a diagram illustrating an example of a method of selecting a rule for each group according to an example of an objective attribute value.

【図２０】出力手段の入出力を説明する図である。FIG. 20 is a diagram illustrating input / output of an output unit.

【図２１】分析結果を表示する画面の例を示す図であ
る。FIG. 21 is a diagram illustrating an example of a screen displaying an analysis result.

【図２２】説明属性設定手段の入出力を説明する図であ
る。FIG. 22 is a diagram illustrating input / output of an explanation attribute setting unit.

【図２３】説明属性を設定する方法の例を示す図であ
る。FIG. 23 is a diagram illustrating an example of a method of setting an explanation attribute.

【図２４】データ変換手段の入出力を説明する図であ
る。FIG. 24 is a diagram illustrating input / output of a data conversion unit.

【図２５】顧客属性情報が単数値属性データに変換され
る例を示す図である。FIG. 25 is a diagram illustrating an example in which customer attribute information is converted into single-value attribute data.

【図２６】データの特徴に基づいて変換方法を決定する
例を示す図である。FIG. 26 is a diagram illustrating an example of determining a conversion method based on characteristics of data.

【図２７】購入データの例を示す図である。FIG. 27 is a diagram illustrating an example of purchase data.

【図２８】購入データが複数値属性データに変換される
例を示す図である。FIG. 28 is a diagram showing an example in which purchase data is converted into multi-value attribute data.

【図２９】購入データ中の購入パターンが複数値属性デ
ータに変換される例を示す図である。FIG. 29 is a diagram showing an example in which a purchase pattern in purchase data is converted into multi-value attribute data.

【図３０】購入データ中に現れない購入パターンを属性
の定義域から削除する例を示す図である。FIG. 30 is a diagram illustrating an example in which purchase patterns that do not appear in the purchase data are deleted from the domain of the attribute.

【図３１】顧客分析の結果の表示の例を示す図である。FIG. 31 is a diagram illustrating an example of display of a result of customer analysis.

[Explanation of symbols]

１０１…事例データ、１０２…属性情報、１０３…説明
属性値例群生成手段、１０４…組合せ条件、１０５…目
的属性値例設定手段、１０６…ルール生成手段、１０７
…ルール評価手段、１０８…ルール選択手段、１０９…
ルール選択条件、１１０…出力手段。101: case data, 102: attribute information, 103: explanation attribute value example group generation means, 104: combination condition, 105: target attribute value example setting means, 106: rule generation means, 107: rule evaluation means, 108: rule selection means, 109: rule selection condition, 110: output means.

Claims


1. A combination analysis system which receives, as analysis target data, a set of records in which each record corresponds to one of the cases to be analyzed, each record consists of one or more fields, each field corresponds to one of the attributes of the cases, and each field of each record holds at most one or a plurality of attribute values, and which receives, as an attribute candidate set, a set of attributes of the cases, the system comprising: combination generation means which, taking a pair of one attribute belonging to the attribute candidate set and one of the attribute values that attribute can take as an attribute value example, generates one or more attribute value example groups each consisting of one or more attribute value examples; target setting means which, taking a pair of one of the attributes and one of the attribute values that attribute can take as an attribute value example, sets one or more attribute value examples as target attribute value examples; means which treats the attribute value example groups generated by the combination generation means as explanation attribute value example groups, treats a pair of one explanation attribute value example group and one target attribute value example as a rule, and generates one or more rules; combination evaluation means which calculates an evaluation value for each of the rules by a predetermined algorithm; combination selection means which selects rules according to the evaluation values; and output means which outputs the selected rules externally or presents them to a user.

2. The combination analysis system according to claim 1, further comprising attribute generation means which, based on data about the cases to be analyzed, generates an attribute of the cases, determines the set of attribute values that the attribute can take, and determines the attribute value of the attribute for each of the cases to be analyzed.

3. The combination analysis system according to claim 1, further comprising means for designating a set of attribute candidates by a user.

4. The combination analysis system according to claim 1, wherein the target setting means has means by which a user designates the target attribute value examples.

5. The combination analysis system according to claim 1, further comprising means which receives data storing attribute information about the cases to be analyzed and data that is a sequence of events relating to the cases, and converts them into data in the form of a set of records in which each record corresponds to one of the cases to be analyzed, each record consists of one or more fields, each field corresponds to one of the attributes of the cases, and each field of each record stores at most one or a plurality of attribute values.