JP2011034457A

JP2011034457A - Data mining system, data mining method and data mining program

Info

Publication number: JP2011034457A
Application number: JP2009181900A
Authority: JP
Inventors: Hiroki Mizuguchi; 弘紀水口; Masaru Kusui; 大久寿居; Yoshio Ishizawa; 善雄石澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-08-04
Filing date: 2009-08-04
Publication date: 2011-02-17

Abstract

PROBLEM TO BE SOLVED: To provide a data mining system that can change the granularity of attributes efficiently without requiring trial and error by the user. SOLUTION: The data mining system includes a grouping means 102 that selects an attribute from data including a plurality of attributes and attribute values and groups the attribute values of the selected attribute according to a pre-stored classification hierarchy that hierarchically represents a classification corresponding to the attribute; a test statistic calculation means 103 that calculates a test statistic indicating the strength of relevance of the grouped attribute values to an attribute under analysis; and a determination means 104 that determines, according to the calculated test statistic, whether the grouped attribute values are characteristic in their relationship to the attribute under analysis. If the determination means 104 determines that they are not characteristic, the grouping means 102 executes the grouping again according to a level of the classification hierarchy higher than the level used in the preceding grouping. COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a data mining system, a data mining method, and a data mining program that determine, from data including a plurality of attributes and attribute values, the attributes related to an attribute under analysis.

In general, a data mining system takes data having various attributes as input and calculates (extracts) the rules governing the relationships between attributes. For example, taking the history data of customer visits in sales activities as input, a data mining system calculates (extracts) which attributes are closely related to the success or failure of those activities.

FIG. 1 is an explanatory diagram showing an example of customer visit history data. In the example shown in FIG. 1, the history data includes data whose attributes are visited place, visit date, visit time, visitor, and success/failure. Note that "..." in the figure indicates omission. When such data is given (input), the data mining system searches the history data for (extracts) attributes closely related to the success/failure attribute. To measure whether a relationship exists between attributes, the data mining system uses, for example, a well-known testing method such as the t-test.

However, performing such a test requires discretizing numerical attributes and grouping attributes. One apparatus intended to solve this problem is, for example, the apparatus described in Patent Document 1.

The apparatus described in Patent Document 1 discretizes numerical attributes and groups attributes by applying rules, held in what is called a procedure file, to the input data. For example, numerical values are discretized and attributes are grouped by writing rules in the procedure file such as "discretize visit time into before 8:00, 8:00 to 09:59, 10:00 to 11:59, 12:00 to 14:00, 14:00 to 16:00, and 16:00 onward" and "divide visited places into A Industry and B Kosan, C Kosan and D Company, E Company and F Company, and all others".

Patent Document 1: JP-A-11-250084

However, the problem with the apparatus described in Patent Document 1 is that it is inefficient. Describing only one discretization or grouping for an attribute does not yield good results, so a great deal of trial and error is required.

In the apparatus described in Patent Document 1, for example, after the rule "discretize visit time into before 8:00, 8:00 to 09:59, 10:00 to 11:59, 12:00 to 14:00, 14:00 to 16:00, and 16:00 onward" has been written, a new rule such as "discretize visit time into 0:00 to 11:59 and 12:00 to 23:59" must be written if the first rule does not produce a reasonable result. Moreover, even if multiple rules could be written, the apparatus described in Patent Document 1 would have to apply all of the rules before performing the test.

Accordingly, an object of the present invention is to provide a data mining system, a data mining method, and a data mining program that can change the granularity of attributes efficiently in data mining without trial and error by the user.

A data mining system according to the present invention determines, from data including a plurality of attributes and attribute values, the attributes related to an attribute under analysis. The system includes: a grouping means that selects an attribute from the data and groups the attribute values of the selected attribute based on a pre-stored classification hierarchy that hierarchically represents a classification corresponding to the attribute; a test statistic calculation means that calculates a test statistic indicating the strength of the relevance between the attribute values grouped by the grouping means and the attribute under analysis; and a determination means that determines, based on the test statistic calculated by the test statistic calculation means, whether the grouped attribute values are characteristic in their relationship to the attribute under analysis. When the determination means determines that they are not characteristic, the grouping means executes the grouping process again based on the classification of a level in the classification hierarchy higher than the lower level used in the preceding grouping.

A data mining method according to the present invention determines, from data including a plurality of attributes and attribute values, the attributes related to an attribute under analysis. The method includes: a grouping step of selecting an attribute from the data and grouping the attribute values of the selected attribute based on a pre-stored classification hierarchy that hierarchically represents a classification corresponding to the attribute; a test statistic calculation step of calculating a test statistic indicating the strength of the relevance between the grouped attribute values and the attribute under analysis; and a determination step of determining, based on the calculated test statistic, whether the grouped attribute values are characteristic in their relationship to the attribute under analysis. When the attribute values are determined not to be characteristic, the grouping step executes the grouping process again based on the classification of a level in the classification hierarchy higher than the lower level used in the preceding grouping.

A data mining program according to the present invention causes a computer that determines, from data including a plurality of attributes and attribute values, the attributes related to an attribute under analysis to execute: a grouping process of selecting an attribute from the data and grouping the attribute values of the selected attribute based on a pre-stored classification hierarchy that hierarchically represents a classification corresponding to the attribute; a test statistic calculation process of calculating a test statistic indicating the strength of the relevance between the grouped attribute values and the attribute under analysis; and a determination process of determining, based on the calculated test statistic, whether the grouped attribute values are characteristic in their relationship to the attribute under analysis. When the attribute values are determined not to be characteristic, the program causes the grouping process to be executed again based on the classification of a level in the classification hierarchy higher than the lower level used in the preceding grouping.

According to the present invention, the granularity of attributes can be changed efficiently in data mining without trial and error by the user.

FIG. 1 is an explanatory diagram showing an example of customer visit history data.
FIG. 2 is a block diagram showing an example of the configuration of the data mining system according to the present invention.
FIG. 3 is a flowchart showing an example of processing executed by the data mining system.
FIG. 4 is an explanatory diagram showing an example of the classification hierarchy corresponding to the attribute indicating visit time.
FIG. 5 is an explanatory diagram showing an example of the classification hierarchy corresponding to the attribute indicating visited place.
FIG. 6 is an explanatory diagram showing an example of the classification hierarchy corresponding to the attribute indicating visitor.
FIG. 7 is an explanatory diagram showing an example of a result of the grouping means.
FIG. 8 is an explanatory diagram showing an example of an intermediate result of the test statistic calculation means.
FIG. 9 is an explanatory diagram showing an example of an intermediate result of the test statistic calculation means.
FIG. 10 is an explanatory diagram showing an example of a result of the grouping means.
FIG. 11 is an explanatory diagram showing an example of a result of the output means.
FIG. 12 is a block diagram showing an example of the configuration of the data mining system in the second embodiment.
FIG. 13 is an explanatory diagram showing an example of a per-department classification hierarchy corresponding to the attribute indicating visitor.
FIG. 14 is a flowchart showing an example of processing executed by the data mining system in the second embodiment.
FIG. 15 is a block diagram showing an example of the configuration of the data mining system in the third embodiment.
FIG. 16 is an explanatory diagram showing an example of a result of the consolidation means.
FIG. 17 is a block diagram showing an example of the configuration of the data mining system in the fourth embodiment.
FIG. 18 is a flowchart showing an example of processing executed by the data mining system in the fourth embodiment.
FIG. 19 is an explanatory diagram showing an example of a result of the grouping means.
FIG. 20 is an explanatory diagram showing an example of an intermediate result of the test statistic calculation means.
FIG. 21 is a block diagram showing a minimum configuration example of the data mining system.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 2 is a block diagram showing an example of the configuration of the data mining system according to the present invention. As shown in FIG. 2, the data mining system in the first embodiment of the present invention includes a data processing device 1001 that operates according to a program, a data storage device 1002, an input means 101 that receives a data group and a target attribute whose characteristics are to be determined (and inputs them to the data processing device 1001), and an output means 105 that outputs a list of attributes related to the target attribute. In this embodiment, the data mining system is realized, for example, by an information processing apparatus such as a personal computer that operates according to a program.

Specifically, the data processing device 1001 is realized by the CPU of an information processing apparatus that operates according to a program. The data processing device 1001 includes a grouping means 102 that groups attribute values by referring to a classification hierarchy that hierarchically represents the classification corresponding to an attribute, a test statistic calculation means 103 that performs a predetermined test with the data group and the target attribute as inputs, and a determination means 104 that determines, based on the test result, whether an attribute is characteristic (that is, whether the test statistic takes a value indicating strong relevance).

Specifically, the data storage device 1002 is realized by a storage device such as a magnetic disk device or an optical disk device. As shown in FIG. 2, the data storage device 1002 includes an attribute classification hierarchy storage means 106.

Each of these means operates roughly as follows.

Specifically, the input means 101 is realized by an input device such as a keyboard. The input means 101 has a function of receiving a data group and a target attribute (and inputting them to the data processing device 1001). The data group includes attributes and attribute values. The target attribute is the attribute whose regularities and relationships with the attributes included in the data group are tested by data mining (in this embodiment, such an attribute is called a target attribute to distinguish it from the other attributes in the data group). For example, when customer visit history data (the data group) is the input and the attributes related to success/failure are sought, the input means 101 supplies success/failure as the target attribute (inputs it to the data processing device 1001). In this embodiment, the data mining system performs data mining to derive the regularities and relationships between the attributes included in the input data group and the target attribute.

Specifically, the attribute classification hierarchy storage means 106 is realized by a storage device such as a magnetic disk device or an optical disk device. The attribute classification hierarchy storage means 106 stores attributes and the classification hierarchy corresponding to each attribute. For example, if the attribute is visit time, the stored classification hierarchy divides it into AM and PM, and further divides AM into working hours AM and early morning. The attribute classification hierarchy storage means 106 stores such a classification hierarchy for each attribute. This data (the classification hierarchies) may be data created in advance by the user or publicly available data. For example, when the attribute is a company, a company classification such as the one used by the Tokyo Stock Exchange can be used. For a customer visit history like the one shown in FIG. 1, this company classification may be used as the classification hierarchy for the visited place.

Specifically, the grouping means 102 is realized by the CPU of an information processing apparatus that operates according to a program. The grouping means 102 has a function of referring to the attribute classification hierarchy storage means 106 and grouping the input data level by level, starting from the lowest level, based on the classification hierarchy corresponding to the attribute. More precisely, the grouping means 102 selects one attribute from the data group input by the input means 101 and, if a classification hierarchy corresponding to the selected attribute exists, groups the attribute values based on the classification at the lowest level of that hierarchy. For example, when the grouping means 102 selects the visit time attribute from the customer visit history data group and the lowest level of the corresponding classification hierarchy is a classification into two-hour intervals, it creates data in which the attribute values, that is, the actual visit times, are grouped into two-hour intervals.

When the grouping means 102 receives an instruction to regroup from the determination means 104, it groups the attribute values based on the classification one level above the level previously used in the classification hierarchy for that attribute. When the grouping means 102 receives an instruction from the determination means 104 to group another attribute, it selects one other attribute and groups its attribute values based on the classification hierarchy corresponding to the selected attribute.

Specifically, the test statistic calculation means 103 is realized by the CPU of an information processing apparatus that operates according to a program. Based on the data group and target attribute output from the grouping means 102 (the grouped input data), the test statistic calculation means 103 calculates a test statistic indicating the strength of the relevance between the target attribute and the other attributes and attribute values. To calculate the test statistic, it uses a method such as the common chi-square test. For example, when the data group is customer visit history data and the target attribute is success/failure, the test statistic calculation means 103, if using the chi-square test, creates a frequency table from the success/failure attribute and the visit time attribute and then performs the chi-square test on that table.

Specifically, the determination means 104 is realized by the CPU of an information processing apparatus that operates according to a program. The determination means 104 receives the test result from the test statistic calculation means 103 and determines, based on a predetermined threshold, whether the test result is characteristic. If it determines that the result is not characteristic, the determination means 104 instructs the grouping means 102 to perform grouping one level higher in the hierarchy. If it determines that the result is characteristic, the determination means 104 instructs the grouping means 102 to perform grouping for another attribute. Once all attributes have been grouped, the determination means 104 passes the attributes found to be characteristic to the output means 105.
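The interplay between grouping, test statistic calculation, and determination for a single attribute can be sketched as a short loop. This is only an illustrative Python sketch, not the implementation of the embodiment: the name `climb_until_characteristic`, its parameters, the convention that "characteristic" means the statistic is at or above the threshold, and the `None` result when no level qualifies are all assumptions made for this example.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

G = TypeVar("G")


def climb_until_characteristic(
    groupings: Sequence[G],
    statistic: Callable[[G], float],
    threshold: float,
) -> Optional[Tuple[int, float]]:
    """Try groupings from the lowest level of the hierarchy upward.

    groupings[0] is the grouping at the lowest (finest) level and each
    later entry is one level higher. Returns (level index, statistic)
    for the first level judged characteristic, or None when no level
    qualifies (this fallback is an assumption of the sketch).
    """
    for level, grouping in enumerate(groupings):
        value = statistic(grouping)
        if value >= threshold:  # assumed meaning of "characteristic"
            return level, value
    return None
```

In this sketch each element of `groupings` stands for the data regrouped at one level, and `statistic` stands for whatever test the test statistic calculation means applies, for example a chi-square computation.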

Specifically, the output means 105 is realized by a network interface unit of an information processing apparatus that operates according to a program, or by a display device. The output means 105 has a function of receiving the attribute list from the determination means 104 and outputting it. When realized by a display device, for example, the output means 105 displays the attribute list according to instructions from the determination means 104. When realized by a network interface unit, for example, the output means 105 outputs the attribute list to another data processing device or the like.

Next, the operation of the data mining system is described with reference to FIGS. 2 and 3. FIG. 3 is a flowchart showing an example of processing executed by the data mining system.

To extract information related to a given attribute from a data group, the user operates the input means 101 to enter the data group and the target attribute, and the input means 101 inputs them to the data processing device 1001 according to the user's operation. FIG. 1 shows an example of the data group that the input means 101 inputs to the data processing device 1001 in this embodiment. The customer visit history data shown in FIG. 1 includes data indicating which customers the sales department employees visited, when they visited, and whether the sales activity succeeded or failed. Referring to FIG. 1, the data group is tabular data whose attributes are visited place, visit date, visit time, visitor, and success/failure. Each value in the table shown in FIG. 1 is the attribute value for the corresponding attribute. As shown in FIG. 1, each data record carries an identifier ID. The record with ID 1 indicates that the attribute value of the attribute "visited place" is "A Industry". In this embodiment, the input means 101 inputs "success/failure" as the target attribute. This lets the user learn which actions are closely related to the success or failure of sales activities.

When the input means 101 inputs the data group and the target attribute to the data processing device 1001 according to the user's input operation, the grouping means 102 refers to the input data group and selects one attribute to be grouped (step S11 in FIG. 3). The grouping means 102 selects the attribute at random. For example, the grouping means 102 selects visit time from the attributes included in the data group shown in FIG. 1.

Next, the grouping means 102 refers to the attribute classification hierarchy storage means 106 and acquires (extracts) the classification hierarchy corresponding to the selected attribute (step S12 in FIG. 3).

Examples of the classification hierarchies stored in the attribute classification hierarchy storage means 106 are shown in FIGS. 4, 5, and 6. In this embodiment, the attribute classification hierarchy storage means 106 is assumed to store all of these classification hierarchies.

FIG. 4 is an explanatory diagram showing an example of the classification hierarchy corresponding to the attribute indicating visit time. Referring to FIG. 4, visit time is classified into AM and PM. AM is further classified into early morning and working hours AM, and PM into lunch break, working hours PM, and late night. Working hours AM is in turn classified into 8:00-10:00 and 10:00-12:00.
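One simple way to hold such a hierarchy in memory is a child-to-parent map. The following Python sketch is an assumption about representation only: it contains just the categories named in the description of FIG. 4, the root label "visit time" and the names `PARENT` and `ancestors` are chosen for illustration, and any categories not named in the text are left out rather than guessed.

```python
# Child -> parent links for the visit-time hierarchy described for FIG. 4.
# Only categories named in the text are listed; the root label is assumed.
PARENT = {
    "8:00-10:00": "working hours AM",
    "10:00-12:00": "working hours AM",
    "early morning": "AM",
    "working hours AM": "AM",
    "lunch break": "PM",
    "working hours PM": "PM",
    "late night": "PM",
    "AM": "visit time",
    "PM": "visit time",
}


def ancestors(category):
    """Return the chain from a category up to the root of the hierarchy."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain
```

Walking the chain returned by `ancestors` corresponds to moving to successively higher levels of the classification hierarchy.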

FIG. 5 is an explanatory diagram showing an example of the classification hierarchy corresponding to the attribute indicating visited place. Note that "..." in the figure indicates omission. FIG. 5 shows a classification of the visited companies. A publicly available company classification can be reused for this purpose. The example shown in FIG. 5 is an excerpt from the classification in the list of Japanese companies on Wikipedia, which is widely available on the Internet.

FIG. 6 is an explanatory diagram showing an example of the classification hierarchy corresponding to the attribute indicating visitor. Note that "..." in the figure indicates omission. FIG. 6 shows a classification of visitors by job category. Such a classification can be obtained, for example, from a company's personnel database.

In this embodiment, in step S12, the grouping means 102 selects (extracts) from the attribute classification hierarchy storage means 106 the classification hierarchy for visit time shown in FIG. 4.

Next, the grouping means 102 groups the attribute values based on the classification at the level of the selected (extracted) classification hierarchy that is currently being processed (step S13 in FIG. 3). Because the classification hierarchy for the attribute has only just been selected, the lowest level of the hierarchy is the level currently being processed. The grouping means 102 groups the data group (the attribute values of the attribute selected in step S11) based on the classification at this level.

FIG. 7 shows an example in which the data group shown in FIG. 1 has been grouped. Referring to FIG. 7, the attribute values of the attribute indicating visit time have changed. The grouping means 102 has grouped the attribute values based on the two-hour classification that forms the lowest level of the classification hierarchy corresponding to the visit time attribute. For example, the visit time attribute value of the record with ID 1 has been replaced, changing from "11:30" to the group "10:00-12:00". Likewise, the visit time attribute value of the record with ID 2 has changed from "13:10" to the group "13:00-15:00", and that of the record with ID 3 from "13:20" to the group "13:00-15:00".
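The replacement of raw visit times by two-hour groups can be illustrated with a small Python sketch. The names `to_minutes`, `BINS`, and `group_visit_time` are hypothetical, only the two bins that appear in the FIG. 7 example are listed, and treating each bin as including its start time and excluding its end time is an assumption, since the text does not say how boundary values are handled.

```python
def to_minutes(hhmm):
    """Convert an "H:MM" or "HH:MM" string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# Hypothetical lowest-level bins; only the two that appear in the
# FIG. 7 example are listed here.
BINS = [("10:00", "12:00"), ("13:00", "15:00")]


def group_visit_time(value, bins=BINS):
    """Replace a raw visit time with the label of the bin containing it."""
    t = to_minutes(value)
    for start, end in bins:
        # Half-open interval [start, end) is an assumption of this sketch.
        if to_minutes(start) <= t < to_minutes(end):
            return f"{start}-{end}"
    return value  # no matching bin: leave the value unchanged
```

With these bins, `group_visit_time("11:30")` yields "10:00-12:00", and both "13:10" and "13:20" map to "13:00-15:00", matching the three records described above.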

Next, based on the grouped data output from the grouping means 102 and the target attribute, the test statistic calculation means 103 calculates a test statistic between the target attribute and the other attributes or grouped attribute values. To calculate the test statistic, it uses an existing method capable of measuring whether the relationship between two variables is strong. Existing methods include, for example, the chi-square test, Fisher's exact test, the information gain ratio often used in decision trees, and the support and confidence used in association rules.

Here, an example is described in which the test statistic calculation means 103 uses the chi-square test. When the chi-square test is used, the test statistic calculation means 103 first creates a frequency table based on the target attribute and the attribute being grouped. FIG. 8 shows an example of a frequency table. The example in FIG. 8 is a 2×2 frequency table whose horizontal axis takes the attribute values "success" and "failure" of the success/failure attribute (the target attribute) and whose vertical axis takes one attribute value of the visit time (the attribute being processed for grouping), "12:00-13:00", and "other than 12:00-13:00". Each value in the table is the number of data items that satisfy the corresponding condition. Using this frequency table, the test statistic calculation means 103 performs the following chi-square test.

$$\chi^2 = \sum_{i,j} \frac{\bigl(F(i,j) - D(i,j)\bigr)^2}{F(i,j)}$$

where
D(i, j): the value in row i, column j of the frequency table,
F(i, j) = a(i) × b(j) / N, where N is the total frequency,
a(i): the sum of the values in row i,
b(j): the sum of the values in column j.

Applying this formula to the frequency table gives the following calculation.

$$\begin{aligned}
\chi^2 &= \frac{(250 \times 60/1260 - 50)^2}{250 \times 60/1260} + \frac{(1010 \times 60/1260 - 10)^2}{1010 \times 60/1260} + \frac{(250 \times 1200/1260 - 200)^2}{250 \times 1200/1260} + \frac{(1010 \times 1200/1260 - 1000)^2}{1010 \times 1200/1260} \\
&= 121.9047619 + 30.17444602 + 6.095238095 + 1.508722301 \\
&= 159.6831683
\end{aligned}$$
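The calculation above can be reproduced with a short sketch. The function name `chi_square` is hypothetical, and the layout of the table (rows for the visit-time group and its complement, columns for success and failure) is inferred from the row and column totals used in the worked calculation, since FIG. 8 itself is not reproduced here.

```python
def chi_square(table):
    """Pearson chi-square statistic for a frequency table given as a list of rows."""
    row_totals = [sum(row) for row in table]          # a(i)
    col_totals = [sum(col) for col in zip(*table)]    # b(j)
    total = sum(row_totals)                           # N
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):      # D(i, j)
            expected = row_totals[i] * col_totals[j] / total  # F(i, j)
            statistic += (expected - observed_count) ** 2 / expected
    return statistic


# Cell counts implied by the worked calculation: row totals 250 and 1010,
# column totals 60 and 1200, total 1260.
observed = [[50, 200], [10, 1000]]
statistic = chi_square(observed)
print(round(statistic, 7))
```

The sketch assumes every expected frequency is positive; a table with an empty row or column would need a guard before the division.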

The value calculated in this way is the test statistic for the relationship between the success or failure of the sales activity (the target attribute) and whether the visit time falls within 12:00-13:00 (a grouped attribute value). The test statistic calculation means 103 calculates a test statistic in the same way for each of the other grouped attribute values. Once it has calculated a test statistic for every grouped attribute value, the test statistic calculation means 103 passes (outputs) a list containing each grouped attribute value and its test statistic to the determination means 104.

In addition, as shown in FIG. 9, the test statistic calculation means 103 creates a frequency table based on success/failure and each attribute value of the visit time. By processing this table in the same way as above, the test statistic calculation means 103 calculates a test statistic for the relationship between the two-hour aggregation of visit times (a specific level in the classification hierarchy) and success/failure (the target attribute). In this case, the test statistic calculation means 103 passes (outputs) the level and its test statistic to the determination means 104.

Next, the determination means 104 determines whether each output test statistic is characteristic (step S15 in FIG. 3). The determination means 104 makes this determination based on a predetermined threshold. From the list containing the grouped attribute values and their test statistics, the determination means 104 then stores the characteristic grouped attribute values in the data processing device. In addition, if all grouped attribute values are characteristic, the determination means 104 determines that the output level is characteristic in relation to the target attribute.

If it determines in step S15 that even one of the attribute values grouped at the level being processed is not characteristic, the determination means 104 proceeds to step S16. If, on the other hand, it determines in step S15 that all grouped attribute values are characteristic, the determination means 104 proceeds to step S17.

In determining whether a value is characteristic, for a chi-square test statistic, for example, a significance level of 5% is used as the threshold. The example in FIG. 8 has one degree of freedom, so if the test statistic is 3.84 or more, the determination means 104 rejects the null hypothesis (that the two attributes are independent) and determines that the attribute value is characteristic because the two attributes are not independent. In the example in FIG. 8, the test statistic exceeds this value, so the determination means 104 determines that it is characteristic. The determination means 104 makes the same determination for the other grouped attribute values and, if even one grouped attribute value is not characteristic, proceeds to step S16.

If it determines in step S15 that there is a non-characteristic attribute, the determination means 104 outputs to the grouping means 102 an instruction to perform grouping with the level being processed moved up by one in the classification hierarchy (step S16). In the present embodiment, the determination means 104 outputs to the grouping means 102 an instruction to group the visit-time attribute values based on the classification at the business-hours level.

When the determination means outputs this instruction in step S16, the grouping means 102 groups the attribute values based on the classification at the current level, that is, the level one above the previous one (step S13 in FIG. 3). In the present embodiment, the grouping means 102 groups the visit-time attribute values based on the classification at the business-hours level. FIG. 10 shows an example of the result of this grouping. Compared with the example in FIG. 1, the visit-time attribute values in FIG. 10 have been changed to values such as "working hours, morning" and "working hours, afternoon".

Next, the test statistic calculation means 103 again calculates the test statistic between the target attribute and the other attributes or attribute values after grouping (step S14). The determination means 104 then determines, based on the test statistics, whether the grouped attribute values are characteristic (step S15). If all attributes are found to be characteristic, the determination means 104 proceeds to step S17. If not all attributes are found to be characteristic, the determination means 104 proceeds to step S16 again. The data processing device 1001 then repeats steps S13 to S16 until the determination means 104 determines in step S15 that all attributes are characteristic.
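The loop over steps S13 to S16 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names, the representation of each hierarchy level as a dictionary from raw value to group label, the toy data, and the behavior when the top level is reached without success (returning `None`) are all hypothetical. The threshold 3.84 is the value quoted in the text for one degree of freedom at the 5% level.

```python
from collections import Counter

THRESHOLD = 3.84  # chi-square critical value quoted in the text (df = 1, 5% level)


def chi_square_2x2(records, group, target_value):
    """Chi-square for 'record is in group' versus 'target equals target_value'."""
    counts = Counter((g == group, t == target_value) for g, t in records)
    table = [[counts[(r, c)] for c in (True, False)] for r in (True, False)]
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:  # skip empty margins to avoid dividing by zero
                stat += (expected - table[i][j]) ** 2 / expected
    return stat


def find_characteristic_level(data, hierarchy, target_value="success"):
    """Climb the hierarchy from the lowest level until every group is characteristic."""
    for level, mapping in enumerate(hierarchy):
        grouped = [(mapping[value], target) for value, target in data]    # S13
        groups = sorted({g for g, _ in grouped})
        stats = {g: chi_square_2x2(grouped, g, target_value) for g in groups}  # S14
        if all(s >= THRESHOLD for s in stats.values()):                   # S15
            return level, stats
        # Otherwise continue with the next level up (S16).
    return None  # assumption: no level qualified


def make(value, successes, failures):
    """Build toy records for one raw attribute value."""
    return [(value, "success")] * successes + [(value, "failure")] * failures


# Toy data: values "b" and "c" are unrelated to the outcome at the lowest level.
data = make("a", 35, 5) + make("b", 20, 20) + make("c", 20, 20) + make("d", 5, 35)
hierarchy = [
    {"a": "a", "b": "b", "c": "c", "d": "d"},      # lowest level
    {"a": "AB", "b": "AB", "c": "CD", "d": "CD"},  # one level up
]
result = find_characteristic_level(data, hierarchy)
print(result)
```

With this toy data the lowest level is rejected because the groups "b" and "c" have a statistic of zero, and the search stops one level up, where both merged groups exceed the threshold.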

Next, the determination means 104 determines whether all attributes of the input data group have been evaluated (step S17). If it determines that an unevaluated attribute remains, the determination means 104 proceeds to step S11. The data processing device 1001 then repeats steps S11 to S17 until the determination means 104 determines in step S17 that all attributes have been evaluated.

If, on the other hand, it is determined in step S17 that all attributes have been evaluated, the output means 105, following an instruction from the determination means 104, outputs the characteristic attributes stored in the data processing device 1001 and their test statistics to a display device or the like (step S18). FIG. 11 shows an example of the attribute list and test statistics output by the output means 105. As shown in FIG. 11, the output means 105 outputs, for example, a list containing the attributes determined to be characteristic and their test statistics.

Next, the effects of the present embodiment are described. In the present embodiment, the granularity of an attribute can be changed efficiently without trial and error by the user. This is because the grouping means performs grouping in order from the lowest level of the attribute classification hierarchy, and the determination means stops the grouping process once it determines that a level is characteristic.

Also, in the present embodiment, the granularity of the attribute being processed can be changed in data mining without using rules. This is because the grouping means performs grouping in order from the lowest level based on the attribute classification hierarchy stored in the attribute classification hierarchy storage means 106.

Embodiment 2.
Next, a second embodiment of the present invention is described in detail with reference to the drawings. Referring to FIG. 12, the data mining system according to the second embodiment differs from the first embodiment in that it includes a second grouping means 202 in place of the grouping means 102 and a second determination means 204 in place of the determination means 104.

The following mainly describes the differences from the first embodiment. Specifically, the second grouping means 202 is realized by the CPU of an information processing device that operates according to a program. The second grouping means 202 has a function of referring to the attribute classification hierarchy storage means 106 and grouping the input data in order from the lowest level of the classification hierarchy. Specifically, on receiving a data group from the input means 101, the second grouping means 202 selects one attribute from the data group and, if a classification hierarchy corresponding to the selected attribute exists, groups the attribute values based on the classification at the lowest level of that hierarchy. If several classification hierarchies correspond to the attribute, the second grouping means 202 selects one of them.

For example, if both the classification hierarchy shown in FIG. 6 and the classification hierarchy shown in FIG. 13 correspond to the visitor attribute of the customer visit history data, the second grouping means 202 selects one of the two. FIG. 13 is an explanatory diagram showing a department-based classification hierarchy corresponding to the attribute indicating the visitor.

In addition, the second grouping means 202 has a function of performing grouping, on receiving an instruction to regroup from the second determination means 204, based on the classification at the level one above the level processed in the previous grouping. The second grouping means 202 also has a function of selecting, on receiving an instruction from the second determination means 204 to select another classification hierarchy, a classification hierarchy other than the one currently being processed. Furthermore, on receiving an instruction from the second determination means 204 to perform grouping on another attribute, the second grouping means 202 has a function of selecting one other attribute from the data group and grouping the attribute values of the selected attribute.

Specifically, the second determination means 204 is realized by the CPU of an information processing device that operates according to a program. On receiving a test result from the test statistic calculation means 103, the second determination means 204 has a function of determining, based on a predetermined threshold, whether the test result is characteristic. If it determines that the test result is not characteristic, the second determination means 204 sends (outputs) to the second grouping means 202 an instruction to perform grouping one level higher. If, on the other hand, it determines that the test result is characteristic and another classification hierarchy corresponding to the attribute exists, the second determination means 204 sends (outputs) to the second grouping means 202 an instruction to perform grouping based on that hierarchy. If no other classification hierarchy corresponds to the attribute, the second determination means 204 sends (outputs) to the second grouping means 202 an instruction to perform grouping on another attribute. When it determines that grouping has finished for all attributes, the second determination means 204 passes (outputs) the attributes determined to be characteristic to the output means 105.

Next, the operation of the data mining system in the present embodiment is described in detail with reference to FIGS. 12 and 14, focusing on the parts that differ from the first embodiment. FIG. 14 is a flowchart showing an example of the processing executed by the data mining system in the second embodiment.

The difference from the first embodiment is the addition of step S21, which determines whether another classification hierarchy corresponding to the attribute exists.

In step S21, the second determination means 204 determines whether the classification hierarchies stored in the attribute classification hierarchy storage means 106 include another classification hierarchy corresponding to the attribute. If so, the second determination means 204 proceeds to step S12 and outputs to the second grouping means 202 an instruction to perform grouping based on the other classification hierarchy. If not, the second determination means 204 proceeds to step S17 and, as in the first embodiment, determines whether an unevaluated attribute remains.

When the second determination means 204 outputs, in step S21, the instruction to perform grouping based on another classification hierarchy, the second grouping means 202 selects, in step S12, a classification hierarchy different from the one currently being processed. The second grouping means 202 then groups the attribute values based on the selected classification hierarchy, as in the first embodiment.

As described above, in the present embodiment, when several classification hierarchies correspond to one attribute, the second grouping means selects one of them. The second determination means then outputs to the second grouping means an instruction to select a classification hierarchy that has not yet been used, and the second grouping means executes the grouping process based on that other classification hierarchy in accordance with the instruction. Therefore, even when several classification hierarchies correspond to an attribute, it is possible to determine for every one of them whether it is characteristic.
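The control flow added by step S21 can be sketched as an outer loop over the hierarchies of one attribute. Everything in this sketch is hypothetical: the function name `evaluate_hierarchies`, the hierarchy names and level labels, and the callback `is_characteristic`, which merely stands in for the grouping, test statistic calculation, and determination of steps S13 to S15.

```python
def evaluate_hierarchies(hierarchies, is_characteristic):
    """Return, per hierarchy, the lowest level judged characteristic (or None).

    hierarchies: dict mapping a hierarchy name to its levels, lowest first.
    is_characteristic: callable(name, level) -> bool, standing in for S13-S15.
    """
    results = {}
    for name, levels in hierarchies.items():   # S12, re-entered via S21
        results[name] = None
        for level in levels:                   # S13-S16: climb one level at a time
            if is_characteristic(name, level):
                results[name] = level
                break
    return results


# Hypothetical hierarchies for the visitor attribute (labels are illustrative
# and do not reproduce FIG. 6 or FIG. 13).
hierarchies = {
    "by_role": ["individual", "job title", "management tier"],
    "by_department": ["team", "department", "division"],
}
# Stub verdicts standing in for the threshold-based determination.
verdicts = {("by_role", "job title"), ("by_department", "division")}
result = evaluate_hierarchies(hierarchies, lambda name, level: (name, level) in verdicts)
print(result)
```

Passing the determination as a callback keeps the sketch independent of any particular test statistic, in line with the text's remark that several existing methods may be used.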

Embodiment 3.
Next, a third embodiment of the present invention is described in detail with reference to the drawings. Referring to FIG. 15, the data mining system according to the third embodiment differs from the first embodiment in that it additionally includes a summarizing means 301. The following mainly describes the differences from the first embodiment.

Specifically, the summarizing means 301 is realized by the CPU of an information processing device that operates according to a program. When the attributes to be output are ranked by test statistic, and all the classifications that are siblings (at the same level) in the classification hierarchy appear with ranks close to one another or with test statistics that differ only slightly, the summarizing means 301 has a function of merging them into the parent-level classification and outputting it to the output means 105. Thus, in the present embodiment, summarizing means replacing a plurality of classifications that are in a predetermined relationship with a single attribute.

The summarizing means 301 determines, based on a predetermined threshold, whether ranks are close or whether test statistics differ only slightly. For example, consider the case where the attributes and test statistic ranking shown in FIG. 11 are the input and the threshold is a rank difference of three or less. Referring to the classification hierarchy, the attributes at ranks 3 and 4 are "visit time 13:00-15:00" and "visit time 15:00-17:00", which are sibling classifications and together make up all of the sibling classifications belonging to the level below "visit time: working hours, afternoon". The summarizing means 301 therefore merges the two into "visit time: working hours, afternoon". In doing so, it adopts the test statistic of the higher-ranked item (that is, the value 23.4 at rank 3 in FIG. 11). The method of determining the test statistic is not limited to this; for example, the average may be used, or the two values may be listed together.
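The merging of sibling classifications can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `roll_up`, the parent-to-children dictionary, and all list entries other than the two visit-time labels and the value 23.4 at rank 3 are hypothetical, since FIG. 11 is not reproduced here.

```python
def roll_up(ranked, children_of, max_rank_gap=3):
    """Replace complete sets of closely ranked siblings with their parent.

    ranked: list of (label, statistic), sorted by descending statistic.
    children_of: dict mapping a parent label to the list of all its children.
    """
    position = {label: i for i, (label, _) in enumerate(ranked)}
    merged = list(ranked)
    for parent, children in children_of.items():
        if not all(child in position for child in children):
            continue  # some sibling is missing from the output list
        ranks = [position[child] for child in children]
        if max(ranks) - min(ranks) > max_rank_gap:
            continue  # siblings are ranked too far apart
        best_label, best_stat = ranked[min(ranks)]  # keep the higher-ranked statistic
        merged = [
            (parent, best_stat) if label == best_label else (label, stat)
            for label, stat in merged
            if label == best_label or label not in children
        ]
    return merged


# Only the rank-3 and rank-4 labels and the value 23.4 come from the text;
# the other entries and the rank-4 statistic are made up for illustration.
ranked = [
    ("visit destination: food", 80.1),
    ("visitor: manager", 41.0),
    ("visit time 13:00-15:00", 23.4),
    ("visit time 15:00-17:00", 19.8),
]
children_of = {
    "visit time: working hours, afternoon": [
        "visit time 13:00-15:00",
        "visit time 15:00-17:00",
    ],
}
summary = roll_up(ranked, children_of)
print(summary)
```

The sketch implements only the rank-distance criterion and the "higher-ranked statistic" rule; the alternatives mentioned in the text (a small difference in test statistics, averaging, or listing both values) would need small changes to the two marked lines.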

As described above, in the present embodiment, the summarizing means 301 merges those items in the list of characteristic attributes to be output that are close to one another in the classification hierarchy. This makes it possible to output classifications that are easy for the user to read.

Embodiment 4.
Next, a fourth embodiment of the present invention is described in detail with reference to the drawings. Referring to FIG. 17, the data mining system according to the fourth embodiment differs from the first embodiment in that it additionally includes a combination means 401.

Specifically, the combination means 401 is realized by the CPU of an information processing device that operates according to a program. On receiving a data group and a target attribute from the input means 101, the combination means 401 has a function of creating a combination of two or more attributes from the data group. The combination means 401 also has a function of passing (outputting) the data group, the target attribute, and the combination of attributes other than the target attribute to the grouping means 102.

On receiving the data group, the target attribute, and the combination of attributes other than the target attribute from the combination means 401, the grouping means 102 has a function of grouping the attribute values of those attributes.

Next, the operation of the data mining system in the present embodiment is described in detail with reference to FIGS. 17 and 18, focusing on the parts that differ from the first embodiment. FIG. 18 is a flowchart showing an example of the processing executed by the data mining system in the fourth embodiment.

The combination means 401 refers to the data group and selects a combination of two or more attributes from among the attributes other than the target attribute (step S41 in FIG. 18). For example, when the data group shown in FIG. 1 is input, the combination means 401 selects the visit destination and the visit time as a combination of attributes other than the target attribute.

Next, the grouping means 102 acquires (extracts) the classification hierarchies of the attributes (step S12 in FIG. 18).

Next, the grouping means 102 groups the attribute values based on the classification at the level currently being processed (step S13 in FIG. 18). For example, suppose that the data group is the one shown in FIG. 1, the combination of attributes other than the target attribute is the visit destination and the visit time, the classification hierarchy corresponding to the visit time is the one shown in FIG. 4, and the classification hierarchy corresponding to the visit destination is the one shown in FIG. 5. In this case, the grouping means 102 refers to the classification hierarchies corresponding to the visit time and the visit destination and groups the attribute values based on the classification at the lowest level of each hierarchy, as shown in FIG. 19. Compared with FIG. 1, the attribute values of the visit destination and the visit time in FIG. 19 have been grouped. In the first row of data, the visit destination has been converted from "A Industry" to the food category group, and the visit time from 11:10 to 10:00-12:00.

Next, the test statistic calculation means 103 calculates the test statistic (step S14 in FIG. 18). When the chi-square test is used, for example, the test statistic calculation means 103 creates a frequency table for the combination of attributes. FIG. 20 shows an example of the frequency table created by the test statistic calculation means. FIG. 20 is the frequency table for the case where the visit destination is food and the visit time is 12:00-13:00. The other operations are the same as in the first embodiment.
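Selecting attribute combinations (step S41) and building a frequency table for one combination can be sketched as follows. The function names, field names, and records are hypothetical; FIG. 19 and FIG. 20 are not reproduced here, so the counts below are illustrative only.

```python
from itertools import combinations


def attribute_combinations(attributes, target, size=2):
    """All combinations of `size` attributes that exclude the target attribute."""
    candidates = [a for a in attributes if a != target]
    return list(combinations(candidates, size))


def combination_table(records, conditions, target, positive):
    """2x2 table: rows = all conditions met / not met, columns = positive / other."""
    table = [[0, 0], [0, 0]]
    for record in records:
        row = 0 if all(record[a] == v for a, v in conditions.items()) else 1
        col = 0 if record[target] == positive else 1
        table[row][col] += 1
    return table


# Hypothetical grouped records with made-up values.
records = [
    {"destination": "food", "time": "12:00-13:00", "result": "success"},
    {"destination": "food", "time": "12:00-13:00", "result": "failure"},
    {"destination": "food", "time": "10:00-12:00", "result": "failure"},
    {"destination": "retail", "time": "12:00-13:00", "result": "success"},
]
pairs = attribute_combinations(["destination", "time", "result"], target="result")
table = combination_table(
    records, {"destination": "food", "time": "12:00-13:00"}, "result", "success"
)
print(pairs, table)
```

The resulting table has the same 2×2 shape as in the first embodiment, so the same chi-square calculation can be applied to it unchanged.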

As described above, in the present embodiment, in addition to the effects of the first embodiment, a finer-grained analysis can be achieved by also taking combinations of attributes into account.

Next, the minimum configuration of the data mining system according to the present invention is described. FIG. 21 is a block diagram showing an example of the minimum configuration of the data mining system. As shown in FIG. 21, the data mining system includes, as its minimum components, the grouping means 102, the test statistic calculation means 103, and the determination means 104.

In the minimum-configuration data mining system shown in FIG. 21, the grouping means 102 selects an attribute from data including a plurality of attributes and attribute values, and groups the attribute values of the selected attribute based on a classification hierarchy that is stored in a storage unit and hierarchically represents the classification corresponding to the attribute. The test statistic calculation means 103 then calculates a test statistic indicating the strength of the relationship between the attribute to be analyzed and the attribute values grouped by the grouping means 102 and the selected attribute. Based on the test statistic calculated by the test statistic calculation means 103, the determination means 104 determines whether the grouped attribute values and the selected attribute are characteristic with respect to the attribute to be analyzed. If the determination means 104 determines that they are not characteristic, the grouping means 102 executes the grouping process again based on the classification at a level of the classification hierarchy higher than the one used previously.

Therefore, with the minimum-configuration data mining system shown in FIG. 21, the granularity of analysis in data mining can be changed efficiently without trial and error by the user.

The present embodiment presents the characteristic configurations of the data mining system listed in (1) to (6) below.

(1) The data mining system determines, from data including a plurality of attributes and attribute values (for example, the customer visit history data shown in FIG. 1), attributes related to an attribute to be analyzed (for example, success/failure shown in FIG. 1). It includes: grouping means (for example, realized by the grouping means 102) for selecting an attribute from the data and grouping the attribute values of the selected attribute based on a classification hierarchy that is stored in advance (for example, by the attribute classification hierarchy storage means 106) and hierarchically represents the classification corresponding to the attribute; test statistic calculation means (for example, realized by the test statistic calculation means 103) for calculating a test statistic indicating the strength of the relationship between the attribute values grouped by the grouping means and the attribute to be analyzed; and determination means (for example, realized by the determination means 104) for determining, based on the test statistic calculated by the test statistic calculation means, whether the grouped attribute values are characteristic in relation to the attribute to be analyzed. The system is characterized in that, when the determination means determines that they are not characteristic, the grouping means executes the grouping process again based on the classification at a level of the classification hierarchy higher than the level used in the previous grouping (for example, the business-hours level shown in FIG. 4).

(2) In the data mining system, the grouping means may be configured so that, when the determination means determines that all of the grouped attribute values are characteristic with respect to the attribute to be analyzed, it selects another attribute from the data and executes the grouping process again.

(3) The data mining system receives as input data including a plurality of attributes and attribute values together with an attribute to be analyzed, and outputs a list of attributes related to the attribute to be analyzed. The system includes: attribute classification hierarchy storage means (for example, realized by the attribute classification hierarchy storage means 106) that stores a classification hierarchy hierarchically representing a classification corresponding to an attribute; grouping means (for example, realized by the grouping means 102) that selects one attribute to be compared from the data, extracts the classification hierarchy corresponding to the selected attribute from the attribute classification hierarchy storage means, and groups the attribute values of the attribute to be compared in the data based on the classification at a lower level of the extracted classification hierarchy; test statistic calculation means (for example, realized by the test statistic calculation means 103) that, for at least one of (a) the attribute to be analyzed and the attribute to be compared, or (b) the attribute to be analyzed and all the attribute values of the attribute to be compared as grouped by the grouping means, calculates a test statistic indicating the strength of the relationship between the attribute to be analyzed and the attribute to be compared, or between the attribute to be analyzed and the attribute values of the attribute to be compared; and determination means (for example, realized by the determination means 104) that determines whether a result is characteristic according to whether the test statistic calculated by the test statistic calculation means exceeds a specific threshold. On determining that there is a test result indicating a non-characteristic attribute or a non-characteristic attribute value, the determination means causes the grouping means to execute the grouping process based on the classification one level higher in the classification hierarchy. On determining that the test results for all attributes or attribute values to be compared are characteristic, it causes the grouping means to select another, not yet evaluated attribute without changing the classification hierarchy. It then outputs a list of all characteristic attributes together with their test statistics.

(4) In the data mining system, the attribute classification hierarchy storage means may store classification hierarchies corresponding to one attribute, and the grouping means (for example, realized by the second grouping means 202) may select one of the classification hierarchies corresponding to the attribute from the attribute classification hierarchy storage means. The determination means (for example, realized by the second determination means 204) may be configured so that, when it determines that the test results for all attributes or attribute values to be compared are characteristic and another classification hierarchy corresponding to the attribute to be compared exists, it causes the grouping means to select a not yet selected classification hierarchy.

(5) The data mining system may include summarizing means (for example, realized by the summarizing means 301) that receives the list of characteristic attributes and the test statistics as input and combines attributes satisfying a predetermined condition into a single attribute. The predetermined condition is that, when the attributes are ranked by test statistic, a plurality of different attributes satisfy at least one of (a) falling within a predetermined rank range and (b) falling within a predetermined range of the test statistic, and that these attributes are siblings in the corresponding classification hierarchy, sharing the same parent classification. When the condition holds, the summarizing means may be configured to output the parent classification as a new characteristic attribute.

(6) The data mining system may be configured to include combination means (for example, realized by the combination means 401) that generates combinations of attributes to be grouped by the grouping means.
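
The loop described in configurations (1) to (3) can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the Pearson chi-square statistic on a 2x2 table is an assumed choice of test statistic, the threshold 3.84 (the 5% critical value for one degree of freedom) is an assumed setting, and the hierarchy entries, the function names, and the toy visit data are all hypothetical. Level 0 here treats each raw value as its own group.

```python
# Hypothetical classification hierarchy (child -> parent); labels are illustrative.
PARENT = {
    "9": "morning", "10": "morning", "11": "morning",
    "13": "afternoon", "14": "afternoon", "15": "afternoon",
    "morning": "business time", "afternoon": "business time",
}

def lift(value, level):
    """Climb `level` steps up the hierarchy, stopping at the root."""
    for _ in range(level):
        if value not in PARENT:
            break
        value = PARENT[value]
    return value

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def group_statistics(rows, level):
    """Group values at `level` and test each group against success/failure."""
    stats = {}
    for group in {lift(value, level) for value, _ in rows}:
        a = b = c = d = 0
        for value, success in rows:
            inside = lift(value, level) == group
            if inside and success:
                a += 1      # in group, success
            elif inside:
                b += 1      # in group, failure
            elif success:
                c += 1      # outside group, success
            else:
                d += 1      # outside group, failure
        stats[group] = chi_square_2x2(a, b, c, d)
    return stats

def find_characteristic_level(rows, threshold=3.84, max_level=2):
    """Return (level, stats) for the lowest level at which every group
    exceeds the threshold, or (None, {}) if no level qualifies."""
    for level in range(max_level + 1):
        stats = group_statistics(rows, level)
        if all(s > threshold for s in stats.values()):
            return level, stats
    return None, {}

# Invented visit history: (visit hour, success flag).
rows = []
for hour in ("9", "10", "11"):
    rows += [(hour, True)] * 4 + [(hour, False)]
for hour in ("13", "14", "15"):
    rows += [(hour, True)] + [(hour, False)] * 4

level, stats = find_characteristic_level(rows)
print(level, stats)
```

On this toy data the hour-level groups each score about 2.16, below the assumed threshold, so the sketch climbs one level and reports the morning and afternoon groups at about 10.8 each. An outer loop over the remaining attributes, as in configurations (2) and (3), would call `find_characteristic_level` once per attribute.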

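The summarizing step of configuration (5) can be sketched in the same hedged way. The function name `merge_siblings`, the parameter `rank_width`, and the sample labels are hypothetical, and only the rank-range condition is modeled. The greedy, non-overlapping window is a simplification chosen for brevity, not something the text prescribes.

```python
def merge_siblings(ranked, parent, rank_width=2):
    """Merge rank-adjacent siblings into their shared parent classification.

    ranked: list of (label, test_statistic), sorted by descending statistic.
    parent: dict mapping a label to its parent classification.
    """
    merged = []
    i = 0
    while i < len(ranked):
        # Labels that fall within the assumed rank range starting at position i.
        window = [label for label, _ in ranked[i:i + rank_width]]
        parents = {parent.get(label) for label in window}
        if len(window) > 1 and len(parents) == 1 and None not in parents:
            # All labels in the window are siblings: output the parent instead.
            merged.append(parents.pop())
            i += len(window)
        else:
            merged.append(window[0])
            i += 1
    return merged

# Invented example: two sibling time bands and one unrelated region.
ranked = [("morning", 10.8), ("afternoon", 10.8), ("Tokyo", 5.0)]
parent = {"morning": "business time", "afternoon": "business time", "Tokyo": "Kanto"}
result = merge_siblings(ranked, parent)
print(result)
```

Here the two top-ranked labels share a parent and are replaced by it, while the third label is passed through unchanged. The alternative condition based on a range of the test statistic could be added by comparing the statistics inside the window.
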
The present invention is applicable to behavior analysis using customer visit histories. It is also applicable to product sales analysis using sales history data.

DESCRIPTION OF SYMBOLS
101 Input means
102 Grouping means
103 Test statistic calculation means
104 Determination means
105 Output means
106 Attribute classification hierarchy storage means
202 Second grouping means
204 Second determination means
301 Summarizing means
401 Combination means
1001 Data processing device
1002 Data storage device

Claims

A data mining system for determining an attribute related to an attribute to be analyzed from data including a plurality of attributes and attribute values,
grouping means for selecting an attribute from the data and grouping attribute values of the selected attribute based on a pre-stored classification hierarchy that hierarchically represents a classification corresponding to the attribute;
test statistic calculation means for calculating a test statistic indicating the strength of relevance between the attribute values grouped by the grouping means and the attribute to be analyzed; and
determination means for determining, based on the test statistic calculated by the test statistic calculation means, whether the grouped attribute values are characteristic in relation to the attribute to be analyzed,
wherein, when the determination means determines that the grouped attribute values are not characteristic, the grouping means executes the grouping process again based on a classification at a level of the classification hierarchy higher than the level used in the preceding grouping.

The data mining system according to claim 1, wherein the grouping means selects another attribute from the data and executes the grouping process again when the determination means determines that all of the grouped attribute values are characteristic in relation to the attribute to be analyzed.

A data mining system that receives as input data including a plurality of attributes and attribute values and an attribute to be analyzed, and outputs a list of attributes related to the attribute to be analyzed, the system comprising:
attribute classification hierarchy storage means for storing a classification hierarchy hierarchically representing a classification corresponding to an attribute;
grouping means for selecting one attribute to be compared from the data, extracting the classification hierarchy corresponding to the selected attribute from the attribute classification hierarchy storage means, and grouping the attribute values of the attribute to be compared in the data based on the classification at a lower level of the extracted classification hierarchy;
test statistic calculation means for calculating, for at least one of (a) the attribute to be analyzed and the attribute to be compared, or (b) the attribute to be analyzed and all the attribute values of the attribute to be compared as grouped by the grouping means, a test statistic indicating the strength of the relationship between the attribute to be analyzed and the attribute to be compared, or between the attribute to be analyzed and the attribute values of the attribute to be compared; and
determination means for determining whether a result is characteristic according to whether the test statistic calculated by the test statistic calculation means exceeds a specific threshold, for causing the grouping means, when it is determined that there is a test result indicating a non-characteristic attribute or a non-characteristic attribute value, to execute the grouping process based on the classification one level higher in the classification hierarchy, for causing the grouping means, when it is determined that the test results for all attributes or attribute values to be compared are characteristic, to select another, not yet evaluated attribute without changing the classification hierarchy, and for outputting a list of all characteristic attributes and their test statistics.

The data mining system according to claim 3, wherein
the attribute classification hierarchy storage means stores classification hierarchies corresponding to one attribute,
the grouping means selects one of the classification hierarchies corresponding to the attribute from the attribute classification hierarchy storage means, and
the determination means, when it determines that the test results for all attributes or attribute values to be compared are characteristic and another classification hierarchy corresponding to the attribute to be compared exists, causes the grouping means to select a not yet selected classification hierarchy.

The data mining system according to claim 3 or claim 4, further comprising summarizing means for receiving the list of characteristic attributes and the test statistics as input and combining attributes that satisfy a predetermined condition into one attribute,
wherein the predetermined condition is that, when the attributes are ranked by test statistic, a plurality of different attributes satisfy at least one of falling within a predetermined rank range and falling within a predetermined range of the test statistic, and that these attributes are siblings in the corresponding classification hierarchy and share the same parent classification, and when the condition is satisfied the summarizing means outputs the parent classification as a new characteristic attribute.

The data mining system according to any one of claims 3 to 5, further comprising a combination unit that generates a combination of attributes to be grouped by the grouping unit.

A data mining method for identifying an attribute related to an attribute to be analyzed from data including a plurality of attributes and attribute values,
A grouping step of selecting an attribute from the data and grouping attribute values of the selected attribute based on a classification hierarchy that hierarchically represents a classification corresponding to the attribute stored in a storage unit;
A test statistic calculation step of calculating a test statistic indicating the strength of the relationship between the grouped attribute values and the attribute to be analyzed; and
A determination step of determining, based on the calculated test statistic, whether the grouped attribute values are characteristic in relation to the attribute to be analyzed,
wherein, when the determination step determines that the grouped attribute values are not characteristic, the grouping step executes the grouping process again based on a classification at a level of the classification hierarchy higher than the level used in the preceding grouping.

The data mining method according to claim 7, wherein, in the grouping step, when it is determined that all of the grouped attribute values are characteristic in relation to the attribute to be analyzed, another attribute is selected from the data and the grouping process is executed.

A data mining method for inputting data including a plurality of attributes and attribute values and an attribute to be analyzed, and outputting a list of attributes related to the attribute to be analyzed,
A storage step of storing in the storage unit a classification hierarchy that hierarchically represents the classification corresponding to the attribute;
A selection step of selecting one attribute to be compared from the data;
An extraction step of extracting the classification hierarchy corresponding to the selected attribute from the storage unit;
A grouping step of grouping the attribute values of the attribute to be compared in the data based on the classification at a lower level of the extracted classification hierarchy;
A test statistic calculation step of calculating, for at least one of (a) the attribute to be analyzed and the attribute to be compared, or (b) the attribute to be analyzed and all the grouped attribute values of the attribute to be compared, a test statistic indicating the strength of the relationship between the attribute to be analyzed and the attribute to be compared, or between the attribute to be analyzed and the attribute values of the attribute to be compared;
A regrouping step of determining whether a result is characteristic according to whether the calculated test statistic exceeds a specific threshold and, when it is determined that there is a test result indicating a non-characteristic attribute or a non-characteristic attribute value, performing the grouping based on the classification one level higher in the classification hierarchy; and
An output step of selecting, when it is determined that the test results for all attributes or attribute values to be compared are characteristic, another, not yet evaluated attribute without changing the classification hierarchy, and outputting a list of all characteristic attributes and their test statistics.

The data mining method according to claim 9, wherein
in the storing step, classification hierarchies corresponding to one attribute are stored,
in the extraction step, one of the classification hierarchies corresponding to the attribute is extracted, and
the method further comprises a reselecting step of selecting a not yet selected classification hierarchy when it is determined that the test results for all attributes or attribute values to be compared are characteristic and another classification hierarchy corresponding to the attribute to be compared exists.

The data mining method according to claim 9 or claim 10, further comprising a summarizing step of receiving the list of characteristic attributes and the test statistics as input and combining attributes that satisfy a predetermined condition into one attribute,
wherein the predetermined condition is that, when the attributes are ranked by test statistic, a plurality of different attributes satisfy one or both of falling within a predetermined rank range and falling within a predetermined range of the test statistic, and that these attributes are siblings in the corresponding classification hierarchy and share the same parent classification, and when the condition is satisfied the parent classification is output as a new characteristic attribute in the summarizing step.

The data mining method according to claim 9, further comprising a combination step of generating a combination of attributes to be grouped.

A data mining program for determining an attribute related to an attribute to be analyzed from data including a plurality of attributes and attribute values, the program causing a computer to execute:
a grouping process of selecting an attribute from the data and grouping attribute values of the selected attribute based on a classification hierarchy, stored in a storage unit, that hierarchically represents a classification corresponding to the attribute;
a test statistic calculation process of calculating a test statistic indicating the strength of relevance between the grouped attribute values and the attribute to be analyzed; and
a determination process of determining, based on the calculated test statistic, whether the grouped attribute values are characteristic in relation to the attribute to be analyzed,
wherein, when the determination process determines that the grouped attribute values are not characteristic, the program causes the computer to execute the grouping process again based on a classification at a level of the classification hierarchy higher than the level used in the preceding grouping.

The data mining program according to claim 13, wherein the program causes the computer, when it is determined that all of the grouped attribute values are characteristic in relation to the attribute to be analyzed, to select another attribute from the data and execute the grouping process.