JP6257298B2

JP6257298B2 - Data classification apparatus, data classification method, and data classification program

Info

Publication number: JP6257298B2
Application number: JP2013255823A
Authority: JP
Inventors: 麻里 ▲高▼木; 森田　豊久; 豊久森田; 朋角田
Original assignee: Hitachi Systems Ltd
Current assignee: Hitachi Systems Ltd
Priority date: 2013-12-11
Filing date: 2013-12-11
Publication date: 2018-01-10
Anticipated expiration: 2033-12-11
Also published as: JP2015114807A

Description

The present invention relates to a data classification device, a data classification method, and a data classification program.

An ATM (Automatic Teller Machine) journal, which records the contents of ATM transactions, contains personal information. When this personal information is analyzed, privacy must be protected.

Japanese Patent Laid-Open No. 2013-125374 (Patent Document 1) describes a method "including: a step of determining, from data stored in a data storage unit that stores, for each kind of attribute value of a first attribute that is included in a plurality of records and designated as a target of obfuscation, the number of records among the plurality of records in which that attribute value appears, whether the distribution of the numbers of records satisfies a condition indicating that the bias is large; and a step of, when the distribution of the numbers of records satisfies the condition indicating that the bias is large, replacing the attribute value of the first attribute in at least one of the plurality of records with obfuscated data and storing it in the data storage unit."

Japanese Patent Laid-Open No. 2010-72727 (Patent Document 2) states that "the history processing apparatus first executes a first process that classifies each piece of history data contained in the user's behavior history data, recorded in time series, into several history sets based on a first condition for classifying history data into history sets. After the first process ends, the history processing apparatus executes a second process that, based on a second condition concerning the recording time of the history data and the similarity between pieces of history data, incorporates isolated data left unclassified by the first process into one of the sets."

Patent Document 1: JP 2013-125374 A
Patent Document 2: JP 2010-72727 A

With the technique described in Patent Document 1, replacing values with obfuscated data protects the privacy of personal information contained in the analysis target. However, the obfuscated data after replacement carries less information than the data before replacement.

With the technique described in Patent Document 2, classifying the history sets based on similarity protects the personal information contained in the analysis target. The similarity, however, is merely a numerical value used for processing and has no characteristic or meaning of its own. From the similarity alone it is therefore not easy to identify the criteria by which the groups were formed after classification.

An object of the present invention is to provide a technique that makes it easy to identify the criteria by which the groups were formed after classification.

A brief outline of representative aspects of the invention disclosed in the present application is as follows.

One embodiment of the present invention has an attribute series DB that stores attribute series data obtained by combining static data indicating customer attributes with data accumulated on the customer's behavior. It also has a feature extraction unit that extracts a feature from the attribute series data and adds the extracted feature to the attribute series DB as data associated with a data item. It also has a sorting unit that, when data included in the attribute series data is a numerical value, converts the numerical value into an interval. It also has an output device that displays, as a list, the data items stored in the attribute series DB in association with the data. It also has an input device that accepts an input selecting a data item from the data items listed by the output device. It also has a grouping unit that forms groups from the data extracted based on the data item whose selection the input device accepted.

Another embodiment has a data item adding step in which the feature extraction unit extracts a feature from attribute series data, obtained by combining static data indicating customer attributes with data accumulated on the customer's behavior, and adds the extracted feature to the attribute series DB as data associated with a data item. It also has a numerical value conversion step in which the sorting unit, when data included in the attribute series data is a numerical value, converts the numerical value into an interval. It also has a display step in which the output device displays, as a list, the data items stored in the attribute series DB in association with the data. It also has a data item selection step in which the input device accepts an input selecting a data item from the data items listed by the output device. It also has a grouping step in which the grouping unit forms groups from the data extracted based on the data item whose selection the input device accepted.

In yet another embodiment, a program causes the computer of the data classification device to execute a data item adding step in which the feature extraction unit extracts a feature from attribute series data, obtained by combining static data indicating customer attributes with data accumulated on the customer's behavior, and adds the extracted feature to the attribute series DB as data associated with a data item. The program also causes the computer to execute a numerical value conversion step in which the sorting unit, when data included in the attribute series data is a numerical value, converts the numerical value into an interval. It also causes the computer to execute a display step in which the output device displays, as a list, the data items stored in the attribute series DB in association with the data. It also causes the computer to execute a data item selection step in which the input device accepts an input selecting a data item from the data items listed by the output device. It also causes the computer to execute a grouping step in which the grouping unit forms groups from the data extracted based on the data item whose selection the input device accepted.

The effects obtained by representative aspects of the invention disclosed in the present application are briefly described as follows.

According to a representative embodiment of the present invention, it becomes easy to identify the criteria by which the groups were formed after classification.

FIG. 1 is a diagram showing an outline of a configuration example of the data classification device in one embodiment of the present invention.
FIG. 2 is a diagram showing an outline of a configuration example of the attribute data stored in the attribute DB in one embodiment of the present invention.
FIG. 3 is a diagram showing an outline of a configuration example of the series data stored in the series DB in one embodiment of the present invention.
FIG. 4 is a diagram showing an outline of another configuration example of the series data stored in the series DB in one embodiment of the present invention.
FIG. 5 is a diagram showing an outline of a configuration example of the attribute series data stored in the attribute series DB in one embodiment of the present invention.
FIG. 6 is a diagram showing an outline of the overall processing in one embodiment of the present invention.
FIG. 7 is a diagram showing an outline of the feature extraction process in one embodiment of the present invention.
FIG. 8 is a diagram showing an outline of another configuration example of the attribute series data stored in the attribute series DB in one embodiment of the present invention.
FIG. 9 is a diagram showing an outline of still another configuration example of the attribute series data stored in the attribute series DB in one embodiment of the present invention.
FIG. 10 is a diagram showing an outline of still another configuration example of the attribute series data stored in the attribute series DB in one embodiment of the present invention.
FIG. 11 is a diagram showing an outline of still another configuration example of the attribute series data stored in the attribute series DB in one embodiment of the present invention.
FIG. 12 is a diagram showing an outline of the sorting process in one embodiment of the present invention.
FIG. 13 is a diagram showing an outline of still another configuration example of the attribute series data stored in the attribute series DB in one embodiment of the present invention.
FIG. 14 is a diagram showing an outline of still another configuration example of the attribute series data stored in the attribute series DB in one embodiment of the present invention.
FIG. 15 is a diagram showing an outline of the item selection process in one embodiment of the present invention.
FIG. 16 is a diagram showing an outline of the selection screen in one embodiment of the present invention.
FIG. 17 is a diagram showing an outline of the grouping process in one embodiment of the present invention.
FIG. 18 is a diagram showing an outline of a configuration example of the grouping data stored in the grouping DB in one embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In all the drawings used to describe the embodiments, the same parts are in principle given the same reference signs, and repeated description of them is omitted.

<Overall configuration>
FIG. 1 is a diagram showing an outline of a configuration example of a data classification device 1 according to an embodiment of the present invention. As shown in FIG. 1, the data classification device 1 includes an output device 110, an input device 120, an attribute DB 131, a series DB 132, an attribute series DB 133, a grouping DB 134, and a data classification unit 140.

The data classification device 1 is implemented by predetermined hardware and software. For example, the data classification device 1 has a processor, a memory, and the like, and a program in the memory executed by the processor makes the computer of the data classification device 1 function.

The data classification unit 140 includes a feature extraction unit 141, a sorting unit 142, an item selection unit 143, and a grouping unit 144.

The attribute DB 131 stores attribute data, which is static data indicating customer attributes such as a customer's age and gender.

The series DB 132 stores series data, which is data accumulated on customer behavior, such as customer purchase logs or customer transaction logs from ATM use.

The attribute series DB 133 stores attribute series data obtained by combining static data indicating customer attributes with data accumulated on customer behavior. The attribute series data is the data to be analyzed (hereinafter sometimes called analysis target data).

The feature extraction unit 141 acquires attribute series data from the attribute series DB 133. From the acquired attribute series data it extracts features, such as the number of elements that make up a piece of data (here, the data of one data item of the attribute series data), the ratios between adjacent elements of the data, the most frequent element (the element that occurs most often in the data), and the average value of the elements. The feature extraction unit 141 then adds each extracted feature to the attribute series DB 133 as data associated with a data item.

When data included in the attribute series data is a numerical value, the sorting unit 142 extracts the maximum value and the minimum value of that data. The sorting unit 142 then divides the range from the extracted minimum value to the maximum value into L equal parts (for example, L = 2), so that the range is split into L intervals. The sorting unit 142 converts each numerical value into the interval it falls in and adds the converted data to the attribute series DB 133.
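The equal-width division described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `to_intervals`, the tuple representation of an interval, and the treatment of boundary values (a value on a boundary goes to the upper interval, and the maximum is kept in the last interval) are assumptions, since the text does not specify them.

```python
def to_intervals(values, num_bins=2):
    """Replace each number with the (low, high) interval it falls in.

    The range [min, max] is split into num_bins equal-width intervals.
    Boundary handling is an assumption made for this sketch.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so there is only one degenerate interval.
        return [(lo, hi) for _ in values]
    width = (hi - lo) / num_bins
    result = []
    for v in values:
        # The maximum would compute index num_bins, so clamp it into the last interval.
        index = min(int((v - lo) / width), num_bins - 1)
        result.append((lo + index * width, lo + (index + 1) * width))
    return result


# Hypothetical ages for five customers, split into L = 2 intervals.
ages = [25, 30, 45, 60, 65]
intervals = to_intervals(ages, num_bins=2)
```

With these hypothetical ages the range 25 to 65 is split at 45, so the first two customers fall in the lower interval and the remaining three in the upper one.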

The output device 110 displays the data items and data stored in the attribute series DB 133 as a list on a selection screen (FIG. 16, described later).

The input device 120 accepts an input selecting one or more data items from the data items displayed on the selection screen. In this way, the input device 120 accepts the selection of the data items used to generate groups.

The grouping unit 144 acquires from the attribute series DB 133 the data corresponding to the data items whose selection the input device 120 accepted. The grouping unit 144 generates a plurality of groups by combining the acquired data with one another and classifies the attribute series data according to the generated groups. In this way, the grouping unit 144 forms groups from the data extracted based on the selected data items. The grouping unit 144 also stores the classification result for each generated group in the grouping DB 134.
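One way to read this grouping step is that every combination of values of the selected data items defines a group, so the group key itself states the criteria. The sketch below follows that reading; the function name `group_by_items`, the dictionary record layout, and the sample values are hypothetical and are not taken from the figures.

```python
from collections import defaultdict


def group_by_items(records, selected_items):
    """Group customer IDs by the combination of values of the selected data items."""
    groups = defaultdict(list)
    for record in records:
        # The key is human-readable: it is exactly the values that define the group.
        key = tuple(record[item] for item in selected_items)
        groups[key].append(record["ID"])
    return dict(groups)


# Hypothetical attribute series records after feature extraction and interval conversion.
records = [
    {"ID": "001", "age": "20-40", "mode": "payment"},
    {"ID": "002", "age": "20-40", "mode": "transfer"},
    {"ID": "003", "age": "40-60", "mode": "payment"},
    {"ID": "004", "age": "20-40", "mode": "payment"},
]
groups = group_by_items(records, ["age", "mode"])
```

Because each key is a tuple such as `("20-40", "payment")`, the basis on which a group was formed can be read directly from the result, which is the property the invention aims for.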

<Attribute data>
FIG. 2 is a diagram showing an outline of a configuration example of the attribute data stored in the attribute DB 131 in one embodiment of the present invention. The attribute DB 131 stores attribute data, which is static data indicating customer attributes such as a customer's age and gender. As shown in FIG. 2, the attribute data consists of data items such as [ID] and [attribute 1 (age)] to [attribute k (gender)]. [ID] is a code that identifies a customer. [Attribute 1] to [attribute k] indicate customer attributes. For example, [attribute 1] indicates the customer's age and [attribute k] indicates the customer's gender.

<Series data>
FIG. 3 is a diagram showing an outline of a configuration example of the series data stored in the series DB 132 in one embodiment of the present invention. The series DB 132 stores series data, which is data accumulated on customer behavior, such as customer purchase logs or customer transaction logs from ATM use. As shown in FIG. 3, the series data consists of data items such as [ID], [Date], [Transaction], and [Store]. [Date] indicates the date of the transaction. [Transaction] indicates the content of the transaction. [Store] indicates the name of the store where the transaction took place.

As shown in FIG. 4, the series data may instead consist of [ID], [Year/Month], [Balance], and the like. [Balance] indicates the balance on a date determined from [Year/Month] (for example, the last day of the corresponding month).

<Attribute series data>
FIG. 5 is a diagram showing an outline of a configuration example of the attribute series data stored in the attribute series DB 133 in one embodiment of the present invention. The attribute series DB 133 stores attribute series data obtained by combining static data indicating customer attributes with data accumulated on customer behavior. As shown in FIG. 5, the attribute series data is generated by joining attribute data and series data that have the same [ID]. Specifically, for the series data of FIG. 3, the data within an arbitrary [Date] range (for example, 2013/7) is taken and, for each [ID], a list is generated in which the [Transaction] values are arranged from left to right in ascending order of [Date]; this list is joined as [2013/7 transaction]. The attribute series data consists of data items such as [attribute 1 (age)] to [attribute k (gender)] and [series 1 (2013/7 transaction)] to [series m (2013/4 balance, 2013/5 balance, 2013/6 balance, ...)].

<Overall processing>
FIG. 6 is a diagram showing an outline of the overall processing in one embodiment of the present invention.

First, in S601, the feature extraction process (FIG. 7, described later) is executed. In the feature extraction process, the feature extraction unit 141 extracts features of the data that are calculated from a plurality of elements, such as the number of elements that make up the data, the ratios between adjacent elements of the data, the most frequent element (the element that occurs most often in the data), and the average value of the elements. The feature extraction unit 141 then adds data whose data item is the extracted feature to the attribute series DB 133.

Next, in S602, the sorting process (FIG. 12, described later) is executed. In the sorting process, when data included in the attribute series data is a numerical value, the sorting unit 142 extracts the maximum value and the minimum value of those numerical values. The sorting unit 142 then divides the range from the extracted minimum value to the maximum value into L equal parts (for example, L = 2), so that the range is split into L intervals. The sorting unit 142 converts each numerical value into the interval it falls in and adds the converted data to the attribute series DB 133.

Next, in S603, the item selection process (FIG. 15, described later) is executed. In the item selection process, the input device 120 accepts an input selecting the [data items] used to generate groups.

Next, in S604, the grouping process (FIG. 17, described later) is executed. In the grouping process, groups are generated by combining with one another the data of the data items whose selection was accepted in the item selection process. The grouping unit 144 stores each generated group in the grouping DB 134 (FIG. 18, described later).

<Feature extraction process>
FIG. 7 is a diagram showing an outline of the feature extraction process in one embodiment of the present invention.

First, in S701, the feature extraction unit 141 selects one data item that has not yet been selected from the data items of the attribute series data stored in the attribute series DB 133 (FIG. 5, described above).

Next, in S702, the feature extraction unit 141 determines whether the data of the data item selected in S701 is a list. The feature extraction unit 141 determines that the data is a list when it consists of a plurality of elements. If the feature extraction unit 141 determines in S702 that the data is a list (S702-Yes), the process proceeds to S703. If it determines that the data is not a list (S702-No), the process proceeds to S712. For example, if the data item selected in S701 is [attribute 1 (age)], the data contains a single element, so the feature extraction unit 141 determines that the data is not a list. If the data item selected in S701 is [series 1 (2013/7 transaction)], the data contains a plurality of elements, so the feature extraction unit 141 determines that the data is a list.

In S703, the feature extraction unit 141 acquires from the attribute series DB 133 all the data of the data item selected in S701. For example, when the data item selected in S701 is [series 1 (2013/7 transaction)], the feature extraction unit 141 acquires from the attribute series DB 133 the data of this data item: "payment, payment, transfer", "payment, payment, payment", "balance inquiry, balance inquiry, transfer", "payment, payment, payment", and "transfer, transfer, balance inquiry".

Next, in S704, the feature extraction unit 141 calculates the number of elements in each piece of data acquired in S703. For example, when the data acquired in S703 is "payment, payment, transfer", "payment, payment, payment", "balance inquiry, balance inquiry, transfer", "payment, payment, payment", and "transfer, transfer, balance inquiry", the feature extraction unit 141 calculates the number of elements of each piece of data as "3".

Next, in S705, the feature extraction unit 141 determines whether the element counts calculated in S704 all match. If the feature extraction unit 141 determines in S705 that the element counts all match (S705-Yes), features of the data can be extracted from the tendency of the individual elements. In that case, the process proceeds to S706 in order to extract as a feature either the ratios calculated by comparing the elements with one another (calculated in S707) or the most frequent element, that is, the element that occurs most often in the data (extracted in S708). If the feature extraction unit 141 determines in S705 that the element counts do not match (S705-No), features of the data cannot be extracted from the tendency of the individual elements. In that case, the process proceeds to S709 in order to extract as a feature either the average value, which is one example of a representative value of the elements (calculated in S710), or the appearance types, that is, the elements that remain after duplicates within the data are removed (extracted in S711).

In S706, the feature extraction unit 141 determines whether the elements of the data acquired in S703 are numerical values. If the feature extraction unit 141 determines in S706 that the elements are numerical values (S706-Yes), the process proceeds to S707. If it determines that the elements are not numerical values (S706-No), the process proceeds to S708. For example, when the data item selected in S701 is [series m (2013/4 balance, 2013/5 balance, 2013/6 balance, ...)] and the data acquired in S703 is "100, 80, …, 250, 500", "100, 120, …, 100, 150", "300, 450, …, 300, 300", "500, 900, …, 250, 500", and "120, 48, …, 200, 200", the feature extraction unit 141 determines that the elements of the data are numerical values. When the data item selected in S701 is [series 1 (2013/7 transaction)] and the data acquired in S703 is "payment, payment, transfer", "payment, payment, payment", "balance inquiry, balance inquiry, transfer", "payment, payment, payment", and "transfer, transfer, balance inquiry", the feature extraction unit 141 determines that the elements of the data are not numerical values.

In S707, the feature extraction unit 141 calculates ratios by comparing adjacent elements of each piece of data acquired in S703. More specifically, when the number of elements is N, the feature extraction unit 141 calculates the ratio between the first and second elements as the first ratio, the ratio between the second and third elements as the second ratio, and so on, up to the ratio between the (N-1)th and Nth elements as the (N-1)th ratio. For example, the feature extraction unit 141 calculates the ratios "0.8, …, 2.0" for the data "100, 80, …, 250, 500" acquired in S703, the ratios "1.2, …, 1.5" for "100, 120, …, 100, 150", the ratios "1.5, …, 1.0" for "300, 450, …, 300, 300", the ratios "1.8, …, 2.0" for "500, 900, …, 250, 500", and the ratios "0.4, …, 1.0" for "120, 48, …, 200, 200". The feature extraction unit 141 then adds the calculated ratios to the attribute series DB 133 in association with the data item [series m (ratio)], as shown in FIG. 8.
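The adjacent-element ratio of S707 can be sketched as below. The worked values in the text (100 followed by 80 gives 0.8, and 250 followed by 500 gives 2.0) indicate that each ratio is the later element divided by the earlier one. The function name `adjacent_ratios` and the sample balances are hypothetical, and the text does not say how a zero balance is handled, so this sketch simply lets the division fail in that case.

```python
def adjacent_ratios(values):
    """Return the N-1 ratios between neighbouring elements (later / earlier)."""
    # zip pairs element i with element i+1; a zero 'earlier' value raises ZeroDivisionError.
    return [later / earlier for earlier, later in zip(values, values[1:])]


# Hypothetical monthly balances for one customer.
balances = [100, 120, 180]
ratios = adjacent_ratios(balances)
```

For three balances the function returns two ratios, matching the N-1 ratios described for a list of N elements.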

In S708, the feature extraction unit 141 extracts, as the most frequent element, the element that occurs most often among the elements constituting each piece of data acquired in S703. It then adds each extracted most frequent element to the attribute series DB 133 in association with the data item [series 1 (most frequent element)], as shown in FIG. 9. For example, it extracts "payment" from the data "payment, payment, transfer", "payment" from "payment, payment, payment", "balance inquiry" from "balance inquiry, balance inquiry, transfer", and "transfer" from "transfer, transfer, balance inquiry". When a piece of data has more than one most frequent element, the feature extraction unit 141 may store every such element in the attribute series DB 133, or may store only the one that was stored last. After the feature extraction unit 141 has added the extracted most frequent elements to the attribute series DB 133, the process proceeds to S712.
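A minimal sketch of the extraction in S708 is shown below. The function name `most_frequent_elements` is hypothetical. Of the two tie-handling options the description allows, this sketch takes the first one and returns every tied element, in order of first appearance.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def most_frequent_elements(elements: Sequence[Hashable]) -> List[Hashable]:
    """Return all elements that share the highest occurrence count."""
    counts = Counter(elements)
    if not counts:
        return []
    top = max(counts.values())
    # Counter keeps first-insertion order, so ties come back in order of appearance.
    return [element for element, count in counts.items() if count == top]
```

For `["payment", "payment", "transfer"]` the result is `["payment"]`. A caller that wants a single value could pick one entry from the returned list.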

In S709, the feature extraction unit 141 determines whether the elements constituting the data acquired in S703 are numerical values. If it determines that the elements are numerical (S709-Yes), the process proceeds to S710; if it determines that they are not (S709-No), the process proceeds to S711.

In S710, the feature extraction unit 141 adds the number of elements calculated in S704 for each piece of data to the attribute series DB 133 in association with the data item [series 2 (number of elements)], as shown in FIG. 10. The feature extraction unit 141 also calculates the average value of the elements of each piece of data and adds it to the attribute series DB 133 in association with the data item [series 2 (average value)], as shown in FIG. 10.

In S711, the feature extraction unit 141 extracts the appearance types of each piece of data acquired in S703 by removing duplicate elements from the elements constituting that data, and adds the extracted appearance types to the attribute series DB 133 as shown in FIG. 11. The feature extraction unit 141 also adds the number of elements calculated in S704 to the attribute series DB 133 as shown in FIG. 11.
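The two branches S710 and S711 can be sketched together as follows. The names `is_numeric` and `summarize` and the dictionary keys are hypothetical; treating an empty list as non-numeric is an assumption made only so that the sketch runs on any input.

```python
from statistics import mean
from typing import Any, Dict, Sequence


def is_numeric(elements: Sequence[Any]) -> bool:
    """True when every element is an int or float (bool is excluded)."""
    return all(isinstance(e, (int, float)) and not isinstance(e, bool)
               for e in elements)


def summarize(elements: Sequence[Any]) -> Dict[str, Any]:
    """Element count plus average (numeric) or appearance types (otherwise)."""
    summary: Dict[str, Any] = {"count": len(elements)}
    if elements and is_numeric(elements):
        summary["average"] = mean(elements)  # S710
    else:
        # S711: drop duplicates while keeping the order of first appearance.
        summary["types"] = list(dict.fromkeys(elements))
    return summary
```

For example, `summarize(["payment", "transfer", "payment"])` yields a count of 3 and the appearance types `["payment", "transfer"]`.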

Next, in S712, the feature extraction unit 141 determines whether every data item included in the attribute series data has been selected in S701. If it determines that every data item has been selected (S712-Yes), the feature extraction process ends. If it determines that some data items have not been selected (S712-No), the process returns to S701.

<Classification processing>
FIG. 12 is a diagram showing an outline of the classification process according to an embodiment of the present invention.

First, in S1201, the classification unit 142 selects one data item that has not yet been selected from the data items included in the attribute series data.

Next, in S1202, the classification unit 142 acquires all the data of the data item selected in S1201. For example, when the data item selected in S1201 is [series 1 (2013/7 transaction)], the classification unit 142 acquires all the data of this data item, namely "payment, payment, transfer", "payment, payment, payment", "balance inquiry, balance inquiry, transfer", "payment, payment, payment", and "transfer, transfer, balance inquiry", from the attribute series DB 133 (FIG. 5, described above).

Next, in S1203, the classification unit 142 removes duplicates from the data acquired in S1202 and counts the number of pieces of data that remain.

Next, in S1204, the classification unit 142 acquires a threshold L and determines whether the number of pieces of data counted in S1203 exceeds it. If the number does not exceed the threshold L (S1204-No), the process proceeds to S1212. If the number exceeds the threshold L (S1204-Yes), the process proceeds to S1205. The threshold L is written in a configuration file in advance, and the classification unit 142 acquires it by reading that file.
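The check in S1203 and S1204 amounts to counting distinct values and comparing the count with L. A minimal sketch follows, with the hypothetical name `exceeds_threshold`; the threshold is passed in as a parameter here rather than read from a configuration file, and list data are converted to tuples only so that they can be placed in a set.

```python
from typing import Any, Iterable


def exceeds_threshold(data: Iterable[Any], threshold: int) -> bool:
    """True when the number of distinct pieces of data is greater than threshold."""
    # Lists are unhashable, so list data are compared as tuples.
    distinct = {tuple(d) if isinstance(d, list) else d for d in data}
    return len(distinct) > threshold
```

With L = 2, five distinct lists exceed the threshold and would go on to S1205, whereas a data item holding only "male" and "female" would not.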

In S1205, the classification unit 142 determines whether the data of the data item selected in S1201 are lists. If it determines that the data are lists (S1205-Yes), the process proceeds to S1206; if it determines that they are not (S1205-No), the process proceeds to S1209.

In S1206, the classification unit 142 determines whether the elements constituting the data acquired in S1202 are numerical values. If it determines that the elements are numerical (S1206-Yes), the process proceeds to S1207; if it determines that they are not (S1206-No), the process proceeds to S1212.

In S1207, the classification unit 142 extracts the minimum value and the maximum value among the elements constituting the data acquired in S1202. It then divides the range from the extracted minimum value to the extracted maximum value into L (for example, 2) equal parts, so that the range is split into L sections. For example, when the data "1, 1, 3", "2, 5, 4", "3, 4, 6", "2, 3, 2", and "1, 1, 6" of the data item [series j] are acquired from the attribute series DB 133 in S1202, the classification unit 142 extracts the minimum value "1" and the maximum value "6". It then divides the range "1 to 6" into two equal parts: the section "1 to 3.5" (corresponding to [1, 3.6), that is, at least 1 and less than 3.6) and the section "3.6 to 6" (corresponding to [3.6, 6], that is, at least 3.6 and at most 6). Because the sections are generated by dividing the range between the minimum and maximum values of the elements into L parts, and the same sections are applied to every element, the number of sections after division is limited to L. The number of sections is therefore smaller than when sections are defined separately for each element, which improves readability.

Next, in S1208, the classification unit 142 converts each element constituting the data of the data item selected in S1201 into the section, among those produced in S1207, that contains the element. It adds the converted data to the attribute series DB 133 as shown in FIG. 13, and the process proceeds to S1212. For example, the classification unit 142 converts the elements of the data item [series j] as follows: "1", "2", and "3" become "1 to 3.5", and "4", "5", and "6" become "3.6 to 6". It then adds the converted data "1 to 3.5, 1 to 3.5, 1 to 3.5", "1 to 3.5, 3.6 to 6, 3.6 to 6", "1 to 3.5, 3.6 to 6, 3.6 to 6", "1 to 3.5, 1 to 3.5, 1 to 3.5", and "1 to 3.5, 1 to 3.5, 3.6 to 6" to the attribute series DB 133 as shown in FIG. 13.
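Steps S1207 and S1208 describe equal-width sectioning over the minimum and maximum of all elements. The sketch below illustrates the idea under stated assumptions: the function names are hypothetical, each element is mapped to a zero-based section index rather than to a display label, and the boundary between sections is the arithmetic cut point (3.5 for the range 1 to 6 with L = 2) rather than the rounded labels "1 to 3.5" and "3.6 to 6" quoted in the text.

```python
from typing import List, Sequence, Tuple


def make_sections(lo: float, hi: float, n_sections: int) -> List[Tuple[float, float]]:
    """Split the range lo..hi into n_sections equal-width (start, end) pairs."""
    width = (hi - lo) / n_sections
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_sections)]


def section_index(value: float, lo: float, hi: float, n_sections: int) -> int:
    """Zero-based index of the section containing value."""
    if hi == lo:
        return 0  # Assumption: a degenerate range maps everything to section 0.
    index = int((value - lo) * n_sections / (hi - lo))
    return min(index, n_sections - 1)  # The maximum belongs to the last section.


def to_sections(lists: Sequence[Sequence[float]], n_sections: int) -> List[List[int]]:
    """Convert every element of every list to its section index (S1208)."""
    flat = [v for row in lists for v in row]
    lo, hi = min(flat), max(flat)  # One shared range for all elements (S1207).
    return [[section_index(v, lo, hi, n_sections) for v in row] for row in lists]
```

Applied to the [series j] example with L = 2, the elements 1, 2, and 3 map to section 0 and the elements 4, 5, and 6 map to section 1, which matches the conversion described above.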

In S1209, the classification unit 142 determines whether the data acquired in S1202 are numerical values. If it determines that the data are numerical (S1209-Yes), the process proceeds to S1210; if it determines that they are not (S1209-No), the process proceeds to S1212.

In S1210, the classification unit 142 extracts the minimum value and the maximum value of the data acquired in S1202. It then divides the range from the extracted minimum value to the extracted maximum value into L (for example, 2) equal parts, so that the range is split into L sections. For example, when the data "1", "6", "4", "3", and "2" of the data item [attribute i] are acquired from the attribute series DB 133 in S1202, the classification unit 142 extracts the minimum value "1" and the maximum value "6". It then divides the range "1 to 6" into two equal parts, the section "1 to 3" and the section "4 to 6".

Next, in S1211, the classification unit 142 converts each piece of data of the data item selected in S1201 into the section, among those produced in S1210, that contains it. It adds the converted data to the attribute series DB 133 as shown in FIG. 14, and the process proceeds to S1212. For example, the classification unit 142 converts the data of the data item [attribute i] as follows: "1" to "1 to 3", "2" to "1 to 3", "3" to "4 to 6", "4" to "4 to 6", and "5" to "4 to 6". It then adds the converted data "1 to 3", "4 to 6", "4 to 6", "4 to 6", and "1 to 3" to the attribute series DB 133 as shown in FIG. 14.

In S1212, the classification unit 142 determines whether every data item included in the attribute series data has been selected in S1201. If it determines that every data item has been selected (S1212-Yes), the classification process ends. If it determines that some data items have not been selected (S1212-No), the process returns to S1201.

<Item selection processing>
FIG. 15 is a diagram showing an outline of the item selection process according to an embodiment of the present invention.

First, in S1501, the item selection unit 143 acquires all the attribute series data stored in the attribute series DB 133.

Next, in S1502, the item selection unit 143 inputs the attribute series data acquired in S1501 to the output device 110.

Next, in S1503, the output device 110 displays a selection screen (FIG. 16, described later) based on the attribute series data input in S1502. The selection screen is described below with reference to FIG. 16.

As shown in FIG. 16, the selection screen displays a selection field, [data item], [type], [number of data], [processing], [processing source], and [data]. [Data item] indicates the name of the data item. [Type] indicates the type of [data], such as "numerical value", "numerical list", or "character". "Numerical value" indicates that [data] consists of a single element. "Numerical list" indicates that [data] consists of multiple elements. "Character" indicates that [data] consists of characters. [Number of data] indicates the number of pieces of data. [Processing] indicates whether [data] has been converted from numerical values into sections by the classification process. "Original data" indicates the numerical values of [data] before processing. [Processing source] indicates the name of the data item of the original data before conversion. [Data] indicates the content of the data.

The input device 120 accepts an input that selects one or more [data items] from those displayed on the selection screen. When the input device 120 accepts an input selecting a [data item], a check 1601 is displayed in the selection field corresponding to the selected [data item].

For a [data item] whose [data] has been processed, the item selection unit 143 controls the output device 110 so that only one of the pre-processing [data item] and the post-processing [data item] can be selected.

That is, even if a post-processing [data item] is selected while the corresponding pre-processing [data item] is already selected, the item selection unit 143 controls the output device 110 so that the check 1601 is displayed only in the selection field corresponding to the post-processing [data item]. Specifically, the item selection unit 143 refers to the [processing] corresponding to the selected [data item]. If "section" is stored in [processing], it refers to the corresponding [processing source]. It then refers to the [data item] stored in [processing source] and, if that [data item] is already selected, controls the output device 110 so that the check 1601 displayed in its selection field is erased. The item selection unit 143 also controls the output device 110 so that a check 1601 is displayed in the selection field corresponding to the newly selected post-processing [data item].

Likewise, even if a pre-processing [data item] is selected while the corresponding post-processing [data item] is already selected, the item selection unit 143 controls the output device 110 so that the check 1601 is displayed only in the selection field corresponding to the pre-processing [data item]. Specifically, the item selection unit 143 refers to the [processing] corresponding to the selected [data item]. If "section" is not stored in [processing], it extracts any [data item] whose [processing source] stores the same [data item] as the corresponding [processing source] of the selected item, whose [processing] stores "section", and whose selection field shows a check 1601. It then controls the output device 110 so that the check 1601 displayed in the selection field of each extracted [data item] is erased. The item selection unit 143 also controls the output device 110 so that a check 1601 is displayed in the selection field corresponding to the newly selected pre-processing [data item].
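The mutual exclusion between a pre-processing data item and its post-processing counterpart can be sketched as a small selection function. This is a simplified model, not the screen-control logic of the embodiment: the `Item` class, its field names, and the `select` function are hypothetical, and the sketch assumes that an unprocessed item records its own name as its processing source.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Item:
    name: str
    processing: str  # "section" for converted data, "original" otherwise
    source: str      # name of the pre-processing data item (assumed: own name if unprocessed)


def select(item_name: str, items: Dict[str, Item], checked: Set[str]) -> Set[str]:
    """Return the new set of checked items after item_name is selected."""
    chosen = items[item_name]
    # Uncheck every item derived from the same original data item.
    result = {name for name in checked if items[name].source != chosen.source}
    result.add(item_name)
    return result
```

Selecting the sectioned version of an item therefore clears the check on the original version, and vice versa, while checks on unrelated items are left alone.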

The output device 110 also displays the maximum number of groups 1602, which is the product of the [number of data] values of the selected [data items].
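The maximum number of groups is a plain product, as the following sketch shows. The function name `max_group_count` is hypothetical, and returning 1 when nothing is selected is simply the behaviour of an empty product, not something the description specifies.

```python
from math import prod
from typing import Iterable


def max_group_count(data_counts: Iterable[int]) -> int:
    """Multiply the [number of data] values of the selected data items."""
    return prod(data_counts)
```

For three selected data items with two distinct values each, the maximum number of groups is 2 × 2 × 2 = 8.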

The output device 110 also displays a selection completion button 1603. When the input device 120 accepts an input selecting the selection completion button 1603, the item selection process ends.

<Grouping processing>
FIG. 17 is a diagram showing an outline of the grouping process according to an embodiment of the present invention.

First, in S1701, the grouping unit 144 generates groups based on the data of the data items whose selection was accepted in S1503. Specifically, the grouping unit 144 generates the groups by combining the data of the data items with one another. For example, suppose that the data items whose selection was accepted in S1503 are [attribute i (section)], [attribute k], and [series j (section)], that the data of [attribute i (section)] are "1 to 3" and "4 to 6", that the data of [attribute k] are "male" and "female", and that the data of [series j (section)] are "1 to 3.5, 1 to 3.5, 1 to 3.5" and "1 to 3.5, 3.6 to 6, 3.6 to 6". In this case, the grouping unit 144 combines these data and generates the following groups: "1 to 3, male, {1 to 3.5, 1 to 3.5, 1 to 3.5}", "1 to 3, female, {1 to 3.5, 1 to 3.5, 1 to 3.5}", "1 to 3, male, {1 to 3.5, 3.6 to 6, 3.6 to 6}", "1 to 3, female, {1 to 3.5, 3.6 to 6, 3.6 to 6}", "4 to 6, male, {1 to 3.5, 1 to 3.5, 1 to 3.5}", "4 to 6, female, {1 to 3.5, 1 to 3.5, 1 to 3.5}", "4 to 6, male, {1 to 3.5, 3.6 to 6, 3.6 to 6}", and "4 to 6, female, {1 to 3.5, 3.6 to 6, 3.6 to 6}".
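Combining the data of every selected data item, as in S1701, is a Cartesian product. A minimal sketch using the standard library follows; the function name `generate_groups` is hypothetical, and the order in which the groups are produced follows `itertools.product`, which may differ from the order listed in the text.

```python
from itertools import product
from typing import Any, List, Sequence, Tuple


def generate_groups(values_per_item: Sequence[Sequence[Any]]) -> List[Tuple[Any, ...]]:
    """Return every combination that takes one value from each data item."""
    return list(product(*values_per_item))
```

With two sections for attribute i, two values for attribute k, and two section lists for series j, the function returns eight groups, one for each combination.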

Next, in S1702, the grouping unit 144 searches the attribute series DB 133 using the data included in each group generated in S1701 as a key, thereby classifying the attribute series data stored in the attribute series DB 133 by group. The grouping unit 144 also calculates, for each group, the number of pieces of data that match the key (hereinafter sometimes referred to as the count). The number of pieces of data belonging to each group is thus obtained.

Next, in S1703, the grouping unit 144 extracts, for each group, the row numbers of all the records extracted in S1702 (hereinafter sometimes referred to as the row number list).
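The classification, count, and row number list of S1702 and S1703 can be sketched in one pass over the records. This is an illustrative simplification rather than the search procedure of the embodiment: records are bucketed directly by their key instead of being looked up once per generated group, so groups with no matching record do not appear in the result. The function name `group_rows`, the dictionary-based record layout, and the row numbering starting at 1 are all assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def group_rows(records: Sequence[Dict[str, Any]],
               selected_items: Sequence[str]) -> Dict[Tuple[Any, ...], List[int]]:
    """Map each group key to the list of row numbers whose values match it."""
    rows_by_group: Dict[Tuple[Any, ...], List[int]] = defaultdict(list)
    for row_number, record in enumerate(records, start=1):
        key = tuple(record[item] for item in selected_items)
        rows_by_group[key].append(row_number)
    return dict(rows_by_group)
```

The count of a group is then the length of its row number list, for example `len(rows[("1 to 3", "male")])`.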

Next, in S1704, the grouping unit 144 selects one data item that has not yet been selected from the data items whose selection was accepted in S1503.

Next, in S1705, the grouping unit 144 stores grouped data, in which each group generated in S1701 is associated with the count calculated in S1702 and the row numbers extracted in S1703, in the grouping DB 134 as shown in FIG. 18.

Next, in S1706, the grouping unit 144 determines whether the data of the data item selected in S1704 are sections produced by processing. If it determines that the data are sections (S1706-Yes), the process proceeds to S1707; if it determines that they are not (S1706-No), the process proceeds to S1708.

In S1707, the grouping unit 144 acquires, for each group, the numerical values corresponding to the row numbers included in the row number list. Based on the acquired numerical values, it calculates the minimum value, maximum value, and average value of the data of the data item selected in S1704. In this way, the grouping unit 144 calculates the numerical values corresponding to the section. The grouping unit 144 then adds the calculated minimum value, maximum value, and average value to the grouping DB 134.
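The per-group statistics of S1707 can be sketched as follows. The function name `section_statistics`, the dictionary keys, and the representation of the pre-conversion values as a mapping from row number to value are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, Mapping, Sequence


def section_statistics(row_numbers: Sequence[int],
                       original_values: Mapping[int, float]) -> Dict[str, float]:
    """Minimum, maximum, and average of the pre-conversion values of one group."""
    values = [original_values[row] for row in row_numbers]
    return {"min": min(values), "max": max(values), "average": mean(values)}
```

For a group whose row number list is `[1, 5]` and whose original values are 1 and 2, the result is a minimum of 1, a maximum of 2, and an average of 1.5, even though both rows carry the same section label.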

In S1708, the grouping unit 144 determines whether every data item has been selected in S1704. If it determines that every data item has been selected (S1708-Yes), the grouping process ends. If it determines that some data items have not been selected (S1708-No), the process returns to S1704.

<Effects of the present embodiment>
According to the data classification device 1 of the present embodiment described above, groups are formed from data extracted on the basis of the data items whose selection the input device 120 accepted, which makes it easy to identify the criteria by which the resulting groups were classified.

In addition, calculating the minimum value, maximum value, and average value, which are the numerical values corresponding to a section, makes it possible to show how data that fall in the same section differ from one another.

In addition, calculating for each group the number of pieces of data belonging to it enables analysis that treats each group as weighted data, with that number as the weight.

In addition, extracting as a feature either the ratios calculated by comparing elements with one another or the most frequent element, that is, the element occurring most often in the data, makes it possible to extract the features of the data from the trend of its elements.

In addition, extracting as a feature the average value, which is one example of a representative value of the elements, or the appearance types obtained by removing duplicate elements from the data makes it possible to extract the features of the data from representative values of the elements even when they cannot be extracted from the trend of the elements.

The invention made by the present inventors has been described above specifically on the basis of an embodiment. Needless to say, however, the present invention is not limited to that embodiment and can be modified in various ways without departing from its gist. For example, data other than attribute series data may be used as the data to be analyzed.

DESCRIPTION OF SYMBOLS: 1…data classification device, 110…output device, 120…input device, 131…attribute DB, 132…series DB, 133…attribute series DB, 134…grouping DB, 140…data classification unit, 141…feature extraction unit, 142…classification unit, 143…item selection unit, 144…grouping unit, 1601…check, 1602…maximum number of groups, 1603…selection completion button

Claims

A data classification device comprising:
an attribute DB that stores customer identification information and customer attributes in association with data items of the respective data;
a series DB that stores the customer identification information, an action history indicating actions of the customer, and the date and time of each action of the customer in association with data items of the respective data;
an attribute series DB that stores attribute series data in association with data items of the respective data of the attribute series data, the attribute series data including, based on the customer identification information in the attribute DB and the customer identification information in the series DB, the customer identification information, the customer attributes in the attribute DB, and list data in which the action history in the series DB is linked in chronological order within a period based on the date and time;
a feature extraction unit that extracts a feature from the data of the attribute series data and adds the extracted feature to the attribute series DB in association with a data item of the data of the feature;
a classification unit that, when data included in the attribute series data is a numerical value, converts the numerical value into a section;
an output device that displays a list of the data items stored in the attribute series DB;
an input device that accepts an input selecting, from the data items displayed in the list by the output device, a data item for generating groups; and
a grouping unit that groups the attribute series data stored in the attribute series DB based on the types of data corresponding to the data item for which the input device accepted the input,
wherein the feature extraction unit calculates, as the feature, at least one of: a ratio obtained by comparing adjacent elements among the elements that are the action history constituting each list data; a most frequent value among the elements constituting each list data; an average value of the elements constituting each list data; and appearance types among the elements constituting each list data.

The data classification device according to claim 1, further comprising
a grouping DB that stores the results of grouping by the grouping unit,
wherein the grouping unit stores, in the grouping DB, a classification result for each of the groups it has formed.

The data classification device according to claim 2, wherein
the classification unit adds the converted data to the attribute series DB in association with a data item indicating that the data has been converted into sections, and
the grouping unit refers to the attribute series DB and generates groups so that customers for whom the values of all the data items for which the input device accepted the input match belong to the same group; when a data item for which the input device accepted the input is a data item indicating conversion into sections, refers to the attribute series DB and calculates, for each group, an average value, a minimum value, and a maximum value using the numerical values before conversion into sections that belong to the same group; calculates, for each group, the number of pieces of attribute series data belonging to the same group; and stores the calculated average value, minimum value, maximum value, and number in the grouping DB as the classification result.

The data classification device according to any one of claims 1 to 3, wherein
the feature extraction unit calculates, as the feature, the ratio obtained by comparing adjacent elements when the numbers of elements, each being an action history entry, constituting the respective list data included in the attribute series data all match and the elements are numerical values.

The data classification device according to any one of claims 1 to 4, wherein
the feature extraction unit extracts, as the feature, the most frequent value among the elements of each list data when the numbers of elements, each being an action history entry, constituting the respective list data included in the attribute series data all match and the elements are not numerical values.

The data classification device according to any one of claims 1 to 5, wherein
the feature extraction unit calculates, as the feature, the average value of the elements of each list data when the numbers of elements, each being an action history entry, constituting the respective list data included in the attribute series data do not match and the elements are numerical values.

The data classification device according to any one of claims 1 to 6, wherein
the feature extraction unit extracts, as the feature, the kinds of values appearing among the elements of each list data when the numbers of elements, each being an action history entry, constituting the respective list data included in the attribute series data do not match and the elements are not numerical values.
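
Taken together, the four dependent claims above select a feature from two conditions: whether every list data has the same number of elements, and whether the elements are numerical. A minimal sketch of that selection follows. It rests on assumptions not stated in the claims: the "ratio obtained by comparing adjacent elements" is read here as later element divided by earlier element, the length check is read as a comparison across all records, and the names `extract_feature` and `is_numeric` are hypothetical.

```python
from collections import Counter
from numbers import Number
from statistics import mean


def is_numeric(value):
    # bool is a Number subclass in Python, so exclude it explicitly
    return isinstance(value, Number) and not isinstance(value, bool)


def extract_feature(all_lists, target):
    """Choose and compute one feature for `target`.

    all_lists: the list data of every record for one data item
    target:    the list data of the record being processed
    """
    same_length = len({len(lst) for lst in all_lists}) == 1
    numeric = all(is_numeric(v) for lst in all_lists for v in lst)

    if same_length and numeric:
        # Assumed reading of the ratio: later / earlier per adjacent pair;
        # None marks a pair whose earlier element is zero.
        ratios = [b / a if a != 0 else None for a, b in zip(target, target[1:])]
        return "adjacent_ratio", ratios
    if same_length:
        # Ties resolve to the value seen first (Counter keeps insertion order)
        return "most_frequent", Counter(target).most_common(1)[0][0]
    if numeric:
        return "average", mean(target) if target else None
    return "appearance_types", sorted(set(target), key=str)
```

For example, withdrawal amounts `[100, 200, 100]` in records that all hold three entries yield the ratios `[2.0, 0.5]`, while transaction kinds in records of differing length yield the set of kinds that appear.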

A data classification method executed by a data classification apparatus having:
an attribute DB that stores customer identification information and customer attributes in association with data items of the respective data;
a series DB that stores the customer identification information, an action history indicating the actions of the customer, and the date and time of each action of the customer in association with data items of the respective data; and
an attribute series DB that stores, in association with data items of the respective data, attribute series data that is linked based on the customer identification information in the attribute DB and the customer identification information in the series DB and that includes the customer identification information, the customer attributes from the attribute DB, and list data in which the action history from the series DB is arranged in chronological order within a period based on the date and time,
the data classification method comprising:
a data item adding step in which a feature extraction unit extracts a feature from the attribute series data and adds the extracted feature data to the attribute series DB in association with a data item for the feature data;
a numerical conversion step in which a classification unit, when data included in the attribute series data is numerical, converts the numerical value into a section;
a display step in which an output device displays a list of the data items stored in the attribute series DB;
a data item selection step in which an input device accepts an input selecting, from among the data items displayed in the list by the output device, a data item for generating a group; and
a grouping step in which a grouping unit groups the attribute series data stored in the attribute series DB based on the type of data corresponding to the data item for which the input device accepted the input,
wherein the data item adding step calculates, as the feature, at least one of: a ratio obtained by comparing adjacent elements among the elements, each being an action history entry, that constitute each list data; the most frequent value among the elements constituting each list data; the average value of the elements constituting each list data; and the kinds of values appearing among the elements constituting each list data.
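
The linking of the attribute DB and the series DB into attribute series data, and the conversion of a numerical value into a section, can be sketched as follows. This is an illustration only. The in-memory layout, the field names `customer_id`, `action`, `datetime`, and `actions`, the half-open period `[start, end)`, and the fixed-width integer sections are all assumptions; the claims do not specify how the period or the sections are defined.

```python
from datetime import datetime


def build_attribute_series(attribute_db, series_db, start, end):
    """Link customer attributes with their time-ordered action history."""
    linked = {
        cid: {"customer_id": cid, **attrs, "actions": []}
        for cid, attrs in attribute_db.items()
    }
    # Keep only actions inside the period that belong to a known customer
    in_period = [
        row for row in series_db
        if start <= row["datetime"] < end and row["customer_id"] in linked
    ]
    for row in sorted(in_period, key=lambda r: r["datetime"]):
        linked[row["customer_id"]]["actions"].append(row["action"])
    return list(linked.values())


def to_section(value, width):
    """Convert an integer to a fixed-width section label, e.g. 23 -> '20-29'."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


attribute_db = {"C1": {"age": 23}, "C2": {"age": 41}}
series_db = [
    {"customer_id": "C1", "action": "withdrawal", "datetime": datetime(2013, 12, 2)},
    {"customer_id": "C1", "action": "deposit", "datetime": datetime(2013, 12, 1)},
    {"customer_id": "C2", "action": "transfer", "datetime": datetime(2014, 1, 5)},
]
rows = build_attribute_series(
    attribute_db, series_db, datetime(2013, 12, 1), datetime(2014, 1, 1)
)
for row in rows:
    row["age_section"] = to_section(row["age"], 10)
```

Storing the section under a separate data item (`age_section` here) keeps the pre-conversion number available for later per-group statistics.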

A data classification program that causes a computer of a data classification apparatus having:
an attribute DB that stores customer identification information and customer attributes in association with data items of the respective data;
a series DB that stores the customer identification information, an action history indicating the actions of the customer, and the date and time of each action of the customer in association with data items of the respective data; and
an attribute series DB that stores, in association with data items of the respective data, attribute series data that is linked based on the customer identification information in the attribute DB and the customer identification information in the series DB and that includes the customer identification information, the customer attributes from the attribute DB, and list data in which the action history from the series DB is arranged in chronological order within a period based on the date and time,
to execute:
a data item adding step in which a feature extraction unit extracts a feature from the attribute series data and adds the extracted feature data to the attribute series DB in association with a data item for the feature data;
a numerical conversion step in which a classification unit, when data included in the attribute series data is numerical, converts the numerical value into a section;
a display step in which an output device displays a list of the data items stored in the attribute series DB;
a data item selection step in which an input device accepts an input selecting, from among the data items displayed in the list by the output device, a data item for generating a group; and
a grouping step in which a grouping unit groups the attribute series data stored in the attribute series DB based on the type of data corresponding to the data item for which the input device accepted the input,
wherein the data item adding step calculates, as the feature, at least one of: a ratio obtained by comparing adjacent elements among the elements, each being an action history entry, that constitute each list data; the most frequent value among the elements constituting each list data; the average value of the elements constituting each list data; and the kinds of values appearing among the elements constituting each list data.