JP2000231564A

JP2000231564A - Data mining auxiliary device, data converting method and recording medium with data format conversion program recorded therein

Info

Publication number: JP2000231564A
Application number: JP3134999A
Authority: JP
Inventors: Hidetoshi Tanaka; 秀俊田中; Tatsuo Kuraoka; 立郎倉岡
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1999-02-09
Filing date: 1999-02-09
Publication date: 2000-08-22
Anticipated expiration: 2019-02-09
Also published as: JP4070344B2

Abstract

PROBLEM TO BE SOLVED: To provide a data mining auxiliary device that supplies suitable data to a data mining engine performing data analysis, and that can convert a table in which multiple formats are mixed into item sequence format. SOLUTION: For a processing target table 42, a user designates ID columns, in which data unit identifiers are described; attribute columns, in which attributes are described; and value columns, in which values are described. An attribute name is created for each combination of a kind of value in the attribute columns and a value column name. For each attribute name, the kinds of values described in the value columns are examined, and items are assigned to some or all of those values. An item code is attached to each item based on the number of data units in the processing target table in which the item appears. The item codes are then associated with each data unit (each group of rows whose ID columns hold the same identifier) to produce an item sequence format table.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a data mining auxiliary device, a data format conversion method, and a recording medium on which a data format conversion program is recorded. More particularly, it relates to a data mining auxiliary device, a data format conversion method, and a recording medium suitable for supplying appropriate data to a data mining engine that performs data analysis.

[0002]

[Prior Art] Data mining engines that analyze correlations among data and discover meaningful correlations are conventionally known. Conventional data mining engines are designed on the premise that their input data is in binary format or item sequence format. FIG. 13 shows an example of a data table in binary format, and FIG. 14 shows an example of a data table in item sequence format.

[0003] As shown in FIG. 13, the data unit in binary format (a unit that can be grouped in data processing as sharing the same identifier (ID)) is a row. The columns are attributes that take binary values (hereinafter called "items"). Each cell of a row holds one of the two values, indicating whether the item is present or absent. The table in FIG. 13 uses ○ and × to show, for each of the data units ID1 to ID4, whether each of the items Item1 to Item7 is present.

[0004] As indicated by reference numeral 10 in FIG. 14, the item sequence format lists in each row only the characteristics (items) that are present. In this format, each item present in a row is often written as a code. A separate table 12 (indicated by reference numeral 12 in FIG. 14) that associates codes with items is then kept, which reduces the total amount of recorded data.
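The following minimal Python sketch makes the difference between FIG. 13 and FIG. 14 concrete. It assumes a binary table held as nested dictionaries, with True standing for ○ and False for ×. The function name `to_item_sequence` and the sample data are hypothetical, and codes are assigned in first-seen order purely for illustration.

```python
# Hypothetical binary-format table: one row per data unit, one boolean
# column per item (True = the item is present, like "○" in FIG. 13).
binary_table = {
    "ID1": {"Item1": True, "Item2": False, "Item3": True},
    "ID2": {"Item1": False, "Item2": True, "Item3": True},
}


def to_item_sequence(table):
    """Convert a binary table into item-sequence rows plus a code table."""
    # Code table (like table 12 in FIG. 14): item name -> integer code.
    code_table = {}
    for row in table.values():
        for item in row:
            code_table.setdefault(item, len(code_table) + 1)
    # Keep only the items that are present, written as their codes.
    sequences = {
        unit_id: [code_table[item] for item, present in row.items() if present]
        for unit_id, row in table.items()
    }
    return sequences, code_table


sequences, code_table = to_item_sequence(binary_table)
print(sequences)   # {'ID1': [1, 3], 'ID2': [2, 3]}
print(code_table)  # {'Item1': 1, 'Item2': 2, 'Item3': 3}
```

The absent items take no space at all in the sequences. Each present item costs one small integer, and the item names are stored only once, in the code table.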

[0005] Some conventional data mining engines can also handle data in spreadsheet format or transaction format. FIG. 15 shows an example of a data table in spreadsheet format, and FIG. 16 shows an example of a data table in transaction format.

[0006] As shown in FIG. 15, the data unit in spreadsheet format is a row, as in binary format. Unlike binary format, however, the columns are attributes that do not necessarily take binary values. For each attribute, a row holds a character string, a numerical value, or a null value representing the corresponding value (attribute value).

[0007] As shown in FIG. 16, transaction format uses a group of columns of three types: an ID column, an attribute column, and a value column (a column of attribute values). There is a single value column, and a data unit extends over multiple rows. The value in the ID column of each row indicates which data unit the row belongs to.
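The following Python sketch illustrates the relationship between transaction format and spreadsheet format. It assumes transaction rows given as hypothetical `(ID, attribute, value)` tuples and groups them by ID, so that each multi-row data unit becomes a single spreadsheet-style row. The names and data are illustrative only.

```python
from collections import defaultdict

# Hypothetical transaction-format rows: (ID, attribute, value).
# The data unit "ID1" spans two rows.
transactions = [
    ("ID1", "color", "red"),
    ("ID1", "size", "L"),
    ("ID2", "color", "blue"),
]


def transaction_to_spreadsheet(rows):
    """Group transaction rows by ID so that each data unit becomes one row."""
    sheet = defaultdict(dict)
    for unit_id, attribute, value in rows:
        sheet[unit_id][attribute] = value
    return dict(sheet)


sheet = transaction_to_spreadsheet(transactions)
# An attribute missing for a data unit is simply absent (a null value).
print(sheet)  # {'ID1': {'color': 'red', 'size': 'L'}, 'ID2': {'color': 'blue'}}
```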

[0008] In Japanese Patent Application No. 10-40149, the present applicant disclosed an auxiliary device that converts spreadsheet format into binary format. Japanese Patent Laid-Open No. 9-134365 likewise discloses a device that converts spreadsheet format into binary format (called "0-1 attributes" in that publication). Furthermore, building on the prior art, an auxiliary device such as the one shown in FIG. 17 can be conceived as a data mining auxiliary device that converts spreadsheet format into item sequence format.

[0009] The conventional auxiliary device shown in FIG. 17 includes a data conversion unit 14, which converts data tables in formats other than spreadsheet format into spreadsheet format. This processing by the data conversion unit 14 produces a spreadsheet format table 16.

[0010] The information in the spreadsheet format table 16 is supplied to three units: a per-attribute information examination unit 18, a per-attribute-value information examination unit 20, and an item sequence creation unit 22. The per-attribute information examination unit 18 examines the kinds of values taken by each attribute of the spreadsheet format table 16. Based on the results, it creates a value-item correspondence table, which relates individual attribute values to the items assigned to them.

[0011] In creating the value-item correspondence table, the basic rule is to assign one item to each attribute value. When an attribute takes many kinds of values, however, ranges are defined to categorize the attribute values so that multiple values correspond to a single item; continuous attribute values are a typical example. In addition, several suitably chosen values may be mapped to one item, or some suitably chosen values may be left without any item.
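The following Python sketch shows how a value-item correspondence table for one attribute could be built under the rules above. It assumes that ranges for continuous values are given as a sorted list of bin edges. The function name, the `exclude` parameter, and the item-naming scheme (`attr=value`, `attr:rangeK`) are all hypothetical illustrations, not the patent's method.

```python
import bisect


def build_value_item_table(attribute, values, bin_edges=None, exclude=()):
    """Map each distinct value of one attribute to an item name.

    Without bin_edges, every distinct value gets its own item. With
    bin_edges (for continuous values), values falling in the same range
    share one item. Values listed in `exclude` get no item at all.
    """
    table = {}
    for value in sorted(set(values)):
        if value in exclude:
            continue  # deliberately leave this value without an item
        if bin_edges is None:
            table[value] = f"{attribute}={value}"
        else:
            # Range index 0 .. len(bin_edges): number of edges <= value.
            k = bisect.bisect_right(bin_edges, value)
            table[value] = f"{attribute}:range{k}"
    return table


colors = build_value_item_table("color", ["red", "blue", "red"])
temps = build_value_item_table("temp", [12.5, 18.0, 30.1], bin_edges=[15, 25])
print(colors)  # {'blue': 'color=blue', 'red': 'color=red'}
print(temps)   # {12.5: 'temp:range0', 18.0: 'temp:range1', 30.1: 'temp:range2'}
```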

[0012] The per-attribute-value information examination unit 20 uses the value-item correspondence table provided by the per-attribute information examination unit 18. From it, the unit determines, for example, the number of occurrences of each item in the spreadsheet format table 16. It then creates an item code correspondence table by numbering the items with positive integers, used as item codes, in order of their number of occurrences.
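A minimal Python sketch of the numbering step follows. The text says only that codes follow the order of occurrence counts. This sketch assumes descending order (the most frequent item gets code 1) and breaks ties alphabetically; both choices, and the function name, are hypothetical.

```python
from collections import Counter


def build_item_code_table(rows, value_item_table):
    """Number items 1, 2, ... in descending order of occurrence count.

    rows: spreadsheet rows as dicts (attribute -> value).
    value_item_table: attribute -> {value: item}.
    """
    counts = Counter()
    for row in rows:
        for attribute, value in row.items():
            item = value_item_table.get(attribute, {}).get(value)
            if item is not None:  # values without an item are skipped
                counts[item] += 1
    # Most frequent first; ties broken by item name for a stable result.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: code for code, item in enumerate(ranked, start=1)}


rows = [{"color": "red"}, {"color": "red"}, {"color": "blue"}]
value_items = {"color": {"red": "color=red", "blue": "color=blue"}}
item_codes = build_item_code_table(rows, value_items)
print(item_codes)  # {'color=red': 1, 'color=blue': 2}
```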

[0013] For each row of the spreadsheet format table 16, the item sequence creation unit 22 determines the item corresponding to each attribute value, using the value-item correspondence table. It then looks up the code of each of these items in the item code correspondence table and creates the item sequence format, in which the item codes are listed.
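The two-table lookup described above can be sketched in Python as follows, assuming the same hypothetical dictionary layouts as a plain illustration. Values that have no item are simply skipped.

```python
def build_item_sequences(rows, value_item_table, item_code_table):
    """Turn each spreadsheet row into a sorted list of item codes."""
    sequences = []
    for row in rows:
        codes = set()
        for attribute, value in row.items():
            # Value -> item via the value-item correspondence table.
            item = value_item_table.get(attribute, {}).get(value)
            if item is not None:
                # Item -> code via the item code correspondence table.
                codes.add(item_code_table[item])
        sequences.append(sorted(codes))
    return sequences


rows = [{"color": "red", "size": "L"}, {"color": "blue", "size": "M"}]
value_items = {
    "color": {"red": "color=red", "blue": "color=blue"},
    "size": {"L": "size=L"},  # "M" deliberately has no item
}
item_codes = {"color=red": 1, "size=L": 2, "color=blue": 3}
result = build_item_sequences(rows, value_items, item_codes)
print(result)  # [[1, 2], [3]]
```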

[0014] The auxiliary device shown in FIG. 17 further includes an engine parameter determination unit 24, which determines the parameters with which the data mining engine runs. For example, when the data mining engine is to be run with upper and lower limits on item occurrence frequency, those limits are determined by the engine parameter determination unit 24.
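The following Python sketch shows how upper and lower frequency limits might restrict the items an engine considers. It assumes frequency is measured as support, the fraction of data units containing the item. The function and parameter names are hypothetical; the text does not specify how a real engine consumes these limits.

```python
from collections import Counter


def items_within_support(sequences, min_support, max_support):
    """Return item codes whose support (fraction of data units that
    contain the item) lies within [min_support, max_support]."""
    n = len(sequences)
    # set(seq) so an item is counted at most once per data unit.
    counts = Counter(code for seq in sequences for code in set(seq))
    return {code for code, c in counts.items()
            if min_support <= c / n <= max_support}


sequences = [[1, 2], [1, 3], [1], [2]]
selected = items_within_support(sequences, 0.3, 0.6)
print(sorted(selected))  # [2]
```

Here item 1 (support 0.75) is dropped as too frequent and item 3 (support 0.25) as too rare.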

[0015]

[Problem to Be Solved by the Invention] As described above, the device shown in FIG. 17 can convert spreadsheet format into item sequence format. Similarly, the conventional technology can convert transaction format into item sequence format. However, a table format in which spreadsheet format and transaction format are mixed cannot be handled directly. The conventional technology requires the data to be converted into one of the two formats beforehand, and a data conversion unit had to be designed and built for each such case.

[0016] Also, the conventional technology could not directly apply a data mining engine with a common data unit defined across multiple tables. It was necessary either to convert the tables into a single table in advance, or to prepare a view that the user could use to combine them into a single table.

[0017] The present invention has been made to solve the above problems. A first object of the invention is to provide a data mining auxiliary device that can convert a table format mixing spreadsheet format and transaction format into item sequence format, thereby making it easy to apply a data mining engine to a wider range of data. A second object of the invention concerns integrating multiple tables into one. It is to provide a data mining auxiliary device that performs the integration immediately before conversion into item sequence format, so that each table can be analyzed in parallel and the area used for integrating the tables can be kept small.

[0018]

[Means for Solving the Problem] The invention according to claim 1 is a data mining auxiliary device that supplies data, in a table format the engine can process, to a data mining engine that discovers correlations among data. The device comprises the following units:
- a table view generation unit that lets a user specify, for a processing target table having a tabular data structure, ID columns describing data unit identifiers, attribute columns describing attributes, and value columns describing values;
- an attribute name determination unit that creates an attribute name for each combination of a kind of value in the attribute columns and a value column name, and creates an attribute name table representing the correspondence between these combinations and the attribute names;
- a per-attribute information examination unit that examines, for each attribute name defined in the attribute name table, the kinds of values described in the value columns, determines items for some or all of those values, and creates a value-item correspondence table representing the correspondence between values and items;
- a per-attribute-value information examination unit that creates an item code correspondence table by assigning an item code to each item based on the number of data units in the processing target table in which the item appears; and
- an item sequence creation unit that looks up the item codes for each data unit, meaning each set of rows in the processing target table whose ID columns contain the same identifier, using the value-item correspondence table and the item code correspondence table, and creates an item sequence representing the correspondence between each data unit and its item codes.
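The flow of claim 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation:
- rows are dictionaries;
- the function name `mixed_table_to_item_sequences` is hypothetical;
- each distinct (attribute name, value) pair becomes its own item, with no categorization;
- items are ranked by the number of data units containing them, with ties broken alphabetically.

```python
from collections import Counter, defaultdict


def mixed_table_to_item_sequences(rows, id_cols, attr_cols, value_cols):
    """Sketch of the claim-1 flow for a table mixing spreadsheet and
    transaction layouts (rows are dicts: column name -> value)."""
    units = defaultdict(set)  # data unit ID -> set of items
    for row in rows:
        # Rows whose ID columns hold the same identifier form one data unit.
        unit_id = tuple(row[c] for c in id_cols)
        # Attribute-column values, e.g. ("p", "1") -> prefix "p1".
        prefix = "".join(str(row[c]) for c in attr_cols)
        for vc in value_cols:
            value = row.get(vc)
            if value is None:
                continue  # null value: no item
            # Attribute name = attribute-column values + value column name;
            # here every distinct (attribute name, value) pair is one item.
            units[unit_id].add(f"{prefix}_{vc}={value}")
    # Item codes ranked by the number of data units containing each item.
    counts = Counter(item for items in units.values() for item in items)
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    codes = {item: code for code, item in enumerate(ranked, start=1)}
    sequences = {uid: sorted(codes[i] for i in items)
                 for uid, items in units.items()}
    return sequences, codes


rows = [
    {"Product ID": "A", "Point": "p", "Method": "1", "Result1": "OK", "Result2": None},
    {"Product ID": "A", "Point": "q", "Method": "2", "Result1": "NG", "Result2": "5"},
    {"Product ID": "B", "Point": "p", "Method": "1", "Result1": "OK", "Result2": "3"},
]
sequences, codes = mixed_table_to_item_sequences(
    rows, id_cols=["Product ID"], attr_cols=["Point", "Method"],
    value_cols=["Result1", "Result2"])
print(sequences)  # {('A',): [1, 3, 4], ('B',): [1, 2]}
```

Product A spans two rows with different inspection points, a transaction-style layout. Each row also carries several result columns, a spreadsheet-style layout. Grouping by the ID columns absorbs both without a separate conversion step.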

[0019] The invention according to claim 2 is the data mining auxiliary device according to claim 1, wherein the per-attribute information examination unit allows values of a predetermined kind among those described in the value columns to be categorized, and allows an item to be assigned to each category.

[0020] The invention according to claim 3 is the data mining auxiliary device according to claim 1 or 2, further comprising an engine parameter determination unit that determines parameters controlling the operating state of the data mining engine.

[0021] The invention according to claim 4 is the data mining auxiliary device according to any one of claims 1 to 3, wherein the attribute name determination unit automatically generates an attribute name table with the following structure. It has one row for each kind of value in the attribute columns specified through the table view generation unit, and the same columns as the value columns specified through the table view generation unit. Each attribute name in it is defined by combining a value of the attribute columns with a value column name according to a predetermined rule.

[0022] The invention according to claim 5 is the data mining auxiliary device according to claim 4, wherein the attribute name determination unit comprises at least one of the following: means for accepting corrections one attribute name at a time, and means for accepting corrections of all attribute names collectively.

[0023] The invention according to claim 6 is the data mining auxiliary device according to any one of claims 1 to 5, wherein the attribute name determination unit handles the case in which only ID columns and attribute columns are specified through the table view generation unit and no value column is specified. In that case, it automatically generates a one-row attribute name table whose attribute name is the name of the value column.

[0024] The invention according to claim 7 is the data mining auxiliary device according to claim 3, wherein the engine parameter determination unit uses the value-item correspondence table obtained from the per-attribute information examination unit to present a hierarchy to the data mining engine: which items are contained in which attributes, and which attributes are contained in which attribute column values. The data mining engine accepts parameter settings in units of attribute column values.

[0025] The invention according to claim 8 is the data mining auxiliary device according to any one of claims 1 to 7 that converts information from a plurality of processing target tables having different table formats into a format that the data mining engine can process. In this device:
- the table view generation unit allows the ID columns, attribute columns, and value columns to be specified for each table format;
- the attribute name determination unit creates an attribute name table for each table format;
- the per-attribute information examination unit creates a value-item correspondence table for each table format;
- the per-attribute-value information examination unit creates the item code correspondence table for the processing target tables as a whole; and
- the item sequence creation unit creates a single item sequence based on the value-item correspondence tables, the item code correspondence table, and the plurality of processing target tables.
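The final merge described in claim 8 can be sketched in Python. One item code table covers all tables, and one item sequence is produced per shared data unit. The sketch assumes each table has already been reduced to a mapping from data unit ID to a set of items. It also assumes item names from different tables do not collide, for example because they carry table-specific attribute-name prefixes. The function name and data are hypothetical.

```python
from collections import Counter, defaultdict


def merge_into_single_item_sequence(per_table_units):
    """per_table_units: one dict per source table, mapping a shared data
    unit ID to the set of items extracted from that table."""
    # Integrate only at this last step; each table was analyzed separately.
    units = defaultdict(set)
    for table in per_table_units:
        for unit_id, items in table.items():
            units[unit_id] |= items
    # A single item code table for the processing target tables as a whole.
    counts = Counter(item for items in units.values() for item in items)
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    codes = {item: code for code, item in enumerate(ranked, start=1)}
    sequences = {uid: sorted(codes[i] for i in items)
                 for uid, items in units.items()}
    return sequences, codes


inspections = {"A": {"p1_1=OK"}, "B": {"p1_1=NG"}}
lots = {"A": {"line=3"}, "B": {"line=3"}}
sequences, codes = merge_into_single_item_sequence([inspections, lots])
print(sequences)  # {'A': [1, 3], 'B': [1, 2]}
```

Because the per-table dictionaries are built independently, they could be produced in parallel. Only the small per-unit item sets need to be held during integration, which matches the second object stated in paragraph [0017].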

[0026] The invention according to claim 9 is a data format conversion method for supplying data, in a table format that the engine can process, to a data mining engine that discovers correlations among data. The method comprises the following steps:
- a table selection step of selecting a table having a tabular data structure as a processing target table;
- an ID column/attribute column/value column selection step of specifying, for the processing target table, ID columns describing data unit identifiers, attribute columns describing attributes, and value columns describing values;
- an attribute name table creation step of creating an attribute name for each combination of a kind of value in the attribute columns and a value column name, and creating an attribute name table representing the correspondence between these combinations and the attribute names;
- a value-item correspondence table creation step of examining, for each attribute name defined in the attribute name table, the kinds of values described in the value columns, determining items for some or all of those values, and creating a value-item correspondence table representing the correspondence between values and items;
- an item code correspondence table creation step of creating an item code correspondence table by assigning an item code to each item based on the number of data units in the processing target table in which the item appears; and
- an item sequence creation step of looking up the item codes for each data unit whose ID columns in the processing target table contain the same identifier, using the value-item correspondence table and the item code correspondence table, and creating an item sequence representing the correspondence between each data unit and its item codes.

[0027] The invention according to claim 10 is the data format conversion method according to claim 9, covering the case in which only ID columns and attribute columns are specified in the ID column/attribute column/value column selection step and no value column is specified. In that case, the attribute name table creation step automatically generates a one-row attribute name table whose attribute name is the name of the value column.

[0028] The invention according to claim 11 is the data format conversion method according to claim 9 or 10 for converting information from a plurality of processing target tables having different table formats into a format that the data mining engine can process. In this method:
- the ID column/attribute column/value column selection step includes a step of specifying the ID columns, attribute columns, and value columns for each table format;
- the attribute name table creation step includes a step of creating an attribute name table for each table format;
- the value-item correspondence table creation step includes a step of creating a value-item correspondence table for each table format;
- the item code correspondence table creation step creates the item code correspondence table for the processing target tables as a whole; and
- the item sequence creation step creates a single item sequence based on the value-item correspondence tables, the item code correspondence table, and the plurality of processing target tables.

[0029] The invention according to claim 12 is a recording medium on which is recorded a data format conversion program for supplying data, in a table format that the engine can process, to a data mining engine that discovers correlations among data. The program causes a computer to:
- select a processing target table in response to a predetermined input;
- recognize, in response to a predetermined input, the ID columns describing data unit identifiers, the attribute columns describing attributes, and the value columns describing values in the processing target table;
- create an attribute name for each combination of a kind of value in the attribute columns and a value column name, and create an attribute name table representing the correspondence between these combinations and the attribute names;
- examine, for each attribute name defined in the attribute name table, the kinds of values described in the value columns, recognize items for some or all of those values, and create a value-item correspondence table representing the correspondence between values and items;
- create an item code correspondence table by assigning an item code to each item based on the number of data units in the processing target table in which the item appears; and
- look up the item codes for each data unit whose ID columns in the processing target table contain the same identifier, using the value-item correspondence table and the item code correspondence table, and create an item sequence representing the correspondence between each data unit and its item codes.

[0030] The invention according to claim 13 is the recording medium according to claim 12, covering the case in which only ID columns and attribute columns are specified for the processing target table and no value column is specified. In that case, the program causes the computer to automatically generate a one-row attribute name table whose attribute name is the name of the value column of the processing target table.

[0031] The invention according to claim 14 is the recording medium according to claim 12 or 13, on which is recorded a program for converting information from a plurality of processing target tables having different table formats into a format that the data mining engine can process. The program causes the computer to:
- recognize the ID columns, attribute columns, and value columns of the processing target tables for each table format;
- create the attribute name table for each table format;
- create the value-item correspondence table for each table format;
- create the item code correspondence table for the processing target tables as a whole; and
- create a single item sequence based on the value-item correspondence tables, the item code correspondence table, and the plurality of processing target tables.

[0032]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. Elements common to the drawings are given the same reference numerals, and duplicate descriptions are omitted.

[0033] Embodiment 1. This embodiment describes a case in which the results of inspections performed at a plurality of inspection points in a manufacturing process are recorded in a single table, and a data mining engine is applied to that table. More specifically, each inspection point has its own inspection items, and the inspection results are recorded in one table in a format that mixes spreadsheet format and transaction format.

[0034] FIG. 1 is a block diagram of the data mining auxiliary device of this embodiment, and FIG. 2 is a flowchart explaining its operation. The table view generation unit 30 shown in FIG. 1 first executes the table selection step (S101) shown in FIG. 2. In S101, the user selects one table as the processing target of the data mining engine. FIG. 3 shows an example of the table selected in the table selection step S101. As described above, table 42 shown in FIG. 3 (hereinafter, "processing target table 42") records the inspection results at a plurality of inspection points in a format mixing spreadsheet format and transaction format.

[0035] Next, the table view generation unit 30 executes the ID column/attribute column/value column selection step (S102). In S102, the user designates which columns of the processing target table 42 are ID columns, attribute columns, and value columns. FIG. 3 shows the following designation for the processing target table:
- ID columns: "Product ID" and "Input ID";
- attribute columns: "Inspection point" and "Inspection method";
- value columns: "Result 1", "Result 2", and "Result 3".

[0036] The attribute name determination unit 32 shown in FIG. 1 executes the attribute name table creation step (S103) shown in FIG. 2. In S103, an attribute name table is created with as many rows as there are kinds of values in the attribute columns of the processing target table 42 shown in FIG. 3, and as many columns as there are value columns.

[0037] FIG. 4 shows the attribute name tables 44 and 46 created by the processing of the attribute name table creation step S103. In this embodiment, S103 first creates the attribute name table 44. That is, the values of the attribute columns of the processing target table 42 (namely "p" and "1") are joined with the name of a value column (for example, "Result 1") through the delimiter "_", with the word "Result" shared by all value columns omitted, to generate an attribute name such as "p1_1". Placing such a name in every row and column creates the attribute name table 44 automatically. Creating the attribute name table 44 automatically in this way reliably prevents the process from proceeding to the subsequent steps without attribute names having been specified.
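The automatic naming rule of S103 can be illustrated with a short Python sketch. It is not the patented implementation: the row representation (a list of dicts), the English column names ("point", "method", "result1" to "result3"), the sample values, and the function name `make_attribute_name_table` are all hypothetical stand-ins for the inspection point, inspection method, and Result 1 to 3 columns of table 42.

```python
def make_attribute_name_table(rows, attr_cols, value_cols, drop_prefix="result", sep="_"):
    """Sketch of the S103 naming rule (all names here are hypothetical).

    One entry per (distinct attribute-column combination, value column).
    The generated name joins the attribute values, the delimiter, and the
    value-column name with the prefix shared by all value columns removed.
    """
    table = {}
    for row in rows:
        combo = tuple(str(row[c]) for c in attr_cols)  # e.g. ("p", "1")
        for col in value_cols:
            # "result1" -> "1": drop the word common to every value column
            suffix = col[len(drop_prefix):] if col.startswith(drop_prefix) else col
            table[(combo, col)] = "".join(combo) + sep + suffix
    return table


rows = [
    {"product_id": "A1", "input_id": "1", "point": "p", "method": "1",
     "result1": "m", "result2": 1.3, "result3": 2},
]
names = make_attribute_name_table(rows, ["point", "method"],
                                  ["result1", "result2", "result3"])
print(names[(("p", "1"), "result1")])  # p1_1
```

Keying the table by (combination, value column) mirrors the row-by-column layout of table 44 without committing to any particular grid structure.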

[0038] In this embodiment, S103 allows the user to modify the values of the attribute name table, that is, the attribute names. For this purpose, the device provides a means for accepting corrections one attribute name at a time: for example, it displays the automatically generated attribute name table 44 in a spreadsheet-style user interface and interprets an edit to a cell of the spreadsheet as a correction of the corresponding attribute name. The device further provides a means for accepting a batch correction of the entire attribute name table: for example, the user prepares the attribute names in a CSV file and selects that file through a user interface, and a file interface reads the file to correct the attribute names.
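A batch correction through a CSV file might look like the following sketch. The source does not describe the file layout, so the "generated_name,user_name" pair per line used here is an assumption, and the file contents are passed as a string in place of the file-selection interface.

```python
import csv
import io


def apply_batch_corrections(name_table, csv_text):
    """Replace generated attribute names using CSV text (sketch).

    Assumed (hypothetical) layout: one "generated_name,user_name" pair per line.
    Names that do not appear in the file are kept unchanged.
    """
    renames = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) == 2:  # skip blank or malformed lines
            renames[row[0].strip()] = row[1].strip()
    return {key: renames.get(name, name) for key, name in name_table.items()}


generated = {("p1", "result1"): "p1_1", ("p1", "result2"): "p1_2"}
corrected = apply_batch_corrections(generated, "p1_1,p type\np1_2,p current\n")
print(corrected[("p1", "result2")])  # p current
```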

[0039] When the user makes the above corrections in S103, the attribute name table 44 is converted into, for example, the attribute name table 46. The attribute names used in the attribute name table 46 are easier for the user to understand intuitively than those used in the attribute name table 44. Allowing the attribute name table to be modified in this way therefore improves operability for the user.

[0040] Once the attribute name table 44 or 46 has been generated, the table view generation unit 30 shown in FIG. 1 can present the user with a table view 48 that holds the same information as the processing target table 42 (FIG. 3) in the format shown in FIG. 5. In addition to the functions described below, the device of this embodiment can answer queries about this table view 48.
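The table view can be pictured as a mapping from each data unit to its attribute-name/value pairs. The sketch below assumes that representation; FIG. 5 is not reproduced here, so the layout, the function name `make_table_view`, and the sample column names and values are hypothetical.

```python
def make_table_view(rows, id_cols, attr_cols, value_cols, attr_names):
    """Sketch of a table view: data unit -> {attribute name: value}.

    attr_names maps (attribute-column values, value-column name) to an
    attribute name, as an attribute name table does. Empty cells are skipped.
    """
    view = {}
    for row in rows:
        unit = tuple(row[c] for c in id_cols)          # e.g. (product ID, input ID)
        combo = tuple(str(row[c]) for c in attr_cols)  # e.g. ("p", "1")
        for col in value_cols:
            value = row.get(col)
            if value is None or value == "":
                continue  # no information for this attribute
            view.setdefault(unit, {})[attr_names[(combo, col)]] = value
    return view


attr_names = {(("p", "1"), "result1"): "p type", (("p", "1"), "result2"): "p current"}
rows = [{"product_id": "A1", "input_id": "1", "point": "p", "method": "1",
         "result1": "n", "result2": 1.4}]
view = make_table_view(rows, ["product_id", "input_id"], ["point", "method"],
                       ["result1", "result2"], attr_names)
print(view)  # {('A1', '1'): {'p type': 'n', 'p current': 1.4}}
```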

[0041] After the table view 48 shown in FIG. 5 is created, the per-attribute information investigation unit 34 shown in FIG. 1 executes the value item correspondence table creation step (S104) shown in FIG. 2. In S104, the kinds of values (m, n, 1.3, 1.4, etc.) are examined for each attribute (p type, p current, etc.) of the table view 48 provided by the table view generation unit 30, and, based on the results, the user creates a value item correspondence table 50 as shown in FIG. 6.

[0042] In S104, an item is defined for each attribute value by combining the attribute name with the attribute value. As a rule, one item is assigned to one attribute value. However, when an attribute has many kinds of values, for example when the values are continuous, as with "p current" in FIG. 6, ranges are defined to categorize the attribute values, and a plurality of attribute values are mapped to one item.
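The two ways of defining items in [0042] can be sketched as follows: one item per discrete value, and half-open range items for continuous values. The "name=value" item format and the range boundaries are assumptions for illustration; in the device the user chooses the ranges.

```python
import bisect


def discrete_item(attr_name, value):
    """One item per attribute value: attribute name joined with the value."""
    return f"{attr_name}={value}"


def range_item(attr_name, value, edges):
    """Categorize a continuous value into a half-open range item.

    edges are ascending boundaries (hypothetical example: [1.3, 1.5]);
    values below the first edge or at/above the last edge fall into
    open-ended ranges.
    """
    i = bisect.bisect_right(edges, value)
    low = edges[i - 1] if i > 0 else "-inf"
    high = edges[i] if i < len(edges) else "inf"
    return f"{attr_name}=[{low},{high})"


print(discrete_item("p type", "n"))              # p type=n
print(range_item("p current", 1.4, [1.3, 1.5]))  # p current=[1.3,1.5)
```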

[0043] Alternatively, as shown in the "p count" column of FIG. 6, one item may be associated with a plurality of suitably chosen attribute values. Furthermore, as shown for the attribute value m of "p type" and the attribute value 1 of "p count" in FIG. 6, suitably chosen attribute values may be left without any corresponding item. In FIG. 6, attribute values with no corresponding item are set at four places.
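A value item correspondence table that supports both options of [0043], merging several values into one item and leaving some values without an item, could be represented as below. The dictionary layout, the use of `None` for "no item", and the sample groupings are assumptions, not the contents of FIG. 6.

```python
def build_value_item_table(distinct_values, groups=None, excluded=frozenset()):
    """Sketch of a value item correspondence table.

    distinct_values: {attribute name: iterable of observed values}
    groups:   {(attribute name, value): item} for values merged into one item
    excluded: {(attribute name, value)} deliberately left without an item
    Returns {(attribute name, value): item, or None when no item is assigned}.
    """
    groups = groups or {}
    table = {}
    for attr, values in distinct_values.items():
        for value in values:
            key = (attr, value)
            if key in excluded:
                table[key] = None
            elif key in groups:
                table[key] = groups[key]
            else:
                table[key] = f"{attr}={value}"  # default: one item per value
    return table


table = build_value_item_table(
    {"p type": ["m", "n"], "p count": [1, 2, 3]},
    groups={("p count", 2): "p count=2-3", ("p count", 3): "p count=2-3"},
    excluded={("p type", "m"), ("p count", 1)},
)
print(table[("p count", 3)])  # p count=2-3
```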

[0044] The per-attribute-value information investigation unit 36 shown in FIG. 1 executes the item code correspondence table creation step (S105) shown in FIG. 2. In S105, based on the value item correspondence table 50 (FIG. 6), the items are ranked by their number of appearances, for example in the processing target table 42 (FIG. 3). Positive integers are then assigned to the items as item codes in order of their number of appearances, creating an item code correspondence table 52 as shown in FIG. 7.
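The code assignment of S105 can be sketched as counting, for each item, the data units in which it appears and numbering the items in that order. The source does not say whether the most or the least frequent item receives code 1, nor how ties are broken; this sketch assumes the most frequent item gets 1 and ties are ordered alphabetically.

```python
from collections import Counter


def assign_item_codes(view, value_item):
    """Number items 1, 2, ... by the count of data units in which they appear.

    Assumptions: the most frequent item receives code 1 and ties are broken
    alphabetically; values mapped to None (no item) are ignored.
    """
    counts = Counter()
    for attrs in view.values():
        items = {value_item.get((attr, value)) for attr, value in attrs.items()}
        items.discard(None)
        counts.update(items)  # each item counted at most once per data unit
    ordered = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: code for code, item in enumerate(ordered, start=1)}


view = {("A1", "1"): {"p type": "n", "p count": 2},
        ("A2", "1"): {"p type": "n", "p count": 1}}
value_item = {("p type", "n"): "p type=n", ("p count", 2): "p count=2-3",
              ("p count", 1): None}
print(assign_item_codes(view, value_item))  # {'p type=n': 1, 'p count=2-3': 2}
```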

[0045] The item sequence creation unit 38 shown in FIG. 1 executes the item sequence creation step (S106) shown in FIG. 2. In S106, first, for each ID of the table view 48 shown in FIG. 5 (an ID whose unit is the combination of a product ID and an input ID), the item corresponding to each attribute value is determined based on the value item correspondence table 50 shown in FIG. 6. Next, the codes of the items corresponding to each ID are determined based on the item code correspondence table 52 shown in FIG. 7. Finally, combining each ID with its corresponding item codes produces an item sequence format 54 in which the item codes are listed, as shown in FIG. 8.
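The two lookups of S106 (value to item, then item to code) can be combined in a single pass over the table view. The dictionary representations and the choice to list each unit's codes in ascending order are assumptions of this sketch.

```python
def make_item_sequences(view, value_item, item_code):
    """Sketch: data unit -> sorted list of item codes (item sequence format).

    view:       {data unit: {attribute name: value}}
    value_item: {(attribute name, value): item or None}
    item_code:  {item: positive integer code}
    """
    sequences = {}
    for unit, attrs in view.items():
        codes = set()
        for attr, value in attrs.items():
            item = value_item.get((attr, value))  # value -> item (table 50)
            if item is not None and item in item_code:
                codes.add(item_code[item])        # item -> code (table 52)
        sequences[unit] = sorted(codes)
    return sequences


view = {("A1", "1"): {"p type": "n", "p count": 2},
        ("A2", "1"): {"p type": "n", "p count": 1}}
value_item = {("p type", "n"): "p type=n", ("p count", 2): "p count=2-3",
              ("p count", 1): None}
item_code = {"p type=n": 1, "p count=2-3": 2}
print(make_item_sequences(view, value_item, item_code))
# {('A1', '1'): [1, 2], ('A2', '1'): [1]}
```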

[0046] The engine parameter determination unit 40 shown in FIG. 1 executes the engine parameter determination step (S107). In S107, the parameters used when the data mining engine runs are determined. More specifically, when the data mining engine is to run with, for example, upper and lower limits on item appearance frequency, S107 determines the parameters that realize the desired setting.
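The source does not say how the frequency limits are expressed to the engine. As one possible reading, the sketch below treats them as fractions of data units and shows which item codes such limits would keep; `min_freq` and `max_freq` are hypothetical parameter names.

```python
from collections import Counter


def items_within_frequency(sequences, min_freq, max_freq):
    """Return item codes whose share of data units lies in [min_freq, max_freq].

    sequences: {data unit: list of item codes}; min_freq and max_freq are
    hypothetical names for the lower and upper frequency limits.
    """
    if not sequences:
        return []
    n = len(sequences)
    counts = Counter(code for codes in sequences.values() for code in set(codes))
    return sorted(code for code, k in counts.items() if min_freq <= k / n <= max_freq)


seqs = {"u1": [1, 2], "u2": [1], "u3": [1, 3]}
print(items_within_frequency(seqs, 0.3, 0.9))  # [2, 3]
```

Item 1 appears in every data unit (frequency 1.0), so the upper limit of 0.9 drops it.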

[0047] As described above, the data mining auxiliary device of this embodiment can easily convert a table in which the spreadsheet format and the transaction format are mixed, that is, a table in which the meaning of a column differs from row to row, into the item sequence format. Using the auxiliary device of this embodiment therefore makes it possible to apply the data mining engine to a wide range of data formats without designing and building an individual data conversion unit for each.

[0048] Embodiment 2. Next, a data mining auxiliary device according to Embodiment 2 of the present invention will be described with reference to FIG. 9. FIG. 9 shows a table 56 (hereinafter, the "second processing target table 56") processed by the data mining auxiliary device in this embodiment. As shown in FIG. 9, the second processing target table 56 has a spreadsheet-format data structure.

[0049] When handling a processing target table 42 such as the one shown in FIG. 3, the data mining auxiliary device of this embodiment operates in the same way as the device of Embodiment 1. When the spreadsheet-format table 56 shown in FIG. 9 is the processing target, it operates as follows.

[0050] That is, when the auxiliary device of this embodiment processes the second processing target table 56, the user designates only ID columns and value columns, as indicated by reference numeral 58 in the figure, when the table view generation unit 30 (see FIG. 1) executes the ID column / attribute column / value column selection step (S102). On recognizing that the user designated only ID columns and value columns in S102, the per-attribute information investigation unit 34 (see FIG. 1) creates, in the attribute name master table creation step (S103), an attribute name table that uses the names 60 of the value columns of the second processing target table 56 unchanged as attribute names.
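For a spreadsheet-format table with no attribute column, the attribute name table reduces to the value-column names themselves. The sketch below assumes the same dict-based table view as in Embodiment 1; the function names and the sample columns ("temperature", "color") are hypothetical, not taken from FIG. 9.

```python
def spreadsheet_attribute_names(value_cols):
    """One-row attribute name table: each value-column name is reused as-is."""
    return {col: col for col in value_cols}


def spreadsheet_view(rows, id_cols, value_cols):
    """Sketch: build {data unit: {attribute name: value}} from a spreadsheet table."""
    names = spreadsheet_attribute_names(value_cols)
    view = {}
    for row in rows:
        unit = tuple(row[c] for c in id_cols)
        for col in value_cols:
            value = row.get(col)
            if value is None or value == "":
                continue  # empty cell: nothing to record
            view.setdefault(unit, {})[names[col]] = value
    return view


rows = [{"product_id": "B7", "temperature": 21.5, "color": "red"}]
print(spreadsheet_view(rows, ["product_id"], ["temperature", "color"]))
# {('B7',): {'temperature': 21.5, 'color': 'red'}}
```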

[0051] The above processing makes it possible to create the attribute name table easily, without the automatic generation and correction of attribute names (see FIG. 4) required in the processing of Embodiment 1 for the processing target table 42. The auxiliary device of this embodiment can therefore easily convert spreadsheet-format data into the item sequence format; in other words, it can easily convert a spreadsheet-format table into a table that the data mining engine can handle.

[0052] Embodiment 3. Next, Embodiment 3 of the present invention will be described with reference to FIG. 10. Some data mining engines can process data more efficiently when they are given a hierarchical structure of the items. When the data mining engine is such an engine, the auxiliary device of this embodiment, in the engine parameter determination unit 40 (see FIG. 1) during the engine parameter determination step S107 (see FIG. 2), extracts the hierarchical structure shown in FIG. 10, linking items to attribute names (such as "p type") as in FIG. 6, and attribute names to attribute column values (such as "p1"), and presents that hierarchical structure to the data mining engine.
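The two-level hierarchy of [0052] (attribute column value, then attribute name, then items) can be derived from the value item correspondence table plus a record of which attribute column value each attribute name came from. That second mapping (`attr_origin`), the nested-dict output, and the sample data are assumptions of this sketch; FIG. 10's actual format is not shown here.

```python
def build_attribute_hierarchy(value_item, attr_origin):
    """Sketch: attribute-column value -> attribute name -> items.

    value_item:  {(attribute name, value): item or None}
    attr_origin: {attribute name: attribute-column value it came from},
                 e.g. {"p type": "p1"} (hypothetical example)
    """
    tree = {}
    for (attr, _value), item in value_item.items():
        if item is None:
            continue  # values without an item do not appear in the hierarchy
        items = tree.setdefault(attr_origin[attr], {}).setdefault(attr, [])
        if item not in items:
            items.append(item)
    return tree


value_item = {("p type", "n"): "p type=n", ("p type", "m"): None,
              ("p count", 2): "p count=2-3", ("p count", 3): "p count=2-3"}
print(build_attribute_hierarchy(value_item, {"p type": "p1", "p count": "p1"}))
# {'p1': {'p type': ['p type=n'], 'p count': ['p count=2-3']}}
```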

[0053] By giving the above attribute hierarchy to the data mining engine as part of the engine parameters, the data mining auxiliary device adds the relationship between attribute column values and attribute names to the information provided by the auxiliary device of Embodiment 1, and can thus give the engine a hierarchical structure one level deeper. The auxiliary device of this embodiment therefore allows the data mining engine to process data more efficiently.

[0054] Embodiment 4. Next, a data mining auxiliary device according to Embodiment 4 of the present invention will be described with reference to FIGS. 11 and 12. This embodiment describes the case where the results of inspections performed at a plurality of inspection points in a manufacturing process, each inspection point having its own inspection items, are recorded in a plurality of tables, and the data mining engine is applied to those tables.

[0055] FIG. 11 is a block diagram showing the configuration of the data mining auxiliary device of this embodiment when it processes two tables. FIG. 12 is a flowchart explaining its operation.

[0056] In FIG. 11, the table view generation units 62 and 64, the attribute name determining units 66 and 68, and the per-attribute information investigation units 70 and 72 have the same functions as the table view generation unit 30, the attribute name determining unit 32, and the per-attribute information investigation unit 34 shown in FIG. 1, respectively. The table selection steps (S201a and S201b), the ID column / attribute column / value column selection steps (S202a and S202b), the attribute name master table creation steps (S203a and S203b), and the value item correspondence table creation steps (S204a and S204b) shown in FIG. 12 perform the same processing as the table selection step (S101), the ID column / attribute column / value column selection step (S102), the attribute name master table creation step (S103), and the value item correspondence table creation step (S104) shown in FIG. 2, respectively.

[0057] In this embodiment, the per-attribute-value information investigation unit 74 shown in FIG. 11 executes the item code correspondence table creation step (S205) shown in FIG. 12. In S205, the number of appearances of each item is first determined, for example in table views like that of FIG. 5 provided by the table view generation units 62 and 64, based on value item correspondence tables like that of FIG. 6 provided by the per-attribute information investigation units 70 and 72. Then, in S205, positive integers (item codes) are assigned to all items in order of their individual numbers of appearances, creating a single item code correspondence table like that of FIG. 7.
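A single item code table over several source tables, and the merged item sequence built from it, can be sketched as follows. Each table is assumed to arrive as its own (table view, value item table) pair, already produced separately as in S201a to S204a and S201b to S204b, and data units are assumed to be matched across tables by identical ID tuples; the ordering direction and tie-break are also assumptions.

```python
from collections import Counter


def assign_codes_across_tables(tables):
    """Sketch: one item code table for several processing target tables.

    tables: list of (view, value_item) pairs, one per source table.
    Items are counted once per data unit per table; the most frequent item
    gets code 1 and ties are broken alphabetically (both assumptions).
    """
    counts = Counter()
    for view, value_item in tables:
        for attrs in view.values():
            items = {value_item.get((a, v)) for a, v in attrs.items()}
            items.discard(None)  # values left without an item are ignored
            counts.update(items)
    ordered = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: code for code, item in enumerate(ordered, start=1)}


def merged_item_sequences(tables, item_code):
    """Combine each data unit's item codes from all tables into one sequence."""
    merged = {}
    for view, value_item in tables:
        for unit, attrs in view.items():
            for a, v in attrs.items():
                item = value_item.get((a, v))
                if item is not None:
                    merged.setdefault(unit, set()).add(item_code[item])
    return {unit: sorted(codes) for unit, codes in merged.items()}


table_a = ({("X1",): {"p type": "n"}}, {("p type", "n"): "p type=n"})
table_b = ({("X1",): {"q temp": "hi"}, ("X2",): {"q temp": "hi"}},
           {("q temp", "hi"): "q temp=hi"})
codes = assign_codes_across_tables([table_a, table_b])
print(codes)  # {'q temp=hi': 1, 'p type=n': 2}
print(merged_item_sequences([table_a, table_b], codes))
# {('X1',): [1, 2], ('X2',): [1]}
```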

[0058] In this embodiment, the item sequence creation unit 38 and the engine parameter determination unit 40 have the same functions as in Embodiment 1 (see FIG. 1). That is, the item sequence creation step (S205) and the engine parameter determination step (S206) shown in FIG. 12 perform the same processing as the item sequence creation step (S106) and the engine parameter determination step (S107) shown in FIG. 2, respectively.

[0059] As described above, the data mining auxiliary device of this embodiment can create a single item sequence format from a plurality of processing target tables. Compared with the device of Embodiment 1, the auxiliary device of this embodiment can therefore apply the data mining engine to a wider range of data formats.

[0060] When a plurality of tables are processed, the data mining auxiliary device of this embodiment executes the processing up to and including the value item correspondence table creation step (S204a and S204b) separately for each table. When a value item correspondence table is created, some attribute values may not be assigned any item, as described in Embodiment 1.

[0061] Attribute values to which no item is assigned can be excluded from processing when the item code correspondence table is created. Therefore, by executing the processing that precedes the item code correspondence table creation step (S205) separately for each table, the item code correspondence table can be created after the attribute values that can be excluded have been identified table by table. In this case, the desired work can be done in a smaller processing area than if the tables were merged at the ID column / attribute column / value column selection step (S202a, S202b) or at the attribute name table creation step (S203a, S203b).

[0062]

[Effects of the Invention] Since the present invention is configured as described above, it provides the following effects. According to the invention of claim 1, 9, or 12, a table in which the meaning of a column differs from row to row can be converted into the item sequence format that a data mining engine can handle. The present invention therefore makes it possible to apply the data mining engine to a wide range of data formats via the item sequence format, without individually designing and building a data conversion unit.

[0063] According to the invention of claim 2, attribute values can be categorized before items are assigned, which prevents the number of items from growing unnecessarily. The present invention therefore reduces the computational load during data conversion.

[0064] According to the invention of claim 3, the engine can be operated efficiently because the auxiliary device determines the parameters of the data mining engine.

[0065] According to the invention of claim 4, the attribute name table is created automatically, which prevents processing from proceeding to the subsequent steps without attribute names having been specified.

[0066] According to the invention of claim 5, the attribute names defined in the attribute name table can be corrected to names that are easy for the user to understand, which improves the operability of the device.

[0067] According to the invention of claim 6, 10, or 13, when a spreadsheet-format table is the processing target table, the attribute name table can be created more easily, without automatic generation or correction of attribute names.

[0068] According to the invention of claim 7, an attribute hierarchy can be given to the data mining engine as part of the engine parameters. This makes it possible to run efficiently an engine whose operating efficiency can be improved on the basis of an attribute hierarchy.

[0069] According to the invention of claim 8, 11, or 14, a single item sequence format can be created from a plurality of tables. Compared with the invention of claim 1, this makes it possible to apply the data mining engine to a wider range of data formats. In addition, because the information obtained from the plurality of tables is not merged until the item code correspondence table is created, the data conversion can be performed in a relatively small work area.

[Brief description of the drawings]

FIG. 1 is a block diagram of a data mining auxiliary device according to Embodiment 1 of the present invention.

FIG. 2 is a flowchart explaining the processing executed by the auxiliary device shown in FIG. 1.

FIG. 3 is an example of a processing target table handled by the auxiliary device shown in FIG. 1.

FIG. 4 shows examples of attribute name tables created by the auxiliary device shown in FIG. 1.

FIG. 5 is an example of a table view created by the auxiliary device shown in FIG. 1.

FIG. 6 is an example of a value item correspondence table created by the auxiliary device shown in FIG. 1.

FIG. 7 is an example of an item code correspondence table created by the auxiliary device shown in FIG. 1.

FIG. 8 is an example of an item sequence format table created by the auxiliary device shown in FIG. 1.

FIG. 9 is a spreadsheet-format table used as the processing target table in the data mining auxiliary device according to Embodiment 2 of the present invention.

FIG. 10 is an example of information on the attribute hierarchy provided to the data mining engine by the data mining auxiliary device according to Embodiment 3 of the present invention.

FIG. 11 is a block diagram of a data mining auxiliary device according to Embodiment 4 of the present invention.

FIG. 12 is a flowchart explaining the processing executed by the auxiliary device shown in FIG. 11.

FIG. 13 is an example of a binary-format table.

FIG. 14 is an example of an item sequence format table and an item code correspondence table.

FIG. 15 is an example of a spreadsheet-format table.

FIG. 16 is an example of a transaction-format table.

FIG. 17 is a block diagram of a conventional data mining auxiliary device.

[Explanation of symbols]

30; 62, 64: table view generation unit; 32; 66, 68: attribute name determining unit; 34; 70, 72: per-attribute information investigation unit; 36; 74: per-attribute-value information investigation unit; 38: item sequence creation unit; 40: engine parameter determination unit; 42: processing target table; 44, 46: attribute name table; 48: table view; 50: value item correspondence table; 52: item code correspondence table; 54: item sequence format table; 56: second processing target table.

Claims

[Claims]

1. A data mining auxiliary device for supplying, to a data mining engine that finds correlations between data, data in a table format that the engine can process, the device comprising: a table view generation unit with which a user designates, for a processing target table having a table-format data structure, an ID column in which identifiers of data units are described, attribute columns in which attributes are described, and value columns in which values are described; an attribute name determining unit that creates an attribute name for each combination of a kind of value in the attribute columns and the name of a value column, and creates an attribute name table representing the correspondence between the combinations and the attribute names; a per-attribute information investigation unit that, for each attribute name defined in the attribute name table, examines the kinds of values described in the value columns, determines items for some or all of those values, and creates a value item correspondence table representing the correspondence between the values and the items; a per-attribute-value information investigation unit that creates an item code correspondence table by assigning an item code to each item based on the number of data units in which the item appears in the processing target table; and an item sequence creation unit that examines, based on the value item correspondence table and the item code correspondence table, the item codes corresponding to each data unit having the same identifier described in the ID column of the processing target table, and creates an item sequence representing the correspondence between each data unit and its item codes.

2. The data mining auxiliary device according to claim 1, wherein the per-attribute information investigation unit permits values of a predetermined kind among the values described in the value columns to be categorized, and permits an item to be assigned to each category.

3. The data mining auxiliary device according to claim 1, further comprising an engine parameter determining unit that determines parameters for controlling the operating state of the data mining engine.

4. The data mining auxiliary device according to claim 1, wherein the attribute name determining unit automatically generates an attribute name table that has a row for each kind of value of the attribute columns designated in the table view generation unit and the same number of columns as the value columns designated in the table view generation unit, and in which each attribute name is defined by combining a value of the attribute columns and the name of a value column according to a predetermined rule.

5. The data mining auxiliary device according to claim 4, wherein the attribute name determining unit comprises at least one of a means for accepting corrections of the attribute names one attribute name at a time and a means for accepting a batch correction of all the attribute names.

6. The data mining auxiliary device according to claim 1, wherein, when only an ID column and value columns are designated in the table view generation unit and no attribute column is designated, the attribute name determining unit automatically generates a one-row attribute name table that uses the names of the value columns as attribute names.

7. The data mining auxiliary device according to claim 3, wherein the engine parameter determining unit uses the value item correspondence table obtained from the per-attribute information investigation unit to present to the data mining engine which items are included in which attribute and which attributes are included in which value of the attribute columns, and the data mining engine accepts parameter settings in units of the values of the attribute columns.

8. The data mining auxiliary device according to any one of claims 1 to 7, for converting information of a plurality of processing target tables having different table formats into a format that the data mining engine can process, wherein the table view generation unit permits the ID column, the attribute columns, and the value columns to be designated for each table format; the attribute name determining unit creates the attribute name table for each table format; the per-attribute information investigation unit creates the value item correspondence table for each table format; the per-attribute-value information investigation unit creates the item code correspondence table for the processing target tables as a whole; and the item sequence creation unit creates a single item sequence based on the value item correspondence tables, the item code correspondence table, and the plurality of processing target tables.

9. A data format conversion method for supplying, to a data mining engine that finds correlations between data, data in a table format that the engine can process, the method comprising: a table selection step of selecting a table having a table-format data structure as a processing target table; an ID column / attribute column / value column selection step of designating, for the processing target table, an ID column in which identifiers of data units are described, attribute columns in which attributes are described, and value columns in which values are described; an attribute name table creation step of creating an attribute name for each combination of a kind of value in the attribute columns and the name of a value column, and creating an attribute name table representing the correspondence between the combinations and the attribute names; a value item correspondence table creation step of examining, for each attribute name defined in the attribute name table, the kinds of values described in the value columns, determining items for some or all of those values, and creating a value item correspondence table representing the correspondence between the values and the items; an item code correspondence table creation step of creating an item code correspondence table by assigning an item code to each item based on the number of data units in which the item appears in the processing target table; and an item sequence creation step of examining, based on the value item correspondence table and the item code correspondence table, the item codes corresponding to each data unit having the same identifier described in the ID column of the processing target table, and creating an item sequence representing the correspondence between each data unit and its item codes.

10. The data format conversion method according to claim 9, wherein, when only an ID column and an attribute column are specified in the ID column/attribute column/value column selection step and no value column is specified, the attribute name table creation step automatically generates a one-row attribute name table that uses the name of the value column as the attribute name.
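
The case in claim 10, where only an ID column and an attribute column are given, can be sketched as below. The one-row attribute name table maps the attribute column to a single attribute name, and each distinct attribute-column value becomes an item under that name. How the single name is chosen is not spelled out beyond the claim wording, so the sketch takes it as a caller-supplied parameter `attribute_name`; that parameter, the function name, the frequency-based code ordering, and the sample data are all assumptions for illustration.

```python
from collections import defaultdict


def convert_without_value_column(rows, id_col, attr_col, attribute_name):
    """Item sequence for a table that has only an ID column and an attribute column."""
    # One-row attribute name table: the attribute column maps to a single name.
    name_table = {attr_col: attribute_name}
    name = name_table[attr_col]

    # Each distinct attribute-column value becomes an item under that name,
    # grouped by the identifier in the ID column.
    units = defaultdict(set)
    for row in rows:
        units[row[id_col]].add(f"{name}={row[attr_col]}")

    # Item codes: items contained in more data units get smaller codes (assumed).
    counts = defaultdict(int)
    for unit in units.values():
        for item in unit:
            counts[item] += 1
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    codes = {item: code for code, item in enumerate(ranked, start=1)}
    return {uid: sorted(codes[item] for item in unit) for uid, unit in units.items()}


# Hypothetical purchase table: one row per (transaction, product).
BASKETS = [
    {"tid": "t1", "product": "milk"},
    {"tid": "t1", "product": "bread"},
    {"tid": "t2", "product": "milk"},
    {"tid": "t3", "product": "bread"},
    {"tid": "t3", "product": "milk"},
    {"tid": "t3", "product": "eggs"},
]

print(convert_without_value_column(BASKETS, "tid", "product", "purchase"))
# {'t1': [1, 2], 't2': [1], 't3': [1, 2, 3]}
```

A table of this shape is already close to the transaction lists that association-rule miners expect, which is why no value column is needed to derive items from it.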

11. The data format conversion method according to claim 9, for converting information of a plurality of processing target tables having different table formats into a format that can be processed by a data mining engine, wherein: the ID column/attribute column/value column selection step includes specifying the ID column, the attribute column, and the value column for each table format; the attribute name table creation step includes creating the attribute name table for each table format; the value item correspondence table creation step includes creating the value item correspondence table for each table format; the item code correspondence table creation step creates the item code correspondence table for the processing target tables as a whole; and the item sequence creation step creates a single item sequence based on the value item correspondence tables, the item code correspondence table, and the plurality of processing target tables.
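
For the multi-format case of claim 11, a sketch might build the attribute name and value item tables separately for each table format, then merge the results into one item code table and one item sequence. This assumes that identifiers in the different ID columns refer to the same data units, and it reuses the hypothetical naming and frequency-ordering conventions of a single-table sketch; the function names and sample tables are illustrative only.

```python
from collections import defaultdict


def items_for_format(rows, id_col, attr_col, value_cols):
    """Build this format's attribute name and value item tables; return id -> items."""
    names = {}  # (attribute-column value, value-column name) -> attribute name
    items = {}  # (attribute name, value) -> item
    units = defaultdict(set)
    for row in rows:
        for vcol in value_cols:
            name = names.setdefault((row[attr_col], vcol), f"{row[attr_col]}.{vcol}")
            item = items.setdefault((name, row[vcol]), f"{name}={row[vcol]}")
            units[row[id_col]].add(item)
    return units


def convert_many(tables):
    """tables: list of (rows, id_col, attr_col, value_cols), one entry per table format."""
    merged = defaultdict(set)
    for rows, id_col, attr_col, value_cols in tables:
        for uid, unit in items_for_format(rows, id_col, attr_col, value_cols).items():
            merged[uid] |= unit  # assumed: the same identifier in any table is one data unit

    # A single item code table over all processing target tables.
    counts = defaultdict(int)
    for unit in merged.values():
        for item in unit:
            counts[item] += 1
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    codes = {item: code for code, item in enumerate(ranked, start=1)}
    return {uid: sorted(codes[item] for item in unit) for uid, unit in merged.items()}


# Hypothetical tables with different formats (different ID, attribute, value columns).
LABS = [
    {"id": "p1", "test": "bp", "result": "high"},
    {"id": "p2", "test": "bp", "result": "high"},
]
MEDS = [
    {"pid": "p1", "drug": "aspirin", "dose": "low"},
    {"pid": "p3", "drug": "aspirin", "dose": "low"},
    {"pid": "p3", "drug": "statin", "dose": "high"},
]

print(convert_many([
    (LABS, "id", "test", ["result"]),
    (MEDS, "pid", "drug", ["dose"]),
]))
# {'p1': [1, 2], 'p2': [2], 'p3': [1, 3]}
```

Building one code table after merging, rather than one per format, keeps the codes unique across all input tables, so the mining engine receives a single consistent item sequence.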

12. A recording medium on which is recorded a data format conversion program for supplying, to a data mining engine that finds correlations between data, table-format data that the engine can process, the program causing a computer to: select a processing target table in accordance with a predetermined input; recognize, in the processing target table and in accordance with the predetermined input, an ID column in which data unit identifiers are described, an attribute column in which attributes are described, and a value column in which values are described; assign an attribute name to each combination of a type of value in the attribute column and the name of a value column, and create an attribute name table representing the correspondence between each combination and its attribute name; examine, for each attribute name defined in the attribute name table, the types of values described in the value column, recognize items for some or all of those values, and create a value item correspondence table representing the correspondence between the values and the items; create an item code correspondence table by assigning an item code to each item based on the number of data units in the processing target table in which the item appears; and determine, for each data unit in which the same identifier is described in the ID column of the processing target table, the corresponding item codes based on the value item correspondence table and the item code correspondence table, and create an item sequence representing the correspondence between each data unit and its item codes.

13. The recording medium according to claim 12, wherein, when only an ID column and an attribute column are specified for the processing target table and no value column is specified, the program causes the computer to automatically generate a one-row attribute name table that uses the name of the value column of the processing target table as the attribute name.

14. The recording medium according to claim 12 or 13, wherein the program converts information of a plurality of processing target tables having different table formats into a format that can be processed by a data mining engine by causing the computer to: specify the ID column, the attribute column, and the value column for each table format; create the attribute name table for each table format; create the value item correspondence table for each table format; create the item code correspondence table for the processing target tables as a whole; and create a single item sequence based on the value item correspondence tables, the item code correspondence table, and the plurality of processing target tables.