JP2018151805A

JP2018151805A - Data item name estimating apparatus, data item name estimating method, and program

Info

Publication number: JP2018151805A
Application number: JP2017046895A
Authority: JP
Inventors: 要松村; Kaname Matsumura
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2017-03-13
Filing date: 2017-03-13
Publication date: 2018-09-27
Anticipated expiration: 2037-03-13
Also published as: JP7235269B2

Abstract

PROBLEM TO BE SOLVED: To provide a data item name estimating apparatus, a data item name estimating method, and a program capable of estimating the item name given to a data item without requiring knowledge about the data registered in a database. SOLUTION: A data item name estimating apparatus 100 comprises: a learning processing unit 10 which, for each data item name in a learning table, extracts feature values of the data to which that data item name is assigned, defines the relationships between the extracted feature values and the attributes corresponding to the data item names, specifies the corresponding data item names for each attribute, and learns the combinations of attributes and data item names in the learning table to create a learning model; and an estimation processing unit 20 which compares the feature values of each data item of a target table with the definitions made by the learning processing unit 10 to estimate the attribute of each data item, and applies the estimated attributes to the learning model to estimate the data item names of the data items of the target table. SELECTED DRAWING: Figure 1

Description

The present invention relates to a data item name estimation device and a data item name estimation method for estimating the data item names of a table whose data item names are unknown, and further relates to a program for implementing them.

In recent years, advances in ICT (Information and Communication Technology) have made it possible, and easy, to acquire large amounts of diverse information in real time, and large volumes of information are being collected and accumulated. To perform various analyses and predictions using this information, the multiple databases in which it is stored must be integrated into a database having a common schema.

Database integration requires defining common classifications or attributes. Conventionally, therefore, a database or data modeling specialist confirms the schema configuration with each database administrator and manually identifies, extracts, and integrates the common data. As a result, database integration takes a great deal of working time.

Further, when performing data integration, if the person doing the work understands the schema structure of each data set to be integrated, it is easy to determine which items should be integrated. However, if that person lacks sufficient knowledge of the data to be integrated, and meaningless item names such as “t1” are defined as data item names, it is very difficult to decide which items should be integrated. From this point of view as well, database integration is time-consuming.

To solve such problems, Patent Document 1, for example, discloses a classification construction support system. That system first extracts, for each data item, a feature value of the data item from the record data of that item. It then classifies each of the plurality of data items into an appropriate classification or attribute by computing the similarity between the extracted feature value and attributes defined in advance.

Patent Document 1: JP 2006-99236 A

However, although the classification construction support system disclosed in Patent Document 1 can group data items having the same feature value under one attribute, it cannot determine the name of a data item having that attribute.

For example, suppose that “temperature” is determined as the attribute classifying a certain data item. In this case, to perform data integration, it has been difficult to identify which item name, “average temperature”, “maximum temperature”, or “minimum temperature”, should properly be defined for the data item having the “temperature” attribute.

An example object of the present invention is to solve the above problem and to provide a data item name estimation device, a data item name estimation method, and a program capable of estimating the item name given to a data item without requiring knowledge about the data registered in a database.

To achieve the above object, a data item name estimation device according to one aspect of the present invention includes:
a learning processing unit that, for each data item name in a learning table, extracts feature values of the data to which the data item name is assigned, defines the relationship between the extracted feature values and the attribute corresponding to the data item name, identifies the corresponding data item names for each attribute, and learns the combinations of the attributes and the data item names in the learning table to create a learning model; and
an estimation processing unit that collates the feature values of each data item of a target table with the definition made by the learning processing unit to estimate the attribute of each data item, and applies the estimated attribute of each data item to the learning model to estimate the data item names of the data items of the target table.

Further, to achieve the above object, a data item name estimation method according to one aspect of the present invention includes:
(a) a step of, for each data item name in a learning table, extracting feature values of the data to which the data item name is assigned, defining the relationship between the extracted feature values and the attribute corresponding to the data item name, identifying the corresponding data item names for each attribute, and learning the combinations of the attributes and the data item names in the learning table to create a learning model; and
(b) a step of collating the feature values of each data item of a target table with the definition obtained in step (a) to estimate the attribute of each data item, and applying the estimated attribute of each data item to the learning model to estimate the data item names of the data items of the target table.

Furthermore, to achieve the above object, a program according to one aspect of the present invention causes a computer to execute:
(a) a step of, for each data item name in a learning table, extracting feature values of the data to which the data item name is assigned, defining the relationship between the extracted feature values and the attribute corresponding to the data item name, identifying the corresponding data item names for each attribute, and learning the combinations of the attributes and the data item names in the learning table to create a learning model; and
(b) a step of collating the feature values of each data item of a target table with the definition obtained in step (a) to estimate the attribute of each data item, and applying the estimated attribute of each data item to the learning model to estimate the data item names of the data items of the target table.

As described above, according to the present invention, the item name given to a data item can be estimated without requiring knowledge about the data registered in a database.

FIG. 1 is a block diagram showing the schematic configuration of the data item name estimation device according to the embodiment of the present invention.
FIG. 2 is a block diagram showing a specific configuration of the data item name estimation device according to the embodiment of the present invention.
FIG. 3 is a diagram showing an example of a learning table used in the embodiment of the present invention.
FIG. 4 is a diagram showing an example of a target table subject to the data item name estimation process in the embodiment of the present invention.
FIG. 5 is a flowchart showing the operation of the data item name estimation device during the learning process in the embodiment of the present invention.
FIG. 6 is a diagram showing an example of attribute information created in the embodiment of the present invention.
FIG. 7 is a diagram showing an example of attribute individual information created in the embodiment of the present invention.
FIG. 8 is a diagram showing an example of attribute combination information created in the embodiment of the present invention.
FIG. 9 is a flowchart showing the operation of the data item name estimation device during the estimation process in the embodiment of the present invention.
FIG. 10(a) is a diagram showing an example of feature values extracted from the target table used in the embodiment of the present invention, and FIG. 10(b) is a diagram showing an example of the similarities calculated from the feature values shown in FIG. 10(a).
FIG. 11(a) is a diagram showing an example of attribute combination information used in the embodiment of the present invention, and FIG. 11(b) is a diagram showing an example of the appearance frequencies of data item names calculated in the embodiment of the present invention.
FIG. 12 is a block diagram showing an example of a computer that implements the data item name estimation device according to the embodiment of the present invention.

(Summary of Invention)
In the present invention, feature values are extracted from the data items of the table whose data item names are to be estimated, the similarity between the extracted feature values and each attribute is calculated, and each data item is classified into the attribute with the highest similarity. Then, by applying the combination of attributes of the data items that make up the target table to a previously learned correspondence between attribute combinations and data item names, the invention estimates which data item name should be given to each data item. In other words, the present invention mainly performs a learning process and an estimation process. These are described specifically below.

(Embodiment)
Hereinafter, a data item name estimation device, a data item name estimation method, and a program according to an embodiment of the present invention will be described with reference to FIGS. 1 to 12.

[Device configuration]
First, the configuration of the data item name estimation device in this embodiment will be described. FIG. 1 is a block diagram showing the schematic configuration of the data item name estimation device according to the embodiment of the present invention.

The data item name estimation device 100 of this embodiment, shown in FIG. 1, estimates the data item names of a table to be processed (hereinafter, the “target table”). As shown in FIG. 1, the data item name estimation device 100 includes a learning processing unit 10 and an estimation processing unit 20.

The learning processing unit 10 first extracts, for each data item name in the learning table, the feature values of the data to which that data item name is assigned, and defines the relationship between the extracted feature values and the attribute corresponding to that data item name. Next, the learning processing unit 10 identifies the corresponding data item names for each attribute, learns how often each attribute and data item name appear in the learning table, and creates a learning model.

The estimation processing unit 20 first collates the feature values of each data item in the target table with the definition made by the learning processing unit 10 to estimate the attribute of each data item. Next, the estimation processing unit 20 applies the estimated attribute of each data item to the learning model to estimate the data item names of the data items in the target table.

Thus, in this embodiment, a learning model for estimating data item names is generated from a learning table whose data item names are known, and this model is used to estimate the data item names of a table whose data item names are not known. Therefore, according to this embodiment, the item name given to a data item can be estimated without requiring knowledge about the data registered in the database.

Next, the configuration of the data item name estimation device 100 in this embodiment will be described more specifically with reference to FIGS. 2 to 4. FIG. 2 is a block diagram showing a specific configuration of the data item name estimation device according to the embodiment of the present invention. FIG. 3 is a diagram showing an example of a learning table used in the embodiment of the present invention. FIG. 4 is a diagram showing an example of a target table subject to the data item name estimation process in the embodiment of the present invention.

As shown in FIG. 2, in this embodiment the data item name estimation device 100 includes a storage unit 30 in addition to the learning processing unit 10 and the estimation processing unit 20. The storage unit 30 stores attribute information 31, attribute individual information 32, and attribute combination information 33, which are described later.

As also shown in FIG. 2, the learning processing unit 10 includes a learning table reception unit 11, a feature extraction unit 12, an attribute information creation unit 13, an attribute individual information creation unit 14, and an attribute combination information creation unit 15.

The learning table reception unit 11 receives a learning table (see FIG. 3) input from the outside and passes it to the feature extraction unit 12. As shown in FIG. 3, the learning table is data in table form, and each record may be in any format, such as XML (Extensible Markup Language), CSV (comma-separated values), or HTML (Hypertext Markup Language).

For each data item name in the learning table, the feature extraction unit 12 extracts feature values from the data of the records to which that data item name is assigned. Examples of extracted feature values include the data type (character or numeric), statistics (mean, variance), and the frequency with which identical records appear (see FIG. 6, described later). The feature values can be extracted from the table by, for example, the method disclosed in Patent Document 1 cited above.
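The per-column feature extraction described above can be sketched as follows. This is a minimal illustration only, not the method of Patent Document 1: the function name `extract_features` and the exact feature set (data type, mean, population variance, and the share of records repeating the most common value as a stand-in for “appearance frequency of the same record”) are assumptions.

```python
from collections import Counter
from statistics import mean, pvariance


def extract_features(values):
    """Return a feature dict for one data item (column) given its record values.

    Hypothetical feature set: the source names data type, mean/variance and
    the appearance frequency of identical records, but fixes no formula.
    """
    numbers = []
    for v in values:
        try:
            numbers.append(float(v))
        except (TypeError, ValueError):
            pass  # value is not parseable as a number
    is_numeric = bool(values) and len(numbers) == len(values)
    counts = Counter(values)
    return {
        "data_type": "numeric" if is_numeric else "string",
        "mean": mean(numbers) if is_numeric else None,
        "variance": pvariance(numbers) if is_numeric else None,
        # share of records that repeat the most common value (assumed measure)
        "top_value_ratio": counts.most_common(1)[0][1] / len(values) if values else 0.0,
    }


print(extract_features(["12.5", "13.0", "12.5"]))
print(extract_features(["Tokyo", "Osaka", "Tokyo"]))
```

A column counts as numeric here only if every value parses as a number; a real implementation would likely tolerate missing values.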

The attribute information creation unit 13 creates attribute information 31 (see FIG. 6, described later), which defines the relationship between the feature values extracted by the feature extraction unit 12 and the attribute corresponding to the data item name of the source data, and stores the created attribute information in the storage unit 30.

Specifically, when an attribute corresponding to the data item name of the source data is set from the outside for a feature value extracted by the feature extraction unit 12, the attribute information creation unit 13 creates attribute information 31 that associates the feature value with the set attribute. The attribute may be set manually.
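One way to picture attribute information 31 is as a table from each manually set attribute to reference feature values. The sketch below rests on two assumptions not stated in the source: features are already encoded as numeric tuples (for example mean and variance), and when several item names share one attribute their feature vectors are averaged. The name `build_attribute_info` is hypothetical.

```python
from collections import defaultdict


def build_attribute_info(features_by_item, attribute_by_item):
    """Build attribute information: {attribute: reference feature vector}.

    features_by_item: {data item name: tuple of numeric feature values}
    attribute_by_item: {data item name: attribute set by an operator}
    Assumption: item names sharing an attribute are averaged element-wise
    (the source does not specify how they are combined).
    """
    grouped = defaultdict(list)
    for item_name, vector in features_by_item.items():
        grouped[attribute_by_item[item_name]].append(vector)
    return {
        attribute: tuple(sum(column) / len(column) for column in zip(*vectors))
        for attribute, vectors in grouped.items()
    }


features = {
    "maximum temperature": (30.0, 4.0),
    "minimum temperature": (20.0, 2.0),
    "city": (0.0, 0.5),
}
attributes = {
    "maximum temperature": "temperature",
    "minimum temperature": "temperature",
    "city": "place",
}
print(build_attribute_info(features, attributes))
```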

For each attribute included in the attribute information created by the attribute information creation unit 13, the attribute individual information creation unit 14 creates attribute individual information (see FIG. 7) to which the corresponding data item names are assigned.

Specifically, when corresponding data item names are set for each attribute included in the attribute information, the attribute individual information creation unit 14 creates the attribute individual information from the attributes and the individuals given the input data item names. The attribute individual information indicates the correspondence between attributes and individuals. Here, an “individual” means an object to which a data item name is assigned. The data item names may be set manually.
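As a rough sketch, attribute individual information can be held as a mapping from each attribute to the set of data item names (individuals) assigned to it. The function name and the input shape, a dict from manually set item name to attribute, are assumptions for illustration.

```python
from collections import defaultdict


def build_attribute_individuals(attribute_by_item):
    """Attribute individual information: {attribute: set of data item names}.

    attribute_by_item maps each manually named individual, i.e. each data
    item name in the learning table, to its attribute (hypothetical shape).
    """
    individuals = defaultdict(set)
    for item_name, attribute in attribute_by_item.items():
        individuals[attribute].add(item_name)
    return dict(individuals)


print(build_attribute_individuals({
    "average temperature": "temperature",
    "maximum temperature": "temperature",
    "observation date": "date",
}))
```

In this form, the candidates for a column estimated as “temperature” are simply the set stored under that key.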

Using the attribute individual information, the attribute combination information creation unit 15 creates, as the learning model, attribute combination information (see FIG. 8) indicating the combinations of attributes in the learning table and the data item names corresponding to them.
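A minimal sketch of attribute combination information as a learning model, assuming (this is not specified in the source) that a “combination” is the sorted tuple of all attributes occurring together in one learning table, and that the model counts how often each (attribute, item name) pair appears under each combination. `build_combination_info` is a hypothetical name.

```python
from collections import Counter, defaultdict


def build_combination_info(learning_tables):
    """Attribute combination information used as the learning model.

    learning_tables: list of tables, each a list of (attribute, item name)
    pairs, one pair per data item (column).
    Returns {attribute combination: Counter of (attribute, item name)}.
    Assumption: a combination is the sorted tuple of all attributes that
    occur together in one table.
    """
    model = defaultdict(Counter)
    for table in learning_tables:
        combo = tuple(sorted(attribute for attribute, _ in table))
        for attribute, item_name in table:
            model[combo][(attribute, item_name)] += 1
    return dict(model)


tables = [
    [("date", "observation date"), ("temperature", "average temperature")],
    [("date", "observation date"), ("temperature", "average temperature")],
    [("date", "observation date"), ("temperature", "maximum temperature"),
     ("temperature", "minimum temperature")],
]
model = build_combination_info(tables)
print(model)
```

With this toy data, a table holding one date and one temperature column points to “average temperature”, while a table with two temperature columns points to maximum and minimum temperature. That is the kind of context the attribute alone (“temperature”) cannot supply.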

As also shown in FIG. 2, the estimation processing unit 20 includes a target table reception unit 21, a feature extraction unit 22, an attribute estimation unit 23, an item name estimation unit 24, an estimation result display unit 25, and a result editing unit 26.

The target table reception unit 21 receives a target table (see FIG. 4) input from the outside and passes it to the feature extraction unit 22. Like the learning table, the target table is data in table form, as shown in FIG. 4, and each of its records may likewise be in any format, such as XML (Extensible Markup Language), CSV (comma-separated values), or HTML (Hypertext Markup Language).

The feature extraction unit 22 extracts feature values of the data for each data item from the records of the target table. Examples of extracted feature values include the data type (character or numeric), statistics (mean, variance), and the frequency with which identical records appear. The feature values can be extracted from the table by, for example, the method disclosed in Patent Document 1 cited above.

The attribute estimation unit 23 collates the feature values extracted by the feature extraction unit 22 with the attribute information 31 to estimate the attribute of each data item. Specifically, the attribute estimation unit 23 first calculates a similarity by comparing the feature values of each data item with the feature values of each attribute included in the attribute information 31. It then identifies the attribute with the highest similarity to the feature values of the data item being estimated and takes that attribute as the estimated attribute of the data item (see FIGS. 10(a) and 10(b), described later).
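The similarity-then-argmax step can be sketched as below. The source does not name a similarity measure; cosine similarity over numeric feature vectors is an assumed choice, and `estimate_attribute` and the toy reference vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_attribute(features, attribute_info):
    """Return the most similar attribute and all similarity scores.

    attribute_info: {attribute: reference feature vector}. Cosine similarity
    is an assumption; the source does not specify the measure.
    """
    scores = {attr: cosine_similarity(features, ref) for attr, ref in attribute_info.items()}
    best = max(scores, key=scores.get)
    return best, scores


attribute_info = {"temperature": (1.0, 0.2), "place": (0.0, 1.0)}
best, scores = estimate_attribute((0.9, 0.3), attribute_info)
print(best, scores)
```

Because features such as a mean in degrees and a ratio in [0, 1] live on different scales, they would likely need normalizing before a cosine comparison; otherwise large-valued features dominate the score.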

The item name estimation unit 24 first sets a combination of attributes from the attributes estimated by the attribute estimation unit 23. Next, for each set attribute combination, the item name estimation unit 24 uses the attribute combination information 33 and the attribute individual information 32 to calculate the frequency with which specific data item names appear. Based on the calculation results, the item name estimation unit 24 then estimates the data item name of each data item in the target table.

Specifically, using the attribute individual information 32, the item name estimation unit 24 identifies, for each attribute in the set combination, one or more data item names that may correspond to it. Then, using the attribute combination information 33, it calculates the appearance frequency (appearance probability) of each candidate data item name for each set attribute combination. Finally, from among the candidate data item names, it selects the one with the highest appearance frequency as the data item name for the estimated attribute.
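The frequency-based selection can be sketched under the same hypothetical data shapes as the learning-side sketches: candidates come from the attribute individual information, counts come from the combination model, and the appearance probability is a candidate's count divided by the total count over all candidates for that attribute. Two behaviors are assumptions, not the source's: an unseen combination falls back to counts pooled over all learning tables, and a name already given to one column is not reused for another column of the same table.

```python
from collections import Counter


def estimate_item_names(column_attributes, individuals, model):
    """Estimate one data item name per target column (sketch).

    column_attributes: estimated attribute of each column, in column order
    individuals: {attribute: set of candidate item names}
    model: {attribute combination: Counter of (attribute, item name)}
    Returns [(item name or None, appearance probability), ...].
    """
    combo = tuple(sorted(column_attributes))
    counts = model.get(combo)
    if counts is None:
        # Assumption: for an unseen combination, pool counts over all tables.
        counts = Counter()
        for c in model.values():
            counts.update(c)
    used = set()  # assumption: one name is not assigned to two columns
    result = []
    for attribute in column_attributes:
        freq = {name: counts[(attribute, name)] for name in individuals.get(attribute, ())}
        total = sum(freq.values())
        ranked = sorted((n for n in freq if n not in used), key=lambda n: (-freq[n], n))
        if total == 0 or not ranked:
            result.append((None, 0.0))
            continue
        name = ranked[0]
        used.add(name)
        result.append((name, freq[name] / total))
    return result


individuals = {
    "date": {"observation date"},
    "temperature": {"average temperature", "maximum temperature", "minimum temperature"},
}
model = {
    ("date", "temperature"): Counter({
        ("date", "observation date"): 3,
        ("temperature", "average temperature"): 2,
        ("temperature", "maximum temperature"): 1,
    }),
}
print(estimate_item_names(["temperature", "date"], individuals, model))
```

When two columns share an attribute, the no-reuse rule lets column order decide which gets the more frequent name; a fuller version might use per-column features to break such ties.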

The estimation result display unit 25 displays the data item names estimated by the item name estimation unit 24 on the screen of a display device connected to the data item name estimation device 100, or on the display device of an external terminal device. When an estimated data item name is judged to be wrong and a corrected data item name is input from the outside, the result editing unit 26 corrects the estimation result of the item name estimation unit 24. The result editing unit 26 can also pass the correction to the learning processing unit 10, in which case the learning processing unit 10 reflects the correction in the attribute information 31, the attribute individual information 32, and the attribute combination information 33.
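Feeding a correction back into the learned information might look like the following, again using the hypothetical shapes from the sketches above. The update rule is an assumption: the corrected name is registered as an individual of the column's attribute and counted once more under the table's attribute combination. Updating attribute information 31, which the source also mentions, is omitted here.

```python
from collections import Counter


def apply_correction(individuals, model, column_attributes, index, corrected_name):
    """Reflect one user correction back into the learned information (sketch).

    Hypothetical update rule: register the corrected name as an individual of
    the column's attribute and count it once more under the table's attribute
    combination. Attribute information 31 is left unchanged in this sketch.
    """
    attribute = column_attributes[index]
    individuals.setdefault(attribute, set()).add(corrected_name)
    combo = tuple(sorted(column_attributes))
    model.setdefault(combo, Counter())[(attribute, corrected_name)] += 1


individuals = {"temperature": {"average temperature"}}
model = {}
apply_correction(individuals, model, ["date", "temperature"], 1, "maximum temperature")
print(individuals, model)
```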

[Device operation]
Next, the operation of the data item name estimation device 100 in this embodiment will be described with reference to FIGS. 3 to 10. In this embodiment, the data item name estimation method is carried out by operating the data item name estimation device 100. The following description of the operation of the data item name estimation device 100 therefore also serves as the description of the data item name estimation method.

First, the learning phase (learning process) will be described with reference to FIGS. 5 to 8. FIG. 5 is a flowchart showing the operation of the data item name estimation device during the learning process in the embodiment of the present invention.

As shown in FIG. 5, the learning table reception unit 11 first receives a learning table (see FIG. 3) input from the outside (step A1), and passes the received learning table to the feature extraction unit 12.

Next, the feature extraction unit 12 selects one data item of the received learning table and extracts feature values from the record data of the selected data item (step A2).

Next, the feature extraction unit 12 determines whether feature values have been extracted from all data items (step A3). If not, the feature extraction unit 12 executes step A2 again; if so, it instructs the attribute information creation unit 13 to execute step A4.

When the determination in step A3 is Yes, the attribute information creation unit 13 creates the attribute information 31 by associating each data item with the attribute set for it (step A4). In step A4, the attributes are set manually from outside the device, based on the data item names of the source data. FIG. 6 is a diagram showing an example of attribute information created in the embodiment of the present invention.

Next, for each attribute included in the attribute information created in step A4, the attribute individual information creation unit 14 creates attribute individual information 32 to which the corresponding data item names are assigned (step A5). In step A5, the data item names are set manually from outside the device. FIG. 7 is a diagram showing an example of attribute individual information created in the embodiment of the present invention.

Next, using the attribute individual information created in step A5, the attribute combination information creation unit 15 creates, as the learning model, attribute combination information 33 indicating the combinations of attributes in the learning table and the corresponding data item names (step A6). FIG. 8 is a diagram showing an example of attribute combination information created in the embodiment of the present invention.

Next, the estimation phase (estimation process) will be described with reference to FIGS. 9 and 10. FIG. 9 is a flowchart showing the operation of the data item name estimation device during the estimation process in the embodiment of the present invention.

As shown in FIG. 9, the target table reception unit 21 first receives a target table (see FIG. 4) input from the outside (step B1), and passes the received target table to the feature extraction unit 22.

Next, as shown in FIG. 10(a), the feature extraction unit 22 selects one data item of the received target table and extracts feature values from the record data of the selected data item (step B2).

FIG. 10(a) is a diagram showing an example of feature values extracted from the target table used in the embodiment of the present invention. FIG. 10(b) is a diagram showing an example of the similarities calculated from the feature values shown in FIG. 10(a).

Next, the feature extraction unit 22 determines whether feature values have been extracted from all data items (step B3). If not, the feature extraction unit 22 executes step B2 again; if so, it instructs the attribute estimation unit 23 to execute step B4.

When the determination in step B3 is Yes, the attribute estimation unit 23 selects one data item of the target table. As shown in FIG. 10B, the attribute estimation unit 23 then compares the feature value of the selected data item with the feature value of each attribute included in the attribute information 31 to calculate similarities, and estimates the attribute with the highest similarity as the attribute of that data item (step B4).
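The source does not name the similarity measure used in step B4. The sketch below assumes cosine similarity between the feature vector of a data item and a reference feature vector per attribute (standing in for the attribute information 31), and picks the attribute with the highest score. The reference vectors and the function names are hypothetical, and the features are assumed to be on comparable scales.

```python
import math

# Hypothetical attribute information 31: one reference feature vector per attribute,
# e.g. [numeric ratio, date ratio, scaled mean length].
ATTRIBUTE_INFO = {
    "date": [0.0, 1.0, 0.1],
    "temperature": [1.0, 0.0, 0.04],
    "place name": [0.0, 0.0, 0.06],
}


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_attribute(features, attribute_info):
    """Return (most similar attribute, similarity per attribute) for one data item."""
    scores = {attr: cosine(features, ref) for attr, ref in attribute_info.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Returning the full score dictionary alongside the winner mirrors the per-attribute similarity table of FIG. 10B and makes it easy to inspect close calls.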

Next, the attribute estimation unit 23 determines whether attributes have been estimated for all data items (step B5). If not, the attribute estimation unit 23 executes step B4 again; if so, it instructs the item name estimation unit 24 to execute step B6.

When the determination in step B5 is Yes, the attribute estimation unit 23 first uses the attributes estimated in step B4 to set combinations of attributes that appear together. The item name estimation unit 24 then uses the attribute combination information 33 and the attribute individual information 32 to calculate, for each set attribute combination, the frequency (appearance probability) with which a specific data item name appears, as shown in FIGS. 11A and 11B (step B6).

FIG. 11A is a diagram showing an example of the attribute combination information used in the embodiment of the present invention. FIG. 11B is a diagram showing an example of the appearance frequencies of data item names calculated in the embodiment. In FIG. 11B, the notation "estimation target attribute: [attribute combination]" denotes the attribute whose data item name is to be estimated within a given attribute combination. For example, "date: [place name, date]" means that, when the attribute combination is "place name, date", the data item name to be estimated is that of the "date" attribute.
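Step B6 can be sketched as a relative-frequency table keyed the same way as FIG. 11B, i.e. by (estimation target attribute, attribute combination). The observations below are hypothetical rows of attribute combination information 33; restricting candidates to names with the same attribute, which the embodiment does via the attribute individual information 32, happens implicitly here because the target attribute is part of the key.

```python
from collections import Counter, defaultdict

# Hypothetical attribute combination information 33: one row per learning-table
# column, (target attribute, attribute combination of its table, data item name).
OBSERVATIONS = [
    ("date", frozenset({"place name", "date"}), "observed_on"),
    ("date", frozenset({"place name", "date"}), "observed_on"),
    ("date", frozenset({"place name", "date"}), "sales_date"),
    ("place name", frozenset({"place name", "date"}), "city"),
    ("date", frozenset({"date", "amount"}), "sales_date"),
]


def appearance_probabilities(observations):
    """Map (target attribute, combination) -> {data item name: relative frequency}."""
    counts = defaultdict(Counter)
    for target, combination, name in observations:
        counts[(target, combination)][name] += 1
    result = {}
    for key, name_counts in counts.items():
        total = sum(name_counts.values())
        result[key] = {name: n / total for name, n in name_counts.items()}
    return result
```

With this data, the entry for "date: [place name, date]" gives `observed_on` a probability of 2/3 and `sales_date` 1/3.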

In the present embodiment, the item name estimation unit 24 creates, for example, an association rule that estimates, from a combination of attributes, the data item name of a data item having a certain attribute. The item name estimation unit 24 further refers to the attribute individual information 32 to identify the individuals (data item names) that have the same attribute as the estimated attribute, and uses the association rule to calculate, for each identified individual, the probability that it is selected as the data item name.

Association rules can be created, for example, by using a correlation-based algorithm such as those disclosed in Reference 1 or Reference 2 below.
(Reference 1) G. Piatetsky-Shapiro (1991). Discovery, analysis, and presentation of strong rules. In G. Piatetsky-Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases. AAAI/MIT Press, Cambridge, MA.
(Reference 2) R. Agrawal, T. Imielinski, and A. Swami (1993). Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 207-216.
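Reference 2 formulates association rules in terms of support and confidence. The sketch below is a brute-force illustration of rules of the form {estimated attribute, co-occurring attributes} → data item name, filtered by support and confidence; it is not the Apriori algorithm and not necessarily the procedure of the embodiment. The item encoding (`target=`, `with=`, `name=`), the thresholds, and the transactions are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: one per learning-table column. Each item set holds the
# column's own attribute ("target=..."), the other attributes in its table
# ("with=..."), and the column's data item name ("name=...").
TRANSACTIONS = [
    {"target=date", "with=place name", "name=observed_on"},
    {"target=date", "with=place name", "with=temperature", "name=observed_on"},
    {"target=date", "with=place name", "name=sales_date"},
    {"target=date", "with=amount", "name=sales_date"},
]


def mine_rules(transactions, min_support=0.2, min_confidence=0.6):
    """Brute-force rules {target, with...} -> name, kept if support and confidence pass."""
    n = len(transactions)
    antecedent_counts = Counter()
    rule_counts = Counter()
    for t in transactions:
        target = [i for i in t if i.startswith("target=")]
        context = sorted(i for i in t if i.startswith("with="))
        names = [i for i in t if i.startswith("name=")]
        # Every subset of the co-occurring attributes, always combined with the target.
        for k in range(len(context) + 1):
            for ctx in combinations(context, k):
                antecedent = frozenset(target + list(ctx))
                antecedent_counts[antecedent] += 1
                for name in names:
                    rule_counts[(antecedent, name)] += 1
    rules = []
    for (antecedent, name), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, name, support, confidence))
    return rules
```

Enumerating all subsets is exponential in the number of co-occurring attributes, which is acceptable for a handful of columns per table; larger inputs would call for a level-wise method such as Apriori.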

Next, the item name estimation unit 24 selects one data item of the target table. It then matches the combination of the estimated attribute of the selected data item and the other attributes that appear together with it against the calculation results obtained in step B6, and estimates the data item name with the highest appearance probability as the data item name of the selected data item (step B7).
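Step B7 amounts to an arg-max lookup. The sketch below assumes the step B6 results are available as a dictionary keyed by (target attribute, attribute combination), as in FIG. 11B, and that the attributes estimated in step B4 are given per column position. The probabilities and the name `estimate_item_names` are hypothetical; returning `None` for an unseen combination is a design choice of this sketch, not something the source specifies.

```python
# Hypothetical result of step B6: (target attribute, attribute combination)
# -> {candidate data item name: appearance probability}.
PROBABILITIES = {
    ("date", frozenset({"place name", "date"})): {"observed_on": 0.67, "sales_date": 0.33},
    ("place name", frozenset({"place name", "date"})): {"city": 0.8, "prefecture": 0.2},
}


def estimate_item_names(estimated_attributes, probabilities):
    """Pick the most probable data item name for each column of the target table.

    estimated_attributes maps column position -> attribute estimated in step B4.
    """
    # The attributes that appear together in this target table.
    combination = frozenset(estimated_attributes.values())
    result = {}
    for column, attr in estimated_attributes.items():
        candidates = probabilities.get((attr, combination), {})
        result[column] = max(candidates, key=candidates.get) if candidates else None
    return result
```

Iterating over every column inside the function corresponds to the loop over steps B7 and B8.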

Next, the item name estimation unit 24 determines whether data item names have been estimated for all data items (step B8). If not, the item name estimation unit 24 executes step B7 again.

If, on the other hand, data item names have been estimated for all data items, the item name estimation unit 24 causes the estimation result display unit 25 to display the results. This completes the processing in the data item name estimation apparatus 100.

In the present embodiment, the item name estimation unit 24 may instead form a neural network that takes the attribute combination information 33 as input and outputs data item names, and estimate the data item names using that neural network. Alternatively, the item name estimation unit 24 may take the attribute combination information 33 as input, construct a Bayesian estimator that estimates, from the combination of attributes of the data items, the data item name with the highest appearance rate, and estimate the data item names using the constructed estimator.
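The embodiment leaves the form of the Bayesian estimation open. One common choice is a naive Bayes classifier; the sketch below, offered as an assumption rather than as the method of the embodiment, scores each candidate data item name for a given estimated attribute by the prior P(name | attribute) times Laplace-smoothed likelihoods of the co-occurring attributes, and returns the highest-scoring name. The training rows, the class name, and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training rows: (target attribute, co-occurring attributes, data item name).
ROWS = [
    ("date", {"place name"}, "observed_on"),
    ("date", {"place name", "temperature"}, "observed_on"),
    ("date", {"amount"}, "sales_date"),
    ("date", {"amount", "place name"}, "sales_date"),
]


class NaiveBayesItemNamer:
    """Naive Bayes over co-occurring attributes, one model per target attribute."""

    def __init__(self, rows, alpha=1.0):
        self.alpha = alpha
        self.name_counts = defaultdict(Counter)  # target -> name -> count
        self.attr_counts = defaultdict(Counter)  # (target, name) -> co-attribute -> count
        self.vocab = set()
        for target, co_attrs, name in rows:
            self.name_counts[target][name] += 1
            for a in co_attrs:
                self.attr_counts[(target, name)][a] += 1
                self.vocab.add(a)

    def predict(self, target, co_attrs):
        """Most probable data item name, or None if the target attribute is unseen."""
        names = self.name_counts.get(target)
        if not names:
            return None
        total = sum(names.values())
        best, best_score = None, -math.inf
        for name, n in names.items():
            score = math.log(n / total)  # log prior P(name | target)
            counts = self.attr_counts[(target, name)]
            # +1 in the vocabulary size reserves probability mass for unseen attributes.
            denom = sum(counts.values()) + self.alpha * (len(self.vocab) + 1)
            for a in co_attrs:
                score += math.log((counts[a] + self.alpha) / denom)
            if score > best_score:
                best, best_score = name, score
        return best
```

Working in log space avoids underflow when many co-occurring attributes are multiplied together.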

As described above, the present embodiment can estimate the item name to be given to a data item without requiring knowledge of the data registered in the database. Therefore, when performing data analysis or data integration, data items that have the same meaning and can be integrated can be identified easily, even without knowledge of the data.

[Program]
The program in the present embodiment may be any program that causes a computer to execute steps A1 to A6 shown in FIG. 5 and steps B1 to B8 shown in FIG. 9. By installing this program on a computer and executing it, the data item name estimation apparatus 100 and the data item name estimation method of the present embodiment can be realized. In this case, the CPU (Central Processing Unit) of the computer functions as the learning processing unit 10 and the estimation processing unit 20 and performs the processing.

In the present embodiment, the storage unit 30 can be realized by storing the data files constituting the attribute information 31, the attribute individual information 32, and the attribute combination information 33 in a storage device, such as a hard disk, provided in the computer.

The program in the present embodiment may also be executed by a computer system built from a plurality of computers. In this case, for example, each computer may function as either the learning processing unit 10 or the estimation processing unit 20. The storage unit 30 may also be built on a computer other than the one that executes the program of the present embodiment.

Here, a computer that realizes the data item name estimation apparatus 100 by executing the program of the present embodiment will be described with reference to FIG. 12. FIG. 12 is a block diagram showing an example of a computer that implements the data item name estimation apparatus according to the embodiment of the present invention.

As shown in FIG. 12, the computer 110 includes a CPU 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader/writer 116, and a communication interface 117. These units are connected to one another via a bus 121 so that they can exchange data.

The CPU 111 performs various operations by loading the program (code) of the present embodiment, stored in the storage device 113, into the main memory 112 and executing it in a predetermined order. The main memory 112 is typically a volatile storage device such as DRAM (Dynamic Random Access Memory). The program of the present embodiment is provided stored on a computer-readable recording medium 120. It may also be distributed over the Internet, to which the computer is connected via the communication interface 117.

Specific examples of the storage device 113 include a hard disk drive and semiconductor storage devices such as flash memory. The input interface 114 mediates data transmission between the CPU 111 and an input device 118 such as a keyboard or mouse. The display controller 115 is connected to the display device 119 and controls the display on the display device 119.

The data reader/writer 116 mediates data transmission between the CPU 111 and the recording medium 120: it reads the program from the recording medium 120 and writes processing results of the computer 110 to the recording medium 120. The communication interface 117 mediates data transmission between the CPU 111 and other computers.

Specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic recording media such as a flexible disk, and optical recording media such as CD-ROM (Compact Disk Read Only Memory).

The data item name estimation apparatus 100 of the present embodiment can also be realized using hardware corresponding to each unit, instead of a computer on which the program is installed. Furthermore, part of the data item name estimation apparatus 100 may be realized by the program and the remaining part by hardware.

When performing data integration, data structures having the same data items can be estimated even without knowledge of the data structures of the data to be integrated or of the meanings of their data items.

DESCRIPTION OF SYMBOLS
10 Learning processing unit
11 Learning table receiving unit
12 Feature extraction unit
13 Attribute information creation unit
14 Attribute individual information creation unit
15 Attribute combination information creation unit
20 Estimation processing unit
21 Target table receiving unit
22 Feature extraction unit
23 Attribute estimation unit
24 Item name estimation unit
25 Estimation result display unit
26 Result editing unit
30 Storage unit
31 Attribute information
32 Attribute individual information
33 Attribute combination information
100 Data item name estimation device
110 Computer
111 CPU
112 Main memory
113 Storage device
114 Input interface
115 Display controller
116 Data reader/writer
117 Communication interface
118 Input device
119 Display device
120 Recording medium
121 Bus

Claims

A learning processing unit that, for each data item name in a learning table, extracts a feature value of the data to which the data item name is assigned, defines the relationship between the extracted feature value and the attribute corresponding to the data item name, identifies the corresponding data item name for each attribute, learns combinations of the attributes and the data item names in the learning table, and creates a learning model;
An estimation processing unit that collates the feature value of each data item of a target table with the definition made by the learning processing unit to estimate the attribute of each data item, applies the estimated attribute of each data item to the learning model, and estimates the data item name of each data item of the target table;
A data item name estimation device comprising:

The learning processing unit
creates attribute information that defines the relationship between the extracted feature value and the attribute corresponding to the data item name,
further creates, for each attribute included in the created attribute information, attribute individual information to which the corresponding data item name is assigned, and
then, using the attribute individual information, creates, as the learning model, attribute combination information indicating combinations of the attributes in the learning table and the data item names corresponding to each of those attributes,
The data item name estimation device according to claim 1.

The estimation processing unit
extracts a feature value for each data item from the records of the target table,
collates the extracted feature values with the attribute information to estimate the attribute of each data item,
then sets attribute combinations using the estimated attributes and, for each set attribute combination, calculates the frequency with which a specific data item name appears, using the attribute combination information and the attribute individual information, and
estimates the data item name of each data item of the target table based on the calculation result,
The data item name estimation device according to claim 2.

(a) a step of extracting, for each data item name in a learning table, a feature value of the data to which the data item name is assigned, defining the relationship between the extracted feature value and the attribute corresponding to the data item name, identifying the corresponding data item name for each attribute, learning combinations of the attributes and the data item names in the learning table, and creating a learning model;
(b) a step of collating the feature value of each data item of a target table with the definition obtained in step (a) to estimate the attribute of each data item, applying the estimated attribute of each data item to the learning model, and estimating the data item name of each data item of the target table;
A data item name estimation method characterized by comprising:

In step (a),
attribute information that defines the relationship between the extracted feature value and the attribute corresponding to the data item name is created,
for each attribute included in the created attribute information, attribute individual information to which the corresponding data item name is assigned is further created, and
then, using the attribute individual information, attribute combination information indicating combinations of the attributes in the learning table and the data item names corresponding to each of those attributes is created as the learning model,
The data item name estimation method according to claim 4.

In step (b),
a feature value for each data item is extracted from the records of the target table,
the extracted feature values are collated with the attribute information to estimate the attribute of each data item,
attribute combinations are then set using the estimated attributes and, for each set attribute combination, the frequency with which a specific data item name appears is calculated using the attribute combination information and the attribute individual information, and
the data item name of each data item of the target table is estimated based on the calculation result,
The data item name estimation method according to claim 5.

A program causing a computer to execute:
(a) a step of extracting, for each data item name in a learning table, a feature value of the data to which the data item name is assigned, defining the relationship between the extracted feature value and the attribute corresponding to the data item name, identifying the corresponding data item name for each attribute, learning combinations of the attributes and the data item names in the learning table, and creating a learning model; and
(b) a step of collating the feature value of each data item of a target table with the definition obtained in step (a) to estimate the attribute of each data item, applying the estimated attribute of each data item to the learning model, and estimating the data item name of each data item of the target table.

In step (a),
attribute information that defines the relationship between the extracted feature value and the attribute corresponding to the data item name is created,
for each attribute included in the created attribute information, attribute individual information to which the corresponding data item name is assigned is further created, and
then, using the attribute individual information, attribute combination information indicating combinations of the attributes in the learning table and the data item names corresponding to each of those attributes is created as the learning model,
The program according to claim 7.

In step (b),
a feature value for each data item is extracted from the records of the target table,
the extracted feature values are collated with the attribute information to estimate the attribute of each data item,
attribute combinations are then set using the estimated attributes and, for each set attribute combination, the frequency with which a specific data item name appears is calculated using the attribute combination information and the attribute individual information, and
the data item name of each data item of the target table is estimated based on the calculation result,
The program according to claim 8.