JP2012038066A

JP2012038066A - Data processor and data processing method and program

Info

Publication number: JP2012038066A
Application number: JP2010177296A
Authority: JP
Inventors: Kiyoto Hosoda; 聖人細田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2010-08-06
Filing date: 2010-08-06
Publication date: 2012-02-23
Anticipated expiration: 2030-08-06
Also published as: JP5398663B2

Abstract

PROBLEM TO BE SOLVED: To improve the efficiency of extracting columns that correspond to each other between two sets of two-dimensional data. SOLUTION: A delimiter division unit 11 selects a column pair to be analyzed in migration source data. An association rule calculation unit 12 selects column pairs to be analyzed in migration destination data, calculates a support and a confidence for each row of the column pair of the migration source data, and calculates a support and a confidence for each row of each column pair of the migration destination data. A correlation difference value calculation unit 13 calculates the differences in support and confidence between rows in the migration source data, and likewise between rows in each column pair of the migration destination data. A comparison calculation unit 14 calculates the difference between the difference values of the migration source data and those of the migration destination data, and a determination unit 15 determines, on the basis of that result, the migration destination column pair that corresponds to the migration source column pair.

Description

The present invention relates to a technique for analyzing the correspondence between columns, for example when integrating data.

Data integration that accompanies system integration requires establishing column correspondences between tables of different databases and reflecting the data contents in both.
Here, data integration means realizing data migration by resolving differences in design information, such as column names and data layout, between the migration source database and the migration destination database.
Techniques for identifying similar tables and corresponding columns between systems are called schema matching techniques.
The basic schema matching methods analyze schema information (column names, types, and so on) and instance information (patterns in which words and values appear, and so on).
A more advanced method identifies correspondences between sets of multiple columns.
Here, a correspondence between sets of multiple columns means a correspondence between one set of columns and another set of columns.
One such correspondence arises when the contents (data) of a column at the migration source of a system integration are split at a specific position and then mapped to multiple columns at the migration destination.
Specific examples include a telephone number that is handled in one column at the migration source and split at the area and exchange codes into three columns at the migration destination, and a full name that is handled in one column at the migration source and split into a last name and a first name in two columns at the migration destination.

The technique of Patent Document 1 determines the correspondence between sets of multiple columns by calculating association rules (support and confidence).
An association rule consists of the following two values, which indicate the correlation between an item A and an item B.
Confidence is the probability that someone who selects A also selects B.
Support is the probability that A and B appear together in the data as a whole.
In other words, confidence is the ratio of the number of records that contain both A and B to the number of records that contain A.
Support is the ratio of the number of records that contain both A and B to the total number of records.
The method of Patent Document 1 for determining the correspondence between sets of multiple columns is as follows.
Focusing on two columns in the same table, when one column is designated, association rules are calculated between it and another column in the same table to determine the correspondence between the two.
For example, for purposes such as market research, the technique of Patent Document 1 calculates the support and confidence between a column on wine purchases and a column on cheese purchases, which are separate columns, and extracts correlations such as a high probability that a person who buys wine also buys cheese.
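The two definitions above can be checked with a small calculation. The following is a minimal sketch, not the implementation of Patent Document 1: the function name `support_and_confidence` and the purchase records are hypothetical, and each record is assumed to be a pair of values (A, B).

```python
from collections import Counter

def support_and_confidence(rows):
    """Return {(a, b): (support, confidence)} for a list of (a, b) records.

    support    = records containing both a and b / all records
    confidence = records containing both a and b / records containing a
    """
    total = len(rows)
    pair_counts = Counter(rows)
    a_counts = Counter(a for a, _ in rows)
    return {
        (a, b): (n / total, n / a_counts[a])
        for (a, b), n in pair_counts.items()
    }

# Hypothetical purchase records: (wine column, cheese column)
records = [
    ("wine", "cheese"),
    ("wine", "cheese"),
    ("wine", "none"),
    ("none", "cheese"),
]
rules = support_and_confidence(records)
```

With this sample, the pair ("wine", "cheese") appears in 2 of 4 records and in 2 of the 3 records that contain "wine", so its support is 2/4 and its confidence is 2/3.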

Japanese Unexamined Patent Application Publication No. 2000-353163 (JP 2000-353163 A)

With the technique of Patent Document 1, both support and confidence are expected to be high between columns obtained by splitting the same data in two (for example, last name and first name).
In data integration, however, there are a migration source and a migration destination, and the column sets to be matched are normally located in separate tables.
Even assuming that the numbers of records are the same, the order of the records at the migration source is unrelated to and independent of the order at the migration destination, so the conventional method cannot find the correlation.
For example, suppose the column corresponding to "last name" at the migration source is joined with the column corresponding to "first name" at the migration destination and association rules are calculated. The joined data combines data that existed in separate tables rather than in the same record, so no particular relationship necessarily holds, the association rule calculation is unlikely to yield high values, and no determination can be made.
Moreover, Patent Document 1 yields only a conclusion such as that the first name "Ichiro" in the migration destination data is likely to appear for the last name "Sato" in the migration source data. It cannot extract the migration destination column that corresponds to a migration source column.

The main object of the present invention is to solve the above problem, that is, to improve the efficiency of extracting columns that correspond to each other between two sets of data.

A data processing apparatus according to the present invention includes:
a column pair selection processing execution unit that executes column pair selection processing for selecting, as a first analysis target column pair, a column pair to be analyzed in two-dimensional first data that contains a plurality of fields, each field belonging to one of a plurality of columns;
a first appearance tendency analysis processing execution unit that executes first appearance tendency analysis processing for analyzing the appearance tendency of concatenated field values obtained by concatenating, row by row, the field values of the columns of the first analysis target column pair;
a second appearance tendency analysis processing execution unit that executes second appearance tendency analysis processing for selecting, as second analysis target column pairs, one or more column pairs to be analyzed in two-dimensional second data that contains a plurality of fields, each field belonging to one of a plurality of columns, and for analyzing, for each second analysis target column pair, the appearance tendency of concatenated field values obtained by concatenating, row by row, the field values of the columns of that column pair; and
an approximation degree calculation processing execution unit that executes approximation degree calculation processing for analyzing the analysis result for the first analysis target column pair and the analysis result for each second analysis target column pair, and for calculating, for each second analysis target column pair, the degree of approximation to the appearance tendency of the concatenated field values in the first analysis target column pair.

According to the present invention, the appearance tendency of the concatenated field values of the first analysis target column pair of the first data and the appearance tendency of the concatenated field values of each second analysis target column pair of the second data are analyzed, and the analysis results are in turn analyzed to calculate, for each second analysis target column pair, the degree of approximation to the appearance tendency of the concatenated field values in the first analysis target column pair. This improves the efficiency of extracting the column pair in the second data that corresponds to the first analysis target column pair.

FIG. 1 is a diagram showing a configuration example of the table integration device according to Embodiment 1.
FIG. 2 is a diagram showing the relationship between the table integration device according to Embodiment 1 and the databases.
FIG. 3 is a flowchart showing an outline of the processing of the table integration device according to Embodiment 1.
FIG. 4 is a diagram showing an example of column division according to Embodiment 1.
FIG. 5 is a diagram explaining association rules according to Embodiment 1.
FIG. 6 is a flowchart showing the association rule calculation processing according to Embodiment 1.
FIG. 7 is a flowchart showing the correlation difference value calculation processing and the comparison calculation processing according to Embodiment 1.
FIG. 8 is a diagram showing an example of support calculation results according to Embodiment 1.
FIG. 9 is a diagram showing a configuration example of the table integration device according to Embodiment 2.
FIG. 10 is a diagram showing an example of column selection according to Embodiment 3.
FIG. 11 is a diagram showing a configuration example of the table integration device according to Embodiment 4.
FIG. 12 is a diagram showing an example of column selection according to Embodiment 4.
FIG. 13 is a diagram showing a configuration example of the table integration device according to Embodiment 5.
FIG. 14 is a diagram showing an example of the migration source data and migration destination data according to Embodiment 1.
FIG. 15 is a diagram showing a hardware configuration example of the table integration device according to Embodiments 1 to 5.

Embodiment 1.
FIG. 1 shows a configuration example of the table integration device according to this embodiment.
As shown in FIG. 1, the table integration device 1 is connected to the migration source database 502 of the migration source system 501 and to the migration destination database 602 of the migration destination system 601.
In this embodiment, the migration source database 502 and the migration destination database 602 are relational databases.
The table integration device 1 according to this embodiment determines which of the columns of the two-dimensional data in the migration destination database 602 (hereinafter, migration destination data) corresponds to a specific column of the two-dimensional data in the migration source database 502 (hereinafter, migration source data).
More specifically, the table integration device 1 determines which of the columns of the migration destination data correspond to the contents obtained by splitting the data held in a specific column of the migration source data.

Details are given later, but this embodiment targets, for example, the migration source data and migration destination data shown in FIG. 14.
FIG. 14(a) shows the migration source data, and FIG. 14(b) shows the migration destination data.
Both the migration source data and the migration destination data are two-dimensional data containing a plurality of fields, each field belonging to one of a plurality of columns.
In this embodiment, the system administrator column of the migration source data is the analysis target.
In the migration destination data, personal names appear in the user, applicant, and approver columns.
It is therefore necessary to identify whether the column corresponding to the system administrator column of the migration source is the user, the applicant, or the approver.
In the migration destination data, a user is a person who actually uses a given system whose use is permitted only when a usage application has been filed and approved.
An applicant is a person who filed the usage application for the system on behalf of the user. The applicant may be the user.
An approver is a person who approved the usage application for the system.
In the migration source data, the system administrator's last name and first name are stored in a single column, whereas in the migration destination data personal names are stored in separate "last name" and "first name" columns.
Therefore, for comparison with the migration destination data, the system administrator column of the migration source data must be split into a column representing the last name and a column representing the first name.
Here, the field values of two columns concatenated row by row are called a concatenated field value.
For example, the row-by-row concatenation of the field values of the "last name" column and the "first name" column obtained by splitting the migration source data (for example, "Sato" + "Ichiro") is a concatenated field value.
Similarly, the row-by-row concatenation of the field values of a "last name" column and a "first name" column of the migration destination data (for example, "Yamamoto" + "Ichiro") is also a concatenated field value.
In this embodiment, the migration source data is an example of the first data, and the migration destination data is an example of the second data.

As shown in FIG. 1, in the table integration device 1, the database connection unit 20 connects to the migration source database 502 and to the migration destination database 602.
As shown in FIG. 2, the database connection unit 20 loads connection information from the connection information holding unit 21 in the storage area 16, connects to the migration source database 502 in the migration source system 501, and connects to the migration destination database 602 in the migration destination system 601.
It then acquires the database definition information 101, the instance data 102, and the delimiter information 103, and outputs the database definition information 101 to the definition information acquisition unit 17, the instance data 102 to the data acquisition unit 18, and the delimiter information 103 to the delimiter information acquisition unit 19.

The definition information acquisition unit 17 acquires the database definition information 101 from the database connection unit 20 and stores it in the definition information holding unit 161 in the storage area 16.
For each of the migration source data and the migration destination data, the database definition information 101 indicates, for example, the number of columns, the attributes of each column, and the data type of each column.
The system from which the definition information acquisition unit 17 acquires the database definition information 101 may be a single system that includes a plurality of databases.

The data acquisition unit 18 acquires the instance data 102 from the database connection unit 20 and stores it in the acquired data holding unit 162 in the storage area 16.
The instance data 102 consists of the field values stored in the table of the migration source data and the field values stored in the table of the migration destination data.
If, for example, the database connection unit 20 is not connected to the migration source database 502 and the migration destination database 602 via a network, the instance data may be acquired offline via a recording medium.

The delimiter information acquisition unit 19 acquires the delimiter information 103 from the database connection unit 20 and stores it in the delimiter character information holding unit 163 in the storage area 16.
Because this embodiment describes an example in which the data values of a column of the migration source data are split in two, the delimiter information 103 indicates the delimiter character that marks where each data value of that column is to be split.

Based on the delimiter information 103 acquired by the delimiter information acquisition unit 19, the delimiter division unit 11 splits the contents of a specific column of the migration source data and stores the result in the divided data holding unit 164 as two virtual columns of data.
The column to be split is designated, for example, by the user through the user I/F 22.
The column pair (two virtual columns) that the delimiter division unit 11 stores in the divided data holding unit 164 is the column pair to be analyzed in the migration source data and is an example of the first analysis target column pair. A column pair is also called a column set.
The delimiter division unit 11 thus performs the processing of selecting the first analysis target column pair and is an example of the column pair selection processing execution unit.
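The splitting step performed by the delimiter division unit 11 can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `split_column`, the use of a single space as the delimiter character, and the sample administrator names are all hypothetical and are not taken from the delimiter information 103 or FIG. 14.

```python
def split_column(values, delimiter=" "):
    """Split each field value once at the delimiter into two virtual columns."""
    left, right = [], []
    for value in values:
        # partition() splits at the first occurrence of the delimiter only;
        # if the delimiter is absent, the second part is an empty string.
        head, _, tail = value.partition(delimiter)
        left.append(head)
        right.append(tail)
    return left, right

# Hypothetical "system administrator" column of the migration source data
administrators = ["Sato Ichiro", "Suzuki Jiro", "Sato Jiro"]
last_names, first_names = split_column(administrators)
```

The two returned lists keep the original row order, so row i of `last_names` and row i of `first_names` together form the concatenated field value of row i.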

The association rule calculation unit 12 obtains data from the divided data holding unit 164, performs the association rule calculation, and stores the result in the association rule calculation result holding unit 165.
More specifically, the association rule calculation unit 12 calculates a support and a confidence for each instance of the analysis target column pair of the migration source data (the first analysis target column pair), that is, for each concatenated field value obtained by concatenating, row by row, the field values of the columns of that column pair.
The support and the confidence represent the appearance tendency of each concatenated field value.
The support and the confidence are collectively called an association rule.
The association rule calculation unit 12 also selects one or more column pairs to be analyzed in the migration destination data.
In this embodiment, every combination of the columns included in the migration destination data is analyzed.
A column pair to be analyzed in the migration destination data is an example of a second analysis target column pair.
The association rule calculation unit 12 then calculates a support and a confidence for each instance of each analysis target column pair of the migration destination data (each second analysis target column pair), that is, for each concatenated field value obtained by concatenating, row by row, the field values of the columns of that column pair.
The support and the confidence in the migration destination data are calculated separately for each column pair.
The association rule calculation unit 12 is an example of the first appearance tendency analysis processing execution unit and of the second appearance tendency analysis processing execution unit.
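The per-column-pair calculation on the migration destination data can be sketched as follows. This is an illustration only: the function names, the table layout (a dict mapping column names to lists of field values), and the sample data are hypothetical. It also assumes that each column pair is taken once, in table column order, via `itertools.combinations`; the text states only that all column combinations are analyzed.

```python
from collections import Counter
from itertools import combinations

def rules_for_pair(col_a, col_b):
    """Support and confidence for each concatenated field value (a, b)."""
    rows = list(zip(col_a, col_b))          # row-by-row concatenation
    pair_counts = Counter(rows)
    a_counts = Counter(col_a)
    total = len(rows)
    return {
        ab: (n / total, n / a_counts[ab[0]])
        for ab, n in pair_counts.items()
    }

def rules_for_all_pairs(table):
    """Apply rules_for_pair to every column pair of a table."""
    return {
        (x, y): rules_for_pair(table[x], table[y])
        for x, y in combinations(table, 2)
    }

# Hypothetical migration destination data
destination = {
    "user_last": ["Sato", "Suzuki", "Sato"],
    "user_first": ["Ichiro", "Jiro", "Jiro"],
    "applicant_last": ["Yamamoto", "Yamamoto", "Tanaka"],
}
all_rules = rules_for_all_pairs(destination)
```

For the pair ("user_last", "user_first"), the concatenated field value ("Sato", "Ichiro") appears in 1 of 3 rows and in 1 of the 2 rows whose last name is "Sato", giving a support of 1/3 and a confidence of 1/2.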

The correlation difference value calculation unit 13 calculates, for each of the migration source data and the migration destination data, the difference values between the supports and between the confidences obtained by the association rule calculation unit 12, and stores the support and confidence difference values of the migration source data and of the migration destination data in the correlation difference calculation result holding unit 166.
More specifically, the correlation difference value calculation unit 13 calculates the difference in support between concatenated field values in the column pair of the migration source data. This difference corresponds to the first support primary difference value.
The correlation difference value calculation unit 13 also calculates the difference in confidence between concatenated field values in the column pair of the migration source data. This difference corresponds to the first confidence primary difference value.
The correlation difference value calculation unit 13 performs the same calculations for the migration destination data.
That is, for each column pair of the migration destination data, it calculates the difference in support between concatenated field values in that column pair. This difference corresponds to the second support primary difference value.
For each column pair of the migration destination data, it also calculates the difference in confidence between concatenated field values in that column pair. This difference corresponds to the second confidence primary difference value.
The correlation difference value calculation unit 13 then stores the values obtained in this way, namely the support difference values (first support primary difference values) and confidence difference values (first confidence primary difference values) of the migration source data and the support difference values (second support primary difference values) and confidence difference values (second confidence primary difference values) of the migration destination data, in the correlation difference calculation result holding unit 166.
Together with the comparison calculation unit 14 described below, the correlation difference value calculation unit 13 is an example of the approximation degree calculation processing execution unit.
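The primary difference values can be illustrated with the sketch below. The passage does not state which concatenated field values are paired or whether the difference is signed, so this sketch assumes signed differences over every unordered pair of distinct concatenated field values, taken in sorted order. The function name and the sample support and confidence values are hypothetical.

```python
from itertools import combinations

def primary_differences(rules):
    """Differences in support and confidence between concatenated field values.

    rules: {concatenated field value: (support, confidence)}
    Returns {(value_i, value_j): (support difference, confidence difference)}.
    """
    diffs = {}
    # Sorting fixes the order inside each key, so the same combination of
    # values yields the same key in source and destination data (assumption).
    for vi, vj in combinations(sorted(rules), 2):
        (s_i, c_i), (s_j, c_j) = rules[vi], rules[vj]
        diffs[(vi, vj)] = (s_i - s_j, c_i - c_j)
    return diffs

# Hypothetical association rules for the migration source column pair
source_rules = {
    ("Sato", "Ichiro"): (0.50, 1.0),
    ("Suzuki", "Jiro"): (0.25, 0.5),
}
source_diffs = primary_differences(source_rules)
```

Here the only combination is (("Sato", "Ichiro"), ("Suzuki", "Jiro")), with a support difference of 0.25 and a confidence difference of 0.5. Running the same function on the rules of each migration destination column pair gives the second primary difference values.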

The comparison calculation unit 14 compares the correlation difference calculation results of the migration source and the migration destination and writes the column pairs that are correspondence candidates to the comparison calculation result holding unit 167.
More specifically, for each column pair of the migration destination data, the comparison calculation unit 14 calculates the difference between the support difference value of the migration source data (first support primary difference value) and the support difference value of the migration destination data (second support primary difference value) that were calculated from the same combination of concatenated field values. This difference between the two support difference values corresponds to the support secondary difference value.
The comparison calculation unit 14 performs the same calculation for the confidence.
That is, for each column pair of the migration destination data, it calculates the difference between the confidence difference value of the migration source data (first confidence primary difference value) and the confidence difference value of the migration destination data (second confidence primary difference value) that were calculated from the same combination of concatenated field values. This difference between the two confidence difference values corresponds to the confidence secondary difference value.
The comparison calculation unit 14 then performs a summation and a quotient calculation on the calculated support difference values and on the calculated confidence difference values, integrates the support difference value and the confidence difference value obtained after the quotient calculation, and stores the integration result in the comparison calculation result holding unit 167.
For each column pair of the migration destination data, the integration result represents the degree of approximation to the appearance tendency of the concatenated field values in the column pair of the migration source data.
Together with the correlation difference value calculation unit 13 described above, the comparison calculation unit 14 is an example of the approximation degree calculation processing execution unit.

Based on the integration result (degree of approximation) calculated by the comparison calculation unit 14 for each column pair of the migration destination data, the determination unit 15 extracts, as corresponding candidate column pairs, the candidate column pairs of the migration destination data that correspond to the column pair of the migration source data.
The determination unit 15 is an example of a correspondence candidate extraction processing execution unit.

The delimiter division unit 11, the correlation rule calculation unit 12, the correlation difference value calculation unit 13, the comparison calculation unit 14, the determination unit 15, the definition information acquisition unit 17, the data acquisition unit 18, the delimiter information acquisition unit 19, and the database connection unit 20 can each be a program that implements the processing described above.
When each element is a program, a CPU (Central Processing Unit), not shown, executes the program of each element, and the processing described above is performed.

The storage area 16 is a data storage area realized by a memory or a hard disk.

Next, the operation will be described.
FIG. 3 is a flowchart showing an outline of the processing of the table integration device 1 shown in FIG. 1.
First, the outline of the processing of the table integration device 1 is described along the flowchart shown in FIG. 3.

First, in step S1, the definition information acquisition unit 17 acquires the column names and column order of the target table via the database connection unit 20 and stores them in the definition information holding unit 161.
The data acquisition unit 18 acquires the data of the migration source database 502 and the migration destination database 602 via the database connection unit 20 and stores it in the acquired data holding unit 162.
The delimiter information acquisition unit 19 receives, from the user via the user I/F 22, the designation of the column to be analyzed, acquires the delimiter character information via the database connection unit 20, and stores it in the delimiter character information holding unit 163.
Subsequently, the delimiter division unit 11 acquires the migration source data from the acquired data holding unit 162, divides each character string (field value) in the analysis target column designated by the user at the position of the acquired delimiter character, and stores the front part and the rear part, excluding the delimiter itself, separately in the divided data holding unit 164 (column pair selection process).
As described above, the two columns obtained by the division performed by the delimiter division unit 11 correspond to the first analysis target column pair.

Next, in step S2, the correlation rule calculation unit 12 performs the correlation rule calculation on both the migration source data (first appearance tendency analysis process) and the migration destination data (second appearance tendency analysis process).

Next, in step S3, the correlation difference value calculation unit 13 performs the correlation difference value calculation using the result of the correlation rule calculation (approximation degree calculation process).
Next, in step S4, the comparison calculation unit 14 uses the correlation difference value calculation result to perform the comparison calculation between the migration source and the migration destination (approximation degree calculation process).

Finally, in step S5, the determination unit 15 determines the correspondence between the migration source column and the migration destination column set, and the result is output.
Assumed output methods include file output, module output, and output through an interface.

The details of steps S1 to S4 are described below.

First, step S1 will be described.
The definition information acquisition unit 17 acquires the column names and column order of the target table via the database connection unit 20 and stores them in the definition information holding unit 161.
The data acquisition unit 18 acquires the migration source data from the migration source database 502 and the migration destination data from the migration destination database 602 via the database connection unit 20, and stores them in the acquired data holding unit 162.
The delimiter information acquisition unit 19 receives, from the user via the user I/F 22, the designation of the column to be analyzed, and acquires the delimiter character information via the database connection unit 20.
Subsequently, the delimiter division unit 11 acquires the migration source data from the acquired data holding unit 162, divides each character string (field value) in the analysis target column designated by the user at the position of the acquired delimiter character, and stores the front part and the rear part, excluding the delimiter itself, separately in the divided data holding unit 164.
As a specific example of delimiter division, consider the case where a full-width space is given as the delimiter character, as shown in FIG. 4.
In this case, each value in the “name” column of the migration source data is split at the first full-width space it contains, and the parts before and after that space are held as two separate virtual columns.
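The delimiter division of step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_field` and the sample values are hypothetical, and the behavior for a value that contains no delimiter (the rear part becomes an empty string) is an assumption, since the text does not specify it.

```python
FULL_WIDTH_SPACE = "\u3000"  # U+3000 IDEOGRAPHIC SPACE, the delimiter in the FIG. 4 example


def split_field(value, delimiter=FULL_WIDTH_SPACE):
    """Split a field value at the first occurrence of the delimiter.

    The delimiter itself is dropped; only the front and rear parts are kept.
    Assumption: a value without the delimiter yields an empty rear part.
    """
    front, _separator, rear = value.partition(delimiter)
    return front, rear


# Hypothetical "name" column values.
names = ["佐藤\u3000一郎", "鈴木\u3000一郎"]
virtual_columns = [split_field(name) for name in names]
print(virtual_columns)
```

`str.partition` splits only at the first match, which mirrors the rule that the first matching full-width space is used as the delimiter.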

Next, in step S2, the correlation rule calculation unit 12 calculates the support level and the certainty factor, which constitute the correlation rule.
Here, the correlation rule expresses the correlation between a target A and a target B by the following two values.
Certainty factor: the probability that someone who selects A also selects B
Support level: the proportion of the whole accounted for by the relation (the proportion of cases in which A and B appear together)
As described above, the certainty factor is the ratio of the number of records that contain both target A and target B to the number of records that contain target A.
The support level is the ratio of the number of records that contain both target A and target B to the total number of records.
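The two definitions can be sketched in a few lines. The function name `correlation_rules` and the record list are hypothetical; the records are chosen only so that “Sato → Ichiro” appears twice among five records, four of which have the front value “Sato”, and they are not taken from the figures.

```python
from collections import Counter


def correlation_rules(pairs):
    """Return {(front, rear): (support level, certainty factor)} for same-row pairs.

    support level    = count(front and rear together) / total number of records
    certainty factor = count(front and rear together) / count(front)
    """
    total = len(pairs)
    pair_counts = Counter(pairs)
    front_counts = Counter(front for front, _rear in pairs)
    return {
        (front, rear): (count / total, count / front_counts[front])
        for (front, rear), count in pair_counts.items()
    }


# Hypothetical records (front part, rear part).
records = [
    ("Sato", "Ichiro"),
    ("Sato", "Ichiro"),
    ("Sato", "Jiro"),
    ("Sato", "Saburo"),
    ("Suzuki", "Ichiro"),
]
support, certainty = correlation_rules(records)[("Sato", "Ichiro")]
print(support, certainty)  # 0.4 0.5
```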

A calculation example of the correlation rule is described with reference to FIG. 5.
For the concatenated field value “Sato → Ichiro” in the figure, the pair “Sato Ichiro” occurs twice among the 5 records in total, so the support level is calculated as 2/5 = 0.4.
For the same concatenated field value “Sato → Ichiro”, “Ichiro” occurs twice among the 4 occurrences of “Sato”, so the certainty factor is calculated as 2/4 = 0.5.
For the migration source data, the correlation rule calculation unit 12 calculates the support level and the certainty factor only for the analysis target column.
For example, in the case of the migration source data in FIG. 14(a), the support level and the certainty factor are calculated for each concatenation of the field values in the same row of the two columns obtained by dividing the name of the system administrator.
In the example of FIG. 14(a), the support level and the certainty factor are calculated for each of “Sato → Ichiro”, “Sato → Jiro”, “Sato → Saburo”, and “Suzuki → Ichiro”.
For the migration destination data, on the other hand, the correlation rule calculation unit 12 calculates the support level and the certainty factor for each concatenation of field values in the same row, for every combination of columns.
For example, in the case of the migration destination data in FIG. 14(b), the support level and the certainty factor are calculated for the combination of the user's “last name” and “first name” columns (“Yamamoto → Ichiro”, “Watanabe → Saburo”, etc.), the combination of the use applicant's “last name” and “first name” columns (“Ota → Minoru”, “Suzuki → Junko”, etc.), and the combination of the use authorizer's “last name” and “first name” columns (“Sato → Ichiro”, “Suzuki → Shiro”, etc.). They are also calculated for the cross combinations: the user's “last name” with the use applicant's “first name” (“Yamamoto → Minoru”, “Watanabe → Junko”, etc.), the user's “last name” with the use authorizer's “first name” (“Yamamoto → Ichiro”, “Watanabe → Ichiro”, etc.), the use applicant's “last name” with the user's “first name” (“Ota → Ichiro”, “Suzuki → Saburo”, etc.), the use applicant's “last name” with the use authorizer's “first name” (“Ota → Ichiro”, “Suzuki → Ichiro”, etc.), the use authorizer's “last name” with the user's “first name” (“Sato → Ichiro”, “Sato → Saburo”, etc.), and the use authorizer's “last name” with the use applicant's “first name” (“Sato → Minoru”, “Sato → Junko”, etc.).
If the migration destination data in FIG. 14(b) also contains a column such as the date and time of use, the support level and the certainty factor are likewise calculated for the combination of the user's last name and the date-and-time value (for example, “Yamamoto → July 10, 2010”).
Since such a combination clearly does not correspond to the analysis target columns of the migration source data (the combination of “last name” and “first name”), a setting may be made so that only combinations of “last name” and “first name” in the migration destination data are subject to the support level and certainty factor calculation.
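Enumerating the candidate column pairs on the migration destination side, with the optional restriction to “last name” and “first name” combinations, might look like the sketch below. The column names, the `_last` and `_first` naming convention, and the filter function are hypothetical; treating a column pair as an ordered (front, rear) pair is an assumption based on the “A → B” notation used in the text.

```python
from itertools import permutations


def column_pairs(columns, allowed=None):
    """Yield ordered (front, rear) column pairs, optionally filtered by `allowed`."""
    for front, rear in permutations(columns, 2):
        if allowed is None or allowed(front, rear):
            yield front, rear


# Hypothetical column names for a destination table like the one in FIG. 14(b).
columns = [
    "user_last", "user_first",
    "applicant_last", "applicant_first",
    "authorizer_last", "authorizer_first",
    "use_date",
]


def last_name_to_first_name(front, rear):
    """Hypothetical filter: keep only last-name -> first-name pairs."""
    return front.endswith("_last") and rear.endswith("_first")


all_pairs = list(column_pairs(columns))
name_pairs = list(column_pairs(columns, last_name_to_first_name))
print(len(all_pairs), len(name_pairs))  # 42 9
```

With the filter applied, the nine remaining pairs match the nine “last name” and “first name” combinations listed in the text, and pairs such as last name with date of use are skipped.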

In this way, the correlation rule calculation unit 12 calculates the support level and certainty factor pairs for the analysis target column pair of the migration source data and, for the migration destination data, for all data combinations of every column pair, and stores the calculation results in the correlation rule calculation result holding unit 165.

Next, the correlation rule calculation in step S2 is described in detail with reference to the flowchart of FIG. 6.

First, in step S201, reading of the column contents is started.
Subsequently, in step S202, the instance corresponding to the divided front part is read.
In the example of FIG. 5 (FIG. 14(a)), this corresponds to the data of the “last name” column.
Subsequently, in step S203, comparison with the storage area is started, and it is checked whether the data just read already exists in the storage area.
If the read data exists, the internal management variable is incremented by 1 in step S205.
If the read data does not exist, the data is registered in the internal storage area in step S206.
Here, data registration means registering the data in the internal storage area and associating it with a number that serves as an index, which improves search performance in the subsequent steps.
Subsequently, in steps S207 to S211, the instance of the rear part is read and processed in the same way as the front part.
Subsequently, in step S212, it is determined whether all instances have been read. If not, the next record in FIG. 5 (FIG. 14(a)) is read.
Specifically, the reading of the front part and the reading of the rear part corresponding to steps S202 to S211 are performed.
When reading of the divided data into the divided data holding unit 164 is complete, the support level calculation and sorting are performed on the read data in steps S213 to S217, and the result is then written to the correlation rule calculation result holding unit 165 in the correlation table format (FIG. 8).
For the migration destination data, the flow shown in FIG. 6 is performed for each combination of columns contained in the migration destination data.
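The register-or-count-up loop and the final sort can be sketched as follows. This is only one reading of the flow of FIG. 6: the function names and the records are hypothetical, plain dictionaries stand in for the internal storage area and its index, and sorting by descending support level is assumed from the later description of the correlation table.

```python
def register_or_count(table, key):
    """Count up if the key is already registered; otherwise register it with count 1."""
    if key in table:
        table[key] += 1
    else:
        table[key] = 1


def build_support_table(pairs):
    """Read every (front, rear) record, then return rows sorted by descending support."""
    front_counts, rear_counts, pair_counts = {}, {}, {}
    for front, rear in pairs:
        register_or_count(front_counts, front)        # front part
        register_or_count(rear_counts, rear)          # rear part, handled the same way
        register_or_count(pair_counts, (front, rear))
    total = len(pairs)
    rows = [
        (f"{front}→{rear}", count / total)
        for (front, rear), count in pair_counts.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows, front_counts, rear_counts


# Hypothetical records.
records = [
    ("Sato", "Ichiro"), ("Sato", "Jiro"), ("Sato", "Ichiro"),
    ("Sato", "Saburo"), ("Suzuki", "Ichiro"),
]
rows, front_counts, rear_counts = build_support_table(records)
for label, support in rows:
    print(label, support)
```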

FIG. 8 shows an example of the support levels calculated for the migration source data in FIG. 14(a) and the migration destination data in FIG. 14(b).
For reasons of drawing space, only three column combinations are shown for the migration destination data: the user's “last name” and “first name” columns ((A) in FIG. 8), the use applicant's “last name” and “first name” columns ((B) in FIG. 8), and the use authorizer's “last name” and “first name” columns ((C) in FIG. 8). In practice, the support levels for all column combinations are included.
The certainty factor is managed in the same format as in FIG. 8.

Next, the details of the calculation by the correlation difference value calculation unit 13 in step S3 and the calculation by the comparison calculation unit 14 in step S4 are described.
In steps S3 and S4, the calculation of the following formula (1) is performed on the correlation rule calculation result obtained in step S2 to calculate the correlation comparison intermediate result.
In formula (1), a_i is the i-th numerical value in the correlation table of the migration source data (for example, the i-th support level when the support levels of “last name → first name” are arranged in descending order).
b_i is the numerical value, in the migration destination data, for the same character string (combination of last name and first name) as the one corresponding to a_i.
For example, in FIG. 8, when i = 1, a_i is the value for “Sato → Ichiro”, which is 0.2. In the migration destination data, “Sato → Ichiro” does not exist in the first “last name → first name” column pair, so b_i is 0; it does not exist in the second “last name → first name” column pair either, so b_i is 0; it does exist in the third “last name → first name” column pair, where b_i is 0.2.

The correlation comparison intermediate result is calculated for both the support level and the certainty factor.

The details of steps S3 and S4 are described below with reference to the flowchart of FIG. 7.
Executing the flow of FIG. 7 amounts to performing the calculation of formula (1) above.
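Formula (1) itself is not reproduced in this text. Based on the verbal description of the comparison calculation given later in this section (subtract the migration source difference from the migration destination difference for each row combination, sum the results, take the absolute value, and divide by the number of combinations), one possible reading is sketched below. This is a reconstruction under assumptions, not the formula as published: the symbols D, P, and N are introduced here for illustration only.

```latex
D = \frac{1}{N}\left|\sum_{(i,j)\in P}\Bigl[(b_i - b_j) - (a_i - a_j)\Bigr]\right|
```

Here P is assumed to be the set of row combinations (i, j) taken from the correlation table of the migration source data, N is the number of combinations compared, and b_i is taken to be 0 when the corresponding character string does not exist in the migration destination column set.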

For the support level and the certainty factor, the correlation difference value calculation unit 13 calculates, in both the migration source data and the migration destination data, the correlation value differences obtained when the two-column set is viewed record by record.
In step S301, the correlation difference value calculation unit 13 prepares the entire correlation calculation result of the migration source data from the correlation rule calculation result holding unit 165 in a form that can be used for calculation.
Subsequently, in step S302, the correlation difference value calculation unit 13 performs the difference calculation on the combinations of correlation values in the migration source data.
In the difference calculation, the difference from every other row is obtained for each row of the migration source data.
For example, assuming that the correlation table of the migration source data consists of the three rows “Sato → Ichiro”, “Sato → Jiro”, and “Sato → Saburo” in FIG. 8, the following are calculated: the difference between the support level of the first row (“Sato → Ichiro”) and that of the second row (“Sato → Jiro”), the difference between the support level of the second row (“Sato → Jiro”) and that of the third row (“Sato → Saburo”), and the difference between the support level of the first row (“Sato → Ichiro”) and that of the third row (“Sato → Saburo”).
Likewise, the following are calculated: the difference between the certainty factor of the first row (“Sato → Ichiro”) and that of the second row (“Sato → Jiro”), the difference between the certainty factor of the second row (“Sato → Jiro”) and that of the third row (“Sato → Saburo”), and the difference between the certainty factor of the first row (“Sato → Ichiro”) and that of the third row (“Sato → Saburo”).
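The row-by-row difference calculation on the migration source side can be sketched with `itertools.combinations`. The function name is hypothetical, and the support levels for “Sato → Jiro” (0.1) and “Sato → Saburo” (0.05) are not stated directly in the text; they are inferred from the value 0.2 given for “Sato → Ichiro” and the differences listed in the worked example for FIG. 8.

```python
from itertools import combinations


def primary_differences(correlation_table):
    """Return {(row_i, row_j): value_i - value_j} for every pair of rows with i before j."""
    rows = list(correlation_table.items())
    return {
        (key_i, key_j): value_i - value_j
        for (key_i, value_i), (key_j, value_j) in combinations(rows, 2)
    }


# Support levels of a three-row source correlation table (values partly inferred).
source_support = {"Sato→Ichiro": 0.2, "Sato→Jiro": 0.1, "Sato→Saburo": 0.05}
source_diffs = primary_differences(source_support)
for pair, diff in source_diffs.items():
    print(pair, round(diff, 2))
```

The same function applies unchanged to a table of certainty factors.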

Subsequently, in step S303, the correlation difference value calculation unit 13 reads from the correlation rule calculation result holding unit 165 each row of one column set in the correlation table of the migration destination data (for example, (A) in FIG. 8: the set consisting of the column starting with “Yamamoto → Ichiro” and the column starting with “0.2”). In step S304, it reads one of the row combinations used for the correlation difference value calculation from the correlation table of the migration source data (for example, the combination of “Sato → Ichiro” and “Sato → Jiro” in FIG. 8).
Next, in steps S305 and S306, the correlation difference value calculation unit 13 searches the rows of the migration destination column set read in S303 for the same combination as the row combination read in step S304 (for example, the combination of “Sato → Ichiro” and “Sato → Jiro” in FIG. 8). If the same row combination exists, the correlation values of that combination are subtracted from each other on the migration destination side, for the same combination as in the migration source data (step S307).
If a row used in the difference value calculation for the migration source data (for example, “Sato → Ichiro” in FIG. 8) does not appear in the migration destination correlation table read in S303, the difference value is calculated by assigning 0 to both the support level and the certainty factor of that row.

In step S308, the correlation difference value calculation unit 13 determines whether the correlation table on the migration source side has been read up to the last combination (for example, the combination of “Sato → Ichiro” and “Sato → Saburo” in FIG. 8). If not, the next combination is read in step S304. If reading is complete, the comparison calculation unit 14 performs the comparison calculation in step S309.
The details of the comparison calculation in S309 are described later.
In step S310, it is determined whether the reading and calculation of S303 to S309 have been completed for all column sets in the correlation table on the migration destination side. If not, each row of another candidate column set is read (for example, (B) in FIG. 8: the set consisting of the column starting with “Ota → Minoru” and the column starting with “0.2”).
When all column sets have been read, the processing ends.

Next, a specific example of the calculations in step S3, which realizes the correlation difference value calculation, and step S4, which realizes the comparison calculation, is shown using FIG. 8.
The description here assumes that the correlation table of FIG. 8 consists of three rows.
The following describes the support level, but the same applies to the certainty factor.

(1) For the migration source data, the correlation difference value calculation unit 13 obtains the following set of distances and uses them as indices of the order relationship.
Difference between the 1st and 2nd rows (support level of “Sato → Ichiro” − support level of “Sato → Jiro”) = 0.1
Difference between the 2nd and 3rd rows (support level of “Sato → Jiro” − support level of “Sato → Saburo”) = 0.05
Difference between the 1st and 3rd rows (support level of “Sato → Ichiro” − support level of “Sato → Saburo”) = 0.15

(2) For each column pair of the migration destination data, the correlation difference value calculation unit 13 likewise obtains the distances for the combinations obtained in (1) above.
(A) Column set (A) in the migration destination data of FIG. 8
(support level of “Sato → Ichiro” − support level of “Sato → Jiro”) = 0
(support level of “Sato → Jiro” − support level of “Sato → Saburo”) = 0
(support level of “Sato → Ichiro” − support level of “Sato → Saburo”) = 0
(B) Column set (B) in the migration destination data of FIG. 8
(support level of “Sato → Ichiro” − support level of “Sato → Jiro”) = 0
(support level of “Sato → Jiro” − support level of “Sato → Saburo”) = 0
(support level of “Sato → Ichiro” − support level of “Sato → Saburo”) = 0
(C) Column set (C) in the migration destination data of FIG. 8
(support level of “Sato → Ichiro” − support level of “Sato → Jiro”) = 0.15
(support level of “Sato → Jiro” − support level of “Sato → Saburo”) = 0.05
(support level of “Sato → Ichiro” − support level of “Sato → Saburo”) = 0.2
In (A) and (B) of the migration destination calculation, none of “Sato → Ichiro”, “Sato → Jiro”, and “Sato → Saburo” exists in the column set, so the difference calculation treats each of their support levels as 0.
In (C) of the migration destination calculation, “Sato → Saburo” does not exist in the column set, so the difference calculation treats the support level of “Sato → Saburo” as 0.
That is, the value of a character string that exists in the migration source but not in the migration destination is treated as 0 in the calculation.
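The zero-fill rule can be sketched with a dictionary lookup that defaults to 0. The function name is hypothetical, and the contents of the two destination tables are illustrative: for (C), the values 0.2 and 0.05 are inferred from the differences listed above, and for (A), only the first row (“Yamamoto → Ichiro”, 0.2) mentioned in the text is included.

```python
def destination_differences(row_pairs, destination_table):
    """Compute value_i - value_j for each source row pair, treating missing rows as 0."""
    return {
        (key_i, key_j): destination_table.get(key_i, 0.0) - destination_table.get(key_j, 0.0)
        for key_i, key_j in row_pairs
    }


# Row combinations taken from the migration source correlation table.
row_pairs = [
    ("Sato→Ichiro", "Sato→Jiro"),
    ("Sato→Jiro", "Sato→Saburo"),
    ("Sato→Ichiro", "Sato→Saburo"),
]

# Illustrative destination support tables (values partly inferred, see lead-in).
column_set_a = {"Yamamoto→Ichiro": 0.2}
column_set_c = {"Sato→Ichiro": 0.2, "Sato→Jiro": 0.05}

diffs_a = destination_differences(row_pairs, column_set_a)
diffs_c = destination_differences(row_pairs, column_set_c)
print([round(value, 2) for value in diffs_a.values()])
print([round(value, 2) for value in diffs_c.values()])
```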

For each identical row combination, the comparison calculation unit 14 subtracts the difference value of (1) from the difference values of (2)(A) to (C), sums the results over the row combinations, takes the absolute value of the sum, and divides it by the number of matched combinations to obtain the distance value for the correlation (correlation comparison intermediate result).
The specific calculation results for the example above are as follows.

[Difference value of (2)(A)] − [difference value of (1)], and [difference value of (2)(B)] − [difference value of (1)]
Combination of “Sato → Ichiro” and “Sato → Jiro”: 0 − 0.1 = −0.1
Combination of “Sato → Jiro” and “Sato → Saburo”: 0 − 0.05 = −0.05
Combination of “Sato → Ichiro” and “Sato → Saburo”: 0 − 0.15 = −0.15
Absolute value of the sum = 0.3
Quotient = 0.3 / 3 = 0.1

[Difference value of (2)(C)] − [difference value of (1)]
Combination of “Sato → Ichiro” and “Sato → Jiro”: 0.15 − 0.1 = 0.05
Combination of “Sato → Jiro” and “Sato → Saburo”: 0.05 − 0.05 = 0
Combination of “Sato → Ichiro” and “Sato → Saburo”: 0.2 − 0.15 = 0.05
Absolute value of the sum = 0.1
Quotient = 0.1 / 3 ≈ 0.03
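The comparison calculation for the worked example can be sketched as follows. The function name and the short row labels are hypothetical; the difference values are the ones listed in (1) and (2) above, and the divisor is taken to be the number of row combinations compared (3 here).

```python
def intermediate_result(source_diffs, destination_diffs):
    """Sum (destination - source) over all row combinations, take |sum|, divide by the count."""
    total = sum(destination_diffs[pair] - source_diffs[pair] for pair in source_diffs)
    return abs(total) / len(source_diffs)


# Primary difference values of the support level from the worked example.
pairs = [("Ichiro", "Jiro"), ("Jiro", "Saburo"), ("Ichiro", "Saburo")]
source = dict(zip(pairs, [0.1, 0.05, 0.15]))
destination_a = dict(zip(pairs, [0.0, 0.0, 0.0]))   # same values apply to (B)
destination_c = dict(zip(pairs, [0.15, 0.05, 0.2]))

result_a = intermediate_result(source, destination_a)
result_c = intermediate_result(source, destination_c)
print(round(result_a, 2), round(result_c, 2))
```

Column set (C) yields the smaller value, so it is the closest to the migration source column pair in this example.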

As described above, this calculation is performed for both the support level and the certainty factor.
The comparison calculation unit 14 then integrates the correlation comparison intermediate results of the support level and the certainty factor by the following formula, using a specifiable variable α.
Integration result = α × (correlation comparison intermediate result of the support level) + (1 − α) × (correlation comparison intermediate result of the certainty factor)
The formula for obtaining the integration result may, however, take another form.
The integration result is stored in the comparison calculation result holding unit 167.
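The weighted integration is a one-line calculation, sketched below. The function name, the default α of 0.5, and the certainty factor intermediate result used in the example are all hypothetical; the text gives neither a default for α nor certainty factor values for the worked example.

```python
def integrate(support_result, certainty_result, alpha=0.5):
    """Weighted combination of the two correlation comparison intermediate results.

    alpha weights the support level; (1 - alpha) weights the certainty factor.
    The default of 0.5 is an arbitrary choice for illustration.
    """
    return alpha * support_result + (1 - alpha) * certainty_result


# 0.1 is the support level result for column set (A); 0.3 is a made-up certainty factor result.
print(round(integrate(0.1, 0.3), 2))
print(round(integrate(0.1, 0.3, alpha=1.0), 2))  # support level only
```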

Finally, in step S5, the determination unit 15 determines, based on the integration result, the migration destination column set that corresponds to the migration source column, and outputs the determination result.
One possible determination method is to extract the migration destination column set whose integration result is the smallest.
The form of output is not limited; output to a storage area, file output, screen output, and the like are assumed.
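The smallest-value determination can be sketched as a minimum over the integration results. The function name is hypothetical, and the values below are stand-ins: they reuse the support level intermediate results of the worked example (0.1 for (A) and (B), 0.1/3 for (C)), since the text gives no complete integration results. How ties are resolved is not stated in the text; this sketch returns the first minimum it encounters.

```python
def best_candidate(integration_results):
    """Return the column set whose integration result is smallest (closest match)."""
    return min(integration_results, key=integration_results.get)


# Stand-in integration results per migration destination column set.
integration_results = {"A": 0.1, "B": 0.1, "C": 0.1 / 3}
print(best_candidate(integration_results))  # C
```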

As described above, in this embodiment the calculation is performed on two-dimensional tables for multiple column sets, separately for the migration source and the migration destination, so that the two results can be compared. One-to-many correspondences between multiple databases, or within a single database, can therefore be grasped easily.

As described above, this embodiment has described a table integration device comprising:
delimiter division means that, for one column in a first table of interest among a set of multiple two-dimensional tables, divides the data in the column at a specifiable position;
correlation rule calculation means that calculates correlation rules as correlation values between the divided data;
correlation difference value calculation means that calculates difference values between pairs of elements of the correlation value set;
comparison calculation means that, for a column set in a separate second table of interest, performs the correlation rule calculation between the data held in the columns, performs the correlation difference value calculation between pairs of elements of the correlation value set, and compares the resulting correlation difference values with those of the first table; and
multi-column correspondence determination means that determines, from the comparison result, whether the column of interest corresponds to multiple columns.
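
The first two means can be sketched as follows. The sketch assumes the usual association-rule definitions, which may differ in detail from the definitions used in this description: support is the share of rows containing a given (first part, second part) pair, and confidence is the share of rows with that first part that also have that second part. The function names, the space delimiter, and the sample names are hypothetical.

```python
from collections import Counter


def split_column(values, delimiter=" "):
    """Split each field value once at the delimiter into (first part, second part)."""
    return [tuple(v.split(delimiter, 1)) for v in values if delimiter in v]


def correlation_table(pairs):
    """Map each (x, y) pair to (support, confidence) for the rule x -> y."""
    total = len(pairs)
    pair_counts = Counter(pairs)
    first_counts = Counter(x for x, _ in pairs)
    return {
        (x, y): (n / total, n / first_counts[x])
        for (x, y), n in pair_counts.items()
    }


# Hypothetical "full name" column of the migration source data.
names = ["Sato Taro", "Sato Taro", "Sato Jiro", "Tanaka Taro"]
table = correlation_table(split_column(names))
```

Here the pair ("Sato", "Taro") appears in 2 of 4 rows, and 2 of the 3 "Sato" rows have "Taro" as the second part.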

This embodiment has also described a table integration device comprising
correlation difference value calculation means in which the intra-table difference value calculation takes as its comparison target the data combinations in Table 2 that have the same combinations as the data sets in Table 1,
and in which data that does not exist in Table 2 is given a correlation value of 0 for the difference value calculation.
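
The zero-fill rule can be sketched as follows, assuming each table's correlation values (for example, support) are held in a dictionary keyed by data set. The function name and sample values are hypothetical; the data sets of Table 1 drive the pairing, and a data set absent from Table 2 contributes a correlation value of 0.

```python
from itertools import combinations


def difference_values(reference_keys, values):
    """Difference of correlation values for every pair of data sets in reference_keys.

    Data sets absent from `values` are treated as having correlation value 0.
    """
    return {
        (a, b): values.get(a, 0.0) - values.get(b, 0.0)
        for a, b in combinations(reference_keys, 2)
    }


# Hypothetical support values; ("Sato", "Jiro") is missing from Table 2.
table1 = {("Sato", "Taro"): 0.5, ("Sato", "Jiro"): 0.25, ("Tanaka", "Taro"): 0.25}
table2 = {("Sato", "Taro"): 0.4, ("Tanaka", "Taro"): 0.3}

keys = list(table1)
diff1 = difference_values(keys, table1)
diff2 = difference_values(keys, table2)
```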

This embodiment has also described a table integration device comprising
comparison calculation means that compares the results of the intra-table correlation difference value calculation between Table 1 and Table 2 as follows: from each value in one correlation difference value set, subtract the correlation difference value of the other set that has the same data set; take the absolute value of the sum of these differences;
and divide it by the number of data sets that match between Table 1 and Table 2. The resulting value (the correlation comparison intermediate result) is used for the comparison.
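
One reading of this calculation is sketched below. The function name and sample values are hypothetical, and the number of matched data sets is passed in as a parameter because the text does not pin down exactly how it is counted.

```python
def intermediate_result(diff_src, diff_dst, matched_count):
    """Absolute value of the summed differences, divided by the matched data set count."""
    if matched_count <= 0:
        raise ValueError("matched_count must be positive")
    # Subtract the destination difference value that has the same data set key.
    total = sum(diff_src[key] - diff_dst[key] for key in diff_src if key in diff_dst)
    return abs(total) / matched_count


# Hypothetical correlation difference values keyed by data set pair.
diff_src = {("a", "b"): 0.25, ("a", "c"): 0.10}
diff_dst = {("a", "b"): 0.20, ("a", "c"): 0.20}
result = intermediate_result(diff_src, diff_dst, matched_count=2)
```

Because the differences are summed before the absolute value is taken, positive and negative differences can cancel; a variant that sums absolute differences would behave differently and is not what the text describes.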

This embodiment has also described a table integration device comprising
comparison calculation means that takes the correlation comparison intermediate results between Table 1 and Table 2 for the support and confidence given as correlation values, and uses the value given by the following formula with a specifiable variable α
α × (correlation comparison intermediate result for support) + (1 − α) × (correlation comparison intermediate result for confidence)
to determine the correspondence between multiple column sets.

This embodiment has also described a table integration device having delimiter division means, correlation rule calculation means, correlation difference value calculation means, comparison calculation means, and determination means that take a column set in Table 2 as input and determine whether it corresponds to the contents obtained by dividing the multi-column data in Table 1.

This embodiment has also described a table integration method comprising:
a delimiter division step of dividing, for one column in a first table of interest among a set of multiple two-dimensional tables, the data in the column at a specifiable position;
a correlation rule calculation step of calculating correlation rules as correlation values between the divided data;
a correlation difference value calculation step of calculating difference values between pairs of elements of the correlation value set;
a comparison calculation step of performing, for a column set in a separate second table of interest, the correlation rule calculation between the data held in the columns, performing the correlation difference value calculation between pairs of elements of the correlation value set, and comparing the resulting correlation difference values with those of the first table; and
a multi-column correspondence determination step of determining, from the comparison result, whether the column of interest corresponds to multiple columns.

This embodiment has described the case in which the analysis target column of the migration source data is divided into two.
However, if the analysis target columns of the migration source data have the same structure as the migration destination columns, for example if both the migration source data and the migration destination data consist of a "last name" column and a "first name" column, the migration source column does not need to be divided.

Embodiment 2.
FIG. 9 shows a configuration example of the table integration device 1 according to this embodiment.
In addition to the configuration shown in FIG. 1, the table integration device 1 according to this embodiment has, as shown in FIG. 9, a threshold acquisition unit 31, a calculation target designation variable acquisition unit 32, a threshold holding unit 168, and a correlation difference calculation target designation variable holding unit 169.
In FIG. 9, elements with the same reference numerals as in FIG. 1 are the same as those described in Embodiment 1, and their description is omitted.
FIG. 9 omits the migration source system 501, the migration source database 502, the migration destination system 601, and the migration destination database 602 shown in FIG. 1.

In this embodiment, whether multiple column sets correspond is determined by applying a variable m, acquired by the threshold acquisition unit 31, to the comparison calculation results produced by the comparison calculation unit 14.
In Embodiment 1, the determination unit 15 extracts the column pair with the smallest integrated result as the candidate column pair corresponding to the migration source column.
In this embodiment, by contrast, the threshold m is variable, and the threshold acquisition unit 31 acquires its value from a user, an application program, or the like.
The determination unit 15 then extracts the candidate column pairs whose integrated result is less than or equal to the acquired value of m, in ascending order of integrated result.
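
The threshold-based extraction can be sketched as follows, assuming the integrated results are held in a dictionary keyed by destination column pair. The function name, column names, and scores are hypothetical.

```python
def candidates_within_threshold(integrated, m):
    """Return (pair, score) entries with score <= m, smallest score first."""
    return sorted(
        ((pair, score) for pair, score in integrated.items() if score <= m),
        key=lambda item: item[1],
    )


# Hypothetical integrated results for destination column pairs.
integrated_results = {
    ("A", "C"): 0.12,
    ("B", "C"): 0.03,
    ("A", "B"): 0.40,
}
candidates = candidates_within_threshold(integrated_results, m=0.2)
```

Unlike the smallest-value rule of Embodiment 1, this can return several candidates, or none when every integrated result exceeds m.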

Also in this embodiment, the correlation calculation target designation variable k (k > 1) acquired by the calculation target designation variable acquisition unit 32 limits the correlation difference value calculation to the top k entries of the correlation table.
In Embodiment 1, difference values of support and confidence are calculated for all combinations of rows in both the migration source data and the migration destination data.
For example, if the correlation table (FIG. 8) has 100 rows for both the migration source data and the migration destination data, difference values of support and confidence are calculated between each row and each of the other 99 rows.
In this embodiment, by contrast, the calculation target designation variable acquisition unit 32 acquires the value of k from a user, an application program, or the like, and the correlation difference value calculation unit 13 performs the difference calculation over the number of rows given by the acquired value of k.
For example, if k = 10, then for both the migration source data and the migration destination data, the first row of the correlation table is compared with each of rows 2 to 11, yielding 10 difference values each for support and confidence.
Similarly, the second row of the correlation table is compared with each of rows 3 to 12, yielding 10 difference values each for support and confidence.
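
The k = 10 example can be sketched as a sliding window over the correlation table, in which each row is compared only with the k rows that follow it. The function name and the sample values are hypothetical, and shortening the window near the end of the table is an assumption, since the text does not say how the last rows are handled.

```python
def windowed_differences(values, k):
    """For each row, the differences against the next k rows of the correlation table.

    values: correlation values (support or confidence) in table order.
    """
    if k <= 1:
        raise ValueError("the description specifies k > 1")
    return [
        [values[i] - values[j] for j in range(i + 1, min(i + 1 + k, len(values)))]
        for i in range(len(values))
    ]


# Hypothetical correlation table with 100 rows.
values = [float(i) for i in range(100)]
diffs = windowed_differences(values, k=10)
```

In this sketch the first row (index 0) yields 10 difference values, against rows 2 to 11 in the 1-based numbering used above, and the total is 945 differences per measure instead of one for every combination of rows.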

The threshold m and the correlation calculation target designation variable k may be specified by the user, or passed in through a file, an argument from another module, or the like.

As described above, in this embodiment the threshold m makes it possible to set the criterion for the determination, and the variable k narrows down the number of calculation targets, which reduces the amount of computation.

This embodiment has described a table integration device comprising comparison calculation means that narrows down candidates by applying a specifiable threshold m to the result calculated with the variable α from the correlation comparison intermediate result for support and the correlation comparison intermediate result for confidence.

This embodiment has also described a table integration device comprising correlation difference value calculation means that uses a specifiable variable k and performs the intra-table correlation difference value calculation on the combinations of two values chosen from the set of the top k correlation values.

Embodiment 3.
FIG. 11 shows a configuration example of the table integration device 1 according to this embodiment.
In addition to the configuration shown in FIG. 1, the table integration device 1 according to this embodiment has, as shown in FIG. 11, a schema information analysis unit 41 and a schema information analysis result holding unit 170.
The schema information analysis unit 41 narrows down the determination by applying schema information to the result of step S4 of Embodiment 1 and taking the order of the column names into account.
That is, it extracts the candidate column pairs by referring to the order in which the columns are arranged in the migration destination data, together with the integrated result value for each column pair in the migration destination data.
In this embodiment, the schema information analysis unit 41, together with the determination unit 15, is an example of a correspondence candidate extraction processing execution unit.
In FIG. 11, elements with the same reference numerals as in FIG. 1 are the same as those described in Embodiment 1, and their description is omitted.
FIG. 11 omits the migration source system 501, the migration source database 502, the migration destination system 601, and the migration destination database 602 shown in FIG. 1.

In this embodiment, when the delimiter division is performed in step S1, the order of the parts before and after the delimiter is held in the definition information holding unit 161.
That is, the definition information holding unit 161 records that the first part is the "last name" and the second part is the "first name".
FIG. 10 shows a specific example of Embodiment 3.
For each of the "last name" part and the "first name" part obtained by dividing the "administrator" column of the migration source data, the corresponding column must be determined from among column A, column B, and column C of the migration destination data.
Suppose that the results up to step S4 show that the "last name" part of the "administrator" column corresponds to column B, but the column corresponding to the "first name" part has not been identified.
In this case, the schema information analysis unit 41 acquires the order of the divided parts of the "administrator" column from the definition information holding unit 161 and establishes the order of "last name" and "first name".
It also acquires the order information of the migration destination data from the definition information holding unit 161 and compares the two to establish the correspondence.
In the example shown in FIG. 10, based on the information that "last name" appears first and "first name" appears second, the "last name"-"first name" pair is determined to correspond to the "column B"-"column C" pair.
The result produced by the schema information analysis unit 41 is stored in the schema information analysis result holding unit 170 and is also input to the determination unit 15.
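
The order-based resolution in the FIG. 10 example can be sketched as follows. The sketch assumes that an unresolved part maps to the destination column at the same relative offset from an already resolved part, which is one simple way to use the order information and is not stated as the rule in this description. The function name and argument names are hypothetical.

```python
def resolve_by_order(source_parts, destination_columns, known):
    """Fill in unresolved source parts from their position relative to a known match.

    source_parts: divided parts in source order, e.g. ["last name", "first name"].
    destination_columns: destination column names in schema order.
    known: mapping of already resolved parts to destination columns.
    Returns a mapping for every part; None means no candidate exists.
    """
    if not known:
        raise ValueError("at least one resolved part is required as an anchor")
    resolved = dict(known)
    anchor_part, anchor_column = next(iter(known.items()))
    anchor_src = source_parts.index(anchor_part)
    anchor_dst = destination_columns.index(anchor_column)
    for i, part in enumerate(source_parts):
        if part in resolved:
            continue
        j = anchor_dst + (i - anchor_src)  # same relative offset in the destination
        in_range = 0 <= j < len(destination_columns)
        resolved[part] = destination_columns[j] if in_range else None
    return resolved


mapping = resolve_by_order(
    ["last name", "first name"], ["A", "B", "C"], {"last name": "B"}
)
```

When the offset falls outside the destination schema, the part is mapped to None, corresponding to the case where no candidate exists.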

As described above, in this embodiment, using the column definition information makes it possible to narrow down the targets even among similar columns and to determine the correspondence between multiple column sets.

As described above, this embodiment has described a table integration device comprising:
definition information acquisition means that takes as input the set of correspondence candidates between multiple column sets output by the determination means, and acquires column order information from the database definition information; and
schema information analysis means that uses the column order information to settle on a single correspondence candidate from the set of candidates between the multiple column sets, or to conclude that no corresponding candidate exists.

Embodiment 4.
FIG. 11 shows a configuration example of the table integration device 1 according to this embodiment.
As shown in FIG. 11, the table integration device 1 according to this embodiment has a single column data analysis unit 42 and a single column data analysis result holding unit 171.
In FIG. 11, elements with the same reference numerals as in FIG. 1 are the same as those described in Embodiment 1, and their description is omitted.
FIG. 11 omits the migration source system 501, the migration source database 502, the migration destination system 601, and the migration destination database 602 shown in FIG. 1.

The single column data analysis unit 42 uses an existing instance analysis technique to narrow down the correspondence between the contents of the migration source table and the contents of the migration destination table, and stores the result in the single column data analysis result holding unit 171.
In this embodiment, the single column data analysis unit 42, together with the correlation rule calculation unit 12, is an example of a second appearance tendency analysis processing execution unit.

In the single column data analysis, a value is calculated by a known instance analysis technique for each of the two pieces of data divided by the delimiter division unit 11, a value is calculated by a known instance analysis technique for each column of the migration destination table, and the two are compared.
A specific example is described with reference to FIG. 12.
The single column data analysis unit 42 focuses on the "last name" column obtained by dividing the migration source column at the delimiter.
The single column data analysis unit 42 counts the number of occurrences of each instance of the field values in the "last name" column, for example 10 occurrences of the instance "Sato" and 9 occurrences of the instance "Tanaka", and sorts the instances in descending order of occurrence count.
The same counting and sorting is then performed for every column of the migration destination.
Finally, the correlation rule calculation unit 12 selects a migration destination column as an analysis target if, among the top N instances, at least a fixed number of instances appear in both the migration source data and the migration destination data.
In the example of FIG. 12, column A and column B of the migration destination data are selected as analysis targets, while column C is not.
After this, for the migration destination data, the correlation rule calculation unit 12 combines column A, which is a "last name" column, with a "first name" column (not shown) and calculates support and confidence, and likewise combines column B, which is also a "last name" column, with a "first name" column (not shown) and calculates support and confidence.
The subsequent processing is as described in Embodiment 1, and its description is omitted.
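
The column pre-selection can be sketched as follows. The sketch uses a simple frequency count as the instance analysis technique; the function names, the parameters `n` and `min_matches`, and all sample data are hypothetical and only mirror the shape of the FIG. 12 example.

```python
from collections import Counter


def top_instances(values, n):
    """The n most frequent field values, in descending order of occurrence count."""
    return [value for value, _ in Counter(values).most_common(n)]


def select_columns(source_values, destination, n, min_matches):
    """Destination columns sharing at least min_matches of their top-n instances."""
    source_top = set(top_instances(source_values, n))
    return [
        name
        for name, values in destination.items()
        if len(source_top & set(top_instances(values, n))) >= min_matches
    ]


# Hypothetical "last name" part of the migration source column.
source = ["Sato"] * 10 + ["Tanaka"] * 9 + ["Suzuki"] * 3
# Hypothetical migration destination columns.
destination = {
    "A": ["Sato"] * 8 + ["Tanaka"] * 7 + ["Ito"] * 2,
    "B": ["Tanaka"] * 5 + ["Sato"] * 4 + ["Suzuki"],
    "C": ["Taro"] * 6 + ["Jiro"] * 5,
}
selected = select_columns(source, destination, n=2, min_matches=2)
```

With this data, columns A and B share both of their top two instances with the source and are kept, while column C shares none and is excluded from the later support and confidence calculation.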

As described above, in this embodiment candidates are narrowed down by instance comparison on a per-column basis, which reduces the amount of computation in the correlation rule calculation and the correlation difference value calculation.

This embodiment has described a table integration device comprising:
single column data analysis means that infers, by comparing data contents, which columns of another table correspond to a column of the table of interest; and
correlation rule calculation means that uses that result to select the targets of the correlation value calculation in the correlation rule calculation.

Embodiment 5.
FIG. 13 shows a configuration example of the table integration device 1 according to this embodiment.
The table integration device 1 according to this embodiment takes one column set as the migration destination data of interest and treats multiple columns of the migration source data as determination targets. To determine whether the combined data contents of a migration source column set exist in the migration destination, it has, as shown in FIG. 13, a delimiter division unit 11b, a correlation rule calculation unit 12b, a correlation difference value calculation unit 13b, a comparison calculation unit 14b, and a determination unit 15b, each of which has a function of reading multiple migration source column sets for one piece of migration destination information.
That is, Embodiment 1 covers the case in which the migration source data has a "full name" column and the migration destination data has multiple "last name" and "first name" columns: the "full name" column of the migration source data is divided into a "last name" column and a "first name" column, and candidate pairs of "last name" and "first name" columns corresponding to the divided pair are extracted from the migration destination data.
Embodiment 5, by contrast, covers the case in which the migration destination data has a "full name" column and the migration source data has multiple "last name" and "first name" columns: the "full name" column of the migration destination data is divided into a "last name" column and a "first name" column, and candidate pairs of "last name" and "first name" columns corresponding to the divided pair are extracted from the migration source data.

The table integration device 1 according to this embodiment may also use the schema information analysis unit 41 and the single column data analysis unit 42 at the same time.

As described above, this embodiment has a configuration that determines the correspondence between multiple migration source columns and a migration destination column, which makes it possible to identify the migration destination column that holds the combined contents of the data in the multiple migration source columns.

Finally, a hardware configuration example of the table integration device 1 described in Embodiments 1 to 5 is explained.
FIG. 15 shows an example of the hardware resources of the table integration device 1 described in Embodiments 1 to 5.
The configuration in FIG. 15 is only one example of the hardware configuration of the table integration device 1; the hardware configuration is not limited to that shown in FIG. 15 and may be a different configuration.

In FIG. 15, the table integration device 1 includes a CPU 911 (Central Processing Unit; also called a central processing device, processing device, arithmetic device, microprocessor, microcomputer, or processor) that executes programs.
The CPU 911 is connected via a bus 912 to, for example, a ROM (Read Only Memory) 913, a RAM (Random Access Memory) 914, a communication board 915, a display device 901, a keyboard 902, a mouse 903, and a magnetic disk device 920, and controls these hardware devices.
The CPU 911 may also be connected to an FDD 904 (Flexible Disk Drive), a compact disc device 905 (CDD), a printer device 906, and a scanner device 907. A storage device such as an SSD (Solid State Drive), an optical disc device, or a memory card (registered trademark) read/write device may be used instead of the magnetic disk device 920.
The RAM 914 is an example of volatile memory. The storage media of the ROM 913, the FDD 904, the CDD 905, and the magnetic disk device 920 are examples of nonvolatile memory. These are examples of storage devices.
The "storage area 16" described in Embodiments 1 to 5 is implemented by the RAM 914, the magnetic disk device 920, and the like.
The communication board 915, the keyboard 902, the mouse 903, the scanner device 907, the FDD 904, and the like are examples of input devices.
The communication board 915, the display device 901, the printer device 906, and the like are examples of output devices.

The communication board 915 is connected to, for example, a LAN (local area network), the Internet, a WAN (wide area network), or a SAN (storage area network).

The magnetic disk device 920 stores an operating system 921 (OS), a window system 922, a program group 923, and a file group 924.
The CPU 911 executes the programs in the program group 923 using the operating system 921 and the window system 922.

The RAM 914 temporarily stores at least part of the operating system 921 program and the application programs to be executed by the CPU 911.
The RAM 914 also stores various data required for processing by the CPU 911.

The ROM 913 stores a BIOS (Basic Input Output System) program, and the magnetic disk device 920 stores a boot program.
When the table integration device 1 starts up, the BIOS program in the ROM 913 and the boot program in the magnetic disk device 920 are executed, and these programs start the operating system 921.

The program group 923 stores programs that execute the functions described as "... unit" and "... means" in Embodiments 1 to 5. The programs are read and executed by the CPU 911.

The file group 924 stores, as items of "... file" or "... database", the information, data, signal values, variable values, and parameters that indicate the results of the processing described in Embodiments 1 to 5 as "reading of ...", "judgment of ...", "determination of ...", "calculation of ...", "computation of ...", "comparison of ...", "evaluation of ...", "analysis of ...", "update of ...", "setting of ...", "registration of ...", "selection of ...", "extraction of ...", "input of ...", "output of ...", and so on.
The "... file" and "... database" items are stored on a recording medium such as a disk or memory.
The information, data, signal values, variable values, and parameters stored on a storage medium such as a disk or memory are read into main memory or cache memory by the CPU 911 via a read/write circuit.
The information, data, signal values, variable values, and parameters that have been read are used in CPU operations such as extraction, search, reference, comparison, computation, calculation, processing, editing, output, printing, and display.
During these CPU operations, the information, data, signal values, variable values, and parameters are temporarily stored in main memory, registers, cache memory, buffer memory, and the like.
The arrows in the flowcharts described in Embodiments 1 to 5 mainly indicate the input and output of data and signals.
Data and signal values are recorded on recording media such as the memory of the RAM 914, the flexible disk of the FDD 904, the compact disc of the CDD 905, the magnetic disk of the magnetic disk device 920, and other optical discs, MiniDiscs, and DVDs.
Data and signals are also transmitted online via the bus 912, signal lines, cables, and other transmission media.

In the description of Embodiments 1 to 5, anything described as a "~ unit" or "~ means" may be a "~ circuit", "~ device", or "~ equipment", and may also be a "~ step", "~ procedure", or "~ process".
That is, the data processing method according to the present invention can be realized by the steps, procedures, and processes shown in the flowcharts described in Embodiments 1 to 5.
Anything described as a "~ unit" or "~ means" may also be realized by firmware stored in the ROM 913.
Alternatively, it may be implemented by software alone; by hardware alone, such as elements, devices, boards, and wiring; by a combination of software and hardware; or further in combination with firmware.
Firmware and software are stored as programs on a recording medium such as a magnetic disk, flexible disk, optical disc, compact disc, MiniDisc, or DVD.
The programs are read and executed by the CPU 911.
That is, the programs cause a computer to function as the "~ units" and "~ means" of Embodiments 1 to 5, or cause a computer to execute the procedures and methods of the "~ units" and "~ means" of Embodiments 1 to 5.

As described above, the table integration device 1 shown in Embodiments 1 to 5 is a computer that includes a CPU as a processing device; a memory, a magnetic disk, and the like as storage devices; a keyboard, a mouse, a communication board, and the like as input devices; and a display device, a communication board, and the like as output devices.
As noted above, the functions described as "~ units" and "~ means" are realized using these processing, storage, input, and output devices.

DESCRIPTION OF SYMBOLS
1 table integration device, 11 delimiter division unit, 12 correlation rule calculation unit, 13 correlation difference value calculation unit, 14 comparison calculation unit, 15 determination unit, 16 storage area, 17 definition information acquisition unit, 18 data acquisition unit, 19 delimiter information acquisition unit, 20 database connection unit, 21 connection information holding unit, 22 user I/F, 31 threshold acquisition unit, 32 calculation target designation variable acquisition unit, 41 schema information analysis unit, 42 single column data analysis unit, 101 database definition information, 102 instance data, 103 delimiter information, 161 definition information holding unit, 162 acquired data holding unit, 163 delimiter character information holding unit, 164 divided data holding unit, 165 correlation rule calculation result holding unit, 166 correlation difference calculation result holding unit, 167 comparison calculation result holding unit, 168 threshold holding unit, 169 correlation difference calculation target designation variable holding unit, 170 schema information analysis result holding unit, 171 single column data analysis result holding unit, 501 migration source system, 502 migration source database, 601 migration destination system, 602 migration destination database.

Claims

A data processing apparatus comprising:
a column pair selection processing execution unit that executes a column pair selection process of selecting, from two-dimensional first data that contains a plurality of fields each belonging to one of a plurality of columns, a column pair to be analyzed as a first analysis target column pair;
a first appearance tendency analysis processing execution unit that executes a first appearance tendency analysis process of analyzing the appearance tendency of concatenated field values, each obtained by concatenating, row by row, the field values of the columns of the first analysis target column pair;
a second appearance tendency analysis processing execution unit that executes a second appearance tendency analysis process of selecting, from two-dimensional second data that contains a plurality of fields each belonging to one of a plurality of columns, one or more column pairs to be analyzed as second analysis target column pairs, and analyzing, for each second analysis target column pair, the appearance tendency of concatenated field values, each obtained by concatenating, row by row, the field values of the columns of that second analysis target column pair; and
an approximation degree calculation processing execution unit that executes an approximation degree calculation process of analyzing the analysis result for the first analysis target column pair and the analysis result for each second analysis target column pair, and calculating, for each second analysis target column pair, a degree of approximation to the appearance tendency of the concatenated field values in the first analysis target column pair.

The data processing apparatus according to claim 1, wherein
the first appearance tendency analysis processing execution unit calculates an appearance frequency for each concatenated field value in the first analysis target column pair,
the second appearance tendency analysis processing execution unit calculates, for each second analysis target column pair, an appearance frequency for each concatenated field value in that second analysis target column pair, and
the approximation degree calculation processing execution unit analyzes the calculated appearance frequency of each concatenated field value in the first analysis target column pair and the calculated appearance frequency of each concatenated field value in each second analysis target column pair, and calculates, for each second analysis target column pair, the degree of approximation to the appearance tendency of the concatenated field values in the first analysis target column pair.

The data processing apparatus according to claim 2, wherein
the first appearance tendency analysis processing execution unit calculates, as the appearance frequency of the concatenated field values, at least one of a support level and a certainty factor for each concatenated field value in the first analysis target column pair, and
the second appearance tendency analysis processing execution unit calculates, for each second analysis target column pair, as the appearance frequency of the concatenated field values, at least one of a support level and a certainty factor for each concatenated field value in that second analysis target column pair.
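
The support level and certainty factor named in this claim can be illustrated with a minimal Python sketch. It assumes the standard association-rule definitions, which the claim itself does not spell out: support is the share of rows carrying a given concatenated value, and the certainty factor (usually called confidence) is the share of rows with the same left-hand value that also carry the right-hand value. The function name `appearance_tendency` and the sample rows are hypothetical, and a tuple stands in for the concatenated field value.

```python
from collections import Counter


def appearance_tendency(rows):
    """Return support and certainty for each (left, right) value of one column pair.

    Assumed definitions (not stated in the claim):
      support   = count(left, right) / number of rows
      certainty = count(left, right) / count(left)
    """
    total = len(rows)
    pair_counts = Counter(rows)                      # occurrences of each concatenated value
    left_counts = Counter(left for left, _ in rows)  # occurrences of each left-hand value
    return {
        pair: {
            "support": count / total,
            "certainty": count / left_counts[pair[0]],
        }
        for pair, count in pair_counts.items()
    }


# Hypothetical rows of one column pair (prefecture, rank).
rows = [("Tokyo", "A"), ("Tokyo", "A"), ("Tokyo", "B"), ("Osaka", "A")]
stats = appearance_tendency(rows)
```

Running the same function on a migration-source column pair and on every candidate migration-destination column pair yields the per-value figures that the later claims compare.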

The data processing apparatus according to claim …, wherein the approximation degree calculation processing execution unit:
calculates the difference between the calculated values of concatenated field values in the first analysis target column pair as a first primary difference value, thereby obtaining a plurality of first primary difference values for the first analysis target column pair;
calculates, for each second analysis target column pair, the difference between the calculated values of concatenated field values in that second analysis target column pair as a second primary difference value, thereby obtaining a plurality of second primary difference values for each second analysis target column pair;
calculates, for each second analysis target column pair, the difference between a first primary difference value and a second primary difference value that were calculated from the same combination of concatenated field values as a secondary difference value, thereby obtaining secondary difference values for each second analysis target column pair; and
calculates, for each second analysis target column pair, the degree of approximation to the appearance tendency of the concatenated field values in the first analysis target column pair by using the secondary difference values.

The data processing apparatus according to claim 4, wherein,
when a concatenated field value that exists in the first analysis target column pair does not exist in a second analysis target column pair, the approximation degree calculation processing execution unit sets the calculated value of that concatenated field value in the second analysis target column pair to 0 and calculates the second primary difference value.
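
The primary and secondary difference values, including the rule that a missing concatenated value counts as 0, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the claimed implementation: every pair of concatenated values is compared (the claims also allow a limited number k of partners), and the secondary differences are aggregated by summing their absolute values, an aggregation the claims do not specify. The names `primary_differences` and `approximation_score` and the sample figures are hypothetical.

```python
from itertools import combinations


def primary_differences(values, keys):
    """Difference of the calculated values for every pair of concatenated field values.

    A value missing from `values` is treated as 0, as in the claim above.
    """
    return {
        (a, b): values.get(a, 0.0) - values.get(b, 0.0)
        for a, b in combinations(keys, 2)
    }


def approximation_score(source_values, target_values):
    """Sum of absolute secondary differences (assumed aggregation; smaller means closer)."""
    keys = sorted(source_values)  # concatenated values that occur in the source column pair
    first = primary_differences(source_values, keys)
    second = primary_differences(target_values, keys)
    secondary = {combo: first[combo] - second[combo] for combo in first}
    return sum(abs(diff) for diff in secondary.values())


# Hypothetical support levels keyed by concatenated field value.
source = {"a": 0.5, "b": 0.3, "c": 0.2}
same_target = {"a": 0.5, "b": 0.3, "c": 0.2}
other_target = {"a": 0.2, "b": 0.2}  # "c" is absent and therefore counted as 0

score_same = approximation_score(source, same_target)
score_other = approximation_score(source, other_target)
```

With these figures the identical target scores 0, while the target lacking "c" scores higher, so ranking candidate column pairs by ascending score puts the closest appearance tendency first.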

The data processing apparatus according to claim 4, wherein
the first appearance tendency analysis processing execution unit calculates, as the appearance frequency of the concatenated field values, the support level and the certainty factor for each concatenated field value in the first analysis target column pair,
the second appearance tendency analysis processing execution unit calculates, for each second analysis target column pair, as the appearance frequency of the concatenated field values, the support level and the certainty factor for each concatenated field value in that second analysis target column pair, and
the approximation degree calculation processing execution unit:
calculates the difference in support level between concatenated field values in the first analysis target column pair as a first support level primary difference value, thereby obtaining a plurality of first support level primary difference values for the first analysis target column pair;
calculates, for each second analysis target column pair, the difference in support level between concatenated field values in that second analysis target column pair as a second support level primary difference value, thereby obtaining a plurality of second support level primary difference values for each second analysis target column pair;
calculates, for each second analysis target column pair, the difference between a first support level primary difference value and a second support level primary difference value that were calculated from the same combination of concatenated field values as a support level secondary difference value;
calculates the difference in certainty factor between concatenated field values in the first analysis target column pair as a first certainty factor primary difference value, thereby obtaining a plurality of first certainty factor primary difference values for the first analysis target column pair;
calculates, for each second analysis target column pair, the difference in certainty factor between concatenated field values in that second analysis target column pair as a second certainty factor primary difference value, thereby obtaining a plurality of second certainty factor primary difference values for each second analysis target column pair;
calculates, for each second analysis target column pair, the difference between a first certainty factor primary difference value and a second certainty factor primary difference value that were calculated from the same combination of concatenated field values as a certainty factor secondary difference value; and
calculates, for each second analysis target column pair, the degree of approximation to the appearance tendency of the concatenated field values in the first analysis target column pair by using the support level secondary difference values and the certainty factor secondary difference values.

The data processing apparatus according to any one of claims … to 6, wherein the approximation degree calculation processing execution unit:
calculates, for each concatenated field value in the first analysis target column pair, k first primary difference values with respect to k (k > 1) other concatenated field values; and
calculates, for each concatenated field value in the second analysis target column pair, k second primary difference values with respect to k other concatenated field values.

The data processing apparatus according to any one of claims 1 to 7, wherein
the second appearance tendency analysis processing execution unit selects, as second analysis target column pairs, the column pairs of all combinations that can be formed from the plurality of columns included in the second data.

The data processing apparatus according to claim 1, wherein
the second appearance tendency analysis processing execution unit analyzes the field values in each column included in the second data and, based on the analysis result, selects a specific number of column pairs as second analysis target column pairs.

The data processing apparatus according to claim 1, wherein
the column pair selection processing execution unit divides a specific column included in the first data into two and selects the two columns obtained by the division as the first analysis target column pair.
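
Dividing one column into two can be sketched in Python as below. The sketch assumes the division is made at the first occurrence of a delimiter character, in line with the delimiter information mentioned in the description. The claim itself does not fix the splitting rule, and the function name `split_column`, the hyphen delimiter, and the sample values are hypothetical.

```python
def split_column(values, delimiter):
    """Split each field value once, at the first delimiter, into a (left, right) pair."""
    pairs = []
    for value in values:
        # str.partition returns (head, separator, tail); tail is "" if no delimiter is found.
        left, _, right = value.partition(delimiter)
        pairs.append((left, right))
    return pairs


# Hypothetical single column holding two pieces of information per field.
addresses = ["Tokyo-Chiyoda", "Osaka-Kita", "Nagoya"]
pairs = split_column(addresses, "-")
```

Each resulting tuple plays the role of one row of the first analysis target column pair, and a field without the delimiter yields an empty right-hand value.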

The data processing apparatus according to claim 1, further comprising:
a correspondence candidate extraction processing execution unit that executes a correspondence candidate extraction process of extracting, based on the degree of approximation calculated for each second analysis target column pair by the approximation degree calculation processing execution unit, from among the second analysis target column pairs that are candidates for corresponding to the first analysis target column pair, correspondence candidate column pairs whose degree of approximation is less than or equal to a specifiable variable m.

The data processing apparatus according to claim 11, wherein
the correspondence candidate extraction processing execution unit extracts the correspondence candidate column pairs by referring to the arrangement order of the columns in the second data together with the degree of approximation for each second analysis target column pair.

A data processing method comprising:
a column pair selection process in which a computer selects, from two-dimensional first data that contains a plurality of fields each belonging to one of a plurality of columns, a column pair to be analyzed as a first analysis target column pair;
a first appearance tendency analysis process in which the computer analyzes the appearance tendency of concatenated field values, each obtained by concatenating, row by row, the field values of the columns of the first analysis target column pair;
a second appearance tendency analysis process in which the computer selects, from two-dimensional second data that contains a plurality of fields each belonging to one of a plurality of columns, one or more column pairs to be analyzed as second analysis target column pairs, and analyzes, for each second analysis target column pair, the appearance tendency of concatenated field values, each obtained by concatenating, row by row, the field values of the columns of that second analysis target column pair; and
an approximation degree calculation process in which the computer analyzes the analysis result for the first analysis target column pair and the analysis result for each second analysis target column pair, and calculates, for each second analysis target column pair, a degree of approximation to the appearance tendency of the concatenated field values in the first analysis target column pair.

A program that causes a computer to execute:
a column pair selection process of selecting, from two-dimensional first data that contains a plurality of fields each belonging to one of a plurality of columns, a column pair to be analyzed as a first analysis target column pair;
a first appearance tendency analysis process of analyzing the appearance tendency of concatenated field values, each obtained by concatenating, row by row, the field values of the columns of the first analysis target column pair;
a second appearance tendency analysis process of selecting, from two-dimensional second data that contains a plurality of fields each belonging to one of a plurality of columns, one or more column pairs to be analyzed as second analysis target column pairs, and analyzing, for each second analysis target column pair, the appearance tendency of concatenated field values, each obtained by concatenating, row by row, the field values of the columns of that second analysis target column pair; and
an approximation degree calculation process of analyzing the analysis result for the first analysis target column pair and the analysis result for each second analysis target column pair, and calculating, for each second analysis target column pair, a degree of approximation to the appearance tendency of the concatenated field values in the first analysis target column pair.