JP6253601B2

JP6253601B2 - Data linkage estimation device, data linkage estimation method, and program

Info

Publication number: JP6253601B2
Application number: JP2015011570A
Authority: JP
Inventors: 佐藤 彰洋; 河端 理華
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2015-01-23
Filing date: 2015-01-23
Publication date: 2017-12-27
Anticipated expiration: 2035-01-23
Also published as: JP2016136354A

Description

The present invention relates to a data linkage estimation device, a data linkage estimation method, and a program.

Database integration required by system integration and similar projects, and the exchange of data between multiple systems, call for data linkage between the databases of different systems or between separate tables in the same database. Data linkage requires defining the linkage source tables, that is, which columns of which tables supply the data for each linkage destination table. Defining a data linkage relationship requires identifying the relationships between tables, namely their reference relationships, and also identifying the logical position of each table.

Non-Patent Document 1 discloses a technique that reads the queries issued as data operations on a database and automatically extracts reference relationships between tables from the join conditions contained in those queries.
Non-Patent Document 2 discloses a technique that extracts reference relationships between tables by schema matching, using only metadata such as table definition information.

Non-Patent Document 1: Hazem Elmeleegy, Jaewoo Lee, El Kindi Rezig, Mourad Ouzzani, and Ahmed Elmagarmid, "U-MAP: A System for Usage-Based Schema Matching and Mapping," SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, 2011.
Non-Patent Document 2: 佐藤彰洋, 鹿島理華, 谷垣宏一, 山足光義, "スキーマ構成文字列と主キー制約情報に基づく外部参照関係の推定" (Estimation of external reference relationships based on schema constituent strings and primary key constraint information), The 28th Annual Conference of the Japanese Society for Artificial Intelligence, 2014.

A technique that relies on issued queries, as in Non-Patent Document 1, has the problem that the range of systems to which it can be applied is limited by the number of issued queries available and by the applications that mediate query issuance, which makes it difficult to apply to all systems.
In addition, because issued queries must be prepared before data linkage relationships can be recommended automatically, the technique cannot be used when, for example, the linkage destination table is newly created and no issued queries exist yet.

In Non-Patent Document 2, schema matching is a technique for finding similar data items, so when the databases of multiple systems are processed, many synonymous columns and similar reference relationships may exist. As a result, many relationships are detected that are correct correspondences from the standpoint of schema matching but cannot be used as data linkage relationships.
Furthermore, Non-Patent Document 2 does not consider a requirement specific to the automatic recommendation of data linkage relationships: the logical position of each table, that is, whether it is a master table or a transaction table, must be recognized before a linkage source table group that meets the data linkage requirements is generated.

An object of the present invention is to accurately generate a linkage source table group that meets the data linkage requirements, with the logical position of each table recognized.

A data linkage estimation device according to the present invention includes:
a reference relationship extraction unit that extracts reference relationship data representing column reference relationships from a table group including a plurality of tables having columns;
a schema matching processing unit that calculates the similarity between columns of two tables extracted from the table group and extracts, as schema matching result data, the pairs of columns of the two tables whose similarity is equal to or greater than a threshold;
a data relationship extraction unit that determines, based on the reference relationship data and the schema matching result data, the column characteristic of each column of each of the plurality of tables and generates the result of the determination as inheritance relationship data;
a relationship extraction unit that calculates, based on the inheritance relationship data, a table score representing the table characteristic of each table and generates relationship information that associates the calculated table score with the table; and
a matching result integration unit that acquires linkage destination table definition information defining a linkage destination table to be searched for, the linkage destination table being included in the table group, and extracts, based on the relationship information, candidates for the linkage source tables of the linkage destination table as a linkage source candidate table group.

In the data linkage estimation device according to the present invention, the data relationship extraction unit determines the column characteristic of each column of each table and generates the result as inheritance relationship data; the relationship extraction unit generates, based on the inheritance relationship data, relationship information that associates each table with a table score representing its table characteristic; and the matching result integration unit extracts, based on the relationship information, candidates for the linkage source tables of the linkage destination table as a linkage source candidate table group. The device can therefore accurately generate a linkage source candidate table group that meets the data linkage requirements, with the table characteristic, which is the logical position of each table, recognized.

FIG. 1 is a block configuration diagram of the data linkage estimation device according to Embodiment 1.
FIG. 2 is a hardware configuration diagram of the data linkage estimation device according to Embodiment 1.
FIG. 3 is a flowchart showing the operation of the data linkage estimation method and the data linkage estimation process of the data linkage estimation device according to Embodiment 1.
FIG. 4 is a configuration diagram of the database definition information 1211 according to Embodiment 1.
FIG. 5 is a configuration diagram of the reference relationship data 161 according to Embodiment 1.
FIG. 6 is a flowchart of the schema matching process s130 and the data relationship extraction process s130a according to Embodiment 1.
FIG. 7 is a configuration diagram of the schema matching result data 163 according to Embodiment 1.
FIG. 8 is a configuration diagram of the inheritance relationship data 162 according to Embodiment 1.
FIG. 9 is a configuration diagram of the search reference source table and the search reference destination table according to Embodiment 1.
FIG. 10 is a flowchart of the relationship extraction process s140 according to Embodiment 1.
FIG. 11 shows the score calculation by the relationship extraction unit according to Embodiment 1: (a) is the score calculation formula, and (b) is a calculation example of the score 151 based on inheriting and inherited items, the score 152 based on unique items, and the table score 1714.
FIG. 12 is a configuration diagram of the relationship information according to Embodiment 1.
FIG. 13 is a flowchart of the matching result integration process s150 according to Embodiment 1.
FIG. 14 is an input example of the linkage destination table definition information according to Embodiment 1.
FIG. 15 is a flowchart of the linkage source table group search process s155 according to Embodiment 1.
FIG. 16 is a specific example of the linkage source candidate column group 181 according to Embodiment 1.
FIG. 17 is a specific example of the combination 182 of linkage source candidate column groups according to Embodiment 1.
FIG. 18 is a specific example of the table pairs 183 included in a combination of linkage source candidate column groups according to Embodiment 1.
FIG. 19 is a specific example of the node information 191, the edge information 192, and the shortest path information 193 used by the shortest path search unit 19 according to Embodiment 1.
FIG. 20 is a specific example of the combination 184 of shortest path information according to Embodiment 1.
FIG. 21 is a specific example of the column group 185 that can be linked from the shortest paths according to Embodiment 1.
FIG. 22 is a specific example of the node information 186 on the shortest path according to Embodiment 1.
FIG. 23 is an output example of linkage source table definition information with master tables prioritized according to Embodiment 1.
FIG. 24 is an output example of linkage source table definition information with transaction tables prioritized according to Embodiment 1.
FIG. 25 shows an overall view of the table configuration example according to Embodiment 1.
FIG. 26 shows the table pairs 183 included in a combination of linkage source candidate column groups according to Embodiment 1 and the search results expected from the shortest path search unit.
FIG. 27 illustrates a method of generating combinations of an arbitrary number of items of the shortest path information 193 and searching for a combination of shortest path information in which, for each identifier of the linkage source candidate column group, at least one column indicated by that identifier exists on the paths.
FIG. 28 is a calculation example of the total value according to Embodiment 1.

Embodiment 1.
*** Description of configuration ***
The configuration of the data linkage estimation device 100 according to this embodiment is described with reference to FIG. 1.
The data linkage estimation device 100 includes a table definition extraction unit 11, a reference relationship extraction unit 12, a data relationship extraction unit 13, a schema matching processing unit 14, a relationship extraction unit 15, a data storage unit 16, a relationship information storage unit 17, a matching result integration unit 18, and a shortest path search unit 19.
Linkage source database definition information 101, linkage destination database definition information 102, a display device 200, and an input device 300 are connected to the data linkage estimation device 100.

The display device 200 may be a program that displays the results returned by the data linkage estimation device 100 on a screen, or another device that uses the returned results as input values.
The input device 300 may be a keyboard, mouse, or communication board for inputting table definition information to the data linkage estimation device 100, or a device or program that outputs table definition information.

The linkage source database definition information 101 and the linkage destination database definition information 102 store database definition information 1211 that defines the databases of the multiple systems serving as linkage sources and linkage destinations.
The linkage source database definition information 101 and the linkage destination database definition information 102 are examples of a table group 1000 including a plurality of tables having columns.

The table definition extraction unit 11 extracts the database definition information 1211 from the linkage source database definition information 101 and the linkage destination database definition information 102 and outputs it to the reference relationship extraction unit 12 and the data relationship extraction unit 13. The database definition information 1211 defines a plurality of tables.

The reference relationship extraction unit 12 extracts reference relationship data 161 representing column reference relationships from the table group 1000 including a plurality of tables having columns. Specifically, it acquires the database definition information 1211 from the table definition extraction unit 11, extracts the reference relationship data 161, which describes the semantic reference relationship between any two tables, and stores the data in the data storage unit 16.

The data relationship extraction unit 13 acquires the database definition information 1211 from the table definition extraction unit 11, generates combinations of any two tables, and passes the table definition information of each table in each combination to the schema matching processing unit 14. The data relationship extraction unit 13 is also referred to as a unique/inheritance relationship extraction unit.

The schema matching processing unit 14 calculates the similarity between columns of two tables extracted from the table group 1000 and extracts, as schema matching result data 163, the pairs of columns whose similarity is equal to or greater than a threshold. It calculates the similarity as a schema matching score 1639 by executing a schema matching process.
The schema matching processing unit 14 takes each two-table combination from the data relationship extraction unit 13 as input, performs the schema matching process, and outputs the schema matching result data 163.
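The thresholding step described above can be sketched as follows. This is only an illustration under stated assumptions: the text here does not say how the similarity is computed, so the sketch substitutes a character-level ratio of column names from Python's `difflib`. The function names, the English column names, and the threshold value 0.8 are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def column_similarity(name_a: str, name_b: str) -> float:
    """Assumed similarity measure: character-level ratio of the two column names."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def match_columns(table_a, cols_a, table_b, cols_b, threshold=0.8):
    """Return (table, column, table, column, score) for pairs at or above the threshold."""
    results = []
    for col_a, col_b in product(cols_a, cols_b):
        score = column_similarity(col_a, col_b)
        if score >= threshold:
            results.append((table_a, col_a, table_b, col_b, score))
    return results


# Hypothetical column names loosely modeled on the store and owner tables.
store_cols = ["store_id", "owner_id", "owner_name", "store_address"]
owner_cols = ["owner_id", "owner_name"]
matches = match_columns("store", store_cols, "owner", owner_cols)
for row in matches:
    print(row)
```

A name-based ratio is the simplest stand-in; a real schema matcher would probably also weigh data types, key constraints, or synonyms.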

Based on the reference relationship data 161 and the schema matching result data 163, the data relationship extraction unit 13 determines the column characteristic of each column of each of the plurality of tables and generates the result as inheritance relationship data 162. The column characteristic is determined by deciding whether each column is an inheriting item, an inherited item, or a unique item.
The inheritance relationship data 162 is also referred to as unique/inheritance relationship data.

The data storage unit 16 is a storage area implemented in memory, on a hard disk, on an SSD (solid state drive), or the like. The data storage unit 16 holds the reference relationship data 161, the inheritance relationship data 162, and the schema matching result data 163.

Based on the inheritance relationship data 162, the relationship extraction unit 15 calculates a table score 1714 representing the table characteristic of each table and generates relationship information 171 that associates the calculated table score 1714 with the table. As the table characteristic, the table score 1714 indicates whether the table is a master table or a transaction table.
In other words, the relationship extraction unit 15 generates, based on the inheritance relationship data 162, relationship information 171 that expresses the virtual position of each table as a number. The relationship information 171 is also referred to as table score information or relationship data.
The relationship information storage unit 17 is a storage area implemented in memory, on a hard disk, on an SSD (solid state drive), or the like. The relationship information storage unit 17 holds the relationship information 171.

The matching result integration unit 18 acquires linkage destination table definition information 3000 that defines a linkage destination table to be searched for, which is included in the table group 1000. Based on the relationship information 171, it matches linkage source tables against the linkage destination table and extracts candidates for the linkage source tables of the linkage destination table as a linkage source candidate table group.
From the linkage source candidate table group, the matching result integration unit 18 outputs the linkage sources of the columns of the linkage destination table separately for tables with different table characteristics. That is, as the linkage source columns for the columns of the linkage destination table, it outputs columns of master tables and columns of transaction tables.
When a column of the linkage destination table has no linkage source, the matching result integration unit 18 determines whether a table that has the missing column can be joined to the linkage source candidate table group and, if so, reports the group of join columns that serve as the join keys.

The shortest path search unit 19 extracts shortest path information 193 connecting two tables when the tables are connected by reference relationships.
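As a rough sketch of such a search, tables can be treated as nodes and reference relationships as edges. The text here does not name a search algorithm, so the breadth-first search below, the decision to treat edges as undirected, and the function name `shortest_path` are all assumptions. The table names are illustrative, borrowed from the example tables used later in this embodiment.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over tables linked by reference relationships.

    edges: iterable of (referencing_table, referenced_table) pairs, treated
    as undirected here (an assumption made for this sketch).
    Returns the list of tables on a shortest path, or None if unconnected.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set()).add(src)
    if start not in graph or goal not in graph:
        return None

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Illustrative reference relationships between four tables.
edges = [("store", "owner"), ("order", "store"), ("order", "product")]
print(shortest_path(edges, "owner", "product"))
```

Breadth-first search suffices when every reference relationship counts as one hop; a weighted search would be needed if relationships carried different costs.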

A hardware configuration example of the data linkage estimation device 100 is described with reference to FIG. 2.
The data linkage estimation device 100 is a computer.
The data linkage estimation device 100 includes hardware such as a processor 901, an auxiliary storage device 902, a memory 903, a communication device 904, an input interface 905, and a display interface 906.
The processor 901 is connected to the other hardware via a signal line 910 and controls the other hardware.
The input interface 905 is connected to an input device 907.
The display interface 906 is connected to a display 908.

The processor 901 is an IC (Integrated Circuit) that performs processing.
The processor 901 is, for example, a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or a GPU (Graphics Processing Unit).
The auxiliary storage device 902 is, for example, a ROM (Read Only Memory), a flash memory, or an HDD (Hard Disk Drive).
The memory 903 is, for example, a RAM (Random Access Memory).
The communication device 904 includes a receiver 9041 that receives data and a transmitter 9042 that transmits data.
The communication device 904 is, for example, a communication chip or a NIC (Network Interface Card).
The input interface 905 is a port to which a cable 911 of the input device 907 is connected.
The input interface 905 is, for example, a USB (Universal Serial Bus) terminal.
The display interface 906 is a port to which a cable 912 of the display 908 is connected.
The display interface 906 is, for example, a USB terminal or an HDMI (registered trademark) (High Definition Multimedia Interface) terminal.
The input device 907 is, for example, a mouse, a keyboard, or a touch panel.
The display 908 is, for example, an LCD (Liquid Crystal Display).

The auxiliary storage device 902 stores a program that implements the functions of the table definition extraction unit 11, the reference relationship extraction unit 12, the data relationship extraction unit 13, the schema matching processing unit 14, the relationship extraction unit 15, the matching result integration unit 18, and the shortest path search unit 19 shown in FIG. 1 (hereinafter, these units are collectively referred to as the "units").
This program is loaded into the memory 903, read by the processor 901, and executed by the processor 901.
The auxiliary storage device 902 also stores an OS (Operating System).
At least part of the OS is loaded into the memory 903, and the processor 901 executes the program that implements the functions of the "units" while executing the OS.
Although FIG. 2 shows one processor 901, the data linkage estimation device 100 may include a plurality of processors 901.
The plurality of processors 901 may then cooperate to execute the program that implements the functions of the "units".
Information, data, signal values, and variable values indicating the results of the processing by the "units" are stored as files in the memory 903, the auxiliary storage device 902, or a register or cache memory in the processor 901.

The "units" may be provided as "circuitry".
A "unit" may be read as a "circuit", "step", "procedure", or "process". Likewise, a "process" may be read as a "circuit", "step", "procedure", or "unit".
"Circuit" and "circuitry" are concepts that encompass not only the processor 901 but also other types of processing circuits such as a logic IC, a GA (Gate Array), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array).
What is called a program product is a storage medium, storage device, or the like on which the program implementing the functions of the "units" shown in the block configuration diagram is recorded; regardless of its outward form, it is loaded with a computer-readable program.

*** Description of operation ***
The data linkage estimation method and the operation of the data linkage estimation process of the data linkage estimation device 100 according to this embodiment are described with reference to FIG. 3.

<Database definition information extraction process s110>
In s110, the table definition extraction unit 11 extracts information defining tables, as database definition information 1211, from the linkage source database definition information 101 and the linkage destination database definition information 102. The table definition extraction unit 11 receives the database definition information 1211 from the linkage source database definition information 101 and the linkage destination database definition information 102, which are the targets of data linkage design, in the form of data definition language (DDL), an XML file, or the like. Alternatively, it obtains the information by, for example, connecting to the database and issuing SQL statements that retrieve table definition information. It then passes the information to the reference relationship extraction unit 12 and the data relationship extraction unit 13.
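One way to picture the SQL-based route is the sketch below, which reads column names and primary-key flags from a live database. It assumes SQLite purely for illustration, since the text does not name a database product. The function `read_table_definitions` and the English column names are hypothetical.

```python
import sqlite3


def read_table_definitions(conn):
    """Collect column names, types, and primary-key flags for every table."""
    definitions = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        definitions[table_name] = [
            {"column": name, "type": col_type, "primary_key": pk > 0}
            for _cid, name, col_type, _notnull, _default, pk in columns
        ]
    return definitions


# In-memory database with a table loosely modeled on the "store" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store ("
    "store_id TEXT PRIMARY KEY, owner_id TEXT, owner_name TEXT, store_address TEXT)"
)
definitions = read_table_definitions(conn)
print(definitions["store"])
```

Other database products expose the same information through their own catalogs, so the queries would differ while the resulting structure could stay the same.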

FIG. 25 shows an overall view of the table configuration example in this embodiment. FIG. 4 shows the database definition information 1211 based on the table configuration of FIG. 25.
As shown in FIG. 25, in this embodiment the ordering system has a store table, an order table, and a product table, and the personnel system has an owner table.
As shown in FIG. 4, the database definition information 1211 consists of table location information 111 and table definition information 112.
The first row of the table location information 111 in FIG. 4 indicates the "store" table in the schema "SVR001" of the instance "ordering system". The table definition information 112 shows that the "store" table has the columns "store ID", "owner ID", "owner name", and "store address", and that "store ID" alone is set as the primary key.
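The two-part structure described above, location plus column definitions, might be modeled as in the following sketch. The class and field names are hypothetical and do not come from the patent; only the values for the "store" table follow the description of FIG. 4.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class TableLocation:
    """Counterpart of table location information 111 (field names assumed)."""
    instance: str
    schema: str
    table: str


@dataclass(frozen=True)
class ColumnDefinition:
    """One column entry of table definition information 112 (field names assumed)."""
    name: str
    primary_key: bool = False


@dataclass
class DatabaseDefinition:
    """Location plus column definitions for one table."""
    location: TableLocation
    columns: List[ColumnDefinition] = field(default_factory=list)

    def primary_key(self) -> List[str]:
        """Names of the columns that make up the primary key."""
        return [c.name for c in self.columns if c.primary_key]


store = DatabaseDefinition(
    location=TableLocation("ordering system", "SVR001", "store"),
    columns=[
        ColumnDefinition("store ID", primary_key=True),
        ColumnDefinition("owner ID"),
        ColumnDefinition("owner name"),
        ColumnDefinition("store address"),
    ],
)
print(store.location.table, store.primary_key())
```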

<Reference relationship extraction process s120>
In the reference relationship extraction process s120, the reference relationship extraction unit 12 extracts the reference relationship data 161 representing column reference relationships from the table group 1000 including a plurality of tables having columns.
For each combination of tables in the table group, which is the set of table definition information passed from the table definition extraction unit 11, the reference relationship extraction unit 12 receives the table location information 111 and the table definition information 112. The table location information 111 and the table definition information 112 of a table may be referred to simply as table definition information.
The reference relationship extraction unit 12 outputs the reference relationship data 161 for each received combination of tables. The reference relationship extraction unit 12 is implemented by techniques such as inference from the values of record data, together with additional manual work.

The configuration of the reference relationship data 161 will be described with reference to FIG. 5.
The reference relationship data 161 indicates that there is some semantic relationship between a column of the reference destination table and a column of the reference source table.
The reference relationship extraction unit 12 uses the techniques and work described above to extract the column data that are in a reference relationship between two tables.
In the present embodiment, as shown in FIG. 4, there are four tables: store, order, product, and owner. The reference relationship extraction unit 12 forms pairs from these tables, determines whether a reference relationship exists between the two tables of each pair, and obtains the reference relationship data 161 shown in FIG. 5. The reference relationship data 161 shown in FIG. 5 contains the following results.
(1) Store table → owner table (owner ID).
(2) Order table → store table (store ID).
(3) Order table → product table (product ID).
For example, (2) indicates that, with respect to the store ID, the store table is superordinate to the order table.
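The shape of these results, and the pairwise enumeration of the four tables, can be sketched as follows. This only illustrates the data and the pairing; it does not implement the extraction itself, which the description leaves to record-value inference and manual work. The tuple layout and the function name are assumptions.

```python
from itertools import combinations

TABLES = ["store", "order", "product", "owner"]

# (reference source, reference destination, column), as in results (1) to (3).
REFERENCES = [
    ("store", "owner", "owner ID"),
    ("order", "store", "store ID"),
    ("order", "product", "product ID"),
]


def reference_between(table_a, table_b):
    """Return the reference relationship linking the two tables, or None."""
    for source, destination, column in REFERENCES:
        if {source, destination} == {table_a, table_b}:
            return (source, destination, column)
    return None


# Every unordered pair of distinct tables is examined once.
pairs = list(combinations(TABLES, 2))
found = [r for r in (reference_between(a, b) for a, b in pairs) if r is not None]
```

Four tables give six pairs, of which three carry a reference relationship in this example.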

<Schema matching process s130>
In the schema matching process s130, the schema matching processing unit 14 calculates the similarity 16301 between the columns of two tables extracted from the table group 1000. The schema matching processing unit 14 extracts the column pairs of the two tables whose similarity 16301 is equal to or greater than a threshold as schema matching result data 163.

<Data relationship extraction process s130a>
The data relationship extraction process s130a will be described.
In the data relationship extraction process s130a, the data relationship extraction unit 13 determines the column characteristic 16202 of each column 16201 of each of the plurality of tables based on the reference relationship data 161 and the schema matching result data 163. The data relationship extraction unit 13 then generates the determination result as inheritance relationship data 162. An example of the configuration of the inheritance relationship data 162 is shown in FIG. 8 and described later.

The schema matching process s130 performed by the schema matching processing unit 14 and the data relationship extraction process s130a performed by the data relationship extraction unit 13 will be described with reference to FIG. 6.
In s131, the data relationship extraction unit 13 receives the database definition information 1211 from the table definition extraction unit 11. The database definition information 1211 is a table group containing the table definition information of a plurality of tables.
In s132, the data relationship extraction unit 13 receives the reference relationship data 161 from the data storage unit 16.

In s133, the data relationship extraction unit 13 generates combinations of any two tables from the table group. A table is never paired with itself. The data relationship extraction unit 13 passes the table location information 111 and the table definition information 112 of each of the two tables in a generated combination to the schema matching processing unit 14.

In s134, the data relationship extraction unit 13 outputs the schema matching result data 163.
The configuration of the schema matching result data 163 according to the present embodiment will be described with reference to FIG. 7.
The schema matching processing unit 14 takes as input the table location information 111 and the table definition information 112 of each of the two tables in a combination and executes the schema matching process. In doing so, it calculates a schema matching score 1639, which expresses the similarity 16301 between columns belonging to the respective tables as a numerical value from 0 to 1.0.

In the schema matching process, the four tables are combined two at a time, and the schema matching process is executed between the two tables of each combination. Data whose schema matching score is equal to or greater than the threshold is then extracted as the schema matching result data 163. The example of FIG. 7 shows an excerpt of the schema matching results, namely the results for the “store ID” and “owner ID” columns of the store table.

The data relationship extraction unit 13 acquires the result of the schema matching process from the schema matching processing unit 14. The data relationship extraction unit 13 applies a fixed threshold to that result and stores, in the data storage unit 16, the schema matching result data 163 whose schema matching score 1639 is equal to or greater than the threshold. The schema matching result data 163 obtained here consists of pairs of columns, one from each of two tables, whose similarity is equal to or greater than the threshold.
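The score-and-threshold step can be sketched as below. The description does not fix a similarity measure, so the name-based ratio from Python's `difflib` is a stand-in chosen purely for illustration. The threshold of 0.8 and the owner table's columns other than "owner ID" are also assumptions.

```python
from difflib import SequenceMatcher
from itertools import product


def column_similarity(name_a, name_b):
    """Stand-in similarity in the range 0 to 1.0, based on column names only."""
    return SequenceMatcher(None, name_a, name_b).ratio()


def schema_match(columns_a, columns_b, threshold=0.8):
    """Return (column_a, column_b, score) for every pair scoring at or above the threshold."""
    results = []
    for a, b in product(columns_a, columns_b):
        score = column_similarity(a, b)
        if score >= threshold:
            results.append((a, b, score))
    return results


store_columns = ["store ID", "owner ID", "owner name", "store address"]
owner_columns = ["owner ID", "owner name", "owner address"]  # hypothetical columns
matches = schema_match(store_columns, owner_columns)
```

A real schema matcher would typically also compare data types and sample values; only the thresholding logic is the point here.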

In s135, if there is schema matching result data 163 whose schema matching score 1639 is equal to or greater than the threshold, the data relationship extraction unit 13 proceeds to s136.
In s135, if there is no schema matching result data 163 whose schema matching score 1639 is equal to or greater than the threshold, the data relationship extraction unit 13 proceeds to s138.

In s136, the data relationship extraction unit 13 assigns the label of inheriting item (継承項目) or inherited item (被継承項目).
To assign the label, the data relationship extraction unit 13 refers to the instance name, schema name, and table name of the reference relationship data 161 (the pair 1611 to 1613 and 1615 to 1617). If they match the instance name, schema name, and table name of the schema matching result data 163 (the pair 1631 to 1633 and 1635 to 1637), the inheritance relationship data 162 is stored in the data storage unit 16.
Of the two tables included in the schema matching result data 163, one matches the “reference destination” of the reference relationship data 161 and the other matches the “reference source”; the subsequent processing differs depending on which is matched. The two tables included in the schema matching result data 163 are each identified by an instance name, a schema name, and a table name. Thus one table matches the instance name, schema name, and table name (1611 to 1613) of the “reference destination” of the reference relationship data 161, and the other table matches the instance name, schema name, and table name (1615 to 1617) of the “reference source”.

For the table that matches the “reference destination” of the reference relationship data 161, the column name 1634 or the column name 1638 of the schema matching result data 163 becomes an inherited item (被継承項目). The instance name, schema name, table name, column name, and the inherited item label are stored in the data storage unit 16, as in the first row of the inheritance relationship data 162 in FIG. 8. Here, the instance name, schema name, and table name (1631 to 1633 or 1635 to 1637) match the instance name, schema name, and table name (1611 to 1613) of the “reference destination” of the reference relationship data 161.
However, if inheritance relationship data 162 with the same instance name, schema name, table name, and column name (1621 to 1624) already exists, its label name is overwritten as follows. If the label name is already inheriting item (継承項目), it is set to inheriting item / inherited item. If it is already inherited item, no change is made. If it is unique item, it is overwritten with inherited item.

Conversely, for the table whose instance name, schema name, and table name (1631 to 1633 or 1635 to 1637) match those of the “reference source” (1615 to 1617) of the reference relationship data 161, the column name 1634 or the column name 1638 of the schema matching result data 163 becomes an inheriting item (継承項目). In this case, the instance name, schema name, table name, and column name of the other side of the schema matching result data 163, that is, the side that matched the “reference destination” rather than the “reference source”, correspond to the inheritance source instance name, inheritance source schema name, inheritance source table name, and inheritance source column name (1626 to 1629) of the inheritance relationship data 162. The entry is stored in the data storage unit 16 with the inheriting item label, as in the second row of the inheritance relationship data 162 in FIG. 8.
However, if inheritance relationship data 162 with the same instance name, schema name, table name, and column name (1621 to 1624) already exists, its label name is overwritten as follows. If the label name is already inheriting item, no change is made. If it is inherited item (被継承項目), it is set to inheriting item / inherited item. If it is unique item, it is overwritten with inheriting item.
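The two overwrite rules form a small label-merging table, which the following sketch makes explicit. Here "inheriting item" stands for 継承項目 (the column on the reference source side) and "inherited item" for 被継承項目 (the column on the reference destination side). The function names and the (table, column) keys are assumptions, and keeping an existing combined label unchanged is also an assumption, since the text does not cover that case.

```python
INHERITING = "inheriting item"
INHERITED = "inherited item"
BOTH = "inheriting item / inherited item"
UNIQUE = "unique item"


def merge_label(existing, new):
    """Combine an already stored label with a newly assigned one."""
    if existing is None or existing == UNIQUE:
        return new            # nothing stored yet, or a unique item is overwritten
    if existing == new or existing == BOTH:
        return existing       # same label: no change (BOTH is kept by assumption)
    return BOTH               # inheriting meets inherited


def label_match(labels, source_column, destination_column):
    """Label one matched column pair; keys are (table, column) tuples."""
    labels[source_column] = merge_label(labels.get(source_column), INHERITING)
    labels[destination_column] = merge_label(labels.get(destination_column), INHERITED)


labels = {}
label_match(labels, ("order", "store ID"), ("store", "store ID"))
label_match(labels, ("store", "owner ID"), ("owner", "owner ID"))
```

Because the merge is symmetric in the two non-unique labels, the result does not depend on the order in which table pairs are processed.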

Next, the search process for finding further data linkages will be described.
FIG. 9 is a diagram showing the configuration of the search linkage source and the search linkage destination.
In s137, for each of the two tables identified by instance name, schema name, and table name, if the reference relationship data 161 contains data in which the table is the “reference source”, the data relationship extraction unit 13 sets that table as the search reference source table 121 shown in FIG. 7 in s137a. For each of the two tables, the data relationship extraction unit 13 also stores the tables that are the corresponding “reference destination” in the reference relationship data 161 in the search reference destination table 122.
In doing so, for each table in the search reference destination table 122 as well, if the reference relationship data 161 contains data in which that table is the “reference source”, the table that is the “reference destination” is also stored in the search reference destination table 122. In other words, every table that can be reached from the search reference source table 121 by following reference relationships in the direction from reference source to reference destination is stored in the search reference destination table 122.
The data relationship extraction unit 13 then performs schema matching between the search reference source table 121 and the tables in the search reference destination table 122. If a common item exists in s137b, the data relationship extraction unit 13 assigns the inheriting item / inherited item labels in s137c. The label assignment in s137c is the same as the label assignment in s136.
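Collecting every table reachable in the source-to-destination direction is a plain graph traversal. The sketch below uses a breadth-first walk over the three reference relationships of the running example; the dictionary layout and the function name are assumptions.

```python
from collections import deque

# reference source table -> reference destination tables (running example)
REFERENCES = {"store": ["owner"], "order": ["store", "product"]}


def reachable_destinations(source, references):
    """List every table reachable from `source` by following references forward."""
    found = []
    queue = deque(references.get(source, []))
    while queue:
        table = queue.popleft()
        if table == source or table in found:
            continue  # guard against cycles and duplicates
        found.append(table)
        queue.extend(references.get(table, []))
    return found


search_destinations = reachable_destinations("order", REFERENCES)
# search_destinations == ["store", "product", "owner"]
```

Starting from the order table, the owner table is reached indirectly through the store table, which is the case the direct pairwise matching alone would handle only as a separate pair.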

In s138, for each of the two tables (identified by instance name, schema name, and table name), the data relationship extraction unit 13 treats every column that is stored in the inheritance relationship data 162 as neither an inheriting item nor an inherited item as a unique item. It stores the instance name, schema name, table name, column name, and the unique item label in the data storage unit 16 as inheritance relationship data 162, as in the fourth row of FIG. 8.

In s139, if any combination of two tables remains unprocessed, the data relationship extraction unit 13 repeats the processing from the handover to the schema matching processing unit 14 onward.
This concludes the description of the data relationship extraction process s130a.

<Relationship extraction process s140>
The relationship extraction process s140 will be described with reference to FIG. 10.
In the relationship extraction process s140, the relationship extraction unit 15 calculates, based on the inheritance relationship data 162, a table score representing the table characteristic 17101 of each table, and generates relationship information 171 that associates the calculated table score with the table.
Based on the inheritance relationship data 162 stored by the data relationship extraction unit 13, the relationship extraction unit 15 stores the relationship information 171 shown in FIG. 12 in the relationship information storage unit 17.

In s141, the relationship extraction unit 15 acquires the inheritance relationship data 162 for each table identified by instance name, schema name, and table name. For each column belonging to the table, the acquired inheritance relationship data 162 contains the label name (inheriting item, inherited item, unique item, and so on) and, for an inheriting item, the instance name, schema name, table name, and column name of the inheritance source.

In s142, the relationship extraction unit 15 calculates the score 151 based on inheriting items and inherited items, shown in FIG. 11(a), from the numbers of entries in the acquired inheritance relationship data 162 whose label name is inheriting item or inherited item. Entries whose label name is “inheriting item / inherited item” are excluded from the calculation.
In s143, the relationship extraction unit 15 calculates the score 152 based on unique items, shown in FIG. 11(a), from the number of entries in the acquired inheritance relationship data 162 whose label name is unique item.
The weights used in the score calculation are arbitrary positive numbers that satisfy the weight constraint 153. FIG. 11(b) shows a specific example of weights that satisfy the weight constraint 153.

In s144, the relationship extraction unit 15 adds the score 151 based on inheriting items and inherited items and the score 152 based on unique items to obtain the table score 1714.
As shown in FIG. 12, the relationship extraction unit 15 generates the relationship information 171 based on the calculated table score 1714 and stores it in the relationship information storage unit 17. FIG. 11(b) shows an example calculation of the score 151, the score 152, and the table score 1714.

In s145, the relationship extraction unit 15 determines whether all tables have been processed and repeats s141 through s144 until they have.

A table with many inherited items (被継承項目) and unique items and few inheriting items (継承項目) is a master table; a table with few inherited items and unique items and many inheriting items is a transaction table. The relationship information 171 is data that expresses, as the table score 1714 calculated from the scores 151 and 152, whether each table is a master table or a transaction table. As shown in FIG. 12, the higher the table score 1714, the more the table is regarded as a master table 17101a; the lower the table score 1714, the more it is regarded as a transaction table 17101b.
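One scoring scheme consistent with this description can be sketched as follows. The actual formulas and the weight constraint 153 appear only in FIG. 11, which is not reproduced here, so the signs and the weight values below are assumptions chosen only so that inherited items (被継承項目) and unique items raise the score while inheriting items (継承項目) lower it.

```python
from collections import Counter


def table_score(labels, w_inheriting=1.0, w_inherited=1.0, w_unique=0.5):
    """Sum of an assumed score 151 and score 152 for one table's column labels."""
    counts = Counter(labels)
    # Assumed score 151: inherited items add, inheriting items subtract.
    # The combined label "inheriting item / inherited item" is not counted.
    score_151 = (w_inherited * counts["inherited item"]
                 - w_inheriting * counts["inheriting item"])
    # Assumed score 152: each unique item adds its weight.
    score_152 = w_unique * counts["unique item"]
    return score_151 + score_152


master_like = table_score(["inherited item", "unique item", "unique item"])
transaction_like = table_score(["inheriting item", "inheriting item", "unique item"])
```

With these assumed weights the master-like table scores 2.0 and the transaction-like table scores -1.5, which reproduces the ordering the description relies on.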

<Matching result integration process s150>
The matching result integration process s150 will be described with reference to FIG. 13.
In the matching result integration process s150, the matching result integration unit 18 acquires linkage destination table definition information 3000, which defines the linkage destination table to be searched for that is included in the table group 1000. Based on the relationship information 171, the matching result integration unit 18 extracts candidates for the linkage source table, that is, the linkage source of the linkage destination table, as a linkage source candidate table group.
From the input data from the input device 300 and the information stored in the data storage unit 16 and the relationship information storage unit 17, the matching result integration unit 18 returns the table definition information that corresponds, as the linkage source, to the table definition information supplied from the input device 300.

In s151, the matching result integration unit 18 acquires from the input device 300 the table definition information for which a data linkage relationship is to be searched, that is, the linkage destination table definition information.
The information input from the input device 300 may be table definition information consisting of an instance name, schema name, table name, and column name, as in the linkage destination table definition information input example 31 of FIG. 14, or table definition information consisting only of column names, as in the linkage destination table definition information input example 32.

In s152, the matching result integration unit 18 searches the schema matching result data 163 stored in the data storage unit 16 for data whose instance name, schema name, table name, and column name (1631 to 1634 or 1635 to 1638) match the instance name, schema name, table name, and column name of the acquired linkage destination table definition information. Any of the instance name, schema name, table name, or column name that is not included in the acquired linkage destination table definition information is omitted from the search condition.
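Leaving absent fields out of the search condition amounts to matching on whichever keys the query supplies. A minimal sketch, assuming dictionary records with hypothetical key names; the schema names of the order and owner rows are likewise made up for the example:

```python
def matches_query(query, record):
    """True if the record agrees with every field the query actually specifies."""
    return all(record.get(key) == value
               for key, value in query.items()
               if value is not None)


records = [
    {"instance": "ordering system", "schema": "SVR001", "table": "store", "column": "store ID"},
    {"instance": "ordering system", "schema": "SVR001", "table": "order", "column": "store ID"},
    {"instance": "personnel system", "schema": "SVR002", "table": "owner", "column": "owner ID"},
]

# Column name only, as in input example 32.
by_column = [r for r in records if matches_query({"column": "store ID"}, r)]

# Fully qualified, as in input example 31.
full_query = {"instance": "ordering system", "schema": "SVR001",
              "table": "store", "column": "store ID"}
by_full_key = [r for r in records if matches_query(full_query, r)]
```

The column-only query returns two rows and the fully qualified query returns one, so a less specific input simply widens the candidate set.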

If no search result for the schema matching result data 163 exists in s153, the process proceeds to s15A.
In s15A, the matching result integration unit 18 outputs to the display device 200 a notice that there is no search result for the schema matching result data 163.

If a search result for the schema matching result data 163 exists in s153, the process proceeds to s154 with the search result as the search target data group. For example, as shown in FIG. 14, if schema matching result data 163 exists that hits even one of the three entries of the linkage destination table definition information input example 32, the process proceeds to s154.
In s154, the matching result integration unit 18 acquires the table score 1714 of each table from the relationship information 171 stored in the relationship information storage unit 17.

<Linkage source table group search process s155>
In s155, the matching result integration unit 18 searches the reference relationship data 161 and the relationship information 171 for a combination of tables that satisfies the following three conditions: “the tables are connected by reference relationships”, “every column constituting the linkage destination table definition information has a column that serves as its linkage source”, and “the total of the table score information of the tables connected by reference relationships is maximum or minimum”. The matching result integration unit 18 retrieves a combination of tables that satisfies these conditions as the linkage source table group 1551.

The linkage source table group search process s155 according to the present embodiment will be described with reference to FIG. 15.
In s1551, the matching result integration unit 18 places each of the two columns included in the schema matching result data 163 (each a combination of instance name, schema name, table name, and column name; 1631 to 1634 and 1635 to 1638) in the linkage source candidate column group 181.

FIG. 16 shows a specific example of the linkage source candidate column group 181.
Here, the members of the linkage source candidate column group 181 that come from schema matching result data 163 corresponding to the same column of the linkage destination table definition information have the same identifier value. That is, if the linkage destination table definition information consists of five columns, the linkage source candidate column group 181 has five distinct identifiers.

In s1552, the matching result integration unit 18 takes one entry for each value of the identifier 1815 from the data included in the linkage source candidate column group 181 and generates a combination 182 of linkage source candidate columns.
FIG. 17 shows a specific example of the combinations 182 of linkage source candidate columns. The linkage source candidate column group 181 contains two entries for P001, three for P002, and two for P003, so combining them yields 2 × 3 × 2 = 12 patterns.
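Picking one candidate per identifier is a Cartesian product, which the sketch below reproduces for the 2 × 3 × 2 case. The candidate names are placeholders, since the actual entries of FIG. 16 are not reproduced here.

```python
from itertools import product

# identifier -> candidate columns (placeholder names, counts as in the text)
candidates = {
    "P001": ["P001 candidate 1", "P001 candidate 2"],
    "P002": ["P002 candidate 1", "P002 candidate 2", "P002 candidate 3"],
    "P003": ["P003 candidate 1", "P003 candidate 2"],
}

# Each combination maps every identifier to exactly one chosen candidate.
combos = [dict(zip(candidates, choice)) for choice in product(*candidates.values())]
```

The list `combos` has 12 entries, and its size grows multiplicatively with the number of candidates per identifier, which is why the later steps discard combinations early when the tables are not connected.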

In s1553, the matching result integration unit 18 generates the pairs of tables (each identified by instance name, schema name, and table name) included in this combination 182 of linkage source candidate columns as the table pairs 183 included in the combination of linkage source candidate columns.
FIG. 18 shows a specific example of the table pairs 183. For example, from combination ID COMB012 of the combinations 182, three table pairs are generated: “order table and owner table”, “order table and product table”, and “owner table and product table”.
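Generating the table pairs for one combination is a matter of collecting its distinct tables and pairing them. In the sketch below the three tables follow the COMB012 example in the text, while the column names attached to them are hypothetical.

```python
from itertools import combinations


def table_pairs(combination):
    """Return every unordered pair of distinct tables used by a combination."""
    tables = []
    for table, _column in combination:
        if table not in tables:  # keep first-seen order, drop repeats
            tables.append(table)
    return list(combinations(tables, 2))


# (table, column) entries; column names are placeholders.
comb012 = [("order", "store ID"), ("owner", "owner name"), ("product", "product ID")]
pairs_012 = table_pairs(comb012)
# pairs_012 == [("order", "owner"), ("order", "product"), ("owner", "product")]
```

If two chosen columns come from the same table, that table is counted once, so a combination drawn from a single table yields no pairs at all.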

ｓ１５５４において、最短経路探索部１９は、連携元候補カラム群の組み合わせに含まれるテーブルのペア１８３の各ペアについて、２つのテーブル間を結ぶ最短経路を探索する。
図１９に、最短経路探索部１９が用いるノード情報１９１、辺情報１９２、最短経路情報１９３の具体例を示す。最短経路探索部１９は、ノード情報１９１、辺情報１９２を用いて、連携元候補カラム群の組み合わせに含まれるテーブルのペア１８３の各ペアについて、最短経路を探索し、最短経路情報１９３を生成する。
この際、最短経路探索問題におけるノードはテーブル（インスタンス名、スキーマ名、テーブル名で特定される）であり、ノード情報１９１のノードＩＤ１９１４で識別される。
また、ノード間を結ぶ辺が存在するかはテーブル間の参照関係、すなわち一方を参照元テーブル、もう一方を参照先テーブルとした場合に、参照元テーブルとなるテーブルの情報（インスタンス名、スキーマ名、テーブル名）が参照関係データ１６１に存在するかで定義され、辺が存在する場合は、辺情報１９２としてノード情報１９１のノードＩＤ１９１４のペアと、辺の重み１９２３が最短経路問題における辺となる。なお、辺の重み１９２３は全ノード間で等しい（常に１）であるものとする。
図２６は、連携元候補カラム群の組み合わせに含まれるテーブルのペア１８３と、期待する最短経路探索部１９での探索結果を示す図である。 In s1554, the shortest path search unit 19 searches for the shortest path connecting the two tables for each pair of table pairs 183 included in the combination of cooperation source candidate columns.
FIG. 19 shows specific examples of the node information 191, edge information 192, and shortest path information 193 used by the shortest path search unit 19. Using the node information 191 and the edge information 192, the shortest path search unit 19 searches for the shortest path for each of the table pairs 183 included in the combination of cooperation source candidate column groups, and generates the shortest path information 193.
In this shortest path search problem, a node is a table (identified by an instance name, a schema name, and a table name) and is identified by the node ID 1914 of the node information 191.
Whether an edge connects two nodes is determined by the reference relationship between the tables: taking one table as the reference source table and the other as the reference destination table, an edge exists if the information of the reference source table (instance name, schema name, table name) is present in the reference relation data 161. When an edge exists, the pair of node IDs 1914 of the node information 191 and the edge weight 1923, held as the edge information 192, form an edge in the shortest path problem. The edge weight 1923 is assumed to be the same between all nodes (always 1).
FIG. 26 is a diagram showing the table pairs 183 included in the combination of cooperation source candidate column groups and the search results expected from the shortest path search unit 19.
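
Because every edge weight is 1, the search in s1554 can be illustrated with a breadth-first search. The following is only a minimal sketch: the function name `shortest_path`, the integer node IDs, and the decision to treat a reference relationship as an undirected edge are assumptions made for illustration and are not stated in the embodiment.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Return one shortest path from start to goal as a list of node IDs, or None."""
    # Assumption: a reference relationship connects both tables regardless of
    # which one is the reference source, so edges are stored in both directions.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # the two tables are not connected by reference relationships

# Hypothetical node IDs: 1 = store, 2 = order, 3 = customer, 4 = unrelated table.
edges = [(2, 1), (2, 3)]
print(shortest_path(edges, 1, 3))  # [1, 2, 3]
print(shortest_path(edges, 1, 4))  # None
```

A `None` result corresponds to the case in which no shortest path exists for a table pair.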

マッチング結果統合部１８は、最短経路探索部１９による探索の結果、すべてのテーブルのペア間で最短経路が存在しない場合、ｓ１５５２に処理を戻す。マッチング結果統合部１８は、すべてのテーブルのペア間で最短経路が存在しない場合、連携元候補カラム群の組み合わせ１８２では「テーブル間が参照関係で結ばれている」条件が満たせないと判断する。そして、マッチング結果統合部１８は、連携元候補カラム群の組み合わせ生成ｓ１５５２以降の処理を再度実施する。
マッチング結果統合部１８は、最短経路探索部１９による探索の結果、最短経路が存在する場合、ｓ１５５６に進む。 As a result of the search by the shortest path search unit 19, if no shortest path exists between any of the table pairs, the matching result integration unit 18 returns the process to s1552. In that case, the matching result integration unit 18 determines that the combination 182 of cooperation source candidate column groups cannot satisfy the condition "the tables are connected by reference relationships", and performs the processing from the combination generation s1552 of cooperation source candidate column groups onward again.
If a shortest path exists as a result of the search by the shortest path search unit 19, the matching result integration unit 18 proceeds to s1556.

ｓ１５５６において、マッチング結果統合部１８は、最短経路探索部１９から返却された最短経路情報１９３について、任意の個数の組み合わせを生成し、連携元候補カラム群１８１の識別子で示されるカラムが、識別子ごとに１つ以上経路上に存在する最短経路情報の組み合わせを検索する。
図２７は、最短経路情報１９３について任意の個数の組み合わせを生成し、連携元候補カラム群の識別子で示されるカラムが、識別子ごとに１つ以上経路上に存在する最短経路情報の組み合わせを検索する方法を説明する図である。 In s1556, the matching result integration unit 18 generates combinations of an arbitrary number of entries of the shortest path information 193 returned from the shortest path search unit 19, and searches for the combinations of shortest path information in which, for each identifier, at least one column indicated by the identifier of the cooperation source candidate column group 181 exists on the paths.
FIG. 27 is a diagram explaining a method of generating combinations of an arbitrary number of entries of the shortest path information 193 and searching for the combinations of shortest path information in which, for each identifier, at least one column indicated by the identifier of the cooperation source candidate column group exists on the paths.
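
The combination search of s1556 can be sketched as an exhaustive enumeration. This is an illustrative sketch only: the function name `covering_combinations`, the path IDs, and the dictionary layouts are hypothetical, and the embodiment does not prescribe how the combinations are enumerated.

```python
from itertools import combinations

def covering_combinations(paths, candidates):
    """Yield tuples of path IDs whose nodes cover every identifier.

    paths:      {path_id: set of node IDs (tables) on that shortest path}
    candidates: {identifier: set of node IDs holding a candidate column}
    """
    path_ids = sorted(paths)
    # Try combinations of every size, from a single path up to all paths.
    for size in range(1, len(path_ids) + 1):
        for combo in combinations(path_ids, size):
            nodes = set().union(*(paths[p] for p in combo))
            # Each identifier needs at least one candidate table on the paths.
            if all(nodes & tables for tables in candidates.values()):
                yield combo

# Hypothetical data: two shortest paths and two column identifiers.
paths = {"P1": {1, 2}, "P2": {2, 3}}
candidates = {"C1": {1}, "C2": {3}}
print(list(covering_combinations(paths, candidates)))  # [('P1', 'P2')]
```

Enumerating every subset grows exponentially with the number of paths, so a practical implementation would likely bound the combination size.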

ｓ１５５７において、マッチング結果統合部１８は、検索した各最短経路情報の組み合わせについて、固有の最短経路識別子を付与し、最短経路情報の組み合わせ１８４とする。
図２０に、最短経路情報の組み合わせ１８４の具体例を示す。
また、マッチング結果統合部１８は、この最短経路情報の組み合わせ１８４を生成すると共に、連携元候補カラム群の組み合わせ１８２に最短経路識別子１８５１を付与し、最短経路から連携可能なカラム群１８５とする。
図２１に、最短経路から連携可能なカラム群１８５の具体例を示す。例えば、ＲＯＵＴＥ００１には「店舗」テーブルと「注文」テーブルが存在し、いずれのテーブルにも店舗ＩＤ、オーナ名が存在するため、どちらのテーブルから連携するかの組み合わせとなる。 In s1557, the matching result integrating unit 18 assigns a unique shortest path identifier to each combination of searched shortest path information, and sets it as the shortest path information combination 184.
FIG. 20 shows a specific example of the combination 184 of the shortest path information.
In addition to generating the combination 184 of the shortest path information, the matching result integration unit 18 assigns the shortest path identifier 1851 to the combination 182 of cooperation source candidate column groups and sets the result as the column group 185 that can be linked from the shortest path.
FIG. 21 shows a specific example of the column group 185 that can be linked from the shortest path. For example, ROUTE001 contains a "store" table and an "order" table, and both tables contain a store ID and an owner name, so the entries are the combinations of which table each column is linked from.
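
The ROUTE001 example can be pictured as a Cartesian product over the tables that hold each column. The sketch below is an assumption about how such combinations might be enumerated; the function name `column_source_combinations` and the column names are hypothetical.

```python
from itertools import product

def column_source_combinations(route_tables, column_locations):
    """List every assignment of a source table to each column for one route.

    route_tables:     tables on the route
    column_locations: {column name: tables that contain a matching column}
    """
    columns = sorted(column_locations)
    # For each column, keep only the tables that lie on this route.
    choices = [
        [table for table in column_locations[column] if table in route_tables]
        for column in columns
    ]
    return [dict(zip(columns, combo)) for combo in product(*choices)]

# Hypothetical data mirroring the ROUTE001 example in the text.
route_tables = ["store", "order"]
column_locations = {
    "store_id": ["store", "order"],
    "owner_name": ["store", "order"],
}
combos = column_source_combinations(route_tables, column_locations)
print(len(combos))  # 4
print(combos[0])    # {'owner_name': 'store', 'store_id': 'store'}
```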

ｓ１５５８において、マッチング結果統合部１８は、ノード情報１９１に含まれるノードＩＤのうち、最短経路情報の組み合わせ１８４の始点ノードＩＤ１８４２および終点ノードＩＤ１８４３および経由ノードＩＤ１８４４のいずれにも含まれないものを除外する。マッチング結果統合部１８は、以上のように、最短経路上ノード情報１８６を生成する。
図２２に、最短経路上ノード情報１８６の具体例を示す。 In s1558, the matching result integration unit 18 excludes, from the node IDs included in the node information 191, those that are not included in any of the start node ID 1842, the end node ID 1843, and the transit node ID 1844 of the shortest path information combination 184. In this way, the matching result integration unit 18 generates the shortest path node information 186.
FIG. 22 shows a specific example of the shortest path node information 186.
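
The filtering in s1558 amounts to keeping only the nodes that appear on some path. A minimal sketch, assuming each entry of the shortest path information combination is a dictionary with hypothetical keys `start`, `end`, and `via`:

```python
def nodes_on_paths(node_info, path_entries):
    """Keep only the nodes that appear as a start, end, or transit node.

    node_info:    {node ID: (instance name, schema name, table name)}
    path_entries: list of dicts with keys "start", "end", and "via"
    """
    used = set()
    for entry in path_entries:
        used.add(entry["start"])
        used.add(entry["end"])
        used.update(entry["via"])
    return {node_id: table for node_id, table in node_info.items() if node_id in used}

# Hypothetical data: node 4 lies on no path and is therefore excluded.
node_info = {
    1: ("inst1", "sales", "store"),
    2: ("inst1", "sales", "order"),
    3: ("inst1", "sales", "customer"),
    4: ("inst1", "hr", "employee"),
}
path_entries = [{"start": 1, "end": 3, "via": [2]}]
print(sorted(nodes_on_paths(node_info, path_entries)))  # [1, 2, 3]
```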

以上のように、連携元テーブル群検索処理ｓ１５５は、最短経路情報の組み合わせ１８４と、最短経路から連携可能なカラム群１８５、最短経路上ノード情報１８６をマッチング結果統合部１８に返却する。 As described above, the link source table group search process s155 returns the combination 184 of the shortest path information, the column group 185 that can be linked from the shortest path, and the node information 186 on the shortest path to the matching result integration unit 18.

次に、図１３に戻り、マッチング結果統合処理ｓ１５０についての説明を続ける。
ｓ１８６において、マッチング結果統合部１８は、返却された最短経路から連携可能なカラム群１８５について、以下のように処理を行う。マッチング結果統合部１８は、最短経路識別子１８５１ごとに、経路上に存在するノードである始点ノードＩＤ１８４２および終点ノードＩＤ１８４３および経由ノードＩＤ１８４４を取得する。マッチング結果統合部１８は、取得したこれらのノードＩＤを用いて最短経路上ノード情報１８６からインスタンス名、スキーマ名、テーブル名（１８６１〜１８６３）を取得し、関係性情報１７１のテーブルスコア１７１４を取得して合計値を算出する。ただし、始点ノードＩＤ１８４２あるいは終点ノードＩＤ１８４３あるいは経由ノードＩＤ１８４４に同じノードＩＤが複数含まれる場合、１個とみなして合計値を算出する。
図２８は、合計値の算出の具体例を示す図である。 Next, returning to FIG. 13, the description of the matching result integration process s150 will be continued.
In s186, the matching result integration unit 18 processes the returned column group 185 that can be linked from the shortest path as follows. For each shortest path identifier 1851, the matching result integration unit 18 acquires the start node ID 1842, the end node ID 1843, and the transit node ID 1844, which are the nodes on the path. Using these node IDs, the matching result integration unit 18 acquires the instance name, schema name, and table name (1861 to 1863) from the shortest path node information 186, acquires the corresponding table score 1714 from the relationship information 171, and calculates the total value. However, when the same node ID appears more than once among the start node ID 1842, the end node ID 1843, and the transit node ID 1844, it is counted only once in the total.
FIG. 28 is a diagram illustrating a specific example of calculation of the total value.
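
The total in s186 can be sketched as a sum over the distinct node IDs of a route. As a simplification, this sketch keys the table scores directly by node ID instead of looking them up through the instance name, schema name, and table name; the function name, dictionary keys, and score values are hypothetical.

```python
def total_table_score(path_entries, table_scores):
    """Sum the table scores of the distinct nodes used by one route.

    path_entries: list of dicts with keys "start", "end", and "via"
    table_scores: {node ID: table score}
    """
    node_ids = set()  # a set counts a repeated node ID only once
    for entry in path_entries:
        node_ids.add(entry["start"])
        node_ids.add(entry["end"])
        node_ids.update(entry["via"])
    return sum(table_scores[node_id] for node_id in node_ids)

# Hypothetical route made of two paths that share node 2.
path_entries = [
    {"start": 1, "end": 2, "via": []},
    {"start": 2, "end": 3, "via": []},
]
table_scores = {1: 9, 2: 2, 3: 5}
print(total_table_score(path_entries, table_scores))  # 16
```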

ｓ１８７において、マッチング結果統合部１８は、最短経路情報の組み合わせ１８４のなかで、もっともテーブルスコアの合計値が大きい最短経路情報の組み合わせを識別する最短経路識別子を用いて、最短経路から連携可能なカラム群１８５のデータを取り出す。マッチング結果統合部１８は、取り出したインスタンス名、スキーマ名、テーブル名、カラム名（１８５４〜１８５７）をマスタテーブル優先の連携元テーブル定義情報として返却する。ただし、同じカラム名が複数含まれる場合、関係性情報１７１のテーブルスコア１７１４が大きいテーブルのカラムを選択する。 In s187, the matching result integration unit 18 retrieves the data of the column group 185 that can be linked from the shortest path, using the shortest path identifier that identifies, among the shortest path information combinations 184, the combination with the largest total table score. The matching result integration unit 18 returns the retrieved instance name, schema name, table name, and column name (1854 to 1857) as the master table priority linkage source table definition information. However, when the same column name appears more than once, the column of the table with the larger table score 1714 in the relationship information 171 is selected.

ｓ１８８において、マッチング結果統合部１８は、最短経路情報の組み合わせ１８４のなかで、もっともテーブルスコアの合計値が小さい最短経路情報の組み合わせを識別する最短経路識別子を用いて、最短経路から連携可能なカラム群１８５のデータを取り出す。マッチング結果統合部１８は、取り出したインスタンス名、スキーマ名、テーブル名、カラム名（１８５４〜１８５７）をトランザクションテーブル優先の連携元テーブル定義情報として返却する。ただし、同じカラム名が複数含まれる場合、関係性情報１７１のテーブルスコア１７１４が小さいテーブルのカラムを選択する。 In s188, the matching result integration unit 18 retrieves the data of the column group 185 that can be linked from the shortest path, using the shortest path identifier that identifies, among the shortest path information combinations 184, the combination with the smallest total table score. The matching result integration unit 18 returns the retrieved instance name, schema name, table name, and column name (1854 to 1857) as the transaction table priority linkage source table definition information. However, when the same column name appears more than once, the column of the table with the smaller table score 1714 in the relationship information 171 is selected.
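
Steps s187 and s188 differ only in whether the largest or the smallest score is preferred, so both can be sketched with one pair of helpers. The function names, route identifiers, and score values below are hypothetical, and tables are identified by name alone for brevity.

```python
def pick_route(route_totals, prefer_master):
    """Return the route ID with the largest (master) or smallest (transaction) total."""
    select = max if prefer_master else min
    return select(route_totals, key=route_totals.get)

def pick_columns(rows, table_scores, prefer_master):
    """Resolve duplicate column names by table score.

    rows: list of (table name, column name) pairs taken from the chosen route
    """
    select = max if prefer_master else min
    tables_by_column = {}
    for table, column in rows:
        tables_by_column.setdefault(column, []).append(table)
    return {
        column: select(tables, key=lambda t: table_scores[t])
        for column, tables in tables_by_column.items()
    }

# Hypothetical totals and scores; a larger score means more master-like.
route_totals = {"ROUTE001": 16, "ROUTE002": 7}
print(pick_route(route_totals, prefer_master=True))   # ROUTE001
print(pick_route(route_totals, prefer_master=False))  # ROUTE002

rows = [("store", "store_id"), ("order", "store_id"), ("order", "order_date")]
table_scores = {"store": 8, "order": 3}
print(pick_columns(rows, table_scores, prefer_master=True))
# {'store_id': 'store', 'order_date': 'order'}
print(pick_columns(rows, table_scores, prefer_master=False))
# {'store_id': 'order', 'order_date': 'order'}
```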

ｓ１８９において、マッチング結果統合部１８は、マスタテーブル優先の連携元テーブル群およびトランザクションテーブル優先の連携元テーブル群に関して、参照関係データ１６１をもとにテーブル間の参照カラム群を返却する。
すなわち、マスタテーブル優先の連携元テーブル群およびトランザクションテーブル優先の連携元テーブル群の最短経路情報の組み合わせ１８４に関して、以下のように参照カラム群に加える。始点ノードＩＤ１８４２から経由ノードＩＤ１８４４を経由して終点ノードＩＤ１８４３に到達するまでのインスタンス名、スキーマ名、テーブル名（１８６１〜１８６３）を順次参照関係データ１６１の参照元（１６１５〜１６１７）および参照先（１６１１〜１６１３）とする。また、参照元となるインスタンス名、スキーマ名、テーブル名、カラム名（１６１５〜１６１８）および参照先となるインスタンス名、スキーマ名、テーブル名、カラム名（１６１１〜１６１４）を参照カラム群に加える。
以上で、本実施の形態に係るマッチング結果統合処理ｓ１５０についての説明を終わる。 In s189, the matching result integration unit 18 returns the reference column group between the tables based on the reference relation data 161 with respect to the master table-priority cooperation source table group and the transaction table priority cooperation source table group.
That is, for the shortest path information combinations 184 of the master table priority linkage source table group and the transaction table priority linkage source table group, entries are added to the reference column group as follows. The instance names, schema names, and table names (1861 to 1863) encountered from the start node ID 1842 through the transit node ID 1844 to the end node ID 1843 are taken in order as the reference source (1615 to 1617) and the reference destination (1611 to 1613) of the reference relation data 161. The instance name, schema name, table name, and column name (1615 to 1618) of the reference source and those of the reference destination (1611 to 1614) are then added to the reference column group.
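
The walk along the path in s189 can be sketched as a lookup of each consecutive table pair in the reference relation data. This is a simplified illustration: tables are identified by name alone, a pair is matched in either direction, and the function name and field names are hypothetical.

```python
def reference_columns(path, reference_data):
    """Collect the reference relations that connect consecutive tables on a path.

    path:           table names ordered from the start node to the end node
    reference_data: list of dicts with keys "source_table", "source_column",
                    "target_table", and "target_column"
    """
    result = []
    for a, b in zip(path, path[1:]):
        for ref in reference_data:
            # Assumption: the pair matches whichever table is the reference source.
            if {ref["source_table"], ref["target_table"]} == {a, b}:
                result.append(ref)
    return result

# Hypothetical reference relation data.
reference_data = [
    {"source_table": "order", "source_column": "store_id",
     "target_table": "store", "target_column": "store_id"},
    {"source_table": "order", "source_column": "customer_id",
     "target_table": "customer", "target_column": "customer_id"},
    {"source_table": "order_item", "source_column": "product_id",
     "target_table": "product", "target_column": "product_id"},
]
for ref in reference_columns(["store", "order", "customer"], reference_data):
    print(ref["source_table"], ref["source_column"], "->",
          ref["target_table"], ref["target_column"])
# order store_id -> store store_id
# order customer_id -> customer customer_id
```

The returned source and target columns correspond to the join keys reported together with the linkage source table groups.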
This is the end of the description of the matching result integration process s150 according to the present embodiment.

次に、マッチング結果表示処理ｓ１６０について説明する。
表示装置２００は、マッチング結果統合部１８の返却結果を、画面に表示してもよいし、他のプログラムに利用可能な形式でファイルとして保存してもよい。
図２３は、マスタテーブル優先の連携元テーブル定義情報出力例を示す。また、図２４は、トランザクションテーブル優先の連携元テーブル定義情報出力例を示す。
例えば、表示装置２００は、マッチング結果統合部１８から返却された、マスタテーブル優先の連携元テーブル定義情報を図２３に示すマスタテーブル優先の連携元テーブル定義情報出力例２１のように表示する。また、トランザクションテーブル優先の連携元テーブル定義情報を図２４に示すトランザクションテーブル優先の連携元テーブル定義情報出力例２３のように表示する。また、テーブル間の参照カラム群を図２３に示すマスタテーブル優先の連携元テーブル定義情報に関する参照カラム群出力例２２および図２４に示すトランザクションテーブル優先の連携元テーブル定義情報に関する参照カラム群出力例２４のように表示する。 Next, the matching result display process s160 will be described.
The display device 200 may display the return result of the matching result integration unit 18 on a screen, or may save it as a file in a format that can be used for other programs.
FIG. 23 shows an output example of master table priority linkage source table definition information. FIG. 24 shows an output example of cooperation source table definition information giving priority to a transaction table.
For example, the display device 200 displays the master table priority linkage source table definition information returned from the matching result integration unit 18 as in the master table priority linkage source table definition information output example 21 shown in FIG. 23. It displays the transaction table priority linkage source table definition information as in the transaction table priority linkage source table definition information output example 23 shown in FIG. 24. It displays the reference column groups between tables as in the reference column group output example 22 for the master table priority linkage source table definition information shown in FIG. 23 and the reference column group output example 24 for the transaction table priority linkage source table definition information shown in FIG. 24.

以上で、本実施の形態に係るデータ連携推定装置１００のデータ連携推定方法、データ連携推定処理の動作についての説明を終わる。 This concludes the description of the data linkage estimation method and the operation of the data linkage estimation process of the data linkage estimation apparatus 100 according to the present embodiment.

＊＊＊効果の説明＊＊＊
以上のように、本実施の形態に係るデータ連携推定装置１００は、複数のシステムの複数のデータベーステーブル間のデータ連携定義をデータベーステーブル定義情報から自動的に推薦する。参照関係抽出部は、連携元として受けとったデータベーステーブルの定義情報から、データベーステーブル間に意味的に存在する参照関係を抽出する。固有／継承項目抽出部は、前記参照関係抽出部から提供された参照関係情報をもとに連携元のデータベーステーブル間で継承項目、被継承項目、固有項目を抽出する。テーブル間関係性抽出部は、前記固有／継承項目抽出部から提供された固有／継承項目情報をもとにテーブルの仮想的な位置情報を数値化する。マッチング結果統合部は、前記テーブル間関係性抽出部から提供されたテーブル間関係性情報をもとに、連携元テーブルと連携先テーブルとのマッチングを実施し連携元候補テーブル群を抽出する。 *** Explanation of effects ***
As described above, the data linkage estimation apparatus 100 according to the present embodiment automatically recommends the data linkage definition between a plurality of database tables of a plurality of systems from the database table definition information. The reference relationship extraction unit extracts a reference relationship that exists semantically between the database tables from the definition information of the database table received as the cooperation source. The unique / inherited item extracting unit extracts inherited items, inherited items, and unique items between the database tables of the cooperation source based on the reference relationship information provided from the reference relationship extracting unit. The inter-table relationship extraction unit quantifies the virtual position information of the table based on the unique / inherited item information provided from the unique / inherited item extraction unit. The matching result integration unit performs matching between the cooperation source table and the cooperation destination table based on the inter-table relationship information provided from the inter-table relationship extraction unit, and extracts a cooperation source candidate table group.

また、マッチング結果統合部は、前記連携元候補テーブル群と前記連携先テーブルとのマッチングにおいて見つからなかった項目について、不足する項目をもつデータベーステーブルと前記連携元候補テーブル群との結合可否を判断する。そして、マッチング結果統合部は、結合可能な場合に結合するための結合キーとなる結合カラム群を報告する。 In addition, the matching result integration unit determines whether or not to join the database table having an insufficient item and the cooperation source candidate table group with respect to items that are not found in the matching between the cooperation source candidate table group and the cooperation destination table. . Then, the matching result integration unit reports a combination column group serving as a combination key for combination when the combination is possible.

データ連携推定装置によれば、データベースの定義情報のみから、連携先テーブル定義情報に対する連携元テーブル群として、マスタテーブルを優先した連携元テーブル群と、トランザクションテーブルを優先した連携元テーブル群とを返却することが可能となる。マスタテーブルを優先した連携元テーブル群は、参照される一方で自身が参照することは少なく、固有カラムが多い。トランザクションテーブルを優先した連携元テーブル群は、参照する一方で自身が参照されることは少なく、固有カラムが少ない。また、これらのテーブル群を結合するための結合キーとなるカラムの情報を併せて返却することが可能となる。 According to the data linkage estimation apparatus, using only the database definition information, it is possible to return, as linkage source table groups for the linkage destination table definition information, a linkage source table group that prioritizes master tables and a linkage source table group that prioritizes transaction tables. The tables in the linkage source table group that prioritizes master tables are referenced by other tables but rarely reference other tables themselves, and have many unique columns. The tables in the linkage source table group that prioritizes transaction tables reference other tables but are rarely referenced themselves, and have few unique columns. Information on the columns serving as join keys for joining these table groups can also be returned.

さらに、本実施の形態に係るデータ連携推定装置１００の効果について、他の技術と比較することにより説明する。
例えば、発行されたクエリを処理するような処理形態を取る技術がある。この技術の場合、推定精度を高めるためにはすべての発行クエリを利用する必要があるが、一般に膨大な発行クエリをすべて保有しておくことはシステムの容量上現実的ではない場合が多い。また、発行クエリをすべて利用できたとしても利用者による試行錯誤的な発行クエリと、システムにより発行された正当なクエリの判別は困難であり、誤りが混入する可能性がある。
さらに、複数のデータベース間でのクエリ発行はアプリケーションが間に入り、クエリ発行を媒介するため、アプリケーション内でのデータ連携は発行クエリから読み取ることはできず、複数のデータベースに跨ったデータ連携の設計は困難である。
以上により、発行クエリを利用する技術では、適用できるシステムの範囲が利用可能な発行クエリの数や、クエリ発行を媒介するアプリケーションにより制限され、全システムを対象として実施することが困難である。また、データ連携関係の自動推薦のためには発行クエリの準備が必須であるため、例えば連携先テーブルが新規に作成するテーブルであり、まだ発行クエリが存在しない場合は利用できない。 Furthermore, the effect of the data cooperation estimation apparatus 100 according to the present embodiment will be described by comparing with other techniques.
For example, there is a technique that works by processing issued queries. With this technique, all issued queries must be used to raise the estimation accuracy, but retaining the generally enormous number of issued queries is often impractical in terms of system capacity. Even if all issued queries were available, it is difficult to distinguish trial-and-error queries issued by users from legitimate queries issued by the system, so errors may be introduced.
Furthermore, queries spanning multiple databases are issued through an intervening application, so data linkage that takes place inside the application cannot be read from the issued queries, and designing data linkage across multiple databases is difficult.
Consequently, with a technique that uses issued queries, the range of applicable systems is limited by the number of available issued queries and by the applications that mediate query issuance, making it difficult to cover all systems. In addition, because issued queries must be prepared for automatic recommendation of data linkage relationships, the technique cannot be used when, for example, the linkage destination table is newly created and no issued queries exist yet.

なお、発行クエリの準備が必要なく、テーブル定義情報のようなメタデータのみからスキーママッチング技術によりテーブル間の参照関係を抽出する技術もある。このような技術では、テーブル定義情報は静的な情報のため、データ連携関係の推薦に必要なデータが入手しやすいという性質がある。また、テーブル定義情報を用いて対応関係を抽出するため、利用者によるクエリ発行や、アプリケーションによるクエリの媒介に影響されず、全システムを対象とした連携関係の自動推薦のようにスコープを広げた適用が可能である。
スキーママッチング技術は類似のデータ項目を見つけ出すという技術である。このため、複数システムのデータベースを対象とする処理では、同義のカラムや類似の参照関係が多数存在する可能性もあり、スキーママッチング技術としては正しい対応関係であっても、データ連携関係としては使用できない関係が多数検出されてしまう。 There is also a technique that requires no issued queries and extracts reference relationships between tables by schema matching using only metadata such as table definition information. Because table definition information is static, the data needed to recommend data linkage relationships is easy to obtain. In addition, because correspondences are extracted from table definition information, the technique is unaffected by queries issued by users or mediated by applications, and can be applied with a broad scope, such as automatic recommendation of linkage relationships across all systems.
Schema matching is a technique for finding similar data items. For this reason, when processing targets the databases of multiple systems, many synonymous columns and similar reference relationships may exist, and many relationships are detected that are correct correspondences from the standpoint of schema matching but cannot be used as data linkage relationships.

本実施の形態に係るデータ連携推定装置１００によれば、データ連携関係の自動推薦を実施するために発行クエリが必要ではなく、テーブル定義情報などのメタデータのみで実施可能である。よって、データ連携関係を探索するシステム範囲を広く設定できる。この結果、データ連携の対象となるシステム全体を考慮した全体最適解を見つけ出し、検出漏れを最小限にできる効果がある。
また、データ連携関係の自動推薦では探索スコープを広く取ると、本来連携元とすべきテーブル群以外の候補が検出される可能性がある。そして、本来連携元とすべきテーブルのコピーや変形（例えば、マスタテーブルからデータを連携すべき場合に、類似項目がトランザクションテーブルに存在する場合）を連携元としてデータ連携定義を実施してしまう可能性がある。この結果、作成したデータ連携は本来連携元とすべきテーブルではなく、そのコピーや変形と依存関係を持ってしまうことが発生する。このように本来必要のないデータ連携が定義されてしまうこと、すなわち部分最適解を回避する効果がある。 According to the data linkage estimation apparatus 100 according to the present embodiment, an issuance query is not required to perform automatic recommendation of the data linkage relationship, and it can be performed only with metadata such as table definition information. Therefore, it is possible to set a wide system range for searching for data linkage relationships. As a result, it is possible to find an overall optimal solution that takes into consideration the entire system that is the target of data linkage, and to minimize detection omissions.
In addition, when the search scope of automatic recommendation of data linkage relationships is wide, candidates other than the table group that should be the linkage source may be detected. A data linkage definition may then be created using, as the linkage source, a copy or variant of the table that should be the linkage source (for example, when data should be linked from a master table but similar items exist in a transaction table). As a result, the created data linkage depends not on the table that should be the linkage source but on its copy or variant. The apparatus has the effect of avoiding such definition of unnecessary data linkage, that is, of avoiding a partially optimal solution.

上記の実施の形態では、テーブル定義抽出部、参照関係抽出部、データ関係抽出部、スキーママッチング処理部、関係性抽出部、マッチング結果統合部、最短経路探索部がそれぞれ独立した機能ブロックとしてデータ連携推定装置を構成している。しかし、データ連携推定装置は上記のような構成でなくてもよい。例えば、参照関係抽出部、データ関係抽出部、スキーママッチング処理部、関係性抽出部をひとつの機能ブロックで実現してもよい。また、マッチング結果統合部、最短経路探索部を１つの機能ブロックで実現してもよい。データ連携推定装置の構成は任意である。 In the above embodiment, the table definition extraction unit, reference relationship extraction unit, data relationship extraction unit, schema matching processing unit, relationship extraction unit, matching result integration unit, and shortest path search unit constitute the data linkage estimation device as independent functional blocks. However, the data linkage estimation device need not have this configuration. For example, the reference relationship extraction unit, the data relationship extraction unit, the schema matching processing unit, and the relationship extraction unit may be realized by one functional block. The matching result integration unit and the shortest path search unit may also be realized by one functional block. The configuration of the data linkage estimation device is arbitrary.

また、データ連携推定装置は、１つの装置でなく、複数の装置から構成されたデータ連携推定システムでもよい。データ連携推定装置の機能ブロックは、実施の形態に記載した機能を実現することができれば、任意であり、これらの機能ブロックを、他のどのような組み合わせでデータ連携推定装置を構成しても構わない。 In addition, the data linkage estimation device may be a data linkage estimation system composed of a plurality of devices instead of a single device. The functional blocks of the data linkage estimation device are arbitrary as long as the functions described in the embodiment can be realized, and the data linkage estimation device may be configured from any other combination of these functional blocks.

以上、本発明の実施の形態について説明したが、この実施の形態に含まれる発明を部分的に組み合わせて実施しても構わない。あるいは、この実施の形態のうち、１つの部分を実施しても構わない。あるいは、この実施の形態のうち、２つ以上を部分的に組み合わせて実施しても構わない。
なお、以上の実施の形態は、本質的に好ましい例示であって、本発明、その適用物や用途の範囲を制限することを意図するものではなく、必要に応じて種々の変更が可能である。 As mentioned above, although embodiment of this invention was described, you may implement combining the invention contained in this embodiment partially. Alternatively, one part of this embodiment may be implemented. Alternatively, two or more of the embodiments may be partially combined.
The above-described embodiments are essentially preferable examples and are not intended to limit the scope of the present invention, its applications, or its uses; various modifications can be made as necessary.

１１テーブル定義抽出部、１２参照関係抽出部、１３データ関係抽出部、１４スキーママッチング処理部、１５関係性抽出部、１６データ蓄積部、１７関係性情報格納部、１８マッチング結果統合部、１９最短経路探索部、２１，２２，２３，２４出力例、３１，３２入力例、１００データ連携推定装置、１０１連携元データベース定義情報、１０２連携先データベース定義情報、１１１テーブル所在情報、１１２テーブル定義情報、１２１探索用参照元テーブル、１２２探索用参照先テーブル、１５１，１５２スコア、１５３制約、１６１参照関係データ、１６２継承関係データ、１６３スキーママッチング結果データ、１７１関係性情報、１８１連携元候補カラム群、１８２連携元候補カラム群の組み合わせ、１８３テーブルのペア、１８４最短経路情報の組み合わせ、１８５最短経路から連携可能なカラム群、１８６最短経路上ノード情報、１９１ノード情報、１９２辺情報、１９３最短経路情報、２００表示装置、３００入力装置、９０１プロセッサ、９０２補助記憶装置、９０３メモリ、９０４通信装置、９０５入力インタフェース、９０６ディスプレイインタフェース、９０７入力装置、９０８ディスプレイ、９１０信号線、９１１，９１２ケーブル、９０４１レシーバー、９０４２トランスミッター、１０００テーブル群、１２１１データベース定義情報、１５５１テーブル群、１６３４，１６３８カラム名、１６３９スキーママッチングスコア、１７１４テーブルスコア、１８１５識別子、１８４２始点ノード、１８４３終点ノード、１８４４経由ノード、１８５１最短経路識別子、１９１４ノードＩＤ、１９２３辺の重み、３０００連携先テーブル定義情報、１６２０１複数のテーブルの各テーブルのカラム、１６２０２カラム特性、１６３０１類似度、１７１０１テーブル特性、１７１０１ａマスタ系のテーブル、１７１０１ｂトランザクション系のテーブル、ｓ１１０データベース定義情報抽出処理、ｓ１２０参照関係抽出処理、ｓ１３０スキーママッチング処理、ｓ１３０ａデータ関係抽出処理、ｓ１４０関係性抽出処理、ｓ１５０マッチング結果統合処理、ｓ１５５連携元テーブル群検索処理、ｓ１６０マッチング結果表示処理。 11 table definition extraction unit, 12 reference relationship extraction unit, 13 data relationship extraction unit, 14 schema matching processing unit, 15 relationship extraction unit, 16 data storage unit, 17 relationship information storage unit, 18 matching result integration unit, 19 shortest path search unit, 21, 22, 23, 24 output example, 31, 32 input example, 100 data linkage estimation device, 101 linkage source database definition information, 102 linkage destination database definition information, 111 table location information, 112 table definition information, 121 search reference source table, 122 search reference destination table, 151, 152 score, 153 constraint, 161 reference relation data, 162 inheritance relationship data, 163 schema matching result data, 171 relationship information, 181 cooperation source candidate column group, 182 combination of cooperation source candidate column groups, 183 table pair, 184 combination of shortest path information, 185 column group that can be linked from the shortest path, 186 shortest path node information, 191 node information, 192 edge information, 193 shortest path information, 200 display device, 300 input device, 901 processor, 902 auxiliary storage device, 903 memory, 904 communication device, 905 input interface, 906 display interface, 907 input device, 908 display, 910 signal line, 911, 912 cable, 9041 receiver, 9042 transmitter, 1000 table group, 1211 database definition information, 1551 table group, 1634, 1638 column name, 1639 schema matching score, 1714 table score, 1815 identifier, 1842 start node, 1843 end node, 1844 transit node, 1851 shortest path identifier, 1914 node ID, 1923 edge weight, 3000 linkage destination table definition information, 16201 column of each table of the plurality of tables, 16202 column characteristic, 16301 similarity, 17101 table characteristic, 17101a master table, 17101b transaction table, s110 database definition information extraction process, s120 reference relationship extraction process, s130 schema matching process, s130a data relationship extraction process, s140 relationship extraction process, s150 matching result integration process, s155 cooperation source table group search process, s160 matching result display process.

Claims

1. A data linkage estimation apparatus comprising:
a reference relationship extraction unit that extracts reference relationship data representing column reference relationships from a table group including a plurality of tables having columns;
a schema matching processing unit that calculates a similarity between columns of two tables extracted from the table group, and extracts, as schema matching result data, the columns of the two tables whose similarity is equal to or greater than a threshold;
a data relationship extraction unit that determines, based on the reference relationship data and the schema matching result data, a column characteristic of each column of each of the plurality of tables, and generates the determination result as inheritance relationship data;
a relationship extraction unit that calculates, based on the inheritance relationship data, a table score representing a table characteristic of each table, and generates relationship information in which the calculated table score is associated with the table; and
a matching result integration unit that acquires linkage destination table definition information defining a search-target linkage destination table included in the table group, and extracts, based on the relationship information, candidates for a linkage source table serving as a linkage source of the linkage destination table as a linkage source candidate table group.

2. The data linkage estimation apparatus according to claim 1, wherein the matching result integration unit
outputs, from the linkage source candidate table group, the linkage source of the columns of the linkage destination table for each table having a different table characteristic.

3. The data linkage estimation apparatus according to claim 1 or 2, wherein the data relationship extraction unit
determines the column characteristic of each column of each of the plurality of tables by determining whether the column is an inheriting item, an inherited item, or a unique item.

4. The data linkage estimation apparatus according to any one of claims 1 to 3, wherein the relationship extraction unit
calculates, as the table characteristic, the table score indicating whether the table is a master table or a transaction table.

5. The data linkage estimation apparatus according to claim 4, wherein the matching result integration unit
outputs a column of the master table and a column of the transaction table as linkage source columns for the linkage destination table.

6. The data linkage estimation apparatus according to any one of claims 1 to 5, wherein the schema matching processing unit
calculates the similarity as a schema matching score by executing a schema matching process.

7. The data linkage estimation apparatus according to any one of claims 1 to 6, wherein the matching result integration unit,
when no linkage source exists for a column of the linkage destination table, determines whether the table having the missing column can be joined with the linkage source candidate table group, and reports the result.

8. A data linkage estimation method, wherein:
a reference relationship extraction unit extracts reference relationship data representing column reference relationships from a table group including a plurality of tables having columns;
a schema matching processing unit calculates a similarity between columns of two tables extracted from the table group, and extracts, as schema matching result data, the columns of the two tables whose similarity is equal to or greater than a threshold;
a data relationship extraction unit determines, based on the reference relationship data and the schema matching result data, a column characteristic of each column of each of the plurality of tables, and generates the determination result as inheritance relationship data;
a relationship extraction unit calculates, based on the inheritance relationship data, a table score representing a table characteristic of each table, and generates relationship information in which the calculated table score is associated with the table; and
a matching result integration unit acquires linkage destination table definition information defining a search-target linkage destination table included in the table group, and extracts, based on the relationship information, candidates for a linkage source table serving as a linkage source of the linkage destination table as a linkage source candidate table group.

9. A program for causing a computer to execute:
a reference relationship extraction process of extracting reference relationship data representing column reference relationships from a table group including a plurality of tables having columns;
a schema matching process of calculating a similarity between columns of two tables extracted from the table group, and extracting, as schema matching result data, the columns of the two tables whose similarity is equal to or greater than a threshold;
a data relationship extraction process of determining, based on the reference relationship data and the schema matching result data, a column characteristic of each column of each of the plurality of tables, and generating the determination result as inheritance relationship data;
a relationship extraction process of calculating, based on the inheritance relationship data, a table score representing a table characteristic of each table, and generating relationship information in which the calculated table score is associated with the table; and
a matching result integration process of acquiring linkage destination table definition information defining a search-target linkage destination table included in the table group, and extracting, based on the relationship information, candidates for a linkage source table serving as a linkage source of the linkage destination table as a linkage source candidate table group.