JP2008197976A

JP2008197976A - Connection information generation program and connection information generation method

Info

Publication number: JP2008197976A
Application number: JP2007033424A
Authority: JP
Inventors: Tadashi Hoshiai; 忠星合
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-02-14
Filing date: 2007-02-14
Publication date: 2008-08-28

Abstract

PROBLEM TO BE SOLVED: To provide connection information for connecting three or more database tables.

SOLUTION: On receiving from an operator the URLs or paths of database tables 11, 12, 13, ... selected as connection targets (step S101), a database connection device 20 generates a correspondence table 71. For each field name of those tables, the table records connection information that associates the field name with another comparison-target field name and with the similarity between the two (steps S104 to S113).

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a program and a method for generating connection information for connecting a plurality of database tables.

As is well known, many companies manage information about their products and customers in databases. When such companies merge or form business alliances and want to use each other's databases, the tables of those databases may need to be connected.

Connecting multiple database tables requires finding fields that the tables have in common. If the records of every table have fields with identical field names, connecting the tables is easy. However, even when every table has a field that actually records customer names, the name of that field may differ from table to table, for example "user name" in one table and "registrant name" in another. For this reason, connecting multiple database tables is not easy in practice.

A technique for finding fields whose names are substantially common between two database tables is disclosed in Patent Document 1.

Patent Document 1: JP 2005-063332 A

However, in the technique of Patent Document 1, the connection information that associates substantially common field names serves only to connect two database tables. No technique exists for generating connection information for connecting three or more database tables.

The present invention was made in view of these circumstances, and its object is to make it possible to generate connection information for connecting three or more database tables.

A connection information generation program devised to solve the above problem causes a computer to function as: selection accepting means for accepting, from an operator through an input device, selection information identifying several database tables selected from among a plurality of database tables that each store a plurality of records having several fields; feature word set generation means for generating, for each of the database tables identified by the accepted selection information, a feature word set (a set of words characterizing a field name) for each field name of that table, based on the values of that field; feature vector generation means for generating a feature vector for each of the feature word sets generated by the feature word set generation means; similarity generation means for performing, for every field name of every database table identified by the selection information, processing that calculates the similarity of each remaining comparison-target field name to that field name on the basis of the feature vectors generated by the feature vector generation means, and processing that records each comparison-target field name and its similarity in association with each other in a similarity table corresponding to that field name; and recording means for recording, for every field name of every database table identified by the selection information, connection information that associates that field name, another comparison-target field name, and the similarity between the two, in a correspondence table in a storage device.

With this configuration, when the operator selects several database tables to be connected, the computer generates a correspondence table that records, for each field name of those tables, connection information associating that field name, another comparison-target field name, and the similarity between the two. Because the recorded connection information includes this similarity, the operator can recognize field names with high similarity as substantially common field names.

Therefore, the present invention makes it possible to generate connection information for connecting three or more database tables.

The best mode for carrying out the present invention is described below with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of a computer network system comprising three or more database devices 10 and the database connection device 20 of this embodiment.

Each database device 10 is a general-purpose computer with an added database function and is communicably connected to the database connection device 20 through a network. Although not shown, each database device 10 therefore includes at least storage, a CPU (Central Processing Unit), memory, and a communication adapter. The storage is a storage device for storing programs and data. The CPU is a processing device that executes processing according to the programs in the storage. The memory is a storage device in which programs and data read by the CPU are cached and in which the CPU's work area is allocated. The communication adapter is a communication device for exchanging data with other computers. To provide the database function, the storage of each database device 10 stores database tables 11, 12, 13, ... in which records are accumulated, and a program for searching those tables with given search conditions.

FIG. 2 is a configuration diagram of the database connection device 20.

The database connection device 20 is a personal computer with an added database connection function. It therefore consists of a display device 20a such as a liquid crystal display, an input device 20b such as a keyboard and mouse, and a main body connected to these devices 20a and 20b. The main body includes at least storage 20c, a CPU 20d, memory 20e, and a communication adapter 20f. To provide the database connection function, the storage 20c stores a database connection tool 21, which is a program for generating connection information for connecting a plurality of databases. The database connection tool 21 is executed by the CPU 20d when the operator issues an execution instruction through the input device 20b.

FIGS. 3 and 4 show the flow of processing performed by the database connection tool 21.

After execution of the database connection tool 21 starts, in the first step S101 the CPU 20d accepts from the operator, through the input device 20b, the designation of the database tables 11, 12, 13, ... to be connected. Specifically, the CPU 20d displays on the display device 20a a screen containing input fields for the URLs (Uniform Resource Locators) or paths of the database tables 11, 12, 13, ..., and accepts the designation of the tables selected by the operator by acquiring the URLs or paths entered in those fields.

The CPU 20d executing step S101 corresponds to the selection accepting means described above.

In the next step S102, the CPU 20d accepts from the operator, through the input device 20b, the extraction conditions used to extract identical or similar field names from among the field names of all the database tables 11, 12, 13, ... designated in step S101. These extraction conditions are a "similarity threshold" and a "rank threshold". The similarity threshold is the lower limit of the similarity (described later) between field names that are to be judged similar to each other. The rank threshold is defined as follows: when one of the field names of all the designated tables is taken as the processing target and all the remaining field names are arranged in descending order of similarity to it, the rank threshold is the lowest rank at which a field name is still judged similar to the processing-target field name. The CPU 20d displays on the display device 20a a screen containing input fields for the similarity threshold and the rank threshold, and accepts the extraction conditions by acquiring the values entered there.

The CPU 20d executing step S102 corresponds to similarity threshold accepting means and rank threshold accepting means.

In the next step S103, the CPU 20d accepts from the operator, through the input device 20b, a designation of whether the comparison targets of a processing-target field name (the field names checked for being identical or similar to it) should include field names from the same database table as the processing-target field name. Specifically, the CPU 20d displays on the display device 20a a screen containing an input field for designation information that specifies whether field names in the same database table are also compared, and accepts the designation through the information entered there.

The CPU 20d executing step S103 corresponds to designation accepting means.

After step S103, the CPU 20d executes a first processing loop L1, in which it executes step S104 in turn for each of the database tables identified by the designation accepted in step S101.

In step S104, the CPU 20d objectifies the processing-target database table, that is, converts it into objects. Objectification of a database table is explained below.

FIG. 5 shows an example of the data structure of the database table 11, and FIG. 6 shows an example of the structure of the objects 31 and 34 and the tables 32, 33, and 35 generated from the database table 11 of FIG. 5.

As shown in FIG. 5, each record of the database table 11 has several fields whose field names are employee number, name, address, and so on. From this table, the CPU 20d generates a base class object 31, a property name list table 32, an instance list table 33, a plurality of instance objects 34, and a plurality of property value list tables 35.

The base class object 31 holds the table name of the database table 11 as its class name, together with a pointer to the property name list table 32 and a pointer to the instance list table 33.

The property name list table 32 has as many records as the records of the database table 11 have fields. Each record has a "number" field recording a number that uniquely identifies the name of one field of the database table 11, and a "property name" field recording that field name as a property name.

The instance list table 33 has as many records as the database table 11. Each record has a "number" field recording a number that uniquely identifies an instance object 34, and an "instance" field recording a pointer to that instance object 34.

Each instance object 34 holds an instance ID that uniquely identifies the instance, and a pointer to a property value list table 35. One instance object 34 and one property value list table 35 are generated for each record of the database table 11, so there are as many of each as there are records in the table.

Each property value list table 35 has as many records as one record of the database table 11 has fields. Each of its records has a field recording a number that uniquely identifies the value of one field in the corresponding record of the database table 11, and a field recording that value.
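The objectification of step S104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: object references stand in for pointers, a table is given as a list of field names plus rows of values, instance IDs are simply row numbers, and each value is numbered with its field's number. The names `objectify`, `BaseClassObject`, and `InstanceObject` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class InstanceObject:
    """Stand-in for an instance object 34 (one per record)."""
    instance_id: int
    # Property value list table 35: value number -> field value.
    property_values: Dict[int, Any]


@dataclass
class BaseClassObject:
    """Stand-in for a base class object 31 (one per database table)."""
    class_name: str                        # table name of the database table
    property_names: Dict[int, str]         # property name list table 32
    instances: Dict[int, InstanceObject]   # instance list table 33
    superclass_name: Optional[str] = None  # filled in later, in step S105


def objectify(table_name: str, field_names: List[str],
              rows: List[List[Any]]) -> BaseClassObject:
    """Turn one table (field names plus rows of values) into one information system."""
    property_names = {n: name for n, name in enumerate(field_names, start=1)}
    instances = {}
    for n, row in enumerate(rows, start=1):
        # Value numbers reuse the field numbers, so value k belongs to property k.
        values = {k: v for k, v in enumerate(row, start=1)}
        instances[n] = InstanceObject(instance_id=n, property_values=values)
    return BaseClassObject(table_name, property_names, instances)


employees = objectify("employees", ["employee number", "name", "address"],
                      [[1, "Taro Yamada", "Tokyo"], [2, "Hanako Sato", "Osaka"]])
print(employees.property_names)  # {1: 'employee number', 2: 'name', 3: 'address'}
```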

The objects 31 and 34 and the tables 32, 33, and 35 generated from a processing-target database table together constitute one information system.

After generating objects such as those in FIG. 6 for every database table identified by the designation accepted in step S101, the CPU 20d exits the first processing loop L1 and advances to step S105 in FIG. 3.

In step S105, the CPU 20d generates an integrated class object.

FIG. 7 shows an example of the structure of the integrated class object 41.

As shown in FIG. 7, the integrated class object 41 has a class name but no pointer to an instance list table or a property name list table. This class name is the name of the comprehensive information system formed by connecting all the information systems generated in step S104. After generating the integrated class object 41, the CPU 20d stores its class name in each base class object 31 generated in step S104 as the class name of that object's superclass. Merely associating the information systems with the integrated class object 41 in this way does not connect them to each other; they become connected when the correspondence table 71, described later, is generated.
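A minimal Python sketch of step S105 follows. The names `IntegratedClassObject` and `integrate` are hypothetical, and `BaseClassObject` is stripped down to the two attributes used here. As the text notes, setting the superclass name only groups the information systems; it does not connect them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BaseClassObject:
    """Minimal stand-in for a base class object 31."""
    class_name: str
    superclass_name: Optional[str] = None


@dataclass
class IntegratedClassObject:
    """Integrated class object 41: a class name only, no list-table pointers."""
    class_name: str


def integrate(name: str, bases: List[BaseClassObject]) -> IntegratedClassObject:
    integrated = IntegratedClassObject(name)
    for base in bases:
        # Record the integrated class as the superclass of each base class object.
        base.superclass_name = integrated.class_name
    return integrated


bases = [BaseClassObject("employees"), BaseClassObject("customers")]
merged = integrate("merged_system", bases)
print([b.superclass_name for b in bases])  # ['merged_system', 'merged_system']
```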

In the next step S106 (FIG. 3), the CPU 20d generates a dependency information management object. This object manages the relationship between the individual information systems before they are associated with each other (each derived from one of the database tables 11, 12, 13, ...) and the comprehensive information system after they are associated.

FIG. 8 shows an example of the structure of the dependency information management object 51.

As shown in FIG. 8, the dependency information management object 51 has, as the name of the comprehensive information system, the same class name as the integrated class object 41 of FIG. 7. It also holds a pointer to a pre-connection information system name list table 52, which has as many records as there are database tables identified by the designation accepted in step S101. Each record has a "number" field recording a number that uniquely identifies a database table, and a "pre-connection information system" field recording a pointer to a pre-connection information system object 53 that manages that table. Each pre-connection information system object 53 has the table name of its database table (that is, the class name of the base class object 31) as its class name, and a pointer to the base class object 31.
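The dependency information management object of step S106 might be modeled as follows. This is a sketch only: the class and function names are hypothetical, object references stand in for pointers, and the pre-connection information system name list table 52 is represented as a dictionary keyed by table number.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BaseClassObject:
    """Minimal stand-in for a base class object 31."""
    class_name: str


@dataclass
class PreConnectionInfoSystem:
    """Pre-connection information system object 53."""
    class_name: str        # table name, i.e. the base class object's class name
    base: BaseClassObject  # reference standing in for the pointer to object 31


@dataclass
class DependencyInfoManager:
    """Dependency information management object 51."""
    class_name: str                              # same as integrated class object 41
    systems: Dict[int, PreConnectionInfoSystem]  # name list table 52: number -> object 53


def build_dependency_manager(name: str,
                             bases: List[BaseClassObject]) -> DependencyInfoManager:
    systems = {n: PreConnectionInfoSystem(b.class_name, b)
               for n, b in enumerate(bases, start=1)}
    return DependencyInfoManager(name, systems)


manager = build_dependency_manager(
    "merged_system", [BaseClassObject("employees"), BaseClassObject("customers")])
print({n: s.class_name for n, s in manager.systems.items()})  # {1: 'employees', 2: 'customers'}
```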

In the next step S107 (FIG. 3), the CPU 20d generates a feature word set for every property name in the information systems associated with each other in step S105 (that is, every field name of every database table identified by the designation accepted in step S101). A feature word set is a set of words that characterizes a given word. That is, for a given field name in a given database table, the CPU 20d extracts words characterizing that field name from the values of that field in each record, and stores the extracted words as a feature word set in association with the corresponding property name in the property name list table 32 of FIG. 6. FIG. 9 shows the feature word sets generated for the field names "氏名" (full name), "名前" (name), and "社員名" (employee name). Various methods of generating feature word sets have been devised and disclosed, so they are not described here.
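The embodiment deliberately leaves the feature word method open. Purely as a placeholder, the sketch below assumes that the feature words of a field are its most frequent tokens across all records; the function `feature_word_set` and its `top_k` parameter are hypothetical. Japanese values are not space-delimited, so a real implementation would need a morphological analyzer rather than the simple regular expression used here.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional


def feature_word_set(values: Iterable[Optional[str]], top_k: int = 5) -> List[str]:
    """Hypothetical feature words: the top_k most frequent tokens in a field's values.

    Ties keep first-seen order; None values (empty fields) are skipped.
    """
    counts: Counter = Counter()
    for value in values:
        if value is None:
            continue
        counts.update(re.findall(r"\w+", str(value).lower()))
    return [word for word, _ in counts.most_common(top_k)]


addresses = ["Tokyo Chiyoda", "Tokyo Minato", "Osaka Kita"]
print(feature_word_set(addresses, top_k=2))  # ['tokyo', 'chiyoda']
```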

The CPU 20d executing step S107 corresponds to the feature word set generation means described above.

In the next step S108 (FIG. 3), for every property name in the information systems associated in step S105 (that is, every field name of every database table identified by the designation accepted in step S101), the CPU 20d generates a feature vector based on the feature word set associated with that property name. Various methods of generating a feature vector from a feature word set have been devised and disclosed, so they are not described here.
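Likewise, the vectorization method of step S108 is not specified. One simple possibility, assumed here, is a binary bag-of-words vector over the sorted union of all feature words. Fields are keyed by hypothetical (table name, field name) pairs so that identical field names in different tables stay distinct; `feature_vectors` is a hypothetical name.

```python
from typing import Dict, List, Tuple

FieldKey = Tuple[str, str]  # (table name, field name)


def feature_vectors(word_sets: Dict[FieldKey, List[str]]) -> Dict[FieldKey, List[int]]:
    """Hypothetical binary bag-of-words vectors over the union of all feature words."""
    vocabulary = sorted({w for words in word_sets.values() for w in words})
    vectors = {}
    for key, words in word_sets.items():
        present = set(words)
        vectors[key] = [1 if w in present else 0 for w in vocabulary]
    return vectors


vecs = feature_vectors({
    ("employees", "name"): ["yamada", "sato"],
    ("staff", "employee_name"): ["sato", "suzuki"],
})
print(vecs[("employees", "name")])  # [1, 0, 1]  (vocabulary: sato, suzuki, yamada)
```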

The CPU 20d executing step S108 corresponds to the feature vector generation means described above.

After generating the feature word sets and feature vectors, the CPU 20d executes a second processing loop L2, in which it executes step S109 once for each property name (that is, each field name) in all the property name list tables generated in step S104, one at a time.

In step S109, the CPU 20d generates a processing-target property object and a similarity table for the processing-target property name.

FIG. 10 shows an example of the data structure of the processing-target property object 61 and the similarity table 62.

The processing-target property object 61 links the processing-target property name to its similarity table 62: it holds the processing-target property name and a pointer to the similarity table 62 for that name.

The similarity table 62 records the similarity of every other property name to the processing-target property name, and therefore has as many records as there are property names other than the processing-target one. Each record has "similar property name", "similarity", and "rank" fields. The "similar property name" field records the property name being compared with the processing-target property name. The "similarity" field records, as the similarity, the distance between that property name's feature vector and the feature vector of the processing-target property name. The "rank" field records that property name's position when all property names other than the processing-target one are arranged in descending order of similarity. A method of obtaining the similarity between information elements from the distance between their feature vectors is disclosed in JP 2005-63332 A.
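Loop L2 (step S109) can be sketched as below. The text records a vector distance as the similarity and ranks names in descending order of similarity; this sketch swaps in cosine similarity, where a larger value means more similar, so that the descending sort matches that ordering. Each dictionary key plays the role of a processing-target property object 61, equal similarities receive consecutive ranks in sort order (an assumption), and the function names and record layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

FieldKey = Tuple[str, str]  # (table name, field name)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_tables(vectors: Dict[FieldKey, List[float]]) -> Dict[FieldKey, List[dict]]:
    """One similarity table per field: every other field, its similarity, and its rank."""
    tables = {}
    for target, tv in vectors.items():
        scored = [(other, cosine(tv, ov)) for other, ov in vectors.items() if other != target]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # descending similarity
        tables[target] = [
            {"similar_property": other, "similarity": sim, "rank": rank}
            for rank, (other, sim) in enumerate(scored, start=1)
        ]
    return tables


vectors = {
    ("employees", "name"): [1, 0, 1],
    ("staff", "employee_name"): [1, 1, 1],
    ("customers", "address"): [0, 1, 0],
}
for row in similarity_tables(vectors)[("employees", "name")]:
    print(row["rank"], row["similar_property"], round(row["similarity"], 3))
```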

After generating a processing-target property object 61 and a similarity table 62 for every property name in all the property name list tables 32 generated in step S104 (that is, every field name of every database table identified by the designation accepted in step S101), the CPU 20d exits the second processing loop L2 and advances to step S110 in FIG. 4.

In step S110, the CPU 20d determines whether the designation accepted in step S103 indicates that property names (field names) in the same information system (database table) as the processing-target property name are to be included among the comparison targets. If it does, the CPU 20d branches from step S110 to step S112. If it does not, the CPU 20d advances to step S111.

In step S111, for each property name in all the property name list tables 32 generated in step S104 (that is, every field name of every database table identified by the designation accepted in step S101), the CPU 20d deletes from that property name's similarity table 62 the records of property names belonging to the same information system. Whether a property name and a property name compared with it belong to the same information system is determined by whether any of the property name list tables 32 generated in step S104 contains both property names.
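Steps S110 and S111 amount to an optional filter. In the sketch below, fields are keyed by (table name, field name) pairs, so "same information system" reduces to comparing table names rather than searching the property name list tables. The text does not say whether ranks are recomputed after deletion; this sketch leaves them unchanged. `apply_same_table_designation` and `include_same_table` are hypothetical names.

```python
from typing import Dict, List, Tuple

FieldKey = Tuple[str, str]  # (table name, field name)


def apply_same_table_designation(tables: Dict[FieldKey, List[dict]],
                                 include_same_table: bool) -> Dict[FieldKey, List[dict]]:
    """Steps S110-S111 sketch: optionally drop comparison targets from the same table."""
    if include_same_table:  # S110 -> S112: keep every record unchanged
        return tables
    return {                # S111: delete records whose field is in the target's table
        target: [row for row in rows if row["similar_property"][0] != target[0]]
        for target, rows in tables.items()
    }


tables = {("employees", "name"): [
    {"similar_property": ("employees", "name_kana"), "similarity": 0.9, "rank": 1},
    {"similar_property": ("staff", "employee_name"), "similarity": 0.8, "rank": 2},
]}
kept = apply_same_table_designation(tables, include_same_table=False)
print([r["similar_property"] for r in kept[("employees", "name")]])
```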

After deleting, for every property name, the records of property names in the same information system from its similarity table 62, the CPU 20d advances to step S112.

In step S112, the CPU 20d removes from the similarity tables of all property names the records that do not satisfy the extraction conditions (similarity threshold and rank threshold) accepted in step S102. For example, if the operator sets the similarity threshold to 0.7 and the rank threshold to 3, every record whose similarity is below 0.7 or whose rank is 4th or lower is removed from each similarity table 62.
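The filter of step S112 keeps a record only if it passes both thresholds, which reproduces the worked example in the text (similarity threshold 0.7, rank threshold 3). The function and parameter names below are hypothetical, and the sample tables are made-up illustrations.

```python
from typing import Dict, List, Tuple

FieldKey = Tuple[str, str]  # (table name, field name)


def apply_extraction_conditions(tables: Dict[FieldKey, List[dict]],
                                similarity_threshold: float,
                                rank_threshold: int) -> Dict[FieldKey, List[dict]]:
    """Step S112 sketch: keep rows with similarity >= threshold AND rank <= rank threshold."""
    return {
        target: [r for r in rows
                 if r["similarity"] >= similarity_threshold and r["rank"] <= rank_threshold]
        for target, rows in tables.items()
    }


example = {
    ("employees", "name"): [
        {"similar_property": ("staff", f"field{i}"), "similarity": s, "rank": i}
        for i, s in enumerate([0.95, 0.80, 0.72, 0.71, 0.60], start=1)
    ],
    ("employees", "address"): [
        {"similar_property": ("customers", "addr"), "similarity": 0.90, "rank": 1},
        {"similar_property": ("customers", "zip"), "similarity": 0.65, "rank": 2},
    ],
}
filtered = apply_extraction_conditions(example, similarity_threshold=0.7, rank_threshold=3)
print([r["rank"] for r in filtered[("employees", "name")]])     # [1, 2, 3]  (rank 4 cut by rank)
print([r["rank"] for r in filtered[("employees", "address")]])  # [1]  (rank 2 cut by similarity)
```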

The CPU 20d executing steps S109 through S112 corresponds to the similarity generation means described above.

In the next step S113, the CPU 20d generates the correspondence table 71, which integrates all the similarity tables 62.

FIG. 11 shows an example of the data structure of the correspondence table 71.

As shown in FIG. 11, each record of the correspondence table 71 has "number", "table name 1", "field name 1", "similarity", "field name 2", and "table name 2" fields. The "number" field records a number that uniquely identifies the record. The "table name 1" field records the class name (the table name that the base class object 31 holds as its class name) of the information system (database table) containing the property name (field name) associated with a similarity table 62. The "field name 1" field records that property name (field name). The "field name 2" and "similarity" fields record, respectively, the values of the "similar property name" and "similarity" fields in the similarity table 62 associated with the property name in "field name 1". The "table name 2" field records the class name (the table name that the base class object 31 holds as its class name) of the information system (database table) containing the property name in "field name 2".
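Step S113 flattens the remaining similarity tables into FIG. 11-style records. The sketch below assumes the (table name, field name) keys used in the earlier sketches; because each pair of similar fields can appear in both fields' similarity tables, it may be emitted twice, and the text does not say whether such mirrored records are merged. `correspondence_table` is a hypothetical name.

```python
from typing import Dict, List, Tuple

FieldKey = Tuple[str, str]  # (table name, field name)


def correspondence_table(tables: Dict[FieldKey, List[dict]]) -> List[dict]:
    """Step S113 sketch: flatten all similarity tables into FIG. 11-style records."""
    records = []
    for (table1, field1), similar_rows in tables.items():
        for row in similar_rows:
            table2, field2 = row["similar_property"]
            records.append({
                "number": len(records) + 1,  # unique, consecutive record numbers
                "table name 1": table1,
                "field name 1": field1,
                "similarity": row["similarity"],
                "field name 2": field2,
                "table name 2": table2,
            })
    return records


tables = {
    ("employees", "name"): [
        {"similar_property": ("staff", "employee_name"), "similarity": 0.82, "rank": 1}],
    ("staff", "employee_name"): [
        {"similar_property": ("employees", "name"), "similarity": 0.82, "rank": 1}],
}
for rec in correspondence_table(tables):
    print(rec)
```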

After generating the correspondence table 71 of FIG. 11 from all of the similarity tables 62, the CPU 20d ends the processing shown in FIGS. 3 and 4.

The CPU 20d executing step S113 corresponds to the recording means described above.

Next, the operation and effects of the database connection device 20 of this embodiment will be described.

When the operator starts the database connection tool 21 through the input device 20b, a screen (not shown) containing input fields for the URLs or paths of the database tables 11, 12, 13, … is displayed on the display device 20a (step S101). The operator selects several database tables, enters their URLs or paths in the input fields, and clicks a confirm button (not shown). A screen (not shown) containing input fields for the extraction conditions is then displayed on the display device 20a (step S102). The operator enters the values chosen for the similarity threshold and the rank threshold and clicks a confirm button (not shown). Next, a screen (not shown) containing an input field for designation information, which specifies whether field names within the same database table are to be compared with each other, is displayed on the display device 20a (step S103). The operator enters the designation information and clicks a confirm button (not shown). The database connection device 20 then generates the correspondence table 71 for connecting the database tables selected by the operator (steps S104 to S113).

The generated correspondence table 71 contains, as connection information, records that each associate a field name, a comparison target field name, and the similarity between the two field names. Because this connection information includes the similarity, the operator can treat field names with a high similarity as substantially the same field.

When a similarity threshold has been set, only connection information whose similarity is at or above that threshold is recorded in the correspondence table 71 (step S109). By setting the similarity threshold appropriately, the operator can therefore exclude, in advance, connection information whose similarity is below the threshold from the correspondence table 71.
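The similarity-threshold filtering of step S109 amounts to a single comparison per record. The sketch below assumes a similarity table 62 is a list of (comparison target field name, similarity) pairs and that an unset threshold is represented as `None`; the function and type names are hypothetical, and the source does not state the scale of the similarity values.

```python
from typing import Optional

# Hypothetical form of one similarity table 62:
# (comparison target field name, similarity) pairs for one field name.
SimilarityTable = list[tuple[str, float]]


def apply_similarity_threshold(
    table: SimilarityTable, threshold: Optional[float]
) -> SimilarityTable:
    """Keep only rows whose similarity is at or above the threshold.

    The threshold is a lower limit, so rows strictly below it are dropped.
    When no threshold is set (None), the rows are returned unchanged.
    """
    if threshold is None:
        return list(table)
    return [(name, sim) for name, sim in table if sim >= threshold]
```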

When a rank threshold has been set, only connection information ranked at or above that threshold in each similarity table 62 is recorded in the correspondence table 71 (step S109). By setting the rank threshold appropriately, the operator can therefore exclude, in advance, connection information ranked below the threshold in its similarity table 62 from the correspondence table 71.
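The rank-threshold filtering can be sketched as "sort by descending similarity, then keep the first N rows". This is an illustrative sketch assuming the same hypothetical list-of-pairs representation of a similarity table 62, with rank 1 as the most similar entry; how ties are ranked is not specified in the source, so keeping tied rows in their original order is an assumption.

```python
from typing import Optional

# Hypothetical form of one similarity table 62:
# (comparison target field name, similarity) pairs for one field name.
SimilarityTable = list[tuple[str, float]]


def apply_rank_threshold(
    table: SimilarityTable, rank_threshold: Optional[int]
) -> SimilarityTable:
    """Sort rows by descending similarity and keep the top `rank_threshold`.

    Python's sort is stable even with reverse=True, so tied rows keep
    their original relative order. With no threshold set (None), all
    rows are returned in ranked order.
    """
    ranked = sorted(table, key=lambda row: row[1], reverse=True)
    if rank_threshold is None:
        return ranked
    return ranked[: max(rank_threshold, 0)]
```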

Furthermore, when the operator has specified that field names within the same database table are not to be compared, records whose comparison target field name belongs to the same database table as the field name corresponding to a similarity table 62 are removed from that similarity table 62 (step S110: YES, S111). By making this designation in advance, the operator can therefore exclude connection information between field names of the same database table from the correspondence table 71.

Conversely, when the operator has specified that field names within the same database table are to be compared, records whose comparison target field name belongs to the same database table as the field name corresponding to a similarity table 62 are left in that similarity table 62 (step S110: NO). By making this designation in advance, the operator can therefore have connection information between field names of the same database table extracted into the correspondence table 71 as well.
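The branch at step S110 and the removal at step S111 can be sketched as a filter on the table name of each comparison target. The sketch assumes each comparison target is identified by a hypothetical (table name, field name) pair and that the designation information is a boolean; these representations and the function name are assumptions for illustration only.

```python
# Hypothetical representation: a field is identified by (table name, field name).
FieldRef = tuple[str, str]
SimilarityTable = list[tuple[FieldRef, float]]


def filter_same_table(
    owner: FieldRef, table: SimilarityTable, compare_within_table: bool
) -> SimilarityTable:
    """Apply the designation information to the similarity table of `owner`."""
    if compare_within_table:
        # Step S110: NO. Same-table comparison targets are left as they are.
        return list(table)
    owner_table = owner[0]
    # Step S110: YES, then S111. Remove targets from the owner's own table.
    return [(ref, sim) for ref, sim in table if ref[0] != owner_table]
```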

(Appendix 1)
A connection information generation program causing a computer to function as:
selection accepting means for accepting, from an operator through an input device, selection information identifying several database tables selected from among a plurality of database tables, each storing a plurality of records having several fields;
feature word set generation means for generating, for each of the database tables identified by the selection information accepted by the selection accepting means, a feature word set, which is a set of words characterizing each field name of that database table, based on the values of each field;
feature vector generation means for generating a feature vector for each of the feature word sets generated by the feature word set generation means;
similarity generation means for performing, for each field name of every database table identified by the selection information accepted by the selection accepting means, a process of calculating the similarity of each of the remaining comparison target field names to that field name based on the feature vectors generated by the feature vector generation means, and a process of recording each comparison target field name in association with its similarity in a similarity table corresponding to that field name; and
recording means for recording, for each field name of every database table identified by the selection information accepted by the selection accepting means, connection information associating that field name, another comparison target field name, and the similarity between the two field names, in a correspondence table in a storage device.
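The feature word set, feature vector, and similarity steps of Appendix 1 can be sketched end to end as below. The source does not specify how feature words are extracted from field values, how vectors are built, or which similarity measure is used, so this sketch makes explicit assumptions: feature words are lower-cased whitespace tokens of the field values, feature vectors are binary vectors over the shared vocabulary, and similarity is cosine similarity. The input layout and all function names are hypothetical.

```python
import math

# Hypothetical input layout: table name -> field name -> list of field values.
Tables = dict[str, dict[str, list[str]]]
FieldRef = tuple[str, str]  # (table name, field name)


def feature_word_sets(tables: Tables) -> dict[FieldRef, set[str]]:
    """Assumed feature word set per field: lower-cased tokens of its values."""
    return {
        (table, field): {word for value in values for word in value.lower().split()}
        for table, fields in tables.items()
        for field, values in fields.items()
    }


def feature_vectors(word_sets: dict[FieldRef, set[str]]) -> dict[FieldRef, list[float]]:
    """Binary vector over the shared vocabulary: 1.0 if the word is in the set."""
    vocabulary = sorted(set().union(*word_sets.values()))
    return {
        ref: [1.0 if word in words else 0.0 for word in vocabulary]
        for ref, words in word_sets.items()
    }


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_tables(tables: Tables) -> dict[FieldRef, list[tuple[FieldRef, float]]]:
    """One similarity table per field: every other field with its similarity."""
    vectors = feature_vectors(feature_word_sets(tables))
    return {
        owner: [(other, cosine(vec, vectors[other])) for other in vectors if other != owner]
        for owner, vec in vectors.items()
    }
```

Binary vectors keep the sketch short; term counts or TF-IDF weights would be equally plausible choices, and the source does not indicate which (if any) of these is intended.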

(Appendix 2)
The connection information generation program according to Appendix 1, further causing the computer to function as:
similarity threshold accepting means for accepting, from the operator through the input device, similarity threshold information specifying a lower limit of the similarity,
wherein the similarity generation means, after generating a similarity table for each of the field names, deletes from those similarity tables any record whose similarity is below the lower limit specified by the similarity threshold information accepted by the similarity threshold accepting means.

(Appendix 3)
The connection information generation program according to Appendix 1 or 2, further causing the computer to function as:
rank threshold accepting means for accepting, from the operator through the input device, rank threshold information specifying a lower limit of the rank when the records in each similarity table are sorted in descending order of similarity,
wherein the similarity generation means, after generating a similarity table for each of the field names, sorts the records in each similarity table in descending order of similarity and then deletes any record ranked below the lower limit specified by the rank threshold information accepted by the rank threshold accepting means.

(Appendix 4)
The connection information generation program according to Appendix 1, 2, or 3, further causing the computer to function as:
designation accepting means for accepting, from the operator through the input device, designation information specifying whether a comparison target field name belonging to the same database table as the field name corresponding to a similarity table is to be recorded in that similarity table,
wherein, when the designation information accepted by the designation accepting means specifies that such comparison target field names are not to be recorded, the similarity generation means, in creating a similarity table for each field name of every database table identified by the selection information accepted by the selection accepting means, records in that similarity table only the comparison target field names belonging to database tables other than that of the corresponding field name, together with their similarities.

(Appendix 5)
A connection information generation method in which a computer executes:
a selection accepting procedure of accepting, from an operator through an input device, selection information identifying several database tables selected from among a plurality of database tables, each storing a plurality of records having several fields;
a feature word set generation procedure of generating, for each of the database tables identified by the selection information accepted in the selection accepting procedure, a feature word set, which is a set of words characterizing each field name of that database table, based on the values of each field;
a feature vector generation procedure of generating a feature vector for each of the feature word sets generated in the feature word set generation procedure;
a similarity generation procedure of performing, for each field name of every database table identified by the selection information accepted in the selection accepting procedure, a process of calculating the similarity of each of the remaining comparison target field names to that field name based on the feature vectors generated in the feature vector generation procedure, and a process of recording each comparison target field name in association with its similarity in a similarity table corresponding to that field name; and
a recording procedure of recording, for each field name of every database table identified by the selection information accepted in the selection accepting procedure, connection information associating that field name, another comparison target field name, and the similarity between the two field names, in a correspondence table in a storage device.

(Appendix 6)
The connection information generation method according to Appendix 5, wherein the computer further executes:
a similarity threshold accepting procedure of accepting, from the operator through the input device, similarity threshold information specifying a lower limit of the similarity,
and wherein, in the similarity generation procedure, after a similarity table is generated for each of the field names, any record whose similarity is below the lower limit specified by the similarity threshold information accepted in the similarity threshold accepting procedure is deleted from those similarity tables.

(Appendix 7)
The connection information generation method according to Appendix 5 or 6, wherein the computer further executes:
a rank threshold accepting procedure of accepting, from the operator through the input device, rank threshold information specifying a lower limit of the rank when the records in each similarity table are sorted in descending order of similarity,
and wherein, in the similarity generation procedure, after a similarity table is generated for each of the field names, the records in each similarity table are sorted in descending order of similarity and any record ranked below the lower limit specified by the rank threshold information accepted in the rank threshold accepting procedure is deleted.

(Appendix 8)
The connection information generation method according to Appendix 5, 6, or 7, wherein the computer further executes:
a designation accepting procedure of accepting, from the operator through the input device, designation information specifying whether a comparison target field name belonging to the same database table as the field name corresponding to a similarity table is to be recorded in that similarity table,
and wherein, in the similarity generation procedure, when the designation information accepted in the designation accepting procedure specifies that such comparison target field names are not to be recorded, the similarity table created for each field name of every database table identified by the selection information accepted in the selection accepting procedure records only the comparison target field names belonging to database tables other than that of the corresponding field name, together with their similarities.

(Appendix 9)
A connection information generation device comprising:
a selection accepting unit that accepts, from an operator through an input device, selection information identifying several database tables selected from among a plurality of database tables, each storing a plurality of records having several fields;
a feature word set generation unit that generates, for each of the database tables identified by the selection information accepted by the selection accepting unit, a feature word set, which is a set of words characterizing each field name of that database table, based on the values of each field;
a feature vector generation unit that generates a feature vector for each of the feature word sets generated by the feature word set generation unit;
a similarity generation unit that performs, for each field name of every database table identified by the selection information accepted by the selection accepting unit, a process of calculating the similarity of each of the remaining comparison target field names to that field name based on the feature vectors generated by the feature vector generation unit, and a process of recording each comparison target field name in association with its similarity in a similarity table corresponding to that field name; and
a recording unit that records, for each field name of every database table identified by the selection information accepted by the selection accepting unit, connection information associating that field name, another comparison target field name, and the similarity between the two field names, in a correspondence table in a storage device.

(Appendix 10)
The connection information generation device according to Appendix 9, further comprising:
a similarity threshold accepting unit that accepts, from the operator through the input device, similarity threshold information specifying a lower limit of the similarity,
wherein the similarity generation unit, after generating a similarity table for each of the field names, deletes from those similarity tables any record whose similarity is below the lower limit specified by the similarity threshold information accepted by the similarity threshold accepting unit.

(Appendix 11)
The connection information generation device according to Appendix 9 or 10, further comprising:
a rank threshold accepting unit that accepts, from the operator through the input device, rank threshold information specifying a lower limit of the rank when the records in each similarity table are sorted in descending order of similarity,
wherein the similarity generation unit, after generating a similarity table for each of the field names, sorts the records in each similarity table in descending order of similarity and then deletes any record ranked below the lower limit specified by the rank threshold information accepted by the rank threshold accepting unit.

(Appendix 12)
The connection information generation device according to Appendix 9, 10, or 11, further comprising:
a designation accepting unit that accepts, from the operator through the input device, designation information specifying whether a comparison target field name belonging to the same database table as the field name corresponding to a similarity table is to be recorded in that similarity table,
wherein, when the designation information accepted by the designation accepting unit specifies that such comparison target field names are not to be recorded, the similarity generation unit, in creating a similarity table for each field name of every database table identified by the selection information accepted by the selection accepting unit, records in that similarity table only the comparison target field names belonging to database tables other than that of the corresponding field name, together with their similarities.

FIG. 1 is a configuration diagram of a computer network system comprising three or more database devices and the database connection device of this embodiment.
FIG. 2 is a configuration diagram of the database connection device.
FIG. 3 is a diagram showing the flow of processing by the database connection tool.
FIG. 4 is a diagram showing the flow of processing by the database connection tool.
FIG. 5 is a diagram showing an example of the data structure of a database table.
FIG. 6 is a diagram showing an example of the structure of the objects and tables generated from the database table of FIG. 5.
FIG. 7 is a diagram showing an example of the structure of an integrated class object.
FIG. 8 is a diagram showing an example of the structure of a dependency information management object.
FIG. 9 is a diagram showing the feature word sets generated for the field names "氏名" (full name), "名前" (name), and "社員名" (employee name).
FIG. 10 is a diagram showing an example of the data structure of a processing target property object and a similarity table.
FIG. 11 is a diagram showing an example of the data structure of a correspondence table.

Explanation of symbols

10 Database device
11 Database table
12 Database table
13 Database table
20 Database connection device
20a Display device
20b Input device
20c Storage
20d CPU
21 Database connection tool

Claims

1. A connection information generation program causing a computer to function as:
selection accepting means for accepting, from an operator through an input device, selection information identifying several database tables selected from among a plurality of database tables, each storing a plurality of records having several fields;
feature word set generation means for generating, for each of the database tables identified by the selection information accepted by the selection accepting means, a feature word set, which is a set of words characterizing each field name of that database table, based on the values of each field;
feature vector generation means for generating a feature vector for each of the feature word sets generated by the feature word set generation means;
similarity generation means for performing, for each field name of every database table identified by the selection information accepted by the selection accepting means, a process of calculating the similarity of each of the remaining comparison target field names to that field name based on the feature vectors generated by the feature vector generation means, and a process of recording each comparison target field name in association with its similarity in a similarity table corresponding to that field name; and
recording means for recording, for each field name of every database table identified by the selection information accepted by the selection accepting means, connection information associating that field name, another comparison target field name, and the similarity between the two field names, in a correspondence table in a storage device.

2. The connection information generation program according to claim 1, further causing the computer to function as:
similarity threshold accepting means for accepting, from the operator through the input device, similarity threshold information specifying a lower limit of the similarity,
wherein the similarity generation means, after generating a similarity table for each of the field names, deletes from those similarity tables any record whose similarity is below the lower limit specified by the similarity threshold information accepted by the similarity threshold accepting means.

3. The connection information generation program according to claim 1 or 2, further causing the computer to function as:
rank threshold accepting means for accepting, from the operator through the input device, rank threshold information specifying a lower limit of the rank when the records in each similarity table are sorted in descending order of similarity,
wherein the similarity generation means, after generating a similarity table for each of the field names, sorts the records in each similarity table in descending order of similarity and then deletes any record ranked below the lower limit specified by the rank threshold information accepted by the rank threshold accepting means.

4. The connection information generation program according to claim 1, further causing the computer to function as:
designation accepting means for accepting, from the operator through the input device, designation information specifying whether a comparison target field name belonging to the same database table as the field name corresponding to a similarity table is to be recorded in that similarity table,
wherein, when the designation information accepted by the designation accepting means specifies that such comparison target field names are not to be recorded, the similarity generation means, in creating a similarity table for each field name of every database table identified by the selection information accepted by the selection accepting means, records in that similarity table only the comparison target field names belonging to database tables other than that of the corresponding field name, together with their similarities.

5. A connection information generation method in which a computer executes:
a selection accepting procedure of accepting, from an operator through an input device, selection information identifying several database tables selected from among a plurality of database tables, each storing a plurality of records having several fields;
a feature word set generation procedure of generating, for each of the database tables identified by the selection information accepted in the selection accepting procedure, a feature word set, which is a set of words characterizing each field name of that database table, based on the values of each field;
a feature vector generation procedure of generating a feature vector for each of the feature word sets generated in the feature word set generation procedure;
a similarity generation procedure of performing, for each field name of every database table identified by the selection information accepted in the selection accepting procedure, a process of calculating the similarity of each of the remaining comparison target field names to that field name based on the feature vectors generated in the feature vector generation procedure, and a process of recording each comparison target field name in association with its similarity in a similarity table corresponding to that field name; and
a recording procedure of recording, for each field name of every database table identified by the selection information accepted in the selection accepting procedure, connection information associating that field name, another comparison target field name, and the similarity between the two field names, in a correspondence table in a storage device.