JP2007188343A

JP2007188343A - Schema integration support device, schema integration support method, and schema integration support program

Info

Publication number: JP2007188343A
Application number: JP2006006596A
Authority: JP
Inventors: Shuichi Morikawa; 修一森川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2006-01-13
Filing date: 2006-01-13
Publication date: 2007-07-26
Anticipated expiration: 2026-01-13
Also published as: JP4855080B2

Abstract

PROBLEM TO BE SOLVED: To prepare table mapping information for the integration of two databases without using "domain information" or "attribute names".
SOLUTION: A schema information extraction unit 103 acquires schema information, which indicates the data structure of each table, from a first database 100 and a second database 101 to be integrated, and stores it in a schema information storage device 104. For each table, a feature information generation unit 105 uses the schema information to count the data items constituting the table for each kind of defined attribute, and stores the counts in a feature information storage device 107 as feature information. A similarity evaluation unit 108 compares the feature information of a table in the first database 100 with that of a table in the second database 101 and calculates their similarity. It then stores each compared pair of tables, together with its similarity, in a mapping model storage device 110 as mapping information.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a schema integration support apparatus, a schema integration support method, and a schema integration support program that support schema integration work between different databases.

Multiple databases may define data representing the same concept in different representation formats. Integrating such databases requires mapping the schemas (database structures) of the different databases to one another, and conventional methods for supporting this mapping work include the following.
One conventional method uses the domain information (the set of actual data values) of the attributes used for mapping. It determines the degree of matching with an evaluation function and derives mapping candidates (for example, Patent Document 1).
Another method standardizes the names of the attributes used for mapping and presents mapping candidates by calculating a similarity from the attribute names (for example, Patent Document 2).
Patent Document 1: Japanese Patent Laid-Open No. 2004-86782
Patent Document 2: JP-A-8-249338

Among the conventional schema integration support methods, the method that uses “domain information” to determine mapping candidates (Patent Document 1) requires not only schema information but also the actual data values during the integration process. Furthermore, because data values are used to determine the degree of matching, the data values must first be standardized (unified) across the databases to be integrated.
The method that uses “attribute names” to determine mapping candidates (Patent Document 2) has a different problem. Attribute names are set arbitrarily in free form, so mapping candidates cannot be presented unless each database to be integrated has attribute names suitable for calculating the similarity.

An object of the present invention is, among other things, to make it possible to create mapping candidates without using “domain information” or “attribute names”.

The schema integration support apparatus of the present invention supports the integration of a first database and a second database by outputting mapping information about the tables to be mapped between them. The apparatus comprises:
- a schema information storage device that stores the data structure information of a table A of the first database and the data structure information of a table B of the second database; and
- a similarity evaluation unit that uses a central processing unit to compare the stored data structure information of table A with that of table B, calculates the similarity between table A and table B, and outputs the calculated similarity to an output device as mapping information.

With the schema integration support apparatus of the present invention, similarity can be calculated from the data structure of a table, for example the constraints on its data and the dependency relationships between its data. Similar tables can then be selected to create mapping candidates.

Embodiment 1.
This embodiment describes a schema integration support apparatus, a schema integration support method, and a schema integration support program, taking a relational database as the example database system to be integrated (hereinafter simply “database”). As information supporting database integration, they generate “table similarity (mapping information)” from “table data structure information (schema information)”.

FIG. 1 is a diagram illustrating the external appearance of the schema integration support apparatus 102 according to Embodiment 1.
In FIG. 1, the schema integration support apparatus 102 includes a system unit 910, a display device 901, a keyboard (K/B) 902, a mouse 903, a compact disc drive (CDD) 905, a printer device 906, and a scanner device 907, all connected by cables.
The schema integration support apparatus 102 is also connected by cable to a FAX machine 932 and a telephone 931. It is connected to the Internet 940 via a local area network (LAN) 942 and a web server 941.

FIG. 2 is a hardware configuration diagram of the schema integration support apparatus 102 according to Embodiment 1.
In FIG. 2, the schema integration support apparatus 102 includes a CPU (Central Processing Unit) 911 that executes programs. Via a bus 912, the CPU 911 is connected to the following: a ROM 913, a RAM 914, a communication board 915, the display device 901, the K/B 902, the mouse 903, an FDD (Flexible Disk Drive) 904, a magnetic disk device 920, the CDD 905, the printer device 906, the scanner device 907, and an optical disk device 908.
The display device 901 is, for example, a liquid crystal display or a CRT (Cathode Ray Tube) display.
The RAM 914 is an example of volatile memory. The ROM 913, the FDD 904, the CDD 905, the magnetic disk device 920, and the optical disk device 908 are examples of nonvolatile memory. These are examples of storage equipment, storage devices, or storage units. They constitute the storage devices described below: the “schema information storage device 104”, “data type dictionary storage device 106”, “feature information storage device 107”, “weighting variable storage device 109”, and “mapping model storage device 110”.
The communication board 915 is connected to the FAX machine 932, the telephone 931, the LAN 942, and the like.
For example, the communication board 915, the K/B 902, the scanner device 907, and the FDD 904 are examples of input equipment, input devices, or input units.
Likewise, the communication board 915 and the display device 901 are examples of output equipment, output devices, or output units.

The communication board 915 need not connect through the LAN 942. It may instead be connected directly to the Internet 940 or to a WAN (Wide Area Network) such as ISDN, in which case the web server 941 is unnecessary. The communication board 915 also exchanges data such as schema information with the databases to be integrated.
The magnetic disk device 920 stores an operating system (OS) 921, a window system 922, a program group 923, and a file group 924. The programs in the program group 923 are executed by the CPU 911, the OS 921, and the window system 922.

The program group 923 stores the programs that execute the functions described as “... unit” in the embodiments. The CPU 911 reads and executes these programs.
The file group 924 stores the following items as files:
- data described as “... information” in the embodiments;
- data indicating the determination and calculation results produced when the “... unit” functions are executed; and
- data passed between the programs that execute those functions.
For example, the “schema information”, “data type dictionary”, “feature information”, “weighting variables”, and “mapping information” described below are stored as files.
In the embodiments, the arrows in the flowcharts and configuration diagrams mainly indicate data input and output. For this input and output, the data is stored on the magnetic disk device 920 or on another storage medium such as an FD (Flexible Disk cartridge), optical disk, CD (Compact Disc), MD (MiniDisc), or DVD (Digital Versatile Disc). Alternatively, it is transmitted over a signal line or another transmission medium.

Anything described as “... unit” in the embodiments may be implemented by firmware stored in the ROM 913. It may also be implemented by software alone, hardware alone, a combination of software and hardware, or a combination of these with firmware.

The programs that implement the embodiments may be stored on the magnetic disk device 920 or on another storage medium such as an FD, optical disk, CD, MD, or DVD.

FIG. 3 is a functional configuration diagram showing the functional elements of the schema integration support apparatus 102 according to Embodiment 1.
In FIG. 3, the first database 100 and the second database 101 are the databases whose schemas are to be integrated.
The schema integration support apparatus 102 in FIG. 3 has the following functional elements:
- A schema information extraction unit 103 that acquires, from the databases to be integrated, the “schema information” used in the schema integration support process.
- A schema information storage device 104 that stores the “schema information” acquired by the schema information extraction unit 103.
- A feature information generation unit 105 that generates “feature information” for each table from the “schema information”.
- A data type dictionary storage device 106 that stores a “data type dictionary”. When “feature information” is generated, this dictionary is used to standardize the data type attributes used by each database to be integrated.
- A feature information storage device 107 that stores the “feature information” generated by the feature information generation unit 105.
- A similarity evaluation unit 108 that evaluates the similarity of tables based on the “feature information”.
- A weighting variable storage device 109 that stores the definition values (hereinafter “weighting variables”) used when the similarity evaluation unit 108 evaluates table similarity.
- A mapping model storage device 110 that stores the “mapping information” on tables between the databases to be integrated, which the similarity evaluation unit 108 derives from the similarity.

The first database 100 and the second database 101 are managed by a general-purpose database management system. They are connected to the schema integration support apparatus 102 via a network such as a LAN. Alternatively, either or both of them may reside on the same computer system as the schema integration support apparatus 102.

The schema information storage device 104, data type dictionary storage device 106, feature information storage device 107, weighting variable storage device 109, and mapping model storage device 110 may be implemented together in a single storage device or as separate storage devices.

The following paragraphs describe the data held in each storage device:
- the “schema information” in the schema information storage device 104;
- the “data type dictionary” in the data type dictionary storage device 106;
- the “feature information” in the feature information storage device 107;
- the “weighting variables” in the weighting variable storage device 109; and
- the “mapping information” in the mapping model storage device 110.

The schema information extraction unit 103 acquires “schema information” from the databases to be integrated and stores it in the schema information storage device 104. This data represents the structure of a database and includes the “table data structure information” of the tables stored and managed in the database.
The “table data structure information” includes the “data attributes” defined for each data item (that is, each “column” or “data field”) of the table.
The “data attributes” fall into two groups:
- “information about dependency relationships (attributes indicating dependency)”, such as an attribute that uniquely identifies the other data items in the same table or an attribute that indicates an association with another table; and
- “information about constraints (attributes indicating constraints)” on the data set in a data item, such as data type and data size.
The “information about dependency relationships” includes three kinds of attribute:
- the “primary key attribute”, which uniquely identifies the data of the dependent-attribute items;
- the “dependent attribute”, which is associated with the data of the primary-key items of its own table; and
- the “foreign key attribute”, which is associated with the data of the primary-key items of another table.
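The minimal Python sketch below shows how a table's columns fall into the three dependency-attribute groups. It assumes a hypothetical `RawColumn` record, which is not part of the patent, holding a column's name, its DBMS type, a primary-key flag, and the table its foreign key references, if any.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RawColumn:
    """Hypothetical column metadata as it might be read from a DBMS catalog."""
    name: str
    native_type: str
    is_primary_key: bool = False
    references: Optional[str] = None  # referenced table, if the column belongs to a foreign key


def classify_columns(columns: List[RawColumn]) -> Tuple[List[str], List[str], List[str]]:
    """Return (primary key, dependent, foreign key) column names.

    Primary key attribute: columns that make up the primary key.
    Dependent attribute: every column that is not part of the primary key.
    Foreign key attribute: columns that reference another table's primary key.
    A column can appear in more than one group (e.g. a non-key foreign key column).
    """
    primary = [c.name for c in columns if c.is_primary_key]
    dependent = [c.name for c in columns if not c.is_primary_key]
    foreign = [c.name for c in columns if c.references is not None]
    return primary, dependent, foreign


# Example: an order-line table keyed by (order_no, line_no) that references ITEM.
order_line = [
    RawColumn("order_no", "integer", is_primary_key=True),
    RawColumn("line_no", "integer", is_primary_key=True),
    RawColumn("item_code", "char", references="ITEM"),
    RawColumn("qty", "integer"),
]
print(classify_columns(order_line))
# → (['order_no', 'line_no'], ['item_code', 'qty'], ['item_code'])
```

The sketch lets a column belong to more than one group, because a non-key column that references another table is both a dependent attribute and a foreign key attribute.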

FIG. 4 is a data structure diagram showing an example of the “schema information” in Embodiment 1.
In FIG. 4, table information 140 represents the “table data structure information” of one table, and the “schema information” contains one piece of table information 140 per table. Each piece of table information 140 has an ID, a table name, primary key column information 141, primary key reference destination information 142, dependent column information 143, and foreign key information 144.
The ID and table name uniquely identify the table.
The primary key column information 141 indicates the “primary key attribute”. It holds a “column name” and “data type” for every column of the data items that make up the primary key.
The dependent column information 143 indicates the “dependent attribute”. It holds a “column name” and “data type” for every column of the data items other than the primary key.
The foreign key information 144 indicates the “foreign key attribute”. It holds a “foreign key ID” and “foreign key column information 145” for every foreign key. The “foreign key ID” uniquely identifies the “foreign key column information 145”, which holds a “column name” and “data type” for every column of the data items that make up the foreign key.
The primary key reference destination information 142 holds a “reference destination ID” (the ID of the other table) for every other table that has a foreign key associated with this table's primary key.
In the table information 140 shown in FIG. 4, the ID and the foreign key ID are not taken from the database. The schema information extraction unit 103 adds them when it stores the schema information acquired from the database system in the schema information storage device 104.
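The Python sketch below shows one possible in-memory form of the table information 140 in FIG. 4. Several details are assumptions for illustration: the field names, the input dictionary format, and the use of sequential integers for table IDs and foreign key IDs. The patent states only that the extraction unit adds these IDs when storing the schema information.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, data type)


@dataclass
class TableInfo:
    """One entry of the schema information, modeled on table information 140."""
    id: int
    name: str
    pk_columns: List[Column] = field(default_factory=list)         # primary key column information 141
    pk_referenced_by: List[int] = field(default_factory=list)      # primary key reference destination information 142
    dependent_columns: List[Column] = field(default_factory=list)  # dependent column information 143
    foreign_keys: Dict[int, List[Column]] = field(default_factory=dict)  # foreign key information 144 (FK ID -> columns 145)


def build_schema_info(tables: dict) -> List[TableInfo]:
    """Assign table IDs and foreign key IDs, then record which tables reference each primary key.

    `tables` uses a hypothetical format: name -> {"pk": [...], "dependent": [...],
    "fks": [(referenced_table_name, [(column, type), ...]), ...]}.
    """
    table_ids = {name: i for i, name in enumerate(tables, start=1)}
    fk_ids = count(1)
    infos: Dict[str, TableInfo] = {}
    for name, spec in tables.items():
        info = TableInfo(table_ids[name], name,
                         pk_columns=list(spec.get("pk", [])),
                         dependent_columns=list(spec.get("dependent", [])))
        for _ref, cols in spec.get("fks", []):
            info.foreign_keys[next(fk_ids)] = list(cols)
        infos[name] = info
    # Reverse direction: each referenced table records the ID of every referencing table once.
    for name, spec in tables.items():
        for ref, _cols in spec.get("fks", []):
            referenced = infos.get(ref)
            if referenced is not None and table_ids[name] not in referenced.pk_referenced_by:
                referenced.pk_referenced_by.append(table_ids[name])
    return list(infos.values())


orders, line = build_schema_info({
    "ORDERS": {"pk": [("order_no", "integer")], "dependent": [("customer", "char")]},
    "ORDER_LINE": {"pk": [("order_no", "integer"), ("line_no", "integer")],
                   "dependent": [("qty", "integer")],
                   "fks": [("ORDERS", [("order_no", "integer")])]},
})
print(orders.pk_referenced_by, line.foreign_keys)  # → [2] {1: [('order_no', 'integer')]}
```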

FIG. 5 is a diagram showing an example of the “data type dictionary” in Embodiment 1.
The schema integration support apparatus 102 uses its own data types in the schema integration support process (hereinafter “standard data types”), while the databases to be integrated use their own data types (hereinafter “DBMS-specific data types”). The data type dictionary storage device 106 registers the correspondence between the two as a “data type dictionary” like the one in FIG. 5. As preprocessing for the schema integration support process, the apparatus registers this correspondence in advance for the DBMS-specific data types of the first database 100 and the second database 101.
The “data type dictionary” in FIG. 5 associates the DBMS-specific data types of the first database 100 (“DBMS1”) and the second database 101 (“DBMS2”) with standard data types. For example, it maps the “char” and “integer” types of “DBMS1” to the standard “character type” and “exact numeric type”, respectively. It maps both the “char” and “nchar” types of “DBMS2” to the standard “character type”.
If the first database 100 and the second database 101 use the same database management system, they also use the same data types. In that case, the “data type dictionary” may map only one database's DBMS-specific data types to standard data types.
The standard data types may be defined as, for example, character, exact numeric, approximate numeric, datetime, and binary types. The “data type dictionary” may be prepared in advance in the data type dictionary storage device 106 of the schema integration support apparatus 102. Alternatively, the system administrator (user) may be allowed to define it at any time with the input device and to change the correspondence between DBMS-specific and standard data types during the schema integration support process.
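The dictionary lookup can be sketched as a Python mapping keyed by (DBMS, DBMS-specific type). The dictionary contains only the four entries from the FIG. 5 example. The English names of the standard types, the case-insensitive lookup, and the `register` helper for administrator changes are assumptions.

```python
from typing import Dict, Tuple

DataTypeDictionary = Dict[Tuple[str, str], str]

# Entries taken from the FIG. 5 example; other DBMS types would be registered the same way.
DATA_TYPE_DICTIONARY: DataTypeDictionary = {
    ("DBMS1", "char"): "character",
    ("DBMS1", "integer"): "exact numeric",
    ("DBMS2", "char"): "character",
    ("DBMS2", "nchar"): "character",
}


def register(dictionary: DataTypeDictionary, dbms: str, native_type: str, standard_type: str) -> None:
    """Add or change a correspondence, e.g. when the administrator edits the dictionary."""
    dictionary[(dbms, native_type.lower())] = standard_type


def to_standard_type(dictionary: DataTypeDictionary, dbms: str, native_type: str) -> str:
    """Translate a DBMS-specific type name into a standard data type (case-insensitive)."""
    try:
        return dictionary[(dbms, native_type.lower())]
    except KeyError:
        raise KeyError(f"no standard data type registered for {native_type!r} in {dbms}") from None


print(to_standard_type(DATA_TYPE_DICTIONARY, "DBMS2", "NCHAR"))  # → character
```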

FIG. 6 is a diagram showing an example of the items of the “feature information” in Embodiment 1.
The “feature information” stored in the feature information storage device 107 is a set of items with quantitative values. The feature information generation unit 105 derives these values for each table from the “schema information”, and two tables whose “feature information” is similar become mapping candidates. The items that make up the “feature information”, as shown in FIG. 6, are defined in advance in a storage device of the schema integration support apparatus 102, for example the feature information storage device 107.
FIG. 6 defines the following items of the “feature information”:
- “ID”
- “number of primary key attributes”
- “data type composition of the primary key”
- “number of primary-key reference destination tables”
- “number of dependent attributes”
- “data type composition of the dependent attributes”
- “number of foreign key attributes”
- “data type composition of the foreign keys”
“ID” uniquely identifies the table corresponding to the “feature information”.
The three count items are defined as follows:
- “Number of primary key attributes” is the number of data items that have a primary key attribute.
- “Number of dependent attributes” is the number of data items that do not have a primary key attribute.
- “Number of foreign key attributes” is the number of data items that have a foreign key attribute.
“Data type composition of the primary key” is the “number of primary key attributes” broken down by standard data type, that is, the primary-key data items counted separately for each standard data type. Likewise, “data type composition of the dependent attributes” is the “number of dependent attributes” per standard data type, and “data type composition of the foreign keys” is the “number of foreign key attributes” per standard data type.
“Number of primary-key reference destination tables” is the number of other tables that have a foreign key associated with the primary key of the table corresponding to the “feature information”.
The feature information items in FIG. 6 correspond to the table information 140 in FIG. 4 as follows:
- “ID” corresponds to the ID.
- “Number of primary-key reference destination tables” corresponds to the primary key reference destination information 142.
- “Number of primary key attributes” and “data type composition of the primary key” correspond to the primary key column information 141.
- “Number of dependent attributes” and “data type composition of the dependent attributes” correspond to the dependent column information 143.
- “Number of foreign key attributes” and “data type composition of the foreign keys” correspond to the foreign key information 144.
Hereinafter, each item of the “feature information” is called a feature information item.
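The feature information of FIG. 6 amounts to counting columns per group and per standard data type. The minimal Python sketch below assumes that each column's type has already been translated to a standard data type and that the inputs are plain lists. The `FeatureInfo` field names are illustrative and do not come from the patent.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeatureInfo:
    """Feature information items of FIG. 6 (field names are illustrative)."""
    id: int
    pk_count: int                    # number of primary key attributes
    pk_types: Dict[str, int]         # data type composition of the primary key
    pk_reference_tables: int         # number of primary-key reference destination tables
    dependent_count: int             # number of dependent attributes
    dependent_types: Dict[str, int]  # data type composition of the dependent attributes
    fk_count: int                    # number of foreign key attributes
    fk_types: Dict[str, int]         # data type composition of the foreign keys


def make_feature_info(table_id: int, pk_types: List[str], dependent_types: List[str],
                      fk_types: List[str], referencing_table_ids: List[int]) -> FeatureInfo:
    """Each *_types list holds the standard data type of every column in that group."""
    return FeatureInfo(
        id=table_id,
        pk_count=len(pk_types),
        pk_types=dict(Counter(pk_types)),
        pk_reference_tables=len(set(referencing_table_ids)),
        dependent_count=len(dependent_types),
        dependent_types=dict(Counter(dependent_types)),
        fk_count=len(fk_types),
        fk_types=dict(Counter(fk_types)),
    )


# Example: a two-column exact-numeric key, one character-type foreign key column,
# and no other table referencing this table's primary key.
f = make_feature_info(2, ["exact numeric", "exact numeric"],
                      ["character", "exact numeric"], ["character"], [])
print(f.pk_count, f.pk_types, f.dependent_types)
# → 2 {'exact numeric': 2} {'character': 1, 'exact numeric': 1}
```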

次に、「重み付け変数」について説明する。
重み付け変数記憶装置１０９には類似度評価部１０８が類似度を評価する際に使用するパラメータ（重み付け変数）を設定する。
パラメータ（重み付け変数）には、各“特徴情報項目”に対するパラメータである「類似許容範囲」と「変動率」と「比重」、また、“特徴情報”に対するパラメータである「類似判定閾値」、“データ型”に対するパラメータである「標準データ型間での類似度」等がある。
「類似許容範囲」は、統合対象のデータベースが有する各テーブル間で各特徴情報項目の値を比較するときに類似かどうかを判定するための境界値である。類似度評価部１０８は、２つのテーブル間で特徴情報項目Ａ（例えば、主キー属性数）の差分値が「類似許容範囲」以内であれば当該２つのテーブルの特徴情報項目Ａは“類似”、「類似許容範囲」を超えていれば特徴情報項目Ａは“類似していない（非類似）”と判定する。“類似”している“特徴情報項目”が多いほど“特徴情報”が類似し、類似する“特徴情報”に対応する２つのテーブルはマッピング候補となる。
「変動率」は、類似度評価部１０８が“類似”と判定した特徴情報項目について、類似度評価部１０８が２つのテーブル間での特徴情報項目の差分値に応じて当該特徴情報項目の類似度の算出に使用する値で、類似度を減少させる割合（差分値当たりの類似度減少値）を示す。例えば、類似度を０（完全不一致）〜１（完全一致）とし、「変動率」を０．１とした場合、差分値が２である特徴情報項目の類似度は０．８（＝１−０．１×２）になる。
「比重」は、類似度評価部１０８が特徴情報全体の類似度、つまり、比較した２つのテーブルの類似度、を算出する際に使用する各特徴情報項目の類似度に対する重み付けを示す。例えば、全ての特徴情報項目で類似度の重みを均等に扱う場合は全ての特徴情報項目の「比重」を１とする。また、ある特定の特徴情報項目を重く扱う場合は、他の特徴情報項目の「比重」が１であれば、重く扱う特徴情報項目の「比重」を１より大きい値にする。
「標準データ型間での類似度」は、２つのテーブル間で異なる標準データ型のデータ項目を対応付ける場合の当該データ項目分の特徴情報項目の類似度を示す。例えば、標準データ型間の類似度を０（完全不一致）〜１（完全一致）としたときの「標準データ型間での類似度」の一例を図７に示す。図７において、文字型のデータ項目同士の類似度は１、文字型のデータ項目と真数型のデータ項目との類似度は０．４を示す。
「類似判定閾値」は、特徴情報全体として類似か、つまり、テーブルが類似かどうか、を判定するための境界値である。類似度評価部１０８は各特徴情報項目の類似度に「比重」を掛けて合計した値が「類似判定閾値」以上であれば“類似”、「類似判定閾値」未満であれば“非類似”と判定する。
各パラメータの設定値は、スキーマ統合支援装置１０２で予め定義しておき、必要に応じてシステム管理者（ユーザ）により入力装置で変更することが可能である。また、パラメータは特徴情報項目の内容に応じて追加あるいは削除してもよい。 Next, the “weighting variable” will be described.
In the weighting variable storage device 109, a parameter (weighting variable) used when the similarity evaluation unit 108 evaluates the similarity is set.
The parameters (weighting variables) include “similar allowable range”, “variation rate”, and “specific gravity”, which are parameters for each “feature information item”, and “similarity determination threshold”, “parameters for“ feature information ”, “Similarity between standard data types”, which is a parameter for “data type”.
The “similar allowable range” is a boundary value for determining whether or not the values of the feature information items are compared between the tables of the integration target database. The similarity evaluation unit 108 determines that the feature information item A of the two tables is “similar” if the difference value of the feature information item A (for example, the number of primary key attributes) between the two tables is within the “similar allowable range”. If it exceeds the “similar allowable range”, the feature information item A is determined to be “not similar (dissimilar)”. The more “feature information items” that are “similar” are, the more similar “feature information” is, and two tables corresponding to similar “feature information” become mapping candidates.
The “variation rate” is the similarity of the feature information item for the feature information item determined by the similarity evaluation unit 108 as “similar” according to the difference value of the feature information item between the two tables. This is a value used to calculate the degree, and indicates the ratio of decreasing the degree of similarity (similarity reduction value per difference value). For example, when the similarity is 0 (completely unmatched) to 1 (completely matched) and the “variation rate” is 0.1, the similarity of the feature information item having a difference value of 2 is 0.8 (= 1−1). 0.1 × 2).
“Specific gravity” indicates a weight for the similarity of each feature information item used when the similarity evaluation unit 108 calculates the similarity of the entire feature information, that is, the similarity of two compared tables. For example, when the similarity weight is equally treated in all feature information items, the “specific gravity” of all feature information items is set to 1. Also, when handling a particular feature information item heavily, if the “specific gravity” of another feature information item is 1, the “specific gravity” of the feature information item to be handled heavily is set to a value larger than 1.
“Similarity between standard data types” indicates the similarity of feature information items corresponding to the data items when data items of different standard data types are associated between two tables. For example, FIG. 7 shows an example of “similarity between standard data types” when the similarity between standard data types is 0 (completely unmatched) to 1 (completely matched). In FIG. 7, the similarity between character-type data items is 1, and the similarity between a character-type data item and an integer data item is 0.4.
The “similarity determination threshold” is a boundary value for determining whether the feature information as a whole, that is, the pair of tables, is similar. The similarity evaluation unit 108 multiplies the similarity of each feature information item by its “specific gravity” and sums the results. If the total is equal to or greater than the “similarity determination threshold”, the tables are judged “similar”; if it is less, they are judged “dissimilar”.
Default values for each parameter are defined in advance in the schema integration support device 102, and the system administrator (user) can change them through the input device as needed. Parameters may also be added or removed to suit the feature information items in use.
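The weighting variables above can be represented as a small configuration object. The following is a minimal Python sketch. It assumes that the weighting variable storage device 109 is an in-memory structure. The class names, field names, item keys, and numeric settings are all hypothetical. Only the 0.4 similarity between the character and integer types comes from FIG. 7, and that table is assumed to be symmetric.

```python
from dataclasses import dataclass, field


@dataclass
class ItemParams:
    """Weighting variables for one feature information item (field names are illustrative)."""
    allowable_range: float  # "similar allowable range": largest difference still judged similar
    variation_rate: float   # "variation rate": similarity lost per unit of difference
    weight: float = 1.0     # "specific gravity": weight of the item in the table total


@dataclass
class WeightingVariables:
    """Hypothetical in-memory stand-in for the weighting variable storage device 109."""
    items: dict
    threshold: float  # "similarity determination threshold" applied to the weighted total
    # Similarity between standard data types (FIG. 7), keyed by an unordered type pair.
    type_similarity: dict = field(default_factory=dict)

    def type_sim(self, a: str, b: str) -> float:
        if a == b:
            return 1.0  # identical standard data types match completely
        # Assumption: pairs missing from the table are completely unmatched (0).
        return self.type_similarity.get(frozenset((a, b)), 0.0)


# Illustrative settings only. Every item keeps the default weight of 1 except the
# foreign key count, which is emphasised with a weight above 1.
params = WeightingVariables(
    items={
        "num_pk_attributes": ItemParams(allowable_range=2, variation_rate=0.1),
        "num_dependent_attributes": ItemParams(allowable_range=5, variation_rate=0.1),
        "num_fk_attributes": ItemParams(allowable_range=2, variation_rate=0.1, weight=1.5),
    },
    threshold=2.0,
    type_similarity={frozenset(("character", "integer")): 0.4},  # value quoted from FIG. 7
)

print(params.type_sim("integer", "character"))  # 0.4
```

Because the type-similarity table is keyed by an unordered pair, each FIG. 7 entry is stored only once. If the real table were asymmetric, an ordered `(a, b)` tuple key would be needed instead.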

FIG. 8 is a diagram illustrating an example of “mapping information” in the first embodiment.
For each pair of tables whose “feature information” is similar, and which it has therefore judged to be mapping candidates, the similarity evaluation unit 108 stores “mapping information” such as that shown in FIG. 8 in the mapping model storage device 110. For example, the “mapping information” gives the names of the tables in the first database 100 and the second database 101 that were judged to be mapping candidates, together with their similarity.
In FIG. 8, each table name is written as “schema name.table name”, and each row shows one mapping candidate. The “URIAGE” table of the “HAN” schema in the first database 100 (an example of table A of the first database) has three mapping candidates in the second database 101 (examples of table B of the second database). These are the “sales record” table of the “sales” schema, the “sales details” table of the “sales” schema, and the “shipping record” table of the “stock” schema, with similarities of “4.23”, “5.65”, and “3.72”, respectively. By referring to these similarities, the user integrating the databases, or the schema integration support apparatus 102, can determine which table is the best candidate for mapping to the “URIAGE” table of the “HAN” schema. It is the “sales details” table of the “sales” schema in the second database 101, which has the highest similarity.
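The sketch below shows one way the FIG. 8 mapping information might be queried. It stores each row as a record and picks the candidate with the highest similarity. The record layout and the `best_candidate` helper are assumptions. The table names and similarity values are those given for FIG. 8.

```python
from typing import NamedTuple, Optional


class MappingCandidate(NamedTuple):
    source_table: str   # "schema name.table name" in the first database 100
    target_table: str   # "schema name.table name" in the second database 101
    similarity: float


# The three FIG. 8 rows for HAN.URIAGE, using the original schema and table names.
mapping_info = [
    MappingCandidate("HAN.URIAGE", "販売.売上実績", 4.23),  # sales.sales record
    MappingCandidate("HAN.URIAGE", "販売.売上明細", 5.65),  # sales.sales details
    MappingCandidate("HAN.URIAGE", "在庫.出庫実績", 3.72),  # stock.shipping record
]


def best_candidate(candidates, source_table: str) -> Optional[MappingCandidate]:
    """Return the candidate with the highest similarity for one source table, or None."""
    rows = [c for c in candidates if c.source_table == source_table]
    return max(rows, key=lambda c: c.similarity) if rows else None


print(best_candidate(mapping_info, "HAN.URIAGE").target_table)  # 販売.売上明細
```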

FIG. 9 is a flowchart showing the schema integration support processing (schema integration support method) in the first embodiment.
The schema integration support process (schema integration support method) executed by the schema integration support apparatus 102 is described below with reference to the flowchart of FIG. 9.
The schema integration support process (schema integration support method) described below can be executed by a computer. The program that causes a computer to execute it is the schema integration support program.

<Step 120: Data structure information extraction process>
First, the schema information extraction unit 103 acquires “schema information” from one of the database systems to be integrated (the first database 100 or the second database 101). It then generates the table information 140 of each table based on the acquired “schema information”. The table information 140 of all tables is stored in the schema information storage device 104 as the “schema information” used in the subsequent schema integration support processing.
The acquired “schema information” includes the “table name list information” for the tables in the database. For each table, it also includes the “primary key attribute information”, the “dependent attribute information”, and the “foreign key attribute information”.
The schema information extraction unit 103 collects “schema information” through the access functions provided by the database management system that manages the first database 100 and the second database 101. Either of two methods may be used. One is to read the data in which the database stores its “schema information” (often called system tables) in the same way as ordinary data. The other is to use a dedicated API (application programming interface) for accessing “schema information”.
The schema information extraction unit 103 collects the “schema information” needed for the “feature information” that the feature information generation unit 105 will generate, and generates the table information 140.
For example, suppose the feature information generation unit 105 generates “feature information” with the feature information items shown in FIG. 6, so that the table information 140 shown in FIG. 4 is required. In that case, the schema information extraction unit 103 generates the table information 140 (part of the schema information) as follows.
A unique ID is generated and set in “ID” of the table information 140.
Further, the “table name” of the table information 140 is set based on the “table name list information” of the acquired “schema information”.
Further, “primary key column information 141” of the table information 140 is generated based on the “primary key attribute information” of the acquired “schema information”.
Further, “dependent column information 143” of the table information 140 is generated based on the “dependent attribute information” of the acquired “schema information”.
Further, “primary key reference destination information 142” and “foreign key information 144” of the table information 140 are generated based on the “foreign key attribute information” of the acquired “schema information”.
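The following is a hypothetical sketch of step 120. It uses SQLite's catalog (`sqlite_master` plus the `table_info` and `foreign_key_list` pragmas) as the DBMS-provided route to schema information. The `TableInfo` fields only loosely mirror table information 140. Two rules are assumptions, since FIG. 4 is not reproduced here:

- dependent columns are those that are neither primary-key nor foreign-key columns;
- primary key reference destinations are the tables referenced by foreign keys whose columns belong to the primary key.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class TableInfo:
    """Simplified stand-in for table information 140 (field names are illustrative)."""
    id: int
    name: str
    primary_key_columns: list = field(default_factory=list)   # (column, declared type)
    dependent_columns: list = field(default_factory=list)     # (column, declared type)
    foreign_keys: list = field(default_factory=list)          # (referenced table, [(column, type)])
    primary_key_references: set = field(default_factory=set)  # tables referenced by PK columns


def extract_table_info(conn: sqlite3.Connection) -> list:
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    tables = []
    for table_id, name in enumerate(names, start=1):  # unique ID per table
        info = TableInfo(table_id, name)
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        col_type = {c[1]: c[2] for c in columns}
        pk_cols = [c[1] for c in sorted(columns, key=lambda c: c[5]) if c[5] > 0]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fk_groups = {}
        for fk_id, _seq, ref_table, from_col, *_ in conn.execute(
                f'PRAGMA foreign_key_list("{name}")'):
            fk_groups.setdefault((fk_id, ref_table), []).append((from_col, col_type[from_col]))
        fk_cols = set()
        for (_, ref_table), cols in fk_groups.items():
            info.foreign_keys.append((ref_table, cols))
            for col, _type in cols:
                fk_cols.add(col)
                if col in pk_cols:
                    info.primary_key_references.add(ref_table)
        info.primary_key_columns = [(c, col_type[c]) for c in pk_cols]
        # Assumption: dependent columns are neither primary-key nor foreign-key columns.
        info.dependent_columns = [(c[1], c[2]) for c in columns
                                  if c[1] not in pk_cols and c[1] not in fk_cols]
        tables.append(info)
    return tables


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        cust_id INTEGER REFERENCES customer (cust_id),
        amount  INTEGER
    );
""")
for t in extract_table_info(conn):
    print(t.id, t.name, t.primary_key_columns, t.dependent_columns, t.foreign_keys)
```

A production version would go through whatever catalog views or metadata API the target DBMS offers. The SQLite pragmas are used here only because they are easy to run locally.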

<Step 121: Feature information generation process>
Next, the feature information generation unit 105 generates the “feature information” of each table and stores it in the feature information storage device 107. It works from the “schema information” (the collection of table information 140) that the schema information extraction unit 103 generated and stored in the schema information storage device 104.
For example, to generate “feature information” with the feature information items shown in FIG. 6 from the table information 140 of FIG. 4, the feature information generation unit 105 proceeds as follows.
The ID of the table information 140 is set in the “ID” of the feature information item.
The feature information items are set from the number of entries in the table information 140, as follows:
- “number of primary key attributes”: entries in the primary key column information 141;
- “number of primary key reference destination tables”: entries in the primary key reference destination information 142;
- “number of dependent attributes”: entries in the dependent column information 143;
- “number of foreign key attributes”: entries in the foreign key information 144.

Further, the feature information generation unit 105 converts the data types in the primary key column information 141, the dependent column information 143, and the foreign key column information 145 of the table information 140 into standard data types. It does this with a data type dictionary such as the one shown in FIG. 5. For each of these three kinds of column information, it then counts the data items of each standard data type. The counts are stored in the corresponding feature information items:
- “primary key data type configuration”: the per-standard-data-type counts for the primary key column information 141;
- “dependent attribute data type configuration”: the counts for the dependent column information 143;
- “foreign key data type configuration”: the counts for the foreign key column information 145.
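The counting in step 121 can be sketched as follows. This minimal illustration replaces table information 140 with a simplified dictionary. The data type dictionary is made up and stands in for FIG. 5; the standard type names and the fallback `"other"` bucket are hypothetical. A foreign key is modelled as a list of its columns. “Number of foreign key attributes” therefore counts foreign keys, while the foreign key data type configuration counts their columns. This follows the distinction between foreign key information 144 and foreign key column information 145.

```python
from collections import Counter

# Hypothetical FIG. 5-style data type dictionary: DBMS-specific type -> standard data type.
DATA_TYPE_DICTIONARY = {
    "CHAR": "character", "VARCHAR": "character", "TEXT": "character",
    "INT": "integer", "INTEGER": "integer", "SMALLINT": "integer",
    "NUMBER": "numeric", "DECIMAL": "numeric",
    "DATE": "date", "TIMESTAMP": "date",
}


def to_standard(db_type: str) -> str:
    """Map a declared type such as 'VARCHAR(20)' to its standard data type."""
    base = db_type.split("(")[0].strip().upper()     # drop length/precision
    return DATA_TYPE_DICTIONARY.get(base, "other")  # unknown types: assumed catch-all bucket


def generate_features(table: dict) -> dict:
    """Build feature information for one table from a simplified table-information record."""
    fk_columns = [col for fk in table["foreign_keys"] for col in fk]
    return {
        "id": table["id"],
        "num_pk_attributes": len(table["pk_columns"]),
        "num_pk_reference_tables": len(table["pk_references"]),
        "num_dependent_attributes": len(table["dependent_columns"]),
        "num_fk_attributes": len(table["foreign_keys"]),
        # Data type configurations: number of data items per standard data type.
        "pk_type_config": Counter(to_standard(t) for _, t in table["pk_columns"]),
        "dependent_type_config": Counter(to_standard(t) for _, t in table["dependent_columns"]),
        "fk_type_config": Counter(to_standard(t) for _, t in fk_columns),
    }


sale = {
    "id": 2,
    "pk_columns": [("sale_id", "INTEGER")],
    "pk_references": [],
    "dependent_columns": [("amount", "NUMBER(10,2)"), ("memo", "VARCHAR(200)")],
    "foreign_keys": [[("cust_id", "INTEGER")]],  # one foreign key made of one column
}
features = generate_features(sale)
print(features["dependent_type_config"])
```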

<Step 122>
The schema integration support apparatus 102 determines whether “feature information” has been generated for all databases to be integrated.
For example, the schema information extraction unit 103 checks whether steps 120 and 121 have been performed for both the first database 100 and the second database 101. If they have, the process proceeds to step 123. If any database has not yet been processed, the process returns to step 120.

<Step 123: Similarity evaluation process>
Then, the similarity evaluation unit 108 calculates the similarity of each table between the integration target databases based on the “feature information” stored in the feature information storage device 107. Then, “mapping information” is generated and stored in the mapping model storage device 110 (an example of an output device), and the schema integration support processing is terminated.
Further, the similarity evaluation unit 108 outputs “mapping information” to an output device such as the display device 901 or the printer device 906, a program for merging data of the first database 100 and the second database 101, or the like. By causing the user to perform database integration work based on the output “mapping information”, the schema integration support apparatus 102 can improve the efficiency of the user's database integration work.
In the similarity evaluation processing step, the similarity evaluation unit 108 pairs every table of one database to be integrated with every table of the other, and calculates the table similarity for each pair. For each pair, it first calculates a similarity for each feature information item. The total of these item similarities is the similarity of the “feature information”, that is, the table similarity.

FIG. 10 is a flowchart showing the flow of the similarity evaluation process of the similarity evaluation unit 108 in the first embodiment.
Step 123 (similarity evaluation processing step), executed by the similarity evaluation unit 108, is described below with reference to FIG. 10.

<Similarity calculation process: Steps 181 to 184, 187, and 188>
<Step 181>
The similarity evaluation unit 108 selects one of the databases to be integrated and uses it as the comparison source database for the similarity evaluation. For example, it compares the number of tables defined in the first database 100 and in the second database 101, and uses the database with more tables as the comparison source. Here, the first database 100 is assumed to be selected. Any selection method may be used; for example, the comparison source and comparison destination may be chosen according to the order in which the databases were specified.

<Step 182>
The similarity evaluation unit 108 retrieves the “feature information” of one table of the comparison source database (first database 100) from the feature information storage device 107.

<Step 183>
The similarity evaluation unit 108 retrieves the “feature information” of one table of the comparison destination database (second database 101) from the feature information storage device 107.

<Step 184: Similarity calculation processing>
The similarity evaluation unit 108 compares the “feature information” of the comparison source database (first database 100) acquired in step 182 with the “feature information” of the comparison destination database (second database 101) acquired in step 183. From this comparison it calculates the similarity of the “feature information”, that is, the similarity of the tables.

<Mapping information generation process: Steps 185 to 186>
<Step 185>
The similarity evaluation unit 108 compares the similarity calculated in step 184 with the “similarity determination threshold” stored in the weighting variable storage device 109. If the similarity is equal to or greater than the “similarity determination threshold” (similar), the process proceeds to step 186. If it is less than the threshold (dissimilar), the process proceeds to step 187.

<Step 186>
The similarity evaluation unit 108 generates a mapping candidate for the pair of “feature information” judged similar in step 185. It adds the candidate to the “mapping information” stored in the mapping model storage device 110.
To do this, for each “feature information” in the pair, the similarity evaluation unit 108 searches the “schema information” stored in the schema information storage device 104 and acquires the table information 140 with the matching “ID”. It then stores the table names from the acquired table information 140, together with the similarity calculated in step 184, in the mapping model storage device 110 as a mapping candidate, as shown in FIG. 8.

<Step 187>
The similarity evaluation unit 108 checks whether there is any unevaluated “feature information” in the comparison target database (second database 101). If there is unevaluated “feature information”, the process proceeds to step 183. If there is no unevaluated “feature information”, the process proceeds to step 188.

<Step 188>
The similarity evaluation unit 108 checks whether there is any unevaluated “feature information” in the comparison source database (first database 100). If there is unevaluated “feature information”, the process proceeds to step 182. If there is no unevaluated “feature information”, the similarity evaluation process is terminated.
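The nested loop of steps 181 to 188 might look like this in Python. This is a sketch, not the device's implementation, and it makes three assumptions:

- the per-table similarity of step 184 is passed in as a function;
- feature records carry an `"id"` key;
- `table_names` is a hypothetical stand-in for looking up table information 140 by ID in step 186.

```python
from typing import Callable


def choose_source(db_a: list, db_b: list) -> tuple:
    """Step 181: use the database with more tables as the comparison source."""
    return (db_a, db_b) if len(db_a) >= len(db_b) else (db_b, db_a)


def build_mapping_info(source: list, target: list,
                       table_similarity: Callable[[dict, dict], float],
                       threshold: float, table_names: dict) -> list:
    """Steps 182-188: compare every source table with every target table."""
    mapping_info = []
    for src in source:                        # steps 182 and 188
        for dst in target:                    # steps 183 and 187
            sim = table_similarity(src, dst)  # step 184
            if sim >= threshold:              # step 185: similar
                # Step 186: resolve both IDs to table names and record the candidate.
                mapping_info.append((table_names[src["id"]], table_names[dst["id"]], sim))
    return mapping_info


# Toy example: each feature record holds a single count "n", and the similarity is a
# made-up function of the count difference, used only to exercise the loop.
names = {1: "db1.t1", 2: "db1.t2", 11: "db2.t1", 12: "db2.t2"}
db1 = [{"id": 1, "n": 5}, {"id": 2, "n": 2}]
db2 = [{"id": 11, "n": 5}, {"id": 12, "n": 3}]
source, target = choose_source(db1, db2)
result = build_mapping_info(source, target,
                            lambda s, d: 3.0 - abs(s["n"] - d["n"]), 2.0, names)
print(result)  # [('db1.t1', 'db2.t1', 3.0), ('db1.t2', 'db2.t2', 2.0)]
```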

Through the similarity evaluation process described above, the similarity evaluation unit 108 compares every table of the comparison source database with every table of the comparison destination database. It calculates the similarity between each pair of tables and creates the mapping information.

The similarity evaluation process above was described for two databases to be integrated. When there are three or more databases to be integrated, steps 183 to 187 are repeated once for each database taken as the comparison destination. Likewise, steps 182 to 188 are repeated once for each database taken as the comparison source.
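With three or more databases, one reading of the text is that the two-database evaluation is run for every ordered pair of distinct databases. The sketch below assumes that reading. `compare_pair` is a placeholder for the steps 182 to 188 loop.

```python
from itertools import permutations


def evaluate_all(databases: dict, compare_pair) -> dict:
    """Run the two-database evaluation for every ordered pair of distinct databases.

    databases:    database name -> list of feature information records
    compare_pair: the two-database evaluation (steps 182-188), e.g. a function
                  returning mapping candidates for (source tables, target tables)
    """
    return {
        (src, dst): compare_pair(databases[src], databases[dst])
        for src, dst in permutations(databases, 2)
    }


runs = evaluate_all({"A": [], "B": [], "C": []}, lambda s, d: [])
print(sorted(runs))  # [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
```

With `permutations`, each pair is evaluated in both directions. If the table similarity is symmetric, `itertools.combinations` would visit each unordered pair only once and halve the work.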

FIG. 11 is a flowchart showing the flow of the similarity calculation processing of the similarity evaluation unit 108 in the first embodiment.
Step 184 (similarity calculation processing), executed by the similarity evaluation unit 108, is described below with reference to FIG. 11.
In step 184 (similarity calculation processing), the similarity evaluation unit 108 calculates a similarity for each feature information item. The total of these item similarities is the similarity of the “feature information” (the table similarity).

<Step 201: Feature information item similarity calculation processing>
The similarity evaluation unit 108 selects one feature information item that appears in both of the following:
- the “feature information” of the table of the comparison source database (first database 100) acquired in step 182;
- the “feature information” of the table of the comparison destination database (second database 101) acquired in step 183.

It then compares the two values of that item and calculates the item's similarity.

<Step 202>
The similarity evaluation unit 108 checks whether the similarity of every feature information item has been calculated. If any feature information item remains without a calculated similarity, the process returns to step 201. Once the similarities of all feature information items have been calculated, the process proceeds to step 203.

<Step 203>
The similarity evaluation unit 108 multiplies the similarity of each feature information item by that item's “specific gravity”, stored in the weighting variable storage device 109. It takes the sum of these products as the similarity of the “feature information” (the table similarity), and ends the similarity calculation processing.
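Steps 201 to 203 amount to a weighted sum. The sketch below assumes that per-item similarities are supplied by a function and the “specific gravity” values by a dictionary. The names are illustrative, and the exact-match item similarity is a toy used only to show the weighting.

```python
def table_similarity(src: dict, dst: dict, weights: dict, item_similarity) -> float:
    """Steps 201-203: weighted total of per-item similarities.

    weights:         feature item name -> "specific gravity"
    item_similarity: function (item name, source value, target value) -> similarity
    """
    total = 0.0
    for item, weight in weights.items():  # steps 201-202: visit every feature item
        total += weight * item_similarity(item, src[item], dst[item])
    return total  # step 203


def exact(_item, a, b) -> float:
    """Toy per-item similarity: 1 on an exact match, 0 otherwise."""
    return 1.0 if a == b else 0.0


print(table_similarity({"a": 3, "b": 1}, {"a": 3, "b": 2}, {"a": 1.0, "b": 2.0}, exact))  # 1.0
```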

Next, step 201 (feature information item similarity calculation processing) executed by the similarity evaluation unit 108 is described.
There are two ways to calculate the similarity of a feature information item:
- comparing “single item values” (feature information item values indicating the number of data items for which an attribute of a given dependency type is defined);
- comparing “data type configurations” (feature information item values indicating the number of data items for each data type).

For the feature information items shown in FIG. 6, the items compared by “single item value” are “number of primary key attributes”, “number of primary key reference destination tables”, “number of dependent attributes”, and “number of foreign key attributes”. The items compared by “data type configuration” are “primary key data type configuration”, “dependent attribute data type configuration”, and “foreign key data type configuration”.
The feature information item similarity calculation process using “single item value” and the feature information item similarity calculation process using “data type configuration” will be described below.

FIG. 12 is a flowchart showing the flow of the feature information item similarity calculation process (single item value) in the first embodiment.
First, the similarity calculation that the similarity evaluation unit 108 performs in step 201 for feature information items compared by “single item value” is described below with reference to FIG. 12.
The feature information items to be compared in the similarity calculation using “single item value” correspond to “number of primary key attributes”, “number of primary key reference tables”, “number of dependent attributes”, and “number of foreign key attributes” .

<Step 150>
The similarity evaluation unit 108 checks whether or not the feature information item value of the comparison source is 0. If it is 0, the process proceeds to step 151; otherwise, the process proceeds to step 152.
<Step 151>
When the feature information item value of the comparison source is 0, the similarity evaluation unit 108 checks whether the feature information item value of the comparison destination is 0. If it is 0, the process proceeds to step 157. If it is not 0, the process proceeds to step 156.
<Step 157>
When both the comparison source and comparison destination feature information item values are 0, the similarity evaluation unit 108 sets the similarity of the feature information item to 1 (similar) and ends the feature information item similarity calculation processing.

<Step 152>
If the comparison-source feature information item value is not 0, the similarity evaluation unit 108 checks whether the comparison-target feature information item value is 0. If it is 0, the process proceeds to step 156, and if it is not 0, the process proceeds to step 153.
<Step 153>
If neither the comparison source nor the comparison destination feature information item value is 0, the similarity evaluation unit 108 subtracts one from the other and takes the absolute value of the difference.
<Step 154>
The similarity evaluation unit 108 acquires the “similar allowable range” for the feature information item compared in step 153 from the weighting variable storage device 109. It compares this range with the absolute value of the difference between the comparison source and comparison destination values calculated in step 153 (hereinafter, the difference). If the difference is equal to or less than the “similar allowable range”, the process proceeds to step 155. If it exceeds the “similar allowable range”, the process proceeds to step 156.
<Step 155>
The similarity evaluation unit 108 acquires the “variation rate” for the feature information item compared in step 153 from the weighting variable storage device 109. It calculates the similarity of the feature information item from the difference calculated in step 153 and the acquired “variation rate”, and ends the feature information item similarity calculation processing. The similarity of the feature information item is calculated as “1 − (variation rate × difference)”.
<Step 156>
The similarity evaluation unit 108 sets the similarity of the feature information item to 0 (dissimilar) and ends the feature information item similarity calculation processing in either of these cases:
- the difference exceeds the “similar allowable range” (step 154);
- one of the comparison source and comparison destination feature information item values is 0 and the other is not (steps 151 and 152).
Through the above processing, the similarity evaluation unit 108 calculates the similarity of feature information items by comparing “single item values”.
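The single-item-value branch of FIG. 12 translates directly into a short function. This sketch assumes that the item values are non-negative integer counts, and the parameter names are hypothetical. The example reproduces the “variation rate” example given earlier (difference 2, rate 0.1, similarity 0.8), with an assumed allowable range of 3.

```python
def single_item_similarity(src: int, dst: int,
                           allowable_range: float, variation_rate: float) -> float:
    """Steps 150-157 for feature items compared by a single item value."""
    if src == 0 and dst == 0:      # steps 150, 151 -> 157: both values are zero
        return 1.0
    if src == 0 or dst == 0:       # steps 151/152 -> 156: exactly one value is zero
        return 0.0
    diff = abs(src - dst)          # step 153
    if diff > allowable_range:     # step 154 -> 156: outside the similar allowable range
        return 0.0
    return 1.0 - variation_rate * diff  # step 155


print(single_item_similarity(4, 2, allowable_range=3, variation_rate=0.1))  # 0.8
```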

FIG. 13 is a flowchart showing the flow of the feature information item similarity calculation process (data type configuration) in the first embodiment.
Next, the similarity calculation that the similarity evaluation unit 108 performs in step 201 for feature information items compared by “data type configuration” is described below with reference to FIG. 13.
The feature information items compared by “data type configuration” are “primary key data type configuration”, “dependent attribute data type configuration”, and “foreign key data type configuration”.

In the similarity calculation by “data type configuration”, the similarity evaluation unit 108 treats data items whose standard data types match as “corresponding data items (similar data items)”. It calculates the similarity from the number of “non-corresponding data items”, that is, data items other than “corresponding data items”. The similarity evaluation unit 108 can also replace the standard data type of a “non-corresponding data item” with another standard data type (an alternative corresponding data type). A data item whose original standard data type did not match can then be treated as a “corresponding data item”. In that case, the number of “corresponding data items” and the number of “non-corresponding data items” are set according to the similarity between the original and alternative standard data types (the similarity between standard data types), and the similarity is then calculated.

<Step 160>
The similarity evaluation unit 108 determines how many “corresponding data items” exist when the original standard data types are compared.
For each standard data type, the similarity evaluation unit 108 subtracts the number of comparison-source data items from the number of comparison-destination data items. These counts are, for example, the number of character-type attributes and the number of exact-numeric-type attributes. The result is the difference value for that standard data type.
<Step 161>
The similarity evaluation unit 108 determines whether every data item of every standard data type is a “corresponding data item”, that is, whether no “non-corresponding data item” exists.
To do so, it checks whether the difference value calculated in step 160 is 0 for every standard data type. If every difference value is 0, all data items are matched between the comparison source and the comparison destination, and the process proceeds to step 162. Otherwise, the process proceeds to step 163.
<Step 162>
If the difference values of all standard data types are 0, the similarity evaluation unit 108 sets the similarity of the feature information item to 1 (similar) and ends the feature information item similarity calculation.
<Step 163>
If at least one standard data type has a nonzero difference value, the similarity evaluation unit 108 determines whether all data items on one side are “corresponding data items”. That side may be either the comparison source or the comparison destination. In other words, it checks whether one side has at least as many data items as the other side for every data type.
To do so, it checks whether the difference values calculated in step 160 all share the same sign.
- If every difference value is positive (+) or 0, all comparison-source data items are matched.
- If every difference value is negative (−) or 0, all comparison-destination data items are matched.
In either case, the process proceeds to step 164. If both positive and negative difference values are present, the process proceeds to step 167.
FIG. 14 shows an example of the numbers of data items when all comparison-source data items are matched. FIG. 15 shows an example when all comparison-destination data items are matched. FIG. 16 shows an example in which both the comparison source and the comparison destination have unmatched data items.
<Step 164>
If the difference values are all non-negative or all non-positive, the similarity evaluation unit 108 sums the absolute values of the per-type difference values calculated in step 160. The sum is the number of “non-corresponding data items” between the comparison source and the comparison destination.
For the difference values shown in FIG. 14 or FIG. 15, the sum is “5 = 3 + 2 + 0”.
<Step 165>
The similarity evaluation unit 108 acquires the “similar allowable range” for the feature information item from the weighting variable storage device 109. It compares this range with the total calculated in step 164 (the number of “non-corresponding data items”). If the total is less than or equal to the “similar allowable range”, the process proceeds to step 166. If the total exceeds the range, the process proceeds to step 174.
<Step 166>
If the total is within the “similar allowable range”, the similarity evaluation unit 108 acquires the “variation rate” for the feature information item from the weighting variable storage device 109. It calculates the similarity of the feature information item from the total calculated in step 164 and the acquired “variation rate”, and then ends the feature information item similarity calculation. The similarity is calculated as “1 − (variation rate × total)”.
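The branch logic of steps 160 to 166 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patent's implementation.

- A feature information item is modelled as a dict that maps a standard data type name to a data item count.
- The function and parameter names are hypothetical: `type_differences`, `one_sided_similarity`, `allowable_range` for the “similar allowable range”, and `variation_rate`.
- Returning `None` means "continue at step 167".

```python
from typing import Dict, Optional

# Hypothetical sketch of steps 160-166; names are illustrative, not from the patent.
Counts = Dict[str, int]  # number of data items per standard data type


def type_differences(source: Counts, target: Counts) -> Counts:
    """Step 160: comparison-destination count minus comparison-source count per type."""
    types = dict.fromkeys([*source, *target])  # union of types in first-seen order
    return {t: target.get(t, 0) - source.get(t, 0) for t in types}


def one_sided_similarity(diffs: Counts, allowable_range: int,
                         variation_rate: float) -> Optional[float]:
    """Steps 161-166 (plus step 174 when the range is exceeded).

    Returns None when positive and negative differences coexist,
    because that case continues at step 167.
    """
    values = list(diffs.values())
    if all(d == 0 for d in values):                   # step 161 -> 162
        return 1.0
    if not (all(d >= 0 for d in values) or all(d <= 0 for d in values)):
        return None                                   # step 163 -> 167
    unmatched = sum(abs(d) for d in values)           # step 164
    if unmatched > allowable_range:                   # step 165 -> 174
        return 0.0
    return 1 - variation_rate * unmatched             # step 166
```

As a check, take made-up per-type counts whose differences are 3, 2 and 0, the total of 5 quoted for FIG. 14 and FIG. 15. With an allowable range of 5 and a variation rate of 0.1, the similarity is 1 − 0.1 × 5 = 0.5. With an allowable range of 4, the item goes to step 174 and its similarity is 0.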

The following description uses, as an example, the case in which the difference values calculated in step 160 are those shown in FIG. 17.

<Step 167>
If both positive and negative difference values are present (step 163), the similarity evaluation unit 108 counts the data items that are “corresponding data items” under their original standard data types, before any alternative corresponding data type is applied.
First, the similarity evaluation unit 108 calculates the sum (A) of the comparison-source counts over all data types, which is the total number of comparison-source data items. For FIG. 17, “sum (A) = 5 + 5 + 2 + 2 + 1 = 15” (step 167a).
Next, it calculates the sum (B) of the negative difference values calculated in step 160. Because “difference value = number of comparison-destination data items − number of comparison-source data items”, the absolute value of a negative difference value is the number of comparison-source data items with no corresponding data item in the comparison destination. For FIG. 17, “sum (B) = (−2) + (−1) = −3” (step 167b).
Finally, it adds sum A and sum B to obtain the number of “corresponding data items” between the comparison source and the comparison destination. For FIG. 17, “number of corresponding data items = 15 + (−3) = 12” (step 167c).
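Step 167 reduces to two sums. The function name and dict representation below are hypothetical.

The FIG. 17 values used to check it are inferred from the text:
- comparison-source counts of 5, 5, 2, 2 and 1;
- difference values of +1 (character), −2 (exact numeric, 真数型), +1 (approximate numeric, 概数型), 0 (date) and −1 (binary).

Which count belongs to which type is an assumption that is consistent with steps 167 to 173.

```python
from typing import Dict

# Hypothetical sketch of step 167; names are illustrative, not from the patent.
Counts = Dict[str, int]


def matched_before_substitution(source: Counts, diffs: Counts) -> int:
    """Corresponding data items under the original standard data types."""
    total_a = sum(source.values())                     # step 167a: all source items
    total_b = sum(d for d in diffs.values() if d < 0)  # step 167b: unmatched source items (negative)
    return total_a + total_b                           # step 167c
```

For the inferred FIG. 17 values, this gives 15 + (−3) = 12 corresponding data items, matching step 167c.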

<Step 168 to Step 170>
To replace the standard data type of a “non-corresponding data item” with an alternative corresponding data type, the similarity evaluation unit 108 chooses the alternative from the other standard data types that have “non-corresponding data items”. The choice is based on the “similarity between standard data types”.

The following description uses, as an example, the case in which the “similarity between standard data types” has the values shown in FIG. 7.

<Step 168>
For each data type whose difference value from step 160 is negative, the similarity evaluation unit 108 checks whether an alternative corresponding data type exists. For FIG. 17, it checks the exact numeric type (difference value = −2) and the binary type (difference value = −1). The exact numeric type serves as the example below.
First, the similarity evaluation unit 108 consults the “similarity between standard data types” stored in the weighting variable storage device 109. It finds the standard data types whose similarity to the negative-difference type is not 0. With the values in FIG. 7, these are the character type, the approximate numeric type and the date type. If no such standard data type exists, there is no alternative corresponding data type (step 168a).
Next, it keeps only those data types whose difference value is positive. With the difference values in FIG. 17, the character type and the approximate numeric type have positive difference values; the date type does not. If none has a positive difference value, there is no alternative corresponding data type (step 168b).
It then selects the alternative corresponding data type in descending order of “similarity between standard data types”. With the values in FIG. 7, the approximate numeric type is more similar to the exact numeric type than the character type is. The approximate numeric type is therefore selected as the alternative corresponding data type (step 168c).
If an alternative corresponding data type exists, the similarity evaluation unit 108 proceeds to step 169. Otherwise, it proceeds to step 170 (step 168d).
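Step 168 can be sketched as a filter followed by a maximum. The sketch rests on these assumptions:

- The FIG. 7 table is modelled as a dict keyed by type pairs and looked up in both orders. Treating the similarities as symmetric is an assumption.
- Only the 0.8 between the exact numeric (真数型) and approximate numeric (概数型) types comes from the text (step 169c).
- The 0.3 (character) and 0.2 (date) used in the check are hypothetical. They were chosen only to respect the ordering stated for FIG. 7.
- Function names are illustrative, and ties are broken by dict order.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sketch of step 168; names and table layout are illustrative.
Counts = Dict[str, int]
Similarities = Dict[Tuple[str, str], float]


def type_similarity(sim: Similarities, a: str, b: str) -> float:
    """Similarity between two standard data types; missing pairs count as 0."""
    return sim.get((a, b), sim.get((b, a), 0.0))


def select_alternative(negative_type: str, diffs: Counts,
                       sim: Similarities) -> Optional[str]:
    """Return the alternative corresponding data type, or None (step 168d -> 170)."""
    candidates = [
        t for t, d in diffs.items()
        if type_similarity(sim, negative_type, t) > 0  # step 168a: nonzero similarity
        and d > 0                                      # step 168b: positive difference
    ]
    if not candidates:
        return None
    # Step 168c: the most similar candidate is chosen (ties: first in dict order).
    return max(candidates, key=lambda t: type_similarity(sim, negative_type, t))
```

With the FIG. 17 difference values, the exact numeric type maps to the approximate numeric type. The binary type, which has no nonzero entries in this hypothetical table, gets no alternative.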

<Step 169>
Selecting an alternative corresponding data type in step 168 turns some data items into “corresponding data items”. The similarity evaluation unit 108 updates the difference values from step 160 to reflect this.
First, it adds two values: the (negative) difference value of the original standard data type being replaced, and the (positive) difference value of the alternative corresponding data type. For FIG. 17, “(−2) {exact numeric difference value} + 1 {approximate numeric difference value} = −1” (step 169a).
The values are then updated according to the sign of this sum (step 169b):
- Positive sum n: “number of substituted data items = absolute value of the original standard data type's difference value”, “original standard data type's difference value = 0” and “alternative corresponding data type's difference value = n”.
- Negative sum: “number of substituted data items = alternative corresponding data type's difference value”, “original standard data type's difference value = the sum from step 169a” and “alternative corresponding data type's difference value = 0”. For FIG. 17, this gives “number of substituted data items = 1”, “exact numeric difference value = (−1)” and “approximate numeric difference value = 0”.
- Zero sum: “number of substituted data items = alternative corresponding data type's difference value”, and the difference values of both the original standard data type and the alternative corresponding data type are set to 0.
Next, the similarity evaluation unit 108 reads the similarity between the original standard data type and the alternative corresponding data type from the “similarity between standard data types” in the weighting variable storage device 109. It multiplies this similarity by the number of substituted data items and adds the product to the number of “corresponding data items” calculated in step 167. For FIG. 17, “1 {substituted exact numeric data items} × 0.8 {similarity between the exact numeric and approximate numeric types} + 12 {corresponding data items before substitution} = 12.8 {corresponding data items after substitution}” (step 169c).
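The three-way update in step 169 can be written as one function. This is a hypothetical sketch: the function name and argument order are illustrative. It returns a new dict rather than mutating its input so that each step can be inspected separately. The check reuses the FIG. 17 difference values inferred from the text.

```python
from typing import Dict, Tuple

# Hypothetical sketch of step 169; names are illustrative, not from the patent.
Counts = Dict[str, int]


def apply_substitution(diffs: Counts, original: str, alternative: str,
                       type_similarity: float, matched: float) -> Tuple[Counts, float]:
    """Absorb the original type's deficit into the alternative type's surplus."""
    updated = dict(diffs)
    combined = updated[original] + updated[alternative]     # step 169a
    if combined > 0:      # surplus left over: the whole deficit is absorbed
        substituted = -updated[original]
        updated[original], updated[alternative] = 0, combined
    elif combined < 0:    # deficit left over: the whole surplus is used
        substituted = updated[alternative]
        updated[original], updated[alternative] = combined, 0
    else:                 # exact cancellation
        substituted = updated[alternative]
        updated[original], updated[alternative] = 0, 0
    # Step 169c: substituted items count only partially, weighted by type similarity.
    return updated, matched + substituted * type_similarity
```

For FIG. 17 (exact numeric −2, approximate numeric +1, similarity 0.8, 12 items already matched), this yields exact numeric −1, approximate numeric 0 and 12.8 corresponding data items.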

<Step 170>
The similarity evaluation unit 108 determines whether any “non-corresponding data items” remain whose data type has not yet been searched for an alternative corresponding data type (step 168).
To do so, it looks at the standard data types whose difference value from step 160 is negative and checks whether any of them has not yet been searched. If an unsearched standard data type remains, the process returns to step 168. Otherwise, it proceeds to step 171.
For the difference values in FIG. 17, FIG. 18 shows the result of the first pass through step 169. In FIG. 18, the types with negative difference values are the exact numeric type and the binary type. The exact numeric type has already been searched, so the process returns to step 168 for the binary type. Once the binary type has been processed, no unsearched type remains, and the process proceeds to step 171.

<Step 171>
The similarity evaluation unit 108 calculates the number of “non-corresponding data items”.
To do so, it compares the sum of the positive difference values with the absolute value of the sum of the negative difference values. The larger of the two is the number of “non-corresponding data items”. If the difference values of all data types are 0, the number of “non-corresponding data items” is 0.
For the difference values in FIG. 17, FIG. 18 shows the result after step 170. In FIG. 18, “sum of positive difference values = 1” and “absolute value of the sum of negative difference values = 2”, so “number of non-corresponding data items = 2”.

<Step 172>
The similarity evaluation unit 108 acquires the “similar allowable range” for the feature information item from the weighting variable storage device 109. It compares this range with the total calculated in step 171 (the number of “non-corresponding data items”). If the total is less than or equal to the “similar allowable range”, the process proceeds to step 173. If the total exceeds the range, the process proceeds to step 174.
For example, suppose the result of step 170 is FIG. 18, so the total is 2. A “similar allowable range” of 3 gives “2 {total} ≤ 3 {similar allowable range}”, and the process proceeds to step 173. A “similar allowable range” of 1 gives “2 {total} > 1 {similar allowable range}”, and the process proceeds to step 174.
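Steps 171 and 172 are a maximum and a comparison. The sketch below uses hypothetical function names. It is checked against the FIG. 18 difference values inferred from the text: character +1, exact numeric (真数型) −1, approximate numeric 0, date 0 and binary −1. These values give a positive sum of 1 and a negative sum with absolute value 2.

```python
from typing import Dict

# Hypothetical sketch of steps 171-172; names are illustrative, not from the patent.
Counts = Dict[str, int]


def unmatched_count(diffs: Counts) -> int:
    """Step 171: the larger of the positive surplus and the negative deficit."""
    positive = sum(d for d in diffs.values() if d > 0)
    negative = -sum(d for d in diffs.values() if d < 0)
    return max(positive, negative)  # 0 when every difference is 0


def passes_allowable_range(unmatched: int, allowable_range: int) -> bool:
    """Step 172: True -> step 173, False -> step 174."""
    return unmatched <= allowable_range
```

For FIG. 18, the count is 2. It passes an allowable range of 3 and fails an allowable range of 1, as in the example above.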

<Step 173>
If the total is within the “similar allowable range”, the similarity evaluation unit 108 acquires the “variation rate” for the feature information item from the weighting variable storage device 109. It calculates the similarity of the feature information item from the total calculated in step 171 and the acquired “variation rate”, and then ends the feature information item similarity calculation. The similarity is calculated as “(1 − (variation rate × total)) × (number of corresponding data items ÷ total number of data items)”.
First, the similarity evaluation unit 108 sums the per-type counts separately for the comparison source and for the comparison destination. It uses the smaller sum, that is, the smaller number of data items, as the total number of data items. When the result of step 170 is FIG. 18, “sum for the comparison source = 5 + 5 + 2 + 2 + 1 = 15” and “sum for the comparison destination = 6 + 3 + 3 + 2 + 0 = 14”. The total number of data items is therefore 14 (< 15) (step 173a).
It then calculates the similarity of the feature information item from three inputs: the number of “corresponding data items” calculated in steps 167 and 169, the total calculated in step 171, and the “variation rate” acquired from the weighting variable storage device 109. Take a “variation rate” of 0.1, the step 170 result of FIG. 18 with a total of 2, and 12.8 “corresponding data items”. Then “similarity = (1 − (0.1 × 2)) × (12.8 ÷ 14) ≈ 0.73” (step 173b).
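The step 173 formula can be reproduced directly. The function name is hypothetical. The check uses the FIG. 17/18 totals of 15 and 14, 12.8 corresponding data items, a total of 2 and a variation rate of 0.1. Assigning the individual counts to particular data types is an assumption, but only the per-side sums matter here.

```python
from typing import Dict

# Hypothetical sketch of step 173; names are illustrative, not from the patent.
Counts = Dict[str, int]


def mixed_case_similarity(source: Counts, target: Counts, matched: float,
                          unmatched: int, variation_rate: float) -> float:
    """(1 - variation_rate * unmatched) * (matched / total number of data items)."""
    # Step 173a: the smaller side's item count is the "total number of data items".
    # In the mixed-sign case both sides have items, so this is never zero.
    total_items = min(sum(source.values()), sum(target.values()))
    return (1 - variation_rate * unmatched) * (matched / total_items)  # step 173b
```

Numerically, (1 − 0.1 × 2) × (12.8 ÷ 14) = 0.8 × 0.914… ≈ 0.731, which rounds to the 0.73 stated in step 173b.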

<Step 174>
If the total exceeds the “similar allowable range” (step 165 or step 172), the similarity evaluation unit 108 sets the similarity of the feature information item to 0 (not similar) and ends the feature information item similarity calculation.
Through the above processing, the similarity evaluation unit 108 calculates the similarity of a feature information item by comparing “data type configurations”.
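Putting steps 160 to 174 together, the data-type-configuration similarity can be sketched as a single function. This is a hypothetical reconstruction for illustration, not the patent's implementation. Its assumptions are:

- The function and parameter names are invented.
- The inter-type similarity table uses 0.8 for the exact numeric (真数型) / approximate numeric (概数型) pair, as in step 169c. The other values in the check are made up.
- Negative-difference types are processed in the order the caller lists them.
- Each type receives at most one alternative, as described for step 170.

```python
from typing import Dict, Tuple

# Hypothetical end-to-end sketch of steps 160-174; names are illustrative.
Counts = Dict[str, int]                      # data items per standard data type
Similarities = Dict[Tuple[str, str], float]  # inter-type similarity (FIG. 7 style)


def pair_similarity(sim: Similarities, a: str, b: str) -> float:
    # Looked up in both orders; missing pairs count as similarity 0.
    return sim.get((a, b), sim.get((b, a), 0.0))


def data_type_similarity(source: Counts, target: Counts, sim: Similarities,
                         allowable_range: int, variation_rate: float) -> float:
    types = list(dict.fromkeys([*source, *target]))                  # caller's order kept
    diffs = {t: target.get(t, 0) - source.get(t, 0) for t in types}  # step 160
    values = list(diffs.values())

    if all(d == 0 for d in values):                                   # steps 161-162
        return 1.0
    if all(d >= 0 for d in values) or all(d <= 0 for d in values):   # step 163
        unmatched = sum(abs(d) for d in values)                       # step 164
        if unmatched > allowable_range:                               # step 165 -> 174
            return 0.0
        return 1 - variation_rate * unmatched                         # step 166

    # Mixed signs: steps 167-173.
    matched = sum(source.values()) + sum(d for d in values if d < 0)  # step 167
    for original in [t for t in types if diffs[t] < 0]:               # steps 168-170
        candidates = [t for t in types
                      if diffs[t] > 0 and pair_similarity(sim, original, t) > 0]
        if not candidates:
            continue                                                   # no alternative type
        alternative = max(candidates, key=lambda t: pair_similarity(sim, original, t))
        combined = diffs[original] + diffs[alternative]               # step 169a
        if combined > 0:
            substituted = -diffs[original]
            diffs[original], diffs[alternative] = 0, combined
        else:  # combined <= 0: the alternative's whole surplus is used
            substituted = diffs[alternative]
            diffs[original], diffs[alternative] = combined, 0
        matched += substituted * pair_similarity(sim, original, alternative)  # step 169c

    positive = sum(d for d in diffs.values() if d > 0)                # step 171
    negative = -sum(d for d in diffs.values() if d < 0)
    unmatched = max(positive, negative)
    if unmatched > allowable_range:                                   # step 172 -> 174
        return 0.0
    total_items = min(sum(source.values()), sum(target.values()))    # step 173a
    return (1 - variation_rate * unmatched) * (matched / total_items) # step 173b
```

With the FIG. 17 counts inferred from the text, an allowable range of 3 and a variation rate of 0.1, the function returns about 0.731, in line with step 173b. With an allowable range of 1, it returns 0, in line with step 174.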

The above embodiment describes a schema integration support method for schema integration between different databases. The method uses two supporting resources: a data type dictionary that maps each database's native data types to standard data types, and weighting variables used for similarity calculation and as evaluation criteria. It comprises three means:
- means for extracting schema information from the databases (schema information extraction unit 103);
- means for generating feature information of the data from the extracted schema information (feature information generation unit 105);
- means for evaluating similarity from the generated feature information and creating a mapping model (similarity evaluation unit 108).

The above embodiment also describes a schema integration support method that generates feature information mainly from the constraint conditions and dependency relationships between data items contained in the schema information.

As described above, feature information is created for each table from the database schema information, focusing on the constraint conditions and dependency relationships between data items, and table similarity is calculated by comparing this feature information. This makes it possible to generate table mapping candidates whose similarity meets a fixed criterion, that is, a similarity at or above the “similarity determination threshold”.
Because a similarity is calculated for each mapping candidate, the candidates can be ranked by comparing their similarities.
The feature information also includes information about relationships with other tables, such as foreign key attributes and the number of primary key reference destination tables. As a result, several tables with similar structures can still be told apart by their relationships with other tables.
Attribute data types are mapped to standard data types, and the similarity between different standard data types can be evaluated. Schemas that define data items with the same meaning using different data types can therefore also be handled. For example, date data may be defined as a character type in one schema and as a date type in the other.
The feature information is defined as a set of numerical items, and every item's similarity is evaluated by a common method. Feature information items can therefore be added or removed easily.
The feature information is created from database schema information, so mapping candidates can be created without accessing the actual data in the databases.
Because actual data is not used to determine mapping candidates, no preprocessing is required, such as standardizing (unifying) data contents across the databases to be integrated.
Finally, data attribute names are not used as feature information items. Mapping candidates can therefore be created even for databases whose attribute names are not set appropriately.

The above embodiment uses a “relational database” as an example, but the databases to be integrated may be other kinds, such as hierarchical, network or object-oriented databases. A “table” in the above embodiment means a set of data having a specific data structure. For databases other than relational databases, the similarity can be calculated in the same way from the data structure (in particular, the dependency relationships and data types of the data) to generate mapping information.

FIG. 1 is a diagram showing the appearance of the schema integration support device 102 according to Embodiment 1.
FIG. 2 is a hardware configuration diagram of the schema integration support device 102 according to Embodiment 1.
FIG. 3 is a functional configuration diagram showing the functional elements of the schema integration support device 102 according to Embodiment 1.
FIG. 4 is a data structure diagram showing an example of “schema information” according to Embodiment 1.
FIG. 5 is a diagram showing an example of the “data type dictionary” according to Embodiment 1.
FIG. 6 is a diagram showing an example of the “feature information” items according to Embodiment 1.
FIG. 7 is a diagram showing an example of the “similarity between standard data types” according to Embodiment 1.
FIG. 8 is a diagram showing an example of “mapping information” according to Embodiment 1.
FIG. 9 is a flowchart showing the schema integration support processing (schema integration support method) according to Embodiment 1.
FIG. 10 is a flowchart showing the flow of the similarity evaluation processing of the similarity evaluation unit 108 according to Embodiment 1.
FIG. 11 is a flowchart showing the flow of the similarity calculation processing of the similarity evaluation unit 108 according to Embodiment 1.
FIG. 12 is a flowchart showing the flow of the feature information item similarity calculation (single item value) according to Embodiment 1.
FIG. 13 is a flowchart showing the flow of the feature information item similarity calculation (data type configuration) according to Embodiment 1.
FIG. 14 is a diagram showing an example of the number of data items per standard data type in the comparison source and the comparison destination when all difference values are 0 or greater, according to Embodiment 1.
FIG. 15 is a diagram showing an example of the number of data items per standard data type in the comparison source and the comparison destination when all difference values are 0 or less, according to Embodiment 1.
FIG. 16 is a diagram showing an example of the number of data items per standard data type in the comparison source and the comparison destination when the difference values include both positive and negative values, according to Embodiment 1.
FIG. 17 is a diagram showing an example of the difference values calculated in step 160 according to Embodiment 1.
FIG. 18 is a diagram showing an example of the difference values after steps 169 and 170 according to Embodiment 1.

Explanation of symbols

100 first database, 101 second database, 102 schema integration support device, 103 schema information extraction unit, 104 schema information storage device, 105 feature information generation unit, 106 data type dictionary storage device, 107 feature information storage device, 108 similarity evaluation unit, 109 weighting variable storage device, 110 mapping model storage device, 140 table information, 141 primary key column information, 142 primary key reference destination information, 143 dependent column information, 144 foreign key information, 145 foreign key column information, 901 display device, 902 keyboard (K/B), 903 mouse, 904 FDD, 905 CDD, 906 printer device, 907 scanner device, 908 optical disk device, 910 system unit, 911 CPU, 912 bus, 913 ROM, 914 RAM, 915 communication board, 920 magnetic disk device, 921 OS, 922 window system, 923 program group, 924 file group, 931 telephone, 932 FAX machine, 940 Internet, 941 web server, 942 LAN.

Claims

A schema integration support device that outputs mapping information on tables to be mapped between a first database and a second database, as information for supporting integration of the first database and the second database, the device comprising:
a schema information storage device that stores data structure information of a table A included in the first database and data structure information of a table B included in the second database; and
a similarity evaluation unit that uses a central processing unit to compare the data structure information of table A of the first database stored in the schema information storage device with the data structure information of table B of the second database, thereby calculating a similarity between table A included in the first database and table B included in the second database, and that outputs the calculated similarity to an output device as mapping information.

The schema integration support device according to claim 1, wherein the schema information storage device stores, in the data structure information, attributes indicating dependency relationships among the data items included in the table; and
the similarity evaluation unit calculates the similarity by comparing the attributes, included in the data structure information, that indicate dependency relationships among the data items of the table.

The schema integration support device according to claim 2, wherein the schema information storage device stores at least one of a primary key attribute, a dependent attribute and a foreign key attribute as the dependency relationships among the data items of the table; and
the similarity evaluation unit calculates the similarity by comparing at least one of the primary key attribute, the dependent attribute and the foreign key attribute included in the data structure information.

4. The schema integration support apparatus according to claim 3, wherein
the similarity evaluation unit calculates the similarity by comparing at least one of the number of data items having the primary key attribute, the number of data items having the dependent attribute, and the number of data items having the foreign key attribute.
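Claim 4 reduces each table to three counts. The claims do not say how the counts are compared, so the sketch below assumes a weighted min/max (Ruzicka-style) ratio, which yields 1.0 for identical count profiles and falls toward 0.0 as they diverge. The weight values are hypothetical; the list of symbols only indicates that the apparatus has a weighting variable storage device (109).

```python
from typing import Dict

# Hypothetical weights standing in for values a weighting variable
# storage device might hold; the claims specify none.
DEFAULT_WEIGHTS: Dict[str, float] = {"primary_key": 2.0, "dependent": 1.0, "foreign_key": 2.0}


def count_similarity(counts_a: Dict[str, int], counts_b: Dict[str, int],
                     weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted min/max similarity in [0, 1] between the per-attribute
    item counts of two tables; missing kinds count as zero."""
    num = den = 0.0
    for kind, w in weights.items():
        a, b = counts_a.get(kind, 0), counts_b.get(kind, 0)
        num += w * min(a, b)
        den += w * max(a, b)
    # Two tables with no weighted items at all are treated as identical.
    return 1.0 if den == 0 else num / den


if __name__ == "__main__":
    customer = {"primary_key": 1, "dependent": 4, "foreign_key": 0}
    kokyaku = {"primary_key": 1, "dependent": 3, "foreign_key": 1}
    print(count_similarity(customer, kokyaku))  # (2 + 3 + 0) / (2 + 4 + 2) = 0.625
```

Weighting key columns more heavily than dependent columns is one plausible choice, since key structure tends to be more stable across schemas than the number of descriptive columns.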

5. The schema integration support apparatus according to claim 1, wherein
the schema information storage device stores, in the data structure information, an attribute indicating the data type of each data item included in the table, and the similarity evaluation unit calculates the similarity by comparing the attributes, included in the data structure information, that indicate the data types of the data items of the tables.
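Comparing data types across databases only works once vendor-specific type names are reduced to common categories; the list of symbols includes a data type dictionary storage device (106), which presumably serves this purpose. The sketch below assumes a simple lookup table; its entries, the category names and the `other` fallback are illustrative only.

```python
import re
from typing import Dict

# Hypothetical data type dictionary: DBMS-specific type names mapped to a
# small set of generic categories, so that e.g. VARCHAR2 and NVARCHAR compare equal.
DATA_TYPE_DICTIONARY: Dict[str, str] = {
    "CHAR": "string", "VARCHAR": "string", "VARCHAR2": "string",
    "NVARCHAR": "string", "TEXT": "string",
    "SMALLINT": "integer", "INT": "integer", "INTEGER": "integer", "BIGINT": "integer",
    "NUMBER": "numeric", "NUMERIC": "numeric", "DECIMAL": "numeric",
    "FLOAT": "numeric", "REAL": "numeric",
    "DATE": "datetime", "DATETIME": "datetime", "TIMESTAMP": "datetime",
}


def normalize_type(declared: str) -> str:
    """Map a declared column type such as 'varchar(40)' to a generic category."""
    # Drop length/precision arguments like '(40)' or '(10,2)', then upper-case.
    base = re.sub(r"\(.*?\)", "", declared).strip().upper()
    return DATA_TYPE_DICTIONARY.get(base, "other")
```

For example, `normalize_type("NUMBER(10,2)")` and `normalize_type("decimal")` both map to the same `numeric` category.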

6. The schema integration support apparatus according to claim 5, wherein
the similarity evaluation unit calculates the similarity by comparing the number of data items of each data type.
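Claim 6 compares how many data items of each data type the two tables contain. A minimal sketch, assuming the types have already been normalised to generic categories and using an unweighted min/max ratio as the comparison (the claim does not name one):

```python
from collections import Counter
from typing import Iterable


def type_histogram(column_types: Iterable[str]) -> Counter:
    """Count a table's data items per (already normalised) data type."""
    return Counter(column_types)


def histogram_similarity(a: Counter, b: Counter) -> float:
    """Unweighted min/max overlap of two histograms, in [0, 1]."""
    kinds = set(a) | set(b)
    den = sum(max(a[k], b[k]) for k in kinds)
    if den == 0:
        return 1.0  # two empty tables are treated as identical
    return sum(min(a[k], b[k]) for k in kinds) / den


if __name__ == "__main__":
    t1 = type_histogram(["integer", "string", "string", "datetime"])
    t2 = type_histogram(["integer", "string", "numeric"])
    print(histogram_similarity(t1, t2))  # (1 + 1) / (1 + 2 + 1 + 1) = 0.4
```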

7. The schema integration support apparatus according to claim 6, wherein
the schema information storage device stores, in the data structure information, at least one of a primary key attribute, a dependent attribute, and a foreign key attribute of each data item included in the table, and the similarity evaluation unit calculates the similarity by comparing, for each data type, at least one of the number of data items having the primary key attribute, the number of data items having the dependent attribute, and the number of data items having the foreign key attribute.
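Claim 7 refines the count comparison by splitting each data type's count according to key attribute. The sketch below keys the counts by (data type, attribute) pairs; the min/max ratio and the attribute labels are assumptions, not taken from the claim. The example shows why the refinement can matter: two tables with identical per-type counts may still differ once key roles are considered.

```python
from collections import Counter
from typing import Iterable, Tuple

# One data item described as (generic data type, dependency attribute).
Item = Tuple[str, str]


def typed_key_similarity(items_a: Iterable[Item], items_b: Iterable[Item]) -> float:
    """Min/max overlap of item counts keyed by (data type, attribute) pairs."""
    a, b = Counter(items_a), Counter(items_b)
    keys = set(a) | set(b)
    den = sum(max(a[k], b[k]) for k in keys)
    return 1.0 if den == 0 else sum(min(a[k], b[k]) for k in keys) / den


if __name__ == "__main__":
    table_a = [("integer", "primary_key"), ("string", "dependent"),
               ("string", "dependent"), ("integer", "foreign_key")]
    table_b = [("integer", "primary_key"), ("string", "dependent"),
               ("string", "dependent"), ("integer", "dependent")]
    # Per-type counts alone are identical (2 integer, 2 string).
    print(Counter(t for t, _ in table_a) == Counter(t for t, _ in table_b))
    # The (type, attribute) counts differ: 3 / 5 = 0.6.
    print(typed_key_similarity(table_a, table_b))
```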

8. A schema integration support method performed by a schema integration support apparatus that outputs, as information for supporting integration of a first database and a second database, mapping information about tables to be mapped between the first database and the second database, the method comprising:
a step of storing data structure information of a table A included in the first database and data structure information of a table B included in the second database in a schema information storage device; and
a similarity evaluation step in which a similarity evaluation unit, using a central processing unit, compares the data structure information of table A of the first database stored in the schema information storage device with the data structure information of table B of the second database, thereby calculates a similarity between table A of the first database and table B of the second database, and outputs the calculated similarity to an output device as mapping information.
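The method of claim 8 can be traced end to end on two small SQLite databases. In the sketch below, which is illustrative and not the patented implementation, the schema information is read with SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`, each column is labelled as a primary key, foreign key or dependent item, and every table pair is scored with an assumed min/max ratio over the label counts. All table and column names are hypothetical, and none of the attribute names enter the comparison.

```python
import sqlite3
from collections import Counter
from typing import Dict, List, Tuple

# table name -> [(column name, declared type, dependency attribute)]
Schema = Dict[str, List[Tuple[str, str, str]]]


def extract_schema(conn: sqlite3.Connection) -> Schema:
    """Storing step: read each table's data structure information."""
    schema: Schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # Field 3 of each foreign_key_list row is the referencing ("from") column.
        fk_cols = {row[3] for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')}
        columns = []
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, decl_type, _notnull, _dflt, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            if pk:
                attr = "primary_key"
            elif name in fk_cols:
                attr = "foreign_key"
            else:
                attr = "dependent"
            columns.append((name, decl_type, attr))
        schema[table] = columns
    return schema


def similarity(cols_a, cols_b) -> float:
    """Similarity evaluation: min/max overlap of attribute counts (assumed)."""
    a = Counter(attr for _, _, attr in cols_a)
    b = Counter(attr for _, _, attr in cols_b)
    keys = set(a) | set(b)
    den = sum(max(a[k], b[k]) for k in keys)
    return 1.0 if den == 0 else sum(min(a[k], b[k]) for k in keys) / den


def integrate(conn_a: sqlite3.Connection, conn_b: sqlite3.Connection):
    """Score every table pair and return mapping information, best first."""
    schema_a, schema_b = extract_schema(conn_a), extract_schema(conn_b)
    mapping = [(ta, tb, similarity(ca, cb))
               for ta, ca in schema_a.items()
               for tb, cb in schema_b.items()]
    return sorted(mapping, key=lambda row: (-row[2], row[0], row[1]))


if __name__ == "__main__":
    db1 = sqlite3.connect(":memory:")
    db1.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id), amount REAL);
    """)
    db2 = sqlite3.connect(":memory:")
    db2.executescript("""
        CREATE TABLE kokyaku (kokyaku_no INTEGER PRIMARY KEY, shimei TEXT, jusho TEXT);
        CREATE TABLE chumon (chumon_no INTEGER PRIMARY KEY,
            kokyaku_no INTEGER REFERENCES kokyaku(kokyaku_no), kingaku REAL);
    """)
    for row in integrate(db1, db2):
        print(row)
```

The table names are interpolated directly into the PRAGMA statements because SQLite does not accept bound parameters there; that is acceptable for names read from `sqlite_master` but would need quoting care for untrusted input.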

9. A schema integration support program that causes a computer to execute the schema integration support method according to claim 8.