JPH08249338A

JPH08249338A - Database conceptual schema integration support device

Info

Publication number: JPH08249338A
Application number: JP7048114A
Authority: JP
Inventors: Gengo Suzuki; 源吾鈴木; Masashi Yamamuro; 雅司山室
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-03-08
Filing date: 1995-03-08
Publication date: 1996-09-27

Abstract

PURPOSE: To provide a device that supports schema integration, with which the integrated schema designer can easily find corresponding schema elements between the target schemas without being burdened by the complicated classification of heterogeneity and the handling of each class. CONSTITUTION: Schema information to be integrated is input, and the attribute names of the input schemas are standardized according to a data item naming rule. The schemas are converted into concept graphs, and a term dictionary 107, which holds classification information on terms, is used to compute the similarity between the schema elements of the schemas to be integrated. A user interface unit 105 presents the computed similarities to the operator, and, based on the confirmed relationships between the schema elements, a schema integration unit 104 merges the two concept graphs.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a device that supports the creation of an integrated schema spanning multiple databases, which is needed when an organization such as a company makes cross-database use of the data accumulated in its existing databases.

[0002]

[Prior Art] When conceptual schemas of databases are integrated, the schemas to be integrated are expressed in a single data model (called the common data model), and the integration work is carried out on that data model.

[0003] Conventionally, the entity-relationship model, which is commonly used to represent conceptual schemas, has been used as is as the common data model. In schema integration based on the entity-relationship model, the same concept may be represented by different schema constructs, so the procedure for finding corresponding concepts between the schemas to be integrated is complicated; partly for this reason, no schema integration support device had been developed. Moreover, prior work went no further than studying how to classify the heterogeneity between the schemas to be integrated (the same concept being expressed in different forms) and how to handle each class.

[0004]

[Problems to Be Solved by the Invention] In conventional schema integration based on the entity-relationship model, the same concept may be represented by different schema constructs, for example as an entity type in one schema and as an attribute in the other.

[0005] FIG. 2 illustrates this with an example of input entity-relationship schemas. Reference numerals 201 and 202 in FIG. 2 denote the schemas to be integrated (expressed in the entity-relationship model). The building at which a communication line terminates is represented in schema 201 as attributes of the entity type "line" ("start building" and "end building"), whereas in schema 202 it is represented as an entity type separate from "circuit".

[0006] The prior art only provided ways to handle such a situation once it was known to exist between schemas. There was no concrete method for finding, within a vast amount of schema information, that the attributes "start building" and "end building" in schema 201 and the entity type "building" in schema 202 are corresponding schema elements.

[0007] In schema integration work, finding corresponding schema elements between the target schemas has relied on the overall judgment of the integrated schema designer. As schemas grew larger, the effort this required became enormous.

[0008] An object of the present invention is to solve these problems of the prior art and to provide a device that supports schema integration, with which the integrated schema designer can easily find corresponding schema elements between the target schemas without being burdened by the complicated classification of heterogeneity and the handling of each class.

[0009]

[Means for Solving the Problems] The above object is achieved by the device described below. The device converts the schemas to be integrated into concept graphs, which serve as the common data model, computes the similarity between concepts in the concept graphs to narrow down the candidates for corresponding schema elements, and presents those candidates to the integrated schema designer, thereby making the work more efficient.

[0010] FIG. 1 shows an example configuration of a device that implements the present invention. As shown in FIG. 1, the device comprises, for example: a schema information importing unit 100 that imports the schema information to be integrated; a data standardization unit 101 that standardizes the attribute names of the imported schemas according to a data item naming rule, using a basic word table 106; a model conversion unit 102 that converts schemas into concept graphs; a similarity calculation unit 103 that computes the similarity between schema elements of the two schemas to be integrated; a user interface unit 105 that presents the computed similarities to the operator and through which the operator judges and confirms the relationships between the schema elements; and a schema integration unit 104 that merges the two concept graphs. The basic word table 106 is used during standardization. The term dictionary 107 holds classification information on the terms used in schema element names and is used during similarity calculation.

[0011]

[Operation] The present invention solves the above problems as follows. FIG. 3 outlines the procedure for schema integration using the schema integration support device of the present invention.

[0012] For the schema information imported by the schema information importing unit 100, the data standardization unit 101 determines whether the attribute names are standardized. Names that are already standardized are left as they are; names that are not are standardized.

[0013] FIG. 4 shows the schema information of the entity-relationship diagrams of FIG. 2 after data standardization of the attribute names. For example, in schema 201 of FIG. 2, "start building" has been changed to "start_building_name". These entity-relationship diagrams are then converted into concept graphs, following the rules in FIGS. 5 and 6.

[0014] Schema 701 in FIG. 7 shows the result of converting the entity-relationship diagram of schema 201 in FIG. 2 into a concept graph. Likewise, schema 702 in FIG. 7 shows the result of standardizing the entity-relationship diagram of schema 202 in FIG. 2 and converting it into a concept graph.

[0015] Next, the similarity calculation unit 103 computes the similarity between schema elements. First, the name similarity calculation 322 shown in FIG. 3 computes the similarity between "concepts" based on the similarity of their names, using the term dictionary 107.

[0016] Similar names alone do not guarantee that two concepts are identical, so the peripheral similarity calculation 323 shown in FIG. 3 then computes a similarity that takes into account the information surrounding each "concept". Consider applying this calculation to the schemas represented by the two concept graphs 701 and 702 in FIG. 7. Because the schemas are converted into concept graphs before being compared, a difference such as "start building" and "end building" versus "building", which a conventional approach would have to account for explicitly, no longer needs special handling: the present approach computes the similarity between schema elements by a simple all-pairs calculation.

[0017] Using the similarities computed above, pairs of schema elements are presented to the schema integrator through the user interface unit 105 in descending order of similarity. The schema integrator can thus easily obtain candidates for similar schema elements.

[0018] Based on the similarity results, the schemas are modified if necessary (this is called schema adjustment; process 330 in FIG. 3), and the schema integration unit 104 merges the concept graphs. As post-processing 350, the model conversion unit 102 converts the merged concept graph into a model such as the entity-relationship model, if required.

[0019] The problems addressed by the present invention are thereby solved.

[0020]

[Embodiment] FIG. 3 shows the flow of schema integration work using the device of the present invention. In FIG. 3, 300 denotes preprocessing for data standardization, 310 denotes model conversion, and 320 denotes schema comparison. Schema comparison 320 consists of a process 321 for finding the schema elements to be compared and a process 325 for confirming correspondence. Process 321 includes the name similarity calculation 322, the peripheral similarity calculation 323, and a process 324 that computes the overall similarity from the two. 330 denotes schema adjustment, 340 denotes schema merging, and 350 denotes post-processing such as model conversion.

[0021] The data standardization unit 101 standardizes the data. Using a "basic word table" (basic word table 106) such as that described in "Data name assignment and registration device" of Japanese Unexamined Patent Publication No. H4-215182, the unit matches the basic words contained in each data item name, identifies the modifier, main word, and class word it contains, and puts the name into the standard form modifier + main word + class word. If the class word is missing, it is inferred from the data type and example values of the data item and appended. If both the main word and the class word are missing, the whole name is treated as the main word, and a class word is appended in the same way as when only the class word is missing.
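The standardization step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the contents of `BASIC_WORDS` and `CLASS_WORD_BY_TYPE`, the whitespace tokenization, and the treatment of unrecognized tokens are all hypothetical and are not taken from the cited publication.

```python
# Hypothetical basic word table: word -> role in the naming rule.
BASIC_WORDS = {
    "start": "modifier", "end": "modifier",
    "building": "main", "line": "main",
    "name": "class", "code": "class", "speed": "class",
}
# Hypothetical rule for inferring a missing class word from the data type.
CLASS_WORD_BY_TYPE = {"string": "name", "integer": "number"}


def standardize(item_name: str, data_type: str) -> str:
    """Return the name in the form modifier + main word + class word."""
    tokens = item_name.lower().split()
    roles = [BASIC_WORDS.get(t) for t in tokens]
    mains = [t for t, r in zip(tokens, roles) if r == "main"]
    classes = [t for t, r in zip(tokens, roles) if r == "class"]
    if not mains and not classes:
        # Both main word and class word missing: the whole name is the main word.
        modifiers, mains = [], tokens
    else:
        # Assumption: any token that is neither main nor class is kept as a modifier.
        modifiers = [t for t, r in zip(tokens, roles) if r not in ("main", "class")]
    if not classes:
        # Class word missing: infer it from the data type ("value" is a made-up default).
        classes = [CLASS_WORD_BY_TYPE.get(data_type, "value")]
    return "_".join(modifiers + mains + classes)
```

With these assumed tables, `standardize("start building", "string")` yields `start_building_name`, matching the renaming described for FIG. 4.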

[0022] The model conversion unit 102 converts the entity-relationship model (ER model) with standardized names into a concept graph using the rules shown in FIGS. 5 and 6. In FIGS. 5 and 6, E denotes an entity type, R a relationship, M a modifier, P a main word, and C a class word. The result of the conversion is shown in FIG. 7.

[0023] Next, the similarity is computed. According to FIG. 6, concept names follow one of three patterns: "E", "P", and "P_C". Computing the name similarity of two concept names therefore involves the following three cases.

[0024] (1) Neither concept name is decomposed (that is, "E" vs. "E", "P" vs. "P", or "E" vs. "P").
(2) Both concept names are decomposed (that is, "P_C" vs. "P_C").

[0025] (3) One concept name is not decomposed and the other is (that is, "P" vs. "P_C" or "E" vs. "P_C").
In case (3), FIG. 6 shows that a concept "P" always exists paired with "P_C" (in the case of FIG. 6(b), P appears at first glance to be absent, but P exists because P = E). There is therefore no need to treat "P" vs. "P_C" or "E" vs. "P_C" as similar, and the similarity for these combinations is set to 0. The calculation of name similarity for cases (1) and (2) is described below.
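The three-way case split can be sketched as a small dispatcher. It assumes, for illustration only, that a decomposed name "P_C" is recognizable by an underscore and that the scoring functions for cases (1) and (2) are supplied by the caller; the function and parameter names are hypothetical.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def is_decomposed(name: str) -> bool:
    # Assumption: "P_C" names contain an underscore; "E" and "P" names do not.
    return "_" in name


def name_similarity(a: str, b: str,
                    undecomposed_sim: Scorer,
                    decomposed_sim: Scorer) -> float:
    """Dispatch to case (1), (2), or (3) of the name similarity calculation."""
    da, db = is_decomposed(a), is_decomposed(b)
    if not da and not db:
        return undecomposed_sim(a, b)   # case (1)
    if da and db:
        return decomposed_sim(a, b)     # case (2)
    return 0.0                          # case (3): mixed patterns score 0
```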

[0026] (1) When the concept names are not decomposed
The two words are scored according to whether the "standard word correspondence table" of the "Data name assignment and registration device" in Japanese Unexamined Patent Publication No. H4-215182 maps them to the same standard word.

[0027] Specifically, the algorithm of FIG. 8 is used. The two concept names to be compared are input (S10), and the standard name corresponding to each is looked up (S11, S14). If standard names exist for both concept names and they are equal, the name similarity is set to 1, the maximum value (S18). If they are not equal, the concepts are entirely different, so the similarity is set to 0, the minimum value (S19). If a standard name exists for only one of the names, or for neither, the similarity is obtained by a technique such as subsequence matching (S20).

[0028] FIG. 9 shows an example of applying this algorithm. For the concept names "ビル" and "ビルディング" (both meaning "building"), the standard word correspondence table yields the standard name "ビル" for both, so the similarity is 1. For "回線" ("line") and "サーキット" ("circuit"), by contrast, a standard name is found for only one of them and there is no subsequence match, so the similarity is 0.
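A minimal sketch of the FIG. 8 procedure follows. The table contents are hypothetical, and `difflib.SequenceMatcher` from the Python standard library is used here only as a stand-in for the unspecified "subsequence matching" technique. The original Japanese terms are kept as string literals because the partial-match result depends on the actual characters.

```python
from difflib import SequenceMatcher

# Hypothetical standard word correspondence table: word -> standard word.
STANDARD_WORDS = {
    "ビル": "ビル",          # "building"
    "ビルディング": "ビル",  # "building" (variant spelling)
    "回線": "回線",          # "line"
}


def undecomposed_name_similarity(a: str, b: str) -> float:
    std_a = STANDARD_WORDS.get(a)  # look up standard name of first word (S11)
    std_b = STANDARD_WORDS.get(b)  # look up standard name of second word (S14)
    if std_a is not None and std_b is not None:
        # Both registered: 1 if same standard name (S18), otherwise 0 (S19).
        return 1.0 if std_a == std_b else 0.0
    # One or both unregistered: fall back to partial matching (S20).
    # SequenceMatcher is an assumed substitute, not the method of the patent.
    return SequenceMatcher(None, a, b).ratio()
```

Under these assumptions the two cases of FIG. 9 come out as described: "ビル" and "ビルディング" score 1, and "回線" and "サーキット", which share no characters, score 0.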

[0029] (2) When the concept names are decomposed
The names are scored according to whether they fall into the same class under the "similar data item classification function" described in Sekine, Kawashita, Machihara, and Nakagawa, "A data standardization method using a term dictionary for systematic DB construction," Transactions of the Information Processing Society of Japan, Vol. 34, No. 3 (1993).

[0030] Specifically, the calculation proceeds as shown in FIG. 10. The two concept names to be compared are input (S30), and the device determines whether the class words are equal (S31, S32) and whether the main words are equal (S33, S34, S37, S38). If both are equal, the similarity is set to 1 (S35). If only the class words are equal, an intermediate similarity N2 is set (S36). If only the main words are equal, a similarity N1 higher than in the class-word-only case is set (S39). If neither is equal, the lowest similarity N3 is set. The similarity between words that are not exactly equal is computed by a technique such as subsequence matching.

[0031] FIG. 11 shows an example of applying this algorithm. The concept names "line_code" and "line_number" have different class words but the same main word, so the similarity N1 is set. By contrast, "line_code" and "circuit_code" have main words in different classes but the same class word, so the intermediate similarity N2, which is smaller than N1, is set.
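The FIG. 10 decision can be sketched as below. The value N2 = 0.3 is the one the description uses in its later worked example; N1 = 0.6 and N3 = 0.0 are hypothetical values chosen only to satisfy N1 > N2 > N3. Plain string equality stands in for the dictionary-based class comparison, and the subsequence-matching refinement for near-equal words is omitted.

```python
N1 = 0.6  # only main words equal (hypothetical value)
N2 = 0.3  # only class words equal (value used in the worked example)
N3 = 0.0  # neither equal (hypothetical value)


def decomposed_name_similarity(a: str, b: str) -> float:
    """Score two names of the form main-word_class-word ("P_C")."""
    # Assumption: the class word is the part after the last underscore.
    main_a, class_a = a.rsplit("_", 1)
    main_b, class_b = b.rsplit("_", 1)
    same_main = main_a == main_b
    same_class = class_a == class_b
    if same_main and same_class:
        return 1.0
    if same_main:
        return N1
    if same_class:
        return N2
    return N3
```

With these values, "line_code" vs. "line_number" gives N1 and "line_code" vs. "circuit_code" gives N2, as in FIG. 11.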

[0032] Next, the name similarities computed in this way are used to compute the peripheral similarity. For concepts A and B, let SA and SB be the sets of concepts adjacent to A and to B, respectively. The peripheral similarity is computed by the following formula.

[0033] (peripheral similarity of A and B) = sum(max(name similarity between elements of SA and SB)) / min(number of elements in SA, number of elements in SB)

Here, when (number of elements in SA) ≤ (number of elements in SB), max is the maximum taken over the elements of SB and sum is the sum taken over the elements of SA (and likewise, with the roles reversed, in the opposite case).

[0034] Consider "line" (= A) and "circuit" (= B) in the example of FIG. 7. Assume that neither is registered in the standard term dictionary 107. The name similarity of the two is low.

[0035] The concepts adjacent to them are
SA = {line_code, line_speed, building}
SB = {circuit_code, circuit_speed, building}
and min(number of elements in SA, number of elements in SB) = 3.

[0036] Computing the name similarity between the concepts in SB and those in SA gives the following: for "circuit_code", "line_code" gives the maximum (the similarity is N2; let N2 = 0.3); for "circuit_speed", "line_speed" gives the maximum (likewise 0.3); and for "building", "building" gives the maximum (the value is 1).

[0037] Hence sum(max(name similarity between elements of SA and SB)) = 0.3 + 0.3 + 1.0 = 1.6, and therefore (peripheral similarity of "line" and "circuit") = 1.6 / 3 ≈ 0.53. Concepts that would not be regarded as similar on name similarity alone thus come to be regarded as similar, albeit with a modest score, once the peripheral similarity is taken into account.
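The worked example can be reproduced with a short sketch of the peripheral similarity formula. The function name, the lookup-table name similarity `toy_name_sim`, and the choice to return 0 when a concept has no neighbours are assumptions made for illustration; only the pair scores 0.3, 0.3, and 1 come from the description.

```python
from typing import Callable, Sequence


def peripheral_similarity(sa: Sequence[str], sb: Sequence[str],
                          name_sim: Callable[[str, str], float]) -> float:
    """sum over the smaller set of the best match in the larger set,
    divided by the size of the smaller set."""
    if not sa or not sb:
        return 0.0  # assumption: no neighbours means no peripheral evidence
    smaller, larger = (sa, sb) if len(sa) <= len(sb) else (sb, sa)
    total = sum(max(name_sim(x, y) for y in larger) for x in smaller)
    return total / len(smaller)


# Name similarities taken from the worked example (N2 = 0.3); other pairs score 0.
KNOWN = {
    frozenset(["line_code", "circuit_code"]): 0.3,
    frozenset(["line_speed", "circuit_speed"]): 0.3,
}


def toy_name_sim(x: str, y: str) -> float:
    if x == y:
        return 1.0
    return KNOWN.get(frozenset([x, y]), 0.0)


SA = ["line_code", "line_speed", "building"]
SB = ["circuit_code", "circuit_speed", "building"]
result = peripheral_similarity(SA, SB, toy_name_sim)  # (0.3 + 0.3 + 1.0) / 3
```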

[0038] Finally, the overall similarity is taken as the weighted mean of the name similarity and the peripheral similarity. In the "line" and "circuit" example above, the name similarity is 0 and the peripheral similarity is 0.53, so with a name-to-peripheral weight ratio of 1:1 the final similarity is (0^2 + 0.53^2)^(1/2) / 2^(1/2) ≈ 0.37.
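The figure of 0.37 corresponds to a quadratic (root-mean-square) mean of the two similarities. The sketch below reproduces it; the generalization to unequal weights is an assumption, since the description gives the calculation only for the 1:1 case, and the function and parameter names are hypothetical.

```python
import math


def total_similarity(name_sim: float, peripheral_sim: float,
                     w_name: float = 1.0, w_peripheral: float = 1.0) -> float:
    """Weighted quadratic mean of name and peripheral similarity.

    With equal weights this equals sqrt(n^2 + p^2) / sqrt(2), the form
    used in the worked example. Other weightings are an assumed extension.
    """
    weighted_squares = w_name * name_sim ** 2 + w_peripheral * peripheral_sim ** 2
    return math.sqrt(weighted_squares / (w_name + w_peripheral))


example = total_similarity(0.0, 0.53)  # sqrt(0.53**2 / 2)
```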

[0039] Next, the user confirms through the user interface unit 105 whether concepts match. Pairs of concepts whose similarity exceeds a threshold are presented to the user in descending order of similarity. The user examines each pair, referring to its surroundings when necessary, and confirms whether the concepts match.
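The selection of candidates for presentation amounts to a filter followed by a sort. The sketch below assumes the overall similarities are held in a dictionary keyed by concept pair; the function name, the sample scores other than 0.37, and the threshold value are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def candidates_to_present(scores: Dict[Pair, float],
                          threshold: float) -> List[Tuple[Pair, float]]:
    """Keep pairs scoring above the threshold, highest similarity first."""
    above = [(pair, s) for pair, s in scores.items() if s > threshold]
    return sorted(above, key=lambda item: item[1], reverse=True)


scores = {
    ("line", "circuit"): 0.37,       # from the worked example
    ("building", "building"): 1.0,   # illustrative
    ("line", "building"): 0.1,       # illustrative
}
ranked = candidates_to_present(scores, threshold=0.3)
```

Here the low-scoring pair is dropped and the remaining pairs are listed with the identical "building" concepts first.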

[0040] The correspondence between the relations among concepts is confirmed as well. The two concept graphs are then merged. The merge algorithm follows, for example, the method described in Sowa, J. F., Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley (1984). If necessary, the merged concept graph is converted into an entity-relationship model.

[0041] The algorithm for this conversion is shown in FIGS. 12 and 13. org(C) denotes the origin of concept C. E denotes an entity type, R a relationship, P a main word, and C a class word. Subscripts denote the origin; for example, C_E is a concept that originates from an entity type. An origin of * denotes any origin. C ≫ E denotes that concept C has been converted into entity type E. The basic idea of these conversion rules is that whatever was originally an entity type is converted back into an entity type, and a relation between two concepts that are both converted into entity types is converted into a relationship type. Notation such as [C₁] -(R)-> [C₂] follows the Sowa reference cited above.

[0042] FIGS. 14 to 16 show an example in which a concept graph is converted into an entity-relationship model using these rules. FIG. 14(a) and (b) show two example input schemas. FIG. 15(a) and (b) show the results of converting them into concept graphs. FIG. 16(a) shows the result of merging these, and FIG. 16(b) shows the result of converting the merged graph into an entity-relationship model.

[0043]

[Effects of the Invention] As described above, the database conceptual schema integration support device of the present invention improves the efficiency of the work of integrating the conceptual schemas of databases.

[Brief description of drawings]

FIG. 1 is a diagram showing an example configuration of a device that implements the present invention.

FIG. 2 is a diagram showing an example of input entity-relationship schemas.

FIG. 3 is a diagram showing an outline of the processing of the present invention.

FIG. 4 is a diagram showing the result of standardizing the data item names in the example of FIG. 2.

FIG. 5 is a diagram showing conversion rules from the entity-relationship model to a concept graph.

FIG. 6 is a diagram showing conversion rules from the entity-relationship model to a concept graph.

FIG. 7 is a diagram showing the concept graphs converted from the example of FIG. 2.

FIG. 8 is a diagram showing the name similarity calculation process when concept names are not decomposed.

FIG. 9 is a diagram showing an example of the name similarity calculation when concept names are not decomposed.

FIG. 10 is a diagram showing the name similarity calculation process when concept names are decomposed.

FIG. 11 is a diagram showing an example of the name similarity calculation when concept names are decomposed.

FIG. 12 is a diagram showing the model conversion algorithm.

FIG. 13 is a diagram showing the model conversion algorithm.

FIG. 14 is a diagram showing an example of model conversion.

FIG. 15 is a diagram showing an example of model conversion.

FIG. 16 is a diagram showing an example of model conversion.

[Explanation of symbols]

100 Schema information importing unit
101 Data standardization unit
102 Model conversion unit
103 Similarity calculation unit (similarity between schema elements)
104 Schema integration unit
105 User interface unit
106 Basic word table
107 Term dictionary

Claims

[Claims]

1. A database conceptual schema integration support device that takes as input conceptual schema information of a plurality of databases and supports the generation of an integrated schema thereof, the device comprising: a schema information importing unit that imports the schema information to be integrated; a data standardization unit that standardizes attribute names of the imported schemas according to a predetermined data item naming rule; a model conversion unit that converts the imported schemas into concept graphs; a term dictionary that holds classification information on terms used in the names of schema elements; a similarity calculation unit that computes, using the term dictionary, the similarity between schema elements of the schemas to be integrated; a user interface unit that presents the computed similarities of the schema elements to an operator and through which the operator judges and confirms the relationships between the schema elements; and a schema integration unit that merges the plurality of concept graphs.