JP4859456B2

JP4859456B2 - Data schema mapping program and computer system

Info

Publication number: JP4859456B2
Application number: JP2005374359A
Authority: JP
Inventors: 敦子小泉; 勝竹内; 敏子相薗; 康嗣森本
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2005-12-27
Filing date: 2005-12-27
Publication date: 2012-01-25
Anticipated expiration: 2025-12-27
Also published as: US20070150495A1; US7996437B2; JP2007179146A

Description

The present invention relates to a technique for associating data item names across a plurality of data schemas.

In recent years, demand for Internet-based B2B (business-to-business) services has grown, and international and industry standardization aimed at improving the interoperability of information and services between companies has advanced. For example, industry-standard XML/EDI based on the ebXML (electronic business XML) specification has become mainstream for EDI (electronic data interchange), and XBRL (eXtensible Business Reporting Language) has become mainstream for financial information. Data items described in each company's proprietary specification must therefore be associated with the items of the standard specification. Several tools exist that convert between proprietary format files and EDI standard format files, but all of them require the correspondence between data items in the different database formats to be defined manually through a GUI. As a result, responding to changes such as introducing a new database system or information retrieval system, or upgrading to a new version of an industry standard, is laborious, and the know-how for mapping item names accumulates only with individual practitioners.

As prior art for associating data item names, methods that use a dictionary or ontology (the relationships among the vocabulary used in a specification) prepared in advance are known. For example, Patent Document 1 discloses a method that uses a dictionary to decompose a data item name into "qualifier + main word + classifier word" and calculates the similarity of data item names based on whether each element matches the dictionary.

Patent Document 2 discloses, as a means of information retrieval, a method that describes hierarchical and synonym relationships between concepts in an ontology and uses the ontology to generate approximate search conditions.

Patent Document 3 discloses a technique that takes in an electronic document and outputs data indicating the conceptual relationship between two morphemes.

Patent documents cited: JP-A-8-249338; JP 2003-345821 A; JP 2005-157823 A

In the prior art described above, hierarchical and synonym relationships between concepts must be described in advance in a dictionary or ontology before data item names can be associated. Moreover, associating data item names requires selecting the corresponding name from among names built from similar component words, so conventional methods that compute a "similarity" cannot narrow the candidates sufficiently. For example, when determining which data item name in another data schema corresponds to the data item name "注文日付" (order date) in one data schema, similarity alone yields candidates such as "注文年月日" (order year-month-day) and "注文番号" (order number), and narrowing them further is difficult.

The present invention has been made in view of the above problems. Its object is to narrow down the candidates for data item name association with high accuracy when associating item names between different data schemas, by extracting vocabulary that is related but must be distinguished and by constructing meaningful discrimination relationships.

The present invention is a program that refers to a first data schema and a second data schema, each describing a data structure, and associates the data item names constituting the first data schema with the data item names constituting the second data schema. The program extracts the data item names constituting the first data schema; extracts the data item names constituting the second data schema; extracts the element concepts contained in the extracted data item names of each schema; sets discrimination relationships between data item names from the extracted element concepts; reads a data item name from each of the first and second data schemas and sets a correspondence between the two names based on the discrimination relationships; and stores the correspondence between the data item names.

The element concepts are extracted as follows. For the first data schema or the second data schema, two data item names are read and their character strings are compared. When the two names contain a synonymous character string, the pair of character strings common to the two data item names is extracted as a first element concept, and the pair of character strings that remains after removing the common character string from each data item name is extracted as a second element concept.

Accordingly, by using the discrimination relationships and temporal order relationships between the concepts of the vocabulary that makes up data item names, the present invention can narrow down the candidates for data item name association. For example, based on the knowledge that "番号" (number) and "日付" (date) are discriminated concepts (concepts that must be distinguished), it can be determined that, among the candidate data item names corresponding to "注文日付" (order date), "注文番号" (order number) should be excluded because it is discriminated from "注文日付". In other words, concepts that can be treated as items with different meanings when they form compound words are placed in a discrimination relationship, and adding this relationship makes it possible to narrow down item names with high accuracy.
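The exclusion described in this example can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `narrow_by_discrimination`, the representation of discrimination knowledge as pairs of trailing strings, and the use of simple suffix matching are all assumptions made for illustration.

```python
def narrow_by_discrimination(target, candidates, discriminated_pairs):
    """Drop candidates whose trailing concept is discriminated from the target's.

    discriminated_pairs is a hypothetical list of (concept, concept) tuples;
    suffix matching is a simplification of the concept extraction in the text.
    """
    kept = []
    for candidate in candidates:
        excluded = False
        for x, y in discriminated_pairs:
            # The relationship is symmetric, so check both orientations.
            for target_end, candidate_end in ((x, y), (y, x)):
                if target.endswith(target_end) and candidate.endswith(candidate_end):
                    excluded = True
        if not excluded:
            kept.append(candidate)
    return kept


# "番号" (number) and "日付" (date) must be distinguished.
pairs = [("番号", "日付")]
result = narrow_by_discrimination("注文日付", ["注文年月日", "注文番号"], pairs)
print(result)
```

With these inputs, "注文番号" is dropped and "注文年月日" remains as the candidate for "注文日付".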

In addition, by using the temporal order relationships between item names, candidates for the correspondence between data item names can be selected with still higher accuracy.

An embodiment of the present invention is described below with reference to the accompanying drawings.

The first embodiment describes an example of a data schema mapping definition support system that associates the item names of a target database with the item names of a source database. The system has a function that extracts discrimination relationships between concepts (relationships between concepts that must be distinguished and cannot be synonyms or near-synonyms) and temporal order relationships from the data schema definitions and instance documents of the two databases and stores them in an ontology, and a function that uses the ontology to narrow down the correspondences between data item names.

FIG. 1 shows the first embodiment. It is a block diagram of a computer system that supports the creation of a "data schema mapping definition file", which indicates the correspondence of data items between different data schemas, for data linkage within a company or between companies.

The computer system comprises a CPU 101 that performs arithmetic processing, an input device 102 consisting of a keyboard, a mouse and the like, a display device 103 that displays computation results and the like, a storage device 110 that stores data and programs, and a memory 108 that temporarily stores data (tables and the like).

The storage device 110 includes an input data storage unit 104 that stores input data, an ontology construction data storage unit 105 described later, an ontology storage unit 106 that stores an ontology defining the relationships among the vocabulary used in specification descriptions, and a data schema mapping definition file storage unit 107 that stores the generated data schema mapping definitions.

The input data storage unit 104 consists of a source data schema definition document storage unit 1041, a source instance document storage unit 1042, a target data schema definition document storage unit 1043, and a target instance document storage unit 1044.

The source data schema definition document storage unit 1041 stores a document that defines the schema of the source database. The source instance document storage unit 1042 stores the actual data of the source database, which serves as the reference.

The target data schema definition document storage unit 1043 stores a document that defines the schema of the target database, whose item names are to be associated with those of the source database. The target instance document storage unit 1044 stores the actual data of the target database.

FIGS. 2(a) to 2(d) show description examples of the documents stored in the source data schema definition document storage unit 1041, the source instance document storage unit 1042, the target data schema definition document storage unit 1043, and the target instance document storage unit 1044 of the input data storage unit 104. These documents are written in XML/EDI of the ebXML (electronic business XML) specification mentioned above or, in the case of financial information, in XBRL (eXtensible Business Reporting Language) or the like.

FIG. 2(a) shows an example of a source data schema definition document 10411, in which the data item name (element name) "注文日付" (order date) is defined with a date data type and the data item name "取引番号" (transaction number) is defined with an integer data type.

FIG. 2(b) shows an example of a source data instance document 10421, in which "20050923" is stored as the data for "注文日付" (order date) and "00020033" is stored as the data for "取引番号" (transaction number).

FIG. 2(c) shows an example of a target data schema definition document 10431, in which the data item name (element name) "注文年月日" (order year-month-day) is defined with a date data type and the data item name "注文番号" (order number) is defined with an integer data type.

FIG. 2(d) shows an example of a target data instance document 10441, in which "20040512" is stored as the data for "注文年月日" (order year-month-day) and "000101123" is stored as the data for "注文番号" (order number).

Next, the ontology construction data storage unit 105 consists of a source data schema data item information storage unit 1051, a target data schema data item information storage unit 1052, and a concept information storage unit 1053. The ontology storage unit 106 stores an ontology 1061.

FIGS. 3(a) and 3(b) show the data structures of the source data schema data item information storage unit 1051 and the target data schema data item information storage unit 1052.

The source data schema data item information storage unit 1051 in FIG. 3(a) stores data extracted, as described later, from the source data schema definition document 10411 and the source data instance document 10421 shown in FIGS. 2(a) and 2(b). Each entry of the source data schema data item information storage unit 1051 consists of a data item name 10511 that stores an item name from the source data schema definition document, a schema name 10512 assigned to the source data schema definition document, a parent item name 10513 that stores the data item name that is the parent (superordinate concept) of the data item name 10511, a data attribute 10514 that stores the data type defined for the data item name 10511, and an instance list 10515 that stores the instances corresponding to the data item name 10511 in order of frequency.

The target data schema data item information storage unit 1052 in FIG. 3(b) stores data extracted, as described later, from the target data schema definition document 10431 and the target data instance document 10441 shown in FIGS. 2(c) and 2(d). Each entry of the target data schema data item information storage unit 1052 consists of a data item name 10521 that stores an item name from the target data schema definition document, a schema name 10522 assigned to the target data schema definition document, a parent item name 10523 that stores the data item name that is the parent (superordinate concept) of the data item name 10521, a data attribute 10524 that stores the data type defined for the data item name 10521, and an instance list 10525 that stores the instances corresponding to the data item name 10521.

The concept information storage unit 1053 stores, as described later, the concepts that make up the data item names in the source data schema definition document storage unit 1041 and the target data schema definition document storage unit 1043. The ontology 1061 stores, as described later, the conceptual relationships between item names.

The data schema mapping definition file storage unit 107 stores a data schema mapping definition file 1071, which holds the result of associating the data items of the source data schema definition document with those of the target data schema definition document.

The following programs, which associate data item names using the ontology 1061 describing the discrimination relationships and temporal order relationships between vocabulary concepts, are loaded into the memory 108 and executed by the CPU 101: a data item information capturing unit 1081, a concept extraction unit 1082, a discrimination relation extraction unit 1083, a synonym relation extraction unit 1084, an order relation extraction unit 1085, an ontology construction support unit 1086, and a data item mapping definition support unit 1087.

(Importing data item information)
The operation of each program loaded in the memory 108, which together perform ontology construction support and data mapping (narrowing down the correspondences between data item names), is described below.

First, the data item information capturing unit 1081 reads the data schema definition document from the source data schema definition document storage unit 1041 of the input data storage unit 104 and imports the data item information into the source data schema data item information storage unit 1051. It then reads the data schema definition document from the target data schema definition document storage unit 1043 and imports the data item information into the target data schema data item information storage unit 1052.

The processing procedure of the data item information capturing unit 1081 is described with reference to the flowchart of FIG. 6.

The data item information capturing unit 1081 imports data item information from the source data schema definition document storage unit 1041 and stores it in the data item name 10511, data schema name 10512, parent item name 10513, and data attribute 10514 of the source data schema data item information storage unit 1051 (S601).

The data item information capturing unit 1081 then extracts the instances of each data item from the source instance document storage unit 1042, counts the frequency of each instance, and stores the instances in order of frequency in the instance list 10515 of the source data schema data item information storage unit 1051 (S602).

Similarly, it imports data item information from the target data schema definition document storage unit 1043 and stores it in the data item name 10521, data schema name 10522, parent item name 10523, and data attribute 10524 of the target data schema data item information storage unit 1052 (S603). The data item information capturing unit 1081 then extracts the instances of each data item from the target instance document storage unit 1044, counts the frequency of each instance, and stores the instances in order of frequency in the instance list 10525 of the target data schema data item information storage unit 1052 (S604).
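As a rough illustration of steps S601 to S604, the following Python sketch models one entry of a data item information storage unit and orders instance values by frequency. The class name `DataItemInfo`, its field names, the schema name `source_schema`, and all sample values other than "20050923" are hypothetical; only the reference numerals in the comments come from the text.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DataItemInfo:
    """One entry of storage unit 1051 / 1052 (field names are illustrative)."""
    item_name: str         # data item name 10511 / 10521
    schema_name: str       # schema name 10512 / 10522
    parent_item_name: str  # parent item name 10513 / 10523
    data_attribute: str    # data attribute 10514 / 10524
    instance_list: list = field(default_factory=list)  # 10515 / 10525


def instances_by_frequency(values):
    """Return the distinct instance values, most frequent first (S602 / S604)."""
    return [value for value, _count in Counter(values).most_common()]


entry = DataItemInfo("注文日付", "source_schema", "", "date")
entry.instance_list = instances_by_frequency(
    ["20050923", "20050924", "20050923", "20050923", "20050924", "20050925"]
)
print(entry.instance_list)
```

`Counter.most_common()` sorts by descending count, which matches the "in order of frequency" requirement; how ties should be ordered is not stated in the text.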

Through the above processing, the item names and frequency-ordered instances of two data schema definition documents are stored in the source data schema data item information storage unit 1051 and the target data schema data item information storage unit 1052, respectively. The first is the data schema definition document of the reference database stored in the source data schema definition document storage unit 1041 (hereinafter simply the source data schema). The second is the data schema definition document of the database to be interconverted with the source data schema, stored in the target data schema definition document storage unit 1043 (hereinafter simply the target data schema).

(Extraction of element concepts and conceptual relationships)
The concept extraction unit 1082 extracts concept information from the item names of the two data schemas to be interconverted (the source data schema and the target data schema). It is executed after the processing of FIG. 6 has stored the item names and frequency-ordered instances in the source data schema data item information storage unit 1051 and the target data schema data item information storage unit 1052.

The concept extraction unit 1082 extracts the concepts that make up the data item names of the source data schema and the target data schema and stores them in the concept information storage unit 1053. FIG. 4 shows the data structure of the concept information storage unit 1053.

Each entry of the concept information storage unit 1053 consists of a concept name 10531 indicating a concept contained in a data item name, a position 10532 of the concept name within the data item name, a number 10533 of distinct data item names among those associated with the concept name 10531, and a data item name list 10534 that stores the list of data item names associated with the concept name 10531.

The processing procedure of the concept extraction unit 1082 is described with reference to the flowchart of FIG. 7.

The concept extraction unit 1082 takes two data item names 10511 from the source data schema data item information storage unit 1051 and compares their character strings (S701 to S704).

When the comparison in S704 shows that the two data item names 10511 differ, let them be data item name A and data item name B. The element concepts that make up item names A and B are then cut out as follows (S705).
(1) Concept 1 = the common character string from the beginning of item names A and B
(2) Concept 2 = the common character string from the end of item names A and B
(3) Concept 3 = item name A with the common character strings removed
(4) Concept 4 = item name B with the common character strings removed
Next, information on how concepts 1 to 4 are used in data item names A and B is registered in the concept information storage unit 1053 (S706). For example, when data item name A is "要求番号" (request number) and data item name B is "要求年月日" (request year-month-day), concepts 1 to 4 are as follows.
(1) Concept 1 = "要求" (request)
(2) Concept 2 = nil (none)
(3) Concept 3 = "番号" (number)
(4) Concept 4 = "年月日" (year-month-day)
Accordingly, the concept names of concepts 1, 3 and 4, together with information on how they are used in the data item names, are stored in the concept name 10531, position in data item name 10532, number of distinct data item names 10533, and data item name list 10534 of the concept information storage unit 1053.
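The cutting-out of concepts 1 to 4 in S705 can be sketched in Python as follows. This is an illustrative reading of the four definitions above, not the patented code: the function name `extract_concepts` is hypothetical, nil is represented as an empty string, and the common trailing string is computed on what remains after the common leading string is removed so that the two never overlap (an assumption the text does not spell out).

```python
from itertools import combinations


def extract_concepts(item_a, item_b):
    """Return (concept1, concept2, concept3, concept4) for two item names."""
    # Concept 1: common character string from the beginning.
    i = 0
    while i < min(len(item_a), len(item_b)) and item_a[i] == item_b[i]:
        i += 1
    concept1 = item_a[:i]
    rest_a, rest_b = item_a[i:], item_b[i:]

    # Concept 2: common character string from the end of the remainders.
    j = 0
    while j < min(len(rest_a), len(rest_b)) and rest_a[-1 - j] == rest_b[-1 - j]:
        j += 1
    concept2 = rest_a[len(rest_a) - j:]

    # Concepts 3 and 4: each name with the common strings removed.
    concept3 = rest_a[:len(rest_a) - j]
    concept4 = rest_b[:len(rest_b) - j]
    return concept1, concept2, concept3, concept4


# Apply the extraction to every pair of item names (cf. S708 to S711).
item_names = ["要求番号", "要求年月日", "見積年月日"]
for a, b in combinations(item_names, 2):
    print(a, b, extract_concepts(a, b))
```

For "要求番号" and "要求年月日" this yields "要求", an empty concept 2, "番号", and "年月日", matching the example in the text.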

Finally, the conceptual relationship between concept 3 and concept 4 (that is, the difference between data item name A and data item name B) is extracted and registered in the ontology 1061 (S707). The above processing is performed for every combination of data item names 10511 (S708 to S711).

The concept extraction unit 1082 also executes the processing of FIG. 7 on the target data schema data item information storage unit 1052.

Through the above processing, the element concepts that make up the data item names of the source data schema and the target data schema are extracted and stored in the concept information storage unit 1053. In addition, for each of the source data schema and the target data schema, the differences between data item names A and B are stored in the ontology 1061.

The element concepts thus consist of first element concepts, which are the character strings common to item names A and B, and second element concepts, which are the differences that remain after removing the common character strings from item names A and B.

(Details of the conceptual relationship extraction procedure)
In the conceptual relationship extraction step (S707) of FIG. 7, the discrimination relation extraction unit 1083 and the order relation extraction unit 1085 extract the relationship between concept 3 and concept 4 described above (that is, the difference between data item name A and data item name B) and store it in the ontology 1061. FIG. 5 shows the data structure of the ontology 1061.

In FIG. 5, each entry of the ontology 1061 includes a concept name 10541 that stores concept 3 (the character string obtained by removing the common character string from item name A), a concept name 10542 that stores concept 4 (the character string obtained by removing the common character string from item name B), a conceptual relationship 10543 that stores the relationship between the concepts of the two concept names 10541 and 10542, a position 10544 of the concept names 10541 and 10542 within data item name A or data item name B, an example 10545 that stores the list of data item names A and B from which the entry was extracted, and a confirmation flag 10546 indicating that an operator, administrator or the like has confirmed the entry.

Next, the processing procedures of the order relation extraction unit 1085 and the discrimination relation extraction unit 1083 are described with reference to the flowchart of FIG. 8.

(1) Extraction of order relationships
When concept 2 (that is, the common character string from the end of data item name A and data item name B) is "年月日" (year-month-day) (S801), the temporal order relationship between concept 3 and concept 4 (that is, the difference between data item name A and data item name B) is extracted. First, the chronological order relationship between data item name A and data item name B is extracted from the instance document storage unit 1042 or 1044 (S802).

If data item name A always comes first (S803), concept 3 is determined to precede concept 4 (S804). For example, if "見積年月日" (estimate date) always precedes "出荷年月日" (shipment date), "見積" (estimate) is determined to precede "出荷" (shipment). Conversely, if data item name B always comes first (S805), concept 4 is determined to precede concept 3 (S806).
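Steps S802 to S806 can be sketched in Python as below. This is a minimal sketch under stated assumptions: instance documents are modeled as a list of dictionaries mapping item names to YYYYMMDD strings, "always first" is interpreted as strictly earlier in every record that contains both items, and the function names, return labels, and sample dates are all hypothetical.

```python
from datetime import datetime


def parse_day(text):
    """Parse a YYYYMMDD instance value such as '20050923'."""
    return datetime.strptime(text, "%Y%m%d").date()


def order_relation(records, item_a, item_b):
    """Return 'A_first', 'B_first', or None when no consistent order exists."""
    pairs = [
        (parse_day(record[item_a]), parse_day(record[item_b]))
        for record in records
        if item_a in record and item_b in record
    ]
    if not pairs:
        return None
    if all(a < b for a, b in pairs):   # S803 -> S804
        return "A_first"
    if all(b < a for a, b in pairs):   # S805 -> S806
        return "B_first"
    return None


records = [
    {"見積年月日": "20050901", "出荷年月日": "20050910"},
    {"見積年月日": "20050915", "出荷年月日": "20050920"},
]
relation = order_relation(records, "見積年月日", "出荷年月日")
print(relation)
```

When neither item consistently precedes the other, the function returns `None`, which corresponds to the case in which no order relationship is found.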

The order relation extraction unit 1085 operates on data item names that indicate a position on the time axis. Therefore, besides "年月日" (year-month-day), conditions on the trailing character string of a data item name may include "日付" (date), "年月" (year-month), "月・年・時刻" (month, year, time), and the like.

For example, suppose that “estimate number”, “shipment number”, and “arrival number” are mapping destination candidates for “delivery number”, and that the following information has been obtained about temporal order relationships.
(1) The order relationship “estimate - shipment - arrival”
(2) The order relationship “shipment - delivery”
In this example, “estimate” precedes “shipment”, and “delivery” comes after “shipment”, so “estimate” and “shipment” can be removed from the synonym candidates for “delivery”. As a result, “estimate number” and “shipment number” are removed from the mapping destination candidates for “delivery number”, which narrows the candidates down to “arrival number”.
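The narrowing in this example can be sketched as follows. The sketch assumes that order relationships are held as (earlier, later) pairs of concepts and that a candidate is dropped whenever an order, direct or derived by transitivity, is known between it and the concept being mapped. The function names are hypothetical, and the patent does not state that a transitive closure is computed; it is used here only to reproduce the reasoning of the example.

```python
def transitive_closure(order_pairs):
    """Expand (earlier, later) pairs with every order implied by transitivity."""
    closure = set(order_pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def prune_by_order(concept, candidates, order_pairs):
    """Keep only the candidates that have no known temporal order with `concept`."""
    closure = transitive_closure(order_pairs)
    return [c for c in candidates
            if (c, concept) not in closure and (concept, c) not in closure]

# 見積 = estimate, 出荷 = shipment, 着荷 = arrival, 納品 = delivery
order_pairs = [("見積", "出荷"), ("出荷", "着荷"), ("出荷", "納品")]
print(prune_by_order("納品", ["見積", "出荷", "着荷"], order_pairs))
```

Only the concept for “arrival” remains, because no order between “arrival” and “delivery” is recorded.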

The same processing may also be performed on the condition that the data attributes 10514 and 10524 of the data item names are “time” or “year/month/day”.

(2) Extraction of discrimination relationships
When there is no order relationship between concept 3 and concept 4 (that is, the parts in which item name A and item name B differ), the discrimination relationship extraction means 1083 determines that concept 3 and concept 4 are in a discrimination relationship (S807). For example, when data item name A is “request number” (要求番号) and data item name B is “request year/month/day” (要求年月日), concept 3 is “number” and concept 4 is “year/month/day”, and these concepts are determined to be in a “discrimination relationship”. When concept 2 is not “year/month/day” in S801, processing likewise proceeds to S807 and a discrimination relationship is set.

Here, a “discrimination relationship” is a relationship between terms that cannot be synonyms. In particular, it denotes a relationship between terms that can appear as different items when used as database item names, or a conceptual relationship whose terms can be treated as items with different meanings when they form compound words.

The order relationships and discrimination relationships between concepts extracted by the above procedure are stored in the concept name 10541, concept name 10542, concept relationship 10543, position in data item name 10544, and example 10545 of the ontology 1061 (S808).
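As a rough sketch of S801 to S808, the record stored for each pair of concepts might be modeled as below. The field names mirror the items listed above (two concept names, the concept relationship, the position in the data item name, an example, and the confirmation flag), but the class `ConceptRelation`, the function `classify`, the relation labels, and the string used for the position are all hypothetical and are not taken from the patent's figures.

```python
from dataclasses import dataclass

# Trailing strings treated as dates: year/month/day, date, year/month.
DATE_SUFFIXES = ("年月日", "日付", "年月")

@dataclass
class ConceptRelation:
    concept_a: str
    concept_b: str
    relation: str        # "order" (concept_a precedes concept_b) or "discrimination"
    position: str        # position of the concepts in the data item name
    example: str         # item-name pair from which the relation was extracted
    confirmed: int = 0   # set to 1 once an operator has confirmed the relation

def classify(concept2, concept3, concept4, a_first, position="", example=""):
    """a_first: True if A always precedes B, False if B always precedes A, else None."""
    is_date = concept2 in DATE_SUFFIXES                                         # S801
    if is_date and a_first is True:                                             # S803 -> S804
        return ConceptRelation(concept3, concept4, "order", position, example)
    if is_date and a_first is False:                                            # S805 -> S806
        return ConceptRelation(concept4, concept3, "order", position, example)
    return ConceptRelation(concept3, concept4, "discrimination", position, example)  # S807

# 見積 = estimate, 出荷 = shipment
print(classify("年月日", "見積", "出荷", True, "head", "見積年月日/出荷年月日"))
```

A list of such records would stand in for the ontology store (S808); persisting them is outside the scope of this sketch.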

(Data item mapping definition support)
The data item mapping definition support means 1087 supports the association of data item names between the source data schema and the target data schema by using the discrimination relationships and order relationships between concepts stored in the ontology 1061. The results of the association are stored in the data schema mapping definition file 1071 shown in FIG. 12.

In FIG. 12, the data schema mapping definition file 1071 consists of pairs of a source pointer 107111, which stores a pointer to the position of a data item name in the source data schema data item information storage unit 1051, and a target pointer 107112, which stores a pointer to the position of a data item name in the target data schema data item information storage unit 1052.

The processing of the data item mapping definition support means 1087, which builds the data schema mapping definition file 1071, is described below with reference to the flowchart of FIG. 9.

One conceivable way to associate data item names between two data schemas, the source data schema and the target data schema, is to extract as candidates the data items that contain a common character string, such as “order date” (注文日付) and “order year/month/day” (注文年月日).

Accordingly, the data item mapping definition support means 1087 takes data items one at a time from the source data schema data item information storage unit 1051 and the target data schema data item information storage unit 1052 and compares the character strings of the data item names 10511 and 10521 (S901 to S904). Taking the data item name 10511 of the source data schema as data item name A and the data item name 10521 of the target data schema as data item name B, the element concepts that make up data item names A and B are cut out as follows (S905).
(1) Concept 1 = the common character string at the beginning of item name A and item name B
(2) Concept 2 = the common character string at the end of item name A and item name B
(3) Concept 3 = item name A with the above common character strings removed
(4) Concept 4 = item name B with the above common character strings removed
A feature of this system is that it narrows down the association candidates by using information on the discrimination relationships and temporal order relationships of the concepts that make up data item names A and B. Specifically, when data item name A and data item name B share a common character string, the ontology 1061 is consulted and, on the condition that concept 3 and concept 4 (that is, the parts in which item name A and item name B differ) are not in a “discrimination relationship” (S906), concept 3 and concept 4 are treated as synonym candidates (S907). The synonym relationship between the concepts is then stored in the concept name 10611, concept name 10612, concept relationship 10613, position in data item name 10614, and example 10615 of the ontology 1061 (S908). The processing of S908 corresponds to the synonym relationship extraction means 1084 shown in FIG. 1.
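The cutting out of concepts 1 to 4 (S905) and the synonym-candidate check (S906, S907) can be sketched as follows. This is an illustrative sketch rather than the patented implementation: it compares the names character by character, takes the common suffix only from what remains after the common prefix has been removed (so the two cannot overlap), and represents the discrimination relationships from the ontology as a set of `frozenset` pairs. The function names are hypothetical.

```python
import os

def split_concepts(name_a, name_b):
    """Return (concept1, concept2, concept3, concept4) for two item names."""
    prefix = os.path.commonprefix([name_a, name_b])            # concept 1
    rest_a, rest_b = name_a[len(prefix):], name_b[len(prefix):]
    # Common suffix: reverse both remainders, take the common prefix, reverse back.
    suffix = os.path.commonprefix([rest_a[::-1], rest_b[::-1]])[::-1]  # concept 2
    concept3 = rest_a[:len(rest_a) - len(suffix)]
    concept4 = rest_b[:len(rest_b) - len(suffix)]
    return prefix, suffix, concept3, concept4

def synonym_candidate(name_a, name_b, discriminated):
    """Return (concept3, concept4) as a synonym candidate, or None."""
    prefix, suffix, concept3, concept4 = split_concepts(name_a, name_b)
    if not (prefix or suffix):        # no common character string at all
        return None
    if not concept3 or not concept4:  # nothing left to compare on one side
        return None
    if frozenset((concept3, concept4)) in discriminated:   # S906
        return None
    return concept3, concept4                               # S907

# 番号 = number, 年月日 = year/month/day, 注文 = order, 要求 = request, 日付 = date
discriminated = {frozenset(("番号", "年月日"))}
print(synonym_candidate("注文日付", "注文年月日", discriminated))
print(synonym_candidate("要求番号", "要求年月日", discriminated))
```

The first call yields the pair of differing parts as a synonym candidate; the second yields `None` because the differing parts are already recorded as being in a discrimination relationship.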

Next, the data item mapping definition support means 1087 adds data item name B to the mapping destination candidates for data item name A in the data schema mapping definition file 1071 (S909). The above processing is performed for all combinations of data item names (S910 to S913).

Similarly, the data item mapping definition support means 1087 narrows down the mapping destination candidates by using information on the temporal order relationship between concept 3 and concept 4 (that is, the parts in which item name A and item name B differ). For example, suppose that “estimate number”, “shipment number”, and “arrival number” are mapping destination candidates for “delivery number”, and that the following information on order relationships has been obtained from the ontology 1061, as shown in FIG. 5.
(1) The order relationship that “shipment” precedes “delivery”
(2) The order relationship that “estimate” precedes “shipment”
In this case, “estimate” and “shipment” can be removed from the synonym candidates for “delivery”. As a result, “estimate number” and “shipment number” can be removed from the mapping destination candidates for “delivery number”.

In this way, the data item mapping definition support means 1087 of this system narrows down the mapping destination candidates for the item names of the two databases (data schemas) by the above procedure and then displays the candidates on the screen of the display device 103. FIG. 10 shows a display example of the data item mapping definition support screen.

In FIG. 10, the data item mapping definition support screen 1001 consists of a target schema display unit 10011 that displays the data item names of the target data schema to be associated; a source schema display unit 10012 that displays the data item name of the source data schema that serves as the reference; a synonym concept candidate display unit 10013 that displays candidates for the components of data item names that are synonymous across the two data schemas; a discrimination concept candidate display unit 10014 that displays candidates for the components of data item names that are in a discrimination relationship across the two data schemas; and a registration button 10015 that accepts operator actions. The synonym concept candidate display unit 10013 and the discrimination concept candidate display unit 10014 display the synonym concept candidates and discrimination concept candidates for the concepts contained in the data item name.

For a data item name displayed in the source schema display unit 10012, the target schema display unit 10011 displays the candidates to be associated (mapped) with it as mapping destination candidates. When the user selects an appropriate candidate from among them and selects (clicks) the registration button 10015 with the input device 102, the selected mapping destination candidate is stored in the data schema mapping definition file 1071.

For example, in FIG. 10, when the source data schema data item name “order date” (注文日付) is displayed in the source schema display unit 10012 in the display area on the right side of the screen, the target schema display unit 10011 in the display area on the left side displays “order year/month/day” (注文年月日) as a candidate to be associated with “order date”. To approve the result narrowed down by the data item mapping definition support means 1087, the operator only has to operate the registration button 10015. The synonym concept candidate display unit 10013 at the lower left of the screen 1001 displays synonyms for the data item name “order year/month/day” displayed in the target schema display unit 10011. The discrimination concept candidate display unit 10014 at the lower right of the screen 1001 displays terms that are in a discrimination relationship and should be distinguished from the vocabulary of the mapping destination candidates of the source data schema and the target data schema.

Similarly, when the user selects appropriate candidates from the synonym concept candidates and discrimination concept candidates and clicks the registration button 10015, the synonym concept and discrimination concept information in the ontology 1061 is updated according to the selection, and the confirmation flag 10546 shown in FIG. 5 is set to the value “1”, which indicates that the relationship has been confirmed.

Although the above describes an example in which the association (mapping) of the data item names of the two data schemas is approved by an operator's action, the mapping destination candidates narrowed down by the data item mapping definition support means 1087 may instead be registered in the data schema mapping definition file 1071 automatically.

(Ontology construction support means)
Finally, the ontology construction support means 1086 is described. In this embodiment, the ontology 1061 is built up partially in the course of the processing of the data item mapping definition support means 1087 described above, while the ontology construction support means 1086 provides a means of confirming the discrimination concepts and synonym concepts for each concept. FIG. 11 shows an example of the ontology construction support screen. The ontology construction support screen 901 consists of a concept display unit 9011 that displays the term corresponding to the selected concept; a concept relationship display unit 9012 that displays the concepts in a discrimination relationship and the concepts in a synonym relationship with the term displayed in the concept display unit 9011; a concept relationship candidate display unit 9013 that displays the terms in a discrimination relationship and the terms in a synonym relationship with the concepts displayed in the concept relationship display unit 9012; and a registration button 9014 that accepts operator actions. The concept relationship display unit 9012 displays synonym concepts, discrimination concepts, and temporal order relationships. The concept relationship candidate display unit 9013 displays the synonym concept candidates, discrimination concept candidates, and order relationship candidates stored in the concept relationship storage unit 1054. When the user selects appropriate candidates from the concept relationship candidates and clicks the registration button 1094, the selected synonym concept candidates, discrimination concept candidates, and order relationship candidates are stored in the ontology 1061, and the confirmation flag 10616 is set to the value “1”, which indicates that they have been confirmed.

The ontology 1061 may also be built by automatically selecting and registering high-priority discrimination concept candidates and synonym concept candidates.

(Data schema mapping definition file)
The data schema mapping definition file 1071 created by the above data schema mapping definition support system makes it possible to integrate the database defined by the target data schema with the database defined by the source data schema.

For example, when a search by “order year/month/day” (注文年月日) is performed on the database defined by the target data schema, the database system (not shown) consults the data schema mapping definition file 1071 to find the source data schema data item name that corresponds to the target data schema data item name “order year/month/day”. It obtains the pointer to the source data schema data item name from the data schema mapping definition file 1071 shown in FIG. 12 and obtains “order date” (注文日付) from the source data schema data item information storage unit 1051. By performing the same search with “order date” on the database defined by the source data schema, the database system can operate the two databases as if they were virtually integrated.

When the database defined by the source data schema is to be integrated into the database defined by the target data schema, data only needs to be transferred between the data item names associated in the data schema mapping definition file 1071. For example, as shown in FIG. 10, the data (instances) of the data item name “order date” in the database defined by the source data schema are written as the data (instances) of the data item name “order year/month/day” in the database defined by the target data schema.
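Both uses of the mapping definition file, translating a search item for virtual integration and transferring data for physical integration, can be sketched as follows. The sketch assumes that the "pointers" are simply list indices into two item-name tables that stand in for storage units 1051 and 1052. The table contents, the sample record, and the function names are hypothetical.

```python
# Item-name tables standing in for storage units 1051 (source) and 1052 (target).
source_items = ["注文番号", "注文日付"]     # order number, order date
target_items = ["注文年月日", "注文番号"]   # order year/month/day, order number

# Mapping definition: (source pointer, target pointer) pairs, here list indices.
mapping = [(1, 0), (0, 1)]

def source_name_for(target_name):
    """Translate a target item name into the mapped source item name."""
    target_ptr = target_items.index(target_name)
    for s, t in mapping:
        if t == target_ptr:
            return source_items[s]
    return None

def transfer(source_record):
    """Rewrite a source record so that it uses the target schema's item names."""
    return {target_items[t]: source_record[source_items[s]]
            for s, t in mapping if source_items[s] in source_record}

print(source_name_for("注文年月日"))
print(transfer({"注文番号": "A-001", "注文日付": "2005-12-27"}))
```

A query against the target schema can be re-issued against the source database with the translated name, while `transfer` corresponds to copying instances when the source database is merged into the target.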

(Summary)
As described above, in the present invention, discrimination relationships between concepts are described in an ontology in order to narrow down the association candidates for data item names among a plurality of data schemas. In addition, to automate ontology construction, discrimination relationships between concepts are extracted from the relationships between data item names in a data schema. Specifically, the property that “data items with the same parent are given distinctive names so that they can be told apart as different items” is used to extract discrimination relationships between concepts from the terms of data items that have the same parent. For example, when the data item names “request number” and “request date” are siblings, the knowledge that “‘request number’ and ‘request date’ are distinctive concepts (concepts to be distinguished)” is extracted. Furthermore, the knowledge that “‘number’ and ‘date’, which express the difference between ‘request number’ and ‘request date’, are distinctive concepts (concepts to be distinguished)” is extracted.
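The sibling-based extraction summarized above can be sketched as follows. The sketch assumes that the schema is available as a dictionary from each parent element to the names of its child data items. The parent name `要求` (request), the helper `differing_parts`, and the use of `frozenset` pairs to represent discrimination relationships are all hypothetical choices made for illustration.

```python
import os
from itertools import combinations

def differing_parts(name_a, name_b):
    """Return the parts left after removing the common prefix and suffix, plus a
    flag that tells whether any common character string existed at all."""
    prefix = os.path.commonprefix([name_a, name_b])
    rest_a, rest_b = name_a[len(prefix):], name_b[len(prefix):]
    cut = len(os.path.commonprefix([rest_a[::-1], rest_b[::-1]]))
    shared = bool(prefix) or cut > 0
    return rest_a[:len(rest_a) - cut], rest_b[:len(rest_b) - cut], shared

def sibling_discriminations(children_of):
    """Collect discrimination pairs from data items that share a parent."""
    found = set()
    for siblings in children_of.values():
        for a, b in combinations(siblings, 2):
            found.add(frozenset((a, b)))              # the item names themselves
            diff_a, diff_b, shared = differing_parts(a, b)
            if shared and diff_a and diff_b:
                found.add(frozenset((diff_a, diff_b)))  # their differing parts
    return found

# 要求番号 = request number, 要求日付 = request date
schema = {"要求": ["要求番号", "要求日付"]}
print(sibling_discriminations(schema))
```

For the sibling pair in the example, both the item names themselves and their differing parts (“number” and “date”) are recorded as pairs to be distinguished.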

The present invention also describes temporal order relationships between concepts in the ontology in order to narrow down the association candidates for data item names. To automate ontology construction, temporal order relationships are extracted from instance documents. For example, because the “shipment date” is always earlier than the “arrival date” in the instance documents, the temporal order relationship “‘shipment’ is earlier than ‘arrival’” is extracted.

As described above, according to the present invention, data item names among a plurality of data schemas are associated by using an ontology that describes the discrimination relationships and temporal order relationships between concepts, so that the association candidates for data item names can be narrowed down accurately and easily. In particular, the association of data item names, which relied on manual work in the conventional example, can be automated. This makes it possible to respond quickly to changes such as the integration of different databases (virtual or physical integration), the introduction or updating of a database, and database version upgrades made to comply with industry standards.

＜Second Embodiment＞
(Information retrieval system using discrimination relationships)
The first embodiment described a method of extracting the conceptual relationship called a “discrimination relationship” from data schemas and using it to associate data item names between data schemas. Because the “discrimination relationship” is effective for narrowing down the correspondences between the data item names of two data schemas, it can also be applied to converting search conditions in order to retrieve appropriate information from a plurality of information servers. In information retrieval, a technique called query expansion is known as a means of reducing search omissions.

Query expansion means automatically adding, to a search query, words that are related to the words in the query. For example, when the search query is “automobile” (自動車), words such as “car” (車), “passenger car” (乗用車), and “private car” (自家用車) are added. The words added by query expansion include spelling variants (コンピュータ → コンピューター, two spellings of “computer”), synonyms (コンピュータ “computer” → 計算機 “computing machine”), hypernyms (パソコン “personal computer” → コンピュータ “computer”), and hyponyms (コンピュータ “computer” → パソコン “personal computer”).
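Query expansion of this kind can be sketched with a small lookup table. The table `RELATED` and the function `expand_query` are hypothetical. A real system would draw the related words from a dictionary or thesaurus prepared in advance.

```python
# Related words per query term: 自動車 (automobile) ->
# 車 (car), 乗用車 (passenger car), 自家用車 (private car).
RELATED = {
    "自動車": ["車", "乗用車", "自家用車"],
}

def expand_query(terms, related=RELATED):
    """Return the query terms followed by their related words, without duplicates."""
    expanded = []
    for term in terms:
        for word in [term, *related.get(term, [])]:
            if word not in expanded:
                expanded.append(word)
    return expanded

print(expand_query(["自動車"]))
```

Terms without an entry in the table are passed through unchanged, so the expansion never removes anything from the original query.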

A common way to find the words to add is to prepare a dictionary or thesaurus in advance. For example, Patent Document 2 discloses a method of converting a query into approximate search conditions by using an ontology. Expanding the search conditions on the basis of a “similarity” measure reduces search omissions but tends to introduce noise. In the second embodiment, knowledge of “discrimination relationships” is therefore used to narrow down the candidates, which makes it possible to convert the query into more appropriate search conditions. For example, for the search query “private car” (自家用車), hypernyms such as “passenger car” and “car” are added, and in addition “commercial vehicle” (商用車) is excluded by using the knowledge that “private use” (自家用) and “commercial use” (商用) are in a discrimination relationship, which reduces noise.

The following describes, with reference to the flowchart of FIG. 13, the search procedure of an information retrieval system that uses the hierarchical relationships and discrimination relationships described in an ontology. The flowchart of FIG. 13 is executed on a computer similar to that of FIG. 1. This retrieval system refers to the product classification systems of a plurality of companies and retrieves information on the product that the user is looking for. The system is assumed to hold a dictionary that records the conceptual relationships between words and a dictionary that records the discrimination relationships between words according to the present invention. First, when the user enters “welfare vehicle” (福祉車両) as the name of the product category to search for (S1301), the system displays on the screen, as reference information, an example of a product and parts classification related to “welfare vehicle”, for example the product classification system of Company A shown in FIG. 14(a) (S1302). The display format is not limited to that of FIG. 14; it is sufficient that the words related to the input word are shown explicitly. When the user selects “lift seat vehicle” (昇降シート車) as the product classification name, the system recognizes “lift seat vehicle” as a search keyword (S1303). It then adds the superordinate concepts of “lift seat vehicle”, namely “caregiver-type welfare vehicle” (介護式福祉車両) and “welfare vehicle”, to the search keywords (S1304).

Next, the system refers to information on discrimination relationships such as that shown in FIG. 14(b) and sets the discrimination concepts of the constituent words of the search keywords as exclusion keywords (S1305). For example, it sets as exclusion keywords “wheelchair transport vehicle” (車いす移動車) and “stretcher transport vehicle” (ストレッチャー移動車), which are the discrimination concepts of “lift seat vehicle”; “rotating” (回転), which is the discrimination concept of “lift” (昇降); and “self-operated” (自操式) and “public transport” (公共交通), which are the discrimination concepts of “caregiver-type” (介護式). After expanding the search keywords and setting the exclusion keywords in this way, the system searches the product classification systems of the plurality of companies for information on products that may correspond to “lift seat vehicle” (S1306). For example, it searches for items that “are classified under ‘lift seat vehicle’, ‘caregiver-type welfare vehicle’, or ‘welfare vehicle’ and contain none of ‘wheelchair transport vehicle’, ‘stretcher transport vehicle’, ‘rotating’, ‘self-operated’, and ‘public transport’”. With this method, as shown in FIGS. 15(a) and 15(b), even though Company B has no product classification named “lift seat vehicle”, the system excludes “self-operated specification vehicle” (自操式仕様車), “rotating seat specification vehicle” (回転シート仕様車), and “wheelchair transport vehicle” from among the “welfare vehicles” and displays only “lift-up seat specification vehicle” (リフトアップシート仕様車) as the search result (S1307).
Without the discrimination relationships of the present invention, the search keywords could still be expanded to the superordinate concept “welfare vehicle”, but “self-operated specification vehicle”, “rotating seat specification vehicle”, and “wheelchair transport vehicle” could not be excluded from the candidates.
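Steps S1303 to S1307 can be sketched as follows. The sketch reconstructs the hierarchy and discrimination data only from the terms quoted in the text (the actual contents of FIGS. 14 and 15 are not reproduced here), treats a concept as a "constituent word" of a keyword when it occurs as a substring, and matches exclusion keywords against product names by substring. The function names and the flat `(product name, category)` representation of Company B's products are hypothetical.

```python
# Company A hierarchy (child -> parent): 昇降シート車 (lift seat vehicle) ->
# 介護式福祉車両 (caregiver-type welfare vehicle) -> 福祉車両 (welfare vehicle).
parent_of = {"昇降シート車": "介護式福祉車両", "介護式福祉車両": "福祉車両"}

# Discrimination relationships: concept -> concepts it must be distinguished from.
discrimination = {
    "昇降シート車": ["車いす移動車", "ストレッチャー移動車"],
    "昇降": ["回転"],
    "介護式": ["自操式", "公共交通"],
}

def build_query(selected):
    """Expand the selected term with its ancestors and collect exclusion keywords."""
    keywords = [selected]                                  # S1303
    while keywords[-1] in parent_of:                       # S1304
        keywords.append(parent_of[keywords[-1]])
    excluded = set()
    for concept, others in discrimination.items():         # S1305
        if any(concept in keyword for keyword in keywords):
            excluded.update(others)
    return keywords, excluded

def search(products, keywords, excluded):
    """Keep products in a matching category whose name has no exclusion keyword."""
    return [name for name, category in products            # S1306
            if category in keywords and not any(x in name for x in excluded)]

# Company B products as (product name, category), all under 福祉車両.
company_b = [
    ("自操式仕様車", "福祉車両"),
    ("回転シート仕様車", "福祉車両"),
    ("車いす移動車", "福祉車両"),
    ("リフトアップシート仕様車", "福祉車両"),
]
keywords, excluded = build_query("昇降シート車")
print(search(company_b, keywords, excluded))               # S1307
```

With an empty exclusion set the same search would return all four Company B products, which is the noise that the discrimination relationships are meant to remove.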

In this way, according to the present invention, using knowledge of discrimination relationships to narrow down the search conditions makes it possible to convert them into more appropriate search conditions and to obtain search results with little noise while preventing search omissions.

The present invention can be applied to the integration of different databases, and in particular to the virtual integration of databases for realizing information linkage and service linkage within a company or between companies. The present invention can also be applied to information retrieval systems and the like that determine the concepts of terms.

FIG. 1 is a system configuration diagram of a data schema mapping definition support system according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing examples of schema documents, in which (a) shows a source data schema definition document, (b) shows a source instance document, (c) shows a target data schema definition document, and (d) shows a description example of a target instance document.
FIG. 3 is an explanatory diagram showing the data structures of the source data schema data item information storage unit and the target data schema data item information storage unit, in which (a) shows the source data schema data item information storage unit and (b) shows the target data schema data item information storage unit.
FIG. 4 is an explanatory diagram showing the data structure of the concept information storage unit.
FIG. 5 is an explanatory diagram showing the data structure of the ontology.
FIG. 6 is a flowchart showing an example of data item information capture processing.
FIG. 7 is a flowchart showing an example of concept extraction processing.
FIG. 8 is a flowchart showing an example of concept relationship extraction processing.
FIG. 9 is a flowchart showing an example of data item mapping definition support processing.
FIG. 10 is a screen image showing the data item mapping definition support screen.
FIG. 11 is a screen image showing the ontology construction support screen.
FIG. 12 is an explanatory diagram showing the data structure of the data schema mapping definition storage unit.
FIG. 13 is a flowchart of processing showing an example of the information retrieval system.
FIG. 14 is an explanatory diagram showing an example of hierarchical relationships and discrimination relationships between words, in which (a) shows the hierarchical relationships of words at Company A and (b) shows the discrimination relationships between words at Company A.
FIG. 15 is an explanatory diagram showing another example of hierarchical relationships and discrimination relationships between words, in which (a) shows the hierarchical relationships of words at Company B and (b) shows the discrimination relationships between words at Company B.

Explanation of symbols

101 CPU
102 Input device
103 Display device
104 Input data storage unit
105 Ontology construction data storage unit
106 Ontology storage unit
107 Data schema mapping definition storage unit
108 Memory
1041 Source data schema definition document storage unit
1042 Source instance document storage unit
1043 Target data schema definition document storage unit
1044 Target instance document storage unit
1051 Source data schema data item information storage unit
1052 Target data schema data item information storage unit
1053 Concept information storage unit
1061 Ontology
1071 Data schema mapping definition file
1081 Data item information capture means
1082 Concept extraction means
1083 Discrimination relationship extraction means
1084 Order relationship extraction means
1085 Synonym relationship extraction means
1086 Ontology construction support means
1087 Data item mapping definition support means

Claims

A program that refers to a first data schema and a second data schema, each describing a data structure, and associates the data item names constituting the first data schema with the data item names constituting the second data schema, the program causing a computer to execute:
A procedure for extracting the data item names constituting the first data schema;
A procedure for extracting the data item names constituting the second data schema;
A procedure for extracting the element concepts included in each of the extracted data item names of the first data schema and the second data schema;
A procedure for setting a discrimination relationship between data item names from the extracted element concepts;
A procedure for reading one data item name from the data item names of the first data schema and one from the data item names of the second data schema, and setting a correspondence between the two read data item names based on the discrimination relationship; and
A procedure for storing the correspondence between the data item names.

The program according to claim 1, wherein the procedure for extracting the element concepts comprises:
A procedure for reading two data item names for each of the first data schema and the second data schema;
A procedure for comparing the character strings of the two data item names and, when they include synonymous character strings, extracting the set of character strings common to the two data item names as a first element concept; and
A procedure for comparing the character strings of the two data item names and, when they include synonymous character strings, extracting the set of character strings that remain after the common character strings are excluded from the two data item names as a second element concept.
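The element concept extraction recited above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: item names are assumed to be whitespace-separated English terms (a Japanese item name would first need word segmentation), and the function name `extract_element_concepts` and the `synonyms` table are hypothetical.

```python
def extract_element_concepts(name_a, name_b, synonyms=None):
    """Split two item names into a shared part and the leftover parts.

    Returns (first_concept, (rest_a, rest_b)), or None when the two names
    share no term. `synonyms` is a hypothetical table that maps a term to
    its canonical form so that synonymous strings compare equal.
    """
    synonyms = synonyms or {}
    tokens_a = [synonyms.get(t, t) for t in name_a.lower().split()]
    tokens_b = [synonyms.get(t, t) for t in name_b.lower().split()]

    # First element concept: the terms common to both names.
    common = tuple(t for t in tokens_a if t in tokens_b)
    if not common:
        return None

    # Second element concept: what remains of each name without the common terms.
    rest_a = tuple(t for t in tokens_a if t not in common)
    rest_b = tuple(t for t in tokens_b if t not in common)
    return common, (rest_a, rest_b)


result = extract_element_concepts("order date", "delivery date")
print(result)
```

For the pair "order date" and "delivery date", the shared term "date" plays the role of the first element concept and the pair ("order", "delivery") plays the role of the second.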

The program according to claim 2, wherein the procedure for setting the discrimination relationship comprises:
A procedure for determining a time-series order relationship for the two item names corresponding to the second element concept; and
A procedure for setting, when the second element concept does not include a time-series order relationship, the set of character strings constituting the second element concept as being in a discrimination relationship.
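The decision between an order relationship and a discrimination relationship might be sketched as follows. How a time-series order is detected is not specified in this claim, so the sketch assumes a hypothetical vocabulary `TIME_SERIES_TERMS` of terms known to form a sequence; the function name `classify_relation` is likewise an assumption.

```python
# Hypothetical vocabulary of terms known to form a time series.
TIME_SERIES_TERMS = ("previous", "current", "next")


def classify_relation(rest_a, rest_b, ordered_terms=TIME_SERIES_TERMS):
    """Classify the leftover terms of two item names (the second element concept).

    Returns "order" when both leftovers are single terms from the time-series
    vocabulary, "discrimination" otherwise, and None when the leftovers are
    identical and therefore distinguish nothing.
    """
    if rest_a == rest_b:
        return None
    is_ordered = (
        len(rest_a) == 1
        and len(rest_b) == 1
        and rest_a[0] in ordered_terms
        and rest_b[0] in ordered_terms
    )
    return "order" if is_ordered else "discrimination"


print(classify_relation(("previous",), ("current",)))
print(classify_relation(("order",), ("delivery",)))
```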

The program according to claim 1, wherein the procedure for setting the correspondence compares the character strings of the two read data item names and sets the two data item names as corresponding when they contain synonymous character strings and are not in a discrimination relationship.
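The correspondence test recited above can be sketched in Python as follows. This is an illustration under assumptions: item names are whitespace-separated terms, discrimination relationships are held as a set of unordered term pairs, and the names `corresponds`, `discriminated`, and `synonyms` are hypothetical.

```python
def corresponds(source_name, target_name, discriminated, synonyms=None):
    """Return True when two item names may be mapped to each other.

    `discriminated` is a set of frozenset({term_a, term_b}) pairs recorded as
    discrimination relationships; `synonyms` maps a term to its canonical form.
    """
    synonyms = synonyms or {}
    src = {synonyms.get(t, t) for t in source_name.lower().split()}
    tgt = {synonyms.get(t, t) for t in target_name.lower().split()}

    # The names must share at least one synonymous term.
    if not (src & tgt):
        return False

    # No pair of differing terms may be in a discrimination relationship.
    for a in src - tgt:
        for b in tgt - src:
            if frozenset((a, b)) in discriminated:
                return False
    return True


discriminated = {frozenset(("order", "delivery"))}
print(corresponds("order date", "delivery date", discriminated))
print(corresponds("order date", "date of order", discriminated))
```

In this sketch "order date" and "delivery date" are kept apart because "order" and "delivery" are recorded as discriminated, even though both names contain "date".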

A program that refers to a first data schema and a second data schema, each describing a data structure, and generates a discrimination relationship between the data item names constituting the first data schema and the data item names constituting the second data schema, the program comprising:
A procedure for extracting the data item names constituting the first data schema;
A procedure for extracting the data item names constituting the second data schema;
A procedure for reading two data item names for each of the first data schema and the second data schema;
A procedure for comparing the character strings of the two data item names and, when they include synonymous character strings, extracting the set of character strings that remain after the common character strings are excluded from the two data item names as a second element concept;
A procedure for determining a time-series order relationship for the two item names corresponding to the second element concept; and
A procedure for setting, when the second element concept does not include a time-series order relationship, the set of character strings constituting the second element concept as being in a discrimination relationship.

The program according to claim 5, wherein the procedure for setting the discrimination relationship further comprises a procedure for storing, in an ontology, the set of character strings and a value indicating that the conceptual relationship of the set of character strings is a discrimination relationship.

The program according to claim 6, further comprising:
A procedure for comparing the character strings of the two data item names and, when they include synonymous character strings, extracting the set of character strings common to the two data item names as a first element concept; and
A procedure for storing, in the ontology, that set of character strings and a value indicating that the conceptual relationship of the set of character strings is a synonym relationship.

A computer system comprising:
A first data schema storage unit that stores an input first data schema;
A second data schema storage unit that stores an input second data schema;
A data item name capturing unit that extracts the data item names constituting the first data schema from the first data schema storage unit and stores them in a first data item storage unit, and extracts the data item names constituting the second data schema from the second data schema storage unit and stores them in a second data item storage unit;
A concept extraction unit that extracts the element concepts included in each data item name in the first data item storage unit and the second data item storage unit;
A discrimination relationship extraction unit that extracts a discrimination relationship between data item names from the extracted element concepts and stores it in an ontology;
A synonym relationship extraction unit that extracts a synonym relationship between data item names from the extracted element concepts and stores it in the ontology; and
A mapping definition unit that reads a data item name from each of the first data item storage unit and the second data item storage unit, determines the correspondence between the two read data item names based on the ontology, and stores the correspondence in a mapping file.

The computer system according to claim 8, wherein the concept extraction unit comprises:
A comparison unit that reads two data item names from the first data item storage unit or the second data item storage unit and compares the character strings of the two data item names; and
An element concept extraction unit that, when the two data item names include synonymous character strings, extracts the set of character strings common to the two data item names as a first element concept, and extracts the set of character strings that remain after the common character strings are excluded from the two data item names as a second element concept.

The computer system according to claim 9, wherein the discrimination relationship extraction unit comprises:
An order relation extraction unit that extracts a time-series order relationship for the two item names corresponding to the second element concept; and
A discrimination relationship determination unit that determines, when the second element concept does not include a time-series order relationship, that the set of character strings constituting the second element concept is in a discrimination relationship.

A search program that performs a search for an input word, the search program causing a computer to execute:
A procedure for accepting input of a word as a search key;
A procedure for adding, as a search key, a word that is a superordinate concept of the input word, based on a dictionary that records conceptual relationships between words;
A procedure for setting, as an exclusion key, each word that is in a discrimination relationship with a word of the search key or with a component of that word, based on a dictionary that records discrimination relationships between words;
A procedure for performing a search using the search key; and
A procedure for outputting the search result after excluding from it those results that include the exclusion key.
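The search procedure recited above can be illustrated with a small Python sketch. It rests on several assumptions that are not part of the claim: documents are plain strings matched by substring, the two dictionaries are ordinary Python dicts named `broader` and `discriminated`, and the function names are hypothetical.

```python
def build_keys(word, broader, discriminated):
    """Derive search keys and exclusion keys for an input word.

    `broader` maps a word to its superordinate concept; `discriminated` maps
    a term to the set of terms it is in a discrimination relationship with.
    """
    search_keys = {word}
    if word in broader:
        search_keys.add(broader[word])

    exclusion_keys = set()
    for key in search_keys:
        # Consider the key itself and each of its component terms.
        for part in {key, *key.split()}:
            exclusion_keys |= discriminated.get(part, set())
    return search_keys, exclusion_keys - search_keys


def search(word, documents, broader, discriminated):
    """Return documents matching a search key and containing no exclusion key."""
    search_keys, exclusion_keys = build_keys(word, broader, discriminated)
    hits = [d for d in documents if any(k in d for k in search_keys)]
    return [d for d in hits if not any(x in d for x in exclusion_keys)]


documents = [
    "order date 2005-12-27",
    "delivery date 2006-01-10",
    "invoice date 2006-02-01",
]
broader = {"order date": "date"}
discriminated = {"order": {"delivery"}}
print(search("order date", documents, broader, discriminated))
```

Adding the superordinate term "date" widens the search to all three documents, and the exclusion key "delivery" then removes the one that the discrimination dictionary marks as a different concept.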