JP2008511936A - Method and system for semantic identification in a data system

Info

Publication number: JP2008511936A
Application number: JP2007530351A
Authority: JP
Inventors: アンダーソン、ラッセル、ジョージ; ブージアヌ、ムハミド; マストロ、ビンセント、エー; ウェーバー、ロバート、シー、サード
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2004-08-31
Filing date: 2005-08-31
Publication date: 2008-04-17
Also published as: WO2006026702A2; EP1815349A2; CN101044472A; EP1815349A4; WO2006026702A3

Abstract

Problem to be solved: To provide data integration system tools that enable the use, reuse, and modification of functionality in a changing business environment.
Solution: Methods and systems are provided that relate to semantic identifiers, which allow an item to be identified based on its relationships to other items without requiring additional data; to a conversion engine that can convert data, metadata, semantic identifiers, and other items from one format, language, and/or data model to another; and to an abstraction property level of a hub or database, which allows multiple instances or forms of an item to be distinguished.
Selected drawing: Figure 1

Description

The present invention relates to the field of information technology, and more particularly to the field of data integration systems.

The advent of computer applications has made many business processes faster and more efficient. However, the proliferation of different computer applications that use different data structures, communication protocols, languages, and platforms has made the typical enterprise IT infrastructure extremely complex. Different business processes within a typical enterprise may use computer applications that were each developed and optimized for a particular business process rather than for the enterprise as a whole. For example, an enterprise may have one computer application for tracking accounts payable and another for tracking the history of customer contacts. In fact, an enterprise may use more than one computer application even for the same business process, such as when it maintains a centralized customer contact database while employees keep their own contact information in a personal information manager.

Special-purpose computer applications offer the advantage of tailored solutions, but as they multiply they can lead to inefficiencies: the same data may have to be entered and processed repeatedly across the enterprise, or data associated with one process may be unavailable when the enterprise runs another process that would benefit from it. For example, if the accounts payable process is separated from the supply chain and ordering processes, the enterprise may accept and fill an order from a customer whose credit history would otherwise have caused the enterprise to decline the order. There are many other examples in which an enterprise would benefit from consistent access to all of the data held across its various computer applications.

Many companies have recognized and addressed the need to share data among the different applications in the enterprise. Enterprise application integration, or EAI, thus emerged as a message-based strategy for processing data from disparate sources. As computer applications grow in number and complexity, EAI efforts face many challenges, including the need to handle different protocols, the need to handle ever-increasing volumes of data and transactions, and the demand for faster integration of that data. Various approaches to EAI have been implemented, including the lowest-common-denominator approach, the atomic approach, and the bridge-type approach. However, EAI is based on communication between individual applications. A significant problem is that the complexity of an EAI solution grows geometrically as platforms and applications are added linearly.

While data integration systems provide useful tools for addressing the needs of an enterprise, such systems are typically deployed as customer-specific solutions. They can involve lengthy development cycles and may require advanced technical training to keep up with changes in business structure and information requirements. There remains a need for data integration system tools that enable the use, reuse, and modification of functionality in a changing business environment. One such tool is a semantic identifier, which allows an item to be uniquely identified based on its relationships to other items without requiring additional data. A conversion engine is a tool that can convert data, metadata, semantic identifiers, and other items from one format, language, and/or data model to another. Finally, an abstraction property level of a hub or database allows multiple instances or forms of an item to be distinguished.

A semantic identifier can exist for an item. An item can be an object, data item, data, column, row, table, database, instance, attribute, metadata, concept, topic, subject, semantic identifier, other identifier, RFID tag, vendor, supplier, customer, person, team, organization, user, network, system, device, family, store, product, production line, product characteristic, product specification, product attribute, price, cost, bill of materials, shipping data, tax data, course, educational program, location, map, department, organization, organism, process, rule, law, rating system, good, service or service offering, or any other item or concept. An item can be associated with a data integration job and/or a data integration platform. A semantic identifier can identify an item based on the relationship between that item and one or more other items. A relationship can also be the absence of a relationship. A relationship can be based on meaning. A relationship can include the position of the item in a relationship hierarchy.

A semantic identifier can be a unique identifier for an item. A unique semantic identifier for an item may take into account fewer than all of the relationships between that item and other items. It can be advantageous to base the semantic identifier on the minimum number of relationships needed to ensure uniqueness. The number of relationships required to create a unique semantic identifier for an item can vary with context. A semantic identifier can be context-dependent. A semantic identifier can be dynamic.

A semantic identifier can be stored, maintained, recorded, processed, and/or interpreted in the form of a syntax that can itself be stored, maintained, recorded, processed, and/or interpreted in a string structure or format. The syntax and/or the string structure or format can be parsed. It can also be truncated, modified, shortened, or rearranged, and it can still retain the semantic identifier after being truncated, modified, shortened, or rearranged. In certain contexts a shorter syntax and/or string is useful and can improve performance.

A semantic identifier can be associated with a semantic context such as: a step in an enterprise method; data in a database; data in a row or column; a row or column in a table; a row or column in a database; data in a table; a table within data; metadata in a database; an item in a hub or repository; an item in a database; an item in a table; an item in a column; an item in a row; a person in an organization; the sender or recipient of a communication; a user on a network; a system on a network; a device on a network; a person in a family; an item in a store; a dish on a menu; a product in a production line; a product in a product offering; a course or step in an educational or training program; a location on a map; the location of an item; a department of an organization; a person on a team; a rule in a rule system; a service in a service suite; an entity in an enterprise's organizational hierarchy; an entity in a supply chain; a customer in a market; a purchaser in a purchasing decision; the price of a good or service; the cost of a good or service; a component of a product or system; a step of a method; and/or a member of a group.

In an embodiment, a database can have a table with a column. The unique semantic identifier for that column can be "the column name of the table name of the database name". This unique semantic identifier is stored, maintained, recorded, processed, and/or interpreted using the syntax "column name::table name::database name". The syntax and/or any associated string can be parsed, and unnecessary elements can be removed. For example, if only one database exists, the syntax "column name::table name" still yields a unique identifier for the column; the database relationship is not needed to create a unique semantic identifier. In another example, the database may have only one table, in which case the syntax "column name::database name" can be a unique identifier for the column; the table relationship is not needed to create a unique identifier. Using a shorter syntax and/or string reduces the amount of processing and increases efficiency.
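The column example above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Column` class, the function names, and the order in which relationships are tried are all hypothetical; only the `::` separator and the three identifier forms come from the description.

```python
from dataclasses import dataclass
from typing import List

SEPARATOR = "::"  # separator taken from the "column name::table name::database name" syntax

@dataclass(frozen=True)
class Column:
    """Hypothetical item type: a column and its table/database relationships."""
    name: str
    table: str
    database: str

# Relationship subsets to try, shortest first (an assumed search order).
CANDIDATE_RELATIONSHIPS = [("table",), ("database",), ("table", "database")]

def full_identifier(col: Column) -> str:
    """Build the full form: column name::table name::database name."""
    return SEPARATOR.join([col.name, col.table, col.database])

def parse_identifier(identifier: str) -> List[str]:
    """Split an identifier string back into its parts."""
    return identifier.split(SEPARATOR)

def _key(col: Column, relationships) -> tuple:
    """Project a column onto its name plus the chosen relationships."""
    return (col.name,) + tuple(getattr(col, r) for r in relationships)

def minimal_identifier(col: Column, context: List[Column]) -> str:
    """Return the shortest identifier that is unique for `col` within `context`.

    `context` is the set of columns that must be told apart and should contain `col`.
    """
    for relationships in CANDIDATE_RELATIONSHIPS:
        key = _key(col, relationships)
        matches = [c for c in context if _key(c, relationships) == key]
        if len(matches) == 1:
            return SEPARATOR.join(key)
    return full_identifier(col)
```

With a single database holding two tables, `minimal_identifier` drops the database part; with two databases that each hold one identically named table, it drops the table part instead. The shortened string is only meaningful within the context it was computed for, which matches the description of semantic identifiers as context-dependent.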

A conversion engine can perform conversion operations on one or more semantic identifiers, a database, a database that contains semantic identifiers, an information system, or an information system that contains semantic identifiers or other items. A conversion operation can convert or otherwise modify the format, language, and/or data model of a semantic identifier. A conversion operation can include converting or mapping between one or more data tools, languages, formats, and/or data models and at least one other data tool, language, format, and/or data model. For example, a conversion operation can include converting between, or mapping to, DataStage 7, QualityStage, BusinessObject, IBM DB2 Cube Views, UML 1.1, UML 1.3, ERStudio, ProfileStage, PowerDesigner (with added support for Packages and Extended Attributes), and/or MicroStrategy. The conversion engine and/or conversion operation can optionally be implemented in a metabroker. In performing an operation, the conversion engine, the mapping of the conversion operation, or the conversion operation itself can trace data as it is converted back and forth between the original semantic context and the converted semantic context. Conversion operations can be carried out in batch, in real time, or continuously. A conversion operation can be provided or made available as a service, for example as part of a service-oriented architecture.

Once a conversion operation exists for a semantic identifier, a database, a database that contains one or more semantic identifiers, an information system, an information system that contains one or more semantic identifiers, or another item, that operation makes it possible to convert between, map to, join with, use with, or associate with any other semantic identifier, database, database containing one or more semantic identifiers, information system, information system containing one or more semantic identifiers, or other item that shares at least one conversion operation.

An item can exist in multiple forms or instances, such as in physical modeling activities and/or logical modeling activities. Within a database and/or hub, an item, including any associated data or metadata, can exist in multiple forms or instances. Any distinguishing characteristic can be used to tell the various forms or instances of an item apart, such as its abstraction level, its position in a hierarchy, its relationships to other items, one or more distinguishing attributes of the item, the context in which the item is found, or the physical location in which the item is found.

In one embodiment, an item such as a table named "Employee" can be placed in the hub. The hub collector may hold two forms or instances of "Employee" in the hub: one corresponding to a physical database instance and the other corresponding to a logical modeling activity. The abstraction property level of the hub's data collection makes it possible to distinguish between the physical-model and logical-model instances or forms.

When performing a conversion operation, which may be in response to a query, the conversion engine can grab, load, or acquire all of the items from the hub or database. The conversion engine can then filter, select, store, convert, modify, or otherwise manipulate the items based on distinguishing characteristics such as abstraction level, position in a hierarchy, relationships to other items, item attributes, and physical location. Alternatively, when performing a conversion operation, which may be in response to a query, the conversion engine can filter, select, store, convert, modify, or otherwise manipulate the items, including any data and/or metadata, at the hub or database itself, so that it grabs or acquires only the items at the relevant abstraction level or with the relevant attributes, positions, relationships, locations, and so on. The filtering, selection, storage, conversion, modification, or other manipulation can take place at run time or at design time, and can be carried out in batch, in real time, or continuously. In embodiments, it can be based on information or input obtained by the conversion engine and/or the system at development time, design time, or run time, such as a data model, a mapping of data models, or distinguishing characteristics of an identifier's syntax. That information can be updated dynamically in real time. Thus, in one preferred embodiment, the system can refine a select command for selecting data from a database based on a known mapping of the database, for example to select logical items and omit physical items, or to select physical items and omit logical items.
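The first strategy described above, loading every item and then filtering on a distinguishing characteristic, can be sketched as follows. The `HubItem` class, its field names, the level values "physical" and "logical", and the in-memory `HUB` list are assumptions made for illustration; the description does not specify how a hub stores the abstraction property.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class HubItem:
    """Hypothetical hub entry carrying an abstraction-level property."""
    name: str
    kind: str
    abstraction_level: str  # assumed values: "physical" or "logical"

# Two forms of "Employee" live side by side in the hub, as in the embodiment above.
HUB = [
    HubItem("Employee", "table", "physical"),
    HubItem("Employee", "table", "logical"),
    HubItem("Department", "table", "logical"),
]

def load_all_items(hub: Iterable[HubItem]) -> List[HubItem]:
    """Grab every item from the hub, regardless of form."""
    return list(hub)

def filter_by_level(items: Iterable[HubItem], level: str) -> List[HubItem]:
    """Keep only the items at the requested abstraction level."""
    return [item for item in items if item.abstraction_level == level]

logical_items = filter_by_level(load_all_items(HUB), "logical")
```

Filtering after loading keeps the engine independent of the hub's query language, at the cost of transferring items that are then discarded.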

In some cases, the closer to the hub or database that filtering, selection, or other manipulation takes place within the overall process, the more efficient and faster it is. The conversion engine can perform a conversion operation on the query itself, yielding a revised query or select command that can be sent directly to the hub or database. The revised query or select command can be in a format that is directly compatible with the hub or database.
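One way to picture a revised query is to wrap the original select command so that the abstraction-level filter runs inside the database. The sketch below uses Python's standard `sqlite3` module purely as a stand-in for a hub; the table name `hub_items`, its columns, and the `revise_query` helper are hypothetical, and the wrapping assumes the original query exposes an `abstraction_level` column.

```python
import sqlite3
from typing import Tuple

def revise_query(base_query: str, abstraction_level: str) -> Tuple[str, tuple]:
    """Wrap a SELECT so that filtering happens in the database, not in the engine.

    Returns the revised SQL text and its bound parameters.
    """
    revised = (
        f"SELECT * FROM ({base_query}) AS original "
        "WHERE abstraction_level = ?"
    )
    return revised, (abstraction_level,)

def build_demo_hub() -> sqlite3.Connection:
    """Create an in-memory stand-in for a hub holding two forms of 'Employee'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hub_items (name TEXT, kind TEXT, abstraction_level TEXT)"
    )
    conn.executemany(
        "INSERT INTO hub_items VALUES (?, ?, ?)",
        [
            ("Employee", "table", "physical"),
            ("Employee", "table", "logical"),
            ("Department", "table", "logical"),
        ],
    )
    return conn

conn = build_demo_hub()
sql, params = revise_query("SELECT name, abstraction_level FROM hub_items", "logical")
rows = conn.execute(sql, params).fetchall()  # only logical-level rows leave the database
```

Binding the level as a parameter rather than interpolating it into the SQL text avoids quoting problems. A production engine would more likely rewrite the query's own predicate than wrap it in a subquery.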

In another aspect, a computer program product can include a computer-usable medium containing computer-readable program code which, when executed on one or more computers, causes the one or more computers to perform any one or more of the methods described above.

As used herein, "International Business Machines" or "IBM" refers to International Business Machines Corporation of Armonk, New York.

As used herein, "data source" or "data target" is intended to have the broadest meaning consistent with these terms, unless a specific meaning is otherwise indicated or the context of the phrase requires otherwise. The terms include a database, a plurality of databases, a repository information manager, a queue, a message service, a repository, a data facility, a data storage facility, a data provider, a website, a server, a computer, a computer storage facility, a CD, a DVD, a mobile storage facility, a central storage facility, a hard disk, multiple coordinated data storage facilities, RAM, ROM, flash memory, a memory card, a temporary memory facility, a permanent memory facility, magnetic tape, a locally connected computing facility, a remotely connected computing facility, a wireless facility, a wired facility, a mobile facility, a central facility, a web browser, a client, a laptop, a personal digital assistant ("PDA"), a telephone, a cellular phone, a mobile phone, an information platform, an analysis facility, a processing facility, a business enterprise system, or any other facility that processes data or is adapted to store data or other information, as well as any file or file type for holding structured data, unstructured data, streaming data, messaged data, event-driven data, or source data used in any of the above systems, and any combination of the above. A storage facility is any physical or logical device, resource, or facility that can serve as a data source or data target, or that can otherwise store data in a retrievable form.

"Enterprise Java (registered trademark) Bean (EJB)" includes a server-side component architecture for the J2EE platform. EJB supports the rapid and simplified development of distributed, transactional, secure, and portable Java (registered trademark) applications. EJB supports a container architecture that allows messages to be processed concurrently, and it supports distributed transactions, so that database updates, message processing, and connections to enterprise systems that use the J2EE architecture can participate in the same transaction context.

"JMS" means Java (registered trademark) Message Service, an enterprise message service for the Java (registered trademark)-based J2EE enterprise architecture. "JCA" means the J2EE Connector Architecture of the J2EE platform, described in more detail below. Note that although EJB, JMS, and JCA are software tools commonly used in modern distributed transaction environments, any platform, system, or architecture that provides similar functionality can be used with the data integration systems described herein.

As used herein, "real time" includes periods of time that approximate the duration of a business transaction or business, and includes processes or services that take place during a business operation or business process, as opposed to those that take place offline, such as a batch processing operation run overnight. Depending on the duration of the business process, real time may include seconds, moments, minutes, hours, or even days.

As used herein, "business process", "business logic", and "business transaction" include any method, service, operation, process, or transaction that an enterprise can perform, including, without limitation, sales, marketing, fulfillment, inventory management, pricing, product design, professional services, financial services, administration, finance, underwriting, analysis, contracting, information technology services, data storage, data mining, information delivery, routing of goods, scheduling, communications, investments, transactions, offerings, promotions, advertising, bidding, engineering, manufacturing, supply chain management, human resources management, data processing, data integration, workflow management, software production, hardware production, new product development, research, development, strategy functions, quality control and assurance, packaging, logistics, customer relationship management, handling of rebates and returns, customer support, product maintenance, telemarketing, corporate communications, and investor relations.

As used herein, "service-oriented architecture (SOA)" includes services that form part of an enterprise's infrastructure. In an SOA, services can become the building blocks for application development and deployment, allowing rapid application development and avoiding redundant code. Each service can embody a set of business logic or business rules that can be bound to the surrounding environment, such as the source of the data inputs to the service or the target of the service's data outputs. Various examples of SOA are provided in the following description.

As used herein, "metadata" includes data that brings context to the data being processed, data about data, information about the context of related information, information about the origin of data, information about the location of data, information about the meaning of data, information about the age of data, information about the heading of data, information about the units of data, information about the fields of data, and/or any other information relating to the context of the data.

As used herein, "WSDL", or "Web Services Description Language", includes an XML format for describing network services (often web services) as a set of endpoints operating on messages that contain either document-oriented or procedure-oriented information. The operations and messages are described abstractly and then bound to a concrete network protocol and message format to define an endpoint. Related concrete endpoints are combined into abstract endpoints (services). WSDL is extensible to allow the description of endpoints and their messages regardless of which message formats or network protocols are used to communicate.

As used herein, "metabroker" can include a system or method that includes a conversion engine or other means for performing conversion operations or other operations on data or metadata. The conversion operations or other operations can include converting data or metadata from one or more formats, languages, and/or data models to one or more other formats, languages, and/or data models.

Throughout the following description, like reference numerals are intended to refer to like elements unless otherwise indicated.

The invention disclosed herein can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, and microcode.

Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium that provides program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include semiconductor or solid-state memory, magnetic tape, a removable computer diskette, random access memory (RAM), read-only memory (ROM), a magnetic hard disk, and an optical disk. Current examples of optical disks include CD-ROM, CD-R/W, and DVD.

A data processing system suitable for storing and/or executing program code includes at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memory that provides temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output (I/O) devices (including but not limited to keyboards, displays, and pointing devices) can be coupled to the system either directly or through intervening I/O controllers.

Network adapters can also be coupled to the system so that the data processing system can be coupled to other data processing systems or to remote printers or storage devices through private or public networks. Modems, cable modems, and Ethernet (registered trademark) cards are some of the currently available types of network adapters.

FIG. 1 depicts a platform 100 for facilitating the integration of various data of an enterprise. The platform includes a plurality of business processes, each of which can include a plurality of different computer applications and data sources. The platform can include a number of data sources 102, which can be data sources of the kinds described above. These data sources can include a variety of data types from a variety of physical locations. For example, the data sources can include systems from providers such as Sybase, Microsoft, Informix, Oracle, Inlomover, EMC, Trillium, FirstLogic, Siebel, PeopleSoft, IBM, Apache, or Netscape. The data sources 102 can include systems that use database products or standards such as IMS, DB2, ADABAS, VSAM, MDSeries, UDB, XML, complex flat files, or FTP files. The data sources 102 can include files created or used by applications such as Microsoft Outlook, Microsoft Word, Microsoft Excel, and Microsoft Access, as well as files in standard formats such as ASCII, CSV, GIF, TIF, and PNG. The data sources 102 can be located in various places or can be centrally located. The data supplied by the data sources 102 can take various forms and can have different formats that may or may not be compatible with one another.

Data targets are described later in this specification; in general, a data target can be any of the data sources 102 described above. The difference in terminology generally reflects whether a data system provides data or receives data in a data integration process. Note, however, that in a typical data integration system a data source can also receive data and a data target can also provide data, so unless otherwise stated the distinction is not intended to imply any difference in capability between data sources and data targets.

The platform shown in FIG. 1 also includes a data integration system 104. The data integration system can facilitate the collection of data from the data sources 102, for example as the result of a query or retrieval command that the data integration system 104 receives. The data integration system 104 can send commands to one or more of the data sources 102 so that the data sources provide data to the data integration system 104. Because the received data can be in any of a number of formats, with varied metadata, the data integration system can restructure the received data so that it can later be combined for integrated processing. The functions that can be performed by the data integration system 104 are described in more detail below.

The platform 100 also includes a search system 108. The search system 108 can include a database or processing platform used to further manipulate the data sent from the data integration system 104. For example, the data integration system 104 can cleanse, combine, convert, or otherwise manipulate the data it receives from the data sources 102 so that the search system 108 can use the processed data to produce reports 110 that are useful to a business. The reports 110 can be used to report data associations, answer complex queries, answer simple queries, or form other reports useful to the business or user. The reports 110 can include raw data, tables, charts, graphs, and any other representation of the data from the search system 108.

The platform 100 can also include a database or database management system 112. The database 112 can be used to store information temporarily, permanently, or for long-term storage. For example, the data integration system 104 can collect data from one or more data sources 102 and convert the data into forms that are compatible with one another or that can be combined with one another. Once the data is converted, the data integration system 104 can store the data in the database 112 in a decomposed form, a combined form, or another form for later retrieval.

FIG. 2 is a schematic diagram showing data integration across multiple entities and business processes of an enterprise. In the illustrated embodiment, the data integration system 104 facilitates the flow of information between a user interface system 202 and the data sources 102. The data integration system 104 can receive from the interface system 202 a query to extract, and possibly convert, data residing in one or more of the data sources 102. The interface system 202 can include any device or program for communicating with the data integration system 104, such as a web browser running on a laptop or desktop computer, a mobile phone, a personal digital assistant ("PDA"), a networked platform, or a device attached to any of these, or any other device or system that interfaces with the data integration system 104.

For example, a user can operate a PDA and request information from the data integration system 104 over a WiFi or Wireless Access Protocol/Wireless Markup Language ("WAP/WML") interface. The data integration system 104 can receive the request and generate any queries needed to access the information from data sources 102 such as a website or an FTP file site. The data from the data sources 102 can be extracted, converted into a format compatible with the requesting interface system 202 (a PDA in this example), and then sent to the interface system 202 for the user to view and manipulate. In other embodiments, the data can be extracted from the data sources in advance and stored in a separate database 112, which can be a data warehouse or other data facility used by the data integration system 104. The data can be stored in the database 112 in a converted state or in its original state. For example, the data can be stored in a converted state so that data from many data sources 102 can be combined in another conversion process. For example, a query from the PDA can be sent to the data integration system 104, and the data integration system 104 can extract the information from the database 112. After extraction, the data integration system 104 can convert the data into a combined format compatible with the PDA before returning it to the PDA.

FIG. 3 is a schematic diagram showing an architecture for providing data integration for a plurality of data sources 102 of an enterprise. An embodiment of the data integration system 104 can include a data discovery stage 302 that, among other possible processing, extracts data from the data sources and analyzes column values and table structures of the source data. The data discovery stage 302 can also generate recommendations about table structures, relationships, and keys for a data target. More advanced profiling and auditing functions can include date range validation, accuracy of computations, accuracy of if-then evaluations, and so forth. The data discovery stage 302 can normalize data, for example by eliminating redundant dependencies and other anomalies in the source data. The data discovery stage 302 can provide additional functions, such as allowing drill-down into exceptions within a data source 102 for further analysis, or enabling direct profiling of mainframe data. A non-limiting example of a commercially available form of the data discovery stage 302 is IBM's Websphere ProfileStage product.

The data integration system 104 can also include a data preparation stage 304 that prepares, standardizes, matches, or otherwise manipulates data to produce quality data that will later be converted. The data preparation stage 304 can perform general data quality functions, such as reconciling inconsistencies in the data or performing accurate matching (including one-to-one matching, one-to-many matching, and deduplication). The data preparation stage 304 can also provide specific data enhancement functions. For example, the data preparation stage 304 can ensure that addresses conform to multinational postal references for improved international communication. The data preparation stage 304 can conform location data to multinational geocoding standards for spatial information management. The data preparation stage 304 can modify or add to addresses so that the address information qualifies for United States Postal Service mail rate discounts under US address correction certified by the US government. Similar analysis and data revision can be applied for the postal systems of Canada and Australia, which offer discounted rates for properly addressed mail. A non-limiting example of a commercially available form of the data preparation stage 304 is IBM's Websphere QualityStage product.

The data integration system can also include a data conversion stage 308 that converts, enriches, and delivers converted data. The data conversion stage 308 can perform transitional services such as restructuring and reformatting data, and can also perform calculations based on the business rules and algorithms of the system user. The data conversion stage 308 can also organize target data into subsets known as data marts or cubes for more highly tuned processing of the data in a particular analytical context. The data conversion stage 308 can employ bridges, translators, or other interfaces (as described generally below) to span the various software and hardware architectures of the various data sources and data targets used by the data integration system 104. The data conversion stage 308 can include a graphical user interface, a command line interface, or a combination of these for designing data integration jobs across the platform 100. A non-limiting example of a commercially available form of the data conversion stage 308 is IBM's Websphere DataStage product.

The stages 302, 304, and 308 of the data integration system 104 can be executed using a parallel execution system 310, either sequentially or in combination, to optimize the performance of the system 104.
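As an informal illustration only, the three stages 302, 304, and 308 might be chained as in the following Python sketch. The function names, the sample rows, and the labeling rule are hypothetical and are not taken from this specification or from any product. The sketch runs the stages one after another and does not model the parallel execution system 310.

```python
def discover(rows):
    """Data discovery (302): count the distinct values seen in each column."""
    profile = {}
    for row in rows:
        for column, value in row.items():
            profile.setdefault(column, set()).add(value)
    return {column: len(values) for column, values in profile.items()}


def prepare(rows):
    """Data preparation (304): standardize values and drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        standardized = {k: str(v).strip().lower() for k, v in row.items()}
        key = tuple(sorted(standardized.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(standardized)
    return cleaned


def convert(rows, rule):
    """Data conversion (308): apply a user-supplied business rule to every row."""
    return [rule(row) for row in rows]


# Hypothetical source rows with inconsistent spacing and capitalization.
rows = [
    {"name": " Jim ", "city": "Boston"},
    {"name": "jim", "city": "boston"},
    {"name": "Betty", "city": "Boston"},
]

profile = discover(rows)
result = convert(prepare(rows), lambda r: {**r, "label": f"{r['name']}@{r['city']}"})
```

Profiling the raw rows first shows three spellings of two names, which is the kind of inconsistency the preparation step then reconciles before conversion.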

The data integration system 104 can also include a metadata management system 312 for managing metadata associated with the data sources 102. In general, the metadata management system 312 can provide for the interchange, integration, management, and analysis of metadata across the tools in a data integration environment. For example, the metadata management system 312 can provide a common, widely accessible view of data in different sources, such as IBM's Websphere ODBC MetaBroker, CA ERwin, IBM Websphere ProfileStage, IBM Websphere DataStage, IBM Websphere QualityStage, IBM DB2 Cube Views, and Cognos Impromptu. The metadata management system 312 can also provide analysis tools for data lineage and impact analysis. In addition, the metadata management system 312 can be used to build a business data glossary of the data definitions, algorithms, and business contexts for the data in the data integration system 104, and this glossary can be published for use throughout the enterprise. A non-limiting example of a commercially available form of the metadata management system 312 is IBM's Websphere MetaStage product.

Referring to FIG. 4, an item associated with an enterprise can be described in terms of various contexts or hierarchies, for example in order to capture the semantic context of the item. FIG. 4 thus shows a semantic identifier for an item. An item can be an object, class, attribute, data item, data model, model, definition, identification, structure, language, mapping, relationship, instance, another semantic identifier, or any other item or concept. A semantic identifier can identify an item based on the attributes of the item, the physical location of the item, the relationship of the item to one or more other items in a hierarchy, and the like. In some cases, a relationship can be defined as the absence of some particular relationship. A relationship can be based on semantics. A relationship can include the position of an item in a relationship hierarchy. For example, in FIG. 4, item 1 5202 can be identified based on its relationships with other related items. Item 1 5202 can be identified as directly related to item 2 5204, item 3 5208, and item 4 5210, as indirectly related to item 5 5212, and as indirectly related to item 6 5214 through item 5 5212 and item 4 5210. Item 1 can also be identified simply as directly related to item 2 5204, item 3 5208, and item 4 5210. In embodiments, the indirect relationships between item 1 5202 and item 5 5212 and item 6 5214 can be captured in the relationship between item 1 5202 and item 4 5210. This linked or recursive type of identification makes dynamic identifiers possible in addition to static ones. For example, if the relationship between item 4 5210 and item 6 5214 changes, the semantic identifier for item 1 5202, which incorporates item 2 5204, item 3 5208, and item 4 5210, picks up the change through its incorporation of item 4 5210; it need not be updated to account for the change involving item 6 5214, as it would if item 6 5214 were included directly in the semantic identifier.
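The linked, recursive identification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification's implementation: the `Item` class, the `semantic_identifier` function, and the nested-parentheses output format are all hypothetical. Each item stores only its direct relationships, and indirect relationships are resolved when the identifier is computed.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    related: list = field(default_factory=list)  # direct relationships only


def semantic_identifier(item, _seen=None):
    """Build an identifier from direct relationships, resolving each related
    item recursively at call time so that indirect relationships are implied."""
    seen = (_seen or frozenset()) | {item.name}
    parts = []
    for other in item.related:
        if other.name in seen:  # guard against cyclic relationships
            continue
        parts.append(semantic_identifier(other, seen))
    return item.name + ("(" + ",".join(parts) + ")" if parts else "")


# Assumed layout loosely following FIG. 4: item 1 -> items 2, 3, 4; item 4 -> item 5 -> item 6.
item6 = Item("item6")
item5 = Item("item5", [item6])
item4 = Item("item4", [item5])
item2, item3 = Item("item2"), Item("item3")
item1 = Item("item1", [item2, item3, item4])

before = semantic_identifier(item1)
item5.related.remove(item6)          # a relationship further down the chain changes
after = semantic_identifier(item1)   # item 1 itself was never edited
```

Because item 1 stores only its direct relationships, a change deeper in the chain is reflected the next time the identifier is resolved, which is the dynamic behavior the paragraph describes.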

FIG. 5 shows a more specific example of a semantic identifier. Jim can be identified as the Jim who resides at 111 Any Street, Any Town, Any State, USA, and who has the telephone number 555-555-5555 and the social security number 012-34-5678. Alternatively, Jim can be identified in terms of his relationships to others. As shown in FIG. 5, Jim can be identified as Betty's son, Larry's and Jeff's brother, Jessica's father, and Frank's nephew.

A semantic identifier can be a unique identifier for an item. In the example of FIG. 5, if there is only one Jim in the world who is Betty's son, Larry's and Jeff's brother, Jessica's father, and Frank's nephew, then this semantic identifier is a unique identifier for Jim. It is also possible for a unique semantic identifier for an item to comprise fewer than all of the item's relationships with other items. If there is only one Jim in the world who is Betty's son, Larry's brother, and Jessica's father, then these relationships alone are sufficient to form a unique semantic identifier, and Jim's relationships with Jeff and Frank need not be considered. It is advantageous to form a semantic identifier from the minimum number of relationships that guarantees uniqueness. For example, when semantic identifiers are stored in the database 112 or processed by the data integration system 104, less complex semantic identifiers require less space and can be processed faster.
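One way to picture the search for a minimal unique identifier is a brute-force scan over subsets of an item's relationships, smallest first. The sketch below is hypothetical: the `minimal_identifier` function, the tuple encoding of relationships, and the two extra people in the sample population are assumptions made for illustration, and the specification does not prescribe any particular search method.

```python
from itertools import combinations


def minimal_identifier(target, population):
    """Return a smallest subset of the target's relationships that no other
    member of the population fully shares, or None if no subset is unique."""
    facts = sorted(population[target])
    others = [rels for name, rels in population.items() if name != target]
    for size in range(1, len(facts) + 1):
        for subset in combinations(facts, size):
            if not any(set(subset) <= rels for rels in others):
                return set(subset)
    return None


# Jim's relationships follow FIG. 5; the other two people are invented.
population = {
    "jim": {
        ("son of", "Betty"), ("brother of", "Larry"), ("brother of", "Jeff"),
        ("father of", "Jessica"), ("nephew of", "Frank"),
    },
    "fourth_brother": {
        ("son of", "Betty"), ("brother of", "Larry"), ("brother of", "Jeff"),
        ("nephew of", "Frank"),
    },
    "unrelated_father": {("father of", "Jessica"), ("nephew of", "Frank")},
}

identifier = minimal_identifier("jim", population)
```

In this invented population no single relationship distinguishes Jim, but two of his five relationships do, so the remaining three need not be stored or processed.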

The number of relationships needed to form a unique semantic identifier for an item can vary with context. FIG. 6 shows two items of interest, item 1 5402 and item 7 5404. In context A 5408, item 1 5402 can be distinguished from item 7 5404 by the relationships of item 1 5402 to item 5 5410 and item 6 5412. That is, in context A, a unique semantic identifier for item 1 5402 can be that it is directly related to items 2, 3, and 4, indirectly related to item 5 5410 through item 4, and indirectly related to item 6 5412 through item 5 5410 and item 4. In context A, a unique semantic identifier for item 7 5404 can be that it is directly related only to items 2 and 3. FIG. 7 shows item 1 5402 in a different context, context B 5414. To uniquely identify item 1 5402 in context B 5414, any one or more of the following can be considered: the direct relationship of item 1 5402 to item 4, the absence of a direct relationship to item 6, or the indirect relationship to item 5. In context B 5414, item 1 5402 can be uniquely and semantically identified as being directly related to items 2 and 3 but not directly related to item 6. The unique identifier for item 1 therefore differs between context A 5408 and context B 5414. Thus, in embodiments of the data integration methods and systems described herein, a semantic identifier for an item, such as an item associated with a data integration job or a data integration platform, can provide a context-dependent identifier for that item. In embodiments, such context-dependent identifiers can be stored in an atomic format, for example in a data repository.
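The context dependence described above, including identification by the absence of a relationship, can be illustrated with a small sketch. The contents of the two contexts below are assumptions, since FIGS. 6 and 7 are not reproduced here; in particular, `item8` is an invented competitor in context B. The clause format `(target, must_exist)` and both function names are hypothetical, and only direct relationships are modeled.

```python
def matches(direct_relations, identifier):
    """True if an item's direct relationships satisfy every clause; a clause
    with must_exist=False requires the relationship to be absent."""
    return all((target in direct_relations) == must_exist
               for target, must_exist in identifier)


def uniquely_identifies(identifier, item, context):
    """True if `item` is the only member of `context` matching the identifier."""
    hits = [name for name, rels in context.items() if matches(rels, identifier)]
    return hits == [item]


# Assumed contexts: each maps an item to the items it is directly related to.
context_a = {"item1": {"item2", "item3", "item4"}, "item7": {"item2", "item3"}}
context_b = {"item1": {"item2", "item3", "item4"}, "item8": {"item2", "item3", "item6"}}

# "Directly related to items 2 and 3, but not directly related to item 6."
identifier_b = [("item2", True), ("item3", True), ("item6", False)]

unique_in_b = uniquely_identifies(identifier_b, "item1", context_b)
unique_in_a = uniquely_identifies(identifier_b, "item1", context_a)
```

Under these assumptions the same identifier singles out item 1 in context B but is ambiguous in context A, where item 7 also matches, so a different identifier (for example, one involving item 4) is needed there.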

In other embodiments, context A 5408 and context B 5414 can be two different imports, mappings, execution versions, models, metabroker models, instances, tools, views, objects, classes, items, relationships, attributes, or any combination of these. A matching or comparison mechanism can compare the syntax used to identify items in different imports, execution versions, models, metabroker models, instances, tools, and/or items, and can decide, based on the comparison, what action should or should not be taken. For example, a matching engine can compare the model used by import instance A with the model used by metabroker B. Based on this comparison, it may be determined that metabroker B can access the data and metadata of import instance A without conversion or modification, and the comparison mechanism can instruct metabroker B to proceed. In another example, tool A 5408 can be compared with tool B 5414, and it may be determined that an object merge between the tools is to be performed, in which each tool can access and use the objects of the other tool. In embodiments, the comparison mechanism can assist the object merge between the tools, for example by establishing a bridge, metabroker, hub, or the like, and by triggering a conversion mechanism to help convert any objects that require conversion, such as conversion based on the different syntaxes each tool uses to handle the identification of particular items, or conversion based on other differences between the tools determined by the comparison.
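The decision made by such a comparison mechanism might look like the sketch below. The three outcomes, the function name `compare_syntax`, and the representation of a syntax as an ordered tuple of element names are assumptions chosen for illustration; the specification leaves the comparison criteria open.

```python
def compare_syntax(syntax_a, syntax_b):
    """Return the action a comparison mechanism might select for two
    identification syntaxes, each given as an ordered tuple of element names."""
    if syntax_a == syntax_b:
        return "proceed"      # identical syntax: access without conversion
    if set(syntax_a) == set(syntax_b):
        return "convert"      # same elements, different order: trigger a conversion
    return "no action"        # no common basis was found


# Hypothetical syntaxes for an import instance, a metabroker, and two tools.
import_a = ("column", "table", "database")
metabroker_b = ("column", "table", "database")
tool_b = ("database", "table", "column")
tool_c = ("file", "record")

decisions = [
    compare_syntax(import_a, metabroker_b),
    compare_syntax(import_a, tool_b),
    compare_syntax(import_a, tool_c),
]
```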

In embodiments, a semantic identifier can be stored, maintained, recorded, processed, and/or interpreted in the form of a syntax, which in turn can be stored, maintained, recorded, processed, and/or interpreted in a string structure or format. FIG. 8 shows an example of a syntax and a corresponding string composed in that syntax. The syntax 5502 can be column name::table name::database name. This syntax can be associated, for example, with a semantic identifier that identifies a column of a table in a database. The string 5504 composed in this syntax can be age::employee::employee database. This string can be associated, for example, with a semantic identifier that identifies the ages of employees in a particular employee database. In the example of FIG. 7, the string corresponding to the semantic identifier for item 1 5402 in context B can be direct relationship with item 2::direct relationship with item 3::direct relationship with item 4. The semantic identifier and the corresponding string can also incorporate the absence of a direct relationship between item 1 5402 and item 6.
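A string in the syntax of FIG. 8 can be composed and parsed with a few lines of code. In the sketch below, the `::` separator and the element order come from the example above, while the names `SYNTAX`, `build`, and `parse` are hypothetical.

```python
SEPARATOR = "::"
SYNTAX = ("column", "table", "database")  # column name::table name::database name


def build(**parts):
    """Compose an identifier string from named parts, in syntax order."""
    return SEPARATOR.join(parts[name] for name in SYNTAX)


def parse(identifier):
    """Split an identifier string back into its named parts."""
    values = identifier.split(SEPARATOR)
    if len(values) != len(SYNTAX):
        raise ValueError(f"expected {len(SYNTAX)} elements, got {len(values)}")
    return dict(zip(SYNTAX, values))


string_5504 = build(column="age", table="employee", database="employee database")
parsed = parse(string_5504)
```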

In FIG. 9, the semantic identifier for item 9 5602 can be, in string format, the string 5604: directly related to item 2::directly related to item 4::indirectly related to item 5. The string can be parsed. The syntax and/or string can be truncated or modified, and/or the elements of the syntax and/or string can be rearranged. In FIG. 10, string 5702 is a truncated version of string 5604; string 5704 is a truncated, modified, and/or rearranged version of string 5604; and string 5708 is a modified and/or rearranged version of string 5606. The truncation, modification, and/or rearrangement can be performed by a conversion engine. Truncating the syntax and/or string is useful when not all of the relationships contained in the syntax and/or string are needed for the semantic identifier to be unique. Suppose that, in a given context for string 5604, every item is directly related to item 3, for example because item 3 is a database that stores all of the items. String 5604 can be truncated, such as by creating a string that omits the relationship involving item 3, and still leave a unique semantic identifier. Truncating the syntax and/or string can reduce storage requirements and increase processing efficiency. It can also be useful to change the order of the relationships in the syntax and/or string, for example to reduce the processing time of a data integration process. If the less common relationships are processed first, the system is likely to need to access and process fewer of the relationships associated with an item in order to identify it. For example, if few items are related to item 3, even fewer are related to item 4, and many items are related to item 2, then, depending on the context, string 5708 may allow item 9 to be identified in less time than string 5604. It may be that only the first two elements of string 5708, but the first three elements of string 5604, are needed to uniquely identify item 9 in the context.
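The effect of rearranging and truncating can be sketched as follows. The context dictionary, the relationship labels, and both function names are hypothetical; the sketch simply assumes that an identifier is a list of relationships and that a context lists the relationships each item has.

```python
from collections import Counter


def reorder_by_rarity(identifier, context):
    """Put the relationships shared by the fewest items in the context first."""
    counts = Counter(rel for rels in context.values() for rel in set(rels))
    return sorted(identifier, key=lambda rel: counts[rel])


def truncate_to_unique(identifier, item, context):
    """Return the shortest prefix of `identifier` that matches only `item`."""
    for n in range(1, len(identifier) + 1):
        prefix = identifier[:n]
        matches = [name for name, rels in context.items()
                   if all(rel in rels for rel in prefix)]
        if matches == [item]:
            return prefix
    raise ValueError("identifier is not unique in this context")


# Hypothetical context: many items relate to item 2, few to items 4 and 5.
context = {
    "item 9": ["direct:item 2", "direct:item 4", "indirect:item 5"],
    "item 7": ["direct:item 2", "indirect:item 5"],
    "item 8": ["direct:item 2", "direct:item 3"],
    "item 6": ["direct:item 2", "direct:item 4"],
}
original = ["direct:item 2", "direct:item 4", "indirect:item 5"]
reordered = reorder_by_rarity(original, context)

needed_original = truncate_to_unique(original, "item 9", context)
needed_reordered = truncate_to_unique(reordered, "item 9", context)
```

In this made-up context the original ordering needs all three elements to single out item 9, while the reordered identifier needs only its first two, which mirrors the two-versus-three element comparison in the text.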

The conversion engine can perform conversion operations on one or more semantic identifiers, a database 112, a database 112 that includes semantic identifiers, an information system, an information system that includes one or more semantic identifiers, or other items. FIG. 11 shows a conversion engine 5802 acting on a semantic identifier embodied as a string 5804 and on a semantic identifier embodied as a string located in a database 5808. A conversion operation can convert or otherwise modify the format, language, and/or data model of a semantic identifier. A conversion operation can include a conversion or mapping between one or more data tools, languages, formats, and/or data models and at least one other data tool, language, format, and/or data model. For example, a conversion operation can include conversion or mapping to, from, or between well-known data integration tools, such as WebSphere DataStage 7 from IBM, WebSphere QualityStage from IBM, Business Objects tools, IBM DB2 Cube Views, UML 1.1, UML 1.3, ERStudio, WebSphere ProfileStage from IBM, PowerDesigner (with added support for Packages and Extended Attributes), and/or MicroStrategy tools. The conversion engine and/or conversion operations can optionally be embodied in a metabroker. Conversion operations can be executed in batch, in real time, or continuously. A conversion operation can be provided or made available as a service, for example as part of a service-oriented architecture (SOA). An SOA can be part of the infrastructure of a business enterprise's enterprise computer system. In an SOA, services become the building blocks for application development and deployment, enabling rapid application development and avoiding redundant code. Each service embodies a set of business logic or business rules that is independent of the surrounding environment, such as the source of data inputs for the service or the target of data outputs from the service. As a result, a service can be reused with a variety of applications, provided that appropriate inputs and outputs are established between the service and the application. Because a service-oriented architecture insulates services from changes in the environment, the architecture continues to function even if the surrounding computing environment changes. As a result, services do not need to be recoded when the infrastructure changes, which saves time and effort. An SOA can be for web services and can include three entities: a service provider, a service requester, and a service registry. The registry may be public or private. The service requester can search the registry for an appropriate service. Once an appropriate service is found, the service requester can receive the code needed to invoke the service, such as Web Services Description Language ("WSDL") code. WSDL is the language conventionally used to describe web services. The service requester can then connect to the service provider to invoke the service, such as through a message in an appropriate format, for example the Simple Object Access Protocol ("SOAP") format for web service messages. SOAP is the preferred protocol for transferring data in web services. It defines the format of messages exchanged between a web service client and a web service server. SOAP uses an eXtensible Markup Language ("XML") schema; XML is a general language specification commonly used in web services for tagging data, although other markup languages can also be used.

Once a conversion operation exists for a semantic identifier, a database 112, a database 112 that includes one or more semantic identifiers, an information system, an information system that includes one or more semantic identifiers, or another item, that operation makes it possible to convert to and from, map to, combine with, use with, or associate with any other semantic identifier, database 112, database 112 that includes one or more semantic identifiers, information system, information system that includes one or more semantic identifiers, or other item that shares at least one conversion operation. In embodiments in which an atomic data repository such as a hub is used for conversion operations, the mapping of a conversion operation can, among other things, trace the data that is converted back and forth between the original semantic context and the converted semantic context as the operation is performed. Depending on the context, the appropriate identifier for a data item may change, such as by changing or truncating the syntax and/or string to allow more efficient storage or faster processing, or by changing the relationships used to form a unique identifier as the semantic context changes. Dynamic identifiers can therefore combine the benefit of retraceable conversions with the benefits of fast processing, efficient data handling, and efficient operation in the various contexts in which a data item is used.
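One way to picture a dynamic identifier that remains retraceable is a two-way map between a canonical identifier and the form used in each context. This is only a sketch under that assumption: the class `IdentifierMap`, its methods, and the context names are hypothetical and are not taken from the text.

```python
class IdentifierMap:
    """Records which context-specific identifier stands for which canonical one."""

    def __init__(self):
        self._forward = {}   # (context, canonical) -> contextual
        self._backward = {}  # (context, contextual) -> canonical

    def register(self, context, canonical, contextual):
        key = (context, contextual)
        # Refuse a contextual form that already names a different item here.
        if self._backward.get(key, canonical) != canonical:
            raise ValueError(f"{contextual!r} is ambiguous in {context!r}")
        self._forward[(context, canonical)] = contextual
        self._backward[key] = canonical

    def to_context(self, context, canonical):
        return self._forward[(context, canonical)]

    def trace_back(self, context, contextual):
        return self._backward[(context, contextual)]


canonical = "age::employee::employee database"
id_map = IdentifierMap()
# Truncated form for a context where the database is implied.
id_map.register("single database", canonical, "age::employee")
# Rearranged form for a context that processes the database element first.
id_map.register("multi database", canonical, "employee database::employee::age")

short_form = id_map.to_context("single database", canonical)
recovered = id_map.trace_back("single database", short_form)
```

The same item carries a different identifier in each context, yet either form can be traced back to the canonical identifier.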

A given item, such as an item having an identity in a model, can exist in many forms or instances, such as a physical instance and a logical modeling instance. FIG. 12 shows an item, namely a table of employee information 5902. The concept or entity "employee", however, may exist in many different forms within an enterprise. For example, the employee table 5902 can exist as a physical table that stores employee-related values in a physical data storage facility. The entity employee can also be represented as a logical entity, such as an icon or text representing the employee in a logical modeling activity 5908, or in various other forms or instances. In other words, the same item, including any related data or metadata, can exist in various forms or instances across views, models, structures, or a data integration environment, for example in databases, data repositories, models, hubs, and the like. FIG. 13 shows the employee table 5902 in one form or a single instance in a database 6002 and/or in two or more forms or instances in a database 6004 or hub 6008.

To distinguish among the various forms or instances of an item, any distinguishing characteristic can be used, such as the level of abstraction, a physical property of the item, the item's position in a hierarchy, the item's location in a database, the context in which the item is found, the item's syntax, the item's relationships to other items, an attribute of the item, the item's class, or another characteristic. For example, referring again to FIG. 5, items (in this case individuals) can be distinguished on the basis of age, gender, hair color, IQ, political affiliation, and/or number of visits to a doctor in the past three months. If age is selected as the distinguishing factor, for instance, Jessica is the only individual aged 10 or under, Betty is the only individual between 57 and 67, and Jim is the only individual who is 37. In other examples, different forms or instances of an item can exist at different levels of abstraction or in different contexts. For example, an employee table can exist in multiple forms or instances within a hub 6102, such as a physical employee table 5904, used for example to store values in a database relating to data about employees, and a logical employee model 5908, used for example in connection with employee-related processes.

Distinguishing among different instances of a particular identified item enables a variety of other methods and processes. For example, in one embodiment, an item such as a table named "employee" can be placed in a hub. A hub collector can have two forms or instances of "employee" in the hub, one corresponding to a physical database instance and the other to a logical modeling activity. A distinguishing characteristic, such as a property attributed to the item in the hub, makes it possible to tell the physical instance from the logical model instance or form. In embodiments, the distinguishing characteristic can be a so-called level of abstraction, used for example to distinguish the logical and physical levels of abstraction. In other cases, the hub can associate other characteristics with an item, such as different forms of identifier, relationships, classes, attributes, physical location, logical location, models, and the like.

As shown in FIG. 15, when performing operations such as selecting data to be loaded into a database, converting data, or generating a query, a system such as the conversion engine 6204 can grab, load, or acquire all items from the hub 6208 or database 6210. The system can select or filter the items based on some distinguishing characteristic. For example, the system can select or filter instances or forms that have a physical level of abstraction, that have a particular relationship to other items, that have a logical level of abstraction, that were created before a specified date and time, or that have any other distinguishing characteristic. The methods and systems described herein thus provide selective processing of instances of the same item or entity based on any distinguishing characteristic.

As shown in FIG. 16, when performing a data integration operation such as a conversion operation, which may be in response to a query 6202, the conversion engine 6204 can filter or select items, including any data and/or metadata, at the hub 6208 or database 6210, and grab, load, or acquire only those items at the relevant level of abstraction. For example, the conversion engine can filter out instances or forms that have a logical level of abstraction and retain only those that have a physical level of abstraction. The filtering or selection can be done at run time and at design time, and can be done in batch, in real time, or continuously. In embodiments, such methods of filtering and selection can be provided as an RTI service in a service-oriented architecture.
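Selecting instances by level of abstraction can be sketched in a few lines. The types `AbstractionLevel` and `HubItem`, the function `select_by_level`, and the sample hub contents are all hypothetical; the text does not prescribe how a hub records the level of abstraction.

```python
from dataclasses import dataclass
from enum import Enum


class AbstractionLevel(Enum):
    PHYSICAL = "physical"
    LOGICAL = "logical"


@dataclass(frozen=True)
class HubItem:
    name: str                # the item or entity, e.g. "employee"
    level: AbstractionLevel  # the distinguishing characteristic
    source: str              # where this form or instance came from


def select_by_level(items, level):
    """Keep only the forms or instances at the requested abstraction level."""
    return [item for item in items if item.level is level]


# Hypothetical hub holding two instances of "employee".
hub = [
    HubItem("employee", AbstractionLevel.PHYSICAL, "employee database"),
    HubItem("employee", AbstractionLevel.LOGICAL, "modeling tool"),
    HubItem("department", AbstractionLevel.LOGICAL, "modeling tool"),
]
physical = select_by_level(hub, AbstractionLevel.PHYSICAL)
```

The logical "employee" model is dropped and only the physical table instance remains, so a downstream load or conversion step never sees the modeling artifact.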

The filtering or selection can be based on information acquired by the conversion engine and/or system at development time, design time, or run time, such as a mapping of a data model, a mapping of a metadata model, a distinguishing characteristic, an item's relationships to other items, an item's attributes, or the syntax of an identifier. In embodiments, the information can be updated dynamically in real time.

The closer the filtering or selection takes place to the hub or database in the overall process, the more efficient and faster the operation becomes. As shown in FIG. 17, the conversion engine 6204 can perform a conversion operation on the query 6202 itself, producing a revised query 6402 that can be sent on for further processing, such as being sent directly to the hub 6208 or database 6210. For example, the revised query 6402 can be put in a format that is directly compatible with the native format of the hub 6208 or database 6210. By putting the query in the native format of the database 6210, for instance, the system can process the query more efficiently. Similarly, the query 6402 can be filtered, or a command such as a selection command can be generated, to retain logical modeling entities rather than physical entities, in which case the query 6402 can be put in a format suited to a logical modeling activity (such as a graphical user interface) rather than to a database. Of course, not only queries but also other messages and operations can be filtered according to level of abstraction, so that the same entity can be traced across the data integration platform and handled according to the operating environment appropriate to a particular data integration activity.
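Moving the filter into the query itself can be illustrated with Python's standard `sqlite3` module standing in for the database. The table `hub_items`, its columns, and the function `rewrite_query` are assumptions made for this sketch; the text does not describe a concrete schema or query language.

```python
import sqlite3


def rewrite_query(table, level):
    """Return SQL and parameters that filter by abstraction level in the database."""
    # Table names cannot be bound as parameters, so allow only known tables.
    if table not in {"hub_items"}:
        raise ValueError(f"unknown table: {table!r}")
    sql = (f"SELECT name, source FROM {table} "
           "WHERE abstraction_level = ? ORDER BY name")
    return sql, (level,)


# Hypothetical hub contents stored in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hub_items (name TEXT, abstraction_level TEXT, source TEXT)"
)
conn.executemany(
    "INSERT INTO hub_items VALUES (?, ?, ?)",
    [
        ("employee", "physical", "employee database"),
        ("employee", "logical", "modeling tool"),
        ("department", "logical", "modeling tool"),
    ],
)

sql, params = rewrite_query("hub_items", "logical")
rows = conn.execute(sql, params).fetchall()
conn.close()
```

Because the `WHERE` clause runs inside the database, only the logical instances are returned to the caller, instead of every item being fetched and filtered afterwards.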

The methods and systems described herein can be used to capture semantic context and to handle data integration tasks for a wide range of items relevant to an enterprise, such as objects, data items, data, columns, rows, tables, databases, instances, attributes, metadata, concepts, topics, subjects, semantic identifiers, other identifiers, RFID tags, vendors, suppliers, customers, persons, teams, organizations, users, networks, systems, devices, families, stores, products, production lines, product characteristics, product specifications, product attributes, prices, costs, material specifications, shipping data, tax data, courses, educational programs, locations, maps, departments, organizations, organic organizations, processes, rules, laws, rating systems, goods, services, and/or service offerings.

The methods and systems described herein can be used in a variety of semantic contexts, such as a step in an enterprise's method, data in a database, data in a row or column, a row or column in a table, a row or column in a database, data in a table, a table in a database, metadata in a database, an item in a hub or repository, an item in a database, an item in a table, an item in a column, an item in a row, a person in an organization, a sender or recipient of a communication, a user on a network, a system on a network, a device on a network, a person in a family, an item in a store, a dish on a menu, a product in a production line, a product in a product offering, a course or step in an educational or training program, a location on a map, the location of an item, a department of an organization, a person on a team, a rule in a rule system, a service in a service suite, an entity in an enterprise's organizational hierarchy, an entity in a supply chain, a customer in a market, a purchaser in a purchasing decision, the price of a good or service, the cost of a good or service, a component of a product or system, a step in a method, a member of a group, or many others.

Although the invention has been described in connection with certain preferred embodiments, other embodiments will be recognized by those of ordinary skill in the art and are intended to fall within the scope of the invention.

FIG. 1 is a schematic diagram of a business enterprise having a plurality of business processes, each of which can include a plurality of different computer applications and data sources.
FIG. 2 is a schematic diagram showing data integration across a plurality of business processes of a business enterprise.
FIG. 3 is a schematic diagram showing an architecture for providing data integration of a plurality of data sources for a business enterprise.
FIG. 4 shows items in relation to other items.
FIG. 5 shows items in relation to other items.
FIG. 6 shows items in a particular context.
FIG. 7 shows items in a particular context.
FIG. 8 shows certain strings.
FIG. 9 shows an item and a corresponding string.
FIG. 10 shows strings and certain variations of them.
FIG. 11 shows a conversion engine acting on certain strings.
FIG. 12 shows an item that can exist in multiple forms or instances.
FIG. 13 shows an item that can exist in multiple forms or instances in a hub or database.
FIG. 14 shows items at various levels of abstraction in a hub.
FIG. 15 shows a conversion process in which all items in a database or hub are grabbed.
FIG. 16 shows a conversion process in which all items in a database or hub are filtered.
FIG. 17 shows a conversion process in which a query is converted.

Claims

A method for data integration, comprising:
providing a semantic identifier for identifying an item based on its relationships with other items;
obtaining a mapping of a data model to enable determination of the semantic identifier for an item in the data model; and
associating the mapping with a data integration function that is performed based on at least one of the mapping and the semantic identifier.

The method of claim 1, wherein the item comprises one or more of an object, a data item, data, a column, a row, a table, a database, an instance, an attribute, metadata, a concept, a topic, a subject, an identifier, a semantic identifier, an RFID tag, a vendor, a supplier, a customer, a person, a team, an organization, a user, a network, a system, a device, a family, a store, a product, a production line, a product characteristic, a product specification, a product attribute, a price, a cost, a material specification, shipping data, tax data, a course, an educational program, a location, a map, a department, an organization, an organic organization, a process, a rule, a law, a rating system, a good, a service, and a service offering.

The method of claim 1, wherein the relationship includes an item's position in a relationship hierarchy.

The method of claim 1, wherein the semantic identifier is a unique identifier for an item.

The method of claim 1, wherein the semantic identifier is based on a number of relationships that is fewer than all of the item's relationships with other items but sufficient for the identifier to be unique.

The method of claim 1, wherein the semantic identifier is based on a minimum number of relationships necessary for the identifier to be unique.

The method of claim 1, wherein the semantic identifier is a context sensitive identifier for an item.

The method of claim 1, wherein the semantic identifier is stored in an atomic format.

The method of claim 1, wherein the semantic identifier is stored in a data repository in an atomic format.

The method of claim 1, wherein the semantic identifier is dynamic.

The method of claim 1, wherein the semantic identifier varies with context.

A method for performing a data integration process, comprising:
associating a model with a data set; and
forming a selection command for selecting an item from the data set based on a distinguishing characteristic for the item determined from the model.

The method of claim 12, wherein the forming of the selection command/query is performed during execution of a process that uses the selection command/query.

The method of claim 12, wherein the forming of the selection command/query is performed at design time of a process that uses the selection command/query.

A method for performing a data integration process, comprising:
associating a model with a data set; and
forming a query for querying the data set based on a distinguishing characteristic for an item determined from the model.

The method of claim 15, wherein the forming of the selection command/query is performed during execution of a process that uses the selection command/query.

The method of claim 15, wherein the forming of the selection command/query is performed at design time of a process that uses the selection command/query.

A system for data integration, comprising:
a semantic identifier for identifying an item based on its relationships with other items;
a mapping of a data model to enable determination of the semantic identifier for an item in the data model; and
a mechanism for associating the mapping with a data integration function that is performed based on at least one of the mapping and the semantic identifier.

The system of claim 18, wherein the relationship includes a position of an item in a relationship hierarchy.

The system of claim 18, wherein the semantic identifier is a unique identifier for an item.

The system of claim 18, wherein the semantic identifier is based on a number of relationships that is fewer than all of the item's relationships with other items but sufficient to ensure that the identifier is unique.

The system of claim 18, wherein the semantic identifier is based on a minimum number of relationships that ensure that the identifier is unique.

The system of claim 18, wherein the semantic identifier is a context sensitive identifier for an item.

The system of claim 18, wherein the semantic identifier is stored in an atomic format.

The system of claim 18, wherein the semantic identifier is stored in a data repository in an atomic format.

The system of claim 18, wherein the semantic identifier is dynamic.

The system of claim 18, wherein the semantic identifier varies with context.

The system of claim 18, wherein the semantic identifier recursively obtains an indirect relationship to a second item by obtaining a direct relationship to a first item that has a direct relationship to the second item.

The system of claim 18, wherein the semantic identifier is obtained as a string and the string is truncated if not all elements are required for a unique identifier.

The system of claim 18, wherein the data integration function is a conversion operation.

The system of claim 30, wherein the conversion operation modifies one or more of the format of the semantic identifier, the language of the semantic identifier, and the data model of the semantic identifier.

The system of claim 30, wherein the mapping of the conversion operation can trace data that is converted back and forth between the original semantic context and the converted semantic context in performing the operation.

The system of claim 18, wherein the conversion operation is provided as a service in a service oriented architecture.

The system of claim 18, further comprising a filter for selectively filtering instances of logical entities based on the distinguishing characteristics of the entities.

The system of claim 34, wherein the distinguishing characteristic is obtained from at least one of the mapping and the semantic identifier.