JP2001312504A

JP2001312504A - Method and system to extract knowledge

Info

Publication number: JP2001312504A
Application number: JP2000123630A
Authority: JP
Inventors: Kenneth P. Baclawski
Original assignee: Jarg Corp
Current assignee: Jarg Corp
Priority date: 2000-04-25
Filing date: 2000-04-25
Publication date: 2001-11-09

Abstract

PROBLEM TO BE SOLVED: To provide a data warehouse that can contain an index database storing data about information held in external databases. SOLUTION: An information retrieval system that processes queries for extracting information from a database is provided with three mechanisms. The first locates features and feature fragments in an index database. The second is an evaluation mechanism that identifies the multi-level subqueries contained in a query and iteratively evaluates them using the located features and feature fragments. The third collects the results of the iterative evaluations of the query and its subqueries, following the computation of the overall result of the query, and stores them in memory.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION This application is related to two provisional applications filed together with it and assigned to the same assignee. The first is U.S. Provisional Patent Application Serial No. 60/094,350, filed July 28, 1998, entitled "Knowledge Extraction System and Method," by Kenneth P. Baclawski. The second is U.S. Provisional Patent Application Serial No. 60/094,110, filed July 24, 1998, entitled "Distributed Object Retrieval System and Method," by Kenneth P. Baclawski. The disclosures of these applications are incorporated herein by reference. This application is also related to U.S. Patent Application No. xxx,xxx, entitled "Distributed Computer Database System and Method for Performing Object Search," by Kenneth P. Baclawski, filed together with it and assigned to the same assignee. The disclosure of that application is incorporated herein by reference.

[0002] The present invention relates to computer database systems, and more particularly to distributed computer database systems.

[0003]

DESCRIPTION OF THE RELATED ART Enterprises routinely collect large amounts of data about their customers, products, operations, and business activities. The insights hidden in this data can inform marketing, operating-cost, and strategic decisions. For example, suppose there is a strong correlation between customers who buy one product and customers who buy another. Then a customer who has bought only one of the products is a likely prospect for the other.

[0004] Data analysis is performed mainly with statistical techniques that extract correlations and other patterns from the data. This kind of processing has variously been called data mining, knowledge discovery, and knowledge extraction. Searching a large amount of data for a specific pattern, or for a specific type of pattern, will be referred to as a pattern query.

[0005] Large enterprises usually maintain many databases, many of which are transactional databases. The requirements of these databases often conflict with those of data mining. A transactional database is updated by small transactions that run in real time, whereas data mining uses large pattern queries that need not run in real time. To resolve this conflict, it is now common practice to download data from the various sources into a centralized resource called a data warehouse.

[0006] Downloading and centralizing data from various, sometimes heterogeneous, sources requires several tasks:

- The data must be extracted from the sources.
- It must be converted to a common integrated data model.
- It must be cleansed by eliminating or correcting erroneous or inaccurate data.
- It must be integrated into a central warehouse, where all the data is stored and which itself constitutes yet another database.

In addition, it must be verified that every instance of every business entity, such as a customer, product, or employee, has been correctly identified. This is known as the problem of referential integrity. All of these tasks are difficult to perform while maintaining referential integrity. This is especially true when downloading data from databases that identify business entities in slightly different ways. In current practice, data is downloaded to the data warehouse as an activity independent of data mining. Data mining has a large research literature and many commercial products. Data warehousing, in contrast, lacks a strong theoretical foundation, and there are few good commercial products.

[0007] A data warehouse integrates a wide variety of data sources. It is therefore necessary to specify an integrated data model for the data warehouse, together with data mappings that extract, transform, and cleanse data from each source. It is known in the art that rich data models, such as object-oriented data models, suit these purposes better than more restrictive data models such as the relational model. Nevertheless, most data warehouses use a flat record structure such as the relational model. Because relational databases have very limited data structures, synthesizing more complex data structures from them is difficult and error-prone. Several types of data are poorly suited to storage in relational databases:

- text in general, and hypertext documents in particular;
- images and sounds;
- multimedia objects;
- multivalued attributes.

Relational databases are also poorly suited to records that have a very large number of potential attributes, only a few of which are used in any given record.

[0008] An object database generally consists of a collection of data objects or information objects. Each information object is uniquely identified by an object identifier (OID). Each information object can have features, and some features can have associated values. An information object can also contain or reference other information objects.

[0009] Databases, including warehouse databases, use special search structures called indexes to support information retrieval. A large database requires a correspondingly large index structure to maintain pointers to the stored data, and such an index structure can be larger than the database itself. Current technology requires a separate index for each attribute or feature. This technique can be extended to index a small number of attributes or features in a single index structure. It does not work well, however, when there are hundreds or thousands of attributes. Maintaining an index structure also carries considerable overhead. This overhead limits the number of attributes or features that can be indexed, so the ones that are supported must be chosen carefully. In a transactional database the workload is usually well understood, so indexes can be chosen to optimize database performance. A data warehouse, by contrast, generally has no well-defined workload, which makes choosing which attributes to index much more difficult.

[0010] Further information on the above concepts can be found in the following publications.

[0011] 1. L. Aiello, J. Doyle, and S. Shapiro, editors. Proc. Fifth Intern. Conf. on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann Publishers, San Mateo, CA, 1996.
2. K. Baclawski. Distributed computer database system and method, December 1997. United States Patent No. 5,694,593. Assigned to Northeastern University, Boston, MA.
3. A. Del Bimbo, editor. The Ninth International Conference on Image Analysis and Processing, volume 1311. Springer, September 1997.
4. N. Fridman Noy. Knowledge Representation for Intelligent Information Retrieval in Experimental Sciences. PhD thesis, College of Computer Science, Northeastern University, Boston, MA, 1997.
5. M. Hurwicz. Take your data to the cleaners. Byte Magazine, January 1997.
6. Y. Ohta. Knowledge-Based Interpretation of Outdoor Natural Color Scenes. Pitman, Boston, MA, 1985.
7. A. Tversky. Features of similarity. Psychological Review, 84(4):327-352, July 1977.
8. S. Weiss and N. Indurkhya. Predictive Data Mining: A Practical Guide. Morgan Kaufmann Publishers, Inc., San Francisco, CA, 1998.
9. J.-L. Weldon and A. Joch. Data warehouse building blocks. Byte Magazine, January 1997.

[0012]

PROBLEM TO BE SOLVED BY THE INVENTION The disclosures of the publications referred to in the Background of the Invention are incorporated herein by reference. It would be desirable to provide an improved system for data warehousing and data mining that overcomes the performance problems and other limitations of current systems.

[0013]

SUMMARY OF THE INVENTION The present invention improves the foundation and support for data warehousing by integrating the two activities of data warehousing and data mining. The term "knowledge extraction" is used herein to refer to this integration of data warehousing and data mining activities.

[0014] The present invention relates to an information retrieval apparatus and method for processing queries from users, including, for example, queries that retrieve information from a data warehouse. The apparatus includes three mechanisms:

- a mechanism for locating (finding the storage locations of) features and feature fragments in an index database;
- an evaluation mechanism that identifies the subqueries at multiple levels contained in a query and iteratively (or recursively) evaluates them using each of the located features and feature fragments;
- a mechanism that, once a result has been computed for the query as a whole, collects and stores the results of the iterative (recursive) evaluations of the query and its subqueries.

[0015] As used herein, "evaluation" is the process by which a response to a query is generated. It is characterized by retrieving one of the following: the information that matches the criteria stated in the query, information location specifiers (specifiers of where that information is stored), or data about that information. Iterative evaluation is a kind of query evaluation in which new queries, called subqueries, are generated from a query and evaluated. The subqueries generated in this way are regarded as nodes of a query tree whose base node is the original query. Each subquery occupies a level in the tree, defined by its relationship to the predecessor (ancestor) query from which it was generated. All of the subqueries, both ancestor and child queries, are evaluated iteratively. Their results are collected, stored, and provided to the user as the response to the query.
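A minimal Python sketch can illustrate the query tree described above. It makes two assumptions purely for illustration. First, a query is a predicate over in-memory records. Second, each subquery is evaluated against the matches of the query it was generated from. The `Query` class and `evaluate` function are hypothetical names, not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Query:
    """Hypothetical query node: a predicate plus the subqueries generated from it."""
    name: str
    match: Callable[[dict], bool]
    children: List["Query"] = field(default_factory=list)

Result = Dict[str, Tuple[int, List[dict]]]

def evaluate(query: Query, records: List[dict], level: int = 0,
             results: Optional[Result] = None) -> Result:
    """Recursively evaluate a query tree, collecting (level, matches) per node.

    The base query is level 0. Each subquery sits one level below the query
    it was generated from and, by assumption, is evaluated against that
    query's matches.
    """
    if results is None:
        results = {}
    matches = [r for r in records if query.match(r)]
    results[query.name] = (level, matches)
    for child in query.children:
        evaluate(child, matches, level + 1, results)
    return results

customers = [
    {"name": "Ann", "city": "Boston", "buys": {"A", "B"}},
    {"name": "Bob", "city": "Boston", "buys": {"A"}},
    {"name": "Cy", "city": "Salem", "buys": {"B"}},
]
tree = Query("in_boston", lambda r: r["city"] == "Boston", [
    Query("bought_A", lambda r: "A" in r["buys"], [
        Query("not_B", lambda r: "B" not in r["buys"]),
    ]),
])
answer = evaluate(tree, customers)
```

Here the level-2 subquery picks out Boston customers who bought product A but not product B. That is the kind of prospect described in paragraph [0003].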

[0016] Conventional retrieval systems provide, within the data warehouse, a new, independent, centralized replica of the data held in various external databases. The present invention can eliminate the need for such a replica. It thereby avoids the problems of replication in conventional systems, such as data becoming stale or errors being introduced while the data is copied for warehousing. Instead, the data warehouse can contain an index database whose entries store data about the information held in the external databases. Examples are information location specifiers for data within those databases, and information and statistics about relations. The invention can also provide a robust and versatile indexing system. Its index supports, for example, the indexing of sparse records: records that have a large number of potential attributes, only a few of which are used in any particular record. The invention also supports indexing a very large number of attributes, for example within a substantially uniform data structure. This makes it much easier to determine the workload characteristics needed for high performance.
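The following is a minimal sketch of an index whose entries point into external databases rather than copying their records. It assumes that a location specifier is a hypothetical (database name, record key) pair and that every attribute value is hashable. `FeatureIndex` is an illustrative name only. A single structure keyed by (attribute, value) handles sparse records naturally, because an attribute that a record lacks produces no entry.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple

# Hypothetical location specifier: (external database name, record key).
Location = Tuple[str, str]

class FeatureIndex:
    """One inverted index keyed by (attribute, value) pairs.

    Only attributes a record actually has create entries, so a single
    uniform structure can cover thousands of sparsely used attributes.
    """

    def __init__(self) -> None:
        self._postings: DefaultDict[Tuple[str, object], Set[Location]] = defaultdict(set)

    def add(self, location: Location, record: Dict[str, object]) -> None:
        for attr, value in record.items():
            self._postings[(attr, value)].add(location)

    def lookup(self, attr: str, value: object) -> Set[Location]:
        # .get() avoids creating empty postings for keys never indexed.
        return set(self._postings.get((attr, value), set()))

idx = FeatureIndex()
idx.add(("crm", "c17"), {"name": "Ann", "city": "Boston"})
idx.add(("billing", "9001"), {"city": "Boston", "plan": "gold", "fax": "555-0100"})
boston = idx.lookup("city", "Boston")
```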

[0017] More specifically, according to one aspect of the present invention, a distributed computer database system comprises one or more front-end computers and one or more computer nodes. These are interconnected by a network so as to form a data warehouse and data mining engine. The system indexes objects, including images, sound streams, and video streams, as well as plain text and structured text. Objects from external databases are downloaded over the network by nodes called warehouse nodes. A warehouse node extracts a number of features from an object; features extracted from an object are called object features. It fragments the extracted features into a number of feature fragments and hashes these feature fragments. Each hashed feature fragment is sent to one node on the network, called an index node. Each node that receives a hashed feature fragment uses it to search the corresponding partition of the index database. The warehouse node collects the results of these local database searches and uses them to determine whether the object has already been indexed in the data warehouse. The warehouse node then extracts features from the object, fragments the features, and hashes the feature fragments. Each hashed feature fragment is sent to one node on the network. Each node that receives one stores the feature in the corresponding partition of the index database, using the object's hashed feature fragment.
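The fragment-hashing and partitioning scheme can be simulated in a single Python process. This is a sketch under several stated assumptions:

- Fragments are already-extracted strings.
- SHA-256 stands in for the unspecified hash function.
- The first four bytes of the digest choose among a hypothetical `NUM_NODES` index nodes.
- Each node's partition is an in-memory dictionary.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set

NUM_NODES = 4  # hypothetical number of index nodes

def hash_fragment(fragment: str) -> bytes:
    return hashlib.sha256(fragment.encode("utf-8")).digest()

def node_for(digest: bytes) -> int:
    # A portion of the hash (its first 4 bytes) selects the index node.
    return int.from_bytes(digest[:4], "big") % NUM_NODES

# One partition per index node: hashed fragment -> OIDs of objects holding it.
partitions: List[DefaultDict[bytes, Set[int]]] = [defaultdict(set) for _ in range(NUM_NODES)]

def store(oid: int, fragments: List[str]) -> None:
    for frag in fragments:
        digest = hash_fragment(frag)
        partitions[node_for(digest)][digest].add(oid)

def search(fragments: List[str]) -> Dict[int, int]:
    """Return OID -> number of the given fragments found for that object."""
    hits: DefaultDict[int, int] = defaultdict(int)
    for frag in fragments:
        digest = hash_fragment(frag)
        for oid in partitions[node_for(digest)].get(digest, set()):
            hits[oid] += 1
    return dict(hits)

store(1, ["name:ann lee", "email:ann@example.com", "zip:02115"])
store(2, ["name:bob ray", "zip:02115"])
found = search(["email:ann@example.com", "zip:02115", "name:cy fox"])
```

The search returns, for each OID, how many of the query's fragments it shares. A warehouse node could feed these counts into a similarity test to decide whether the object is already indexed.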

[0018] The query may be, for example, a pattern query, that is, a search for patterns in the data. A pattern query from a user is sent to one of the front-end computers. That computer forwards it to one of the index nodes, called the home node of the data mining engine. The home node decomposes the pattern query into one or more subqueries. Each subquery is stored in memory and comprises object features (features relating to the object) and a computer-executable program that implements a method, for example a computation. The computation may involve another subquery. The home node fragments each subquery feature into one or more subquery feature fragments and hashes them. Each subquery feature fragment is sent to one node on the network according to its hash. Each node that receives a subquery searches the corresponding partition of the index database using the subquery's hashed feature fragments. The data accessed is used by the subquery's computation. A computation may include zero or more further subqueries. If it does include any, those subqueries are evaluated iteratively, and the data obtained from that iterative evaluation is used by the subquery's computation. The home node collects the results of the local index database searches and of all the iterative evaluations. It then determines the result of the pattern query and returns it to the user.

[0019] In another aspect of the invention, a distributed computer database system comprises one or more front-end computers and one or more computer nodes. These are interconnected by a network and configured to function as a knowledge extraction engine. The system supports both data warehousing and data mining activities.

[0020] Consider first the data warehousing activity (storing data in the warehouse). Warehouse nodes download objects from other databases into the warehouse. For each downloaded object, the warehouse node first determines whether the object is already represented in the data warehouse, for example through a download from yet another database. To make this determination, the warehouse node extracts one or more features of the object. It fragments each object feature into a plurality of feature fragments and hashes each feature fragment. It then sends each hashed object feature fragment to an index node on the network, using a portion of the hashed fragment as an addressing index. Each index node that receives a hashed object feature fragment uses it to search the corresponding index database. A node that finds data corresponding to the hashed object feature returns the OIDs (object identifiers) of the warehouse objects that contain this feature fragment. The warehouse node collects these OIDs and computes a similarity function, which determines whether the object is already stored in the data warehouse. If the object is already represented in the data warehouse, the OID of that warehouse object is used for the downloaded object. Otherwise, a unique OID is selected for the object. The warehouse node then extracts the object's features, fragments them, and hashes the feature fragments. It sends the hashed object feature fragments to the index nodes on the network at which the features are stored in the data warehouse. Again, a portion of each hashed fragment serves as the addressing index.
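The patent does not define its similarity function. One plausible choice, shown here as an assumption, is Tversky's ratio model (reference 7) with both weights set to 1. With those weights it reduces to the Jaccard coefficient over feature fragments. The 0.6 threshold, the OID counter starting at 1000, and the function names are all hypothetical.

```python
from itertools import count
from typing import Dict, Optional

_oids = count(1000)  # hypothetical allocator of fresh, unique OIDs

def similarity(shared: int, n_new: int, n_stored: int) -> float:
    """Tversky ratio model with alpha = beta = 1 (i.e. Jaccard) on fragment counts."""
    only_new = n_new - shared
    only_stored = n_stored - shared
    denom = shared + only_new + only_stored
    return shared / denom if denom else 0.0

def resolve_oid(hits: Dict[int, int], n_new: int,
                stored_sizes: Dict[int, int], threshold: float = 0.6) -> int:
    """Reuse the best-matching warehouse OID, or allocate a new one."""
    best_oid: Optional[int] = None
    best_score = 0.0
    for oid, shared in hits.items():
        score = similarity(shared, n_new, stored_sizes[oid])
        if score > best_score:
            best_oid, best_score = oid, score
    if best_oid is not None and best_score >= threshold:
        return best_oid      # already represented: reuse the warehouse OID
    return next(_oids)       # not represented: select a unique new OID

sizes = {1: 4, 2: 2}                          # fragments stored per warehouse object
same = resolve_oid({1: 3, 2: 1}, 3, sizes)    # scores 3/4 = 0.75 and 1/4 = 0.25
fresh = resolve_oid({2: 1}, 3, sizes)         # 0.25 is below the threshold
```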

[0021] Next, consider the data mining activity. A user who wishes to evaluate a query, for example to search for a pattern in the data, sends the query to one of the front-end computers. That computer forwards the query to one of the index nodes of the network. The node that receives the query, called the home node of the data warehouse, decomposes it into one or more subqueries. Each subquery includes features and a computer-executable program that implements a method such as a computation, which can itself include further subqueries. The home node stores the subqueries. It fragments the features of each subquery into one or more subquery feature fragments and hashes each of them. It then sends each subquery to a node on the network, using a portion of each hashed feature fragment as an addressing index. Each index node that receives a subquery uses the hashed subquery features to search the corresponding index database. A node that finds data corresponding to a hashed subquery feature fragment performs the computation specified in the subquery. If the computation contains no other subqueries, its result is returned to the home node. If it does contain another subquery, the node assumes the role of home node for the subqueries included in the computation. That is, it hashes the feature fragments of each included subquery and sends the subquery to other nodes. This process continues iteratively until the computations are complete and the final result is returned to the original home node. On receiving the computation results, the home node performs the remaining processing of the data set specified in the original pattern query. It sends the resulting information to the front-end node, which formats a response and sends it to the user.
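The delegation of nested subqueries can be sketched as follows. The sketch rests on three assumptions, none of whose names come from the patent:

- Each subquery carries a single, already-fragmented feature.
- The first byte of its SHA-256 digest, taken modulo a hypothetical `NUM_NODES`, serves as the addressing index.
- `compute` is an arbitrary Python callable that receives the node's local data and the results of any nested subqueries.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

NUM_NODES = 3  # hypothetical cluster size

@dataclass
class SubQuery:
    feature: str                                      # feature used for routing and lookup
    compute: Callable[[List[int], List[int]], int]    # (local data, nested results) -> value
    nested: List["SubQuery"] = field(default_factory=list)

def address(feature: str) -> int:
    # First byte of the digest, reduced mod NUM_NODES, is the addressing index.
    return hashlib.sha256(feature.encode("utf-8")).digest()[0] % NUM_NODES

# Each node's local index partition: feature -> data values.
local_data: Dict[int, Dict[str, List[int]]] = {n: {} for n in range(NUM_NODES)}
trace: List[Tuple[int, str]] = []  # (node, feature) in evaluation order

def put(feature: str, values: List[int]) -> None:
    local_data[address(feature)][feature] = values

def dispatch(sq: SubQuery) -> Optional[int]:
    """Send a subquery to the node its feature hashes to."""
    return evaluate_at(address(sq.feature), sq)

def evaluate_at(node: int, sq: SubQuery) -> Optional[int]:
    data = local_data[node].get(sq.feature)
    if data is None:
        return None  # no matching data on this node
    trace.append((node, sq.feature))
    # For nested subqueries this node acts as the home node: it dispatches
    # each one and waits for its result before running its own computation.
    nested = [dispatch(child) for child in sq.nested]
    return sq.compute(data, [r for r in nested if r is not None])

put("sales:2024", [5, 7, 9])
put("returns:2024", [1, 2])
query = SubQuery("sales:2024", lambda data, sub: sum(data) - sum(sub),
                 [SubQuery("returns:2024", lambda data, sub: sum(data))])
net = dispatch(query)  # (5 + 7 + 9) - (1 + 2) = 18
```

In a real deployment, `dispatch` would be a network call rather than a local function call. The recursion, however, has the same shape as the process described above.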

[0022]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS FIG. 1 is a general schematic diagram of one embodiment 100 of a distributed computer database system according to the invention. The embodiment 100 includes a user computer 102 that communicates with a front-end computer 104, for example over a network 106. Alternatively, the front-end computer 104 may itself be the user computer. The front-end computer 104 in turn communicates with a data warehouse and data mining engine. The engine includes one or more computer nodes 106, 108 interconnected by a local area network 110. Each computer node 106, 108 may include a local disk 112. Alternatively or additionally, it may obtain data from a network disk server (not shown).

[0023] The computer nodes 106, 108 of the data warehouse are of several kinds, including index nodes 106 and warehouse nodes 108. The nodes 106, 108 need not be separate computers. In one embodiment, the data warehouse is implemented as a single computer that performs all the roles of the index nodes 106 and the warehouse nodes 108. In another embodiment, it is implemented with a separate computer for each index node 106 and each warehouse node 108. Those skilled in the art will recognize that many variations are possible within the scope and spirit of the invention.

[0024] Consider first an exemplary method 200 for downloading an object, shown in FIG. 2. In one embodiment, one or more warehouse nodes 108 download an object from an external database 201 (step 201). The object may already be represented in the data warehouse, for example as a result of a previous download from another database. To identify the object, the warehouse node 108 extracts a number of features from it, as specified by the integrated data model of the data warehouse. For example, a person may be identified by an employee ID, account number, name, address, telephone number, or e-mail address, or by some combination of these.

[0025] Various feature extraction techniques can be used. For a relational attribute value, such as the date of a transaction, the range of possible values can be partitioned into a set of contiguous, non-overlapping ranges. Partitioning field values in this way is called discretization. The actual value may also be included in the index entry.
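Discretization of a transaction date can be sketched as follows. This is a minimal illustration, not the patent's implementation: the quarterly boundaries and the names `BOUNDARIES`, `discretize` and `index_feature` are hypothetical, chosen only to show contiguous, non-overlapping ranges with the actual value optionally kept alongside the range id.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical quarterly boundaries; range i covers [BOUNDARIES[i], BOUNDARIES[i + 1]).
BOUNDARIES = [
    date(2000, 1, 1),
    date(2000, 4, 1),
    date(2000, 7, 1),
    date(2000, 10, 1),
    date(2001, 1, 1),
]


def discretize(value, boundaries=BOUNDARIES):
    """Return the index of the range containing value, or None if it lies outside all ranges."""
    i = bisect_right(boundaries, value) - 1
    if 0 <= i < len(boundaries) - 1:
        return i
    return None


def index_feature(field, value, boundaries=BOUNDARIES, keep_value=True):
    """Build an index feature: field name, range id and, optionally, the actual value."""
    return (field, discretize(value, boundaries), value if keep_value else None)
```

Because each range is closed on the left and open on the right, every in-range date falls into exactly one range, so two records whose dates share a quarter produce the same range feature.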

[0026] Features are extracted from a structured document by parsing (decomposing) the document into a data structure and then dividing that data structure into (possibly overlapping) substructures called fragments. A fragment associated with a subquery is called a probe, because it is used to find matching fragments in the database.
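One way to picture overlapping fragments of a structured document is sketched below for XML. The fragment scheme (one fragment per element with text, one per parent-child relationship) is an assumption made for illustration; the patent leaves the exact substructures to the integrated data model. Because an element takes part in several fragments, the fragments overlap.

```python
import xml.etree.ElementTree as ET


def fragments(xml_text):
    """Split a parsed XML document into overlapping fragments.

    Hypothetical fragment scheme:
      ("node", tag, text)              for every element with non-empty text
      ("edge", parent_tag, child_tag)  for every parent-child relationship
    """
    root = ET.fromstring(xml_text)
    result = set()
    for parent in root.iter():  # iter() visits the root and all descendants
        text = (parent.text or "").strip()
        if text:
            result.add(("node", parent.tag, text))
        for child in parent:
            result.add(("edge", parent.tag, child.tag))
    return result
```

The same function applied to a query document would yield the probes: the query's fragments are matched against fragments previously stored in the database.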

[0027] Features extracted from an unstructured document are organized into a data structure containing a set of interrelated substructures. As with a structured document, this data structure is divided into (possibly overlapping) component substructures, and these component substructures are the fragments of the unstructured document.

[0028] A wide variety of feature extraction algorithms have been developed for media such as sound, images, and video streams; for images, for example, there are edge detection, segmentation, and object classification algorithms. Fourier transforms, wavelet transforms, and many filtering algorithms are also used to extract features from images and sound. Features can also be added to objects by manual or semi-automatic means; such added features are called annotations or metadata. Depending on whether an annotation is a relational database record, a structured document, or an unstructured document, features are extracted from it using one of the techniques described above. If a feature has values associated with it, those values are discretized. Relationships between features can also be specified; for example, one feature may be contained in, or adjacent to, another feature. The integrated data model specifies the feature extraction algorithms as well as the structure of the features.

[0029] The warehouse node 108 encodes each feature fragment of the object using a predetermined hash function. Data in the system has previously been stored locally at the various index nodes using this same hash function to generate the index into each local database. Using the same hash function both to generate the index for stored data and to generate the hashed probes for an object ensures that, when data is stored, it is distributed uniformly across the index nodes 106 of the data warehouse.

[0030] In one embodiment, the hash value produced by the hash function has a first part, which identifies the index node to which data is to be sent for storage, or to which a feature fragment is to be sent as a probe. The hash value also has a second part, a local index value, which determines the location in memory at which data is stored in, or retrieved from, that index node. The hashed object feature fragments are thus distributed as probes to the particular index nodes 106 of the data warehouse determined by the first part of the hash value (step 202).
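The two-part hash value can be sketched as below. This is a minimal illustration under stated assumptions: `NUM_INDEX_NODES`, `LOCAL_TABLE_SIZE` and `split_hash` are hypothetical names, and MD5 from Python's `hashlib` stands in for the MD4 digest named later in the text, because MD4 is not guaranteed to be available in every `hashlib` build.

```python
import hashlib

NUM_INDEX_NODES = 8          # hypothetical number of index nodes
LOCAL_TABLE_SIZE = 1 << 20   # hypothetical number of slots in each local hash table


def split_hash(fragment: bytes, num_nodes=NUM_INDEX_NODES, table_size=LOCAL_TABLE_SIZE):
    """Hash a feature fragment and split the digest into (index node, local index).

    MD5 is a stand-in for MD4 here; any digest with well-mixed bits would do.
    """
    digest = hashlib.md5(fragment).digest()         # 16 bytes
    first = int.from_bytes(digest[:8], "big")       # first part: destination index node
    second = int.from_bytes(digest[8:], "big")      # second part: slot in that node's table
    return first % num_nodes, second % table_size
```

Because the same function is used when storing data and when building probes, a probe for a fragment always lands on the index node, and in the slot, where matching entries were stored.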

[0031] An index node 106 whose stored entries match a probe, that is, where the hashed feature fragment used when data was originally stored at that index node matches the probe, responds to the retrieval message by sending the OIDs that match the hashed feature fragment to the warehouse node 108 (step 203). In other words, every match between a hashed probe and the index node's local hash table of hashed feature fragments is returned to, and collected at, the warehouse node 108 that originally hashed the object feature fragment.

[0032] The warehouse node 108 then determines whether one of the returned OIDs represents the same object as the object to be stored in the warehouse. It does so by comparing the degree of similarity between the object to be stored and each object whose OID was returned. In one embodiment, the similarity measure is determined by the features the two objects have in common and by the features of the object to be stored that are not features of the object whose OID was returned.

[0033] This similarity measure can be based on Tversky's Feature Contrast Model, referenced above. The first term adds a positive amount to the similarity value and the second term subtracts an amount from it. Further, the second term is multiplied by a predetermined constant so that the second set of features has less influence on the similarity than the first set.
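The two-term measure can be written as S(A, B) = f(A ∩ B) − α·f(A − B), where A is the feature set of the object being stored, B that of an object whose OID was returned, and α is the predetermined constant. The sketch below assumes f is simply set size and α = 0.5; both choices, the threshold, and the names `contrast_similarity` and `best_match` are hypothetical illustrations rather than values from the text.

```python
def contrast_similarity(stored, candidate, alpha=0.5):
    """Two-term feature-contrast similarity.

    stored:    set of feature fragments of the object being stored (A)
    candidate: set of feature fragments of an object whose OID was returned (B)
    alpha:     hypothetical constant; alpha < 1 weights distinctive features less
    Set size is assumed as the salience function f.
    """
    common = len(stored & candidate)        # positive term: f(A ∩ B)
    distinctive = len(stored - candidate)   # negative term: f(A − B)
    return common - alpha * distinctive


def best_match(stored, candidates, threshold, alpha=0.5):
    """Return the OID whose features score highest, if it reaches threshold; else None."""
    if not candidates:
        return None
    oid, features = max(candidates.items(),
                         key=lambda kv: contrast_similarity(stored, kv[1], alpha))
    if contrast_similarity(stored, features, alpha) < threshold:
        return None
    return oid
```

A `None` result corresponds to the case in which the object is not yet represented in the warehouse and a new OID must be chosen.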

[0034] If the object is determined to be represented in the data warehouse, an OID is already available for it. If not, a unique OID is chosen for the object.

[0035] The warehouse node 108 then extracts all of the object's features according to the data warehouse's integrated data model, using the feature extraction techniques described above. The warehouse node 108 fragments (divides) each feature into feature fragments and encodes each feature fragment of the object using the predetermined hash function, as described above. In one embodiment, the hash value obtained from the hash function has a first part, used to identify the index node to which the data is to be sent for storage (step 204), and a second part, a local index value, used to determine where in that index node the data is to be stored (step 205).

[0036] Next, consider an exemplary method 300 for processing a query. Referring to FIG. 3, a user sends a query from the user computer 102 (step 301). In one embodiment, the front-end computer 104 receives the query. The front-end computer 104 is responsible for establishing a connection with the user computer 102 so that the user can send objects and receive responses in an appropriate format. The front-end computer 104 is also responsible for some authentication and administrative functions. In one embodiment, the front-end computer 104 is a World Wide Web server that communicates with the user computer 102 using the HTTP protocol.

[0037] After confirming that the query is acceptable, the front-end computer 104 performs any reformatting needed to adapt the query to the requirements of the data warehouse. The front-end computer 104 then sends the query to one of the index nodes 106 of the data warehouse (step 302); that index node is defined as the home node 107 of the data warehouse for the query.

[0038] The home node 107 decomposes the query into one or more subqueries. Each subquery has one feature and specifies a computer-executable method, such as a computation, which determines what action the subquery performs. The most common computations are statistical functions that aggregate information stored in the data warehouse. A computation can include similarity criteria, such as the minimum strength required to accept a match, and statistical calculations, such as a mean or standard deviation. A computation can also contain other subqueries.

[0039] For each subquery, the home node 107 fragments the subquery feature into subquery feature fragments and encodes them using the predetermined hash function, as described above. The home node then sends the hashed feature fragments and the subquery to the index nodes identified by the hashed feature fragments (step 303).

[0040] An index node 106 responds to a subquery when a hashed feature fragment matches an index feature fragment used when data was originally stored at that index node: it retrieves the matching data from the index entries in its local hash table and performs the computation specified in the subquery. If the computation contains further subqueries, called component subqueries, the index node acts as the home node for these new queries, which are processed as described above (step 304). For example, a subquery might be used to find the sales of other products that correlate with each customer who purchased an appliance last month. Whether or not the computation contains another subquery, the index node returns the result of the computation to the home node 107 from which it received the subquery (step 305).

[0041] Upon receiving the results of all the subqueries of the original query, the home node 107 performs any data aggregation specified by the original query, such as computing a mean or standard deviation, and returns the resulting information to the user. In one embodiment, the returned information is sent to the front-end computer 104 (step 306), which formats the response appropriately and then sends it to the user (step 307). In another embodiment, the information is sent directly to the user computer 102 over the network 105, for example without passing through the front-end computer 104.
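The text does not say what shape the per-node results take before the home node aggregates them. One common scheme, assumed here purely for illustration, is for each index node to return a (count, sum, sum of squares) triple so that the home node can compute an overall mean and population standard deviation without seeing the raw values. The names `partial` and `combine` are hypothetical.

```python
import math
from typing import Iterable, Tuple

Partial = Tuple[int, float, float]   # (count, sum, sum of squares) from one index node


def partial(values: Iterable[float]) -> Partial:
    """Summary an index node could return for the values it matched (hypothetical)."""
    vals = list(values)
    return len(vals), sum(vals), sum(v * v for v in vals)


def combine(partials: Iterable[Partial]) -> Tuple[float, float]:
    """Merge per-node partials into the overall mean and population standard deviation."""
    n, s, ss = 0, 0.0, 0.0
    for count, total, total_sq in partials:
        n += count
        s += total
        ss += total_sq
    if n == 0:
        raise ValueError("no values to aggregate")
    mean = s / n
    variance = max(ss / n - mean * mean, 0.0)   # clamp tiny negative rounding error
    return mean, math.sqrt(variance)
```

The sum-of-squares formula is simple but can lose precision for large values with small spread; a production system might prefer a numerically stabler merge of per-node means and variances.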

[0042] Consider now the message formats used in the preferred embodiment. Referring to FIG. 4a, an exemplary format of a warehouse message contains four fields: a header 402, an object identifier (OID) 403, a hashed object fragment (HOF) 404, and a value 405. The header field 402 specifies that the message is a warehouse message and also specifies the destination index node, which is determined by the first part of the hashed object fragment. The OID field 403 contains an object type specifier and an object identifier. The HOF field 404 contains a fragment type specifier and the second part of the hashed object fragment created by the hash module (see FIG. 5). The value field 405 contains an optional value associated with the fragment. The fragment type specifier determines whether the warehouse message includes a value field 405 and, if it does, the size of the value field.
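The patent does not give a wire encoding, so the following is only a sketch of how the four warehouse-message fields might be packed. Every field width, the message-type code `WAREHOUSE_MSG`, and the table `VALUE_SIZE_BY_FRAGMENT_TYPE` are hypothetical. What the sketch does reflect from the text is that the fragment type specifier alone decides whether a value field is present and how large it is.

```python
import struct
from dataclasses import dataclass
from typing import Optional

WAREHOUSE_MSG = 1  # hypothetical message-type code carried in the header

# Hypothetical table: fragment type -> size in bytes of its value field (0 = no value field)
VALUE_SIZE_BY_FRAGMENT_TYPE = {0: 0, 1: 8}


@dataclass
class WarehouseMessage:
    dest_node: int                 # header: destination index node (first part of the hash)
    object_type: int               # OID field: object type specifier
    object_id: int                 # OID field: object identifier
    fragment_type: int             # HOF field: fragment type specifier
    local_hash: int                # HOF field: second part of the hashed fragment
    value: Optional[bytes] = None  # optional value; presence and size set by fragment_type

    # Not a dataclass field (no annotation): big-endian msg type, dest node,
    # object type, object id, fragment type, local hash = 23 bytes.
    _FIXED = struct.Struct(">BHHQHQ")

    def encode(self) -> bytes:
        size = VALUE_SIZE_BY_FRAGMENT_TYPE[self.fragment_type]
        value = self.value or b""
        if len(value) != size:
            raise ValueError("value length does not match the fragment type")
        return self._FIXED.pack(WAREHOUSE_MSG, self.dest_node, self.object_type,
                                self.object_id, self.fragment_type, self.local_hash) + value

    @classmethod
    def decode(cls, data: bytes) -> "WarehouseMessage":
        msg_type, dest, otype, oid, ftype, lhash = cls._FIXED.unpack_from(data)
        if msg_type != WAREHOUSE_MSG:
            raise ValueError("not a warehouse message")
        size = VALUE_SIZE_BY_FRAGMENT_TYPE[ftype]
        end = cls._FIXED.size + size
        if len(data) < end:
            raise ValueError("message truncated")
        value = data[cls._FIXED.size:end] if size else None
        return cls(dest, otype, oid, ftype, lhash, value)
```

An insert message (FIG. 4c) has the same four-field layout, so under these assumptions it would differ only in the message-type code placed in the header.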

[0043] Referring to FIG. 4b, an exemplary format of a warehouse response message has two parts: an identifier and a value. The identifier part has four fields: a header 406, OID1 407, OID2 408, and a weight 409. The header field 406 specifies that the message is a warehouse response message and also specifies the destination warehouse node, which is the warehouse node that sent the corresponding warehouse message. The two OID fields 407, 408 each contain an object type specifier and an object identifier. The first OID field 407 is the same as the OID field 403 of the corresponding warehouse message. The second OID field 408 identifies a previously indexed object. The weight field 409 contains an optional weight associated with the object identified by OID1 407. The object type specifier of OID1 determines whether the warehouse response message includes a weight field and, if it does, the size of that field. The value part of the warehouse response message contains a number of fields 410 holding data related to the object identified by OID2 408. The structure and size of the value part are determined by the object type specifier of OID2.

[0044] Referring to FIG. 4c, an exemplary format of an insert message has four fields: a header 411, an OID 412, a HOF 413, and a value 414. The header field 411 specifies that the message is an insert message and also specifies the destination index node, which is determined by the first part of the hashed object fragment. The OID field 412 contains an object type specifier and an object identifier. The HOF field 413 contains a fragment type specifier and the second part of the hashed object fragment created by the hash module (see FIG. 5). The value field 414 contains an optional value associated with the fragment. The fragment type specifier determines whether the insert message includes a value field 414 and, if it does, the size of the value field.

[0045] Referring to FIG. 4d, an exemplary format of a subquery message has two parts: an identifier and a subquery. The identifier part has four fields: a header 415, a subquery identifier (QSID) 416, a hashed query fragment (HQF) 417, and a value 418. The header field 415 specifies that the message is a subquery message and also specifies the destination index node, which is determined by the first part of the hashed query fragment. The QSID field 416 contains a query type specifier and a subquery identifier. The HQF field 417 contains a fragment type specifier and the second part of the hashed subquery fragment created by the hash module (see FIG. 5). The value field 418 contains an optional value associated with the fragment. The fragment type specifier determines whether the subquery message includes a value field 418 and, if it does, the size of the value field. The subquery part of the subquery message can contain a number of subqueries. A subquery message that contains no subqueries is called a simple subquery message.

[0046] Referring to FIG. 4e, an exemplary embodiment of a subquery response message has two parts: an identifier and a value. The identifier part has two fields: a header 420 and a QSID 421. The header field 420 specifies that the message is a subquery response message and also specifies the destination index node, which is the index node that sent the corresponding subquery message. The QSID field 421 contains a query type specifier and a subquery identifier. The value part of the subquery response message has a number of fields 422 that hold the result data of the subquery. The structure of the value part is specified by the query type specifier.

[0047] Each node of the distributed computer database system includes a communication module responsible for sending and receiving messages between nodes, as described below and shown in FIGS. 5 and 6. Sending a message requires (1) queuing the message before it is transmitted over the communication medium, (2) actually transmitting it over the communication medium, and (3) when the message is received, queuing a task to process it in the module determined by the message type. The message type determines the command issued to the receiving module, and the command determines how the module processes the message. The destination node to which a message is sent is specified in the header field of each message. When a message is received from another node, its type determines which module processes it; the message type is also specified in the header field of each message. The communication module of the home node is also responsible for communicating with a number of front-end nodes. A front-end node sends a query to the home node, and the home node sends results, such as graphs or formatted tables, back to the front-end node.
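The three stages of sending a message can be sketched in a single process as follows. This is an illustration under assumptions: the in-memory `network` dictionary stands in for the communication medium, headers are plain dictionaries with hypothetical `"dest"` and `"type"` keys, and the class and method names are invented for the sketch.

```python
from collections import deque


class CommunicationModule:
    """Single-process sketch of the send/receive path (all names hypothetical)."""

    def __init__(self, node_id, network):
        self.node_id = node_id
        self.network = network    # dict: node id -> CommunicationModule (stands in for the medium)
        self.outbox = deque()     # stage (1): messages queued before transmission
        self.tasks = deque()      # stage (3): processing tasks queued on receipt
        self.handlers = {}        # message type -> module (callable) that processes it

    def register(self, msg_type, handler):
        self.handlers[msg_type] = handler

    def send(self, header, body):
        self.outbox.append((header, body))

    def flush(self):
        # Stage (2): "transmit" each queued message to the node named in its header.
        while self.outbox:
            header, body = self.outbox.popleft()
            self.network[header["dest"]].receive(header, body)

    def receive(self, header, body):
        # The message type in the header selects the module that will process it.
        self.tasks.append((self.handlers[header["type"]], body))

    def run_tasks(self):
        while self.tasks:
            handler, body = self.tasks.popleft()
            handler(body)
```

Decoupling receipt from processing with a task queue lets a node keep accepting messages while earlier ones are still being handled.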

[0048] Considering exemplary embodiments of the nodes described above, and referring to FIG. 5, a warehousing node (warehouse node) 500 can have a downloader 502 that scans external databases and downloads objects so that the knowledge extraction engine can warehouse and index them. Each warehouse node 500 may have downloaders 502 of different types. For example, one type of downloader can download data from a relational database using a standard SQL protocol, such as ODBC, or a proprietary protocol defined by the relational database vendor; in this case the download is performed with one or more SQL queries. In another example, the downloader may be an Information and Content Exchange (ICE) subscriber that negotiates over the Internet to obtain content from syndicators. This is a suitable mechanism for acquiring time-sensitive content such as newspaper articles. The downloader 502 sends the objects to the feature extractor 504.

[0049] The feature extractor 504 extracts features from objects. If the object is a record in a relational database, feature extraction includes steps such as selecting the fields to be indexed, reformatting fields, and removing or correcting data determined to be erroneous. Feature extraction for images is performed by edge detection, identification of image objects, and determination of the relationships between image objects. In another embodiment, feature extraction for an image is performed by computing a Fourier transform or a wavelet transform, each of which constitutes one extracted feature. Features are indexed using a number of insert messages.

[0050] The feature extractor 504 also maps each object identifier of an external database to an object identifier of the knowledge extraction engine. Each external database may have its own mechanism for assigning object identifiers, so features of the same object may be stored in different external databases under different object identifiers. For example, one external database may use social security numbers while another uses employee identifiers. The mapping from external object identifiers is performed using a number of warehouse messages.

[0051] The fragmenter 506 computes the fragments contained in each feature. Each fragment contains a finite set of related components of the feature. In one embodiment, the fragments of a feature include each attribute and each relationship in the data structure that defines the feature. For an object in the form of a relational database record, the features are the attributes selected, reformatted, and corrected by the feature extractor 504. The fragments are forwarded to the hash module.

[0052] The hash module 508 computes a hash function of each fragment. In one embodiment, the hash function is the MD4 message digest algorithm described in RFC (Request For Comments) 1186, published by the Network Working Group of the IETF (Internet Engineering Task Force) in October 1990 and available on the Internet or from R. Rivest, MIT Laboratory for Computer Science, Cambridge, MA, USA. The hash module 508 forwards either a warehouse message or an insert message to the communication module 510, depending on whether the purpose of the fragment is to map an object identifier or to index a feature of the object, respectively.

[0053] The similarity comparator 512 receives warehouse response messages from the communication module 510 and creates insert messages, which are forwarded to the communication module 510. The similarity comparator 512 collects all the responses from the warehouse for the object whose identifier is being mapped. For each object in the responses, the similarity comparator 512 determines the relevance of each object identifier returned by the search. It does so by comparing the degree of similarity between the object whose identifier is being mapped by the warehouse node and the object whose OID was returned. In one embodiment, the similarity measure between a query and an object is the cosine measure, given by cos(v, w), where the vector v represents the query and the vector w represents the object. These vectors lie in a space in which each fragment represents one dimension. If a matching OID is found, that OID is used as the mapped object identifier and is forwarded to the feature extractor 504. If no matching OID is found, a new object identifier is chosen and forwarded to the feature extractor 504.
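The cosine measure over a space with one dimension per fragment can be sketched with sparse vectors. The fragment counts used as coordinates, the acceptance threshold, and the helper `map_identifier` with its `new_oid` callback are assumptions made for this illustration; the text does not state how coordinates are weighted or what counts as a "matching" OID.

```python
import math
from collections import Counter


def cosine(v: Counter, w: Counter) -> float:
    """cos(v, w) for sparse vectors keyed by fragment (one dimension per fragment)."""
    dot = sum(count * w[frag] for frag, count in v.items())  # missing keys count as 0
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    norm_w = math.sqrt(sum(c * c for c in w.values()))
    if norm_v == 0 or norm_w == 0:
        return 0.0
    return dot / (norm_v * norm_w)


def map_identifier(obj: Counter, candidates: dict, threshold: float, new_oid) -> str:
    """Return the best-scoring returned OID if it reaches threshold, else a new OID."""
    best_oid, best_score = None, 0.0
    for oid, vec in candidates.items():
        score = cosine(obj, vec)
        if best_oid is None or score > best_score:
            best_oid, best_score = oid, score
    if best_oid is not None and best_score >= threshold:
        return best_oid
    return new_oid()
```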

[0054] Referring now to FIG. 6, an index node 600 can have a fragment table module 602 that receives warehouse messages, insert messages, and simple subquery messages from the communication module 604. For a warehouse message, the fragment table module 602 uses the hash value in the HOF field to retrieve an entry from the local hash table 603. The type specifier in the HOF field and this entry of the local hash table are forwarded to the fragment comparator 606, a component that measures the similarity between fragments. For a simple subquery message, the fragment table module 602 uses the hash value in the hashed query fragment field to retrieve an entry from the local hash table 603; the entry is returned to the query processor 608 in a subquery response message. For an insert message, the fragment table module 602 modifies an entry of the local hash table 603 by adding the OID field and the value field of the insert message to that entry.
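The three cases handled by the fragment table module can be sketched as below. This is a minimal illustration, assuming each slot of the local hash table holds a list of (OID, value) pairs keyed by the second part of the hash; the class and method names are hypothetical, and the fragment comparator is passed in as a plain callable.

```python
from collections import defaultdict


class FragmentTable:
    """Sketch of an index node's local hash table 603 (names hypothetical).

    Each slot, keyed by the second part of the hash, holds (oid, value) pairs.
    """

    def __init__(self):
        self.table = defaultdict(list)

    def on_insert(self, local_hash, oid, value=None):
        # Insert message: add the OID and value fields to the entry.
        self.table[local_hash].append((oid, value))

    def on_warehouse(self, local_hash, fragment_type, comparator):
        # Warehouse message: pass the type specifier and the entry to the comparator.
        return comparator(fragment_type, list(self.table.get(local_hash, [])))

    def on_simple_subquery(self, local_hash):
        # Simple subquery message: return the entry for a subquery response.
        return list(self.table.get(local_hash, []))
```

Lookups use `dict.get` rather than indexing so that probing an empty slot does not create an empty entry in the `defaultdict`.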

[0055] The fragment comparator 606 receives an entry from the fragment table module 602. The comparison function is determined by the HOF type specifier passed from the fragment table module 602. The comparison function is used to determine the relevance of the OID fields and value fields in the entry passed from the fragment table module 602. In one embodiment, the comparison function computes a similarity weight, and the OID with the greatest similarity weight is considered relevant. The relevant OIDs and their similarity weights are sent to the communication module 604 in a warehouse response message.
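The selection rule above keeps the OID with the greatest similarity weight, and the comparison function depends on the HOF type specifier. A sketch of that rule might look like the following. The registry name `COMPARATORS`, the two type specifiers, and both weight formulas are illustrative assumptions. The patent does not specify any particular comparison function.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical registry mapping a HOF type specifier to a comparison function
# that returns a similarity weight for (query value, stored value).
COMPARATORS: Dict[str, Callable[[object, object], float]] = {
    # Numeric fields: weight decays with the absolute difference.
    "numeric": lambda q, v: 1.0 / (1.0 + abs(q - v)),
    # Categorical fields: full weight on an exact match, none otherwise.
    "exact": lambda q, v: 1.0 if q == v else 0.0,
}

def most_relevant_oid(type_specifier: str, query_value: object,
                      entry: Iterable[Tuple[int, object]]) -> Optional[Tuple[int, float]]:
    """Return (OID, weight) with the greatest similarity weight, or None if empty."""
    similarity = COMPARATORS[type_specifier]  # function chosen by the specifier
    best: Optional[Tuple[int, float]] = None
    for oid, value in entry:
        weight = similarity(query_value, value)
        if best is None or weight > best[1]:
            best = (oid, weight)
    return best

print(most_relevant_oid("numeric", 10.0, [(1, 4.0), (2, 9.0), (3, 30.0)]))  # prints (2, 0.5)
```

On a tie, the first OID encountered is kept. That is an arbitrary choice; the text does not say how ties are resolved.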

[0056] The query parser (query decomposition unit) 612 parses (decomposes) a query into a query computation tree stored in the memory 613. The computation tree is a data structure specified in terms of a plurality of nodes and their interrelationships. Each node of the query computation tree is either an internal node or a leaf node. An internal node is a node that has one or more child nodes, and it specifies how the results of those child nodes are to be combined. For example, the results may be added, averaged, or used to calculate a standard deviation. A leaf node is a node that never has child nodes beneath it, and it is either a constant value or a simple subquery node. A subquery node can have a plurality of component subqueries, and each component subquery is itself specified by a corresponding query computation tree. The query computation tree is passed to the query processor 608.
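A query computation tree of the kind described above can be modelled with three node types. Constants and simple subqueries are the leaves, and internal nodes combine their children's results. The sketch below is hypothetical. The class names, the `lookup` callable standing in for a subquery round trip, and the choice of population standard deviation for the "stdev" combiner are all assumptions made for illustration.

```python
import statistics
from dataclasses import dataclass, field
from typing import Callable, List, Union

# Ways an internal node may combine child results (per the text: add,
# average, or compute a standard deviation).
COMBINERS = {
    "sum": sum,
    "mean": statistics.mean,
    "stdev": statistics.pstdev,  # population form is an assumption
}

@dataclass
class Constant:          # leaf: a constant value
    value: float

@dataclass
class SimpleSubquery:    # leaf: answered by an index node
    fragment: str        # hypothetical query fragment to look up

@dataclass
class Internal:          # internal node: one or more children
    op: str
    children: List["Node"] = field(default_factory=list)

Node = Union[Constant, SimpleSubquery, Internal]

def evaluate(node: Node, lookup: Callable[[str], float]) -> float:
    """Evaluate the tree bottom-up; `lookup` stands in for a subquery round trip."""
    if isinstance(node, Constant):
        return node.value
    if isinstance(node, SimpleSubquery):
        return lookup(node.fragment)
    return COMBINERS[node.op]([evaluate(child, lookup) for child in node.children])

tree = Internal("mean", [Constant(2.0),
                         SimpleSubquery("price"),
                         Internal("sum", [Constant(1.0), Constant(3.0)])])
print(evaluate(tree, {"price": 6.0}.__getitem__))  # prints 4.0
```

Nesting an `Internal` node under another mirrors the way a component subquery is itself described by its own computation tree.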

[0057] The query processor 608 is responsible for managing query processing. When it receives a query computation tree from the query parser, the processor assigns a query identifier (QID) to the query and a subquery identifier (SQID) to each leaf node that specifies a subquery. A subquery that has no component subqueries is called a simple subquery. A subquery is processed by sending a subquery message to the designated index node through the communication module 604. The query processor 608 at the designated destination index node processes the subquery message by passing a simple subquery message to the fragment table module 602, which responds with a subquery response message. The query processor 608 then sends this subquery response message back to the index node that originally sent the subquery message. Consequently, the query processor 608 both sends and receives subquery messages and subquery response messages. When a subquery response message is received, the processing specified in the query computation tree is performed. If a subquery has component subqueries, processing it requires processing additional subqueries. Once the entire query has been computed, including all of its subqueries and their own subqueries (called "nested subqueries"), the result is formatted and sent to the front end from which the query was received. For example, the result may be presented as a graph or a table. Because the query and each nested subquery are each associated with one level of the tree, the query processor 608 is responsible for processing queries at every level of the tree.
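The identifier bookkeeping attributed to the query processor 608 can be sketched synchronously. It assigns one QID per query and one SQID per subquery, and it expands nested subqueries level by level. Everything here is an assumption for illustration. The real system exchanges subquery and subquery response messages asynchronously through the communication module 604, whereas this sketch calls a hypothetical `send` function and waits for its return value. The sketch also gives compound subqueries an SQID for bookkeeping, although only simple subqueries are actually sent.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Subquery:
    fragment: Optional[str] = None   # set only on simple subqueries
    components: List["Subquery"] = field(default_factory=list)
    sqid: Optional[int] = None       # filled in by the query processor
    level: int = 0                   # depth in the query tree

    @property
    def is_simple(self) -> bool:
        # A subquery without component subqueries is a simple subquery.
        return not self.components

class QueryProcessor:
    """Hypothetical, synchronous stand-in for query processor 608."""

    def __init__(self, send: Callable[[str], float]):
        self._qids = itertools.count(1)
        self._sqids = itertools.count(1)
        self.send = send  # plays the subquery / subquery-response round trip

    def submit(self, subqueries: List[Subquery]) -> Tuple[int, Dict[int, Tuple[int, float]]]:
        qid = next(self._qids)                      # one QID per query
        results: Dict[int, Tuple[int, float]] = {}  # SQID -> (level, answer)
        for sq in subqueries:
            self._walk(sq, 1, results)
        return qid, results

    def _walk(self, sq: Subquery, level: int, results) -> None:
        sq.sqid, sq.level = next(self._sqids), level
        if sq.is_simple:
            results[sq.sqid] = (level, self.send(sq.fragment))
        for child in sq.components:                 # nested subqueries
            self._walk(child, level + 1, results)

answers = {"a": 1.0, "b": 2.0, "c": 3.0}
qp = QueryProcessor(send=answers.__getitem__)
qid, results = qp.submit([Subquery("a"),
                          Subquery(components=[Subquery("b"), Subquery("c")])])
print(qid, results)  # prints 1 {1: (1, 1.0), 3: (2, 2.0), 4: (2, 3.0)}
```

Recording the level with each answer reflects the point that every query or nested subquery belongs to one level of the tree.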

[0058] FIG. 7 illustrates a conventional system architecture using an exemplary computer system 800. Each of the user computers, the front-end computers, and the computer nodes, including the index nodes and warehouse nodes, can be implemented as an instance of computer system 800. The exemplary computer system of FIG. 7 is discussed for illustrative purposes only and should not be considered a limitation of the present invention. Although the following description may use terms commonly used to describe a particular computer system, the concepts described herein apply equally to other computer systems, including systems whose architecture differs from that shown in FIG. 7.

[0059] The computer system 800 includes a central processing unit (CPU) 805, which may include a conventional microprocessor, a random access memory (RAM) 810 for temporary storage of information, and a read-only memory (ROM) 815 for permanent storage of information. A memory controller 820 is provided to control the system RAM 810. A bus controller 825 is provided to control the bus 830, and an interrupt controller 835 is used to receive and process various interrupt signals from other system components.

[0060] Mass storage is provided by a diskette 842, a CD-ROM 847, or a hard disk 852. Data and software can be exchanged with the client computer 800 via removable media such as the diskette 842 or the CD-ROM 847. The diskette 842 can be inserted into a diskette drive 841, which is connected to the bus 830 by a controller 840. Similarly, the CD-ROM 847 can be inserted into a CD-ROM drive 846, which is connected to the bus 830 by a controller 845. Finally, the hard disk 852 is part of a fixed disk drive 851, which is connected to the bus 830 by a controller 850.

[0061] User input to the computer system 800 can be provided by a number of devices. For example, a keyboard 856 and a mouse 857 are connected to the bus 830 by a keyboard and mouse controller 855. An audio transducer 896, which functions as both a microphone and a speaker, is connected to the bus 830 by an audio controller 897. It should be apparent to those skilled in the art that other input devices, such as a pen and/or tablet or a microphone for voice input, can be connected to the client computer 800 via the bus 803 and a suitable controller. A DMA controller 860 is provided to perform direct memory access to the system RAM 810. The visual display is generated by a video controller 865, which controls a video display 870. The computer system 800 also includes a network adapter 890 that allows the client computer 800 to be interconnected with a network 895 via a bus 891. The network 895 may be a local area network (LAN), a wide area network (WAN), or the Internet, and it uses general-purpose communication lines to interconnect a number of network devices.

[0062] The computer system 800 is generally controlled and coordinated by operating system software. Among its computer system control functions, the operating system controls the allocation of system resources and performs tasks such as process scheduling, memory management, networking, and I/O services.

[0063] A software implementation of the components of the embodiments described above may comprise computer instructions and routines in either of two forms. They may be fixed on a tangible computer-readable medium, such as the diskette 842, CD-ROM 847, ROM 815, or fixed disk 852 of FIG. 7. Alternatively, they may be transmittable via a modem or other interface device, such as the communication adapter 890 connected to the network 895 over the medium 891. The medium 891 may be a tangible medium, including but not limited to optical or hard-wired communication lines, or it may be implemented with wireless or other communication technologies, including but not limited to microwave and infrared. It may also be the Internet. When transmitted, the software components may take the form of digital signals embedded in a carrier wave. A series of computer instructions implements all or part of the functionality previously described herein with respect to the invention. Those skilled in the art will appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Further, such instructions may be stored using any present or future memory technology, including but not limited to semiconductor, magnetic, optical, or other memory devices. They may also be transmitted using any present or future communications technology, including but not limited to optical, infrared, microwave, or other transmission technologies. Such a computer program product may be distributed as removable media with accompanying printed or electronic documentation, for example as shrink-wrapped software. It may instead be preloaded on a computer system, for example on a system ROM or fixed disk. It may also be distributed from a server or electronic bulletin board over a network such as the Internet or the World Wide Web.

[0064]

EFFECTS OF THE INVENTION

Although representative embodiments of the invention have been disclosed, it will be apparent to those skilled in the art that various changes and modifications can be made that achieve some of the advantages of the invention without departing from its spirit and scope. It will also be apparent to those skilled in the art that other components performing the same functions may be substituted as appropriate. Further, the methods of the invention may be implemented entirely in software, using appropriate processor instructions. They may also be implemented as hybrid implementations that use a combination of hardware logic and software logic to achieve the same results. Further, aspects such as the size of the memory, the specific configuration of the logic and/or instructions used to implement a particular function, and other modifications to the inventive concept are intended to be covered by the appended claims. Accordingly, the invention should be construed as limited only by the scope of the appended claims.

[Brief description of the drawings]

FIG. 1 is a block diagram of one embodiment of a distributed computer database system according to the present invention.

FIG. 2 is a block diagram, in flowchart form, of the distributed computer database system of FIG. 1, illustrating a method of downloading information from another source into a data warehouse according to one embodiment of the present invention.

FIG. 3 is a block diagram, in flowchart form, of the distributed computer database system of FIG. 1, illustrating a method of responding to a query according to an embodiment of the present invention.

FIGS. 4a to 4e are block diagrams showing, respectively, the formats of the warehouse message, the warehouse response message, the insert message, the subquery message, and the subquery response message used in connection with the embodiment of FIGS. 1 to 3.

FIG. 5 is a representative block diagram of the home node of FIGS. 1, 2, and 3 according to one embodiment of the present invention.

FIG. 6 is a representative block diagram of the index node of FIGS. 1, 2, and 3 according to one embodiment of the present invention.

FIG. 7 is a block diagram of a computer system according to an exemplary embodiment of each of the user computer, the index node, and the warehouse node.

Claims

[Claims]

1. In a distributed computer database system in which a plurality of index nodes and a plurality of warehouse nodes are connected by a network, a method of warehousing an object, or the location of an object, in a manner conducive to knowledge extraction using queries, the method comprising:
A) a warehouse node extracting a first group of a plurality of features from an object downloaded from another database;
B) fragmenting each of the extracted object features into a plurality of object feature fragments;
C) the warehouse node hashing each of the object feature fragments of the first group of object features, each hashed object feature fragment having a first portion and a second portion;
D) the warehouse node transmitting each hashed object feature fragment of the first group to the one of the plurality of index nodes indicated by the first portion of that hashed object feature fragment;
E) the index node using the second portion of the corresponding hashed object feature fragment to access data in a local hash table located on the index node;
F) the index node returning to the warehouse node a plurality of object identifiers corresponding to the accessed data;
G) the warehouse node determining whether to assign to the object one of the plurality of object identifiers or an object identifier that is not yet in use;
H) the warehouse node assigning an object identifier to the object in accordance with the determination;
I) the warehouse node extracting a second group of features from the object;
J) fragmenting each of the extracted object features of the second group into a plurality of object feature fragments;
K) the warehouse node hashing each of the object feature fragments of the second group of object features, each hashed object feature fragment having a first portion and a second portion;
L) the warehouse node transmitting each hashed object feature fragment of the second group to the one of the plurality of index nodes indicated by the first portion of that hashed object feature fragment; and
M) the index node using the second portion of the corresponding hashed object feature fragment to store data in a local hash table located on the index node.
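Steps B) through D) of claim 1 fragment a feature and hash each fragment. One portion of the hash then picks an index node, and the other portion serves as the key in that node's local hash table. These steps can be illustrated as below. The n-gram fragmentation, the SHA-256 digest truncated to 64 bits, the cluster size, and the modulo/quotient split into first and second portions are all assumptions; the claim does not fix any of them.

```python
import hashlib
from typing import Dict, List, Tuple

NUM_INDEX_NODES = 16  # hypothetical number of index nodes

def fragment_feature(feature: str, n: int = 3) -> List[str]:
    """Split a feature into overlapping character n-grams (assumed scheme)."""
    if len(feature) <= n:
        return [feature]
    return [feature[i:i + n] for i in range(len(feature) - n + 1)]

def hash_fragment(fragment: str) -> Tuple[int, int]:
    """Hash a fragment and split it into (first portion, second portion).

    The first portion picks the destination index node; the second portion
    is the key used in that node's local hash table.
    """
    digest = hashlib.sha256(fragment.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")   # 64-bit hash value
    return value % NUM_INDEX_NODES, value // NUM_INDEX_NODES

def route_feature(feature: str) -> Dict[int, List[int]]:
    """Group the hashed fragments of one feature by destination index node."""
    by_node: Dict[int, List[int]] = {}
    for frag in fragment_feature(feature):
        node, key = hash_fragment(frag)
        by_node.setdefault(node, []).append(key)
    return by_node

print(fragment_feature("hello"))  # prints ['hel', 'ell', 'llo']
```

Because the node is chosen from the hash itself, any warehouse or home node can locate the index node responsible for a fragment without a central directory. This is one reading of why the claims split the hash into two portions.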

2. The method of claim 1, further comprising the warehouse node determining, after the plurality of object identifiers of the first group have been returned, a measure of similarity between the accessed data and the object.

3. The method of claim 2, wherein the similarity measure is determined by a similarity function based on the features possessed by both the accessed data and the object and on the features possessed only by the object.
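Claim 3 says only that the similarity function depends on two things: the features shared by the accessed data and the object, and the features held only by the object. One well-known function with exactly those inputs is Tversky's ratio model with the data-only term weighted to zero. It is sketched here as an illustrative assumption, not as the function the applicant used.

```python
def feature_similarity(object_features: set, data_features: set,
                       alpha: float = 1.0) -> float:
    """Tversky ratio model with the data-only term weighted to zero.

    shared      = features held by both the object and the accessed data
    object_only = features held only by the object
    alpha weights object_only; its default of 1.0 is an assumption.
    """
    shared = len(object_features & data_features)
    object_only = len(object_features - data_features)
    denominator = shared + alpha * object_only
    return shared / denominator if denominator else 0.0

print(feature_similarity({"a", "b", "c", "d"}, {"a", "b", "x"}))  # prints 0.5
```

With alpha = 1 the measure reduces to the fraction of the object's features that also appear in the accessed data. Features present only in the stored data therefore do not lower the score.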

4. A method of data mining using queries in a distributed computer database system in which a plurality of index nodes are connected by a network, the method comprising:
A) a first index node of the plurality of index nodes, referred to as a home node, extracting a plurality of subqueries from a query by a user, each of the subqueries including subquery features, a plurality of subqueries, and a calculation specification;
B) the home node extracting a plurality of subquery features from each of the subqueries;
C) fragmenting each of the subquery features into a plurality of subquery feature fragments;
D) the home node hashing each of the subquery feature fragments, each hashed subquery feature fragment having a first portion and a second portion;
E) the home node transmitting each hashed subquery feature fragment to the one of the plurality of index nodes indicated by the first portion of that hashed subquery feature fragment;
F) the index node using the second portion of each hashed subquery feature fragment to access data in a local hash table located on the index node;
G) the index node iteratively evaluating each of the plurality of subqueries contained in the corresponding subquery sent by the home node, the index node serving as the home node for those subqueries;
H) the index node calculating information, in accordance with the calculation specification of the corresponding subquery sent by the home node, from the accessed data and the information determined by the iterative evaluation of the subqueries contained in that subquery; and
I) each of the index nodes returning the information to the home node.

5. The method of claim 4, further comprising, before the step of extracting the subquery from the query, receiving the query from the user at the home node.

6. A distributed computer database system for warehousing information objects or the locations of information objects, the system comprising:
A) a plurality of warehouse nodes and a plurality of index nodes, the plurality of warehouse nodes and the plurality of index nodes being connected by a network;
B) each of the warehouse nodes, when downloading an object, extracting a first group of a plurality of object features from the object, fragmenting each of the object features into object feature fragments, hashing each of the object feature fragments into a hashed object feature fragment having a first portion and a second portion, and transmitting each hashed object feature fragment to the one of the plurality of index nodes indicated by its first portion;
C) each of the index nodes using the second portion of the hashed object feature fragment to access data in a local hash table located on the index node, and returning to the warehouse node a plurality of object identifiers corresponding to the accessed data;
D) the warehouse node assigning to the object either one of the plurality of object identifiers or an object identifier that is not yet in use, extracting a second group of features from the object, fragmenting each extracted feature of the second group into a plurality of object feature fragments, hashing each object feature fragment of the second group into a hashed object feature fragment having a first portion and a second portion, and transmitting each hashed object feature fragment to the one of the plurality of index nodes indicated by its first portion; and
E) each of the index nodes using the second portion of the hashed object feature fragment to store an object or an object location in a local hash table located on the index node.

7. The distributed computer database system of claim 6, wherein the warehouse node determines a measure of similarity between the accessed data and the object for use in assigning an object identifier to the object.

8. The method of claim 7, wherein the warehouse node measures similarity using a similarity function determined by the features held by both the accessed data and the object and by the features held by the object.

9. A distributed computer database system having a data mining tool for processing queries from users, the system comprising:
A) a plurality of index nodes connected by a network;
B) each of the index nodes, upon receiving a query from a user (the receiving index node being referred to as the home node of the query), extracting a plurality of subqueries from the query and a plurality of subquery features from each of the subqueries, fragmenting each of the subquery features into a plurality of subquery feature fragments, hashing the subquery feature fragments into hashed subquery feature fragments each having a first portion and a second portion, and transmitting each hashed subquery feature fragment to the one of the plurality of index nodes indicated by its first portion; and
C) each of the index nodes further using the second portion of the hashed subquery feature fragment to access data in a local hash table located on the index node, iteratively evaluating each subquery included in the corresponding subquery, calculating information from the accessed data and the information determined by the iterative evaluation, and returning the information to the home node.

10. A distributed computer database system for warehousing and data mining, comprising:
A) a plurality of warehouse nodes and a plurality of index nodes, the plurality of warehouse nodes and the plurality of index nodes being connected by a network;
B) each of the warehouse nodes, upon receiving a download command, queuing a predetermined task in response to the download command;
C) the download task queued in response to the download command extracting a first group of a plurality of features from the object downloaded by the download command, fragmenting each of the object features into a plurality of object feature fragments, hashing each object feature fragment of the first group into a hashed object feature fragment having a first portion and a second portion, and transmitting a retrieval message containing each hashed object feature fragment to the one of the plurality of index nodes indicated by its first portion;
D) the index node, upon receiving the retrieval message, using the second portion of the hashed object feature fragment to access data in a local hash table located on the index node, and transmitting to the warehouse node a message returning a plurality of object identifiers corresponding to the accessed data;
E) the warehouse node, upon receiving the plurality of object identifiers from the plurality of index nodes, assigning to the object either one of the plurality of object identifiers or an unused object identifier, extracting a second group of a plurality of features from the object, fragmenting each object feature of the second group into a plurality of object feature fragments, hashing each of those object feature fragments into a hashed object feature fragment having a first portion and a second portion, and transmitting an insert message containing each hashed object feature fragment to the one of the plurality of index nodes indicated by its first portion; and
F) the index node, upon receiving the insert message, using the second portion of the hashed object feature fragment to store data in a local hash table located on the index node.

11. The distributed computer database system of claim 10, wherein the warehouse node determines a measure of similarity between the accessed data and the object for use in assigning an object identifier to the object.

12. The method according to claim 11, wherein the warehouse node measures similarity using a similarity function determined by the features held by both the accessed data and the object and by the features held by the object.

13. A distributed computer database system having a data mining tool for processing queries from users, comprising:
A) a plurality of index nodes connected by a network;
B) each of the index nodes, upon receiving a command from a user, being referred to as the home node of the command and queuing a predetermined task in response to the command;
C) the queued query task, in response to a query command from the user, extracting a plurality of subqueries from the query contained in the query command, extracting a plurality of subquery features from each of the extracted subqueries, fragmenting each of the subquery features into a plurality of subquery feature fragments, hashing each subquery feature fragment into a hashed subquery feature fragment having a first portion and a second portion, and transmitting a subquery message containing each hashed subquery feature fragment to the one of the plurality of index nodes indicated by its first portion; and
D) the index node, upon receiving the subquery message, using the second portion of the hashed subquery feature fragment to access data in a local hash table located on the index node, iteratively evaluating each subquery included in the corresponding subquery, calculating information from the accessed data and the information determined by the iterative evaluation, and transmitting a message returning the information to the home node.

14. The method described in claim 13, wherein the query message requests predetermined data from the index node in response to the query included in the query command from the user.

15. An information retrieval apparatus for processing a query for retrieving information from a database, comprising:
A) a mechanism for finding a plurality of features and feature fragments in an index;
B) an evaluation mechanism, coupled to the finding mechanism, for identifying a plurality of levels of subqueries included in the query and iteratively evaluating the subqueries using each of the found features and feature fragments; and
C) a mechanism, coupled to the evaluation mechanism, for collecting a plurality of results of the iterative evaluation of the query and the subqueries after a result computed for the entire query has been obtained, and storing them in a memory.

16. A method for processing a query for retrieving information from a database, the method comprising:
A) finding a plurality of features and feature fragments in an index;
B) identifying a plurality of levels of subqueries included in the query and iteratively evaluating the subqueries using each of the found features and feature fragments; and
C) after the result computed for the entire query has been obtained, collecting and storing a plurality of results of the iterative evaluation of the query and the subqueries.

17. A computer program product for processing a query for retrieving information from a database, the computer program product including a computer-executable program embodied on a computer-readable medium, the computer-executable program comprising:
A) a first code portion for finding a plurality of features and feature fragments in an index;
B) a second code portion for identifying a plurality of levels of subqueries included in the query and iteratively evaluating the subqueries using each of the found features and feature fragments; and
C) a third code portion for collecting and storing a plurality of results of the iterative evaluation of the query and the subqueries after the result computed for the entire query has been obtained.