JP5113157B2

JP5113157B2 - System and method for storing and retrieving data

Info

Publication number: JP5113157B2
Application number: JP2009511192A
Authority: JP
Inventors: Piedmonte, Christopher M.
Original assignee: Algebraix Data Corporation
Priority date: 2006-05-15
Filing date: 2007-05-14
Publication date: 2013-01-09
Anticipated expiration: 2027-05-14
Also published as: WO2007134278A2; AU2007249268A1; JP2009537906A; WO2007134278A3; CA2652268A1; EP2024812A2; EP2024812A4

Description

Cross-Reference
This application is related to the following co-pending patent applications: U.S. Patent Application No. 11/383,476, filed May 15, 2006; U.S. Patent Application No. 11/383,477, filed May 15, 2006; U.S. Patent Application No. 11/383,478, filed May 15, 2006; U.S. Patent Application No. 11/383,479, filed May 15, 2006; U.S. Patent Application No. 11/383,480, filed May 15, 2006; and U.S. Patent Application No. 11/383,482, filed May 15, 2006. Each of these U.S. patent applications is incorporated herein by reference.

Background of the Invention
I. Field
The field of the invention relates to systems and methods for storing and accessing data, and more particularly to data storage, database queries, and data retrieval.

II. Background
Many databases and data storage systems have a predetermined schema that imposes structure on data as it is received. The schema may fail to capture information about the structure of the data as originally provided. In addition, the schema may be designed around predefined relationships that are not optimized for the way data is actually provided or queried. The logical relationships inherent in a schema can also lead to database structures that are not optimized for the way data is actually stored. Moreover, the logical relationships inherent in the schema and/or its associated database structure can limit the kinds of logical relationships that can be specified in a data query. A single query may need to access storage multiple times, which causes significant inefficiency, particularly as the gap between processing speed and storage access speed widens. Considerable effort has gone into improving access methods for relational and other conventional databases, but those methods are inherently limited by the predefined relationships and the resulting structure imposed on the data. The tight coupling between these relationships and the structure of many databases also makes it difficult to efficiently capture, convert, and process data provided in a variety of formats, such as flat files, comma-separated value (CSV) files, and data defined using Extensible Markup Language (XML).

Summary of the Invention
Aspects of the present invention provide systems and methods for storing and accessing data. Example embodiments may include a data store for storing data sets, a data set information store for storing information about the data sets, an algebraic relationship store for storing algebraic relationships between data sets, an optimizer that uses the algebraic relationships to optimize storage of and access to data sets in the data store, and a set processor that calculates algebraic relationships to provide data sets. In example embodiments, modules may be provided by a combination of hardware, firmware, and/or software, and some example embodiments may use parallel processing and distributed storage.

One aspect of the invention provides a method for composing algebraic relationships between data sets from query language statements. Another aspect provides a method for providing a requested data set. A query language statement can be presented to the system. For example, the statement may be in Structured Query Language (SQL) format, which uses a relational data model, or in XQuery format, which uses a markup language format. A plurality of algebraic relationships can then be composed from the query language statement and stored in the algebraic relationship store. In this way, algebraic relationships between data sets accumulate in the relationship store over time as statements are presented to the system. In some example embodiments, a query language statement does not itself request the requested data set but may nonetheless be used to compose algebraic relationships that are useful in providing it. At least some of these algebraic relationships can be retrieved from the relationship store and used to provide the requested data set.

In a further aspect, algebraic relationships between data sets can accumulate in the relationship store over time as statements are presented to the system. Alternative collections of algebraic relationships can be generated and evaluated to determine an optimal collection for calculating and providing the requested data set. This optimization can be performed using the algebraic relationships rather than by retrieving the underlying data sets from storage. As a result, optimization can run at processor speed, minimizing the time spent retrieving data from slow storage.
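As a non-limiting illustration, the accumulation of algebraic relationships described above might be sketched as follows. The class `RelationshipStore`, its methods, and the tuple encoding of expressions are hypothetical and are not taken from the specification; the sketch shows only that a later request can be matched against stored relationships without reading any data set from storage.

```python
# Illustrative sketch only: RelationshipStore, record, and known_results are
# hypothetical names, not taken from the specification.
COMMUTATIVE = {"union", "intersect"}


def normalize(op, *operands):
    """Put an expression in a canonical form so equal expressions compare equal."""
    if op in COMMUTATIVE:
        operands = tuple(sorted(operands))
    return (op, *operands)


class RelationshipStore:
    """Accumulates relationships of the form  name = op(operand, ...)."""

    def __init__(self):
        self._equal = {}  # canonical expression -> names of data sets equal to it

    def record(self, name, op, *operands):
        self._equal.setdefault(normalize(op, *operands), set()).add(name)

    def known_results(self, op, *operands):
        """Names of data sets already known to equal the given expression."""
        return set(self._equal.get(normalize(op, *operands), ()))


store = RelationshipStore()
store.record("C", "union", "A", "B")       # learned from an earlier statement
store.record("D", "difference", "A", "B")  # operand order matters here

# A later request for B union A can be matched purely algebraically.
reusable = store.known_results("union", "B", "A")
```

Only the names of data sets are manipulated in this sketch, which is what allows the lookup to proceed without touching the stored data.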

In another aspect, a query language statement requests a data set to be provided, and the relationship store contains other algebraic relationships for data sets that were not composed from that query language statement. In some examples, both the algebraic relationships composed from the query language statement and the other algebraic relationships in the relationship store may be used to provide the requested data set. In a further aspect, an optimizer can generate multiple collections of algebraic relationships that each define a result equal to the requested data set, and optimization criteria can be applied to select one of those collections for use in providing the requested data set. In some example embodiments, the optimization criteria may be based on an estimate of the amount of data that must be transferred from storage and/or the amount of time required to transfer data sets from storage in order to calculate the collection of algebraic relationships. In another example, the optimization criteria may distinguish among equivalent data sets that contain the same logical data in different physical formats or at different locations in the data store.
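By way of example only, an optimization criterion based on estimated transfer time might be sketched as below. The names `StoredSet`, `transfer_seconds`, and `choose_plan`, as well as the sizes and bandwidths, are assumptions made for illustration; the specification does not prescribe a particular cost formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredSet:
    """Hypothetical descriptor of a data set realized in storage."""
    name: str
    size_bytes: int       # bytes that would have to be read from storage
    bandwidth_bps: float  # assumed throughput (bytes/s) of the serving channel


def transfer_seconds(plan):
    """Estimated time to pull every data set that a candidate plan references."""
    return sum(s.size_bytes / s.bandwidth_bps for s in plan)


def choose_plan(plans):
    """Return the name of the candidate with the lowest estimated transfer time."""
    return min(plans, key=lambda name: transfer_seconds(plans[name]))


orders = StoredSet("orders_csv", 8_000_000, 1_000_000.0)
orders_subset = StoredSet("orders_2006_ted", 500_000, 1_000_000.0)
customers = StoredSet("customers_ted", 200_000, 2_000_000.0)

# Two collections of relationships assumed to define the same requested result.
plans = {
    "join_full": [orders, customers],
    "join_subset": [orders_subset, customers],
}
best = choose_plan(plans)
```

Because the estimate uses only metadata about each data set, the comparison itself requires no access to the stored data.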

Another aspect provides a method for providing a requested data set in which at least two alternative algebraic relationships can be composed, each defining a result equal to the requested data set. Data sets may contain the same logical data stored in different physical formats and/or at different locations in the data store. For example, a data set may be stored on a storage medium in comma-separated value (CSV) format, binary-string encoding (BSTR) format, fixed-offset (FIXED) format, type-encoded data (TED) format, and/or XML or another markup language format. Type-encoded data (TED) is a file format that contains data together with an associated value indicating the format of that data. These are examples only, and other physical formats may be used in other embodiments. Data sets may be stored at different locations in the data store, such as on different disk drives in a distributed storage system, and may be accessible through different data channels having different data transfer rates and/or different available bandwidth. One of the algebraic relationships can be selected for use in providing the requested data set based at least in part on the physical format and/or location of the data sets it references. In other examples, an algebraic relationship can be selected based at least in part on the speed and available bandwidth of the channels used to retrieve the data sets it references.

Another aspect provides a method for providing a requested data set using functions that operate on operands in different physical formats. Data sets can be stored in multiple physical formats, such as comma-separated value (CSV) format, binary-string encoding (BSTR) format, fixed-offset (FIXED) format, type-encoded data (TED) format, and/or XML or another markup language format. Functions are defined that take data sets as operands. Logically equivalent functions can be defined for the different combinations of physical formats that the operands may have. To provide a requested data set, an algebraic relationship defining a result equal to the requested data set can be composed. The algebraic relationship may reference data sets in storage. To calculate the requested data set from the algebraic relationship, the referenced data sets are retrieved from storage and functions are applied to them to perform the operations specified in the relationship. The functions used to calculate the algebraic relationship can be selected to match the physical format in which each data set is retrieved. In this way, the function best suited to the physical format of the retrieved data can be used, without first converting the data to a separate format.
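As a non-limiting sketch, logically equivalent functions for different combinations of operand formats might be organized as a dispatch table keyed on the formats. The record layouts, the `FIELD_WIDTH` constant, and all function names below are assumptions for illustration and do not reflect the actual CSV or FIXED encodings of any embodiment.

```python
FIELD_WIDTH = 4  # assumed width of each of the two fields in a FIXED record


def iter_csv(payload):
    """Yield (id, name) tuples from newline-delimited, comma-separated text."""
    for line in payload.splitlines():
        if line:
            yield tuple(line.split(","))


def iter_fixed(payload):
    """Yield (id, name) tuples from fixed-offset records with padded fields."""
    record = 2 * FIELD_WIDTH
    for i in range(0, len(payload), record):
        chunk = payload[i:i + record]
        yield (chunk[:FIELD_WIDTH].strip(), chunk[FIELD_WIDTH:].strip())


def intersect_csv_csv(a, b):
    return set(iter_csv(a)) & set(iter_csv(b))


def intersect_csv_fixed(a, b):
    return set(iter_csv(a)) & set(iter_fixed(b))


def intersect_fixed_fixed(a, b):
    return set(iter_fixed(a)) & set(iter_fixed(b))


# One logically equivalent function per combination of operand formats.
INTERSECT = {
    ("CSV", "CSV"): intersect_csv_csv,
    ("CSV", "FIXED"): intersect_csv_fixed,
    ("FIXED", "FIXED"): intersect_fixed_fixed,
}


def intersect(fmt_a, a, fmt_b, b):
    """Select the function matching the formats in which the operands arrive."""
    return INTERSECT[(fmt_a, fmt_b)](a, b)


csv_a = "1,ann\n2,bob\n"
fixed_b = "2   bob 3   cy  "
common = intersect("CSV", csv_a, "FIXED", fixed_b)
```

Each function reads its operands directly in their retrieved format, so neither operand is first rewritten into the other's encoding.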

In a further aspect, a plurality of algebraic relationships are composed that define a result equal to the requested data set. Some of the algebraic relationships may reference data sets that contain the same logical data in different physical formats. Optimization criteria that take into account the physical formats of the data sets, the functions available for operating on data sets in those formats, and/or any format conversions that the calculation may require can be applied to the algebraic relationships. An algebraic relationship is selected based on the optimization criteria and can be used to provide the requested data set. Format-specific functions are then used to calculate the selected algebraic relationship. At least some of the functions are selected based on the physical formats of the data sets referenced in the algebraic relationship.

In another aspect, algebraic relationships can be used to define new data sets. In one example embodiment, a data set information store can be provided to store information about data sets. A new data set can be created by associating a data set identifier with the data set and storing the identifier in the data set information store. In some examples, the new data set may be an explicit data set presented to the system as part of a query language statement.

In another aspect, a query language statement may specify one or more data sets that are not yet stored in the data store when the statement is received. In some embodiments, a data set can be defined by an algebraic relationship without realizing the data set in storage.

In another aspect, temporal information indicating when a data set was created is stored in the data set information store. In a further aspect, the data set information store can be temporally redefined by removing from it the data sets associated with temporal information earlier than a specified time. If a data set that has not been realized references a data set whose temporal information is earlier than the specified time, the unrealized data set can be realized and stored in the data store before the referenced data set is removed.

Another aspect provides a method for providing a requested data set using mappings between schemas. Mappings can be provided between multiple schemas based on different data models. Statements can be presented to the system based on different schemas and data models. For example, statements may be presented to the system as query language statements in Structured Query Language (SQL) format, based on a relational data model, and/or in XQuery format, based on an Extensible Markup Language (XML) data model. These statements and data models are examples only, and other examples may support other statements and data models. Algebraic relationships between data sets can be composed from statements presented to the system based on different schemas and data models. When a data set is requested based on a particular schema and data model, the mappings allow algebraic relationships based on other schemas and data models to be used in providing the requested data set.

In a further aspect, multiple algebraic relationships defining a result equal to the requested data set can be composed. Optimization criteria can be used to select one of the algebraic relationships for calculating the requested data set. The algebraic relationships can be composed from statements based on different schemas and data models. Mappings can be provided between schemas based on different data models. As a result, optimization can be performed over a broader set of candidate algebraic relationships for providing the requested data set. Algebraic relationships can be considered even when they were composed from statements based on different schemas using different data models. For example, algebraic relationships can be composed from query statements presented to the system in both Structured Query Language (SQL) format, based on a relational data model, and XQuery format, based on an Extensible Markup Language (XML) model. These algebraic relationships can then be used for optimization in response to subsequent query statements presented to the system. For example, an algebraic relationship composed from an SQL statement can be used in response to an XQuery statement. Similarly, an algebraic relationship composed from an XQuery statement can be used in response to an SQL statement. These are examples only, and other types of statements and data models may be used in other examples.

Another aspect provides a method for storing data sets using virtualization. A data set can be removed from the data store and defined instead by algebraic relationships in the relationship store. The data set information can include information identifying whether each data set is realized in the data store. Criteria can be established for determining when a data set should be virtualized. For example, the criteria may be based on the size of the data set, the number of times it has been referenced, and/or how frequently it has been accessed in the data store. Data sets that are realized in the data store and meet the criteria can be considered for removal from the data store. In example embodiments, such data sets can be removed if the relationship store contains algebraic relationships that define them in terms of other data sets realized in the data store, whether directly or indirectly through reference to other algebraic relationships that are themselves based, directly or indirectly, on realized data sets. After a data set is removed, the information about it in the data set information store can be changed to indicate that it is not realized in the data store.

In a further aspect, a data set can be selected for optimization by dividing it into subsets and then virtualizing it by removing it from data storage. For example, data sets that are subsets of the selected data set can be added to the data store. In some examples, the subsets may be portions of the selected data set having equal cardinality, or they may be defined based on ranges of scalar values of the data items in the selected data set. These are examples only, and other subsets may be defined in other examples. An algebraic relationship can be composed that defines the selected data set in terms of the collection of subsets added to the data store. The selected data set can then be removed from the data store, and the information in the data set information store can be changed to indicate that the selected data set is not realized in the data store.
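The partition-and-virtualize steps above can be illustrated, under stated assumptions, by the following sketch. The dictionaries standing in for the data store, relationship store, and data set information store, the `name#index` naming of subsets, and the functions `virtualize` and `resolve` are all hypothetical; subsets here have equal cardinality where the row count divides evenly.

```python
def virtualize(name, parts, data_store, relationship_store, realized):
    """Split a realized data set into subsets, record the defining union,
    then remove the original from the data store."""
    rows = sorted(data_store[name])
    step = max(1, -(-len(rows) // parts))  # ceiling division, at least 1
    subset_names = []
    for i in range(0, len(rows), step):
        sub = f"{name}#{i // step}"
        data_store[sub] = frozenset(rows[i:i + step])
        realized[sub] = True
        subset_names.append(sub)
    relationship_store[name] = ("union", subset_names)
    del data_store[name]
    realized[name] = False  # now defined only by its algebraic relationship


def resolve(name, data_store, relationship_store):
    """Return a data set, recomputing it from its relationship if virtual."""
    if name in data_store:
        return data_store[name]
    op, operands = relationship_store[name]
    if op != "union":
        raise ValueError(f"unsupported operation in this sketch: {op}")
    parts = (resolve(o, data_store, relationship_store) for o in operands)
    return frozenset().union(*parts)


data_store = {"T": frozenset(range(10))}
relationship_store = {}
realized = {"T": True}

virtualize("T", 2, data_store, relationship_store, realized)
rebuilt = resolve("T", data_store, relationship_store)
```

A request that needs only part of `T` could, in this arrangement, be served from a single subset rather than from the whole data set.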

In a further aspect, algebraic relationships that reference virtual data sets can be used to retrieve a requested data set from the system. For example, a selected data set can be removed from the data store and replaced by an algebraic relationship that defines it. Even though the selected data set is no longer realized in the data store, the algebraic relationship remains available in the relationship store for use in providing other requested data sets. For example, multiple collections of algebraic relationships defining the requested data set can be composed. Some of these can be composed by using the algebraic relationship that defines the selected data set to substitute for references to the (virtual) selected data set. For example, an expression that references the selected data set can be replaced by an expression that references one or more subsets realized in the data store. Optimization criteria can then be applied to select one of the collections of algebraic relationships for calculating the requested data set.

In another aspect, a computer system is provided having one or more processors programmed to perform one or more of the above aspects of the invention. The computer system may include volatile and/or non-volatile storage to provide a data set store. In another aspect, one or more hardware accelerators or other circuitry is configured to perform one or more of the above aspects of the invention. In another aspect, a computer-readable medium is provided having executable instructions for performing one or more of the above aspects of the invention.

It is understood that each of the above aspects of the invention may be used alone or in combination with other aspects of the invention described above or in the description below.

Incorporation by Reference
All publications and patent applications mentioned in this specification are incorporated herein by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description, which sets forth illustrative embodiments in which the principles of the invention are utilized, and the accompanying drawings, of which:

A block diagram showing a first example architecture of a computer system that may be used in connection with example embodiments of the invention.
A block diagram showing a computer network that may be used in connection with example embodiments of the invention.
A block diagram showing a second example architecture of a computer system that may be used in connection with example embodiments of the invention.
A block diagram showing the logical architecture of an example embodiment of the invention.
A block diagram showing the information stored in a set manager module of an example embodiment of the invention.
A flowchart of a method for submitting a data set according to an example embodiment of the invention.
A flowchart of a method for submitting a statement according to an example embodiment of the invention.
An example of a statement and an XSN tree for the method of FIG. 6.
A flowchart of a method for realizing a data set according to an example embodiment of the invention.
A flowchart of a method for algebraic and operational optimization according to an example embodiment of the invention.
A flowchart of a method for algebraic and operational optimization according to an alternative example embodiment of the invention.
An illustration of a method for comprehensive optimization according to an example embodiment of the invention.
An illustration of a method for comprehensive optimization according to an example embodiment of the invention.
An illustration of a method for comprehensive optimization according to an example embodiment of the invention.
An illustration of a method for comprehensive optimization according to an example embodiment of the invention.
An illustration of a method for comprehensive optimization according to an example embodiment of the invention.
An illustration of a method for comprehensive optimization according to an example embodiment of the invention.
An illustration of example fields of an OptoNode structure.
A block diagram of an example OptoNode structure according to an example embodiment of the invention.
A flowchart of a method for calculating a data set from algebraic relationships according to an example embodiment of the invention.
A block diagram of an example XSN tree according to an example embodiment of the invention.
A block diagram of an example XSN tree according to an example embodiment of the invention.
A block diagram showing an example implementation of buffer chaining that may be used in an example embodiment of a storage manager.
A block diagram showing an example implementation of buffer chaining that may be used in an example embodiment of a storage manager.
A block diagram showing an example implementation of buffer chaining that may be used in an example embodiment of a storage manager.
A block diagram showing an example implementation of buffer chaining that may be used in an example embodiment of a storage manager.
A block diagram of a conversion from relational data to XML according to an example embodiment.
A block diagram of a conversion from relational data to a directed graph according to an example embodiment.

Detailed Description
While the present invention is open to various modifications and alternative constructions, the embodiments shown in the drawings are described here in detail. It should be understood, however, that there is no intention to limit the invention to the particular forms disclosed. On the contrary, the invention is intended to cover all modifications, equivalents, and alternative constructions falling within the spirit and scope of the invention as expressed in the appended claims.

Example embodiments of the present invention provide systems and methods for storing and processing data using extended set processing and algebraic optimization. In one example, a universal data model based on extended set theory can be used to capture scalar, structural, and temporal information from data provided in a wide variety of disparate formats. For example, data in fixed format, comma-separated value (CSV) format, Extensible Markup Language (XML), and other formats can be captured without loss of information and processed efficiently. These encodings are referred to as physical formats. The same logical data can be stored in any number of different physical formats. Example embodiments can convert seamlessly between these formats while preserving the same logical data.

By using a rigorous mathematical data model, example embodiments can maintain the algebraic integrity of data and of the interrelationships among data, provide temporal invariance, and enable adaptive data restructuring.

Algebraic integrity makes it possible to substitute manipulation of algebraic relationships for manipulation of the information those relationships model. For example, a query can be processed by evaluating algebraic expressions at processor speed, without having to retrieve and examine various data sets from storage at much slower speeds.

Temporal invariance can be provided by maintaining a constant value, structure, and location for information until that information is discarded from the system. Standard database operations such as "insert", "update", and "delete" functions create new data that is defined, in part, as algebraic expressions containing references to data already identified in the system. Because such operations do not alter the original data, example embodiments provide the ability to examine the information contained in the system as it existed at any time in its recorded history.

Adaptive data restructuring, combined with algebraic integrity, allows the logical and physical structures of information to be changed while a rigorous mathematical mapping between the logical and physical structures is maintained. In example embodiments, adaptive data restructuring can be used to accelerate query processing and to minimize data transfer between non-volatile and volatile storage.

Example embodiments can use these features to provide dramatic efficiencies in accessing, integrating, and processing dynamically changing data, whether the data is provided in XML format, relational format, or another data format. In particular, example embodiments can provide:
・Independence from information structures, allowing all kinds of enterprise information to be mathematically modeled and processed with equal facility and without extensive programming
・Elimination of data pre-structuring and of database extract, transform, and load operations, as well as of most database index structures and their associated storage
・Fast query processing through adaptive optimization that eliminates redundant operations and, by means of adaptively restructured working data sets, reduces data transfer across the performance barrier at the non-volatile/volatile storage boundary
・Highly asynchronous and parallel internal operations that are scalable and take full advantage of massively parallel computing and storage systems
・Improved performance and increased fault tolerance resulting from stateless entity recording and, in turn, minimization of serially reusable resources
・The ability to query the database as it existed at any earlier time in its recorded history

The mathematical data model allows example embodiments to be used with a wide range of computer architectures and systems, and lends itself naturally to massively parallel computing and storage systems. Some examples of computer architectures and systems that can be used in connection with example embodiments are described below.

FIG. 1 is a block diagram illustrating a first example architecture of a computer system 100 that can be used in connection with example embodiments of the present invention. As shown in FIG. 1, the example computer system can include a processor 102 for processing instructions, such as an Intel Xeon™ processor, an AMD Opteron™ processor, or another processor. Multiple threads of execution can be used for parallel processing. In some embodiments, multiple processors or processors with multiple cores can also be used, whether in a single computer system, in a cluster, or distributed across systems over a network.

As shown in FIG. 1, a high-speed cache 104 can be connected to, or incorporated in, the processor 102 to provide high-speed memory for instructions or data that the processor 102 has recently used or uses frequently. The processor 102 is connected to a north bridge 106 by a processor bus 108. The north bridge 106 is connected to random access memory (RAM) 110 by a memory bus 112 and manages access to the RAM 110 by the processor 102. The north bridge 106 is also connected to a south bridge 114 by a chipset bus 116. The south bridge 114 is, in turn, connected to a peripheral bus 118. The peripheral bus may be, for example, PCI, PCI-X, PCI Express, or another peripheral bus. The north bridge and south bridge are often referred to as a processor chipset and manage data transfer between the processor, the RAM, and peripheral components on the peripheral bus 118. In some alternative architectures, the functionality of the north bridge can be incorporated into the processor instead of using a separate north bridge chip.

In some embodiments, the system 100 can include an accelerator card 122 attached to the peripheral bus 118. The accelerator can include field programmable gate arrays (FPGAs) or other hardware for accelerating certain processing. For example, the accelerator can be used for adaptive data restructuring or for evaluating the algebraic expressions used in extended set processing.

Software and data are stored in external storage 124 and can be loaded into the RAM 110 and/or the cache 104 for use by the processor. The system 100 includes an operating system for managing system resources, such as Linux or another operating system, as well as application software that runs on top of the operating system and manages data storage and optimization in accordance with example embodiments of the present invention.

In this example, the system 100 also includes network interface cards (NICs) 120 and 121, which are connected to the peripheral bus and provide network interfaces to external storage, such as network attached storage (NAS), and to other computer systems that can be used for distributed parallel processing.

FIG. 2 is a block diagram illustrating a network 200 with a plurality of computer systems 202a, 202b, and 202c and network attached storage (NAS) 204a, 204b, and 204c. In example embodiments, the computer systems 202a, 202b, and 202c can manage data storage and optimize data access for data stored in the network attached storage (NAS) 204a, 204b, and 204c. A mathematical model can be used for the data and can be evaluated using distributed parallel processing across the computer systems 202a, 202b, and 202c. The computer systems 202a, 202b, and 202c can also provide parallel processing for adaptive data restructuring of the data stored in the NAS 204a, 204b, and 204c. This is only an example, and a wide variety of other computer architectures and systems can be used. For example, a blade server can be used to provide parallel processing. Processor blades can be connected through a backplane to provide parallel processing. Storage can also be connected to the backplane or provided as network attached storage (NAS) through a separate network interface.

In example embodiments, the processors can maintain separate memory spaces and transmit data through network interfaces, a backplane, or other connectors for parallel processing by other processors. In other embodiments, some or all of the processors can use a shared virtual address memory space.

FIG. 3 is a block diagram of a multiprocessor computer system 300 that uses a shared virtual address memory space in accordance with an example embodiment. The system includes a plurality of processors 302a-302f that can access a shared memory subsystem 304. The system incorporates a plurality of programmable hardware memory algorithm processors (MAPs) 306a-306f in the memory subsystem 304. Each MAP 306a-306f can include a memory 308a-308f and one or more field programmable gate arrays (FPGAs) 310a-310f. The MAPs provide configurable functional units, and particular algorithms or portions of algorithms can be provided to the FPGAs 310a-310f for processing in close coordination with a respective processor. For example, in example embodiments, the MAPs can be used to evaluate algebraic expressions regarding the data model and to perform adaptive data restructuring. In this example, each MAP is globally accessible by all of the processors for these purposes. In one configuration, each MAP can use direct memory access (DMA) to access its associated memory 308a-308f, which allows it to execute tasks independently of, and asynchronously from, the respective microprocessor 302a-302f. In this configuration, a MAP can feed results directly to another MAP for pipelining and parallel execution of algorithms.

The above computer architectures and systems are examples only, and a wide variety of other computer architectures and systems can be used in connection with example embodiments, including systems that use any combination of general-purpose processors, coprocessors, FPGAs and other programmable logic devices, systems on chips (SOCs), application-specific integrated circuits (ASICs), and other processing and logic elements. It is understood that all or part of the data management and optimization system may be implemented in software or hardware, and that any of a variety of data storage media may be used in connection with example embodiments, including random access memory, hard drives, flash memory, tape drives, disk arrays, network attached storage (NAS), and other local or distributed data storage devices and systems.

In example embodiments, the data management and optimization system can be implemented using software modules that execute on any of the above or other computer architectures and systems. In other embodiments, the functions of the system can be implemented partially or completely in firmware, in programmable logic devices such as the field programmable gate arrays (FPGAs) referenced in FIG. 3, in systems on chips (SOCs), in application-specific integrated circuits (ASICs), or in other processing and logic elements. For example, the set processor and the optimizer can be implemented with hardware acceleration through the use of a hardware accelerator card, such as the accelerator card 122 shown in FIG. 1.

FIG. 4A is a block diagram illustrating an example logical architecture of the software modules 400. The software is component-based and is organized into modules that encapsulate specific functionality, as shown in FIG. 4A. This is only an example, and other software architectures can be used as well.

In this example embodiment, data stored natively in one or more of a variety of physical formats can be presented to the system. The system creates a mathematical representation of the data based on extended set theory and can assign the mathematical representation a globally unique identifier (GUID) for unique identification within the system. In this example embodiment, data is represented internally in the form of algebraic expressions applied to one or more data sets, where the data may or may not be defined at the time the algebraic expression is created. A data set includes a set of data elements, referred to as the members of the data set. In one example embodiment, an element can be a data value or an algebraic expression formed from a combination of operators, values, and/or other data sets. In this example, the data sets are the operands of the algebraic expressions. The algebraic relationships that define the relationships between the various data sets are stored and managed by the set manager 402 software module. In this embodiment, algebraic integrity is maintained because all of the data sets are related through specific algebraic relationships. A particular data set may or may not be stored in the system. Some data sets are defined solely by algebraic relationships with other data sets and must be calculated in order to be retrieved from the system. Some data sets are even defined by algebraic relationships that reference data sets not yet provided to the system, and they cannot be calculated until those data sets are provided at some future time.
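
A data set that is either stored explicitly or defined only by an algebraic relationship, possibly over operands that have not been provided yet, can be sketched roughly as follows. This is an illustrative Python model under simplifying assumptions (plain `frozenset` members, three binary operators); the names `DataSet`, `explicit`, `derived`, and `evaluate` are hypothetical and do not come from the specification.

```python
import uuid
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DataSet:
    guid: uuid.UUID
    members: Optional[frozenset] = None  # explicit members, if provided
    expression: Optional[Tuple[str, "DataSet", "DataSet"]] = None  # (operator, left, right)


# Assumed operator table; the real system's algebra is richer than this.
OPERATORS = {
    "union": frozenset.union,
    "intersection": frozenset.intersection,
    "difference": frozenset.difference,
}


def explicit(members=None):
    """A data set known by GUID whose members may arrive later."""
    return DataSet(uuid.uuid4(), None if members is None else frozenset(members))


def derived(operator, left, right):
    """A data set defined only by an algebraic relationship with other sets."""
    return DataSet(uuid.uuid4(), expression=(operator, left, right))


def evaluate(data_set):
    """Return the members, computing them from operands when necessary."""
    if data_set.members is not None:
        return data_set.members
    if data_set.expression is None:
        raise LookupError(f"data set {data_set.guid} has not been provided yet")
    operator, left, right = data_set.expression
    return OPERATORS[operator](evaluate(left), evaluate(right))
```

In this sketch, a derived set whose operand was created with `explicit()` and no members raises `LookupError` on evaluation, and becomes computable once that operand's `members` are supplied.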

In one example embodiment, the algebraic relationships and the GUIDs referenced in the algebraic expressions are not altered once they have been created and stored in the set manager 402. This provides temporal invariance, which allows data to be managed without concern for locking or other concurrency-management mechanisms and their associated overhead. Algebraic relationships and GUIDs for the corresponding data sets are only appended in the set manager 402 and are not removed or modified as a result of new operations. This results in an ever-expanding universe of operands and algebraic relationships, and the state of the information at any time in its recorded history can be reproduced. In this embodiment, a separate external identifier can be used to refer to the same logical data as it changes over time, while a unique GUID is used to reference each instance of the data set as it existed at a particular time. The set manager 402 can associate the GUID with the external identifier and with a time stamp indicating when the GUID was added to the system. The set manager 402 can also associate the GUID with other information about the particular data set. This information can be stored in a list, table, or other data structure in the set manager 402 (referred to as the set universe in this example embodiment). The algebraic relationships between data sets can also be stored in a list, table, or other data structure in the set manager 402 (referred to as the algebraic cache in this example embodiment).
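
The append-only behavior described above can be sketched as follows, as an illustration rather than the patent's implementation. The sketch assumes a simple logical clock in place of real time stamps and, for brevity, materializes each version instead of storing it as an algebraic expression over earlier versions; `Entry` and `AppendOnlyStore` are hypothetical names.

```python
import itertools
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    guid: uuid.UUID     # identifies this exact version of the data
    external_id: str    # stable name for the logical data over time
    timestamp: int      # assumed logical clock, not wall-clock time
    members: frozenset


class AppendOnlyStore:
    def __init__(self):
        self._entries = []               # entries are appended, never changed
        self._clock = itertools.count(1)

    def _append(self, external_id, members):
        entry = Entry(uuid.uuid4(), external_id, next(self._clock), frozenset(members))
        self._entries.append(entry)
        return entry

    def insert(self, external_id, new_members):
        """Create a new version containing the current members plus new ones."""
        return self._append(external_id, self.as_of(external_id) | frozenset(new_members))

    def delete(self, external_id, old_members):
        """Create a new version without the given members; older versions remain."""
        return self._append(external_id, self.as_of(external_id) - frozenset(old_members))

    def as_of(self, external_id, timestamp=None):
        """Members of the newest version at or before `timestamp` (latest if None)."""
        for entry in reversed(self._entries):
            if entry.external_id == external_id and (
                timestamp is None or entry.timestamp <= timestamp
            ):
                return entry.members
        return frozenset()
```

Because no entry is ever modified, a reader holding a GUID or a time stamp always sees the same data, which is why no locking appears in the sketch.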

In some embodiments, the set manager 402 can be purged of unnecessary or redundant information and can be temporally redefined to limit the time range of its recorded history. For example, unnecessary or redundant information can be purged automatically, and temporal information can be collapsed periodically based on user settings or commands. This can be accomplished by removing from the set manager 402 all GUIDs whose time stamps are earlier than a specified time. All algebraic relationships that reference those GUIDs are also removed from the set manager 402. If other data sets are defined by algebraic relationships that reference those GUIDs, those data sets may need to be calculated and stored before the algebraic relationships are removed from the set manager 402.

In one example embodiment, a data set can be purged from storage, and the system can rely on algebraic relationships to recreate the data set later if necessary. This process is called virtualization. Once the actual data set is purged, the storage associated with it can be released, but the system retains the ability to identify the data set based on the algebraic relationships stored in the system. In one example embodiment, data sets that are large or that are referenced fewer than a certain threshold number of times can be virtualized automatically. Other embodiments can use other criteria for virtualization, including virtualizing data sets that have had little or no recent use, virtualizing data sets to free up faster memory or storage, or virtualizing data sets to enhance security (because after virtualization it is more difficult to access the data set without also accessing the algebraic relationships). These settings can be user-configurable or system-configurable. For example, if the set manager 402 contains a data set A as well as an algebraic relationship stating that A equals the intersection of data sets B and C, the system can be configured to purge data set A from the set manager 402 and to rely on data sets B and C and the algebraic relationship to identify data set A when necessary. In another example embodiment, if two or more data sets are equal to one another, all but one of them can be deleted from the set manager 402. This can occur when multiple sets are logically equal but are in different physical formats. In such a case, all but one of the data sets can be removed to conserve physical storage space.
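
The A = B ∩ C example can be sketched in a few lines of Python. The sketch assumes named in-memory sets and binary operators only; `SetStore`, `virtualize`, and the other method names are hypothetical illustrations, not the system's interfaces.

```python
OPERATORS = {
    "union": frozenset.union,
    "intersection": frozenset.intersection,
    "difference": frozenset.difference,
}


class SetStore:
    def __init__(self):
        self.realized = {}    # name -> members currently held in storage
        self.relations = {}   # name -> (operator, left name, right name)

    def put(self, name, members):
        self.realized[name] = frozenset(members)

    def relate(self, name, operator, left, right):
        self.relations[name] = (operator, left, right)

    def virtualize(self, name):
        """Release the stored members, keeping only the defining relationship."""
        if name not in self.relations:
            raise ValueError(f"{name!r} has no defining relationship and cannot be recomputed")
        self.realized.pop(name, None)

    def get(self, name):
        """Return stored members, or recompute them from the relationship."""
        if name in self.realized:
            return self.realized[name]
        operator, left, right = self.relations[name]
        return OPERATORS[operator](self.get(left), self.get(right))


store = SetStore()
store.put("B", {1, 2, 3, 4})
store.put("C", {3, 4, 5})
store.put("A", {3, 4})
store.relate("A", "intersection", "B", "C")
store.virtualize("A")          # A's storage is released
recomputed = store.get("A")    # A is recreated from B, C, and the relationship
```

Refusing to virtualize a set that has no defining relationship is a design choice of this sketch: without that guard, purging the set would lose information rather than merely free storage.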

When the value of a data set needs to be calculated or provided by the system, the optimizer 418 can retrieve the algebraic relationships that define the data set from the set manager 402. The optimizer 418 can also use the algebraic relationships from the set manager 402 to generate additional, equivalent algebraic relationships that define the data set. The most efficient of these algebraic relationships can then be selected for calculating the data set.
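
Selecting among equivalent algebraic relationships can be sketched as a minimum over an estimated cost. The cost model below (total number of members that must be read, with operands that are not realized treated as infinitely expensive) is an assumption made for illustration; the specification does not state the optimizer's actual cost function, and `estimated_cost` and `choose_relation` are hypothetical names.

```python
import math


def estimated_cost(relation, realized_sizes):
    """Assumed cost: members to read; operands that are not realized cost infinity."""
    _operator, operands = relation
    return sum(realized_sizes.get(name, math.inf) for name in operands)


def choose_relation(candidates, realized_sizes):
    """Pick the cheapest of several relationships that define the same data set."""
    return min(candidates, key=lambda relation: estimated_cost(relation, realized_sizes))


# Three hypothetical, equivalent definitions of the same data set.
candidates = [
    ("intersection", ("B", "C")),
    ("intersection", ("D", "E")),
    ("union", ("F", "G")),  # G is not realized, so this one cannot be read cheaply
]
realized_sizes = {"B": 1_000_000, "C": 750_000, "D": 1_200, "E": 900, "F": 10}
best = choose_relation(candidates, realized_sizes)
```

Here the definition over the small sets D and E is selected, since it requires reading far fewer members than the one over B and C.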

The set processor 404 software module provides an engine for performing the arithmetic and logical operations and functions required to calculate the values of the data sets represented by algebraic expressions and to evaluate the algebraic relationships. The set processor 404 also enables adaptive data restructuring. As data sets are manipulated by the operations and functions of the set processor 404, they are physically and logically processed to speed up subsequent operations and functions. In one example embodiment, the operations and functions of the set processor 404 are implemented as software routines. However, such operations and functions can also be implemented partially or completely in firmware, in programmable logic devices such as the field programmable gate arrays (FPGAs) referenced in FIG. 3, in systems on chips (SOCs), in application-specific integrated circuits (ASICs), in other hardware, or in a combination of these.

The software modules shown in FIG. 4A are now described in further detail. As shown in FIG. 4A, the software includes the set manager 402 and the set processor 404 as well as an SQL connector 406, an SQL translator 408, an XSN connector 410, an XML connector 412, an XML translator 414, an XSN interface 416, an optimizer 418, a storage manager 420, an executive 422, and an administrator interface 424.

In the example embodiment of FIG. 4A, queries and other statements about data sets are provided through one of three connectors: the SQL connector 406, the XSN connector 410, or the XML connector 412. Each connector receives and provides statements in a particular format. In one example, the SQL connector 406 provides a standard SQL-92 compliant ODBC connector to user applications and ODBC-compliant third-party relational database systems, and the XML connector 412 provides a standard Web Services W3C XQuery compliant connector to user applications, compliant third-party XML systems, and other instances of the software 400 on the same or other systems. SQL and XQuery are example formats for providing query language statements to the system, but other formats can also be used. Query language statements provided in these formats are translated by the SQL translator 408 and the XML translator 414 into the extended set notation (XSN) used by the system. The XSN connector 410 provides a connector for receiving statements directly in XSN format. An example of extended set notation is described at the end of this specification. This example of extended set notation includes a syntax in which statements regarding extended data sets can be presented to the system. It is only an example, and other notations can be used in other embodiments. Other embodiments can use different types and formats of data sets and algebraic relationships to capture information from the statements provided to the system.

The XSN interface 416 provides a single entry point for all statements from the connectors. The statements are provided in XSN format from the SQL translator 408, the XML translator 414, or the XSN connector 410, using a text-based description of extended set notation. The XSN interface 416 provides a parser that converts the text description into an internal representation used by the system. In one example, the internal representation uses an XSN tree data structure, as described further below. As an XSN statement is parsed, the XSN interface 416 can call the set manager 402 to assign GUIDs to the data sets referenced in the statement. The overall algebraic relationship that represents the XSN statement can also be parsed into components that are themselves algebraic relationships. In one example embodiment, these components can be algebraic relationships whose expression consists of a single operation that references from one to three data sets. Each algebraic relationship can be stored in the algebraic cache in the set manager 402. For each new algebraic expression, a GUID representing the data set defined by that expression can be added to the set universe. In this way, the XSN interface 416 assembles a plurality of algebraic relationships that reference the data sets specified in the statements presented to the system, as well as any new data sets created as the statements are parsed. The XSN interface 416 and the set manager 402 thereby capture information from the statements presented to the system. These data sets and algebraic relationships can then be used for algebraic optimization when the system needs to calculate a data set.
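
Breaking an overall algebraic relationship into single-operation components can be sketched as a post-order walk over an expression tree. The nested-tuple tree, the `decompose` helper, and the `tmp1`, `tmp2` identifiers (standing in for GUIDs so that the result is easy to read) are assumptions made for illustration; the actual XSN tree structure is not shown in this section.

```python
import itertools


def decompose(expression, cache, new_id):
    """Flatten nested (operator, operand, ...) tuples into single-operation relations.

    Each relation appended to `cache` is (result id, operator, operand ids).
    Returns the identifier of the data set that `expression` defines.
    """
    if isinstance(expression, str):
        return expression  # already a named data set
    operator, *operands = expression
    operand_ids = tuple(decompose(operand, cache, new_id) for operand in operands)
    result_id = new_id()   # stand-in for a GUID assigned by the set manager
    cache.append((result_id, operator, operand_ids))
    return result_id


counter = itertools.count(1)
algebra_cache = []
root = decompose(
    ("intersection", ("union", "A", "B"), "C"),
    algebra_cache,
    lambda: f"tmp{next(counter)}",
)
# algebra_cache now holds:
#   ("tmp1", "union", ("A", "B"))
#   ("tmp2", "intersection", ("tmp1", "C"))
```

Each stored component names its own result, so an intermediate result such as `tmp1` is itself a data set that later statements or an optimizer could reference.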

The set manager 402 provides a data set information store, referred to in this example as the set universe, for storing information about the data sets known to the system. The set manager 402 also provides a relation store, referred to in this example as the algebraic cache, for storing the relationships between the data sets known to the system. FIG. 4B illustrates the information maintained in the set universe 450 and the algebraic cache 452 in accordance with an example embodiment. Other embodiments can use a different data set information store to store information about the data sets, or a different relation store to store information about the algebraic relationships known to the system.

As shown in FIG. 4B, the set universe 450 can maintain a list of GUIDs for the data sets known to the system. Each GUID is a unique identifier for a data set in the system. The set universe 450 can also associate each GUID with information about the particular data set. This information may include, for example, an external identifier used to refer to the data set in statements submitted through the connectors (which may or may not be unique to a particular data set), a date/time indicator of when the data set became known to the system, a format field indicating the format of the data set, and a set type with flags indicating the type of the data set. The format field can indicate the logical-to-physical translation model for the data set in the system. For example, the same logical data can be stored in different physical formats on storage media in the system. As used herein, physical format refers to the format for encoding the logical data when it is stored on a storage medium, not to the particular type of physical storage medium (e.g., disk, RAM, flash memory) that is used. The format field indicates how the logical data is mapped to a physical format on the storage medium. For example, a data set may be stored on a storage medium in comma separated value (CSV) format, binary string encoding (BSTR) format, fixed offset (FIXED) format, type encoded data (TED) format, and/or a markup language format. Type encoded data (TED) is a file format that contains data together with associated values indicating the format of that data. These are examples only, and other physical formats may be used in other embodiments. While the set universe stores information about the data sets, the underlying data in this example embodiment may be stored elsewhere, such as the storage device 124 of FIG. 1, the network attached storage 204a, 204b, and 204c of FIG. 2, the memories 308a to 308f of FIG. 3, or other storage. Some data sets may not exist in physical storage at all and are instead calculated from algebraic relationships known to the system. In some cases, a data set is defined by algebraic relationships that reference data sets not yet provided to the system, and it cannot be calculated until those data sets are provided at some future time. The set type can indicate whether the data set is available in storage, referred to as realized, or is defined by algebraic relationships with other data sets, referred to as virtual. Some embodiments may support other types as well, such as a transitional type indicating a data set that is in the process of being created or removed from the system. These are examples only, and other embodiments may store other information about data sets in the data set information store.
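The per-data-set information described above can be pictured as one small record per GUID. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `SetType`, `SetEntry`, `SetUniverse`, `register`, and `lookup` are hypothetical, and Python's `uuid4` stands in for whatever GUID scheme an embodiment uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional
import uuid


class SetType(Enum):
    REALIZED = "realized"          # data is available in storage
    VIRTUAL = "virtual"            # defined only by algebraic relationships
    TRANSITIONAL = "transitional"  # being created or removed


@dataclass(frozen=True)
class SetEntry:
    guid: uuid.UUID
    external_id: str       # name used in statements; need not be unique
    known_since: datetime  # when the data set became known to the system
    fmt: Optional[str]     # physical format, e.g. "CSV"; None if virtual
    set_type: SetType


class SetUniverse:
    """Hypothetical in-memory data set information store."""

    def __init__(self):
        self._entries = {}

    def register(self, external_id, fmt, set_type):
        entry = SetEntry(uuid.uuid4(), external_id,
                         datetime.now(timezone.utc), fmt, set_type)
        self._entries[entry.guid] = entry
        return entry

    def lookup(self, external_id):
        # Several GUIDs may share one external identifier.
        return [e for e in self._entries.values()
                if e.external_id == external_id]
```

Because the external identifier is not required to be unique, `lookup` returns a list rather than a single entry.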

As shown in FIG. 4B, the algebraic cache 452 can maintain a list of algebraic relationships that relate one data set to another. In the example shown in FIG. 4B, an algebraic relationship may specify that a data set is equal to an operation or function performed on one to three other data sets (shown as "guid OP guid guid guid" in FIG. 4B). Examples of operations and functions include projection, inversion, cardinality, join, and restriction functions. Additional examples are described at the end of this specification as part of the extended set notation example. An algebraic relationship may also specify that a data set has a particular relation to another data set (shown as "guid REL guid" in FIG. 4B). Examples of relational operators include equal, subset, and disjoint, as well as their logical negations, as further described at the end of this specification as part of the extended set notation example. These are examples only, and other embodiments may use other operations, functions, and relational operators, including functions that operate on four or more data sets.

Other modules can access the set manager 402 to add new GUIDs for data sets, and the set manager 402 can be used to look up known relationships between data sets for use in optimizing and evaluating other algebraic relationships. For example, the system may receive a query language statement specifying a data set that is the intersection of a first data set A and a second data set B. The system determines the resulting data set C and can return it. In this example, the module processing the request can call the set manager 402 to obtain, from the algebraic cache, known relationships for data sets A and B that may be useful in evaluating their intersection. It may be possible to determine the result from known relationships without actually retrieving the underlying data for data sets A and B from the storage system. The set manager 402 can also create a new GUID for data set C and store its relationship in the algebraic cache (i.e., data set C is equal to the intersection of data sets A and B). Once this relationship has been added to the algebraic cache, it is available for use in future optimizations and calculations. All data sets and algebraic relationships can be retained in the set manager 402, which provides temporal invariance. Existing data sets and algebraic relationships are not deleted or altered as the system receives new statements. Instead, as new statements are received, new data sets and algebraic relationships are composed and added to the set manager 402. For example, if data is requested to be removed from a data set, a new GUID can be added to the set universe and defined in the algebraic cache as the difference between the original data set and the data to be removed.
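The temporal invariance described above amounts to an append-only store: a removal never changes the original data set, it only adds a new GUID defined as a difference. The sketch below illustrates that idea under assumptions; `SetManager`, `new_set`, `define`, `remove_rows`, and the `"DIFFERENCE"` operator name are hypothetical, and simple counters stand in for real GUIDs.

```python
import itertools


class SetManager:
    """Hypothetical append-only set universe plus algebraic cache."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.universe = {}       # guid -> human-readable label
        self.algebra_cache = []  # append-only (lvalue, op, operands) records

    def new_set(self, label):
        guid = f"guid-{next(self._ids)}"
        self.universe[guid] = label
        return guid

    def define(self, op, *operands):
        # Create a new GUID and record the relationship that defines it.
        guid = self.new_set(f"{op}{operands}")
        self.algebra_cache.append((guid, op, operands))
        return guid

    def remove_rows(self, original, rows_to_remove):
        # Nothing is deleted: the result is a new set defined as a difference.
        return self.define("DIFFERENCE", original, rows_to_remove)
```

After `remove_rows`, both the original GUID and the new one remain in the universe, so earlier statements that referenced the original still see the same data set.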

The optimizer 418 receives algebraic expressions from the XSN interface 416 and optimizes them for calculation. When a data set needs to be calculated (e.g., to realize it in the storage system or to return it in response to a request from a user), the optimizer 418 retrieves the algebraic relationship that defines the data set from the algebraic cache. The optimizer 418 can then generate a plurality of collections of other algebraic relationships that define an equivalent data set. Algebraic substitutions can be made using other algebraic relationships from the algebraic cache, and algebraic operations can be used to generate algebraically equivalent relationships. In one example embodiment, all possible collections of algebraic relationships that define a data set equal to the specified data set are generated from the information in the algebraic cache.

The optimizer 418 can then determine an estimated cost for calculating the data set from each collection of algebraic relationships. The cost can be determined by applying a cost function to each collection of algebraic relationships, and the collection with the lowest cost can be used to calculate the specified data set. In one example embodiment, the cost function determines the estimated time required to retrieve from storage the data sets needed to calculate each collection of algebraic relationships and to store the result in storage. If the same data set is referenced more than once in a collection of algebraic relationships, the cost of retrieving that data set is allocated only once, because it is available in memory after it has first been retrieved. In this example, the collection of algebraic relationships requiring the least data transfer time is selected for calculating the requested data set.
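The cost rule described above, in which each referenced data set is charged once per collection, can be sketched as follows. This is an illustrative model only: `plan_cost`, `cheapest`, the tuple layout of a relationship, and the per-data-set fetch times are assumptions, and a real cost function would also account for intermediate results.

```python
def plan_cost(plan, fetch_seconds, store_seconds=0.0):
    """Estimated transfer time for one collection of relationships.

    plan: list of tuples (op, operand_guid, ...).
    fetch_seconds: hypothetical estimated retrieval time per data set.
    """
    # A set collapses repeated references, so each data set is charged once.
    needed = {guid for _, *operands in plan for guid in operands}
    return sum(fetch_seconds[g] for g in needed) + store_seconds


def cheapest(plans, fetch_seconds):
    # Pick the collection with the smallest estimated transfer time.
    return min(plans, key=lambda p: plan_cost(p, fetch_seconds))
```

For instance, a collection that references data set A in two relationships pays the retrieval time for A a single time.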

The optimizer 418 can generate different collections of algebraic relationships that refer to the same logical data stored in different physical locations over different data channels and/or in different physical formats. Although the data is logically the same, different data sets with different GUIDs can be used to distinguish between logically identical data held in different locations or formats. The different collections of algebraic relationships may have different costs, because retrieving data sets from different locations and/or in different formats may take different amounts of time. For example, the same logical data may be available in different formats over the same data channel. Examples of formats include comma separated value (CSV) format, binary string encoding (BSTR) format, fixed offset (FIXED) format, type encoded data (TED) format, and markup language formats. Other formats may also be used. If the data channel is the same, the physical format with the smallest size (and therefore the fewest bytes to transfer from storage) can be selected. For example, comma separated value (CSV) format is often smaller than fixed offset (FIXED) format. However, if the larger format is available over a faster data channel, it may be selected over the smaller format. In particular, a larger format available in high-speed volatile memory such as DRAM is generally selected over a smaller format available in slower non-volatile storage such as a disk drive or flash memory.
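The trade-off between format size and channel speed can be made concrete by estimating transfer time as size divided by channel bandwidth. The sketch below is an assumption-laden illustration: `StoredCopy`, `pick_copy`, and every size and bandwidth figure used with it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCopy:
    guid: str                   # each physical copy has its own GUID
    fmt: str                    # e.g. "CSV" or "FIXED"
    size_bytes: int
    channel_bytes_per_s: float  # assumed bandwidth of the data channel

    @property
    def transfer_seconds(self):
        return self.size_bytes / self.channel_bytes_per_s


def pick_copy(copies):
    # Choose the copy of the logical data that is quickest to retrieve.
    return min(copies, key=lambda c: c.transfer_seconds)
```

On a single channel this reduces to choosing the smallest format, while a larger copy held in fast memory can still win against a smaller copy on disk.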

In this manner, the optimizer 418 takes advantage of high processor speeds to optimize algebraic relationships without accessing the underlying data of the data sets from data storage. The speed at which a processor executes instructions is often higher than the speed at which data can be accessed from storage. Optimizing the algebraic relationships before they are calculated makes it possible to avoid unnecessary data accesses from storage. The optimizer 418 can consider a large number of equivalent algebraic relationships and optimization techniques at processor speed, and can take into account the efficiency of the data accesses that will be required to actually evaluate the expression. For example, the system may receive a query requesting data that is the intersection of data sets A, B, and D. The optimizer 418 can obtain known relationships regarding these data sets from the set manager 402 and optimize the expression before it is evaluated. For example, it may obtain from the algebraic cache an existing relationship indicating that data set C is equal to the intersection of data sets A and B. Instead of calculating the intersection of data sets A, B, and D, the optimizer 418 may determine that it is more efficient to calculate the intersection of data sets C and D to obtain the equivalent result. In making this determination, the optimizer 418 may consider that data set C is smaller than data sets A and B and can be retrieved from storage more quickly, or that data set C has been used in a recent operation and has already been loaded into faster memory or cache.
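The substitution in the example above, replacing the intersection of A and B with the already known data set C, can be sketched as a rewrite over sets of operand names. This is a simplified illustration, not the optimizer's actual algorithm: `rewrite_intersection` and the dictionary layout of the cache are assumptions, and cost comparison between the original and rewritten forms is left out.

```python
def rewrite_intersection(operands, cache):
    """Replace groups of operands whose intersection is already known.

    operands: iterable of data set names to be intersected.
    cache: maps frozenset of operand names -> name of the data set equal
           to their intersection (a hypothetical view of the algebraic cache).
    """
    remaining = set(operands)
    changed = True
    while changed:
        changed = False
        for known, result in cache.items():
            # Only substitute when every operand of a cached relationship
            # is present; each substitution shrinks the operand set.
            if len(known) > 1 and known <= remaining:
                remaining = (remaining - known) | {result}
                changed = True
                break
    return frozenset(remaining)
```

With a cache entry stating that C equals the intersection of A and B, a request for the intersection of A, B, and D is rewritten to the intersection of C and D without touching any underlying data.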

The optimizer 418 can also continually enrich the information in the set manager 402 by submitting additional relationships and sets discovered through analysis of the sets and the algebraic cache. This process is referred to as comprehensive optimization. For example, the optimizer 418 can use otherwise idle processor cycles to analyze relationships and data sets, adding to the algebraic cache new relationships that are expected to be useful in optimizing the evaluation of future requests, and adding sets to the set universe. Once relationships have been entered into the algebraic cache, the optimizer 418 can make use of those calculations while processing subsequent statements, even if the calculations being performed by the set processor 404 are not yet complete. Many algorithms may be useful for comprehensive optimization. These algorithms may be based on discovering calculations that are repeated over a limited number of sets, indicating a usage pattern or trend during a recent period of time.

The set processor 404 actually calculates the selected collection of algebraic relationships after optimization. The set processor 404 provides the arithmetic and logical processing required to realize the data sets specified in algebraic extended set expressions. In one example embodiment, the set processor 404 provides a collection of functions that can be used to calculate the operations and functions referenced in the algebraic relationships. The collection of functions may include functions configured to receive data sets in a particular physical format. In this example, the set processor 404 can provide multiple different algebraically equivalent functions that operate on data sets and provide results in different physical formats. The functions selected for calculating the algebraic relationships (which may be selected by the optimizer 418 during optimization) correspond to the formats of the data sets referenced in those relationships. In example embodiments, the set processor 404 is capable of parallel processing of multiple simultaneous operations and, through the storage manager 420, can pipeline data input and output to minimize the total amount of data that must cross the non-volatile/volatile storage boundary. In particular, the algebraic relationships from the selected collection can be allocated to various processing resources for parallel processing. These processing resources can include the processor 102 and accelerator 122 shown in FIG. 1, the distributed computer system shown in FIG. 2, the multiple processors 302 and MAPs 306 shown in FIG. 3, or multi-threaded execution on any of the foregoing. These are examples only, and other processing resources may be used in other embodiments.

The executive 422 performs overall scheduling of execution, management and allocation of computing resources, and proper startup and shutdown.

The administrator interface 424 provides an interface for managing the system. In example embodiments, this may include an interface for importing or exporting data sets. Although data sets can be added through the connectors, the administrator interface 424 provides an alternative mechanism for importing a large number of data sets or data sets of very large size. A data set can be imported by specifying its location through the interface. The set manager 402 can then assign a GUID to the data set. However, the underlying data does not need to be accessed until a request is received that requires it. This allows for very fast initialization of the system, without the data having to be imported and reformatted into a particular structure. Rather, the relationships between data sets are defined and added to the algebraic cache in the set manager 402 as the data is actually queried. As a result, optimizations are based on the actual way the data is used (rather than on predefined relationships built into a set of tables or other predefined data structures).

Example embodiments can be used to manage large quantities of data. For example, the data store may contain more than 1 terabyte, 100 terabytes, or 1 petabyte or more of data. The data store may be provided by a storage array or a distributed storage system with a large storage capacity. The data set information store may, in turn, define a large number of data sets. In some cases, more than one million, or ten million or more, data sets may be defined in the data set information store. In one example embodiment, the software scales to 2^64 data sets, although other embodiments may manage a smaller or larger universe of data sets. Many of these data sets may be virtual, and others may be realized in the data store. The entries in the data set information store may be scanned from time to time to determine whether additional data sets should be virtualized, or whether data sets should be removed to temporally redefine the data sets captured in the data set information store. The relationship store may also contain a large number of algebraic relationships between data sets. In some cases, more than one million, or ten million or more, algebraic relationships may be included in the relationship store. In some cases, the number of algebraic relationships may be greater than the number of data sets. The large number of data sets and algebraic relationships represents a vast quantity of information that can be captured about the data sets in the data store, and allows extended set processing and algebraic optimization to be used to manage extremely large amounts of data efficiently. The above are examples only, and other embodiments may manage different numbers of data sets and algebraic relationships.

FIG. 5 is a block diagram illustrating an example embodiment of software modules implemented to facilitate importing information into the system. Unlike conventional database systems, the system of this example does not immediately operate on the data sets presented to it. Rather, the system stores a reference to the new data set in the data set information store. In one example embodiment, this is accomplished by adding a new GUID to the set universe. Once the data set is known to the set universe, the system can use it.

As previously mentioned, information can be added to the system through functions contained within the administrator interface 424, as described in further detail below. One such method of adding information to the system is to issue a command 501 to the import function 502 to import an information set 506. In one embodiment, the command includes the physical location of the data set to be imported, an external identifier, and a value indicating the logical-to-physical mapping that the data set uses to encode its data for storage. A variety of physical formats can be supported, including comma separated value (CSV) files, extensible markup language (XML) files, fixed length files (FIXED), XSN format files, and others. In addition, the information set may be located on a variety of non-volatile or volatile storage media, and may be attached locally or accessed remotely over a network or other communication method. The information set may also be distributed across a plurality of different physical storage media, or may be provided from a real-time data stream such as data packets received over a network or input from a user (e.g., entered in real time by an end user). After the command is issued, the import function 502 parses the command and causes the set manager 503 to create a data set with the associated external identifier and physical format value. The set manager 503 then creates a GUID for the associated data set and enters various information into the set universe, including the physical format type value, the external identifier, the associated GUID, and an indication that the GUID is realized. The import function 502 then calls the storage manager 504 to create an association between the physical location identifier of the data set and the GUID assigned by the set manager 503. Specifically, the storage manager 504 adds an index record to the storage map 505 that contains the physical path of the data and the associated GUID. The data set 506 has now been imported into the system, and control is returned to the caller. The system can also capture information about a data set even if that data set is not realized in storage (i.e., it is virtual). For example, data set C may be defined as the union of data sets A and B. Data sets A and B may be realized in storage, while data set C is defined only by the relationship "C=A UNION B" in the algebraic cache and may not be realized in storage at the time the GUID for data set C is added to the set universe.
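The import sequence above records only metadata: a set universe entry and a storage map record, with no read of the underlying data. A minimal sketch of that flow, under assumptions, follows; `Importer`, `import_set`, the dictionary layouts, and the example path are all hypothetical.

```python
import uuid


class Importer:
    """Hypothetical import function that registers a data set lazily."""

    def __init__(self):
        self.set_universe = {}  # guid -> metadata about the data set
        self.storage_map = {}   # guid -> physical location of the data

    def import_set(self, location, external_id, fmt):
        guid = str(uuid.uuid4())
        # Record what the system needs to know about the data set.
        self.set_universe[guid] = {
            "external_id": external_id,
            "format": fmt,
            "type": "realized",
        }
        # Associate the GUID with the physical location. The file is not
        # opened here; data is touched only when a request requires it.
        self.storage_map[guid] = location
        return guid
```

Because nothing is read at import time, the call completes in the same time regardless of the size of the data set, which is the property the text attributes to fast system initialization.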

Statement submission is the process of providing an assignment or relation to the system. Statements can be submitted to the system through a variety of interfaces. In one example embodiment, three interfaces are provided: an SQL connector for submitting standard SQL-92 compliant statements, an XSN connector for submitting statements using XSN, and an XML connector for submitting web server W3C XQuery compliant statements and other XML-based statements.

FIG. 6 is a block diagram illustrating an example embodiment of how software modules that facilitate the submission of statements to the system may be implemented. In this example, a standard SQL command is submitted to the system through the SQL connector 601. The SQL command may contain one or more standard SQL-92 compliant SQL statements. The SQL connector 601 first captures the time of submission to establish the temporal value for all sets contained in the submitted statements. The command is then parsed to verify that the syntax of the SQL statements is correct. If there are any syntax or compliance errors, an error message is returned to the caller and the submission is terminated. If there are no errors, the SQL connector 601 constructs an internal navigable representation of the SQL command, which is output to the SQL translator 602. The SQL translator 602 then converts the internal navigable representation of the SQL command into the appropriate equivalent XSN statements. After translation, the resulting XSN statements are passed to the XSN interface 603 for further processing. Each statement is then converted from its textual XSN representation into an internal structure called an XSN tree. The XSN tree provides a means for programmatically examining the members of an XSN statement and for navigating the elements of the statement.

The XSN tree is then examined to determine whether the statement represents an assignment or a relation. If the statement is an assignment, a GUID is assigned by the set manager 402 to the algebraic expression specified in the statement. The XSN tree is then examined further so that GUIDs are assigned to all data sets and operations within the expression, and to determine whether the expression includes any explicit sets or any redundant assignments. Explicit sets are sets that are entered into the system as part of the statement, as may occur in the context of a standard SQL "insert" statement. Redundant assignments are assignments that contain operations and arguments already present in the algebraic cache. In the case of explicit sets, the set manager 402 assigns new GUIDs to these sets, and they are immediately realized by the set processor 404. Redundant assignments are discovered by searching the algebraic cache for an expression containing the same operation and right value (rvalue); the GUID of the left value (lvalue) of the existing assignment entry in the algebraic cache is then retrieved from the set manager 402 and assigned to the lvalue of the redundant assignment in the expression. If an assignment is not redundant, a new GUID is provided for the assignment by the set manager 402 and assigned to the lvalue of the assignment in the expression. Complex algebraic relationships specified by a statement can also be decomposed into a collection of primitive (atomic) relations and assignments. GUIDs can be provided for these relations and assignments, and the corresponding algebraic relationships can be added to the algebraic cache.
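Redundant-assignment detection, as described above, is a lookup keyed on the operation and its rvalue. The sketch below shows that lookup in its simplest form, under assumptions: `AlgebraCache` and `assign` are hypothetical names, counters stand in for GUIDs, and operands are compared in the order given, so commutative operations with reordered operands are not recognized as equal here.

```python
class AlgebraCache:
    """Hypothetical cache of assignments keyed by (operation, rvalue)."""

    def __init__(self):
        self._by_rvalue = {}  # (op, operands) -> lvalue GUID
        self._counter = 0

    def assign(self, op, *operands):
        """Return (lvalue_guid, was_redundant) for an assignment."""
        key = (op, operands)
        if key in self._by_rvalue:
            # Same operation and arguments already cached: reuse the GUID.
            return self._by_rvalue[key], True
        self._counter += 1
        guid = f"guid-{self._counter}"
        self._by_rvalue[key] = guid
        return guid, False
```

Submitting the same assignment twice therefore yields the same lvalue GUID, so a later statement can reuse whatever has already been computed or realized for it.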

Once GUIDs have been assigned to all explicit sets and to the lvalues of all assignments, control is returned to the SQL connector 601. If necessary, a second call is then made to the XSN interface 603 to realize any sets that are expected to be returned to the caller. The realized sets are then returned to the caller.

FIG. 7 shows an example of a statement that can be submitted to the system by the method of FIG. 6. In this example, a user queries the database for specific information about a standard commercial transaction. The request is expressed as a standard SQL statement 701. The ORDERKEY requested in this case is "12345". Specifically, the user requests the item discount, the ship date, and the item comment for the particular customer order number "12345". The information comes from two tables, LINEITEM and ORDERS, which are joined on the condition that the L_ORDERKEY field equals O_ORDERKEY. The user passes the SQL statement 701 to the SQL connector 601. The SQL translator 602 converts the internal navigable representation of the SQL statement into an appropriate equivalent XSN statement 702. Note that the columns or fields of the LINEITEM and ORDERS tables have been converted to a representation that is not specific to a relational database. In particular, the columns or fields of the LINEITEM table are now represented by domains "1" through "16", and the columns or fields of the ORDERS table by domains "17" and above. Starting from the innermost function in the expression, the join operation of the SQL statement 701 is converted to an rdmJoin operation that receives LINEITEM, ORDERS, and NULL as its three parameters. The result of rdmJoin is then passed to rdmRest, which restricts the joined data to those members for which domain "1" (the L_ORDERKEY domain of the LINEITEM data set) equals the constant "12345" and for which domain "1" also equals domain "17" (the O_ORDERKEY domain of the ORDERS data set). The XSN statement 702 is then passed to the XSN interface for subsequent processing.
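
The join-then-restrict structure of the XSN statement can be illustrated with a minimal Python sketch. The functions `rdm_join` and `rdm_rest` are hypothetical stand-ins for the rdmJoin and rdmRest operations, set members are modeled as dictionaries keyed by domain number, and the values in domains 7 and 18 are made up for illustration; none of this is taken from the actual implementation.

```python
# Illustrative sketch only: rdm_join and rdm_rest are hypothetical stand-ins
# for the rdmJoin and rdmRest operations named in the text.

def rdm_join(left, right, predicate=None):
    """Pair every left member with every right member; keep pairs passing predicate."""
    joined = []
    for left_member in left:
        for right_member in right:
            # Domain numbers do not collide (1-16 for LINEITEM, 17+ for ORDERS).
            member = {**left_member, **right_member}
            if predicate is None or predicate(member):
                joined.append(member)
    return joined


def rdm_rest(members, predicate):
    """Restrict a set to the members that satisfy predicate."""
    return [m for m in members if predicate(m)]


# Members keyed by domain number: 1 = L_ORDERKEY, 17 = O_ORDERKEY.
# Domains 7 and 18 carry made-up payload values.
LINEITEM = [{1: 12345, 7: 0.05}, {1: 99999, 7: 0.10}]
ORDERS = [{17: 12345, 18: "open"}, {17: 99999, 18: "closed"}]

# Join with a NULL (None) predicate, then restrict: domain 1 equals the
# constant 12345 and domain 1 equals domain 17.
result = rdm_rest(
    rdm_join(LINEITEM, ORDERS, None),
    lambda m: m[1] == 12345 and m[1] == m[17],
)
```

In this sketch the unrestricted join produces four members and the restriction keeps one, which is the inefficiency a later optimization step can remove by folding the restriction into the join.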

The XSN interface 603 records the submission time to establish a temporal value for the sets contained in the submitted statement. The statement is then converted from the XSN statement 702 into an XSN tree 703. The structure of XSN trees is described further below with reference to FIGS. 12A and 12B. As part of the conventional process, GUIDs are either created or retrieved from the set manager 402 and inserted into the XSN tree 703 for the corresponding sets. Control then returns to the SQL connector 601.

Because this example requests a result set, a second call is then made to the XSN interface 603 to realize any sets that are expected to be returned to the caller. The XSN tree 703 is then handed to the optimizer 604, which optimizes it for efficiency and produces an optimized XSN tree 704 (shown here in expression format rather than tree format, purely for illustration). Note that in this case the optimizer merges the rdmRest into the rdmJoin for efficiency. The optimized XSN tree 704 is then passed to the set processor 605, which computes the collection of algebraic relationships in the XSN tree. The realized set is then returned to the caller.

FIG. 8 is a block diagram of an example embodiment of the software modules implemented to facilitate set realization. Set realization is the process of computing the membership of a set and realizing a physical representation of that set in storage. Set realization can be invoked from an external interface of the system that supports realization, such as the SQL connector or an XML connector, or from the executive software module as part of a set export procedure. In this example embodiment, an export command is issued to the executive 801. The command can identify the set to export by external identifier or by GUID, together with a storage path. The executive 801 then passes the external identifier or GUID to the XSN interface 802. If the command names an external identifier, the XSN interface 802 passes it to the set manager 803, which determines the GUID associated with the external identifier and returns that GUID to the XSN interface 802. This lookup is performed relative to the time values associated with the GUIDs. Unless the user specifies otherwise, the example embodiment uses the most recent GUID associated with the external identifier. Once the associated GUID has been identified, it replaces the external identifier. The GUID to be realized, whether specified directly in the command or obtained from an external identifier, is then passed to the set manager 803 to determine whether it has already been realized. If the data set associated with the GUID has already been realized, control returns to the executive 801. If it has not, the GUID is submitted to the optimizer 804 for realization. The optimizer 804 determines the optimal collection of algebraic relationships representing the data set associated with the GUID. This collection is then passed to the set processor 805, which computes it. Once the collection of algebraic relationships has been submitted to the set processor 805, control returns to the executive 801. The executive 801 then requests that the storage manager provide the data from the data set, and stores that data in the storage device under the path name specified in the export command.
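
The time-relative lookup from external identifier to GUID might be sketched as follows. The `SetManager` class, its method names, and the integer time values are assumptions made for illustration; the text does not specify how the set manager stores this association.

```python
class SetManager:
    """Hypothetical registry mapping external identifiers to time-stamped GUIDs."""

    def __init__(self):
        self._history = {}  # external identifier -> list of (time, guid)

    def register(self, identifier, time, guid):
        """Record that `identifier` referred to `guid` as of `time`."""
        self._history.setdefault(identifier, []).append((time, guid))

    def resolve(self, identifier, as_of=None):
        """Return the GUID in effect at `as_of`, or the most recent one by default."""
        versions = self._history[identifier]
        eligible = [v for v in versions if as_of is None or v[0] <= as_of]
        if not eligible:
            raise KeyError(f"no GUID for {identifier!r} at time {as_of!r}")
        return max(eligible, key=lambda v: v[0])[1]


manager = SetManager()
manager.register("ORDERS", time=10, guid="guid-a")
manager.register("ORDERS", time=20, guid="guid-b")

latest = manager.resolve("ORDERS")             # default: most recent GUID
earlier = manager.resolve("ORDERS", as_of=15)  # GUID in effect at time 15
```

Keeping every (time, GUID) pair rather than overwriting the association is what allows a caller to ask for the set as it stood at an earlier time.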

FIG. 9A is a block diagram of an example embodiment of the algebraic and operational optimizer software module. The optimizer processes a collection of algebraic relationships, optimizing it algebraically and operationally, before the collection is submitted to the set processor 909. Many methods can be used to determine which collection of algebraic relationships is most efficient, depending on the system environment and on the various limitations or performance weaknesses of the system.

In the example embodiment of FIG. 9A, the optimizer operates on two basic principles. First, no alternative plan for realizing a data set costs less than simply reusing a data set that has already been realized. Second, the amount of data retrieved across storage performance barriers should be minimized. Other example embodiments can apply other principles, particularly as the state of the art changes. In the example embodiment, these basic principles are implemented through three optimization routines: the findAltOps routine 904, the findMetaGuids routine 905, and the findAltGuids routine 906. It is important to note that other optimization routines may be used and that the system may contain more or fewer of them. In the example embodiment, the optimization routines are executed in a specific order, designed to try first the optimizations most likely to yield, as quickly as possible, a collection of algebraic relationships with sufficiently low cost.

As described further below, the findLeastCost routine 903 is executed before the optimization routines run and again after each optimization routine. The cost of executing a particular collection of algebraic relationships is determined by estimating the time the system needs to retrieve from storage the data sets required to compute that collection. The estimated retrieval time can be calculated from the speed at which information can be retrieved across each I/O storage barrier and from the estimated amount of information that must be retrieved across that barrier. The cost determination can also take other factors into account, such as whether the information is read through the same I/O channel or through different I/O channels, and whether particular information is used in multiple subparts of the expression, both of which can affect performance. These optimization techniques can produce different results depending on the state of the system when the optimization routine runs. For example, different data sets containing the same logical data may be available in different data formats of different sizes. If they are available through the same I/O channel, the data set with the smallest format can be selected. However, a larger format may be selected if that data set was accessed recently and is already available in high-speed memory or cache.
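
A rough sketch of this cost estimate is given below. It assumes that cost is simply the sum of size divided by throughput over the input data sets, read one after another; the `Source` type, the plan names, and the sizes and throughputs are all hypothetical and chosen only to reproduce the cache example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One input data set and the throughput of the I/O barrier it must cross."""
    name: str
    size_bytes: int
    bytes_per_second: float


def estimated_cost(sources):
    """Estimated seconds to fetch every input set, assuming sequential transfers."""
    return sum(s.size_bytes / s.bytes_per_second for s in sources)


def find_least_cost(plans):
    """Return (plan name, cost) for the cheapest plan; plans maps name -> sources."""
    return min(
        ((name, estimated_cost(sources)) for name, sources in plans.items()),
        key=lambda pair: pair[1],
    )


# Same logical data in two formats: a compact copy on disk and a larger
# copy that already sits in fast memory (all figures are made up).
plans = {
    "compact_on_disk": [Source("boxes.compact", 40_000_000, 100_000_000.0)],
    "large_in_cache": [Source("boxes.wide", 120_000_000, 10_000_000_000.0)],
}
best_plan, best_cost = find_least_cost(plans)
```

With these figures the larger cached copy wins (about 0.012 s against 0.4 s), which matches the observation that the smallest format is not always the cheapest to read.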

In the example embodiment, the XSN interface 901 calls the optimizer software module to realize the sets associated with a collection of algebraic relationships. The XSN interface 901 passes the GUID of the set to be realized to the buildExpressions routine 902 within the optimizer software module. The buildExpressions routine 902 retrieves from the algebraic cache the one or more original algebraic relationships that define the set or sets identified by the GUID. These algebraic relationships can be called genesis expressions. The buildExpressions routine 902 then constructs an OptoNode tree representation of the genesis expressions. OptoNode trees, described in more detail below, can be used to represent an algebraic relationship as a collection of more primitive algebraic relationships. The optimizer software module then executes the findLeastCost routine 903 to determine the lowest-cost genesis expression. If the findLeastCost routine 903 determines that the lowest-cost genesis expression is cheap enough to execute, further optimization is skipped and the algebraic relationships of that genesis expression are submitted to the realizeNode routine 908, described below.

If the findLeastCost routine 903 determines that the lowest-cost genesis expression is not cheap enough to execute, the findAltOps routine 904 is executed to find alternative operations. This routine uses extended set theory algebra to synthesize alternative versions of the genesis expressions. The synthesized alternatives are constructed so that they are potentially cheaper to execute and easy to identify in the algebraic cache. Expression synthesis works by recognizing the "form" of an expression and substituting other forms that are algebraically equivalent but cheaper to compute and/or more likely to be recognized in the algebraic cache. A simple example is a restriction applied to two joined sets. In shorthand notation, this can be written SETA=R(J(a,b,c),d). However, the join operation can itself perform the restriction, giving the equivalent expression SETA=J(a,b,CP(c,d)). Both forms require the same amount of input data, but the join in the second form produces less output data, so the second form requires fewer computation and I/O resources. Whether the second form is actually preferable to the first depends on what is available in the algebraic cache and on which sets have already been realized in non-volatile storage. Exploring both forms within the optimizer 418, however, raises the probability of finding a more efficient alternative.
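
The equivalence of the two forms can be checked on a toy example. In this sketch, `join`, `restrict`, and `cp` are hypothetical helpers, predicates are plain Python callables, and the CP (predicate cross product) operation is modeled as a logical AND of the join condition and the restriction; that modeling is a simplifying assumption, not the definition used by the system.

```python
def join(a, b, pred):
    """J(a, b, pred): all pairs from a and b that satisfy pred."""
    return [(x, y) for x in a for y in b if pred(x, y)]


def restrict(pairs, pred):
    """R(pairs, pred): keep only the pairs that satisfy pred."""
    return [p for p in pairs if pred(*p)]


def cp(c, d):
    """Assumed model of CP(c, d): a predicate that requires both c and d."""
    return lambda x, y: c(x, y) and d(x, y)


a = [1, 2, 3, 4]
b = [1, 2, 3, 4]
c = lambda x, y: x == y   # join condition
d = lambda x, y: x > 2    # restriction

# Form 1: SETA = R(J(a, b, c), d). The join emits an intermediate set first.
form1_intermediate = join(a, b, c)
form1 = restrict(form1_intermediate, d)

# Form 2: SETA = J(a, b, CP(c, d)). The join emits only the final members.
form2 = join(a, b, cp(c, d))
```

Both forms read the same inputs, but the join in the first form emits four pairs that must then be filtered, while the join in the second form emits only the two pairs in the final result.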

If the findAltOps routine 904 indicates that alternative expressions were found, the findLeastCost routine 903 is executed again to find the lowest-cost expression among the lowest-cost genesis expression and the alternatives. Again, if the findLeastCost routine 903 determines that the lowest-cost expression is cheap enough to execute, further optimization is skipped and the algebraic relationships of that expression are submitted to the realizeNode routine 908, described below. The threshold for stopping optimization can be determined from the relative speeds of the processing resources and the data channels and/or from other system characteristics. In one example, the threshold is set at 10 MB of data transfer. Because 10 MB of data can typically be transferred in about 1/10 of a second, in this example further optimization is skipped and the set is simply computed from the expression.

If neither the genesis expressions nor the alternatives identified by the findAltOps routine 904 are cheap enough to execute, as determined by the findLeastCost routine 903, the next optimization routine is executed. In the example embodiment, this is the findMetaGuids routine 905. The findMetaGuids routine 905 finds all expressions with incrementally small cost and submits them to the set processor for execution. Expressions with incrementally small cost often involve only metadata. Examples of low-cost operations include the predicate cross product (the CP operation), the output scope transform (the OST operation), and the relational data model sort-domain operations for left and right (the rdmSFL and rdmSFR operations). These operations typically act on metadata in the user data model and generate additional metadata. The physical set sizes are typically under about 500 bytes, far below the execution threshold of the optimizer 418, which makes these operations prime candidates for fast computation. They can therefore simply be executed immediately from the optimizer 418 without testing whether they meet the minimum threshold. The findLeastCost routine 903 is then called again to select the least costly expression from among the least expensive expression determined by the previous call to the findLeastCost routine 903 and the expressions produced by the findMetaGuids routine 905. Again, if the findLeastCost routine 903 determines that the lowest-cost expression is cheap enough to execute, further optimization is skipped and the algebraic relationships of that expression are submitted to the realizeNode routine 908, described below.

If the lowest-cost expression identified by the findLeastCost routine 903 is still not cheap enough to execute, the findAltGuids routine 906 is executed. The findAltGuids routine 906 determines whether one or more subexpressions can be replaced with alternative expressions that describe previously realized sets. Because the cost of reusing a realized set is always lower than the cost of executing the expression needed to realize that set, this routine can provide further cost reductions. An example of such a substitution can be given using the relational data model. Suppose a particular field (called SIZE, the third field) of a table (called BOXES) holds values in the range 1 to 100. A user then issues a query (Q1) requesting all boxes with a size less than 50. In XSN this is expressed as Q1 = rdmREST(BOXES, {{{"LT".<"3","CONST"."50">}}}). Some time later, the user requests all boxes with a size less than 25. This is submitted as Q2 = rdmREST(BOXES, {{{"LT".<"3","CONST"."25">}}}). If both queries are executed as submitted, the entire BOXES data set must be read to determine each of the results Q1 and Q2. However, a mathematical examination of the metadata sets {{{"LT".<"3","CONST"."50">}}} and {{{"LT".<"3","CONST"."25">}}} shows that any set restricted by the second is a subset of the same set restricted by the first. An algebraic substitution can therefore be made, yielding the expression Q2 = rdmREST(Q1, {{{"LT".<"3","CONST"."25">}}}). If Q1 has already been realized in non-volatile storage, it can be shown that Q1 must be smaller than BOXES and that the I/O cost of transferring it is therefore lower. Thus, when Q1 has already been realized, this substitution provides an overall cheaper way of evaluating Q2 than the expression as originally submitted.
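
The BOXES example can be sketched in Python as follows. The helper names `rest_lt` and `lt_subsumes`, the tuple layout of the rows, and the sample sizes are assumptions for illustration; only the "less than" restriction on field 3 and the constants 50 and 25 come from the example itself.

```python
def rest_lt(rows, field, const):
    """Restrict rows to those whose 1-indexed `field` is less than `const`."""
    return [r for r in rows if r[field - 1] < const]


def lt_subsumes(cached, requested):
    """True if every row passing `requested` also passes `cached`.

    Each argument is a (field, constant) pair for a "less than" restriction.
    """
    return cached[0] == requested[0] and requested[1] <= cached[1]


# Hypothetical BOXES rows: (id, label, SIZE); SIZE is the third field.
BOXES = [(i, f"box{i}", size) for i, size in enumerate([5, 20, 30, 49, 50, 75, 100])]

Q1 = rest_lt(BOXES, 3, 50)         # realized earlier
Q2_direct = rest_lt(BOXES, 3, 25)  # as submitted: scans all of BOXES

# Substitution: because (3, 25) is subsumed by (3, 50), scan Q1 instead.
if lt_subsumes(cached=(3, 50), requested=(3, 25)):
    Q2_from_Q1 = rest_lt(Q1, 3, 25)
```

Here the substituted form scans the four members of Q1 instead of the seven members of BOXES and returns the same two rows.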

After the subexpressions have been replaced with any suitable alternatives, the findLeastCost routine 903 is executed again to select the least costly expression from among the least expensive expression determined by the previous call to the findLeastCost routine 903 and the expressions produced by the findAltGuids routine 906. If the findLeastCost routine 903 determines that the lowest-cost expression is cheap enough to execute, further optimization is skipped and that expression is submitted to the realizeNode routine 908, described below.

When the optimization work described above is complete, the optimizer calls the realizeNode routine 908. The realizeNode routine 908 converts the OptoNode tree into an XSN tree, calls the spProcessXsnTree routine to submit the XSN tree to the set processor 909 for execution, deletes the XSN tree, and returns control to the optimizer software module, which in turn returns control to the XSN interface 901.
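
The staged control flow of FIG. 9A, in which the cost check runs before the first routine and after each one and optimization stops as soon as a cheap enough expression exists, might be sketched as below. This is a simplification under stated assumptions: expressions are opaque labels, cost is measured in bytes transferred (following the 10 MB example), each stage only proposes new candidates, and the stage functions and their outputs are made up.

```python
MB = 1024 * 1024
THRESHOLD_BYTES = 10 * MB  # the 10 MB example threshold


def optimize(genesis, stages, cost, threshold=THRESHOLD_BYTES):
    """Run stages in order until the cheapest candidate is under threshold."""
    candidates = list(genesis)
    best = min(candidates, key=cost)          # initial findLeastCost
    for stage in stages:
        if cost(best) <= threshold:
            break                             # cheap enough: stop optimizing
        candidates.extend(stage(candidates))  # stage proposes alternatives
        best = min(candidates, key=cost)      # findLeastCost after each stage
    return best


calls = []
costs = {"genesis": 80 * MB, "alt_op": 40 * MB, "reuse_q1": 2 * MB}


def find_alt_ops(candidates):
    calls.append("findAltOps")
    return ["alt_op"]


def find_meta_guids(candidates):
    calls.append("findMetaGuids")
    return []


def find_alt_guids(candidates):
    calls.append("findAltGuids")
    return ["reuse_q1"]


best = optimize(
    ["genesis"],
    [find_alt_ops, find_meta_guids, find_alt_guids],
    costs.__getitem__,
)
```

In this run all three stages execute because no candidate drops below the threshold until the last stage proposes reusing a realized set. The sketch omits the fact that the findMetaGuids routine also executes low-cost operations immediately.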

FIG. 9B is a block diagram of another example embodiment of the algebraic and operational optimizer software module. Unlike the example embodiment of FIG. 9A, the optimization routines in this embodiment are applied to each OptoNode tree working from the leaves toward the root. This approach feeds the result of each optimization routine back into the expression as an argument, which creates further opportunities for optimization at the cost of increased execution time. It is likely to be preferable in situations where substantial further optimization is possible.

The implementation in this example embodiment uses only two optimization routines: the findOperational routine 913 and the findAlgebraic routine 914. Unlike the previous example embodiment, the findLeastCost routine 903 is executed only after both the findOperational routine 913 and the findAlgebraic routine 914 have run. The function of the findLeastCost routine 903 is the same as described for the previous example embodiment.

As in the previous embodiment, the XSN interface 901 calls the optimizer software module and passes the GUID of the set to be realized to the buildExpressions routine 902. The buildExpressions routine 902 is the same as described for the previous example embodiment. After the buildExpressions routine 902 has built the OptoNode tree of the expressions, the findOperational routine 913 is executed to find alternative operations. This routine performs the same function as the findAltOps routine 904 described for the previous example embodiment.

After the findOperational routine 913 completes, the modified OptoNode tree is passed to the findAlgebraic routine 914 to find additional alternative expressions. The findAlgebraic routine 914 iterates over the OptoNode tree from right to left and from the innermost expressions to the outermost. This iteration order maximizes the potential for finding additional alternative expressions. Where each expression consists of one operation and one to three arguments, each combination of arguments and operation is presented to the findExpressions routine 915, one at a time. The findExpressions routine 915 then executes code specific to the operation of the expression, with the aim of finding or synthesizing alternative expressions. This operation-specific code can perform algebraic substitution of arguments from the algebraic cache, compute low-cost expressions contained within the expression, compute the expression itself, and synthesize alternative forms of the expression or of any of its arguments. The operation-specific code then adds any alternative expressions to the appropriate place in the OptoNode tree.
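
One possible reading of the "right to left, innermost to outermost" order is a post-order traversal that visits arguments in reverse. The sketch below follows that reading; the `Node` type and the example tree are hypothetical and are not the OptoNode structure itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical expression node: one operation and up to three arguments."""
    op: str
    args: list = field(default_factory=list)


def visit_order(node):
    """Yield nodes innermost-first, taking arguments from right to left."""
    for arg in reversed(node.args):
        if isinstance(arg, Node):  # leaf arguments (set names) are plain strings
            yield from visit_order(arg)
    yield node


tree = Node("rdmRest", [
    Node("rdmJoin", ["LINEITEM", "ORDERS", "NULL"]),
    Node("CP", ["c", "d"]),
])
order = [n.op for n in visit_order(tree)]
```

Visiting inner expressions first means that any alternative found for an argument is already in place when the enclosing expression is examined.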

When the optimization work described above is complete, the optimizer calls the realizeNode routine 908, which is the same realizeNode routine as in the previous example embodiment. Control then returns to the XSN interface 901.

The system can also perform comprehensive optimization. Comprehensive optimization analyzes relationships and data sets in order to add to the algebraic cache new relationships that are expected to be useful in optimizing the evaluation of future requests, and to add the corresponding sets to the set universe. It can be driven by the pattern of past requests to the system, which is used to optimize in anticipation of similar requests in the future. Comprehensive optimization can run in the background using spare processor cycles. FIGS. 9C, 9D, 9E, 9F, 9G, and 9H show examples of comprehensive optimization methods. Various other comprehensive optimizations are possible, however, and these example embodiments are only a few of the examples within the scope of the invention.

FIG. 9C shows an example in which individual scalar values or open-ended ranges of scalar values identify membership of a subset. Queries of this nature can benefit from the creation of subsets that partition the data into sets of equal cardinality, each containing a particular range of values. For example, a data set may have the data distribution shown at 950 in FIG. 9C. This data set can be partitioned into several data sets of equal cardinality, such as subsets 1 through 6 shown at 950 in FIG. 9C. One example is a request for all transactions that occurred after or before a particular date. This optimization has the advantage of reducing the amount of data the set processor must examine to compute future subsets of a similar nature. The comprehensive optimization routine identifies this situation by examining the algebraic cache and detecting a significant number of relational restrictions on a particular set that use ranges of scalar values. From these entries, the optimizer determines the maximum and minimum scalar values that establish the range of scalar values to be partitioned. The optimizer then sets the number of partition subsets equal to the average number of available I/O channels. Finally, for each partition subset, the optimizer inserts the appropriate relationship into the algebraic cache and inserts the set into the set universe. The optimizer can also insert a relationship stating that the union of the subsets equals the set, and can call the set processor to compute each partition subset.
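
Partitioning into subsets of equal cardinality can be sketched as a sort followed by an even split. The function name and sample values are hypothetical, and the number of subsets is passed in directly rather than derived from the number of I/O channels.

```python
def partition_equal_cardinality(values, n_subsets):
    """Split values into n_subsets contiguous value ranges of near-equal size."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        # The first `extra` subsets take one additional member each.
        end = start + base + (1 if i < extra else 0)
        subsets.append(ordered[start:end])
        start = end
    return subsets


values = [34, 1, 8, 144, 2, 55, 1, 21, 3, 89, 13, 5]
subsets = partition_equal_cardinality(values, 6)

# The union of the subsets reproduces the original set's members.
recombined = [v for subset in subsets for v in subset]
```

Because the split is by position in the sorted order rather than by fixed value intervals, a skewed distribution still yields subsets of the same size. Repeated values that straddle a boundary would need a tie-breaking rule that this sketch does not address.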

FIG. 9D shows an example of a comprehensive optimization similar to that of FIG. 9C, except that the criterion for membership in a partition subset is based on scalar values falling within a particular range. One example is determining the desired subsets for desired ranges of customer age. For example, the data in a data set may fall within the particular ranges shown at 954 in FIG. 9D. This data set can be partitioned into subsets 1 through 5, each containing one of these ranges, as shown at 956 in FIG. 9D. As with the other comprehensive optimization examples, this kind of partitioning can reduce the amount of data the set processor must examine, which yields an improvement through reduced computation time and resources.

FIG. 9E shows another form of comprehensive optimization, this one based on the domains of the members of a set rather than on scalar values. In this example, the optimizer determines that only certain domains are needed to produce useful subsets and that the other domains are not. For example, the data set 958 in FIG. 9E has columns 1 through 5, but the optimizer may determine that many requests use only columns 1, 3, and 4. The optimizer then creates an entry in the set manager for generating a subset whose members contain only the domains of interest, and calls the set processor to generate that subset. For example, a data set containing only columns 1, 3, and 4 can be created, as shown at 960 in FIG. 9E.

FIG. 9F shows an example in which the scalar values of the domain of interest are determined to have relatively low cardinality. One example is a binary domain with the scalar values TRUE and FALSE, as shown at 962. In this case, the optimizer creates subsets in which this domain is monotonic, one for each value present in the domain, while eliminating the domain from the resulting subsets. For example, as shown at 964, one subset can be created for all members of the original data set whose value for the domain is FALSE, and a separate subset for all members whose value is TRUE. This optimization can yield a large performance benefit, since even a binary field provides a 100% average performance improvement.
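
The low-cardinality split described for FIG. 9F can be sketched as follows. This is only an illustration: the row layout, the "active" column, and the function name are assumptions made for the example.

```python
from collections import defaultdict

def split_on_low_cardinality_domain(rows, domain):
    """Return {value: rows with `domain` removed} for each value present."""
    subsets = defaultdict(list)
    for row in rows:
        # The domain is dropped because its value is implied by the subset.
        rest = {k: v for k, v in row.items() if k != domain}
        subsets[row[domain]].append(rest)
    return dict(subsets)

# Hypothetical data set with a binary domain named "active".
rows = [
    {"id": 1, "name": "Ann", "active": True},
    {"id": 2, "name": "Bob", "active": False},
    {"id": 3, "name": "Cy", "active": True},
]
by_active = split_on_low_cardinality_domain(rows, "active")
```

A request restricted to one value of the domain then reads a single, narrower subset instead of scanning and filtering the whole data set.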

FIG. 9G shows an example in which a set is composed of the relational join of two sets. The optimizer performs the join when it would produce a data set whose cost is no greater than that of the two original sets. One example is a relational inner join in which some primary keys and foreign keys do not correspond between the related sets. For example, a first data set 966 may contain three columns (shown as columns 1, 2, and 3 of data set 966 in FIG. 9G) and a second data set 968 may contain four columns (shown as columns 1, 2, 3, and 4 of data set 968 in FIG. 9G). The two data sets can be joined to create a third data set 970 having seven columns (shown as columns 1 through 7 of data set 970 in FIG. 9G).

FIG. 9H shows an example of vectored multipaging. When users frequently access information in a particular way (for example, a telephone number is used to look up a person's name and address), the optimizer can make such requests more efficient by automatically defining a new data set, creating multiple pages of one-dimensional arrays (vectored multipages), and adding the new relation (for example, one defining a data set that contains only telephone numbers, names, and addresses) to the algebraic cache. For example, the optimizer may decide that the elements of a telephone number, namely the three-digit area code, the three-digit prefix, and the four-digit postfix, should be used for vectored multipaging. The optimizer then creates a set 972 containing 1,000 subsets 974, one for each of the 1,000 possible area codes (000 to 999). Each of these subsets contains 1,000 GUIDs referencing a subset for each possible prefix value (000 to 999), and each of those subsets contains 10,000 members, one per four-digit postfix, holding the name and address information for a person. If all codes were populated, 100,000 subsets could be created based on area code and telephone number prefix. However, because many area code and prefix combinations are not in use, those entries simply refer to the NULL set. Once these sets have been created, the set processor can use them to find an individual quickly from a telephone number: it uses the area code as an offset (vector) into the area code set to retrieve the GUID representing the appropriate prefix subset, then uses the prefix as an offset to determine the GUID of the appropriate postfix subset. Finally, the postfix of the telephone number is used as an offset to find the individual's data.
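
The three-level offset lookup described for FIG. 9H can be sketched as below. This is a simplified model under stated assumptions: integer counters stand in for GUIDs, Python lists stand in for the one-dimensional pages, `None` stands in for the NULL set, and the class and method names are hypothetical.

```python
NULL_SET = None  # stand-in for the NULL set of unused code combinations

class VectoredPages:
    def __init__(self):
        self.pages = {}       # guid -> page (a fixed-size one-dimensional array)
        self.next_guid = 0    # integer counter used in place of real GUIDs
        self.root = self._new_page(1000)  # one slot per area code 000-999

    def _new_page(self, size):
        guid = self.next_guid
        self.next_guid += 1
        self.pages[guid] = [NULL_SET] * size
        return guid

    def _child(self, page_guid, offset, size):
        # Create the referenced page lazily; unused slots stay NULL.
        page = self.pages[page_guid]
        if page[offset] is NULL_SET:
            page[offset] = self._new_page(size)
        return page[offset]

    @staticmethod
    def _split(phone):
        return int(phone[0:3]), int(phone[3:6]), int(phone[6:10])

    def insert(self, phone, record):
        area, prefix, postfix = self._split(phone)
        prefix_page = self._child(self.root, area, 1000)
        postfix_page = self._child(prefix_page, prefix, 10000)
        self.pages[postfix_page][postfix] = record

    def lookup(self, phone):
        area, prefix, postfix = self._split(phone)
        guid = self.pages[self.root][area]      # area code as offset
        if guid is NULL_SET:
            return None
        guid = self.pages[guid][prefix]         # prefix as offset
        if guid is NULL_SET:
            return None
        return self.pages[guid][postfix]        # postfix as offset

directory = VectoredPages()
directory.insert("4155550123", ("Ann", "1 Main St"))
```

Each lookup is three array indexing steps regardless of how many people are stored, and pages are only allocated for code combinations that are actually in use.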

FIG. 10A is a diagram of an OptoNode tree structure. The OptoNode tree is used to track the relations, expressions, and arguments that the optimizer is processing. At the root of the tree is an OptoNode 1001, which is a list of OptoExpressions 1002. Each OptoExpression 1002 in the list contains information about a mathematically equivalent variation of the other expressions in the same list. Specifically, in the example embodiment each OptoExpression 1002 contains an operation type; a GUID identifying the expression; various Boolean flags (indicating whether the OptoExpression has a GUID, whether the expression it represents is in the algebraic cache, and whether the OptoExpression is used as part of an alternative expression for a GUID); cost information (a value indicating the cost to use when evaluating the cost of this OptoExpression, and a value indicating the cost of the expression if it were realized independently of the rest of the containing expression); and up to three OptoNode arguments. The optimizer creates one or more OptoExpressions 1002 in order to determine the most efficient way of evaluating the desired expression. As described above, the optimizer analyzes each OptoExpression 1002 and determines the cost associated with evaluating it. The optimizer can then decide which OptoExpression 1002 to use for efficiency.

FIG. 10B shows an example of an OptoNode tree. At the root of the tree is an OptoNode 1004, which is a list of OptoExpressions representing mathematically equivalent expressions. Each OptoExpression contains a list of the arguments of its expression. For example, OptoExpression 1006 contains three arguments, Arg[0], Arg[1], and Arg[2]. Each argument in turn references an OptoNode that lists the alternative expressions that can be used for that argument. For example, OptoNode 1008 references the list of expressions (List[0], List[1], List[2], ...) that can be used for Arg[2] of OptoExpression 1006. These expressions are represented by OptoExpressions 1010, 1012, and 1014. Each of them provides a mathematically equivalent result when used as argument Arg[2] of the expression represented by OptoExpression 1006. In this OptoNode tree structure, multiple equivalent expressions can be listed at each level of the tree. For example, the findAlgebraic routine 914 in the optimizer (see FIG. 9B) can iterate over the OptoNode tree to find additional alternative expressions and add them to the tree. The FindLeastCost routine 915 can then traverse the OptoNode tree to identify the particular collection of expressions that computes the overall result at the lowest cost. The selected collection of expressions is then converted into an XSN tree and sent to the set processor for computation.
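
A least-cost traversal over a tree of this shape can be sketched as below. The sketch assumes a simple additive cost model (the cost of an expression is its own cost plus the cheapest cost of each argument), which the text does not specify; the class fields are reduced to the minimum needed, and the function name `find_least_cost` and the sample costs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class OptoExpression:
    op: str                      # operation type (label only, for illustration)
    cost: float                  # assumed cost of this operation by itself
    args: List["OptoNode"] = field(default_factory=list)

@dataclass
class OptoNode:
    alternatives: List[OptoExpression]  # mathematically equivalent expressions

def find_least_cost(node: OptoNode) -> Tuple[float, tuple]:
    """Return (total cost, chosen plan) for the cheapest alternative."""
    if not node.alternatives:
        raise ValueError("an OptoNode needs at least one expression")
    best = None
    for expr in node.alternatives:
        total, chosen_args = expr.cost, []
        for arg in expr.args:
            arg_cost, arg_plan = find_least_cost(arg)  # cheapest form of each argument
            total += arg_cost
            chosen_args.append(arg_plan)
        if best is None or total < best[0]:
            best = (total, (expr.op, *chosen_args))
    return best

# Two equivalent ways to obtain the argument: recompute it or reuse a cached set.
scan_a = OptoNode([OptoExpression("scan A", 10.0)])
computed = OptoExpression("rdmREST", 5.0, [scan_a])
cached = OptoExpression("cached rdmREST(A, C1)", 2.0)
root = OptoNode([OptoExpression("rdmPROJ", 3.0, [OptoNode([computed, cached])])])
cost, plan = find_least_cost(root)
```

Here the cached alternative wins, so the selected plan is the projection applied to the cached restriction.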

The set processor is responsible for all computation and logical value comparisons that the system performs on data sets. In one example embodiment, the set processor may be a body of multithreaded, reentrant software designed to take advantage of systems containing multiple processors and multiple independent, non-contending I/O channels between system memory and non-volatile storage. The set processor can also be designed to take advantage of data pipelining between operations; that is, the result of one operation can be passed directly as the input to the next without being stored intermediately in non-volatile storage. Data pipelining can greatly improve the efficiency of the set processor by reducing the amount of data that crosses the I/O performance barrier and by reducing the load on the storage manager, which is responsible for retrieving data from non-volatile storage.

Execution of the various operations is monitored by an object known as the thread pool. The thread pool is responsible for starting an execution thread for each operation requested by the ProcessOp routine, monitoring its execution, and reporting its success or failure. The thread pool also works with the executive to limit the number of threads currently running in the engine, as needed to manage system resources. Threads may be implemented on a number of different hardware and software platforms. For example, a conventional single-core processor, such as processor 102 in FIG. 1, may be used together with an operating system that simulates multiprocessing, such as Microsoft Windows®. In an alternative embodiment, multiprocessor or multi-core processors may be used, with one or more threads assigned to each processor. In another embodiment, a multiprocessor system such as that shown in FIG. 3 may be used, with an execution thread assigned to each MAP 306a through 306f. Regardless of the physical implementation of the system, the set processor in the example embodiment can chain operations together using a list, tree, or other structure so that the output of one thread becomes the input of another, in order to increase performance.

Each operation in the set processor is an individual routine designed to perform a computation on one or more input data sets and produce an output data set. These operations are equivalent to the extended set operations and functions found to be useful for data processing. The set processor can also have multiple implementations of the algorithm for each operation in order to support a wide variety of physical-to-logical format mappings. Adapting the operation routines to the physical data formats can achieve higher efficiency and performance than converting all data to a single physical representation for processing. One example embodiment supports logical-to-physical mappings between different formats so that data can be mapped between, for example, comma-separated value (CSV) format, binary string encoding (BSTR) format, fixed-offset (FIXED) format, type-encoded data (TED) format, and/or markup language formats. This allows the system to process data without converting all of it to a common format. For example, if the system needs to compute the result of a join between a first data set in CSV format and a second data set in XML format, it can use the mappings to compute the result without converting either data set to another format, and can return the result in CSV format, XML format, or another selected format. In addition, one example embodiment also includes a number of logical-to-physical mappings for atomic values such as strings, 32-bit integers, 64-bit integers, floating-point numbers, currency, Boolean values, date-time values, and interval values. These mappings can be used in the same way as the data format mappings. The system may include all potential mappings for the various supported data formats and atomic formats, or only selected mappings. For example, if the example embodiment supports five data formats, each mapping routine has five inputs and five outputs, resulting in 125 potential versions of the software routine. In the example embodiment, software routines for mapping between the various formats are included only where they yield a substantial increase in efficiency. Where no substantial increase would result, the example embodiment converts the data to a common format instead of using a mapping function.
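
The idea of joining a CSV data set with an XML data set without first converting either one to a common stored format can be illustrated with the sketch below. It is not the mapping routines of the example embodiment: it simply gives each physical format a small reader that yields members in one logical shape, and the sample data, element names, and function names are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_members(text):
    """Yield each CSV record as a mapping of column name to value."""
    yield from csv.DictReader(io.StringIO(text))

def xml_members(text, tag):
    """Yield each <tag> element as a mapping of child tag to text."""
    for element in ET.fromstring(text).iter(tag):
        yield {child.tag: child.text for child in element}

def join(left, right, key):
    """Hash join over any two iterables of mappings that share `key`."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            yield {**lrow, **rrow}

# Hypothetical inputs in two different physical formats.
csv_text = "id,name\n1,Ann\n2,Bob\n"
xml_text = (
    "<orders>"
    "<order><id>1</id><total>9.50</total></order>"
    "<order><id>3</id><total>4.00</total></order>"
    "</orders>"
)
joined = list(join(csv_members(csv_text), xml_members(xml_text, "order"), "id"))
```

The join routine sees only the logical members; which physical format each input came from is hidden behind the reader for that format.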

Another function of the set processor is to provide instances of the object-oriented data model of the common set schema used throughout the program. This includes predicate sets, domain sets, cardinality sets, and others that meet particular definitions and are useful constructs within the algebraic and computational processing performed by the program.

FIG. 11 shows an example embodiment of the set processor software module. In this example, the optimizer presents an XSN tree to the set processor for evaluation through the spProcessXsnTree routine 1102. The spProcessXsnTree routine 1102 examines the XSN tree and determines whether it represents an assignment statement, a relational statement, or an explicit set.

For an assignment statement, the ProcessXsnAssignment routine 1105 verifies that the left value (lvalue) of the statement is an XSN set. If the lvalue is not a set, the routine returns a failure code. The right value (rvalue) is then examined to determine whether it is an operation or an explicit set. If the rvalue is an explicit set, the external identifier associated with the lvalue is associated with the GUID of the rvalue. If the rvalue is neither an operation nor an explicit set, the routine returns a failure code. If the rvalue is an operation, the ProcessXSN routine 1107 is called to continue processing.

For a relational statement, the ProcessXSNRelation routine 1106 checks whether the lvalue and the rvalue are operations. If either or both are operations, the ProcessXSN routine 1107 is called to continue processing for either or both. An lvalue or rvalue that is not an operation is simply ignored. The purpose is, typically but not exclusively, to realize any sets referenced in the relational statement so that the relation can be evaluated in support of the optimizer.

For a request to realize an explicit set, the spProcessXsnTree routine 1102 realizes the set immediately in routine 1103 and returns a GUID identifying the realized set.

The ProcessXSN routine 1107 examines all members of the XSN tree, starting from the current operation at the root of the tree, and calls itself recursively for every operation. Each operation to be performed is passed to the ProcessOp routine 1108 in an order such that the root operation of the XSN tree is started before the operations below it, which ensures that proper data pipelining is established.

The ProcessOp routine 1108 takes each operation and inserts it into the thread pool 1109, with the appropriate GUIDs of all sets associated with the operation to be performed. The thread pool 1109 then starts a separate execution thread for each operation in the statement presented to the ProcessXSN routine 1107. These execution threads then run independently, each calling the appropriate operation 1110, until the operation completes. As each thread completes, the thread pool 1109 is notified and performs the appropriate cleanup and error handling, including removing the thread from the list of active threads.
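
The combination of a thread pool with pipelined operations, where the root operation is started before the operation that feeds it, can be sketched with Python's standard library. This is an illustration only: `ThreadPoolExecutor` and `queue.Queue` stand in for the thread pool 1109 and the data pipeline, and the `restrict` and `project` functions, the sample rows, and the end-of-stream marker are assumptions.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

DONE = object()  # end-of-stream marker passed through the pipeline

def restrict(source, out, predicate):
    """Lower operation: push matching rows downstream as they are found."""
    for row in source:
        if predicate(row):
            out.put(row)
    out.put(DONE)

def project(inp, columns, result):
    """Root operation: consume rows as they arrive and keep some columns."""
    while (row := inp.get()) is not DONE:
        result.append(tuple(row[c] for c in columns))

rows = [(1, "Ann", 34), (2, "Bob", 17), (3, "Cy", 52)]
pipe, result = queue.Queue(), []

# Two workers are required: the root blocks on the pipe until the leaf runs.
with ThreadPoolExecutor(max_workers=2) as pool:
    # The root operation is started first, then the operation that feeds it.
    root = pool.submit(project, pipe, (0, 1), result)
    leaf = pool.submit(restrict, rows, pipe, lambda r: r[2] >= 18)
    # result() re-raises any exception from the worker, so failures are reported.
    root.result()
    leaf.result()
```

The intermediate rows travel through the in-memory queue and are never written to storage between the two operations.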

The set processor also includes functions known as spLogical routines, which are designed to perform logical operations on sets. These logical operations are fundamentally different from the computational operations performed through the set processor's spProcessXsnTree routine 1102. The spLogical routines, which include spLogicalEqual, spLogicalPredicateEqual, and spLogicalPredicateSubSet, are designed to compare two data sets, typically stored in binary XSN notation, and determine their logical relationship to each other. These relationships include equality, subset, superset, and disjoint. The optimizer uses these functions when determining alternative expressions.
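
The four relationships named above can be illustrated on ordinary Python sets. This sketch does not reproduce the spLogical routines or binary XSN notation; the function name `logical_relation` and the extra "overlapping" result for sets that fit none of the four named relationships are assumptions.

```python
def logical_relation(a, b):
    """Classify how set a relates to set b."""
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return "equal"
    if a < b:            # proper subset
        return "subset"
    if a > b:            # proper superset
        return "superset"
    if a.isdisjoint(b):
        return "disjoint"
    return "overlapping"  # shares some members but neither contains the other

relations = [
    logical_relation({1, 2}, {1, 2}),
    logical_relation({1}, {1, 2}),
    logical_relation({1, 2, 3}, {1}),
    logical_relation({1}, {2}),
    logical_relation({1, 2}, {2, 3}),
]
```

An optimizer could use such a result to decide, for example, that a cached set known to be a superset of the requested set can be restricted instead of recomputing the request from scratch.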

FIG. 12A is a diagram of an XSN tree structure that can be used to represent an example XSN expression in the system. An XSN tree provides a convenient format for processing XSN expressions within the system. FIG. 12A shows the XSN tree for the expression A REL OP(B,C,D). This expression relates data set A, through a relation (REL), to an operation (OP) performed on data sets B, C, and D. The XSN tree is a doubly linked list composed of a relation node 1201, an operation node 1205, a number of member nodes 1202, 1203, 1206, 1207, and 1208, and a number of data sets 1204, 1209, 1210, and 1211. The relation node 1201 specifies the relation of the expression, such as equals, less than, or greater than. The relation node 1201 is linked to member node 1202, which has a link to data set A 1204 (the lvalue of the statement) as its left child and a link to member node 1203 as its right child. Member node 1203 is linked to the operation node 1205 as its left child. The operation node 1205 identifies the operation to be performed, such as projection, restriction, or join. The operation node 1205 is linked to member node 1206, which has a link to data set B 1209 as its left child and a link to another member node 1207 as its right child. Member node 1207 has a link to data set C 1210 as its left child and a link to member node 1208 as its right child. Member node 1208 is linked to data set D 1211.

FIG. 12B is a diagram of an XSN tree structure that can be used to represent an example XSN assignment statement in the system. FIG. 12B shows the XSN tree for the assignment statement SQL1 = rdmPROJ(rdmREST(A, C1), C2). This statement assigns the alphanumeric identifier SQL1 to the expression rdmPROJ(rdmREST(A, C1), C2). The XSN tree is a doubly linked list composed of an assignment node 1251, an alphanumeric identifier 1254, a number of member nodes 1252, 1253, 1256, 1257, 1260, and 1261, operation nodes 1255 and 1258, and a number of data sets 1259, 1262, and 1263. The assignment node 1251 is linked to member node 1252, which has a link to the alphanumeric identifier SQL1 1254 as its left child and a link to member node 1253 as its right child. Member node 1253 is linked to operation node 1255 (rdmPROJ) as its left child. Operation node 1255 identifies the operation to be performed (in this case, a projection). Operation node 1255 is linked to member node 1256, which has a link to operation node 1258 (in this case, the restriction operation rdmREST) as its left child and a link to another member node 1257 as its right child. Member node 1257 has a link to data set C2 1259 as its left child. Operation node 1258 is linked to member node 1260, which has a link to data set A 1262 as its left child and a link to another member node 1261 as its right child. Member node 1261 is linked to data set C1 1263. In the example embodiment, these XSN trees can be stored inside the system as arrays.
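
The left-child and right-child layout of member nodes described for FIGS. 12A and 12B can be sketched as follows. This is a simplified model: it keeps only forward links (the back links of the doubly linked list are omitted), plain strings stand in for data sets and identifiers, and the class names `Member` and `Op` and the `render` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Member:
    left: Union[str, "Op"]            # a data set or identifier, or an operation
    right: Optional["Member"] = None  # next member node in the chain, if any

@dataclass
class Op:
    name: str        # e.g. "rdmPROJ"; "=" is used here for the assignment node
    members: Member  # first member node linked from this node

def operands(op):
    """Walk the chain of member nodes, yielding each left child in order."""
    member = op.members
    while member is not None:
        yield member.left
        member = member.right

def render(node):
    """Turn a (sub)tree back into its textual expression."""
    if isinstance(node, str):
        return node
    return f"{node.name}({', '.join(render(x) for x in operands(node))})"

# SQL1 = rdmPROJ(rdmREST(A, C1), C2)
rest = Op("rdmREST", Member("A", Member("C1")))
proj = Op("rdmPROJ", Member(rest, Member("C2")))
stmt = Op("=", Member("SQL1", Member(proj)))
lvalue, rvalue = operands(stmt)
```

Walking the right links enumerates the arguments of a node, and following a left link descends into a nested operation.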

The storage manager 420 maintains the actual data that makes up each set and provides efficient transfer between non-volatile and volatile storage.

FIGS. 13A, 13B, 13C, and 13D show how buffer chains can be used in the storage manager 420 to enable pipelined data transfer and data sharing through buffer chaining. Note that this is only one example embodiment; there are various ways in which the storage manager 420 can be implemented, with or without buffer chains. The storage manager 420 provides access to set data through a simple mechanism in the form of the SetReader and SetWriter classes (called readers and writers for short), which are distinct subclasses of a class called SetBase. Readers read data from storage and writers write data to storage; together they encapsulate the more complex functionality of the storage manager 420.

This encapsulation allows for a flexible storage manager 420 implementation that can differ across platforms or storage systems. It also allows the underlying storage manager 420 to provide pipelining between operations, minimizing the amount of data that must be transferred from physical storage. Pipelining in this sense is the sharing of the underlying data buffers, whether the data is being written or read. As an example, consider operation A (OpA) and operation B (OpB), where OpA produces (and therefore stores) data and OpB needs to read that data. In a non-pipelined approach, OpA would simply write the data and OpB would read it back from storage in a separate operation. With the storage manager 420 design, by contrast, OpA writes the data and OpB can access it as it is being produced, in many cases before the data has actually been written to storage. Because OpB knows only the SetReader interface, it does not need to know that the data actually comes from the output of OpA rather than from storage. As a second example, consider OpC and OpD, both of which need to read data from the same set. The pipelined storage manager 420 reads the data only once for both operations.

This mechanism is shown in FIGS. 13A, 13B, 13C, and 13D. A data set is either produced by a set processor operation or retrieved from disk by the storage manager. In either case, a writer is used to place the data sequentially into a linked list of RAM buffers known as a buffer chain. When a set processor operation requests data from the data set, a reader is used to retrieve the data sequentially from the linked list of RAM buffers for use in the operation. In one example embodiment, a data set has only one writer but may have any number of readers. This is shown in FIG. 13A, which depicts a buffer chain 1302 containing four serial buffers, Dbuf1, Dbuf2, Dbuf3, and Dbuf4. A writer 1304 points to the buffer in the buffer chain 1302 into which data is to be written. The writer 1304 advances sequentially through the buffer chain, and a new buffer is created whenever the writer appends additional data to the chain. Readers 1306 and 1308 point to the buffers in the buffer chain 1302 from which data can be read.

Because of the nature of the operations within the set processor, a data set being read by more than one reader is likely to have readers that advance through the data at different paces. For example, as shown in FIG. 13A, a slow reader 1308 is still reading Dbuf1 while another reader 1306 has already finished reading Dbuf3. As the writer and readers advance through the buffer chain, the writer creates additional buffers, and each reader is free to advance through the data at whatever pace the set processor operation requires. FIG. 13B shows the same combination of readers and writer as FIG. 13A, but the writer 1304 has advanced to Dbuf7 and the reader 1306 has advanced to Dbuf6, while the slow reader 1308 remains at Dbuf1.

As set processor operations continue, a long series of buffers may build up between the slow reader 1308 and the writer 1304 and reader 1306 ahead of it, as shown in FIG. 13C. As the buffer chain 1302 grows, more and more RAM is consumed to hold the data in memory. At some point, the amount of RAM in use becomes excessive in light of the needs of other routines that require additional RAM, and some of the RAM must be freed so that those routines can use it. When this situation is detected, a buffer chain break can be initiated.

A buffer chain break is accomplished by creating an additional buffer chain associated with the data set. In the example shown in FIG. 13D, the slow reader 1308, which has now advanced to Dbuf2, is copied to a new buffer chain 1310. A new writer 1312 is also assigned to this new buffer chain 1310 to supply serial data from disk. The existing buffer chain 1302 now contains Dbuf3 through Dbuf12 and has only the writer 1304. Because there are no longer any readers behind the writer 1304, Dbuf3 through Dbuf11 are removed by the DoCleanup routine, a separate asynchronous routine that frees RAM buffers no longer in use by the storage manager. Because the number of buffers can be quite large, this makes a substantial amount of RAM available to other routines that require additional RAM.
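The buffer chain behavior described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class name `BufferChain`, its methods, and the index bookkeeping are hypothetical, and `cleanup` merely stands in for the DoCleanup routine, which the description says runs asynchronously.

```python
from collections import deque


class BufferChain:
    """One writer appends buffers; each reader keeps its own position (hypothetical sketch)."""

    def __init__(self):
        self.buffers = deque()  # buffers currently held in RAM
        self.base = 0           # absolute index of self.buffers[0]
        self.readers = {}       # reader name -> absolute index of the next buffer to read

    def write(self, data):
        # The single writer appends a new buffer at the end of the chain.
        self.buffers.append(data)

    def add_reader(self, name):
        # A new reader starts at the oldest buffer still in RAM.
        self.readers[name] = self.base

    def read(self, name):
        pos = self.readers[name]
        if pos - self.base >= len(self.buffers):
            return None         # the reader has caught up with the writer
        self.readers[name] = pos + 1
        return self.buffers[pos - self.base]

    def cleanup(self):
        """Free buffers that no remaining reader needs; return how many were freed."""
        end = self.base + len(self.buffers)
        keep_from = min(self.readers.values(), default=end - 1)
        freed = 0
        # Always keep the last buffer, which the writer is still filling.
        while self.base < min(keep_from, end - 1):
            self.buffers.popleft()
            self.base += 1
            freed += 1
        return freed

    def break_chain(self, name):
        """Detach a slow reader; the caller re-feeds it from disk on a new chain."""
        return self.readers.pop(name)


chain = BufferChain()
chain.add_reader("fast")
chain.add_reader("slow")
for i in range(1, 13):
    chain.write(f"Dbuf{i}")
for _ in range(6):
    chain.read("fast")
chain.read("slow")
resume_at = chain.break_chain("slow")  # the slow reader moves to its own chain
freed = chain.cleanup()                # buffers behind the fast reader are released
```

Without the chain break, `cleanup` could free only the one buffer behind the slow reader; detaching that reader is what lets the memory be reclaimed.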

In addition to providing optimized storage and retrieval of data, example embodiments can also be used to translate and map requests and statements between different schemas that use different data models. For example, the system can include mappings between schemas that use different data models, such as an SQL data model, an XML data model, an XSN data model, or another data model. Statements can be provided based on schemas that use different data models. For example, a number of query language statements based on a first schema can be provided in a first format, such as SQL format. As described above, these statements can be converted to XSN format, and the data sets and algebraic relations from these statements can be composed and stored in the set manager 402. Later, a statement may be received in a second format, such as XQuery format. This statement can also be converted to XSN format, and the data sets and algebraic relations from it can be composed and stored in the set manager 402. In particular, this statement may request that a data set be provided based on a schema that uses the second data model. Because all statements are converted to the unified XSN data model, the optimizer 418 can use the data sets and algebraic relations composed from the statements received in the first format to determine an optimized collection of algebraic relations for computing the data set requested in the second format. The algebraic relations stored in the algebraic cache, together with the mappings between schemas, allow data sets and relations captured from statements in the first format to be used in optimizing and computing the data set requested in the second format. This allows multiple different data models to be supported by a single system. Because all of the information from the statements is captured by the set manager as data sets and algebraic relations, the system can translate between models. In addition, this information can be used to optimize the algebraic relations that are used to compute data sets for the other data models, including subexpression substitution and the other optimization techniques used by the optimizer as described above. The data model may be a relational data model, a markup language data model, a set notation data model, or another data model. The format of the statements submitted to the system may include structured query language (SQL) statements, XQuery statements, set notation statements, or other formats.

As an example, consider the relational table and the XML document presented in FIG. 14A. A relational table can be represented mathematically as an extended set. The members of the extended set that represents the relational table are commonly called the rows of the relational table. A row of a relational table can also be represented mathematically as an extended set. The members of the extended set that represents a row are commonly called fields. Fields that are common across rows are called columns. A relational table can therefore be represented by an extended set of the form <<f11, f12, f13, ..., f1c>, ..., <fr1, fr2, fr3, ..., frc>>, where f denotes the value of a field and the subscripts r and c denote the unique row and column counts.

An XML document can also be represented mathematically as an extended set. The members of the extended set that represents an XML document are commonly called XML fragments and contain tags and values that represent the data. The value of an XML fragment may be a character string or another XML fragment. An XML document can therefore be represented by an extended set of the form <t1.{v1}, ..., tn.{vn}>, where t denotes the tag of an XML fragment and v denotes its value.

Using appropriately defined extended sets, a transformation function gRX() can map the members of the extended set representing the relational table to the members of the extended set representing the XML document, allowing the data to be represented transparently in either relational or XML format. The transformation function provides the structural relationship between the fields of the relational table and the fragments of the XML document, and it operates on the extended set representation of the relational table. The result is a functional mapping between the values and structure of the relational representation and the values and structure of the XML representation of the same data.

The transformation function can be stored in the algebraic cache as a collection of relations between the relational table and the collection of XML fragments. To map from the XML document to the relational table, the function fXR(), which is the inverse of the function gRX() shown in FIG. 14A, is used. For these functions to provide a proper mapping, constraints on the terms and on the relationships between terms must hold. These constraints are listed as the where clause in FIG. 14A. The constraint that a must equal s.{x}, together with the membership constraints that x and z are in B and that B and D are in C, indicates that an XML fragment must contain exactly one value. Further, the constraint that b must equal s.x, together with the membership constraint that x and y are in A, indicates that a relational field within a particular row must have exactly one value. Together, these constraints guarantee a unique mapping from XML fragments to fields in the relational table.

Another example is the mapping of a vector representation of a directed graph to a relational data table. The directed graph shown in FIG. 14B consists of paths and junctions. At each junction, one or more paths enter the junction and one or more paths leave it, the exceptions being the starting point of the directed graph, which has only paths leaving it, and the end point of the directed graph, which has only paths entering it. Each junction of the directed graph, together with the paths entering and leaving it, can be represented as an extended set of the form {from.{p1, p2, ..., pm}, to.{pm+1, pm+2, ..., pn}}. The values p1 through pm uniquely identify the paths from the junction, and the values pm+1 through pn uniquely identify the paths to the junction. The directed graph can therefore be represented by the extended set {j1.{from.{p11, p12, ..., p1m}, to.{p1m+1, p1m+2, ..., p1n}}, j2.{from.{p21, p22, ..., p2m}, to.{p2m+1, p2m+2, ..., p2n}}, ..., jk.{from.{pk1, pk2, ..., pkm}, to.{pkm+1, pkm+2, ..., pkn}}}. In this case, the transformation function is fNR(). The transformation function that completely maps the directed graph to the relational table is defined explicitly as presented in FIG. 14B. As with the mapping from relational to XML, constraints are required to enforce the rules of each model and to provide the mapping of values and structure between the models. The directed graph is represented completely by the extended set N. The extended set N is the union of the terms nk.Jk, which represent the paths of all k junctions of the graph. The paths nk.Jk are defined in terms of the from-paths f.Fk and the to-paths t.Tk of each junction. The relational table is represented by the extended set R. The extended set R is the union of the terms Rijk, which represent the rows of the relational table, each row containing the from, to, and path fields. The remaining constraints define the relationships between the terms and the constraints on the terms themselves. These include the constraint that f, t, and p must exist and must not equal one another; the constraint that Fk must equal {xi} and Tk must equal {yj}, which defines the relationship between the fields of the relational table and the paths of the directed graph; the constraint that each pair of Fk and Tk representing a path must be unique; and the constraint that the scope f and the scope t of each path represented by Jk each have one unique value.
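As a rough illustration of the graph-to-table mapping, the following Python sketch turns a junction-oriented description of a directed graph into rows with from, to, and path fields. FIG. 14B is not reproduced here, so this is only one plausible reading of fNR(): the function name `graph_to_rows`, the dictionary layout, and the sample graph are assumptions made for the example.

```python
def graph_to_rows(graph):
    """Map {junction: {"from": paths leaving, "to": paths entering}} to (from, to, path) rows."""
    # A path appears in the "from" set of the junction it leaves
    # and in the "to" set of the junction it enters.
    leaves = {p: j for j, ends in graph.items() for p in ends["from"]}
    enters = {p: j for j, ends in graph.items() for p in ends["to"]}
    return sorted((leaves[p], enters[p], p) for p in leaves.keys() & enters.keys())


# Hypothetical three-junction graph: j1 is the starting point, j3 the end point.
graph = {
    "j1": {"from": {"p1", "p2"}, "to": set()},
    "j2": {"from": {"p3"}, "to": {"p1"}},
    "j3": {"from": set(), "to": {"p2", "p3"}},
}
rows = graph_to_rows(graph)
```

Each path yields exactly one row because it leaves one junction and enters one junction, which mirrors the uniqueness constraints listed above.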

It will be understood that the above formats, schemas, and mappings are merely examples and that other formats, schemas, and mappings may be used in other embodiments.

Example Extended Set Notation
As described above, extended set notation (XSN) can be used in example embodiments. An example of an extended set notation (XSN) that can be used is described below. This is only one possible embodiment of an extended set notation, and other embodiments may use terminology, set types, syntax, parameters, operations, and functions that differ from those below. The example extended set notation provides a simple, easy-to-use syntax for specifying and manipulating expressions based on extended set mathematics within the environment of modern computing systems. The notation is expressible in standard ASCII characters and provides a standard syntax for representing values, sets, operations, relations, and expressions in a manner suitable for computer-based manipulation and processing. It thereby provides the ability to specify algebraic extended set expressions in machine-readable form using standard ASCII characters.

The terms used to describe and identify the principal components of XSN are defined in Table 1 below.

Syntax: The XSN system includes symbolic means for specifying sets and a grammar for formulating expressions and statements. In the following description, terms enclosed in square brackets ([ ]) indicate optional syntax. For example, where a scope is not required, an element is expressed as [scope.]constituent. An ellipsis (...) indicates a sequence of expressions of arbitrary length, for example <"1","2","3",...>.

Symbols: The syntax makes use of certain common symbols listed in Table 2 below. For readability, optional spaces may be inserted between punctuation marks if desired. For clarity, line breaks may occur anywhere within a statement, expression, or set.

Values: A value is specified by writing the value explicitly within double quotes. Examples of values include "Curly", "123", and "$2,343.76". If a value contains a double quote ("), it can be delimited by inserting a double quote before it, for example "John said ""shoot"" when he saw the moose.". A null value is specified by two consecutive double quotes, as in "".
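The quoting rule for values can be sketched as a pair of small Python helpers. The function names `quote` and `unquote` are hypothetical and are not part of XSN; the sketch only demonstrates the doubling of embedded double quotes and the representation of the null value.

```python
def quote(value):
    """Write a raw string as an XSN value, doubling any embedded double quotes."""
    return '"' + value.replace('"', '""') + '"'


def unquote(text):
    """Recover the raw string from a quoted XSN value."""
    return text[1:-1].replace('""', '"')


sentence = 'John said "shoot" when he saw the moose.'
quoted = quote(sentence)
null_value = quote("")
```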

Alphanumeric identifiers: A set identified by an alphanumeric identifier is specified by an assignment statement. Once specified, the alphanumeric identifier can be used interchangeably with the expression to which it was assigned. For example, if a set is assigned the alphanumeric identifier NDCENSUS1960, then NDCENSUS1960 can be used within any expression to refer to the set to which it was assigned.

Scopes, Constituents, and Elements: Scopes and constituents can be represented by values, alphanumeric identifiers, elements, or sets. The syntax of an element is [scope.]constituent. The scope is separated from the constituent by a period: the term to the left of the period is the scope, and the term to the right of the period is the constituent. For example, an element whose scope has the value "1" and whose constituent has the value "Bob" is written in proper notation as "1"."Bob".

An element is a scope or constituent that has a compound structure requiring at least one scope and one constituent. The constituent must be stated explicitly, whereas a null value for the scope is not stated explicitly and is implied. In the example above, the element "1"."Bob" has the scope "1" and the constituent "Bob". However, both scopes and constituents may themselves be alphanumeric identifiers, elements, and sets, which can result in potentially complex expressions.

One issue that arises from these potentially complex expressions is precedence with respect to scopes and constituents. For example, given the element "integer"."sum"."5", it is unclear where the scope ends and the constituent begins: is the scope "integer" or "integer"."sum", and is the constituent "5" or "sum"."5"? By convention in this example XSN, the term to the left of the first period is the scope and the term to its right is the constituent. Here, that makes "integer" the scope and "sum"."5" the constituent. If instead "integer"."sum" is to be the scope and "5" the constituent, this can be specified through the use of parentheses, as in the element ("integer"."sum")."5".
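The precedence rule can be made concrete with a short Python sketch that splits an element at its first top-level period. The helper name `split_element` is hypothetical, and the sketch assumes ASCII double quotes and balanced parentheses; it is not a full XSN parser.

```python
def split_element(text):
    """Split an element at the first period outside quotes and parentheses."""
    depth = 0
    in_quotes = False
    for i, ch in enumerate(text):
        if ch == '"':
            in_quotes = not in_quotes  # a doubled quote toggles twice, so state is preserved
        elif not in_quotes:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "." and depth == 0:
                return text[:i], text[i + 1:]
    return None, text  # no scope written: a null scope is implied


default_split = split_element('"integer"."sum"."5"')
grouped_split = split_element('("integer"."sum")."5"')
```

With the default rule the scope is `"integer"`; with parentheses the scope becomes the whole parenthesized term.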

Members and sets: A member is an element, set, or expression contained within a set. A set is specified by an expression or by enumerating its individual members, some or all of which may be elements, sets, or expressions. Any sets that contain the same members, enumerated in any order, are the same set.

In many cases, the members of a set include scopes that belong to the set of natural numbers. In some cases, these scopes are contiguous, unique, and include the value 1. In such cases, the set can be called an ordered set. All sets that do not meet these criteria can be called unordered sets.

A set is expressed as {member[,member[,...]]}. The members of an unordered set are enclosed in braces, as in {"a","x","b","g"} or {"Groucho","Harpo","Gummo"}. The members of an ordered set are enclosed in angle brackets, as in <"a","b","x","g">. The members of an ordered set have an implicit order, which is the order in which they are listed in the specification. The scope of each successive member of an ordered set is the corresponding member of the set of natural numbers. Thus, <"a","b","x","g"> is equivalent to {"1"."a","2"."b","3"."x","4"."g"}.
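The equivalence between the angle-bracket form and the explicit scoped form can be sketched in Python by modeling an extended set as a frozenset of (scope, constituent) pairs. This modeling choice and the helper names `ordered` and `is_ordered` are assumptions made for illustration only.

```python
def ordered(*constituents):
    """Expand <c1, c2, ...> into the equivalent set of (scope, constituent) pairs."""
    return frozenset((str(i), c) for i, c in enumerate(constituents, start=1))


def is_ordered(xset):
    """True if the scopes are exactly the natural numbers 1..n, each used once."""
    scopes = {scope for scope, _ in xset}
    return scopes == {str(i) for i in range(1, len(xset) + 1)}


angle_form = ordered("a", "b", "x", "g")
explicit_form = frozenset({("1", "a"), ("2", "b"), ("3", "x"), ("4", "g")})
```

Because a frozenset has no inherent order, the two forms compare equal regardless of how the explicit members are listed, which matches the rule that sets with the same members are the same set.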

For example, an ordered set can represent a data record having any number of data fields, where the members of the set represent the fields of the record and the scope of each member is the ordinal position of the corresponding field within the record. The comma-separated values in the first row of the following table can be specified as sets for processing. The data can be grouped into hierarchies in many different ways. Table 3 below shows some of the possibilities.

The original comma-separated values contain four value sequences, each of which has three values.

Set 1 is specified as an unordered set of four members, each of which contains an unordered set of three members.

Set 2 is specified as an ordered set of four members, each of which contains an unordered set of three members.

Set 3 is specified as an unordered set of four members, each of which contains an ordered set of three members.

Set 4 is specified as unordered. It uses scopes to indicate the position of each member of the set relative to the other members of the set.

The contents and structure of a set are sometimes determined by the purpose of the set, particularly when the set is used as an argument to functions and operations. Some of these determined structures occur frequently when the example XSN is used to describe relational data operations. Some of these common sets are usually called predicate sets, mapping sets, transformation sets, or aggregation sets, and are discussed in more detail below.

Predicate sets: A predicate set provides the specification of a mapping between the members of one set and the members of another set. A predicate set describes nested conditional expressions for determining truth. For conditional expressions such as those used by the RDMREST function, the basic condition is expressed as "condition".<element1,element2>.

An element can be specified as "column value" or as "const"."scalar value". The condition is specified as equal ("EQ"), not equal ("NEQ"), less than ("LT"), less than or equal ("LE"), greater than ("GT"), greater than or equal ("GE"), like ("LK"), or not like ("NLK"). For the RDMREST function, each element specifies either a column to be compared in the condition or a constant scalar value, which is indicated by the scope "const".

For example, the conditional clause "EQ".<"2","const"."MI">, in which the condition is EQ, the first element names a column, and the second element provides a constant value, indicates that all members (rows) whose second column equals the value "MI" are included in the output set.

In the following example, a single condition is specified for the predicate set of the RDMREST function. The resulting set contains only those members (rows) of the set zipcitystate that have the value "IN" in the third column. Note the two additional pairs of braces.
RDMREST(zipcitystate,{{{"EQ".<"3","const"."IN">}}})

These are required to support the construction of the AND and OR operations described below.

AND statements: A set of conditions is an AND statement, and all of the conditions in the list are ANDed together. If all are true, the overall condition is true. The following is an example of an AND structure:
{{"EQ".<"2","const"."MI">},{"GE".<"5","const"."49000">},{"LT".<"5","const"."51000">}}

The three conditional clauses are enclosed in a pair of braces that delimits the AND statement.

OR statements: An OR statement is created by combining two or more AND statements. If the result of any one of the AND statements is true, the entire statement is true. For example:
{{{"GE".<"1","const"."10000">}},{{"GT".<"3","const"."AK">},{"LT".<"3","const"."CA">}},{{"EQ".<"2","const"."Pasadena">}}}

In this example, three AND statements are combined by OR. The first contains one conditional clause, the second contains two conditional clauses that are ANDed together, and the last contains a single conditional clause. In this way, complex conditional expressions can be constructed to define operations.
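The OR-of-ANDs structure of a predicate set can be sketched in Python as follows. This is an illustrative reading, not the RDMREST implementation: the nested sets are modeled as Python lists, each clause as a (condition, element1, element2) tuple, values are compared as plain strings, the LK and NLK conditions are omitted because their pattern syntax is not given here, and the sample rows are made up.

```python
import operator

OPS = {"EQ": operator.eq, "NEQ": operator.ne, "LT": operator.lt,
       "LE": operator.le, "GT": operator.gt, "GE": operator.ge}


def operand(row, term):
    """A term is a 1-based column number or a ("const", value) pair."""
    if isinstance(term, tuple) and term[0] == "const":
        return term[1]
    return row[int(term) - 1]


def matches(row, predicate):
    """The row matches if every clause of at least one AND group is true."""
    return any(
        all(OPS[cond](operand(row, a), operand(row, b)) for cond, a, b in group)
        for group in predicate
    )


def rdmrest(rows, predicate):
    """Keep only the rows that satisfy the predicate (hypothetical sketch)."""
    return [row for row in rows if matches(row, predicate)]


# Made-up (zip, city, state) rows for illustration only.
rows = [("46201", "Anytown", "IN"), ("09999", "Smallville", "AZ"), ("02134", "Otherton", "MA")]

single = [[("EQ", "3", ("const", "IN"))]]
or_of_ands = [
    [("GE", "1", ("const", "10000"))],
    [("GT", "3", ("const", "AK")), ("LT", "3", ("const", "CA"))],
    [("EQ", "2", ("const", "Pasadena"))],
]
in_rows = rdmrest(rows, single)
or_rows = rdmrest(rows, or_of_ands)
```

String comparison is used only to keep the sketch short; a fuller version would apply the scalar domain of each column.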

Mapping sets: Some operations and functions require a set that provides a mapping. In most cases, scopes and constituents are used to provide the relationship between the input set and the output set. For example, in the RDMPROJ operation, a set provides the mapping between the columns of the input set and the columns of the output set. The scope value indicates the column of the output set, and the constituent indicates the column of the input set. For example:
<"3","5","1">

This mapping set indicates that the third, fifth, and first columns of the input set are to be mapped to the first, second, and third columns of the output set.
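A projection driven by such a mapping set can be sketched in a few lines of Python. The function name `rdmproj` mirrors the operation named in the text, but the list-of-tuples row model is an assumption made for this example.

```python
def rdmproj(rows, mapping):
    """mapping lists 1-based input columns; a column's position is its output column."""
    return [tuple(row[int(col) - 1] for col in mapping) for row in rows]


projected = rdmproj([("a", "b", "c", "d", "e")], ["3", "5", "1"])
```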

Transformation sets: Transformation expressions are used to transform one or more input values from a set into a value in the output set. Transformations include operations such as subtraction ("SUB"), addition ("ADD"), division ("DIV"), and multiplication ("MUL"). An additional transformation operation is the constant ("CONST"). Transformation expressions are typically used with relational operations such as RDMMATH to define the members of the output set. For example, if the first column of the output set is defined as the sum of the first and second columns of the input set, the following transformation set specifies this:
<"ADD".<"1","2">>

This indicates that the first and second columns of the input set are to be used as the first and second arguments of the addition transformation to produce the value of the first column of the output. Transformations can be deeply nested to provide the specification. For example, if column 1 of the output set is to be the result of the calculation (COL1 + COL2) / (COL3 - 1), and columns 5 and 6 of the input set are mapped to columns 2 and 3, the transformation set is:
<"DIV".<"ADD".<"1","2">,"SUB".<"3","CONST"."1">>,"5","6">
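The nested transformation can be evaluated recursively, as the following Python sketch shows. It is an illustration under stated assumptions: elements such as "ADD".<"1","2"> are modeled as (operation, arguments) tuples, bare column references as strings, all arithmetic is done in floating point, and the input row is made up.

```python
import operator

ARITH = {"ADD": operator.add, "SUB": operator.sub,
         "MUL": operator.mul, "DIV": operator.truediv}


def evaluate(term, row):
    """Evaluate one transformation term against a row of string values."""
    if isinstance(term, str):      # "n": the value of input column n
        return float(row[int(term) - 1])
    op, arg = term
    if op == "CONST":              # ("CONST", value): a constant scalar
        return float(arg)
    left, right = arg
    return ARITH[op](evaluate(left, row), evaluate(right, row))


def rdmmath(rows, transform):
    """Produce one output column per term of the transformation set (hypothetical sketch)."""
    return [tuple(evaluate(term, row) for term in transform) for row in rows]


# Model of <"DIV".<"ADD".<"1","2">,"SUB".<"3","CONST"."1">>,"5","6">
transform = [
    ("DIV", (("ADD", ("1", "2")), ("SUB", ("3", ("CONST", "1"))))),
    "5",
    "6",
]
result = rdmmath([("4", "6", "3", "x", "7", "8")], transform)
```

For the sample row, the first output column is (4 + 6) / (3 - 1), and input columns 5 and 6 pass through as output columns 2 and 3.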

Transformation sets can also include specific scalar domain qualifiers. For example, if the mathematics is to be performed in the integer domain, the example <"ADD".<"1","2">> is expressed as:
<("int64"."ADD").<"1","2">>

This indicates that the scalar values of columns 1 and 2 are to be added together as if they were integer values. The result is also produced in the integer scalar domain. Like function names and operation names, scalar domain identifiers are not case sensitive.

Aggregation sets: Sets are also used with the RDMGROUP function to provide aggregation. Aggregation operations include summation ("SUM"), average ("AVG"), count ("CNT"), minimum ("MIN"), and maximum ("MAX"). These functions specify the operations to be performed on the members of the set within each group created by the RDMGROUP function. For example:
<"1","3","COUNT"."1","AVG"."1">

This indicates that the first and third columns of the input provide the basis for the groups and are included as the first and second columns of the output. The third column of the output is the count of the members from column 1 within the group, and the fourth column is the average of the members in column 1 of the group.
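Grouping and aggregation driven by such a set can be sketched in Python as follows. The sketch is hypothetical: it accepts both "CNT" (as listed among the operations) and "COUNT" (as written in the example), assumes the grouping columns precede the aggregates, treats aggregated values as floating point numbers, and uses made-up rows.

```python
from collections import defaultdict
from statistics import mean

AGG = {"SUM": sum, "AVG": mean, "CNT": len, "COUNT": len, "MIN": min, "MAX": max}


def rdmgroup(rows, spec):
    """spec mixes grouping columns ("n") with (operation, "n") aggregate terms."""
    keys = [int(t) - 1 for t in spec if isinstance(t, str)]
    aggs = [(AGG[t[0].upper()], int(t[1]) - 1) for t in spec if isinstance(t, tuple)]
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    # One output row per group: the key columns followed by each aggregate.
    return [
        key + tuple(fn([float(r[col]) for r in members]) for fn, col in aggs)
        for key, members in groups.items()
    ]


# Model of <"1","3","COUNT"."1","AVG"."1">
spec = ["1", "3", ("COUNT", "1"), ("AVG", "1")]
grouped = rdmgroup([("10", "x", "A"), ("10", "y", "A"), ("20", "z", "B")], spec)
```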

Like transformation sets, aggregation sets can specify the scalar domain in which the operations are to be performed. For example, if the above is to be performed in the string domain, the set specified is:
<"1","3",("STRING"."COUNT")."1",("STRING"."AVG")."1">

Functions and operations: Functions and operations are specified explicitly. A function or operation, in combination with the one to three sets that provide its arguments, defines the set that is specified. Other embodiments may allow a different number of arguments. Operations are atomic and are specified in extended set mathematics. A function is a combination of one or more operations and is a convenient notation for frequently performed combinations of operations.

Functions and operations are expressed by means of a predefined alphanumeric identifier, parentheses, and one to three set arguments. An example is CRD({"1","2","3"}), which represents the set that is the cardinality set of {"1","2","3"}.

一般に、関数は、関数（表現１［，表現２［，表現３［,...］］）として指定される。但し、引数の数は関数に依存する。特に、単項関数は１つの引数を必要とし、二項関数は２つの引数を必要とし、三項関数は３つの引数を必要とする。関数によっては、最後の引数は、マッピング及び変換の指定に使用される集合である。集合に使用される英数字識別子と異なり、関数名及び演算名では大文字と小文字との区別がない。 In general, a function is specified as a function (expression 1 [, expression 2 [, expression 3 [, ...]]). However, the number of arguments depends on the function. In particular, unary functions require one argument, binary functions require two arguments, and ternary functions require three arguments. For some functions, the last argument is a set that is used to specify mappings and transformations. Unlike alphanumeric identifiers used in sets, function names and operation names are not case sensitive.

以下は、関数のいくつかの例である。 The following are some examples of functions.

RDMPROJ(ASet, <"7", "1", "2", "3">) -- RDMPROJ is the relational data model (RDM) projection function. The set named ASet is the argument to the operation and represents a relational table. The second set specifies the mapping of the members (columns) of ASet that are to be used as the columns of the resulting set.

INV(OldSet) -- INV is the inversion function, which exchanges the scopes and the constituents of the members of a set. The set named OldSet is the argument to the operation and is inverted to produce the output.

CRD(MySet) -- CRD is the cardinality function, which produces the cardinality set of its input argument set. The set named MySet is the input used to produce the output set.

RDMJOIN(cities_and_states, states_and_zips, {{{"EQ".<"2", "3">}}}) -- RDMJOIN is the relational data model (RDM) join function. The first two sets, named cities_and_states and states_and_zips, are the sets to be joined by the operation. The explicit predicate set supplied as the third set specifies the condition used to select the members of the resulting joined set. In this case, the predicate set specifies that rows are to be joined in the output set when the second column of the first set equals the first column (the state column) of the second set.

RDMREST(zips, {{{"GE".<"1", "const"."10000">}, {"LE".<"1", "const"."14999">}}, {{"GT".<"3", "const"."AK">}, {"LT".<"3", "const"."CA">}}}) -- RDMREST is the relational data model (RDM) restriction function. The first set, named zips, is the argument to the operation and represents a relational table. The second argument is a predicate set specifying which members (rows) are to be included in the restricted output set.

In these examples, functions whose names begin with RDM (relational data model) are designed specifically to manipulate relational data as sets. For example, RDMSORT is a binary function that sorts the set given as its first argument, using the members of the set given as its second argument to indicate the sort order and procedure.

Expressions: An expression is a symbolic representation that specifies a set. An alphanumeric identifier that represents a set is the simplest form of expression. An expression can also be composed of many functions, operations, and sets. Some examples of expressions are
CRD(SetA)
rdmPROJ(SetA, <"1", "5", "23">)
CRD(rdmPROJ(SetA, <"1", "5", "23">))

Relations and relational operators: A relational operator is a symbolic representation that specifies a relation between two expressions. The relational operators are equal, subset, and disjoint, together with their negations. They are specified using the values "EQ", "SUB", "DIS", "NEQ", "NSB", and "NDS". Some example statements using relational operators are
SetA EQ CRD(SetB)
SetC SUB SetB

Assignments: An assignment is a statement that assigns an alphanumeric identifier to an expression. Syntactically, an assignment is specified as alphanumeric identifier = expression. For example,
NewSet = <"1", "2", "12", "4">
SetA = SS(SETB)
SetC = <"b", "c", "a", "x">
SetD = {"Larry", "Moe", "Curly"}
SetG = NULL

Relational data model: The relational data model (RDM) is a subset of the extended set data model that can be described using XSN. A relational table is regarded as a set of ordered sets, with each row of the table represented by one of these ordered sets. The members of the set representing a row are the values of the columns (fields) in that row. A relational table with three rows, each containing four columns, is represented by a set with the structure
<<a1, b1, c1, d1>, <a2, b2, c2, d2>, <a3, b3, c3, d3>>

Here both the table and the individual rows are represented as ordered sets, but a relational table can also be represented as a set whose members are unordered, such as
{<a1, b1, c1, d1>, <a2, b2, c2, d2>, <a3, b3, c3, d3>}

Cardinality sets: When a set is presented as ordered, information indicating the ordering of the set must be provided. To retain some of the additional characteristics of the relational data model, and to provide cardinality information useful for optimizing the processing of XSN expressions, a cardinality set is usually specified for a set that represents a relational table. The cardinality set for the unordered set above is
<"3", <"4", <Ca, Cb, Cc, Cd>>>

A cardinality set is a nested set. The outermost set contains the cardinality of the set (3 in this example, because the table contains three rows), followed by the cardinality set of the members that represent the rows. Ca through Cd are values representing the cardinalities of the values that make up the members of the sets representing the rows. Each value Cn represents the maximum cardinality of that particular member. The cardinality set is produced by the cardinality function:
CardinalityOfSetA = CRD(SetA)
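The nested shape of a cardinality set can be illustrated with a small Python sketch. This is not the patent's implementation: the name crd, the use of Python tuples for XSN sets, and in particular the choice to treat the cardinality of a scalar value as its length in characters are all assumptions made for illustration.

```python
# Hypothetical helper; the name crd and the tuple encoding are illustrative only.
def crd(table):
    """Return a nested cardinality tuple: (rows, (columns, per-column maxima))."""
    rows = list(table)
    n_cols = len(rows[0]) if rows else 0
    # Assumption: the cardinality of a scalar value is its length in characters.
    col_max = tuple(max(len(row[i]) for row in rows) for i in range(n_cols))
    return (len(rows), (n_cols, col_max))


set_a = {
    ("a1", "b1", "c1", "d1"),
    ("a2", "b2", "c2", "d2"),
    ("a3", "b3", "c3", "d3"),
}
print(crd(set_a))  # (3, (4, (2, 2, 2, 2)))
```

The outer pair mirrors <"3", <"4", <Ca, Cb, Cc, Cd>>>: the row count, then the column count, then one maximum per column.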

RDM functions: The standard relational data model consists of eight operations. However, only five are needed to implement the entire relational model, and typically only four are used in practical implementations. XSN provides a notation for these functions within the framework of extended set mathematics.

These functions are extended set versions of the relational data model union (RDMUNION), projection (RDMPROJ), restriction (RDMREST), join (RDMJOIN), difference (RDMDIFF), and division (RDMDIV). In addition to these, three further functions are available under XSN: RDMSORT, RDMPIVOT, and RDMGROUP.

RDMDIFF function. RDMDIFF defines an unordered set equivalent to the relational A - B operation. The resulting set contains all members of A that are not in B. An example format and description of this function follow.
RDMDIFF(A, B) == {}

Arguments:
A - an unordered set
B - an unordered set whose members are excluded from A to produce the result

Result: An unordered set containing the members of A that are not members of B, as specified by the conditions of the difference function.

Remarks: As an extension to the standard relational difference, which requires all values of the column members to be equal, the XSN version allows a predicate set to be specified that defines the equivalence relation. If null is supplied for the conditional predicate set, the standard RDM function is performed. If A EQ B, the result is the null set. If the intersection of A and B is the null set, the result is A.

Requirements: Set A must be an RDM set. A and B must have the same member column cardinality. If these conditions are not met, the result is the null set.

Example:
A = {<"a", "b", "c">, <"d", "b", "r">}
B = {<"3", "c", "8">}
RDMDIFF(A, B) == {<"a", "b", "c">}
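The standard (null-predicate) case described in the remarks can be sketched in Python as a plain set difference over row tuples. The function name rdm_diff and the sample sets are hypothetical, with B chosen here to share a row with A; the optional equivalence predicate of the XSN version is not modeled.

```python
# Hypothetical sketch of the standard relational difference; not the XSN implementation.
def rdm_diff(a, b):
    """Return the rows of a that are not in b, or the null set on a width mismatch."""
    a, b = set(a), set(b)
    # Requirement: A and B must have the same member column cardinality.
    if len({len(row) for row in a | b}) > 1:
        return set()
    return a - b


a = {("a", "b", "c"), ("d", "b", "r")}
b = {("d", "b", "r")}  # assumed sample data
print(rdm_diff(a, b))  # {('a', 'b', 'c')}
```

When A and B share no rows the sketch returns A unchanged, and when they are equal it returns the empty set, matching the two boundary cases stated in the remarks.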

RDMGROUP function. RDMGROUP defines an unordered set in which columns are aggregated based on specified groups identified by the members of one or more columns. Together with an aggregate predicate set, this function provides the ability to produce sum, count, average, minimum, and maximum (SCAMM) values. An example format and description of this function follow.
RDMGROUP(A, Z)

Arguments:
A - an ordered or unordered set

Result: An unordered set containing members generated from the members of the columns of set A according to the aggregation functions in the specified aggregate predicate set Z.

Remarks: RDMGROUP produces one member row for each unique combination of values of the member columns specified in the predicate set. The member columns that define the groups are specified by listing them in the predicate set without a scope. The other members to be included in the output set should indicate which SCAMM value is to be computed to produce the output set.

Requirements: Set A must be an RDM set. Set Z must be an aggregate predicate set. If these conditions are not met, the result is the null set.

Example:
A = <<"3", "Tom", "a">,
<"2", "Sam", "c">,
<"6", "Harry", "a">,
<"7", "Harry", "a">>
Z = <"3",
"COUNT"."2",
"SUM"."1">

RDMGROUP(A, Z) -> {<"a", "3", "16">, <"c", "1", "2">}
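The grouping behavior in the example above can be sketched in Python. The encoding is an assumption made for illustration: a member of Z without a scope is a plain string such as "3", and a scoped member such as "COUNT"."2" is written as the tuple ("COUNT", "2"). The name rdm_group is hypothetical, and numeric aggregates are assumed to operate on integer-valued strings.

```python
from collections import defaultdict

# Assumed aggregate table; values arrive as strings, as in the XSN examples.
AGGREGATES = {
    "COUNT": lambda vals: len(vals),
    "SUM": lambda vals: sum(int(v) for v in vals),
    "MIN": lambda vals: min(int(v) for v in vals),
    "MAX": lambda vals: max(int(v) for v in vals),
    "AVG": lambda vals: sum(int(v) for v in vals) / len(vals),
}


def rdm_group(a, z):
    """Group rows of a by the unscoped columns in z and aggregate the scoped ones."""
    key_cols = [int(m) - 1 for m in z if isinstance(m, str)]
    groups = defaultdict(list)
    for row in a:
        groups[tuple(row[c] for c in key_cols)].append(row)
    result = set()
    for rows in groups.values():
        out = []
        for m in z:
            if isinstance(m, str):  # grouping column: copy its value
                out.append(rows[0][int(m) - 1])
            else:  # (operation, column): aggregate over the group
                op, col = m
                out.append(str(AGGREGATES[op]([r[int(col) - 1] for r in rows])))
        result.add(tuple(out))
    return result


a = [("3", "Tom", "a"), ("2", "Sam", "c"), ("6", "Harry", "a"), ("7", "Harry", "a")]
z = ["3", ("COUNT", "2"), ("SUM", "1")]
print(rdm_group(a, z))
```

With the sample data from the example, the sketch yields the rows ("a", "3", "16") and ("c", "1", "2").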

RDMJOIN function. RDMJOIN defines an unordered set whose member rows are each the concatenation of one member row from set A and one member row from set B, as determined by the conditional predicate set Z being satisfied between the two member rows. An example format and description of this function follow.
RDMJOIN(A, B, Z) == {}

Arguments:
A - an unordered set to be joined as the left side of the resulting member rows
B - an unordered set to be joined as the right side of the resulting member rows
Z - a predicate set containing the set of conditions that determine which members to join

Result: An unordered set whose members are created from one member row of A and one member row of B that match the conditions specified in the conditional predicate set Z. A member of the resulting set is produced whenever a member row from set A and a member row from set B are found that satisfy the conditions specified in predicate set Z. The resulting member row is an ordered member containing the member columns of the member row from set A followed by the member columns of the member row from set B.

Remarks: The conditional predicate set Z specifies the conditions that must hold between a member row of set A and a member row of set B.

Requirements: Sets A and B must be RDM sets. Set Z must be a conditional predicate set. If these conditions are not met, the result is the null set. Predicate set Z must have the schema defined for conditions. The scopes of the members in the predicate set specify member columns of set A, and the constituents of the members in the predicate set specify member columns from set B.

Example:
A = {<"sales", "Tom">,
<"sales", "Sam">,
<"shipping", "Bill">,
<"shipping", "Sally">}
B = {<"Bldg 1", "sales">,
<"Bldg 2", "shipping">}
Z = {{{"EQ".<"1", "2">}}}

RDMJOIN(A, B, Z) -> {
<"sales", "Tom", "Bldg 1", "sales">,
<"sales", "Sam", "Bldg 1", "sales">,
<"shipping", "Bill", "Bldg 2", "shipping">,
<"shipping", "Sally", "Bldg 2", "shipping">}
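The join in the example above can be sketched in Python as a nested loop over row tuples. The encoding of Z is an assumption: the outer list holds alternative groups, each group is a list of conditions that must all hold, and each condition is a tuple (operator, column of A, column of B). The name rdm_join and the OPS table are hypothetical.

```python
import operator

# Comparison operators that appear in the predicate set examples.
OPS = {
    "EQ": operator.eq,
    "GE": operator.ge,
    "LE": operator.le,
    "GT": operator.gt,
    "LT": operator.lt,
}


def rdm_join(a, b, z):
    """Concatenate each row of a with each row of b for which z is satisfied."""
    def matches(row_a, row_b):
        # Assumed reading: any group may match, and every condition in it must hold.
        return any(
            all(OPS[op](row_a[int(i) - 1], row_b[int(j) - 1]) for op, i, j in group)
            for group in z
        )

    return {row_a + row_b for row_a in a for row_b in b if matches(row_a, row_b)}


a = {("sales", "Tom"), ("sales", "Sam"), ("shipping", "Bill"), ("shipping", "Sally")}
b = {("Bldg 1", "sales"), ("Bldg 2", "shipping")}
z = [[("EQ", "1", "2")]]
for row in sorted(rdm_join(a, b, z)):
    print(row)
```

A nested loop keeps the sketch short; it says nothing about how an actual implementation would evaluate the join.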

RDMPIVOT function. The RDMPIVOT function defines an ordered set in which the member columns and member rows of the specified set are exchanged. An example format and description of this function follow.
RDMPIVOT(A) == <>

Arguments:
A - an ordered set

Result: The resulting set contains member rows composed of the member columns of set A. The set is ordered by the order of the member columns in set A.

Remarks: Pivoting very large sets can be costly and time consuming, and should be done only when no other means of processing the set can be found.

Requirements: Set A must be an RDM set. If this condition is not met, the result is the null set.

Example:
A = {<"3", "Tom", "a">,
<"2", "Sam", "c">,
<"6", "Harry", "a">,
<"7", "Harry", "a">}

RDMPIVOT(A) -> <
<"3", "2", "6", "7">,
<"Tom", "Sam", "Harry", "Harry">,
<"a", "c", "a", "a">>
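Exchanging rows and columns is a transposition, which a short Python sketch can show. The name rdm_pivot is hypothetical, and the input is given here as an ordered list of row tuples so that the output order is well defined.

```python
# Hypothetical sketch: pivot an ordered table by transposing it.
def rdm_pivot(a):
    """Return a tuple whose rows are the columns of a, or () on ragged input."""
    rows = list(a)
    # A table whose rows differ in width is treated as not being an RDM set.
    if len({len(row) for row in rows}) > 1:
        return ()
    return tuple(zip(*rows))


a = [("3", "Tom", "a"), ("2", "Sam", "c"), ("6", "Harry", "a"), ("7", "Harry", "a")]
for row in rdm_pivot(a):
    print(row)
```

Because zip has to see every row before the first output row is complete, the sketch also hints at why the remarks caution against pivoting very large sets.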

RDMPROJ function. RDMPROJ defines an unordered set composed of members from every member row of the argument set, limited to the member columns specified through the transformation predicate set. An example format and description of this function follow.
RDMPROJ(A, Z) == {}

Arguments:
A - an unordered set to be projected
Z - a transformation predicate set for the projection

Result: The resulting set contains a member row for each member row of A, limited to the member columns specified by the transformation predicate set.

Remarks: For information on how to specify set Z properly, see the specification of transformation predicate sets.

Requirements: Set A must be an RDM set. Set Z must be a transformation predicate set. If these conditions are not met, the result is the null set.

Example:
A = {<"3", "Tom", "a", "b", "s">,
<"2", "Sam", "c", "b", "s">,
<"6", "Harry", "a", "z", "s">}
Z = <"3", "2">

RDMPROJ(A, Z) -> {<"a", "Tom">,
<"c", "Sam">,
<"a", "Harry">}
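The projection in the example above amounts to picking columns in the order listed in Z. In this Python sketch the name rdm_proj is hypothetical, and Z is assumed to be a simple list of 1-based column numbers given as strings; scoped transformation members are not modeled.

```python
# Hypothetical sketch of projection over row tuples.
def rdm_proj(a, z):
    """Return the rows of a restricted to the 1-based columns listed in z, in z's order."""
    cols = [int(c) - 1 for c in z]
    return {tuple(row[c] for c in cols) for row in a}


a = {
    ("3", "Tom", "a", "b", "s"),
    ("2", "Sam", "c", "b", "s"),
    ("6", "Harry", "a", "z", "s"),
}
for row in sorted(rdm_proj(a, ["3", "2"])):
    print(row)
```

Because the result is a set, rows that become identical after projection collapse into one member.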

RDMREST function. The RDMREST function defines an unordered set whose member rows are restricted to those that satisfy the conditions specified in the conditional predicate set. An example format and description of this function follow.

RDMREST(A, Z) == {}

Arguments:
A - an unordered set to be restricted
Z - a conditional predicate set specifying the conditions of the restriction

Result: The resulting set contains only the member rows from set A that satisfy the conditions specified by the conditional predicate set Z.

Remarks: For information on how to specify set Z, see the specification of conditional predicate sets.

Requirements: Set A must meet the requirements of an RDM set. Set Z must be a conditional predicate set. If these conditions are not met, the result is the null set.

Example:
A = {<"3", "Tom", "a", "b", "s">,
<"2", "Sam", "c", "f", "s">,
<"6", "Harry", "a", "z", "s">}
Z = {{{"EQ".<"2", "const"."Tom">}},
{{"EQ".<"2", "const"."Harry">}, {"EQ".<"4", "const"."f">}}}

RDMREST(A, Z) -> {<"3", "Tom", "a", "b", "s">}
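The example result is consistent with reading Z as a set of alternative groups, each of which holds only when all of its conditions hold. That reading is an inference from the example rather than something the text states, and the Python sketch below adopts it as an assumption. The name rdm_rest, the OPS table, and the tuple encoding of conditions and of "const" operands are hypothetical; values are compared as strings.

```python
import operator

# Comparison operators that appear in the predicate set examples.
OPS = {
    "EQ": operator.eq,
    "GE": operator.ge,
    "LE": operator.le,
    "GT": operator.gt,
    "LT": operator.lt,
}


def rdm_rest(a, z):
    """Keep the rows of a that satisfy at least one group of conditions in z."""
    def operand(row, ref):
        # ("const", value) is a literal; anything else is a 1-based column number.
        if isinstance(ref, tuple) and ref[0] == "const":
            return ref[1]
        return row[int(ref) - 1]

    def holds(row, cond):
        op, left, right = cond
        return OPS[op](operand(row, left), operand(row, right))

    return {row for row in a if any(all(holds(row, c) for c in group) for group in z)}


a = {
    ("3", "Tom", "a", "b", "s"),
    ("2", "Sam", "c", "f", "s"),
    ("6", "Harry", "a", "z", "s"),
}
z = [
    [("EQ", "2", ("const", "Tom"))],
    [("EQ", "2", ("const", "Harry")), ("EQ", "4", ("const", "f"))],
]
print(rdm_rest(a, z))  # {('3', 'Tom', 'a', 'b', 's')}
```

The second group rejects the Harry row because its fourth column is "z" rather than "f", so only the Tom row survives.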

RDMSORT function. RDMSORT defines an ordered set based on unordered set A and the order specified by predicate set Z. An example format and description of this function follow.
RDMSORT(A, Z) == <>

Arguments:
A - an unordered set
Z - a mapping set describing the sort order of the resulting set

Result: An ordered set containing all member rows of set A, sorted in the order specified in mapping set Z.

Remarks: Z is a mapping set that lists the member columns, from most significant to least significant, that determine the ascending sort order.

Requirements: Predicate set Z must be an ordered set whose elements are each members of NAT less than the cardinality of set A. Set A must be an RDM set. If these conditions are not met, the result is the null set.

Example:
A = {<"3", "Tom", "a", "b", "s">,
<"2", "Sam", "c", "b", "s">,
<"6", "Harry", "a", "z", "s">}
Z = <"3", "2">

RDMSORT(A, Z) -> <<"6", "Harry", "a", "z", "s">,
<"3", "Tom", "a", "b", "s">,
<"2", "Sam", "c", "b", "s">>
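The sort in the example above orders rows by column 3 and breaks ties by column 2. A Python sketch of that multi-column ascending sort follows; the name rdm_sort is hypothetical, and values are compared as strings.

```python
# Hypothetical sketch of a multi-column ascending sort over row tuples.
def rdm_sort(a, z):
    """Return the rows of a as a tuple sorted by the 1-based columns in z, most significant first."""
    cols = [int(c) - 1 for c in z]
    return tuple(sorted(a, key=lambda row: tuple(row[c] for c in cols)))


a = {
    ("3", "Tom", "a", "b", "s"),
    ("2", "Sam", "c", "b", "s"),
    ("6", "Harry", "a", "z", "s"),
}
for row in rdm_sort(a, ["3", "2"]):
    print(row)
```

The Harry and Tom rows tie on column 3 ("a"), so column 2 decides their order, which places Harry first.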

RDMUNION function. RDMUNION defines an unordered set containing all member rows of sets A and B. An example format and description of this function follow.
RDMUNION(A, B) == {}

Arguments:
A - an unordered set
B - an unordered set

Result: An unordered set containing the member rows of A or B.

Remarks: None.

Requirements: A and B must be RDM sets and must have the same member column cardinality. If these conditions are not met, the result is the null set.

Example:
A = {<"a", "b", "c">}
B = {<"3", "c", "8">}

RDMUNION(A, B) -> {<"a", "b", "c">,
<"3", "c", "8">}
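The union, together with its column cardinality requirement, can be sketched in Python as follows. The name rdm_union is hypothetical, and XSN sets are modeled as Python sets of row tuples.

```python
# Hypothetical sketch of relational union with the width requirement.
def rdm_union(a, b):
    """Return all rows of a and b, or the null set when row widths differ."""
    a, b = set(a), set(b)
    # Requirement: A and B must have the same member column cardinality.
    if len({len(row) for row in a | b}) > 1:
        return set()
    return a | b


a = {("a", "b", "c")}
b = {("3", "c", "8")}
for row in sorted(rdm_union(a, b)):
    print(row)
```

Returning an empty set on a width mismatch mirrors the stated rule that unmet requirements produce the null set.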

The functions, formats, and arguments above are merely examples and may differ in other embodiments. For example, other embodiments may use different or additional functions.

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

1. A method, executed on a computer system, for providing a requested data set, the method comprising:
receiving a first plurality of query language statements requesting a first plurality of data sets;
assembling a first plurality of algebraic relationships between data sets based on the first plurality of query language statements, the first plurality of algebraic relationships including a plurality of algebraic relationships assembled from each of the query language statements in the first plurality of query language statements;
storing the first plurality of algebraic relationships in a relationship store;
receiving a subsequent query language statement that requests the requested data set, the requested data set being different from each of the data sets in the first plurality of data sets; and
providing the requested data set based on at least some of the first plurality of algebraic relationships assembled from the first plurality of query language statements,
wherein providing the requested data set includes constructing, using at least some of the first plurality of algebraic relationships from the relationship store, a plurality of collections of algebraic relationships that each define a result equal to the requested data set, and selecting one of the collections of algebraic relationships to calculate the requested data set.

2. The method of claim 1, wherein each of the first plurality of query language statements specifies at least one explicit data set, the method further comprising:
providing a data set information store for storing information relating to the first plurality of data sets;
associating a data set identifier with the explicit data set specified in the first plurality of query language statements; and
storing the data set identifier of the explicit data set in the data set information store.

3. The method according to claim 1 or 2, further comprising providing a data store storing at least some of the data sets, wherein the first plurality of query language statements specify at least one data set that is not stored in the data store upon receipt of the subsequent query language statement.

4. The method of claim 3, wherein the request data set is calculated using at least one data set that was not stored in the data store upon receipt of the subsequent query language statement.

5. The algebraic relationship in the first plurality of algebraic relationships each having a single operator and several operands in the range of 1-3. the method of.

6. The method according to claim 1, wherein each of the algebraic relationships in the first plurality of algebraic relationships comprises:
a first expression including at least an abstract representation of a first data set;
a second expression including at least an abstract representation of a second data set; and
a relational operator that abstractly defines a mathematical relationship between the first expression and the second expression.

7. The method of any one of the preceding claims, wherein at least one of the first plurality of query language statements is based on a relational data model and at least one of the first plurality of query language statements is based on a markup language model.

8. The method according to claim 1, further comprising:
assembling a plurality of additional algebraic relationships based on the query language statement requesting the requested data set; and
providing the requested data set using at least some of the additional algebraic relationships.

9. The method according to claim 1, comprising:
applying optimization criteria to select the collection of algebraic relationships for calculating the requested data set; and
calculating the requested data set using the selected collection of algebraic relationships.

10. The method according to any one of claims 1 to 9, wherein the optimization criterion is based at least in part on an estimate of the time required to retrieve from a storage device the data sets required to compute each collection of algebraic relationships.

The optimization criterion is based, at least in part, on the cost of retrieving from the storage device the data set necessary to calculate each collection of the algebraic relationships;
The optimization criteria allocates a cost for retrieving an individual data set from storage only once if the individual data set is referenced more than once in a collection of algebraic relationships. 11. The method according to any one of items 10.

12. The method according to any one of the preceding claims, wherein assembling the plurality of algebraic relationships comprises generating a collection of algebraic relationships that distinguish between equivalent data sets that differ in physical format but logically contain the same data.

13. The method according to claim 1, further comprising:
providing a plurality of functions, including at least two algebraically equivalent functions that operate on data sets of different physical formats;
selecting one of the algebraically equivalent functions based on the format of a selected data set of the equivalent data sets; and
realizing the requested data set using at least some of the functions, including the selected one of the algebraically equivalent functions.

14. The method of any one of claims 1-13, wherein constructing the plurality of collections of algebraic relationships further comprises using at least some of the first plurality of algebraic relationships to generate a new algebraic relationship that was not previously available when the subsequent query language statement requesting the requested data set was received.

15. The method according to claim 1, wherein the plurality of collections of algebraic relationships includes at least two collections of algebraic relationships that are not algebraically equivalent to each other but that both provide a result equal to the requested data set.