JP2018510445A

JP2018510445A - Domain-specific system and method for improving program performance

Info

Publication number: JP2018510445A
Application number: JP2018502613A
Authority: JP
Inventors: Snodgrass, Richard T.; Debray, Saumya K.; Zhang, Rui; Thomas, Stephen; Mason, Sean
Original assignee: Arizona Board of Regents of University of Arizona
Current assignee: Arizona Board of Regents of University of Arizona
Priority date: 2015-04-02
Filing date: 2016-03-31
Publication date: 2018-04-12
Also published as: CN107851003A; EP3278218A4; WO2016161130A1; CA2980333A1; EP3278218A1

Abstract

The present invention relates to systems and methods for improving the performance of a computer program such as a database management system (DBMS). The method includes identifying invariant intervals of variables in the DBMS code based on a program representation (PR). Program interactions and domain assertions within the DBMS are estimated based on the PR and an ecosystem specification of the DBMS. One or more candidate snippets are identified based on the invariant intervals of variables in the DBMS code, the PR, one or more execution summaries associated with the DBMS, the estimated program interactions, and the estimated domain assertions. Spiffs are generated based on the one or more candidate snippets. The spiffs include predicate query spiffs, hash join query spiffs, aggregate query spiffs, page spiffs, and string machine spiffs. The DBMS code is modified based on the speccode generated from these spiffs. [Selected drawing] FIG. 1

Description

The present disclosure relates generally to field specialization for improving the performance of computer programs and, more particularly, to systems and methods that improve the performance of a database management system by identifying invariant intervals of variables and modifying the DBMS code using specialized code generated based at least in part on the identified invariant intervals.

A database management system (DBMS) is a collection of software programs that manage the storage of and access to data. Because ever larger volumes of data are being generated and must be stored and accessed efficiently, DBMSs have been adopted across a wide range of application domains. Over forty years of such ubiquitous deployment, DBMSs have been designed and engineered around a few data models that are generally applicable to those domains. The relational data model is one of the most widely adopted models in commercial and open source DBMSs. Considerable effort has been devoted to supporting this data model efficiently.

Owing to the generality of the relational data model, relational database management systems are themselves general, in that they can handle whatever schema the user specifies and whatever query or modification is presented to them. Relational operators work on essentially any relation and must therefore contend with predicates specified on any attribute of the underlying relations. Through innovations such as effective index structures, innovative concurrency control mechanisms, and sophisticated query optimization strategies, the relational DBMSs available today are very efficient. Such generality and efficiency have enabled their proliferation and use in many domains.

Nevertheless, this generality is realized through multiple layers of indirection and sophisticated code logic. The efficiency of a DBMS can be increased further by exploiting invariant values that are present during the execution of such a system. The field specialization technology disclosed in this application was developed to identify invariants automatically and to enable code specialization based on them.

Embodiments of the present disclosure provide systems and methods for improving the performance of a database management system (DBMS). Briefly described, one embodiment of the method, among others, can be implemented as follows. A computer-implemented method for improving the performance of a DBMS includes: (i) identifying invariant intervals of variables in the DBMS code based on a compile-time analysis of the DBMS source code; (ii) estimating program interactions within the DBMS based on the source code and an ecosystem specification of the DBMS, and estimating so-called domain assertions based on the source code, the identified invariant intervals of variables in the DBMS code, and the estimated program interactions; (iii) identifying one or more candidate snippets based on the invariant intervals of variables in the DBMS code, the source code, one or more execution summaries associated with the DBMS as executed under various workloads, the estimated program interactions, and the estimated domain assertions; (iv) generating specialized DBMS code, based on the identified candidate snippets, at various points in time including compile time and run time; and (v) modifying the DBMS by inserting code that can, possibly at run time, generate the specialized code and subsequently invoke it.

Other systems, methods, features, and advantages of the present disclosure will be apparent to one of ordinary skill in the art upon examination of the following drawings and detailed description. All such additional systems, methods, features, and advantages are intended to be included within this description, to be within the scope of the present disclosure, and to be protected by the accompanying claims.

Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale; emphasis is instead placed on clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.

FIG. 1 is a block diagram illustrating a Spiff tool architecture according to an exemplary embodiment provided by the present disclosure.
FIG. 2 is a block diagram of a field specialization process according to an exemplary embodiment provided by the present disclosure.
FIG. 3 is a diagram illustrating how field specialization elaborates a paradigm of computer science, according to an exemplary embodiment provided by the present disclosure.

Many embodiments of the present disclosure may take the form of computer-executable instructions, including algorithms executed by a programmable computer. However, the disclosure can be practiced with other computer system configurations as well. Certain aspects of the disclosure can be embodied in a special-purpose computer or data processor that is specifically programmed, configured, or constructed to perform one or more of the computer-executable algorithms described below.

The disclosure can also be practiced in distributed computing environments, where tasks or modules are performed by remote processing devices linked through a communications network. Moreover, the disclosure can be practiced in Internet-based or cloud computing environments, where shared resources, software, and information may be provided to computers and other devices on demand. In a distributed computing environment, program modules or subroutines may be located in both local and remote memory storage devices. Aspects of the disclosure described below may be stored or distributed on computer-readable media, including magnetic and optically readable and removable computer disks, fixed magnetic disks, floppy disk drives, optical disk drives, magneto-optical disk drives, magnetic tape, hard disk drives (HDDs), solid state drives (SSDs), and CompactFlash (registered trademark) or other non-volatile memory, as well as on computer-readable media distributed electronically over networks such as the cloud. Data structures and transmissions of data particular to aspects of the disclosure are also encompassed within the scope of the disclosure.

Although the invention is described herein primarily with respect to relational DBMSs, it is not limited to that DBMS type. It will be readily appreciated that the invention can be applied to any DBMS type, including, but not limited to, hierarchical, network, and object-oriented DBMSs. Furthermore, while field specialization is disclosed herein primarily with respect to a DBMS, the concepts provided herein apply to any program that manipulates data, in particular one that performs complex analyses on that data. Specifically, the disclosed systems and methods are also applicable to computer programs that require high run-time performance and that run an application multiple times over the same data with different parameters or queries. A Spiff, which stands for "specializer in the field," is code that dynamically creates specialized code at DBMS run time. "Field specialization" is the process of inserting Spiffs into DBMS code so that the DBMS can specialize itself by exploiting run-time invariants. The specialized code (also referred to herein as "speccode") is faster and generally smaller than the original unspecialized code. Field specialization is so named because the speccode is generated and invoked "in the field," that is, while the DBMS is deployed and running at the end user's site. A Spiff uses the actual value of a run-time invariant, obtained at run time, to dynamically generate code specialized for that particular value.
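The idea of generating code for the actual value of a run-time invariant can be illustrated with a minimal Python sketch. It is not the disclosed implementation: `generic_predicate`, `predicate_spiff`, the row layout, and the use of `exec` as the code generator are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Hypothetical generic evaluator: re-examines the operator on every call,
# even though column, operator, and constant never change during a query.
def generic_predicate(row: Dict[str, int], column: str, op: str, constant: int) -> bool:
    if op == "<":
        return row[column] < constant
    if op == "=":
        return row[column] == constant
    if op == ">":
        return row[column] > constant
    raise ValueError(f"unsupported operator: {op}")

_PY_OPERATORS = {"<": "<", "=": "==", ">": ">"}

# Hypothetical spiff: given the invariant values observed at run time,
# it emits source text with those values baked in and compiles it.
def predicate_spiff(column: str, op: str, constant: int) -> Callable[[Dict[str, int]], bool]:
    if op not in _PY_OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    source = (
        "def speccode(row):\n"
        f"    return row[{column!r}] {_PY_OPERATORS[op]} {constant!r}\n"
    )
    namespace: dict = {}
    exec(source, namespace)  # stands in for run-time code generation
    return namespace["speccode"]

rows = [{"age": 17}, {"age": 30}, {"age": 65}]
speccode = predicate_spiff("age", ">", 18)
generic_result = [generic_predicate(r, "age", ">", 18) for r in rows]
specialized_result = [speccode(r) for r in rows]
```

The specialized function has no operator dispatch left in it, which is the kind of saving the text attributes to speccode; both versions return the same answers for the same rows.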

In the applicant's co-pending U.S. patent application Ser. No. 14/368,265, to which this application claims priority, the term "micro-specialization" corresponds to the term "field specialization" as used herein; the term "bee" corresponds to the term "Spiff" as used herein; an instantiated bee corresponds to "specialized code" as used herein (the result of a Spiff); and the HRE (hive runtime environment) corresponds to the "SRE" (Spiff runtime environment) as used herein.

Architecture
FIG. 1 is a block diagram illustrating a Spiff tool architecture according to an exemplary embodiment provided by the present disclosure.

In one embodiment, as shown in FIG. 1, the present disclosure provides a Spiff tool architecture that automatically field-specializes an arbitrary program, given three inputs:
the source code of the application, which is the obvious input,
one or more workloads,
an ecosystem specification.

This architecture therefore envisions that field specialization analyzes the application source code and ultimately becomes a fully automated process that uses these three inputs to produce a specialized application that is semantically identical but runs faster.

Goals
The goals of this architecture are as follows.
1. Provide an end-to-end solution that takes the source files of a program or suite of related programs and automatically produces field-specialized versions of those source files, including the code for Spiff generation, Spiff compilation, Spiff instantiation, Spiff invocation, and Spiff garbage collection.
2. Provide domain independence, in that the architecture works with virtually any program compiled for any conventional architecture, including highly parallel graphics processing units (GPUs).
3. Minimize the information that the tool user must supply and maximize the information extracted from the program under analysis.
4. Divide the analysis into a series of tools, each of which performs exactly one conceptual task.
5. Enable scalability of the tools by having each tool produce a small output that captures the results of its analysis.
6. Enable incremental development, so that a tool can first be tested on small programs and then on realistic programs.
7. Enable continuous refinement, in that each tool can initially perform a partial rather than maximal analysis (e.g., finding only some of the invariants or a minimal set of candidate snippets) and can then be refined over time to produce more comprehensive output.
8. Enable performance benefit estimation, in that the benefit of each individual code transformation introduced by a Spiff can be evaluated dynamically and/or independently, so that the overall benefit of the Spiff can be computed from the transformations in effect and the characteristics of the particular workload, without exhaustively evaluating every combination of code transformations and measuring the execution time of each.

Tools
As shown in FIG. 1, the Spiff tool architecture includes a number of tools: an invariant searcher, a tracker, an invariant tester, a program interaction estimator, a domain assertion estimator, a snippet searcher, and a Spiff constructor, each of which is described in detail below. Those skilled in the art will appreciate that many variations on, and other divisions of, the exemplary Spiff tool architecture shown in FIG. 1 are possible without departing substantially from the spirit and principles of the present disclosure.

This description uses a particular program representation (PR), such as an abstract syntax tree (AST), which is a common computer-readable structuring of an application's source code. The invention applies equally when the analysis is instead performed directly on the textual source code of the application, on a lower-level intermediate representation (IR) of the source code, or on an equivalent assembly or machine code representation. The individual program constructs used by a PR are referred to as program expressions (PEs). For an AST, a PE is an AST node; for an IR, a PE is an IR instruction; for source code, a PE is a single statement.

Invariant Searcher
The invariant searcher takes as input the PR of the DBMS to be specialized and, optionally, trace events, and outputs zero or more invariant intervals by performing static analysis on the PR.

Some definitions:
Invariant interval: a set of paths, defined by a single starting PE (or, equivalently, a single location in the source code) and a single ending PE reachable from the starting node during one or more possible executions, over which a particular property of a variable holds. An example of such a property is not described. (An interval may consist of a set of paths rather than a single path. For example, if neither branch of an if/else block modifies the variable in question, the variable remains invariant on all code paths associated with those branches.) An invariant interval begins at the starting PE, that is, as soon as the variable takes on the assigned value (the starting PE is always a statement that sets the variable's value), and ends at the ending PE immediately before the value is set again. At each PE along these paths the variable has the same value as at every other point along them, which is why it is called invariant.
Invariant interval set: a set of invariant intervals for a particular variable, all of which share the same starting node. The intervals are possibly not maximal, in that an interval may end earlier than necessary when the analysis cannot establish that the specified property still holds after the PE in question executes.
Value flow tree (VFT): a tree that captures the copying of one variable's value to another variable. When variable Y is assigned the value of variable X, the VFT connects X's invariant interval set to Y's invariant interval set.
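The three definitions above map naturally onto small data structures. The following Python sketch is one possible encoding, not the tool's actual format; the class names, the use of line numbers as PE identifiers, and the sample program positions are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class InvariantInterval:
    start: int  # PE (here: a line number) that sets the value; included
    end: int    # PE at which the interval stops; excluded

    def covers(self, pe: int) -> bool:
        # Closed-open test: the starting PE is inside, the ending PE is not.
        return self.start <= pe < self.end

@dataclass
class InvariantIntervalSet:
    set_id: int
    variable: str
    start: int                      # shared starting PE of all intervals
    ends: List[int] = field(default_factory=list)

    def intervals(self) -> List[InvariantInterval]:
        return [InvariantInterval(self.start, end) for end in self.ends]

    def covers(self, pe: int) -> bool:
        return any(interval.covers(pe) for interval in self.intervals())

@dataclass
class ValueFlowTree:
    # Each edge records that the value of one interval set was copied
    # into the variable of another interval set.
    edges: List[Tuple[int, int]] = field(default_factory=list)

    def add_copy(self, source: InvariantIntervalSet, target: InvariantIntervalSet) -> None:
        self.edges.append((source.set_id, target.set_id))

# Hypothetical program: x is set at line 1 and set again at line 4;
# y = x happens at line 2 and y is never set again before line 9.
x_set = InvariantIntervalSet(set_id=1, variable="x", start=1, ends=[4])
y_set = InvariantIntervalSet(set_id=2, variable="y", start=2, ends=[9])
vft = ValueFlowTree()
vft.add_copy(x_set, y_set)
```

Keeping the shared start on the set rather than on each interval mirrors the definition that all intervals in a set begin at the same node.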

As an example, an invariant interval may exist over a span in which the variable's value persists (i.e., the property is a value), for instance "the variable equals N" for some constant N. Limiting attention to such properties may leave out certain kinds of optimization, such as the following:
Optimizations based on program state (e.g., memory allocation optimizations in aggregate operations).
Optimizations based on properties that are not of the form "the variable equals N" (e.g., given the code fragment if (p != NULL) S, the pointer p must be non-null within S, so redundant null checks, for instance in functions called from S, should be removable by optimization).
Optimizations based on derived values that may not be explicit in the code (e.g., the length of a string).
Optimizations based on domain knowledge (e.g., the cardinality of the set of values that can appear in a column).

Example 1: if statement
Consider Example 1 shown below.

Here, the invariant searcher does not know statically whether the "if" statement will evaluate to true or false. It therefore outputs the following invariant interval sets for variable x. For simplicity, line numbers rather than PE IDs identify source locations, and the intervals are closed-open.
Invariant interval set #1: starts at line 1 and has one invariant interval:
Invariant interval #1.1: ends at line 3
Invariant interval set #2: starts at line 10 and has one invariant interval:
Invariant interval #2.1: ends at line 15 (i.e., the end of the program)

The invariant searcher may emit the above in some structured format such as XML; for simplicity, this disclosure uses lists and sublists.
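As a rough illustration of such a structured output, the two interval sets listed above for x could be serialized as follows. The element and attribute names are invented for this sketch; the disclosure does not specify an XML schema.

```python
import xml.etree.ElementTree as ET

# Interval sets for x as listed in the text (line numbers, closed-open).
interval_sets = [
    {"id": 1, "variable": "x", "start": 1, "ends": [3]},
    {"id": 2, "variable": "x", "start": 10, "ends": [15]},
]

def to_xml(sets) -> str:
    # Element and attribute names below are hypothetical.
    root = ET.Element("invariantIntervalSets")
    for s in sets:
        set_el = ET.SubElement(
            root, "intervalSet",
            id=str(s["id"]), variable=s["variable"], start=str(s["start"]),
        )
        for n, end in enumerate(s["ends"], start=1):
            ET.SubElement(set_el, "interval", id=f'{s["id"]}.{n}', end=str(end))
    return ET.tostring(root, encoding="unicode")

xml_text = to_xml(interval_sets)
```

Parsing `xml_text` back with `ET.fromstring` recovers the same sets, which is all a downstream tool such as the snippet searcher would need from this file.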

The invariant searcher may vary in its precision, but it must be correct. Specifically, the invariants it produces must be correct but need not be exhaustive. For example, x is in fact invariant over lines 1 to 5 and over lines 1 to 9. It is nevertheless also correct, though less precise, to stop the interval at the beginning of the "if" statement. Naturally, with less precise intervals the snippet searcher and the Spiff constructor (tools described below) have fewer opportunities to field-specialize the application.

The invariant searcher can output such invariant interval sets for every variable in the program. For variable h:
Invariant interval set #3: starts at line 11 and has one invariant interval:
Invariant interval #3.1: ends at line 13
Invariant interval set #4: starts at line 13 and has one invariant interval:
Invariant interval #4.1: ends at line 15
For variable y:
Invariant interval set #5: starts at line 2 and has one invariant interval:
Invariant interval #5.1: ends at line 15
For variable z:
Invariant interval set #6: starts at line 12 and has one invariant interval:
Invariant interval #6.1: ends at line 15
Finally, for variable a:
Invariant interval set #7: starts at line 14 and has one invariant interval:
Invariant interval #7.1: ends at line 15

Note how variable h obtains its value from variable x (a "flow" of the value from x). This in turn gives rise to a "flow" of the value from h to z. Putting it all together, the VFT of x is given, in an illustrative notation, as shown in Example 2 below.

The numeric values of the "from" and "to" attributes each denote one of the invariant interval sets (IISs) above. The first line of the specification thus runs from invariant interval set #1 to invariant interval set #4.
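One way to use such from/to pairs is to ask which interval sets ultimately receive a value that started in a given set. The sketch below assumes a plain list of `(from, to)` pairs. Only the first pair, (1, 4), is taken from the text; the second pair is an assumed entry added for illustration, on the grounds that the value is said to flow onward from h to z.

```python
from collections import defaultdict, deque

# (from, to) pairs of invariant interval set numbers.
# (1, 4) is stated in the text; (4, 6) is an assumption for illustration.
vft_edges = [(1, 4), (4, 6)]

def downstream(edges, origin):
    """Return every interval set reachable from `origin` by following copies."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
    seen = set()
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for nxt in children[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

reached_from_x = downstream(vft_edges, 1)
```

Under these assumed edges, a value that is invariant in set #1 is also known in every set that `downstream` returns, which is what makes the VFT useful when looking for specialization opportunities.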

Example 2: pass-by-value function
Suppose that variable "a" is invariant in function X() but that its value changes temporarily within a called function Y(a). When Y(a) returns, "a" still has its (invariant) value. This situation is handled because a function call that passes the value of a variable copies that value to another variable, which is associated with its own invariant interval set.

Example 3: loop without assignment
Consider the code of Example 3 below:

To reason about a loop, the invariant searcher unrolls it or, more precisely, checks whether any variable assignment occurs inside the loop. If, as in this example, there is none, an invariant interval that reaches the loop extends through the entire loop:
Invariant interval set #1: starts at line 1 and has one invariant interval:
Invariant interval #1.1: ends at line 8
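The check for assignments inside a loop can be sketched with Python's own `ast` module. This is an analogy only: it inspects Python source rather than the DBMS's code, the function name `assigned_in_loops` is invented, and it ignores indirect assignments through aliases, which a real invariant searcher would have to consider.

```python
import ast

def assigned_in_loops(source: str, variable: str) -> bool:
    """Return True if `variable` is stored to anywhere inside a for/while loop."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.While)):
            for inner in ast.walk(node):
                if (
                    isinstance(inner, ast.Name)
                    and inner.id == variable
                    and isinstance(inner.ctx, ast.Store)
                ):
                    return True
    return False

# Hypothetical programs, unrelated to the patent's listings.
loop_without_assignment = """
x = 5
for i in range(3):
    print(x + i)
"""

loop_with_assignment = """
x = 5
for i in range(3):
    if i == 1:
        x = 7
"""

extends_through_loop = not assigned_in_loops(loop_without_assignment, "x")
must_stop_at_loop = assigned_in_loops(loop_with_assignment, "x")
```

When the function returns False, an interval that reaches the loop can be carried through it unchanged; when it returns True, a conservative analysis ends the interval before the loop.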

Example 4: loop with assignment to an existing variable
Now, with reference to Example 4, consider a conditional variable assignment inside a loop:

Here, the invariant searcher creates the following intervals:
Invariant interval set #1: starts at line 1 and has one invariant interval:
Invariant interval #1.1: ends at line 2
Invariant interval set #2: starts at line 7 and has one invariant interval:
Invariant interval #2.1: ends at line 9
Invariant interval set #3: starts at line 9 and has one invariant interval:
Invariant interval #3.1: ends at line 10

Note that here, too, a less precise but still correct simplification is applied: any loop that could potentially modify the variable is excluded. Note further that the interval does not end at line 8, because the function call cannot change the value of x; rather, the value is copied to a variable local to some_other_func.

Example 5: loop that creates a new variable
With reference to Example 5, consider the creation of a variable inside a loop:

Here, the invariant searcher creates the following:
Invariant interval set #1: starts at line 4 and has one invariant interval:
Invariant interval #1.1: ends at line 8 (i.e., after the last iteration of the loop)

Exemplary algorithm for the invariant searcher
The invariant searcher starts at the leaves of the call graph, that is, at functions that call no other function. It can compute the VFT for each such function, adding an edge whenever a variable is copied (e.g., h = x;). It can then consider functions that call only leaf functions, adding edges for local variables (e.g., when x is passed as an argument) and merging intervals. Iterating, it can consider functions that call only functions whose VFTs have already been computed.
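The bottom-up visiting order described above can be sketched as follows. The call graph encoding (a dict from function name to the set of functions it calls, with every function present as a key) and all function names are assumptions for illustration; computing the VFT itself is left out.

```python
def bottom_up_order(call_graph):
    """Order functions so that each appears after everything it calls.

    Returns (order, leftover); `leftover` holds functions that sit on,
    or depend on, a call cycle and so need separate treatment.
    """
    order = []
    processed = set()
    remaining = dict(call_graph)
    progress = True
    while remaining and progress:
        progress = False
        for fn in sorted(remaining):
            # A function is ready once all of its callees have been analyzed;
            # leaves (no callees) are ready immediately.
            if remaining[fn] <= processed:
                order.append(fn)
                processed.add(fn)
                del remaining[fn]
                progress = True
    return order, sorted(remaining)

# Hypothetical call graph.
call_graph = {
    "main": {"scan", "filter"},
    "scan": {"read_page"},
    "filter": {"compare"},
    "read_page": set(),
    "compare": set(),
}
order, leftover = bottom_up_order(call_graph)

# A mutually recursive pair is never "ready" and is reported separately.
cyclic_order, cyclic_leftover = bottom_up_order({"a": {"b"}, "b": {"a"}, "c": set()})
```

Reporting the leftover functions rather than looping forever reflects the point that recursion and cycles need handling beyond the plain bottom-up pass.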

Recursive functions and cycles in the call graph require additional care. The bottom-up traversal of the call graph is a static analysis of the program. Because an invariant must hold on all paths, the invariant searcher uses the call signature and assumes that an indirect call may invoke any function matching that signature.

Note that memory allocation inside a loop can give rise to many different run-time locations. Once such an allocation occurs, the memory to which the variable points is invariant until the variable is assigned. An assignment inside the loop is either an assignment to a new element (e.g., of an array) or an overwrite of the variable.

For indirect function calls, the invariant searcher can alternate between a forward analysis step, which propagates the values of function pointers to find the set of possible targets of each indirect call, and a backward analysis step, which propagates value flows bottom-up through the call graph as described above. This alternation can be repeated until the sets of function pointer targets stabilize.
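The alternation until the target sets stabilize is a fixed-point computation. The sketch below is a deliberately simplified model, not the disclosed algorithm: the value-flow step is reduced to collecting pointer copies from the functions known so far, and the input dictionaries (`direct_calls`, `copies`, `address_taken`, `indirect_calls`) and all names in them are hypothetical.

```python
def resolve_indirect_targets(entry, direct_calls, copies, address_taken, indirect_calls):
    """Iterate until the reachable functions and pointer targets stop changing."""
    reachable = {entry}
    targets = {var: set(funcs) for var, funcs in address_taken.items()}
    while True:
        before = (set(reachable), {v: set(t) for v, t in targets.items()})
        # Value-flow step (simplified): pointer copies in functions seen so far.
        flows = [edge for fn in reachable for edge in copies.get(fn, [])]
        # Forward step: push possible targets along the copies until stable.
        changed = True
        while changed:
            changed = False
            for src, dst in flows:
                new = targets.get(src, set()) - targets.setdefault(dst, set())
                if new:
                    targets[dst] |= new
                    changed = True
        # Extend the call graph with direct callees and resolved indirect callees.
        for fn in list(reachable):
            reachable |= set(direct_calls.get(fn, ()))
            for pointer in indirect_calls.get(fn, ()):
                reachable |= targets.get(pointer, set())
        if (reachable, targets) == before:
            return reachable, targets

# Hypothetical program: setup() copies cmp_default into cmp, run() copies cmp
# into callback and then calls through callback.
reachable, targets = resolve_indirect_targets(
    entry="main",
    direct_calls={"main": ["setup", "run"]},
    copies={"setup": [("cmp_default", "cmp")], "run": [("cmp", "callback")]},
    address_taken={"cmp_default": {"compare_int"}},
    indirect_calls={"run": ["callback"]},
)
```

Each pass can only add functions or targets, and both sets are finite, so the loop terminates once a pass adds nothing.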

The invariant searcher may additionally identify the values that a variable can take. As an example, for a variable join_type, only a handful of distinct values may ever be assigned to the variable, all of them known statically. This may be specified by the variable's type (an enumeration) or may be discoverable by static analysis (e.g., by examining all values assigned to the variable). When the set of possible values is small, the invariant searcher may record an invariant interval for each value.
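Examining all values assigned to a variable can again be sketched with Python's `ast` module, as an analogy to what a static analysis of the DBMS source might do. The function name, the `max_values` threshold, and the sample code are assumptions; only simple `name = constant` assignments are recognized.

```python
import ast

def constant_values_assigned(source: str, variable: str, max_values: int = 4):
    """Return the set of constants assigned to `variable`, or None if the set
    is empty, too large, or includes a value that is not a literal constant."""
    values = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == variable for t in node.targets
        ):
            if isinstance(node.value, ast.Constant):
                values.add(node.value.value)
            else:
                return None  # value not known statically
    return values if 0 < len(values) <= max_values else None

# Hypothetical code in which join_type only ever receives three constants.
small_domain = """
if kind == "inner":
    join_type = 0
elif kind == "left":
    join_type = 1
else:
    join_type = 2
"""

unknown_domain = """
join_type = compute_join_type(plan)
"""

join_values = constant_values_assigned(small_domain, "join_type")
no_values = constant_values_assigned(unknown_domain, "join_type")
```

With a small result such as the one for `small_domain`, one interval per value could be recorded; when the function returns None, the analysis would fall back to treating the value as unknown.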

Accuracy
Each invariant interval returned by the tool must be accurate: the associated variable is guaranteed to be invariant along every path between the start and the end of the interval (the end excluded). If an indirect assignment exists on any such path, the invariant searcher must ensure that no such assignment changes the value of the variable in question.

The analysis may be conservative in two respects. First, there may be false negatives: (a) interval sets or (b) individual intervals within an interval set that are accurate but are not returned by the tool. This is acceptable provided the tool indicates where a variable is assigned (that is, where an invariant interval starts) but was not analyzed by the tool (a missing interval set), and which interval sets are incomplete (missing individual intervals).

Second, there may be non-maximal intervals: intervals that do not end at a statement that unconditionally changes the value. This can be caused by (a) an assignment that does not actually change the value or (b) a non-assignment statement (for example, a "for" statement that might change the value) for which the analysis is not precise enough to determine that the value is unchanged.

Accuracy also requires that every value flow tree link be correct, each representing a copy of a value. These links may, however, be non-maximal, in that a link between two interval sets need not be present even when the value of one interval set does in fact originate from the other.

Tracker
Another tool disclosed herein is called the "tracker". The tracker takes as input an executable running under a workload and outputs a sequence of trace events. Each trace event typically records the execution of an instruction that can affect data flow in the program, such as "enter loop", "read variable", or "call function".

The trace events are processed by another tool, the "summarizer", which produces an execution summary listing functions, statements, and variables together with their execution statistics. This information identifies "hot spots" in the application that may benefit from domain specialization.
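As a rough illustration of how trace events could be condensed into an execution summary, the sketch below simply counts occurrences. The `(kind, subject)` event tuples, the kind labels `call`, `stmt`, and `read`, and the function name `summarize` are hypothetical; the disclosure does not specify the trace format or which statistics are kept.

```python
from collections import Counter


def summarize(trace_events):
    """Count how often each function, statement, and variable appears in a trace."""
    kinds = {"call": "functions", "stmt": "statements", "read": "variables"}
    summary = {name: Counter() for name in kinds.values()}
    for kind, subject in trace_events:
        if kind in kinds:  # ignore event kinds this sketch does not model
            summary[kinds[kind]][subject] += 1
    return summary


trace = [
    ("call", "SequentialScan"),
    ("read", "num_columns"),
    ("stmt", "SequentialScan:7"),
    ("read", "num_columns"),
    ("call", "WriteRow"),
    ("read", "num_columns"),
]
summary = summarize(trace)
hot_variable, hits = summary["variables"].most_common(1)[0]
```

Here the most frequently read variable stands in for a "hot spot"; a real summarizer would presumably weigh more than raw counts.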

Accuracy here means that whenever an activity of interest occurs during execution, the associated trace event is output and/or recorded, and that every output and/or recorded trace event corresponds, in the specified order, to an occurrence of an activity of interest.

Invariant checker
Another tool, the "invariant checker", uses the trace events of a given execution to determine whether any identified invariant (for example, one identified by the invariant searcher) is violated in that execution. Alternatively, a developer can direct the invariant checker by indicating important variables to observe. Ideally, the invariant checker finds no violations across many executions of the DBMS executable over many workloads, which confirms that the invariants found by the invariant searcher are correct.
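The check described above can be pictured as replaying a trace against the invariant intervals. The sketch below is a simplified assumption, not the disclosed implementation: each trace event is taken to be a hypothetical `(location, variable, value)` tuple giving the variable's value after the statement at that location, each interval is a `(variable, start, end)` tuple with the end excluded, and the location labels are illustrative.

```python
def check_invariants(intervals, trace):
    """Return the trace events that change a variable inside one of its intervals."""
    starts = {(var, start) for var, start, _end in intervals}
    ends = {(var, end) for var, _start, end in intervals}
    active = {}        # variable -> value recorded when its interval started
    violations = []
    for location, var, value in trace:
        if (var, location) in ends:
            active.pop(var, None)      # the end statement itself is excluded
        if (var, location) in starts:
            active[var] = value        # the interval starts with this value
        elif var in active and active[var] != value:
            violations.append((location, var, active[var], value))
    return violations


intervals = [("num_columns", "OpenTable:8", "main:57")]
clean_trace = [
    ("OpenTable:8", "num_columns", 3),
    ("SequentialScan:7", "num_columns", 3),
    ("main:57", "num_columns", 0),
]
bad_trace = [
    ("OpenTable:8", "num_columns", 3),
    ("SequentialScan:7", "num_columns", 4),
]
```

An empty result for a trace is the "no violations found" outcome; any entry names the location, the variable, the expected value, and the observed value.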

The invariant checker may also be run periodically to further validate the analyses performed by the other tools (such as the invariant searcher and the tracker). For example, a user of the application may run the invariant checker and be informed that no violations were found. If a violation is found, the user may be told so and shown a message asking them to contact technical support for assistance.

The invariant checker has a further use as a debugging tool, employed for example by the developers of the tools described herein to assure the accuracy of the static analysis (for example, of the invariants identified by the invariant searcher).

Program interaction estimator
The "program interaction estimator" tool uses the PR (or an equivalent representation such as source code, IR code, or assembly or machine code) and the ecosystem specification to derive the program interactions, a list of data files together with associated information. Essentially, the program interaction estimator determines where in the program values are stored to a file, where those values are later read from the file, and where those values are removed from the file (or the file itself is removed). The domain assertion estimator then treats these values as long-lived invariants.

The ecosystem specification describes (a) the data involved, (b) which data files are fixed and which may vary, (c) the programs that can create, access, and destroy this data, and (d) any concurrency requirements. Although this disclosure focuses on files, the specification concerns, more generally, data read from and written to the outside world. That includes files, but also other means by which a program interacts with data and with the O/S, such as user I/O, streams to and from other processes, memory allocation, and the handling of character encodings. Files are the most common means and are the focus of this description, but it is to be understood that the invention is equally applicable to any of the other forms of data mentioned above.

Table Spiff use case
To help explain the tools provided herein, several examples are described in the context of a prototypical DBMS ("minidb"). Excerpts from the minidb.h and minidb.c source files are shown in Example 6, which is referred to repeatedly.

An ecosystem specification can be provided by the developer as a configuration file describing non-obvious characteristics of particular functions involved in the application's data flow operations (an exemplary ecosystem specification is shown in Example 7 below). For minidb, it states that (a) the data starts out empty and the workload is read from standard input or from a file, (b) the (workload) data may vary, (c) only minidb accesses the data, and (d) at most one minidb instance operates on any particular directory. The ecosystem specification is essential for understanding that the schema is invariant across an entire execution of minidb.

minidb uses two kinds of data: tables, which are files holding the rows of a table, and workloads, which are files containing SQL statements. The names of the kinds serve only to distinguish these files in the discussion below. Each table resides in a directory (the database).

This ecosystem contains a single program, minidb. This program creates table data files. Giving the line of code that performs this action (for example, line 3 of the CreateTable function) tells the domain assertion estimator which file is being manipulated (here, the particular file named on that line of code). The Consolas font, also used in the examples, denotes the names of functions in the minidb source code. The verb "reads" indicates that the directory is neither created nor removed by the application. For a table data file, the file is the one passed to CreateTable(). The verb "creates" also implies "opens", "reads", "writes", and "removes". (The domain assertion estimator may be able to discover on its own that each table resides in the database directory, in which case the inDirectory attribute, and perhaps the entire directory element, would be unnecessary and the ecosystem specification would be one line shorter.)

This program opens workload data files, which implies "reads". Here the file is the one passed to GetNextCommand(). Alternatively, the workload may be read from standard input at line 7 of GetNextCommand(). Multiple concurrent instances of minidb may read the same workload file, but they do not access or modify the database directory or the table files within it.

The program interactions extracted from the PR (see Example 8) state that minidb creates table files in this directory, reads and writes them, and later removes them, and they identify exactly where in the source each of these file operations occurs. In addition, the table header within a file is never changed, and the file is uniquely identified by the variable "data_file_name".

The table data file is first created in the database directory. (Because this example uses only a single application, minidb, this can be specified on the data file rather than on each action such as adding or removing data.) The file contains three data structures: a TableHeader, followed by multiple RowHeaders, each accompanied by a row (a string). The analyses in the subsequent tools do not need to know this containment structure, only which data structures are written and later read. Of course, once data is written to the file it may be read, possibly many times, before it is deleted.

The lifetime of a file extends beyond any single execution of the application. A file may be created by one execution, have data written to it by another, have that data read by yet another, and finally be removed by a later execution. The key semantics are that data written to the file is identical to the data later read from that file, until that data is deleted from the file or the file itself is removed. The other key semantics, namely the actual C structs that are written to and later read from the file, are captured by the PR.

Returning to the details of the prototypical DBMS (minidb), note that a delete is actually a file write: rows are overwritten, which has the effect of deleting the original row.

The logic of ExecuteDelete() is particularly intricate: a temporary file is created, the rows of the file that precede the row to be deleted are copied into the temporary file, the rows that follow the row to be deleted are copied, and the temporary file is then renamed. The program interaction estimator may include logic for handling such details.
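The copy-and-rename pattern just described looks roughly like the following. This is an illustrative sketch, not minidb's ExecuteDelete(): it assumes, purely for brevity, that each row is one line of a text file, and the function name `delete_row` is hypothetical.

```python
import os
import tempfile


def delete_row(path, row_index):
    """Delete one row by copying all other rows to a temporary file and renaming it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the table so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp, open(path) as table:
        for index, row in enumerate(table):
            if index != row_index:   # rows before and after the deleted row are copied
                tmp.write(row)
    os.replace(tmp_path, path)       # the temporary file takes over the table's name
```

From the point of view of a file-level analysis, no row is ever removed in place. The deletion is visible only as writes to a different file followed by a rename, which is why the analysis needs dedicated logic to recognize it.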

Table Spiff instance use case
A table Spiff instance is associated with a particular row of the database and is handled as described above.

Row Spiff
The notion of a "row" may seem quite domain specific. The general notion, however, is of a portion of a data file that is read, written, and processed as a whole. There is also the notion of a query evaluation loop over each row, which likewise generalizes to the portion of code used to process each such portion of an input file. Recognizing a row Spiff in this way requires recognizing when a portion of the input file is processed, with the same code being used repeatedly for different portions that all have the same structure.

Implementing a row Spiff requires (i) determining the invariants to use for partitioning the data, (ii) placing the Spiff ID in the data, and (iii) possibly removing data values that can be determined from the Spiff ID. The first step uses a cost model that depends on the workload. The second step actually changes the structure of the input data, so every relevant program in the ecosystem (every program that reads or writes that portion of the data) must be modified. The third step is handled similarly.

The distinctive aspects of the row Spiff concept are therefore (a) identifying the portions of data that are processed as a unit and (b) modifying the data so that the programs accessing it can manipulate it more efficiently.

Query Spiff use case
A query Spiff combines query, table, and row invariants. The latter two were addressed above; the query invariants are found by the invariant searcher. In this case the query invariants do not persist across minidb executions, because the workload from which the queries come is only read and can also be used by multiple minidb instances (for example, because parallelAccess is permitted).

Exemplary algorithm for the program interaction estimator
As shown in FIG. 1, the program interaction estimator has two inputs: the ecosystem specification and the PR. Whereas the ecosystem specification focuses on the programs that read and manipulate the data, the generated program interactions focus on what is done to the files, in particular where data structures in the program are written to and read from the files. The program interaction estimator (PID) therefore analyzes the file manipulation system calls, in particular fopen(), fwrite(), and remove(). As its starting point it uses the <datafile> and <workload> elements given in the ecosystem specification, here the tables and the workload (for example, as shown in Example 6). (The PID also analyzes the database, but quickly finds that it is merely a directory that is only read, by OpenTable().)

Between these file manipulation calls, the PID follows the flow of the FILE* value.

The workload file is particularly easy to analyze. The ecosystem specification states that this file is opened at GetNextCommand():13 (the workload can also be read from standard input). By analyzing the source code that the specification refers to, the PID determines that:
the file is named by query_file_name,
it is associated with the FILE query_file and with stdin, and
the only reads of this file are the strings read at GetNextCommand():8 and GetNextCommand():18.

The program interaction estimator therefore outputs this information to the program interactions file, as shown in Example 7.

The table file has more complex behavior. The ecosystem specification states creates="CreateTable():3", which indicates that the PID must follow data_file_name; by analyzing the referenced source code, the PID infers that this name comes from the data structure TableHeader.table_file. By examining the referenced source code, the PID can therefore follow the flow:
from main(): case 'C' to CreateTable():2 (fopen()),
a few lines later to the call to WriteTableHeader():3 (fwrite()),
back in main(), to the many rows being written (case 'I': fwrite() at WriteRow():3 and 6),
to rows being deleted (case 'D': at ExecuteDelete():25, without an fwrite(), which makes this harder to detect), and
finally to the file being removed, again in main(), at main():57 (remove()).

From this entire sequence of operations on the table datafile associated with the C FILE table_header.table_file, the PID can infer that the TableHeader data structure is:
written to the table file at WriteTableHeader():3, and then
read at OpenTable():8.

Interestingly, that is all that ever happens to the TableHeader: it is written once and is never deleted from the file.

The PID can likewise determine that the RowHeader data structure is:
appended to the table file at WriteRow():3,
read at SequentialScan():7, and
removed from the file at ExecuteDelete():25.

Finally, the PID can determine that the string is:
appended to the table file at WriteRow():6,
read at SequentialScan():15, and
removed from the file at ExecuteDelete():15.

In summary, the analysis performed by the PID follows the values of variables of type FILE to determine how each program manipulates each file identified in the ecosystem specification, by observing:
1. where the file name comes from (that is, which variable in the program),
2. where the file is opened,
3. where the file value flows through the program from there,
4. from that, which data structures are (i) written to, (ii) later read from, and then (iii) deleted from the file, and
5. where the file is ultimately removed or closed.

Note that this analysis takes place entirely within the context of a single execution of a single program. If there are multiple programs, each is analyzed separately. Each program will often be executed many times, but this analysis considers only a single execution.

The PID analysis must first:
find the file variables,
compute the value flow of those values,
identify the file open, close, write, and delete operations along the value flow, and
for each, identify the specific information to be recorded in the program interactions.
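The last two of these steps can be pictured as grouping the file operations found along a FILE value flow into one record per file. The sketch below assumes the value flow has already been computed and is handed over as a list of hypothetical `(file_var, action, location, structure)` tuples. The source locations are the ones quoted in this description, while the tuple encoding, the action labels, and the function name `derive_interactions` are illustrative assumptions.

```python
from collections import defaultdict


def derive_interactions(file_ops):
    """Group file operations, given in value-flow order, by file variable.

    structure is None for operations on the file as a whole (open, close, remove).
    """
    interactions = defaultdict(lambda: defaultdict(list))
    for file_var, action, location, structure in file_ops:
        key = action if structure is None else (action, structure)
        interactions[file_var][key].append(location)
    return {var: dict(ops) for var, ops in interactions.items()}


ops = [
    ("table_file", "open", "CreateTable():2", None),
    ("table_file", "write", "WriteTableHeader():3", "TableHeader"),
    ("table_file", "write", "WriteRow():3", "RowHeader"),
    ("table_file", "read", "OpenTable():8", "TableHeader"),
    ("table_file", "read", "SequentialScan():7", "RowHeader"),
    ("table_file", "delete", "ExecuteDelete():25", "RowHeader"),
    ("table_file", "remove", "main():57", None),
]
table = derive_interactions(ops)["table_file"]
```

The resulting record has no ("delete", "TableHeader") entry, which mirrors the observation that the TableHeader is written once and never deleted from the file.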

The domain assertion estimator, described below, takes the per-execution behavior extracted by the PID and combines it into an overall understanding of how data flows from the program into files and then back into the program (possibly in a subsequent execution). It thereby computes invariant value flows across program executions, which conventional compiler analysis cannot do.

Domain assertion estimator
The domain assertion estimator tool uses the PR, the identified invariants, and the program interactions to derive domain assertions. For table Spiffs, the program interactions imply which schema information is invariant. For table Spiff instances, the program interactions are essential for understanding where in the program rows are created, accessed, updated, and deleted. Query Spiffs can draw on schema, row, and workload-derived invariants, together with smaller-scope invariants that are within the reach of conventional compiler optimization. The specification of domain-specific knowledge describes some of these operations in more detail. The domain assertion estimator then follows values as they are written to and read from files, possibly across multiple invocations of the program, and combines invariant intervals to derive the full lifetime of each value, which it encodes in a domain assertion. If the workload is complete, that is, if it fully characterizes the possible operations on the data (which may hold for some batch applications), the domain assertion estimator (DAD) can also derive a restricted set of possible values for an invariant. This information (the domain assertions and possibly the sets of possible invariant values) is exactly what a conventional compiler lacks.

An important aspect of domain specialization is that it uses information generally unavailable to a compiler. This information takes two forms: (i) domain-specific knowledge and (ii) preliminary source knowledge. Both kinds of knowledge go beyond what a compiler can discover or conclude just by looking at the source code of a single program. Domain specialization takes into account the much broader ecosystem of the program being specialized: the data it reads or manipulates, the programs that invoke it, other programs that also read or manipulate this data, and the operating system, network routers, and storage systems involved. This ecosystem provides a great deal of information that domain specialization can exploit to make the program being specialized (and its data) more efficient.

Compared with preliminary source information, which is fairly general, domain-specific knowledge applies only to programs in a particular domain. One example of domain-specific knowledge is that "every change to a table's schema is serializable". Serializability is a sophisticated concept that originated in the database field but has spread to other parallel and distributed information-processing applications. Such knowledge makes it possible to create table Spiffs that speed up a DBMS, including determining exactly where a table Spiff should be created and where it should be destroyed.

A second form of domain-specific knowledge concerns the program's workload. One example: "OLAP (online analytical processing) applications exhibit little data change; (complex) queries tend to dominate during the day, and updates are infrequent and usually occur at night." Because such information has the form "this activity is more frequent than that one", it can guide domain specialization toward better-informed decisions about doing extra work now in exchange for a speedup later.

One example of preliminary source knowledge is that a particular portion of a file is written and read by only one program, so that it persists until code that modifies that portion, or code that removes the file, is executed. Such knowledge makes it possible to create Spiffs that speed up any program that repeatedly processes an input file.

Domain-specific knowledge and preliminary source knowledge are preferably formalized so that the Spiff searcher can read a file containing domain assertions and preliminary source assertions that formally state such knowledge. The Spiff searcher then reads the files containing the DBMS source code (or, more generally, the source code of any program in the domain described by the domain specification) and outputs the Spiff invariants used by the Spiff composer.

Table Spiff use case
Combining the information from the program interactions with the invariant interval on main():table_header.num_columns gives the DAD the information it needs to generate the domain assertion shown in Example 9 below.

Note that ExecuteDelete():38 is not included, because the analysis has determined that it is a rename.

Table Spiff instance use case
The rows of a database table differ from the schema in that each table has many rows but only one schema. Above, there was a single value, table_header.num_columns, stored in the file associated with the table; that value is first written to the file and later read back. The program interactions show that, in addition, many different row_data values are written to that same file.

Rows are distinguished by where they reside in the data file, that is, by their file position (the offset of the first byte of the row).

As shown in Example 10, the domain assertions generated by the DAD distinguish rows by means of the keyword OFFSET on the left-hand side of a functional dependency. The keyword indicates that the functional dependency holds only when the program's current position in the specified file being read or written has a particular value.

Intuitively, the data present at the position where a read occurs is the same as the data written earlier at that position, as described by the segment start. The DAD recognizes that multiple values of a variable in the program are written to the same file, and uses OFFSET to distinguish them. Note that one of the segment end statements does not include OFFSET, so that when the file is removed, the row invariants for the whole file all end.
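One way to picture OFFSET-qualified row invariants is as a table of live values keyed by file and offset. The sketch below illustrates that idea only. It does not reproduce the domain assertion syntax of Example 10, and the class name `RowInvariants` and its methods are hypothetical.

```python
class RowInvariants:
    """Track which (file, offset) positions currently hold an invariant value."""

    def __init__(self):
        self.live = {}                       # (file, offset) -> value written there

    def write(self, file, offset, value):
        self.live[(file, offset)] = value    # segment start at this OFFSET

    def read_matches(self, file, offset, value):
        # A read at an offset must see the value written earlier at that offset.
        return self.live.get((file, offset)) == value

    def delete(self, file, offset):
        self.live.pop((file, offset), None)  # segment end qualified by OFFSET

    def remove_file(self, file):
        # Segment end without OFFSET: every row invariant of the file ends.
        for key in [k for k in self.live if k[0] == file]:
            del self.live[key]


inv = RowInvariants()
inv.write("t.tbl", 4, "1,alice")
inv.write("t.tbl", 15, "2,bob")
first_ok = inv.read_matches("t.tbl", 4, "1,alice")
inv.delete("t.tbl", 4)
after_delete = inv.read_matches("t.tbl", 4, "1,alice")
inv.remove_file("t.tbl")
```

Deleting one row ends only the invariant at that offset, while removing the file ends all of them, matching the distinction between segment ends with and without OFFSET.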

The DAD may also find that case "D" (a minidb command interpreted by a switch/case statement in the minidb implementation) moves data packets between OFFSETs within the file. Those skilled in the art will readily appreciate that accommodating such movement only requires extending the form of the domain assertions.

Apart from OFFSET, row invariant values have the same structure as the table's schema invariant values.

Furthermore, each file written or read by the application can be regarded as being composed of data packets, each of which is the external-format value of a local program variable that is written and read as a unit. Thus minidb.c first places a schema data packet (containing the value of table_header.num_columns) in the file, followed by a series of row data packets (containing the values of row_values).
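By way of illustration only, the file layout just described (one schema data packet followed by row data packets, each row identified by the OFFSET of its first byte) might be sketched in C as follows. The structure layouts and the function names write_header, append_row, and read_row_at are hypothetical and are not taken from minidb.c.

```c
#include <stdio.h>

/* Hypothetical packet layouts; the real minidb.c structures may differ. */
typedef struct { int num_columns; } TableHeader;
typedef struct { long values[3]; } RowPacket;

/* Write the single schema data packet at the current position (start of file). */
static int write_header(FILE *f, const TableHeader *h) {
    return fwrite(h, sizeof *h, 1, f) == 1 ? 0 : -1;
}

/* Append one row data packet; return the OFFSET of its first byte, or -1 on error. */
static long append_row(FILE *f, const RowPacket *r) {
    if (fseek(f, 0, SEEK_END) != 0) return -1;
    long offset = ftell(f);
    if (offset < 0) return -1;
    if (fwrite(r, sizeof *r, 1, f) != 1) return -1;
    return offset;
}

/* Read back the row data packet stored at the given OFFSET. */
static int read_row_at(FILE *f, long offset, RowPacket *out) {
    if (fseek(f, offset, SEEK_SET) != 0) return -1;
    return fread(out, sizeof *out, 1, f) == 1 ? 0 : -1;
}
```

In this sketch the header needs no OFFSET because the file holds exactly one, whereas every row packet is distinguished only by the offset that append_row returns.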

Query Spiff Use Case (Example 11)
There are two cases. In the first, the workload (that is, the queries) comes from standard input, in which case no domain assertion can be estimated, because the user can type anything. (Of course, the VFT can still be used to determine that (many) invariant values are active during a query, but this has already been done by the invariant searcher in an earlier step.) In the second, the workload comes from a file, namely a file named in an invocation argument, in which case the domain assertion is essentially the same as for row invariant values in its handling of OFFSET. In this second case, the file name is used to indicate the actual file as the workload source. The two cases are distinguished by the ecosystem specification, which for the second case states that a specific workload file may also be specified. Alternatively, the creator of the workload file may be unknown, in which case no query Spiff can be created for that workload.

The second case may be used to specialize on the workload. The query SpiffIDs may then be inserted into the workload, or kept as an association stored elsewhere, and used when the workload is executed.

In what follows, only the first case, in which a query is not known until it is read in, is considered.

Note that a query Spiff involving only invariant values that hold during the query could presumably be discovered by an aggressive optimizing compiler. More importantly, however, a query Spiff combines such query invariant values with schema and row invariant values that a compiler cannot discover, because those invariant values require knowledge of the semantics of file reads and writes. It is in this respect that the query Spiff is a true Spiff.

Exemplary Algorithm for the Domain Assertion Estimator
The tracking segmentation elements derive from the program interactions in which a data file, or a component of a data file (here, table headers and rows), is created, inserted, or deleted. The dependencies within a domain assertion derive from the directory and file name and, optionally, the OFFSET within the file. Invariant values tie these together: the value of a variable in the application can be seen to flow from the application code out to a file and then back into the application, which establishes the long-lived invariant values from which the candidate snippets and final Spiffs can be determined.

For the table Spiff use case, in which table_header.num_columns is characterized in a domain assertion, the DAD can determine that:
main(): case "C" calls CreateTable(),
which calls WriteTableHeader(),
which writes the table header to the file.

The table header is:
read by that and subsequent program invocations,
until the file is deleted at main():57.

This suggests that the table header in table <datafile> is written only once and is never modified by the program. Knowing this, the DAD can generate the appropriate tracking segmentation elements, which for table <datafile> are its creation and its deletion. The DAD can also create the dependency on table_header.num_columns.

For the table Spiff instance use case, the relevant data packets are row_data and row_values, which are added to table <datafile>, possibly deleted from that file, and finally removed when the file itself is deleted. The DAD determines that, once the file has been created, the program stores multiple values of row_data in it, so that each such packet can be identified by the OFFSET at which it resides.

These observations yield an algorithm for the DAD. For each FILE created or opened by the program, the DAD uses the VFT to find where the file was first created and where its name comes from. Then, for each data structure stored in that file (these are the <data> elements in the program interactions), the DAD establishes the file operations performed on that data structure: adding the data structure to the file, possibly changing or removing it, and finally deleting the file. These operations in turn suggest the appropriate tracking segmentation elements. Finally, starting from the program data structures used in these operations (the C or C++ program data structures written to the file), the DAD examines the VFT to determine where the values in those data structures come from, which suggests the dependencies. The DAD also determines whether the file contains a single data packet (as with num_columns) or multiple data packets (as with row_data) by tracking execution with respect to each FILE variable as it flows through the program; this too can be determined from the VFT. Multiple packets require OFFSET in the tracking segmentation and dependencies of the domain assertion.
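By way of illustration only, the last step of this algorithm (deciding whether a data element stored in a file needs OFFSET) might be sketched in C as follows. The sketch assumes a flat list of observed write events; the WriteEvent type and the functions count_writes and needs_offset are hypothetical simplifications, and an actual DAD would derive this information from the VFT and the program interactions rather than from such a list.

```c
#include <string.h>

/* Hypothetical record of one file write taken from the program interactions. */
typedef struct {
    const char *file;  /* name of the file the write targets */
    const char *data;  /* name of the data element written   */
} WriteEvent;

/* Count how many times the data element `data` is written to `file`. */
static int count_writes(const WriteEvent *ev, int n,
                        const char *file, const char *data) {
    int count = 0;
    for (int i = 0; i < n; i++)
        if (strcmp(ev[i].file, file) == 0 && strcmp(ev[i].data, data) == 0)
            count++;
    return count;
}

/* A data element written more than once to the same file needs OFFSET
 * so that its individual packets can be told apart. */
static int needs_offset(const WriteEvent *ev, int n,
                        const char *file, const char *data) {
    return count_writes(ev, n, file, data) > 1;
}
```

Under these assumptions, a table header written once yields no OFFSET, while row_data written repeatedly to the same file does.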

Correctness
Correctness here means that the generated domain assertions are complete and are consistent with the input PR, invariant values, and program interactions.

Snippet Searcher
As shown in FIG. 1, another tool, the snippet searcher, takes as input:
one or more invariant values,
the PR,
one or more execution summaries,
the domain assertions and program interactions, and
a cost model.

The snippet searcher outputs one or more <spiff> elements, each containing one or more candidate snippets. Each candidate snippet includes:
the interval of code, identified by PEs,
a set of invariant values,
the set of possible values of each invariant value,
the source location at which the value of each invariant value is first written to the file,
the source location at which that value is removed from the file,
the source location at which the value is read each time,
the appropriate lifetime of the candidate snippet, that is, when the associated Spiff can be created (whether at compile time or at run time), and
optionally, the recommended optimizations to apply within the interval.

For an interval recorded in an invariant value whose scope lies within a single program execution, each domain assertion suggests an interval that is almost certainly wider than just one program execution.

The snippet searcher uses the domain assertions to widen the scope of the invariant values and to refine the set of possible values of each invariant value. The interval of each invariant value overlaps (partially or entirely) the interval of the candidate snippet. Furthermore, the interval of each candidate snippet is tuned by the cost model to minimize the size of the interval while maximizing the savings, which are computed as the product of the optimized snippet's execution cost and the number of times the snippet is evaluated, as derived from the execution summaries and the Spiff invocation cost. For this reason, the snippet searcher needs some notion of the possible optimizations the Spiff composer will perform and of the benefit of each such optimization (the latter from the cost model).
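By way of illustration only, one possible reading of this tuning step is sketched below in C. The savings formula used here (the number of evaluations multiplied by the per-evaluation gain after subtracting the Spiff invocation cost) is an assumption made for the sketch, not a formula stated in this description, and the types SnippetCost and Candidate and the functions net_savings and pick_candidate are hypothetical.

```c
/* Hypothetical per-snippet figures; field names are illustrative only. */
typedef struct {
    double generic_cost;      /* cost of one evaluation of the original code     */
    double specialized_cost;  /* cost of one evaluation of the optimized snippet */
    double invocation_cost;   /* overhead of one Spiff invocation                */
    long   evaluations;       /* times the snippet is evaluated (from summaries) */
} SnippetCost;

typedef struct {
    SnippetCost cost;
    int interval_size;        /* size of the code interval, e.g. in statements */
} Candidate;

/* Assumed savings: evaluations x (per-evaluation gain minus invocation overhead). */
static double net_savings(const SnippetCost *c) {
    return (double)c->evaluations *
           (c->generic_cost - c->specialized_cost - c->invocation_cost);
}

/* Return the index of the candidate with the greatest positive savings;
 * among equal savings prefer the smaller interval. Returns -1 if no
 * candidate saves anything. */
static int pick_candidate(const Candidate *c, int n) {
    int best = -1;
    double best_savings = 0.0;
    for (int i = 0; i < n; i++) {
        double s = net_savings(&c[i].cost);
        if (s <= 0.0) continue;
        if (best < 0 || s > best_savings ||
            (s == best_savings && c[i].interval_size < c[best].interval_size)) {
            best = i;
            best_savings = s;
        }
    }
    return best;
}
```

The tie-break on interval size reflects the stated goal of minimizing the interval while maximizing savings; a real cost model could weigh the two differently.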

Query Spiff Use Case (Example 12)
There are two major differences between the query Spiff and the table Spiff considered earlier. The first, handled in this step, is that the snippet searcher uses invariant values that do not come from domain assertions, because queries typically do not persist (but see the discussion above of workloads given on standard input). The second, handled later, is that the Spiff composer needs explicit direction from the ecosystem specification as to where the boundary lies between compiling Spiff code and instantiating Spiff instances at run time.

The snippet searcher estimates the following:
the interval of code from SequentialScan():40 (immediately after the data packet is unpacked into row_values[]) to SequentialScan():59 (the end of the method),
the set of invariant values main().query, in particular query.executor_routine, query.executor_command, query.num_predicates, query.predicate_list, and predicates[], which are read from stdin, together with query.schema from the table Spiff use case,
the set of possible values of each invariant value (here, query.executor_routine is always SequentialScan() and query.executor_command is always SCAN_FWD; for each predicate, column_id is an arbitrary int read from stdin (deduced from the assignment to that field in BuildPredicates()), constant_operand is an unsigned long read from stdin, and operator_function is &EqualInt4 or &LessThanInt8),
the source location at which the value of each invariant value is first determined (the value of the query is determined at main():32, immediately after the call to BuildAndPlanQuery()); the value of the query is never written to, removed from, or read from a file,
the appropriate lifetime of the candidate snippet (the lifetime lies just within the "S" switch case; because the ecosystem specification states that the compiler cannot be invoked there, Spiffs are only precompiled, for executor_routine equal to SequentialScan(), executor_command equal to SCAN_FWD, and num_predicates from 1 to 6, where for each such predicate operator_function is &EqualInt4 or &LessThanInt8), and
unrolling of the for loop at SequentialScan():47, which terminates at the value of schema->num_columns.

Although it is unclear how fromValue and toValue are determined, they are thought to bound the number of compile-time query Spiffs generated.

Exemplary Algorithm for the Snippet Searcher
The snippet searcher first extends invariant values across program executions by tracking the variables that are read from files and where their values are inserted into those files. The result is invariant intervals that span multiple executions. The tool also needs to track when a value is first written to a file and when it is deleted.

The other task of the snippet searcher is to use the cost model to delimit the snippets. To do so, the tool needs to know which optimizations the Spiff composer can bring about and the conditions under which each optimization is feasible.

Correctness
Correctness for this tool means that each candidate snippet it produces is consistent with the input invariant values, PR, execution summaries, domain assertions, program interactions, and cost model. The specified invariant values must actually be invariant over the snippet, the recommended optimizations must be consistent with those invariant values and with the corresponding operations in the PR, and the possible values must actually be possible.

Desirable, though not required:
given the cost model, the returned snippets are the most desirable ones,
the snippets are maximal, in that making them larger would increase cost under the cost model,
the snippets are minimal, in that making them smaller would increase cost under the cost model, and
given the cost model, the recommended optimizations are useful.

Spiff Composer
As shown in FIG. 1, another tool, the Spiff composer, takes one or more candidate snippets and the PR as input and produces specialized source code as output.

Specifically, for each input candidate snippet, the Spiff composer performs the following tasks:
1. Create a .h file for the Spiff pattern, defining all pattern parameters and the Spiff pattern functions.
2. Create a .h file for the Spiff implementation declarations.
3. Create a .c file for the Spiff implementation definitions.
4. Insert code at the appropriate locations to create the Spiff (for a dynamic Spiff), invoke the Spiff, and destroy the Spiff (again, for a dynamic Spiff).

Each use case is associated with a designated branch of minidb, which ensures correctness. Each branch contains the candidate snippets that generate that configuration.

For the actual transformation, it is considered beneficial to use TXL as a PR-to-PR transformation; the transformed PEs are then converted back into textual source code to create the Spiff. TXL includes a parser, although it may be possible to obtain the PEs directly. TXL also includes a syntax tree unparser that can work with those PEs.

For the Spiff composer to function as described above, it may need some direction based on domain knowledge. Specifically, the Spiff composer may need to be given, or to identify, the following:
A specification of all static implementations to be generated (that is, which variables to specialize on and with what values).
Rules for disambiguation when two or more static implementations apply.
Rules for creating dynamic implementations. (Are they allowed at all? Are they cached and, if so, how: in memory or on disk? What are the cache size and the cache management rules? Should a generated dynamic implementation be fully specialized, or specialized only in part, leaving some parameters general? Is it acceptable to compile a dynamic implementation on the fly as needed, or only to use a dynamic implementation if it is already in the cache? Do the answers to these questions vary?)
Whether a fully general implementation should be created (and used as a fallback), or whether some variables are always specialized in some way (this determines which variables need to be represented in the data block).

In general, the Spiff composer has all of the above identified in its input file. It is the job of the snippet searcher to work out how many static implementations to create, whether an implementation should be static or dynamic, which variables to specialize on and which not, and so on. Only one static implementation is to be created, and that single static implementation is always invoked.
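By way of illustration only, the relationship between static implementations, a disambiguation rule, and a fully general fallback might be sketched in C as follows. The per-row work (summing the columns), the table static_impls, the "first match wins" rule, and all function names are assumptions made for the sketch and do not appear in minidb.

```c
#include <stddef.h>

typedef long (*RowFn)(const long *row_values, int num_columns);

/* Fully general implementation, usable as a fallback for any column count. */
static long process_generic(const long *row_values, int num_columns) {
    long total = 0;
    for (int i = 0; i < num_columns; i++)
        total += row_values[i];
    return total;
}

/* Static implementation specialized on num_columns == 3 (loop unrolled). */
static long process_3_columns(const long *row_values, int num_columns) {
    (void)num_columns;  /* folded to the constant 3 */
    return row_values[0] + row_values[1] + row_values[2];
}

/* Hypothetical table of static implementations, keyed by the specialized value. */
typedef struct { int num_columns; RowFn fn; } StaticImpl;
static const StaticImpl static_impls[] = { { 3, process_3_columns } };

/* Assumed disambiguation rule: the first matching static implementation wins;
 * if none matches, fall back to the general implementation. */
static RowFn select_impl(int num_columns) {
    for (size_t i = 0; i < sizeof static_impls / sizeof static_impls[0]; i++)
        if (static_impls[i].num_columns == num_columns)
            return static_impls[i].fn;
    return process_generic;
}
```

With a single static implementation, as in the case described above, the selection collapses to always invoking that one function; the table form only matters once several static implementations coexist.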

Query Spiff Use Case
Referring to Example 13, the input is as follows. As specified by the snippet searcher, it indicates compile-time query Spiffs for executor_command equal to SCAN_FWD and num_predicates from 1 to 6, where for each such predicate operator_function is &EqualInt4 or &LessThanInt8.

As noted above, the Spiff composer needs explicit direction from the ecosystem specification as to where the boundary lies between compiling Spiff code and instantiating Spiff instances at run time. Assume that the ecosystem specification places this boundary at cases "S", "I", and "D", stating that Spiffs cannot be compiled in any of the code invoked in these three cases. (This highlights knowledge of the latency that users will tolerate. Although compiling a new Spiff for a particular row might be particularly beneficial for speeding up the workload as a whole, users may still wish to specify that no compilation occur, because a particular workload may itself need to run fast by virtue of domain specialization.) For this reason, the Spiff composer takes the ecosystem specification as an input.

Thus, the Spiff composer constructs Spiffs for a portion of SequentialScan(); because query.executor_command is always SCAN_FWD, it constructs one at compile time for each value of num_columns. For each predicate, column_id is an arbitrary int, constant_operand is an unsigned long read from stdin, and operator_function is &EqualInt4 or &LessThanInt8. (Spiff 0 is the unspecialized version, which can handle any number of num_columns.) The relevant transformations are loop unrolling and constant folding. For num_predicates=2, the Spiff composer generates a Spiff pattern for SpiffID 23, in which the first predicate has column_id=2 and operator_function=&EqualInt4 and the second has column_id=7 and operator_function=&LessThanInt8. Note that the computation of a query SpiffID is normally tied to the specific values of the Spiff pattern parameters. The Spiff composer uses an application-specific ID generation mechanism to generate the proper SpiffID; in this example, however, the computed SpiffID is assumed to be 23.
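By way of illustration only, the effect of loop unrolling and constant folding for the two-predicate case described above might be sketched in C as follows. This is not the Spiff pattern generated by the Spiff composer; the Predicate layout, the signatures assumed for EqualInt4 and LessThanInt8, and the functions eval_generic and eval_spiff_23 are hypothetical.

```c
/* Assumed operator signature; the real minidb operators may differ. */
typedef int (*OperatorFn)(unsigned long column_value,
                          unsigned long constant_operand);

static int EqualInt4(unsigned long a, unsigned long b)    { return a == b; }
static int LessThanInt8(unsigned long a, unsigned long b) { return a < b; }

typedef struct {
    int column_id;
    unsigned long constant_operand;
    OperatorFn operator_function;
} Predicate;

/* Generic path: loops over any number of predicates and calls each
 * operator through a function pointer. */
static int eval_generic(const unsigned long *row_values,
                        const Predicate *preds, int num_predicates) {
    for (int i = 0; i < num_predicates; i++) {
        const Predicate *p = &preds[i];
        if (!p->operator_function(row_values[p->column_id], p->constant_operand))
            return 0;
    }
    return 1;
}

/* Specialized path for the assumed SpiffID 23: the loop is unrolled and the
 * column ids (2 and 7) and operators are folded into the code; only the
 * constant operands remain as parameters. */
static int eval_spiff_23(const unsigned long *row_values,
                         unsigned long operand0, unsigned long operand1) {
    return row_values[2] == operand0 && row_values[7] < operand1;
}
```

In this sketch the specialized function removes the loop, the two indirect calls, and the loads of column_id, which is the kind of saving the transformations above are aimed at.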

Exemplary Algorithm for the Spiff Composer
The Spiff composer decides only between two alternatives: either expose the invariant values and let the compiler perform the optimizations, or perform the optimizations by hand by generating different code.

The Spiff composer then stitches the generated files together by copying code almost verbatim from the original source into the specialized source, using the file names, line numbers, and column numbers in the relevant PEs. Consequently, the Spiff composer needs to do only a very limited amount of parsing and syntax construction; most of its work consists of copying code from the right place in the original source to the right place in the specialized source.

Correctness
Correctness here means that the code compiles and runs, is semantically identical to the original code it replaces, and is consistent with the input information.

The following description gives additional examples that demonstrate the creation of a MiniDB table Spiff and a MiniDB query Spiff.

MiniDB Table Spiff
The following example demonstrates the creation of a table Spiff from the invariant schema->num_columns==CONSTANT in the SequentialScan() function shown in Example 4.

Invariant Searcher
In the above example, the invariant searcher identifies the following set of invariant intervals for the variable SequentialScan()::schema->num_columns:
Invariant interval set #1: starts at line 52 and contains one invariant interval:
Invariant interval #1.1: ends at line 114.

The invariant searcher also generates a VFT indicating where the variable SequentialScan()::schema->num_columns obtains its value:
SequentialScan()::schema->num_columns obtains its value from ExecuteQuery()::query->schema->num_columns,
ExecuteQuery()::query->schema->num_columns obtains its value from main():query->schema->num_columns,
main():query->schema->num_columns obtains its value from main()::table_header->num_columns,
main()::table_header->num_columns obtains its value from the fread() in OpenTable().

Thus the value of SequentialScan()::schema->num_columns ultimately comes from the fread() call in OpenTable().
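By way of illustration only, following such a chain of "obtains its value from" edges back to its origin might be sketched in C as follows. Representing the VFT as a flat array of VftEdge records, and the function vft_origin itself, are assumptions made for the sketch; the actual VFT representation is not specified here.

```c
#include <string.h>

/* Hypothetical VFT edge: `variable` obtains its value from `source`. */
typedef struct {
    const char *variable;
    const char *source;
} VftEdge;

/* Follow edges from `variable` until a node with no recorded source is
 * reached, and return that node. At most n hops are taken, so a cyclic
 * edge list cannot cause an endless loop. */
static const char *vft_origin(const VftEdge *edges, int n, const char *variable) {
    const char *current = variable;
    for (int step = 0; step < n; step++) {
        const char *next = NULL;
        for (int i = 0; i < n; i++) {
            if (strcmp(edges[i].variable, current) == 0) {
                next = edges[i].source;
                break;
            }
        }
        if (next == NULL)
            return current;  /* no further source recorded: this is the origin */
        current = next;
    }
    return current;
}
```

Applied to the four edges listed above, with the last source written as a string naming the fread() in OpenTable(), the function returns that string for every variable in the chain.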

Invariant Checker
The invariant checker confirms that main()::table_header->num_columns is assigned only once, at line 634, and that its value is never changed up to a particular terminal node in that execution of the given workload.

If the domain assertion estimator has been run before the invariant searcher, the invariant checker can also confirm that the actual values are among the possible values. It may also be possible to reduce the scope of the invariant searcher by focusing its analysis on particular values or variables.

Snippet Searcher
By analyzing the execution summaries together with the cost model, the snippet searcher determines that cases "C", "I", and "D" are too expensive for creating Spiffs, whereas the computation time within case "S" is sufficient to recommend specializing that case.

The starting point is a simple cost model stating only that PEs (or other equivalent representations) that are executed fewer than a fixed number of times, or fewer than some percentage of the time, are not specialized.
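By way of illustration only, such a threshold-based cost model might be sketched in C as follows. The SimpleCostModel type, its two threshold fields, and the function worth_specializing are hypothetical, as is the choice to express the percentage as a fraction of total executions.

```c
/* Hypothetical thresholds for the simple cost model. */
typedef struct {
    long   min_count;     /* fixed minimum number of executions       */
    double min_fraction;  /* minimum share of all executions (0 to 1) */
} SimpleCostModel;

/* Return 1 if a PE executed `count` times out of `total` passes both
 * thresholds, and 0 if it falls below either one (not specialized). */
static int worth_specializing(const SimpleCostModel *m, long count, long total) {
    if (total <= 0)
        return 0;
    if (count < m->min_count)
        return 0;
    return (double)count / (double)total >= m->min_fraction;
}
```

A PE is rejected as soon as it falls below either threshold, which matches the "fewer than a fixed number or fewer than some percentage" wording above.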

In this case, the snippet searcher deduces from the domain assertions that schema->num_columns is invariant throughout SequentialScan(), over the span from the creation of the data file until the removal of that file, and therefore that the value of the variable is first written to the file when the number of columns is stored at WriteTableHeader():3, which executes immediately after minidb.c:553. The value is never removed from the file, although the file itself is removed at main():57. This indicates that the Spiff can be created at compile time. The snippet extends from ExecuteTable():20 through ExecuteTable():23. This is the extent of the snippet specialized on num_columns, determined by looking for other statements that can be specialized on num_columns (in essence, num_columns is used infrequently away from this specialization opportunity). However, because an additional indirect call is expensive, the snippet searcher widens the snippet to the whole of ExecuteQuery(), which has another use of this invariant value at line 7. Finally, the snippet searcher recommends loop unrolling for this candidate snippet.

In this case, the Spiff has only one Spiff function, indicated by the <snippet> shown in Example 14.

The snippet searcher can also deduce from the domain assertions that data packets are created in cases "I" and "D" of main() and removed in cases "P" and "D". More specifically, the snippet searcher estimates the following:
the interval of code from SequentialScan():16 (immediately after the data packet is read) to SequentialScan():38 (the end of the unpacking for loop),
the set of invariant values (the values from row_data, and the schema from the invariant analysis above),
the set of possible values of each invariant value (here, the value of num_columns is 3, the value of the first column is a hard-coded int and schema, the type of the first column is int, the type of the second column is long, and the type of the third column is int, an array of arbitrary characters),
the source locations at which each invariant value is first written to the file (WriteRow():18 and WriteRow():25),
the source locations at which the value is removed from the file (main():57 and ExecuteDelete():25),
the source location at which the value is read each time (main():SequentialScan():3),
the appropriate lifetime of the candidate snippet, which is built at table definition time using the query.schema invariant values (because it involves data packets that may be inserted, the Spiff is instantiated at run time, when row_values is supplied; instantiation needs to be fast, both because a data packet may be removed after only a few queries have run and because the number of possible row_values is very large), and
unrolling of the for loop at SequentialScan():16, which terminates at the value of schema->num_columns, using the row_data and schema values.

As shown in Example 15, the analysis combines table invariants, which have a wider scope, with row invariants, which have a narrower scope, and handles each differently. For the former, code can be generated when the table Spiff is defined; the latter requires instantiating the Spiff at run time by supplying the values of the row_values array. In domain specialization of a DBMS, schema invariants play a major role in table Spiff instances and in query Spiffs, which involve invariants of successively narrower scope.

Spiff composer
This is the simplest use case because there are no instances. Four variations of this case are examined in detail.

Variation 1: Single static implementation
Consider the following input candidate snippet, which corresponds to a single invariant of minidb and yields the static Spiff implementation shown in Example 16.

createAt="compileTime" states that this Spiff should have a static implementation.
valueRead="OpenTable():8" describes where the variable is read from the outside world, and therefore where a Spiff may be selected.
existsFrom="WriteTableHeader():3" describes where the variable is written to the outside world, and therefore where a dynamic Spiff may be created. For a static Spiff it can safely be ignored.
existsTo="main():57" describes where the file is erased or removed when the "outside world" is a file, and therefore where a dynamic Spiff may be garbage collected. For a static Spiff it can safely be ignored.
replaceFunction="ExecuteQuery()" names the function to be specialized. There is only one here, but in general there can be many.
value="3" gives the value of the invariant for which the Spiff is to be generated (statically, in this case). There is only one here, but in general there can be many.

This input tells the Spiff composer to construct a Spiff pattern whose single Spiff pattern function is based on ExecuteQuery(), together with a static implementation that specializes the variable ExecuteQuery()::query->schema->num_columns to the single literal value 3.
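As a rough illustration of what such a static implementation amounts to, the C sketch below contrasts a generic routine that reads num_columns at run time with a version specialized to the literal value 3 and unrolled. The Schema struct, both function names, and the choice of summing the columns are hypothetical stand-ins for illustration; they are not minidb's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for minidb's schema; only num_columns matters here. */
typedef struct {
    int num_columns;
} Schema;

/* Generic version: num_columns is loaded and tested on every iteration. */
long sum_row_generic(const Schema *schema, const long *row) {
    assert(schema != NULL && row != NULL);
    long total = 0;
    for (int i = 0; i < schema->num_columns; i++)
        total += row[i];
    return total;
}

/* Static implementation specialized on num_columns == 3:
 * the loop bound is a literal, so the loop is fully unrolled
 * and the schema pointer is no longer needed at all. */
long sum_row_spec3(const long *row) {
    assert(row != NULL);
    return row[0] + row[1] + row[2];
}
```

For any three-column row the two routines return the same value; the specialized one simply omits the loop control and the load of num_columns.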

Variation 2: Fine-grained specialization
The example above shows a Spiff applied so as to replace an entire function (ExecuteQuery()). In practice, one can observe that only a small code segment of that function involves the invariant. That small segment alone can then be turned into a Spiff, as the following snippet shows.

The candidate snippet below (shown in Example 17) closely resembles the one above, but its interval is much smaller than the whole ExecuteQuery() function: just the three lines of the for loop. The replaceFunction attribute has therefore disappeared. Finally, the constant-folding recommendation for line 21 is omitted because that line is not within the interval being specialized (it could be left in, but it would be ignored).

Variation 3: Using a fixed array of implementations
In a particular embodiment, Spiff implementations are identified by a 1-byte integer. The same Spiff pattern can therefore have 255 implementations in total, with the num_columns variable acting as the Spiff pattern parameter and ranging from 1 to 255 (0 is chosen to denote an invalid value). Accordingly, the candidate snippet shown in Example 42 below covers not just the value 3 but all values from 1 to 255 (via the fromValue and toValue attributes of the invariantIntervalSet element). It also returns to whole-function replacement.
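One way to picture a fixed array of implementations is a 256-entry table of function pointers indexed by the 1-byte identifier, with slot 0 left empty for the invalid value. The sketch below is an assumption-laden illustration: RowFn, impl_table, select_impl, and the three summing routines are hypothetical names, and only three of the 255 slots are filled to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef long (*RowFn)(const long *row);

/* Hypothetical per-value implementations (one per num_columns value). */
static long sum_cols_1(const long *r) { return r[0]; }
static long sum_cols_2(const long *r) { return r[0] + r[1]; }
static long sum_cols_3(const long *r) { return r[0] + r[1] + r[2]; }

/* Slot 0 is reserved for "invalid". A complete table would populate
 * slots 1..255; unpopulated slots are NULL in this sketch. */
static const RowFn impl_table[256] = {
    [1] = sum_cols_1,
    [2] = sum_cols_2,
    [3] = sum_cols_3,
};

/* Select the implementation for a given num_columns value.
 * A uint8_t index can never fall outside the table. */
RowFn select_impl(uint8_t num_columns) {
    return impl_table[num_columns];
}
```

A caller would look the implementation up once, for example when the table is opened, and then call through the returned pointer for every row.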

Variation 4: Dynamic Spiff
Realistically, each column of a table can have a particular data type. Assuming there are eight data types (int2, int4, char, varchar, and so on), a static table Spiff for a three-column table would need one implementation per combination of column types (8 × 8 × 8 = 512 under this assumption). A dynamic table Spiff is therefore better suited to this scenario.

The candidate snippet given below (see Example 19) expresses this with the createAt attribute, which now names the location in the application where the Spiff is created, namely a location in the CreateTable() function (in the earlier examples createAt was compileTime), and with the instantiateAt attribute, which names the location where the Spiff is instantiated, namely a location in the OpenTable() function. Because the num_columns value is supplied when the snippet is instantiated, the invariantIntervalSet element has neither a fromValue nor a toValue attribute. Another important difference from the earlier examples is an additional optimization recommendation: constant folding on column_definitions.

Unlike a static Spiff, a dynamic Spiff is created at run time: for a table Spiff, a call that compiles the Spiff is inserted into CreateTable().

Design of various kinds of Spiffs
(The examples here are taken from the open-source Postgres DBMS.)
Predicate query Spiff
This Spiff is used in evaluating both ordinary predicates in a query, such as o_orderdate >= date '19940801', and join predicates, such as o_orderkey = l_orderkey.

These predicates are evaluated by the Postgres function ExecQual(). Specifically, the predicates are normally represented as a linked list. ExecQual() iterates over this list and calls the specific evaluation function that corresponds to each predicate. The code excerpt presented in Example 20 (PG 9.3 stock, src/backend/executor/execQual.c:5125) shows this logic.

The evaluation function for each predicate is stored in a variable term. Each predicate of the form a > b has three components: operand #1, the operator, and operand #2. In Postgres, the operator is evaluated by the function ExecEvalOper(). This function (see Example 21) essentially performs a lookup based on the operator's type and fetches the address of the actual type-specific comparison function. ExecEvalOper() also requires the operands to be stored in a separate linked list. In many cases this list has length 2. What follows is an example of specializing the function for those cases.

Note that ExecEvalOper() is optimized so that the lookup of the comparison function happens only once, on its first execution. It then stores a different function in xprstate.evalfunc and calls that function once to evaluate the predicate. Subsequent evaluations of the operator are performed by ExecMakeFunctionResultNoSets() (for the scalar predicates considered within the current scope of specialization).

ExecMakeFunctionResultNoSets() then iterates over the list of operands, calling the argument extraction function for each operand.

ExecEvalExpr is a macro defined at src/include/executor/executor.h:72 as follows:
#define ExecEvalExpr(expr, econtext, isNull, isDone) \
	((*(expr)->evalfunc) (expr, econtext, isNull, isDone))

Consequently, when an operand is a constant, ExecEvalConst() is called, and finally the comparison function is called.

Two bottlenecks are seen in predicate evaluation: first, the loop that iterates over just two elements in the operand list, and second, the extraction of the individual operands. Specifically, for an ordinary predicate, one operand is usually a table column and the other is a constant. In that case the value (or address) of the constant can be "stored" directly in the code, with no need to call several functions to fetch it. The original implementation likewise needs several function calls to extract the column ID of the table-derived operand. This column ID can also be stored directly in the specialized code.
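The two bottlenecks can be made concrete with a small C sketch. It is not Postgres code: Operand, fetch_operand, ge_generic, and ge_col1_const are hypothetical names, and the layout of the operand list is an assumption made for illustration. The generic routine walks a two-element operand list and extracts each operand through a helper; the specialized routine has the column index (1) and the constant (19940801) written directly into its body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operand node: either a constant or a column reference. */
typedef struct Operand {
    bool is_const;
    int column;            /* used when !is_const */
    long value;            /* used when is_const  */
    struct Operand *next;
} Operand;

/* Generic operand extraction: decides constant vs. column on every call. */
static long fetch_operand(const Operand *op, const long *row) {
    return op->is_const ? op->value : row[op->column];
}

/* Generic "a >= b": loops over a two-element operand list each time. */
bool ge_generic(const Operand *args, const long *row) {
    long v[2];
    int n = 0;
    for (const Operand *op = args; op != NULL && n < 2; op = op->next)
        v[n++] = fetch_operand(op, row);
    assert(n == 2);
    return v[0] >= v[1];
}

/* Specialized for "column 1 >= 19940801": no list walk, no extraction
 * calls; the column index and the constant are part of the code. */
bool ge_col1_const(const long *row) {
    return row[1] >= 19940801L;
}
```

Both routines give the same answer for the same row; the specialized one is valid only for the one query whose predicate it was generated from.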

For a join predicate, both operands are non-constant. An operand can originate from one of three sources: INNER_VAR (I), OUTER_VAR (O), or scantuple (S). The origin of an operand is also an invariant for a given query. Knowing this invariant allows the routine that extracts the actual operand value to be simplified further. In theory there are nine possible combinations of origins for the two operands, but in practice only the following combinations occur.

Hash join query Spiff
In the function ExecHashJoin(), defined in the file src/backend/executor/nodeHashjoin.c, the variable node->js.jointype is invariant for a given query. Depending on the query, it takes one value from the set {JOIN_ANTI, JOIN_SEMI, JOIN_LEFT, JOIN_INNER}.

In the same file and function, the variable List *joinqual is also invariant for a given query.

The hash join query Spiff removes whole branches of the code, which reduces the number of if statements and, more importantly, shrinks the code.
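The effect of fixing the join type can be sketched in C as follows. This is not the body of ExecHashJoin(): the JoinKind enum, the function names, and the simplified "rows emitted per outer tuple" computation are all assumptions chosen to keep the example small. The generic routine tests the join type on every call; each specialized routine keeps only the branch that applies to its join type.

```c
#include <assert.h>

/* Hypothetical join kinds mirroring the four values named in the text. */
typedef enum { JK_INNER, JK_LEFT, JK_SEMI, JK_ANTI } JoinKind;

/* Generic: number of result rows for one outer tuple that found
 * `matches` matching inner tuples. The join kind is tested every time. */
int rows_emitted_generic(JoinKind kind, int matches) {
    assert(matches >= 0);
    switch (kind) {
    case JK_INNER: return matches;
    case JK_LEFT:  return matches > 0 ? matches : 1; /* null-extended row */
    case JK_SEMI:  return matches > 0 ? 1 : 0;
    case JK_ANTI:  return matches > 0 ? 0 : 1;
    }
    return 0;
}

/* Specialized for an inner join: every other branch has been removed. */
int rows_emitted_inner(int matches) {
    return matches;
}

/* Specialized for an anti join. */
int rows_emitted_anti(int matches) {
    return matches > 0 ? 0 : 1;
}
```

Because the join type cannot change while a query runs, selecting the specialized routine once per query is enough.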

The analysis must be able to handle complex data structures involving pointers and heap-allocated structs. For example, to remove the if statements in the body of the for loop in ExecHashJoin(), it must be able to reason about expressions such as the following (Example 22).

Page Spiff
A page Spiff exploits invariants within the disk or memory pages in which the DBMS manages data storage. Such invariants often include the number of rows stored on the page, the remaining free space, and whether the page is empty or full. In the Postgres page scan routine there are additional invariants such as the scan direction and the scan mode (pageatatime).

More interestingly, a page Spiff can enable more aggressive optimizations. For example, once a page has been read into memory, a page Spiff can take the data layout into account and optimize data locality. Once the data layout has been changed, instead of following the existing function call sequence in separate processing steps, the page Spiff can exercise these calls one block at a time, which likewise improves instruction locality.

A page Spiff can specialize a long call sequence that defers touching the data for some time and eventually ends in a data access, and it can specialize all of the code of the many functions that are called.

The main advantage of a page Spiff is that inlining the called functions can produce a single specialized function that may fit in the instruction cache. Once this transformation has been made, three further, mutually exclusive transformations become available:
1. Eager invocation of the GetColumnsToLongs() spec code routine (as soon as a compressed tuple is extracted from the page, convert it to a decompressed tupletableslot and store it in the array on which the spec code routines operate).
2. Eager partial decompression (have the code that calls the specialized code compute the highest column that is needed, and decompress up to that column).
3. Lazy decompression (perform the decompression operations at each place where the original code called GetColumnsToLongs()).

A variation on this decides using the selectivity of the selection. When selectivity is high, meaning that only a few rows are referenced, lazy decompression is used before the predicate is applied.

In general, it is best to place the calls to GetColumnsToLongs() so that execution achieves the greatest possible instruction cache locality.

Aggregate query Spiff
The aggregate Spiff is designed to improve the efficiency of the SUM and AVG aggregate functions. In particular, when aggregate functions are evaluated over the numeric data type, Postgres has been found to incur significant overhead in allocating and deallocating memory. The aggregate Spiff avoids this memory management overhead.

In Postgres, the numeric type is represented as a byte string, with each digit stored in an array of NumericDigit. This representation allows very fine control, but it essentially requires arithmetic to be carried out on strings, at the cost of performance.

The general implementation must allocate memory for every row because the number of digits in the value can differ from one input row to the next. In particular, when a*b is evaluated, the range of the result can greatly exceed the range of the inputs. Nevertheless, Postgres has a constant (NUMERIC_MAX_PRECISION) that defines the maximum number of digits a numeric value can have. The aggregate Spiff uses this value to allocate a slab in the Spiff data section and then reuses that slab while computing the aggregate function over all input rows, which eliminates the per-row memory allocation.
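The slab idea can be sketched in C with a toy decimal accumulator. This is not Postgres's numeric implementation: Acc, acc_create, acc_add, acc_to_long, and the bound MAX_DIGITS (a small assumed stand-in for a maximum-precision constant) are hypothetical. The buffer is allocated once, sized by the maximum digit count, and every row is added into it in place, so no per-row allocation or free occurs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed stand-in for a maximum-precision bound; the value is arbitrary. */
enum { MAX_DIGITS = 64 };

/* Running sum as little-endian base-10 digits in a fixed-size slab. */
typedef struct {
    unsigned char d[MAX_DIGITS];
    int n;                       /* number of digits currently in use */
} Acc;

/* One allocation for the whole aggregate evaluation. */
Acc *acc_create(void) {
    Acc *a = calloc(1, sizeof *a);
    assert(a != NULL);
    return a;
}

/* Add a non-negative decimal digit string into the slab, in place. */
void acc_add(Acc *a, const char *dec) {
    int len = (int)strlen(dec);
    int carry = 0;
    int i = 0;
    for (; i < len || carry; i++) {
        assert(i < MAX_DIGITS);  /* result must fit in the slab */
        int v = a->d[i] + carry + (i < len ? dec[len - 1 - i] - '0' : 0);
        a->d[i] = (unsigned char)(v % 10);
        carry = v / 10;
    }
    if (i > a->n)
        a->n = i;
}

/* Convert back to long, for small sums only (used here for checking). */
long acc_to_long(const Acc *a) {
    long v = 0;
    for (int i = a->n - 1; i >= 0; i--)
        v = v * 10 + a->d[i];
    return v;
}
```

The caller creates the accumulator before the first row, calls acc_add once per row, and frees it after the last row.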

Evaluating an aggregate function takes two steps. Given the aggregate function SUM(a + b), for example, the first step evaluates the result of the expression a+b, and the second step accumulates the values of a+b over all input rows. In PostgreSQL, both a+b and the SUM() function are evaluated with the numeric_add() function, which takes two inputs. For a+b, the two inputs are a and b. For SUM(x), the second input is x, which essentially comes from the scanned row, and the first input is the transition value, that is, the running total over the rows processed so far.

Evaluation of SUM()
numeric_add() adds its two inputs and copies the resulting value into the return variable res, which is allocated by make_result(). res is then returned to advance_transition_function() in nodeAgg.c, where the return value is copied into pergroupstate->transValue and then freed. The next time advance_transition_function() runs, to process the next row, transValue is copied into the first input of numeric_add() by the following snippet:
fcinfo->arg[0] = pergroupstate->transValue;
fcinfo->argnull[0] = pergroupstate->transValueIsNull;

This logic shows that transValue can in fact be shared across all rows without being freed. Accordingly, for the data section of the EvaluateNumericAdd Spiff, the necessary variables, namely agg_temp_values->result_value and agg_temp_values->result_arg, are allocated with AllocateAggTempValues() at the start of aggregate evaluation (these two variables represent the same value, but Postgres requires one to serve as the return value and the other as a temporary operand).

Evaluation of expressions
As noted above, the other use of numeric_add() is to compute arithmetic expressions such as a+b. In this case the variable that holds the result of the expression, which was previously allocated by make_result(), is reused. This variable is added to the Spiff data section as agg_temp_values->expr_result_arg.

Unlike the evaluation of SUM(), where the first input comes directly from agg_temp_values->result_value in the Spiff data section, both inputs in the evaluation of a+b are ordinary variables that must be obtained with the existing Postgres implementation. In fact, when a+b is evaluated, numeric_add() is called from ExecEvalOper() in execQual.c. For this reason, as with the predicate Spiff, a Spiff (EvaluateAggregateExpression) is created that specializes the ExecMakeFunctionResultNoSets() function. This Spiff in turn calls the expression-evaluation version of the EvaluateNumericAdd Spiff.

Besides +, an expression can contain other operators such as -, *, and /. The functions that evaluate these operators are specialized in the same way as numeric_add().

To summarize the EvaluateNumericAdd Spiff, the following invariants are considered:
1) The caller or execution path that invokes numeric_add(). This can be the evaluation of an expression in execQual.c or the evaluation of the SUM() function in nodeAgg.c.
2) In the evaluation of an expression, the memory location of the result value can be made invariant.
3) In the evaluation of SUM(), the memory location of the result value and the memory location of the first input can both be made invariant. These two variables can also share the same memory location.
4) Sharing a common memory segment across all rows is made possible by the constant that bounds the maximum precision of the numeric data type.

String matching Spiff
Suppose there is a C function match that matches a string x against another string, the pattern y, which may contain special characters such as wildcards. If the string y is known before the query runs (it can be regarded as a query constant), a specialized function can be created that matches arbitrary strings against this particular pattern.

The specialization approach is first to create the following query-specialized code (spec code):
code for constant strings of length 1 to 32;
code for the % character of the query string.
Various combinations of these spec codes can then be chained to build a specialized function that matches strings against y. For example, suppose the pattern is "%abc%defg%" and we want a specialized function that matches arbitrary strings against it. The following spec codes are chained:
a % spec code;
a 3-character spec code that matches "abc";
a % spec code (which can be the same as the first);
a 4-character spec code that matches "defg";
a terminating % spec code.
Each of these spec codes assumes that further characters of the string remain after its own match completes. When one spec code finishes matching, it passes the rest of the string to the next spec code in the chain, and matching continues.

The constant portions of the string are matched using a combination of long long, long, short, and char comparisons.

Given an arbitrary query string, it is easy to instantiate a sequence of query Spiff function pointers in which every stage except the last calls the next stage through a query Spiff invocation, using Spiff IDs stored as local variables.

The following (Example 23) shows, in pseudocode, one way this could be realized for the string "%abc%defg%".

Once these spec code routines have been created, a string is matched against the pattern by building the sequence of function calls as an array. The array looks like Example 24.

These functions are then called in sequence to match the string. A constant portion longer than 32 characters can be split into segments, so a constant string of length 65 requires three instantiated spec codes: 32 characters, 32 characters, and 1 character.
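The chaining described above can be sketched in C for the pattern "%abc%defg%". This is an illustrative reading, not the code of Examples 23 and 24: the Stage struct, the stage_* routines, and match_abc_defg are hypothetical names, and the literals are compared with strncmp rather than with the wider integer comparisons the text mentions. Each stage receives the rest of the chain and the rest of the string, and each stage except the last calls the next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct Stage Stage;
typedef bool (*StageFn)(const Stage *rest, const char *s);
struct Stage {
    StageFn fn;
};

/* '%' stage: try the remainder of the chain at every suffix of s. */
static bool stage_percent(const Stage *rest, const char *s) {
    for (;; s++) {
        if (rest->fn(rest + 1, s))
            return true;
        if (*s == '\0')
            return false;
    }
}

/* 3-character stage with the literal "abc" written into the code. */
static bool stage_abc(const Stage *rest, const char *s) {
    return strncmp(s, "abc", 3) == 0 && rest->fn(rest + 1, s + 3);
}

/* 4-character stage with the literal "defg" written into the code. */
static bool stage_defg(const Stage *rest, const char *s) {
    return strncmp(s, "defg", 4) == 0 && rest->fn(rest + 1, s + 4);
}

/* Terminating '%': whatever remains of the string is accepted. */
static bool stage_final_percent(const Stage *rest, const char *s) {
    (void)rest;
    (void)s;
    return true;
}

/* The array of stages that an instantiator would fill in at run time. */
static const Stage chain[] = {
    { stage_percent }, { stage_abc }, { stage_percent },
    { stage_defg },    { stage_final_percent },
};

bool match_abc_defg(const char *s) {
    assert(s != NULL);
    return chain[0].fn(&chain[1], s);
}
```

For a different pattern only the contents of the array change; the stage routines for a given literal length or for % can be reused.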

More generally, there are functions some of whose arguments are invariant. These invariants make some of the if statements deterministic, for example in recursive calls and inside loops. The sequence is unrolled into a chain of spec codes that call one another. This amounts to a general transformation that works on loops and on both recursive and non-recursive calls.

Because the actual sequence of spec codes to be called is not known until run time (when the actual pattern to match becomes available), the Spiff instantiator can fill in an array of function pointers that specifies the sequence of spec codes to call.

Per-query Spiff ranking
Per-query Spiff ranking uses meta-Spiffs to consider interpreted data structures and hot swapping, transforming existing spec code into something similar to what a compiler would emit. In some embodiments, a hot-swapping mechanism is used to convert switch/case blocks into specialized code that can be joined together at run time according to the relationships among the cases, specifically when, during execution, one case is followed by another particular case. With hot swapping, branch-based dispatcher calls are replaced by jumps to the target branch. The same applies to general dispatchers and to the interpretive execution model: instead of interpreting the query plan and calling the corresponding plan-node-specific function, given the plan tree, every dispatcher call can be replaced by a direct jump to the child plan node.
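Portable C cannot patch a jump target into running code, so the sketch below approximates the idea with a function pointer that is bound once per plan node. It is an illustration under stated assumptions: Node, NodeTag, dispatch_next, bind_plan, and the two node kinds are hypothetical, and a bound function pointer is a swapped-in substitute for the direct jump the text describes. The interpreted version re-tests the node tag on every pull; the bound version calls the child's routine without going through the dispatcher.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_SCAN, NODE_LIMIT } NodeTag;

typedef struct Node Node;
struct Node {
    NodeTag tag;
    Node *child;           /* NULL for a scan node */
    int produced;          /* rows produced so far */
    int limit;             /* rows available (scan) or row cap (limit) */
    int (*next)(Node *);   /* bound once by bind_plan() */
};

/* Interpreted model: every pull goes through a tag-testing dispatcher.
 * Returns the next row number, or -1 when the node is exhausted. */
int dispatch_next(Node *n) {
    switch (n->tag) {
    case NODE_SCAN:
        return n->produced < n->limit ? n->produced++ : -1;
    case NODE_LIMIT: {
        if (n->produced >= n->limit)
            return -1;
        int v = dispatch_next(n->child);
        if (v >= 0)
            n->produced++;
        return v;
    }
    }
    return -1;
}

static int scan_next(Node *n) {
    return n->produced < n->limit ? n->produced++ : -1;
}

static int limit_next(Node *n) {
    if (n->produced >= n->limit)
        return -1;
    int v = n->child->next(n->child);  /* no dispatcher in between */
    if (v >= 0)
        n->produced++;
    return v;
}

/* Walk the plan once and bind each node to its own routine. */
void bind_plan(Node *n) {
    if (n == NULL)
        return;
    n->next = (n->tag == NODE_SCAN) ? scan_next : limit_next;
    bind_plan(n->child);
}
```

After bind_plan() has run, pulling rows through node->next gives the same rows as dispatch_next() would, without the per-call switch.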

Storage of specialized code
When specialization is invoked, specialized code (spec code) may be generated and stored at various locations along the domain specialization process. For example, spec code may contain invariants from both the oil field data 220 and the oil field simulator 230. Spec code may contain invariants from both the configuration parameters 210 and the oil field simulator 230. In some embodiments, spec code is stored in the Linux (registered trademark) operating system 230 and may contain invariants from the simulator and the oil field data. In some embodiments, spec code may be stored in an external router or an external cloud service.

Spec code stored in the simulator may be derived from the oil field data and the simulator, and may be identified by basic domain specialization. Other Spiffs are stored with the operating system, router, cloud, and specialized code included in the given application. In some embodiments, spec code may flow from where it is stored to where it can be invoked (the application that supplied the specialization candidates for later specialization). For example, the oil field data may store spec code that is to be invoked at an external router. In some embodiments, a spec code identifier can reside with the data or the application and can be included in communication with a subsequent application, indicating the relevant spec code to invoke (later).

FIG. 3 is a diagram illustrating how domain specialization refines a computer science paradigm, according to an exemplary embodiment of the present disclosure. The figure includes four quadrants 310, 320, 330, and 340, covering the scenarios of data represented as data, code represented as data, data represented as code, and code represented as code.

In the early stages of computer architecture, from the Babbage machine through the 1930s, data was distinguished from code. Data was what was manipulated, and program code was the set of instructions for how to manipulate the data to effect a computation. This is shown in quadrant 310 of FIG. 3: data is represented in binary format in computer memory or storage (data stored as data), and source code is represented in some other way, e.g., with patch cords (code represented as code).

In the 1940s, John von Neumann proposed a revolutionary architecture that stores a program, in machine code, in the computer's memory as numbers, intermixing code and data. (Indeed, code can be manipulated as data and can even be changed while the program is running.) This architecture is shown in quadrant 320, where code (machine instructions) is represented as data.

In the 1960s, there were some early developments toward combining code, in the form of a Lisp function, with data (e.g., the value of one of its arguments). The result is a Lisp continuation: a Lisp function (code) paired with the value of that parameter, which is simply a function with one fewer argument. This is a particular way of storing or encapsulating data in code, as shown in quadrant 330.
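A rough modern analogue of pairing a function with the value of one of its arguments is partial application. The sketch below uses Python's `functools.partial` rather than Lisp, and the function `scale` is a hypothetical example chosen only for illustration.

```python
from functools import partial


def scale(factor: int, x: int) -> int:
    """A two-argument function (code)."""
    return factor * x


# Pair the function with a value for `factor` (data); the result is a
# function with one fewer argument.
double = partial(scale, 2)
answer = double(21)
```

The value `2` is now encapsulated in `double`, which is the sense in which data is stored in code.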

In the 1980s, the PostScript language was invented. PostScript is code that generates an image when executed. It is produced by a formatter that takes a document such as a Microsoft® Word file, which is data, and converts it into a program, which is again code as data, as shown in quadrant 320. A PostScript file generated from a Microsoft® Word file is not an image to be printed directly but a set of instructions for drawing each character of the document, so the program can be executed, for example on a PostScript printer or by a PostScript conversion program, to produce a bitmap image of the document.

Domain specialization takes this idea further. It takes invariant values, that is, data, and uses them to generate a specialized code version of a portion of an application such as a DBMS, which is executable code. Relation spec code is thus the result of specializing DBMS code using the schema of a relation (data). Tuple spec code is the result of specializing on data values within a tuple (a row of a table). O/S spec code is a specialization of a snippet of an operating system based on particular data values of particular invariants within that snippet; the same holds for router spec code.
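As a minimal sketch of what specializing on a relation's schema might look like, the example below contrasts a generic row decoder, which re-inspects the schema for every row, with a decoder generated once from the schema. The names `generic_decode` and `make_relation_spec_code`, the two column types, and the packed little-endian row layout are assumptions made for illustration; they are not the disclosed implementation.

```python
import struct
from typing import Callable, Dict, List, Tuple

Schema = List[Tuple[str, str]]  # (column name, type), type in {"int32", "float64"}


def generic_decode(schema: Schema, row: bytes) -> Dict[str, object]:
    """Generic path: branches on each column's type for every row."""
    out: Dict[str, object] = {}
    offset = 0
    for name, col_type in schema:
        if col_type == "int32":
            (out[name],) = struct.unpack_from("<i", row, offset)
            offset += 4
        elif col_type == "float64":
            (out[name],) = struct.unpack_from("<d", row, offset)
            offset += 8
        else:
            raise ValueError(f"unsupported type: {col_type}")
    return out


def make_relation_spec_code(schema: Schema) -> Callable[[bytes], Dict[str, object]]:
    """Specialize on the schema: type branches and offsets are resolved once."""
    codes = {"int32": "i", "float64": "d"}
    layout = struct.Struct("<" + "".join(codes[t] for _, t in schema))
    names = [name for name, _ in schema]

    def decode(row: bytes) -> Dict[str, object]:
        return dict(zip(names, layout.unpack(row)))

    return decode


schema: Schema = [("id", "int32"), ("price", "float64")]
row = struct.pack("<id", 7, 2.5)
decode_orders = make_relation_spec_code(schema)  # built once per relation
```

Because the schema is invariant for the lifetime of the relation, the per-row work in `decode_orders` no longer includes the type dispatch that `generic_decode` repeats on every call.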

This representation of data as code (shown in quadrant 330) can be generated in one application from a snippet of that application or of another application, can be passed between applications, and can be invoked by the target application when appropriate. Domain specialization technology provides a means of identifying when such spec code is effective in increasing performance, when it should be generated, which invariant values it should be specialized on, how it can be communicated between applications, and when it should be invoked.

It is suggested that, for any coherent region within a data file, one can determine the invariant values within that region, follow those values into areas of the application code, construct spec code from those areas, and then associate the spec code back with its region. This viewpoint therefore focuses on the original data, rather than starting with the code and specializing it.

It should be emphasized that the above-described embodiments of the present disclosure, particularly any "preferred" embodiments, are merely possible examples of implementations, set forth only for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiments without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.

Claims

1. A computer-implemented method for improving the performance of computer program code, the method comprising:
identifying invariant intervals of variables in the computer program code based on a program representation (PR), i.e., an implementation such as an abstract syntax tree of the computer program code;
estimating a program interaction within the computer program based on the PR and an ecosystem specification of the computer program;
estimating a domain assertion based on the PR, the identified invariant intervals of variables in the computer program code, and the estimated program interaction;
identifying one or more candidate snippets based on the invariant intervals of variables in the computer program code, the PR, one or more execution summaries associated with the computer program, the estimated program interaction, and the estimated domain assertion;
generating specialized computer program code based on the one or more candidate snippets;
modifying the computer program code based on the generated specialized computer program code; and
hiding the specialized computer program code.

2. The computer-implemented method of claim 1, wherein at least one of the following applies:
(a) the identified invariant interval spans multiple runs;
(b) the identified invariant interval includes at least one set of invariant intervals of a particular variable, and all invariant intervals in the set share the same start node.

3. The computer-implemented method of claim 1, wherein each of the one or more candidate snippets comprises (a) a code interval identified by the PR or (b) a set of invariant values and a set of possible values for each variable.

4. The computer-implemented method according to claim 1, wherein each of the one or more candidate snippets includes an appropriate lifetime of the candidate snippet and, preferably, a recommended optimization to be employed within that lifetime.

5. The computer-implemented method according to any one of claims 1 to 4, wherein the step of generating specialized computer program code comprises: (a) inserting code at suitable locations in the computer program to create the specialized computer program code, invoke the specialized computer program code, and destroy the specialized computer program code; or (b) creating a specialized function that matches arbitrary strings against a given string pattern; using a meta-specializer to take interpreted data structures and hot swapping into account and convert existing specialized computer program code; invoking a query; reducing the size of the computer program code by removing branches of the computer program code; slab-allocating the data portion of an in-domain specializer (Spiff) using numerical values; reusing the Spiff data portion by computing the corresponding aggregate function across all input rows in the computer program code, thereby removing per-row memory allocation, wherein the numerical value is bounded by the maximum number of digits that can be supported per row; or using invariant values in a disk or memory page on which the computer program is stored to reconstruct the data layout once the page is read and to optimize data locality.

6. The computer-implemented method according to any one of claims 1 to 5, wherein the specialized computer program code is created at run time and invoked later, the method optionally further comprising determining whether a violation of the identified invariant interval occurs in a given run.

7. A system configured to improve the performance of a computer program, the system comprising:
an invariant searcher that identifies invariant intervals of variables in the computer program based on a program representation (PR) of the computer program;
an interaction estimator that estimates program interactions within the computer program based on the PR and ecosystem specifications of the computer program;
a domain assertion estimator that estimates domain assertions based on the PR, the identified invariant intervals of variables in the computer program code, and the estimated program interactions;
a snippet searcher that identifies one or more candidate snippets based on the invariant intervals of variables in the computer program code, the PR, one or more execution summaries associated with the computer program, the estimated program interactions, and the estimated domain assertions;
an in-domain specializer (Spiff) composer that generates specialized computer program code based on the one or more candidate snippets; and
a specialized source that receives the generated specialized computer program code and modifies the computer program code based on the generated specialized computer program code.

8. The system according to claim 7, wherein the invariant searcher performs a static analysis on the PR to identify invariant intervals.

9. A system according to claim 7 or 8, wherein the identified invariant interval comprises at least a set of invariant intervals for a particular variable, and all invariant intervals in the set share the same starting node.

10. A system according to any of claims 7 to 9, further comprising an invariant inspector that determines whether a violation of the identified invariant interval occurs in a given execution.

11. A system according to any of claims 7 to 10, wherein each of the one or more candidate snippets includes a set of invariant values and a set of possible values for each variable.

12. A system according to any of claims 7 to 11, wherein each of the one or more candidate snippets includes an appropriate lifetime of the candidate snippet and, preferably, a recommended optimization to be employed within that lifetime.

13. A system according to any of claims 7 to 12, wherein the Spiff composer is further configured to: insert code at suitable locations in the computer program to create the specialized computer program code, invoke the specialized computer program code, and destroy the specialized computer program code; create a specialized function that matches arbitrary strings against a string pattern; use a meta-specializer to take interpreted data structures and hot swapping into account and convert existing specialized computer program code; combine query invariant values and raw invariant values that are valid in a query by schema; reduce the size of the computer program code by removing branches of the computer program code; slab-allocate the Spiff data portion using numerical values; reuse the Spiff data portion by computing the corresponding aggregate function over all input rows in the computer program code, thereby removing per-row memory allocation, wherein the numerical value is bounded by the maximum number of digits that can be supported per row; or use invariant values in a disk or memory page on which the computer program is stored to reconstruct the data layout once the page is read and to optimize data locality.

14. A non-transitory computer-readable medium containing computer-executable instructions that, when executed by a processor of computer equipment, cause the computer equipment to perform operations comprising:
determining, by static analysis of a program representation (PR) of a computer program, variables in the computer program whose values are invariant within identified invariant intervals;
creating specialized computer program code at suitable locations in the computer program, invoking the specialized computer program code, and destroying the specialized computer program code, wherein the specialized computer program code is created based at least on the determined variables; and
when the specialized computer program code is invoked, modifying at least a portion of the computer program with the specialized computer program code.

15. The non-transitory computer-readable medium of claim 14, wherein generating the specialized computer program code is further based on at least one of domain-specific knowledge and preliminary source knowledge associated with the computer program.