JP2023071664A

JP2023071664A - Format-specific data processing operation

Info

Publication number: JP2023071664A
Application number: JP2023015892A
Authority: JP
Inventors: Marshall A. Isman; John Joyce
Original assignee: Ab Initio Technology LLC
Current assignee: Ab Initio Technology LLC
Priority date: 2016-06-03
Filing date: 2023-02-06
Publication date: 2023-05-23
Also published as: CA3026334C; EP3465422A1; CN109564507B; AU2021201363A1; US20210182038A1; DE112017002779T5; JP2019521430A; EP3465422B1; AU2017274407A1; US10936289B2; US11347484B2; AU2021201363B2; US20170351494A1; CN109564507A; WO2017209969A1; CA3026334A1; SG11201810383SA; AU2017274407B2

Abstract

PROBLEM TO BE SOLVED: To improve format-specific data processing operations.

SOLUTION: A method includes analyzing, by a processor, a first version of a computer program, the analyzing including identifying a first process included in the first version of the computer program, the first process configured to perform a first operation on data having a first format, and generating, by the processor, a second version of at least part of the computer program, including omitting the first process, and including in the second version of the at least part of the computer program one or more second processes configured to perform a second operation on data of a second format different from the first format, in which the second operation is based on the first operation.

SELECTED DRAWING: Figure 1

Description

Priority Claim
This application claims priority to U.S. Provisional Application No. 62/345,217, filed June 3, 2016, and to U.S. Patent Application No. 15/433,467, filed February 15, 2017, the entire contents of both of which are incorporated herein by reference.

Background
Complex computations can often be expressed as a data flow through a directed graph (called a "dataflow graph"), with the components of the computation associated with the vertices of the graph and data flowing between the components along the links (arcs, edges) of the graph. The components can include data processing components, which receive data at one or more input ports, process the data, and provide data from one or more output ports, and dataset components, which act as a source or a receiver of the data flows.

Summary
In one aspect, a method includes analyzing, by a processor, a first version of a computer program, the analyzing including identifying a first process included in the first version of the computer program, the first process configured to perform a first operation on data having a first format; and generating, by the processor, a second version of at least a portion of the computer program, including omitting the first process and including, in the second version of the at least a portion of the computer program, one or more second processes configured to perform a second operation on data having a second format different from the first format, in which the second operation is based on the first operation.

Embodiments can include one or more of the following features.

Identifying the first process includes identifying a first process whose first operation depends on the format of the data.

Identifying the first process includes identifying a first process that is not capable of performing the first operation on data having the second format.

The method includes determining the format of the data to be processed by the first process. Identifying the first process includes identifying a first process that is not capable of performing the first operation on data having the format of the data to be processed by the first process.

Identifying the first process includes identifying a first data processing element of the computer program, the first data processing element configured to execute the first process. Including the one or more second processes in the second version of the at least a portion of the computer program includes including one or more second data processing elements in the second version of the at least a portion of the computer program, the second data processing elements configured to execute the one or more second processes.

The first format includes a data type.

The first format includes a size of a data element.

The first process is configured to perform the first operation on data records of a first record format, and the one or more second processes are configured to perform the second operation on data records of a second record format. The first record format includes the names of the fields in each record.

The method includes presenting, in a user interface, an identifier of a first set of one or more operations.

Generating the second version of the at least a portion of the computer program includes generating a copy of the portion of the computer program.

The method includes modifying the copy of the portion of the computer program to omit the first process and to include the one or more second processes.

The method includes executing the second version of the computer program.

The one or more second processes are defined by an overlay specification. Generating the second version of the computer program includes generating the second version based on the first version of the computer program and the overlay specification. The overlay specification identifies one or more of a process upstream of the first process and a process downstream of the first process. The method includes identifying the first process based on an analysis of executable code that defines the first process.

The computer program includes a graph. The first process is an executable process represented by a first component of the graph, and the one or more second processes are executable processes represented by one or more second components of the graph. The one or more second components are configured to receive data records from an upstream component of the graph. The one or more second components are configured to provide data records to a downstream component of the graph.

In one aspect, a system includes means for analyzing, by a processor, a first version of a computer program, the analyzing including identifying a first process included in the first version of the computer program, the first process configured to perform a first operation on data having a first format; and means for generating, by the processor, a second version of at least a portion of the computer program, including omitting the first process and including, in the second version of the at least a portion of the computer program, one or more second processes configured to perform a second operation on data having a second format different from the first format, in which the second operation is based on the first operation.

In one aspect, a system includes a processor coupled to a memory, the processor and memory configured to analyze a first version of a computer program, the analyzing including identifying a first process included in the first version of the computer program, the first process configured to perform a first operation on data having a first format; and to generate a second version of at least a portion of the computer program, including omitting the first process and including, in the second version of the at least a portion of the computer program, one or more second processes configured to perform a second operation on data having a second format different from the first format, in which the second operation is based on the first operation.

In one aspect, a non-transitory computer-readable medium stores instructions for causing a computing system to analyze a first version of a computer program, the analyzing including identifying a first process included in the first version of the computer program, the first process configured to perform a first operation on data having a first format; and to generate a second version of at least a portion of the computer program, including omitting the first process and including, in the second version of the at least a portion of the computer program, one or more second processes configured to perform a second operation on data having a second format different from the first format, in which the second operation is based on the first operation.

Other features and advantages will be apparent from the following description and from the claims.

Description of Drawings
In order, the drawings are: three examples of graphs; an example of an overlay specification; a block diagram; a flow chart; three block diagrams; and a flow chart.

Description
An executable application, such as a graph, can include one or more processes that are specific to one or more particular formats of the data records processed by the executable application. Such processes can operate only on data of those particular formats; attempting to run them on data of a different format can lead to errors or incorrect processing. We describe here an approach to identifying the processes of an executable application that are specific to one or more particular formats of data. To enable the executable application to operate on data of a different format, a second version of the executable application can be generated in which these format-specific processes are omitted and one or more other processes are included. These other processes, sometimes referred to as replacement processes, are based on the operations performed by the omitted format-specific processes, but either are specific to one or more different formats of data or can operate on data of any format. Including the replacement processes in the second version of the executable application enables it to process data of a format different from the format for which the original version was configured.

A replacement process can be defined in an overlay specification, which is a file separate from the original executable application (sometimes referred to as the first application). A replacement process defined in the overlay specification can be added to a second version of the application (sometimes referred to as the second application) without becoming part of the original application. For instance, when the application is compiled, the compiler takes the overlay file into account and generates a second application in which one or more processes are omitted and one or more corresponding replacement processes are included. By omitted, we mean that a process included in the first application is not included in the second application. The corresponding replacement component is inserted into the second application, e.g., at the location where the omitted process was located in the first application.
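As a rough illustration of how an overlay specification might drive the generation of a second application, the sketch below represents a graph as a dictionary of components and flows and applies a list of replacements to it. The dictionary layout, the `apply_overlay` function, and the keys `omit`, `name`, and `process` are all hypothetical; the source does not prescribe any particular representation.

```python
def apply_overlay(first_version, overlay):
    """Build a second version of the graph; the first version is left untouched."""
    components = dict(first_version["components"])
    flows = list(first_version["flows"])
    for entry in overlay["replacements"]:
        omitted, name = entry["omit"], entry["name"]
        # Omit the format-specific process and include its replacement.
        del components[omitted]
        components[name] = entry["process"]
        # Rewire every flow that touched the omitted component so the
        # replacement sits at the same location in the graph.
        flows = [(name if src == omitted else src,
                  name if dst == omitted else dst) for src, dst in flows]
    return {"components": components, "flows": flows}


# Hypothetical first application and overlay specification.
first = {
    "components": {"filter": "filter_proc", "sort": "sort_int_proc",
                   "replicate": "replicate_proc"},
    "flows": [("filter", "sort"), ("sort", "replicate")],
}
overlay = {"replacements": [
    {"omit": "sort", "name": "sort_text", "process": "sort_text_proc"},
]}
second = apply_overlay(first, overlay)
```

Because the function copies the component table and flow list before changing them, the first application stays as it was, which mirrors the idea that the replacement never becomes part of the original.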

A replacement process is one example of an insertion. Other examples of insertions include test sources and probes, which can also be defined by an overlay specification. A test source is a replacement data source that can provide data, such as test data, for processing by the executable application. A probe is an alternative destination to which the executable application writes data. Insertions can be useful, e.g., for testing or debugging an executable application such as a graph. For instance, a tester or developer may want to run tests using a particular set of input data in order to observe the effect of a change to the application. By executing the application with a consistent set of input data both before and after the change, the effect of the change on the data output by the application can be monitored. In some examples, the tester may have a specific set of test data to be used when testing the application, such as a set of test data that will exercise all of the functions of the application at least once. Similarly, the tester may want the application to write its output data to a particular destination that differs from the standard destination to which the application writes its output.

In some examples, insertions can be defined automatically based on an automated analysis of the application. For instance, replacement components can be defined automatically based on automatic identification of format-specific processes in the application. Test sources and probes can be defined automatically based on automatic identification of the data sources and output data receivers of the application.

In some examples, the executable application is a graph-based process. A graph-based process includes one or more components, each representing an executable process, connected by flows that indicate the flow of data from one component to another. A replacement process is an object associated with a component in the graph-based process. A replacement process (sometimes referred to as a replacement component) can replace an existing component in the graph so that data that would have been processed by the existing component is processed by the replacement component instead. Test source and probe insertions are objects associated with flows in the graph-based process. A test source can replace the data passing through a flow (e.g., upstream data) with new data, so that upstream computations do not have to be rerun on each execution of the graph. For instance, a test source can replace a data source so that test data is provided to the graph from the test source rather than from the data source. A probe can monitor the data passing through a flow as the graph executes, and can save that data for later inspection or reuse. For instance, a probe can receive data that would otherwise have been stored in an output data receiver, such as a database.
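The effect of attaching a test source to a flow can be sketched as follows. This is a minimal illustration under assumed names (`execute_flow`, `production_source`, `count_records` are all hypothetical): when a test source is attached, the upstream computation is never invoked and the downstream component receives the test data instead.

```python
def execute_flow(upstream, downstream, test_source=None):
    """Feed `downstream` from the test source if one is attached to the flow,
    otherwise from the upstream component."""
    records = test_source() if test_source is not None else upstream()
    return downstream(records)


def production_source():
    # Stand-in for an expensive or unavailable upstream computation.
    raise RuntimeError("production data source is not available during testing")


def count_records(records):
    # Stand-in for the downstream component.
    return len(records)


# With the test source attached, the upstream source is never called.
test_data = [{"name": "Ann"}, {"name": "Bob"}]
result = execute_flow(production_source, count_records,
                      test_source=lambda: list(test_data))
```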

Insertions defined in an overlay specification can be added to the application during execution without becoming part of the original application. When the application is compiled, the compiler takes the overlay file into account and generates an executable application that includes the insertions. We sometimes refer to the original application as the first version of the application and to the application that includes the insertions as the second version of the application. For instance, in the example of a graph-based process, the executable graph can be visualized as a second version of the graph that includes the components of the first version of the graph combined with the insertion objects defined in the overlay specification. In some examples, the executable graph is a shell script and is not stored in a file. In some examples, the executable graph and the graph are stored in separate files.

Incorporating insertions into the second version of the graph does not modify the first version of the graph. Instead, the insertion definitions remain in a separate file (e.g., a separate overlay specification) and can be converted into ordinary graph components for inclusion in the modified graph at the start of code generation. There is therefore no risk of inadvertently corrupting the original graph.

FIG. 1 shows an example of a graph 100. Graph 100 is a visual representation of a computer program that includes data processing components connected by flows. A flow connecting two components indicates that records output by the first component are passed to the second component. When a first component is connected to a second component by a flow, the first component references the second component.

A data source 102, such as a database (as shown), a file, a queue, an executable statement (e.g., a SQL statement), or another type of data source external to graph 100, includes one or more data records to be processed by graph 100. By external, we mean that the data of data source 102 is not stored in graph 100. Data source 102 is connected to a filter component 103 by a flow. In general, a filter component filters out or removes records that do not meet predetermined criteria. In this example, filter component 103 passes the data records of customers who live in Ohio and rejects all other records. Filter component 103 is connected to a sort component 104, which sorts the filtered data records by zip code. Sort component 104 is connected to a replication component 106, which makes a copy of the data records so that they can be processed in two different ways. The replication component is connected to a format change component 108 and a filter by expression component 110. For instance, one instance of the data records of Ohio customers, sorted by zip code, is sent to format change component 108, and another instance is sent to filter by expression component 110. Format change component 108 changes the format of the data records to a different data format, and filter by expression component 110 removes data records based on an expression associated with each data record. Format change component 108 and filter by expression component 110 are connected to an aggregation component 112, which combines the data records it receives. The aggregation component is connected to an output data receiver component 114 that is external to the graph, such as a database (as shown), a file, a queue, or a downstream processing component. By external, we mean that the data of output data receiver 114 is not stored in graph 100. Graph 100 includes many flows between components; the flow 116 between data source 102 and filter component 103 (sometimes called source-filter flow 116) and the flow 118 between aggregation component 112 and output data receiver 114 (sometimes called aggregation-output flow 118) are of particular interest in this example.
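The path that records take through graph 100 can be approximated with ordinary Python functions, one per component. This is only a sketch: the record fields (`name`, `state`, `zip`, `balance`), the particular format change, and the expression `balance > 100` are assumptions made for illustration and do not come from the source.

```python
def filter_ohio(records):
    # Filter component 103: pass Ohio customers, reject all other records.
    return [r for r in records if r["state"] == "OH"]

def sort_by_zip(records):
    # Sort component 104: order the filtered records by zip code.
    return sorted(records, key=lambda r: r["zip"])

def replicate(records):
    # Replication component 106: two independent copies of the records.
    return [dict(r) for r in records], [dict(r) for r in records]

def change_format(records):
    # Format change component 108 (assumed change: keep only name and zip).
    return [{"name": r["name"], "zip": r["zip"]} for r in records]

def filter_by_expression(records):
    # Filter by expression component 110 (assumed expression: balance > 100).
    return [r for r in records if r["balance"] > 100]

def aggregate(first, second):
    # Aggregation component 112: combine the records from both branches.
    return first + second

def run_graph(source):
    left, right = replicate(sort_by_zip(filter_ohio(source)))
    return aggregate(change_format(left), filter_by_expression(right))

source = [
    {"name": "Ann", "state": "OH", "zip": 45202, "balance": 250},
    {"name": "Bob", "state": "NY", "zip": 10001, "balance": 900},
    {"name": "Cy", "state": "OH", "zip": 43004, "balance": 50},
]
output = run_graph(source)
```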

One or more of the components of a graph can be format-specific components. A format-specific component is a component that can process only data of one or more particular formats. A data format is a characteristic of an individual data item (e.g., a characteristic of a value in a field of a record) or a characteristic of a record (sometimes referred to as a record format). Examples of characteristics of an individual data item include the size of the data item in bytes (e.g., a single-byte ASCII data item or a multi-byte data item), the type of the data item (e.g., a string type, an integer type, a Boolean type, or another data type), or another characteristic of the individual data item. Examples of record formats include the names of the fields in a record, the positions of the fields in a record, the number of fields in a record, hierarchical records, arrays or repeating groups of fields, nested arrays, subrecords, or another characteristic of a record.
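To make these characteristics concrete, a record format could be modeled as an ordered list of field descriptions, each carrying a name, a type, and a size. The `FieldFormat` class and the `conforms` check below are hypothetical, and the size is assumed to mean the maximum length in bytes of the value's UTF-8 text form.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldFormat:
    name: str   # name of the field in the record
    type: type  # data type of the field's value
    size: int   # assumed: maximum size in bytes of the value as UTF-8 text

def conforms(record, record_format):
    """Check field names, positions, count, types, and sizes against the format."""
    # Names, order, and number of fields must match the record format.
    if list(record.keys()) != [f.name for f in record_format]:
        return False
    for f in record_format:
        value = record[f.name]
        if not isinstance(value, f.type):
            return False
        if len(str(value).encode("utf-8")) > f.size:
            return False
    return True

customer_format = [FieldFormat("name", str, 20), FieldFormat("zip", int, 5)]
ok = conforms({"name": "Ann", "zip": 43004}, customer_format)
wrong_type = conforms({"name": "Ann", "zip": "K1A 0B1"}, customer_format)
wrong_order = conforms({"zip": 43004, "name": "Ann"}, customer_format)
```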

When a graph includes components that are specific to a particular data format, the graph may be able to process only data of that format. If the graph is used to process data having a different format, errors may occur or the data may be processed incorrectly. To enable the graph to process data of a different format, one or more of the format-specific components can be replaced with components that can process data of the different format. A replacement component can be a format-specific component that is specific to a different format, or it can be a component that can process data of any format (sometimes referred to as a format-independent component).
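One simple way to picture the identification step is to let each component declare which formats it accepts and then list the components that cannot handle a new format. The mapping and the `find_format_specific` helper are hypothetical; the source also contemplates identification by analyzing the executable code of a process, which this sketch does not attempt.

```python
def find_format_specific(components, new_format):
    """Return the names of components that cannot process `new_format`.

    `components` maps a component name to the set of formats it accepts,
    or to None for a format-independent component.
    """
    return [name for name, accepted in components.items()
            if accepted is not None and new_format not in accepted]

# Hypothetical declarations for part of a graph.
components = {
    "filter": None,          # format-independent
    "sort": {"integer"},     # only sorts integer zip codes
    "replicate": None,       # format-independent
}
to_replace = find_format_specific(components, "string")
```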

For instance, in the example of FIG. 1, sort component 104 sorts records by the value in the zip code field. Sort component 104 in this example is a format-specific component that can process only integers. An operator of graph 100 may want to use graph 100 to process a new set of data in which the zip code field may contain alphanumeric strings.

Referring to FIG. 2, a second version 200 of graph 100 is generated in which sort component 104 is omitted and a replacement sort component 204 is included. Replacement sort component 204 is placed at the same location in the second version 200 of the graph as sort component 104 occupied, and can sort alphanumeric strings. The other components of the graph are unchanged. The second version 200 of the graph can thus process the new set of data.
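The difference between an integer-only sort and a replacement sort that accepts alphanumeric values can be shown in a few lines. Both functions and the sample zip codes are illustrative assumptions; the source does not say how either sort component is implemented.

```python
def sort_by_zip_int(records):
    # Format-specific: assumes every zip code can be read as an integer.
    return sorted(records, key=lambda r: int(r["zip"]))

def sort_by_zip_text(records):
    # Replacement: compares zip codes as strings, so letters are accepted.
    return sorted(records, key=lambda r: str(r["zip"]))

new_data = [{"zip": "43004"}, {"zip": "K1A 0B1"}, {"zip": "10001"}]
sorted_zips = [r["zip"] for r in sort_by_zip_text(new_data)]
```

Calling `sort_by_zip_int(new_data)` raises a `ValueError`, because `"K1A 0B1"` cannot be converted to an integer. Note that the string comparison orders values lexicographically, which matches numeric order only when the numeric zip codes have the same number of digits.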

Some components may be able to receive data of any format and operate on it, but output data of a particular format. If output data of a different format is desired (e.g., to be provided as input to another application that requires a particular format), the graph may be unable to provide it. To enable the graph to output data of the desired format, one or more of the format-specific components can be replaced with components that can output data of the desired format.

Referring again to FIG. 1, format change component 108 is a format-specific component that outputs data of a particular format. For instance, format change component 108 may output data records having four fields: Name, Account_num, Balance, and Trans_date. An operator of graph 100 may want graph 100 to produce output data of a different format, e.g., so that the output data can be processed by another application that has specific requirements for the record format of its input data. In this example, the desired format of the output data includes four fields: Cust_name, Balance, Account_num, and Trans_date. That is, the first field of the output data needs to be renamed, and the second and third fields need to be swapped.
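The required change, renaming the first field and swapping the second and third, can be written as a small field mapping. The field names come from the example above; the `FIELD_MAP` table, the `replacement_format_change` function, and the sample values are assumptions for illustration.

```python
# (output field name, input field name), listed in the desired output order.
FIELD_MAP = [
    ("Cust_name", "Name"),
    ("Balance", "Balance"),
    ("Account_num", "Account_num"),
    ("Trans_date", "Trans_date"),
]

def replacement_format_change(record):
    # Build the output record field by field; dicts keep insertion order,
    # so the output fields appear in the order given by FIELD_MAP.
    return {new: record[old] for new, old in FIELD_MAP}

original = {"Name": "Ann", "Account_num": "A-17", "Balance": 250,
            "Trans_date": "2016-06-03"}
converted = replacement_format_change(original)
```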

Referring to FIG. 3, a second version 300 of graph 100 is generated in which format change component 108 is omitted and a replacement format change component 308 is included. Replacement format change component 308 is placed at the same location in the second version 300 of the graph as format change component 108 occupied, and can generate output data of the desired format. The other components of the graph are unchanged.

In some examples, other components of the graph can also be omitted, such as one or more components upstream or downstream of the format-specific component. In some cases, replacement components can be included in place of one or more of these other omitted components.

In some examples, a tester of graph 100 may want to debug graph 100 to verify its functionality. In some cases, the tester may want to verify data as it flows from one component to another. In some cases, the tester may want to bypass upstream components in graph 100 and instead insert data at the location of the bypassed components. In some cases, the tester may want to test the operation of graph 100 using a consistent set of input data in order to monitor the effect that a change to the graph has on the data the graph outputs. In some cases, the tester may want to test the operation of graph 100 using a set of input data that the tester knows will cause all of the functions of the graph to be executed at least once, thus enabling complete testing of the graph.

When debugging the graph 100, it may be desirable to refrain from modifying the graph. For example, a tester may not want to risk breaking the functionality of the graph. In some examples, the tester may have limited or no access to the graph (e.g., the tester may lack the permissions needed to edit the graph). To debug the graph 100 without modifying it, an overlay can be used. In some examples, the overlay can be specified automatically, for example based on an automated analysis of the graph. A second version of at least a portion of the graph 100 can be generated based on the original graph 100 (sometimes referred to as the first version of the graph) and the overlay specification.

A probe collects or monitors data as the data passes through a flow between components of the graph 100, for example along a flow from a first component to a second component, or along a flow to an output data receiver. For example, as data passes through a flow while the graph 100 executes, the data can be monitored, saved for later examination, or saved for reuse. The overlay specification can define a probe that refers to the flow carrying the data to be collected or monitored. The probe specifies the flow whose data is to be collected or monitored. A probe can be configured to report particular values, or to report when a particular value falls within or outside a predetermined range. Data passing through the probe may be saved for later analysis or use; for example, the data can be stored in a flat file or a relational database.
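The range-reporting behavior of a probe can be illustrated with a minimal Python sketch. The class name `Probe`, its attributes, and the sample values are hypothetical and are not taken from the specification; the sketch only assumes that a flow can be modeled as an iterable of numeric values.

```python
class Probe:
    """Hypothetical probe: saves everything crossing a flow and flags out-of-range values."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.saved = []   # data kept for later examination or reuse
        self.alerts = []  # values that fell outside [low, high]

    def watch(self, flow):
        """Yield each value unchanged while recording it."""
        for value in flow:
            self.saved.append(value)
            if not (self.low <= value <= self.high):
                self.alerts.append(value)
            yield value


# Assumed sample data standing in for the output of an upstream component.
upstream_output = [5, 12, 99, 7]
probe = Probe(low=0, high=50)
downstream_input = list(probe.watch(upstream_output))
```

In this sketch the probe passes data through unchanged; a probe that stands in for a downstream component, as described for some examples, would simply not forward the values.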

In some examples, a probe can refer to a flow from a component of the graph 100 to an output data receiver, such as a file or a database. By placing the probe along the flow to the data receiver during debugging of the graph 100, the probe receives the data output from the graph 100. For example, each time the graph 100 is executed in a debugging mode, the output data can be received by the probe and written to a file, so that the output data from different graph executions can be compared or otherwise evaluated. In some examples, an output data receiver is identified automatically, and an overlay is specified automatically to define a probe for insertion in front of the identified output data receiver.

In some examples, a probe can refer to a flow from an upstream component to a downstream component of the graph 100. By placing the probe along the flow to the downstream component during debugging of the graph 100, the probe receives the data that the downstream component would otherwise have received, thus preventing the downstream component from executing. For example, a tester may wish to monitor the results of the graph processing that occurs before the downstream component. For instance, the downstream component may have a function that has an effect external to the graph; e.g., the downstream component may send a text message to each person whose credit card record is processed by the downstream component. During debugging of the graph, the tester may wish to disable components that have an effect external to the graph.

A test source inserts data into the graph 100 at a particular flow between two components of the graph 100. The overlay specification can define a test source that refers to the flow carrying the data that is to be replaced with data from the test source. In some examples, the test source replaces the data that would normally pass through the flow with new data. In some situations, the test source can be configured to read previously saved data and pass that data to the downstream component. In some examples, a test source inserts data into the graph 100 at a flow from a data source, such as a database or a file. The test source can insert data having the same format as the data that would otherwise have been provided by the data source. In some examples, a data source is identified automatically, and an overlay is specified automatically to define a test source to replace the identified data source.

In some examples, the results of executing the graph 100 up to a certain point (e.g., up to a certain component) may have been verified previously. That is, the upstream processing functions may have been verified up to that point. In such cases, it can be inefficient for the upstream components to reprocess their functions every time the graph 100 executes. A test source can insert data (e.g., the previously verified data) into the graph at that point. In this manner, entire sections of the graph 100 that were previously executed may be bypassed.

FIG. 4 shows an example of an overlay specification 200 that defines one or more insertions. An insertion can be an object that is associated with a flow of a graph (e.g., the graph 100) and can take the form of a probe, a test source, or a replacement component. In the example of FIG. 4, the overlay specification 200 includes one test source definition 201 and one probe definition 213. The overlay specification 200 can be stored in a file, such as a file that is separate from the file containing the specification for the graph 100.

The overlay specification 200 starts with a three-line header that specifies the graph to which the insertion definitions can correspond. The header is followed by the test source definition 201, the probe definition 213, and a replacement component definition (not shown).

The test source definition 201 includes a name 202, an upstream port 204, a downstream port 206, an insertion type 208, a prototype path 210, and a layout parameter 212.

The upstream port 204 of the test source definition 201 references the output port of the component that is immediately upstream of the flow at which the test source is to be inserted into the graph 100. The component upstream of a flow is the component from whose output port data is output onto the flow. In the example of FIG. 4, the upstream port 204 of the test source definition 201 points to the output of the database 102. The downstream port 206 of the test source definition 201 references the input port of the component that is immediately downstream of the flow at which the test source is to be inserted into the graph 100. The component downstream of a flow is the component at whose input port data is received from the flow. In the example of FIG. 4, the downstream port 206 of the test source definition points to the input of the filter component 103. The test source definition 201 in this example thus indicates that a test source is to be placed in the flow between the output of the database 102 and the input of the filter component 103, so that data provided by the test source can replace the input data from the database 102.
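As a rough illustration of how an upstream port and a downstream port pick out a single flow, the following Python sketch models a graph as a list of (upstream, downstream) pairs and builds a second version in which the selected flow is fed by a test source. The function name, the node labels, and the list-of-pairs representation are assumptions made for illustration only.

```python
def apply_test_source(flows, upstream_port, downstream_port, test_source_name):
    """Return a second version of the flow list; the original list is not modified."""
    second_version = []
    for src, dst in flows:
        if (src, dst) == (upstream_port, downstream_port):
            # The test source now feeds the downstream component on this flow.
            second_version.append((test_source_name, dst))
        else:
            second_version.append((src, dst))
    return second_version


# Hypothetical labels loosely modeled on database 102 and filter component 103.
first_version = [("database_102", "filter_103"), ("filter_103", "downstream")]
second_version = apply_test_source(
    first_version, "database_102", "filter_103", "test_source"
)
```

Building a new list rather than editing the original mirrors the idea that the first version of the graph stays untouched while the overlay is applied.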

The insertion type 208 defines whether the insertion is a test source, a probe, or a replacement component. In the example of FIG. 4, a value of "0" defines a test source, a value of "1" defines a probe, and a value of "2" defines a replacement component. Other values can also be used to define the type of the insertion. Because this insertion is a test source, the value of the insertion type 208 is "0".

The prototype path 210 indicates the type of the insertion. In this example, because the insertion is a test source, the prototype path 210 specifies an input file component. The prototype path 210 points to a file that contains code defining the particular type of insertion. The layout parameter 212 defines the location of a source file that contains the data the test source will contain. In some examples, the location is a file path. The data in the source file replaces the data that would normally pass through the flow defined by the upstream port 204 and the downstream port 206. That is, when the test source is applied to the graph 100, the filter component 103 receives the data in the source file rather than receiving data from the database 102.
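The fields of a test source definition can be pictured as a small record. The Python sketch below is one possible in-memory representation, not the file format of FIG. 4; the class names, the port strings, and both paths are hypothetical, while the numeric insertion type values follow the example described above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class InsertionType(IntEnum):
    """Values as in the FIG. 4 example; other encodings are possible."""
    TEST_SOURCE = 0
    PROBE = 1
    REPLACEMENT_COMPONENT = 2


@dataclass
class InsertionDefinition:
    name: str
    upstream_port: str
    downstream_port: str
    insertion_type: InsertionType
    prototype_path: str
    layout: Optional[str] = None  # location of the source file, if any


@dataclass
class OverlaySpecification:
    graph: str
    insertions: List[InsertionDefinition]


test_source = InsertionDefinition(
    name="test_source_1",                    # hypothetical name
    upstream_port="database_102.out",        # hypothetical port label
    downstream_port="filter_103.in",         # hypothetical port label
    insertion_type=InsertionType.TEST_SOURCE,
    prototype_path="prototypes/input_file",  # hypothetical path
    layout="data/test_input.dat",            # hypothetical path
)
overlay = OverlaySpecification(graph="graph_100", insertions=[test_source])
```

Keeping the overlay as a separate object from the graph reflects the point that the overlay specification can live in its own file.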

The source file contains data having the same format as the data that would otherwise be received by the component downstream of the test source. In some examples, the data in the source file may be the same as the data in the data source (e.g., a database) that is upstream of the test source. For example, data records from the database 102 can be copied into the source file. In some examples, the data source indicates an executable statement, such as an SQL query. In these examples, the SQL query can be executed and the results of the query execution can be stored in the source file. In some examples, the data in the source file can be obtained from somewhere other than the data source. For example, the data in the source file can be generated to ensure that certain data (e.g., a certain range of values) is processed, for complete debugging of the graph 100. In some examples, the data in the source file remains the same even if the data in the data source changes, thus allowing debugging to continue with a consistent set of input data.
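The idea of executing a query once and freezing its results in a source file can be sketched with Python's standard `sqlite3` and `csv` modules. The table, the query, and the use of an in-memory buffer in place of a file on disk are assumptions chosen to keep the example self-contained.

```python
import csv
import io
import sqlite3

# Assumed stand-in for the upstream data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 10.0), (2, 250.5)])


def snapshot(connection, query, out):
    """Run the query and write a header row plus all result rows as CSV."""
    cursor = connection.execute(query)
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


source_file = io.StringIO()  # stands in for the test source's source file
count = snapshot(conn, "SELECT id, balance FROM accounts ORDER BY id", source_file)

# Later changes to the data source do not affect the saved snapshot.
conn.execute("UPDATE accounts SET balance = 0")
```

Because the snapshot is taken once, repeated debugging runs see the same input even after the underlying table changes.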

In some examples, the data in the source file may be the same as the data that would pass through the flow during normal execution of the graph 100, but by inserting this data using a test source, upstream components can refrain from processing. For example, an upstream component, such as the replicate component 106, may require a large amount of system resources to process the data, or may take a relatively long time to process the data compared with other components in the dataflow graph 100. Known data (e.g., the same data that would pass through the flow during normal execution) can therefore be inserted into the flow to save time or conserve system resources.

The probe definition 213 includes a name 214, an upstream port 216, a downstream port 218, an insertion type 220, and a prototype path 222.

The upstream port 216 of the probe definition 213 references the output port of the component that is immediately upstream of the flow at which the probe is to be inserted into the graph 100. In the example of FIG. 4, the upstream port 216 of the probe definition 213 points to the output of the aggregation component 112. The downstream port 218 of the probe definition 213 references the input port of the component that is immediately downstream of the flow at which the probe is to be inserted into the graph 100. In the example of FIG. 4, the downstream port 218 of the probe definition 213 points to the output data receiver component 114. The probe definition 213 in this example thus indicates that a probe is to be placed in the flow between the output of the aggregation component 112 and the output data receiver component 114, so that the probe receives the data that would otherwise have been written to the output data receiver component.

The insertion type 220 of the probe definition 213 defines whether the insertion is a test source, a probe, or a replacement component. Because this insertion is a probe, the value of the insertion type 220 is "1".

The prototype path 222 indicates the type of the insertion. In this example, because the insertion is a probe, the prototype path 222 specifies an output file component. The prototype path 222 points to a file that contains code defining the particular type of insertion.

In some examples, the data to be monitored by the probe is stored in a file that is created automatically by the system. The file can be stored in a location determined by the system. The probe monitors the data that passes through the flow defined by the upstream port 216 and the downstream port 218. That is, when the probe is applied to the graph 100, the data that passes from the output of the aggregation component 112 to the input of the output data receiver component 114 is monitored and stored in the file created automatically by the system. In some examples, the data can be monitored before it is stored. The file can receive data in the same format that would have been received by the component referenced by the probe definition (in this example, the external data receiver component 114).

In some examples, the insertion of one or more probes or test sources can be defined by the overlay specification as a result of an automated analysis of the graph 100. For example, an automated analysis of the graph 100 can be performed to identify any data sources, such as databases, files, or other types of data sources. One or more of the identified data sources can be automatically replaced by a test source. Replacing a data source means that a test source is inserted into the flow immediately downstream of the data source, so that data from the test source, rather than data from the data source, is provided to the downstream component. Similarly, an automated analysis of the graph 100 can identify any output data receivers, such as databases, files, or other types of output data receivers. One or more of the identified output data receivers can be automatically replaced by a probe. Replacing an output data receiver means that a probe is inserted into the flow immediately upstream of the output data receiver, so that data from the upstream component is received by the probe rather than by the output data receiver. Automated analysis of the graph 100 can also be used to identify other components, such as a particular type of component (e.g., a particular type of component whose execution has an effect external to the graph 100).

Further description of the insertion of test sources and probes is provided in U.S. patent application Ser. No. 14/715,807, the entire contents of which are incorporated herein by reference.

A replacement component definition includes a name, an upstream port, a downstream port, an insertion type, a prototype path, and a layout parameter. The upstream port of the replacement component definition references the output port of the component that is immediately upstream of the position at which the replacement component is to be inserted into the graph 100. The downstream port of the replacement component definition references the input port of the component that is immediately downstream of the position at which the replacement component is to be inserted into the graph. Based on the upstream port and the downstream port, the existing component in the graph 100 that is to be replaced by the replacement component can be identified. The insertion type defines that the insertion is a replacement component.
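One way to read the statement that the existing component can be identified from the two ports is that it is the component sitting between them. The Python sketch below applies that reading to a graph modeled as a list of (upstream, downstream) pairs; the function name, the node labels, and the topology are hypothetical and do not reproduce FIG. 1.

```python
def find_component_to_replace(flows, upstream, downstream):
    """Return components that receive from `upstream` and send to `downstream`."""
    fed_by_upstream = {dst for src, dst in flows if src == upstream}
    feeding_downstream = {src for src, dst in flows if dst == downstream}
    return sorted(fed_by_upstream & feeding_downstream)


# Assumed topology for illustration only.
flows = [
    ("replicate", "format_change"),
    ("replicate", "sort"),
    ("format_change", "aggregate"),
]
candidates = find_component_to_replace(flows, "replicate", "aggregate")
```

A real specification would reference ports rather than whole components, which removes ambiguity when a component has several inputs or outputs.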

The prototype path indicates the type of the insertion. In this example, because the insertion is a replacement component, the prototype path points to a file that contains the code defining the replacement component. The code defining the replacement component is based on the code defining the existing component that is to be replaced, but can process data of the desired format.

In some examples, one or more replacement components for a graph can be defined by the overlay specification as a result of an automated analysis of the graph. For example, the specification of each component in the graph can be analyzed. The specification of a component includes or points to the code that defines the component, e.g., the code that defines the data processing operation represented by the component. Analyzing the code can reveal whether the data processing operation represented by the component depends on the format of the data.

A replacement component is defined for one or more of the identified format-specific components. In some examples, the format-specific components to be replaced are identified based on user input. For example, a user may use his or her knowledge of the format of the input data, of the process represented by each of the format-specific components, or both, to determine which of the components to replace. In some examples, an automated analysis of the format of the input data relative to the format of the data previously processed by the graph can be performed to identify which, if any, of the format-specific components to replace.

In some examples, in a computer program that is not graph-based, one or more format-specific processes within the computer program can be identified and replaced by one or more other processes, e.g., processes that can operate on data of a particular format or processes that can operate on data of any format.

Referring to FIG. 5, to insert test sources, probes, or both, an analysis engine 300 automatically analyzes the graph 100 to identify data sources 302 and output data receivers 304. For example, the analysis engine 300 can access the parameters and connections of each node of the graph 100 (the terms "node" and "component" are sometimes used interchangeably). If a given node has no incoming connections, the analysis engine 300 identifies the node as a data source. Similarly, if a given node has no outgoing connections, the analysis engine 300 identifies the node as an output data receiver. To access and analyze each node of the graph, the analysis engine "walks" along all of the connections of the graph (the terms "connection" and "flow" are sometimes used interchangeably). In some examples, the graph 100 is not instantiated or parameterized until runtime (e.g., when processing starts for debugging purposes). The analysis engine 300 can perform the automated analysis at runtime to identify the data sources and output data receivers in the graph 100.
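The rule described here (no incoming connections means data source, no outgoing connections means output data receiver) can be sketched in a few lines of Python. The representation of the graph as a node list plus (upstream, downstream) pairs and all node names are assumptions made for illustration.

```python
def find_sources_and_receivers(nodes, flows):
    """Classify nodes by whether they have incoming or outgoing connections."""
    has_input = {dst for _, dst in flows}
    has_output = {src for src, _ in flows}
    sources = [node for node in nodes if node not in has_input]
    receivers = [node for node in nodes if node not in has_output]
    return sources, receivers


# Hypothetical linear graph.
nodes = ["database", "filter", "reformat", "rollup", "output_file"]
flows = [
    ("database", "filter"),
    ("filter", "reformat"),
    ("reformat", "rollup"),
    ("rollup", "output_file"),
]
sources, receivers = find_sources_and_receivers(nodes, flows)
```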

The analysis engine 300 sends identifiers of the data sources 302 and the output data receivers 304 to an insertion engine 306, which determines which of the data sources and output data receivers are to be replaced by test sources and probes, respectively. In some examples, a tester 308 provides a list 310 of the data sources and output data receivers that are to be replaced by test sources and probes. The list 310 can be provided as a file, a database, or in another format. For example, the tester 308 may include on the list 310 any data source that is expected to change frequently. By replacing such a data source with a test source, the tester 308 can ensure that the graph is tested using consistent input data.

The insertion engine 306 compares each identified data source 302 and output data receiver 304 with the data sources and output data receivers on the list 310. The insertion engine creates an overlay specification 312 for any data source 302 or output data receiver 304 that appears on the list 310. In some examples, parameters for the overlay specification 312, such as the upstream port and the downstream port, are provided to the insertion engine 306 by the analysis engine 300. In some examples, the insertion engine 306 accesses the graph 100 to obtain the relevant parameters.
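The comparison against the tester's list can be sketched as a filter over the flows of the graph. In the Python sketch below, the function name, the dictionary layout of each insertion, and all node names are hypothetical; only the matching rule follows the description above.

```python
def create_overlay(sources, receivers, tester_list, flows):
    """Emit one insertion per listed data source or output data receiver."""
    wanted = set(tester_list)
    insertions = []
    for src, dst in flows:
        if src in sources and src in wanted:
            # Test source goes on the flow immediately downstream of the data source.
            insertions.append({"kind": "test_source", "upstream": src, "downstream": dst})
        if dst in receivers and dst in wanted:
            # Probe goes on the flow immediately upstream of the output data receiver.
            insertions.append({"kind": "probe", "upstream": src, "downstream": dst})
    return insertions


flows = [("orders_db", "filter"), ("filter", "rollup"), ("rollup", "report_file")]
overlay = create_overlay(
    sources=["orders_db"],
    receivers=["report_file"],
    tester_list=["orders_db", "report_file"],
    flows=flows,
)
```

A source or receiver that was identified but does not appear on the list produces no insertion, so it keeps operating as in the first version of the graph.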

To create the overlay specification 312 for a test source, the insertion engine 306 populates a source file with data. In some examples, the insertion engine 306 populates the source file for a test source that will take the place of a particular data source 302 with data copied from that data source 302. In some examples, the data source 302 includes an executable expression, such as an SQL statement, and the insertion engine 306 executes the executable expression and populates the source file with the results of the execution. In some examples, the insertion engine 306 can request data for the source file from the tester 308 through a user interface 314. For example, the insertion engine 306 can provide a list of the identified data sources 302 to the tester 308, so that the tester 308 can select which of the identified data sources 302 are to be replaced by test sources. The tester 308 can also specify the data to be included in the source file for a test source. In some cases, the tester 308 can identify the location (e.g., the path) of a file containing the data for the test source. In some cases, the tester 308 can instruct the insertion engine 306 to generate a source file that is a copy of the data in the original data source 302. In some cases, the tester 308 can instruct the insertion engine 306 to execute an executable expression, such as an SQL statement, that is included in or associated with the original data source 302. In some cases, the tester 308 can cause data to be generated for the source file of the test source. For example, the tester 308 may provide a set of data, such as real data or generated data, that causes every function in the graph to be executed at least once.

To create the overlay specification 312 for a probe, the insertion engine 306 determines the location of the file in which the output data is to be stored. In some examples, the location is set by default, e.g., by a system architect. In some examples, the insertion engine 306 can request, through the user interface 314, that the tester 308 specify a location for the output data file.

To insert replacement components, the analysis engine 300 analyzes the graph 100 to identify one or more format-specific components 305 in the graph. To analyze each component of the graph, the analysis engine 300 "walks" along all of the connections of the graph. In some examples, the analysis engine 300 may start at the furthest upstream component of the graph and walk along each outgoing flow from that component, and thus eventually analyze all of the components of the graph. Conversely, the analysis engine 300 may start at the furthest downstream component of the graph and walk along each incoming flow to that component, and thus eventually analyze all of the components of the graph.
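Walking the graph from its furthest upstream components can be sketched as a breadth-first traversal over outgoing flows. The Python sketch below makes several assumptions: the graph is a list of (upstream, downstream) pairs, node names are invented, and format specificity is looked up in a precomputed set rather than derived from analyzing component code.

```python
from collections import deque


def walk_downstream(flows, start_nodes):
    """Visit every component reachable from the start nodes, breadth first."""
    adjacency = {}
    for src, dst in flows:
        adjacency.setdefault(src, []).append(dst)
    visited = set(start_nodes)
    order = []
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        order.append(node)
        for successor in adjacency.get(node, []):
            if successor not in visited:
                visited.add(successor)
                queue.append(successor)
    return order


# Hypothetical graph with one branch.
flows = [
    ("source", "filter"),
    ("filter", "replicate"),
    ("replicate", "format_change"),
    ("replicate", "sort"),
    ("format_change", "sink"),
    ("sort", "sink"),
]
order = walk_downstream(flows, ["source"])

# Stand-in for the code analysis step: an assumed set of format-specific components.
format_specific = {"format_change"}
identified = [node for node in order if node in format_specific]
```

Walking from the furthest downstream component instead would use the same traversal with the direction of each pair reversed.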

The analysis engine 300 can access the specification of each component in the graph 100. The specification of a component includes or points to the code that defines the component, e.g., the code that defines the data processing operation represented by the component. Based on an analysis of the code, the analysis engine 300 can determine whether the data processing operation depends on the format of the data. The analysis engine 300 sends identifiers of the format-specific components to the insertion engine 306, which determines which, if any, of the format-specific components to omit in favor of replacement components.

In some examples, the analysis engine 300 analyzes the graph 100 in response to a user request. For example, a user may wish to use the graph 100 to process data of a format different from that of the data usually processed by the graph. The user can request an analysis of the graph 100 to ensure that the graph 100 will be able to process the data of the different format.

In some examples, the analysis engine 300 analyzes the graph 100 once, e.g., when the graph is first defined or when the graph is first instantiated or parameterized (e.g., at the first execution of the graph), to generate a list of all of the format-specific components in the graph. The list of format-specific components in the graph can be stored for future reference and used, for example, in response to a user request to use the graph 100 to process data of a different format.

In some examples, the analysis engine 300 automatically determines when to analyze the graph. For example, the specification of a graph may include a description of the format of the data previously processed by the graph. If the format of the input data differs from the format of the previously processed data, the analysis engine may analyze the graph to determine whether any components need to be replaced in order to process the input data in the different format.

The insertion engine 306 determines whether to omit any of the format-specific components identified by the analysis engine 300. The insertion engine 306 creates an overlay specification that defines a replacement component for each omitted component.

In some examples, the format-specific components to be omitted are identified by a user. For example, the insertion engine 306 can display a list of the identified format-specific components on the user interface 314, and the user selects the components to be replaced. For each component to be replaced, the user can indicate the component to be used as its replacement. For example, the user may rely on his or her knowledge of the format of the input data, of the process represented by each format-specific component, or of both, to determine whether any of the components should be omitted and which components should be included as replacement components. Based on the user input identifying the components to be omitted, the replacement components, or both, the insertion engine 306 creates the overlay specification.

In some examples, the insertion engine 306 can automatically determine whether any of the format-specific components should be omitted and which components should be included as replacement components. For example, the insertion engine can analyze the specification of each format-specific component to determine which components can, or cannot, process the input data. The insertion engine 306 can automatically identify replacement components, e.g., components that represent the same data processing operation as the corresponding component to be replaced but that can process data in the format of the input data. The insertion engine 306 creates an overlay specification for the automatically identified replacement components. In some examples, user input is incorporated into the automated determination. For example, the user may be asked to approve the replacement components identified by the insertion engine 306.
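The automatic identification of a replacement component described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each component specification is a dictionary with hypothetical `operation` and `accepts` keys and that a catalog of candidate components is available; none of these names come from the specification itself.

```python
def pick_replacement(component, catalog, input_format):
    """Return the first catalog entry that represents the same operation as
    `component` and can process `input_format`, or None if there is none.

    The dictionary keys used here ("operation", "accepts") are hypothetical.
    An "accepts" value of "any" marks a generic, format-independent component.
    """
    for candidate in catalog:
        same_operation = candidate["operation"] == component["operation"]
        handles_format = candidate["accepts"] in (input_format, "any")
        if same_operation and handles_format:
            return candidate
    return None


# Hypothetical component to be omitted and hypothetical candidate catalog.
to_omit = {"name": "parse_ebcdic", "operation": "parse", "accepts": "ebcdic"}
catalog = [
    {"name": "sort_any", "operation": "sort", "accepts": "any"},
    {"name": "parse_ascii", "operation": "parse", "accepts": "ascii"},
]
replacement = pick_replacement(to_omit, catalog, "ascii")
print(replacement["name"] if replacement else "no replacement found")
```

Returning `None` when no candidate matches leaves room for the user-approval step: a caller could fall back to asking the user to choose a replacement.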

FIG. 6 shows a general approach to defining replacement components for a graph. A set of data having a particular format is received for processing by the graph (400). A determination is made as to whether the graph should be analyzed for its ability to process the particular format of the received data (402). In some examples, a user can indicate, e.g., through a user interface, that the graph should be analyzed. For example, the user may know that this set of data has a different format than the data previously processed by the graph. In some examples, the determination can be made automatically. For example, the format of the received data can be determined and compared with, e.g., information stored in the specification of the graph that indicates the data format for which the graph is configured. If the data format for which the graph is configured does not match the format of the received data, the graph is analyzed.
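The determination at step 402 can be sketched as follows. This is a minimal illustration that assumes the graph specification records the formats for which the graph is configured; the `GraphSpec` class, its fields, and the format labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GraphSpec:
    """Hypothetical graph specification recording the configured data formats."""
    name: str
    configured_formats: set = field(default_factory=set)


def should_analyze(spec, received_format, user_requested=False):
    """Decide whether the graph should be analyzed (step 402).

    Analysis is triggered either by an explicit user request or by a mismatch
    between the received format and the formats recorded in the specification.
    """
    if user_requested:
        return True
    return received_format not in spec.configured_formats


spec = GraphSpec("graph_100", {"ascii_delimited"})
print(should_analyze(spec, "ascii_delimited"))  # format matches: no analysis
print(should_analyze(spec, "ebcdic_fixed"))     # format differs: analyze
```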

The graph is analyzed, e.g., by a processor, to identify one or more components of the graph that depend on the format of the data processed by the component (404). Specifically, the specification of each of one or more of the components of the graph is analyzed to identify the format-specific components in the graph. In some examples, the graph is analyzed by stepping through each component of the graph. For example, each component is analyzed to determine whether it is a format-specific component and to identify its incoming and outgoing flows. Each flow is followed to the adjacent component, and each of those components is in turn analyzed to determine whether it is format-specific and to identify its incoming and outgoing flows. In this way, all of the components of the graph can be analyzed. In some examples, the analysis can be performed automatically at runtime, e.g., after the graph has been parameterized. In some examples, the analysis can be performed automatically and dynamically, e.g., while the graph is running. For example, a dynamic analysis can be performed when certain parameters are resolved during execution of the graph. In some examples, the graph is received into short-term memory, from which the graph is analyzed by a processor to identify the format-specific components.
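The stepwise analysis of step 404 can be sketched as a walk that follows flows from component to component. The sketch below uses a breadth-first order, which is only one possible traversal, and it assumes a hypothetical `accepts` field in each component specification as a stand-in for the code analysis that decides format dependence.

```python
from collections import deque


def is_format_specific(spec):
    """Hypothetical test: a component is generic only if it accepts "any" format."""
    return spec.get("accepts", "any") != "any"


def find_format_specific(components, flows, start):
    """Visit every component reachable from `start` by following flows in either
    direction, and return the names of the format-specific ones in visit order.

    components: dict mapping component name -> specification dict
    flows: list of (upstream_name, downstream_name) pairs
    """
    neighbors = {name: set() for name in components}
    for upstream, downstream in flows:
        neighbors[upstream].add(downstream)
        neighbors[downstream].add(upstream)

    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        name = queue.popleft()
        if is_format_specific(components[name]):
            found.append(name)
        for adjacent in sorted(neighbors[name]):  # sorted for a stable order
            if adjacent not in seen:
                seen.add(adjacent)
                queue.append(adjacent)
    return found


components = {
    "read": {"accepts": "ebcdic"},
    "reformat": {"accepts": "ebcdic"},
    "sort": {"accepts": "any"},
    "write": {"accepts": "any"},
}
flows = [("read", "reformat"), ("reformat", "sort"), ("sort", "write")]
print(find_format_specific(components, flows, "read"))
```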

One or more of the components identified as format-specific are evaluated to determine whether the component should be omitted and whether a replacement component should be included (406). For example, a format-specific component may be omitted if the component cannot process data having the format of the received set of data. In some examples, a list of the format-specific components is displayed on a user interface, and the user specifies whether any of the components should be omitted. In some examples, the specification of each format-specific component is evaluated to determine automatically whether the component can process data having the format of the received set of data. In some examples, all of the components identified as format-specific are omitted.

An overlay specification is defined for a replacement component for each of one or more of the format-specific components to be omitted (408). The specification of a given replacement component is based on the specification of the corresponding omitted format-specific component, but defines one or more data processing operations that can be performed on data having the format of the received set of data. In some examples, the replacement component can be format-specific to the format of the received set of data. In some examples, the replacement component can be generic, e.g., capable of processing data in any format.

Prior to execution of the graph, a compiler may compile the graph into an executable graph (410). As part of the compilation, the compiler takes into account the overlay specification 200 that defines the replacement components. For example, the compiler may accept the overlay specification 200 as input. A second version of the graph is generated in which the format-specific components identified for replacement are removed and one or more replacement components are inserted as objects in place of the removed components. The replacement components may be represented in the second version of the graph together with the data processing components included in the first version of the graph 100 (other than the removed components). The overlay specification 200, or the file that stores it, remains separate from the file containing the graph. That is, although the replacement components may appear in the second version of the graph together with the data processing components included in the first version of the graph, the file containing the first version of the graph does not contain the definitions of the replacement components.
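The separation between the first version of the graph and the overlay specification can be illustrated with a short Python sketch. It assumes a hypothetical in-memory representation in which a graph is a dictionary of components and flows and an overlay maps each omitted component to its replacement; it is not the compiler described above, only an illustration of generating a second version while leaving the first untouched.

```python
import copy


def build_second_version(graph, overlay):
    """Return a second version of `graph` with the overlay applied.

    graph:   {"components": {name: spec}, "flows": [(upstream, downstream)]}
    overlay: {omitted_name: {"name": replacement_name, "spec": replacement_spec}}
    The input graph is deep-copied, so the first version is never modified.
    """
    second = copy.deepcopy(graph)
    for omitted, replacement in overlay.items():
        if omitted not in second["components"]:
            raise KeyError(f"overlay refers to unknown component: {omitted}")
        # Remove the format-specific component and insert its replacement.
        del second["components"][omitted]
        second["components"][replacement["name"]] = replacement["spec"]
        # Reconnect every flow that touched the omitted component.
        second["flows"] = [
            (replacement["name"] if up == omitted else up,
             replacement["name"] if down == omitted else down)
            for up, down in second["flows"]
        ]
    return second


first = {
    "components": {"read_ebcdic": {"accepts": "ebcdic"}, "sort": {"accepts": "any"}},
    "flows": [("read_ebcdic", "sort")],
}
overlay = {"read_ebcdic": {"name": "read_ascii", "spec": {"accepts": "ascii"}}}
second = build_second_version(first, overlay)
print(second["flows"])
print("read_ebcdic" in first["components"])  # the first version is unchanged
```

Keeping the overlay as a separate input means the same first version can be combined with different overlays without ever being edited.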

Insertions defined in an overlay specification, such as test sources, probes, or replacement components, can be executed using one of at least two modes: Single-Execution Mode and Saved-State Mode.

FIG. 7 shows an exemplary system for executing insertion definitions in Single-Execution Mode. In this example, a client 602 generates or references a first version of a graph 604 and an overlay file 606 (e.g., an overlay specification) that defines insertions. For example, the overlay file 606 may be the overlay specification 200 of FIG. 4. The graph 604 is then compiled by a compiler 608. The compiler 608 takes the overlay file 606 into account and creates a second version of the graph. The second version of the graph is executable and includes the insertions defined by the overlay file 606. The second version of the graph can then be executed. In some examples, compilation and execution occur concurrently. If the second version of the graph is to be executed again, this process is repeated, including re-specifying and recompiling the graph 604 and re-executing the second version of the graph. No information is saved from one execution of the executable graph to the next.

FIG. 8 shows an exemplary system for executing insertion definitions in Saved-State Mode using a saved state manager 708. In this example, a client 702 generates or references a graph 704 and an overlay file 706 (e.g., an overlay specification) that defines insertions. For example, the overlay file 706 may be the overlay specification 200 of FIG. 4. A saved state repository 710 is managed by the saved state manager 708 and a compiler 712. The saved state manager 708 can also identify where the saved state data is located within the saved state repository 710. The graph 704 is compiled by the compiler 712. The compiler 712 takes the overlay file 706 into account and creates a second version of the graph that includes the insertions defined by the overlay file 706. The second version of the graph can then be executed. In some examples, compilation and execution occur concurrently. Saved-State Mode differs from Single-Execution Mode in that it allows the executable graph to be executed multiple times while information is saved between executions.

The saved state manager 708 can reside in a saved state manager directory and manages the saved state. Examples of information that can be saved in the saved state repository 710 include, among other information, information related to probe insertions, information related to test source insertions, information related to replacement component insertions, information related to the overlay file 706, and parameters (e.g., attributes) associated with graph components.

In some examples, when the executable graph is executed, only particular portions of the graph are executed. That is, only particular components of the graph are executed. In some examples, fewer than all of the components of the graph are executed. The executable graph need only execute the components that affect the insertions. In some examples, the second version of the graph is a second version of the entire original graph. In some examples, the second version of the graph is a second version of only a portion of the original graph, e.g., only the portion of the graph that is relevant to the defined insertions. For example, the components upstream of the most upstream replacement component may be executed by the first version of the graph, and the components starting with the most upstream replacement component may be executed by the second version of the graph.
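One way to determine which components belong to a partial second version is to collect every component at or downstream of a replacement component. The sketch below assumes the same hypothetical list-of-pairs flow representation and is offered only as an illustration of the split between versions.

```python
def components_for_second_version(flows, replaced):
    """Return the set of components at or downstream of any replaced component.

    flows:    list of (upstream_name, downstream_name) pairs
    replaced: iterable of names of components that have replacement components
    Components outside the returned set are upstream of every replacement and
    could be executed by the first version of the graph.
    """
    downstream = {}
    for up, down in flows:
        downstream.setdefault(up, []).append(down)

    result = set()
    stack = list(replaced)
    while stack:
        name = stack.pop()
        if name in result:
            continue
        result.add(name)
        stack.extend(downstream.get(name, []))
    return result


flows = [("read", "parse"), ("parse", "sort"), ("sort", "write")]
print(sorted(components_for_second_version(flows, {"parse"})))
```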

FIG. 9 shows an exemplary data processing system 800 in which the replacement component techniques described here can be used. The system 800 includes a data source 802 that may include one or more sources of data, such as storage devices or connections to online data streams, each of which may store or provide data in any of a variety of formats (e.g., database tables, spreadsheet files, flat text files, or a native format used by a mainframe). An execution environment 804 and a development environment 818 may be hosted, for example, on one or more general-purpose computers under the control of a suitable operating system, such as a version of the UNIX operating system. For example, the execution environment 804 can include a multiple-node parallel computing environment including a configuration of computer systems using multiple central processing units (CPUs) or processor cores, either local (e.g., multiprocessor systems such as symmetric multiprocessing (SMP) computers), or locally distributed (e.g., multiple processors coupled as clusters or massively parallel processing (MPP) systems), or remote or remotely distributed (e.g., multiple processors coupled via a local area network (LAN) and/or a wide area network (WAN)), or any combination thereof.

The execution environment 804 reads data from the data source 802 and generates output data. The storage devices providing the data source 802 may be local to the execution environment 804, e.g., stored on a storage medium (e.g., hard drive 808) connected to a computer hosting the execution environment 804, or may be remote to the execution environment 804, e.g., hosted on a remote system (e.g., mainframe 810) in communication with a computer hosting the execution environment 804 over a remote connection (e.g., provided by a cloud computing infrastructure). The data source 802 may contain the data defined in a test source definition (e.g., the test source definition 201 of FIG. 4). That is, the layout parameter 212 of the test source definition 201 may point to the location of a source file in the data source 802.

The output data may be stored back in the data source 802 or in a data storage system 816 accessible to the execution environment 804, or may be otherwise used. The data storage system 816 is also accessible to the development environment 818, in which a developer 820 is able to develop, debug, and test graphs. In some implementations, the development environment 818 is a system for developing applications as graphs that include vertices (representing data processing components or datasets) connected by directed flows (representing flows of work elements, i.e., data) between the vertices. For example, such an environment is described in more detail in U.S. Publication No. 2007/0011668, titled "Managing Parameters for Graph-Based Applications," incorporated herein by reference. A system for executing such graph-based computations is described in U.S. Patent No. 5,966,072, titled "EXECUTING COMPUTATIONS EXPRESSED AS GRAPHS," incorporated herein by reference. Graphs made in accordance with this system provide methods for getting information into and out of the individual processes represented by graph components, for moving information between the processes, and for defining a running order for the processes. This system includes algorithms that choose interprocess communication methods from any available methods (for example, communication paths according to the flows of the graph can use TCP/IP or UNIX domain sockets, or can use shared memory to pass data between the processes).

The development environment 818 includes a code repository 822 for storing source code. In some examples, the source code and the overlay specification (e.g., the overlay specification 220 of FIG. 4) may be developed by a developer 820 who has access to the development environment, e.g., through a user interface. In some examples, the source code and the overlay specification are determined automatically, e.g., by the analysis engine 300 and the insertion engine 306 described above. In some examples, graphs and overlay specifications can be stored in the code repository 822. In some examples, graphs are stored in the code repository 822 and overlay specifications are stored in a separate overlay repository 824.

One or both of the code repository 822 and the overlay repository 824 may be in communication with a compiler 826. The compiler 826 can compile the first version of a graph and an overlay specification (e.g., the overlay specification 200 of FIG. 4) into an executable second version 828 of the graph. For example, the compiler may accept the overlay specification as input. One or more insertions are processed and inserted into the graph in the form of objects, each of which corresponds to an insertion definition contained in the overlay specification. The second version 828 of the graph can be represented visually by a modified graph. The insertion objects may be shown in the second version of the graph 500.

The development environment 818 can include an execution environment 830 for executing the second version 828 of the graph. For example, once the graph has been compiled by the compiler 826, the second version 828 of the graph can be executed. Executing the second version 828 of the graph can include performing the computations associated with the components, the insertions (e.g., test sources, probes, replacement components, or a combination of any two or more of them), and the directed flows of the second version 828 of the graph as data (e.g., work elements or data records) flows between the components. In some examples, the execution environment 830 executes the second version 828 of the graph without modifying the source code of the first version of the graph stored in the code repository 822 or the source code stored in the overlay repository 824. The execution environment 830 may be accessible through an interface of the development environment 818 or may have its own interface. The interface can be configured to display information related to the execution. The interface can also be configured to display information related to the insertions (e.g., the data being monitored and saved by a probe, the data being inserted by a test source, information about a replacement component, or other information). The execution environment 830 may allow the developer 820 to execute the second version 828 of the graph multiple times and to modify aspects of the second version 828 of the graph between executions.

In some examples, a developer directs the insertion of insertions and the compilation of the graph. For example, the developer 820 selects, from the code repository 822, the first version of the graph 100 of FIG. 1. The developer 820 also selects, from the overlay repository 824, the overlay specification 200 of FIG. 4. In some examples, instead of selecting the overlay specification 200, the developer 820 may select insertion definitions from various overlay specifications in the overlay repository 824. The developer 820 instructs the compiler 826 to compile the second version 828 of the graph based on the first version of the graph 100 and the overlay specification 200.

In some examples, the insertions are inserted automatically. For example, as described above, one or more data sources, output data sinks, or format-specific components in the graph 100 are identified automatically, e.g., by identifying components that have no incoming or outgoing connections or by analyzing the specifications of the components in the graph 100. The identified data sources and output data sinks can be compared automatically with a list of the data sources and output data sinks that are to be replaced by insertions during debugging of the graph 100. For example, this list can be provided by the developer 820. A format-specific component can be analyzed to determine whether it can process data of a particular format, such as the format of an incoming set of data. A list of the format-specific components that cannot process the data is generated. In some examples, this list can be provided by the developer 820. According to the list, overlay specifications are created automatically for the data sources, output data sinks, or format-specific components of the graph 100. The second version of the graph is then compiled automatically.
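Identifying data sources and output data sinks by their missing connections can be sketched as follows. The representation of flows as (upstream, downstream) pairs and the component names are hypothetical.

```python
def find_sources_and_sinks(components, flows):
    """Return (sources, sinks) as sorted lists of component names.

    A source is a component with no incoming flow; a sink is a component with
    no outgoing flow. `flows` is a list of (upstream, downstream) pairs.
    """
    has_incoming = {down for _, down in flows}
    has_outgoing = {up for up, _ in flows}
    sources = sorted(name for name in components if name not in has_incoming)
    sinks = sorted(name for name in components if name not in has_outgoing)
    return sources, sinks


components = ["customers", "orders", "join", "report"]
flows = [("customers", "join"), ("orders", "join"), ("join", "report")]
sources, sinks = find_sources_and_sinks(components, flows)
print(sources, sinks)
```

The resulting lists could then be compared with a developer-supplied list of the sources and sinks that are to be replaced by test sources and probes.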

In some examples, the overlay specification is not stored permanently as a file in the code repository 822 or the overlay repository 824. Rather, the information that would typically be included in an overlay file (e.g., insertion definitions) is created by the developer 820 (e.g., through a user interface) or determined automatically by the analysis engine 300 and the insertion engine 306, and is stored temporarily in memory. The overlay information is then passed to a compiler (e.g., 608 of FIG. 7) or to a saved state manager (e.g., 708 of FIG. 8).

Referring to FIG. 10, in an exemplary process, a first version of a graph (e.g., the graph 100 of FIG. 1) is received (902). For example, the first version of the graph can be received into short-term memory accessible to a processor. The first version of the graph 100 includes components and flows. The components represent operations performed on data records, and the flows represent flows of data records between the components.

An overlay specification that defines one or more insertions is received (904). In some examples, the overlay specification is received from a developer or a tester. In some examples, the overlay specification is defined automatically, e.g., as described above. The overlay specification may be the overlay specification 200 shown in FIG. 4. The overlay specification can include one or more insertion definitions (e.g., one or more test source definitions, one or more probe definitions, or one or more replacement component definitions). An insertion definition can include a name, an upstream port, a downstream port, an insertion type, a prototype path, and a layout parameter (for test source definitions). Each of the defined test sources and probes can be associated with a flow of the graph 100. Each of the defined replacement components can be associated with a component of the graph 100.
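The fields of an insertion definition listed above can be modeled as a small record type. The sketch below is an assumption about one possible in-memory shape: the class names, the port naming scheme, and the paths are hypothetical, and the rule that a layout may only accompany a test source is an illustrative reading of "(for test source definitions)".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InsertionType(Enum):
    TEST_SOURCE = "test_source"
    PROBE = "probe"
    REPLACEMENT = "replacement_component"


@dataclass(frozen=True)
class InsertionDefinition:
    """Hypothetical record holding the fields named in the description."""
    name: str
    upstream_port: str
    downstream_port: str
    insertion_type: InsertionType
    prototype_path: str
    layout: Optional[str] = None  # assumed to apply to test sources only

    def __post_init__(self):
        # Illustrative check, not taken from the specification.
        if self.layout is not None and self.insertion_type is not InsertionType.TEST_SOURCE:
            raise ValueError("layout applies only to test source definitions")


definition = InsertionDefinition(
    name="replace_parse",
    upstream_port="read.out",           # hypothetical port names
    downstream_port="sort.in",
    insertion_type=InsertionType.REPLACEMENT,
    prototype_path="/prototypes/parse_ascii",  # hypothetical path
)
print(definition.insertion_type.value)
```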

One or more objects are generated, each corresponding to one of the defined insertions (906). An object may be a component of a graph, such as a test source, a probe, or a replacement component.

A second version of at least a portion of the graph is generated that includes at least some of the components and flows of the portion of the graph 100 and the one or more generated objects (908). In some examples, the second version of the graph is a copy of the original graph 100 that is modified to include at least some of the components and flows of the portion of the graph 100 and the one or more generated objects. The second version of the graph can be represented visually by a modified graph (e.g., the second version of the graph 200 of FIG. 2 or the third version of the graph 300 of FIG. 3). Each object is inserted at the flow associated with the defined insertion that corresponds to the object (for test sources or probes), or in place of the component associated with the defined replacement component that corresponds to the object. The generated insertion objects may appear in the second version of the graph together with the data processing components of the graph 100, but the first version of the graph 100 (or the file containing the first version of the graph 100) is not modified.

Although compilers have been described (e.g., the compiler 608 of FIG. 7 and the compiler 712 of FIG. 8) that can compile a graph and an overlay specification to create a second version of the graph that includes the insertions defined by the overlay file, in some embodiments the graph and the overlay specification are not compiled. For example, the graph and the overlay specification can be executed directly without being compiled. An interpreter can execute the graph and the overlay specification directly by translating each statement into a sequence of one or more subroutines that have already been compiled into machine code.
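The interpreter approach can be pictured with the following Python sketch, in which each statement name is looked up in a table and translated into a sequence of ready-made subroutines. It is an assumption-laden illustration: ordinary Python functions stand in for subroutines already compiled into machine code, and the statement names `clean` and `shout` and the table `SUBROUTINES` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Records = List[str]


# Stand-ins for subroutines that already exist in compiled form.
def _drop_empty(records: Records) -> Records:
    return [r for r in records if r]


def _dedupe(records: Records) -> Records:
    return list(dict.fromkeys(records))  # preserves first-seen order


def _upper(records: Records) -> Records:
    return [r.upper() for r in records]


# Each statement maps to a sequence of one or more subroutines.
SUBROUTINES: Dict[str, List[Callable[[Records], Records]]] = {
    "clean": [_drop_empty, _dedupe],
    "shout": [_upper],
}


def interpret(statements: Iterable[str], records: Records) -> Records:
    """Execute statements directly, with no separate compile step."""
    for statement in statements:
        for subroutine in SUBROUTINES[statement]:
            records = subroutine(records)
    return records
```

For instance, `interpret(["clean", "shout"], ["a", "", "b", "a"])` returns `["A", "B"]`.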

Although insertions have been described in the form of probes, test sources, and replacement components, in some embodiments insertions can take other forms. Insertions can be used broadly to inject data at a given point of a graph and to extract data from a given point of a graph. For example, an insertion can be designed to monitor the quality of data passing through a flow of the graph. If the data quality falls below a threshold, a user can receive an automatic alert. Further description of insertions can be found in U.S. patent application Ser. No. 14/715,904, the entire contents of which are incorporated herein by reference.
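A data-quality insertion of this kind might look like the following Python sketch. The description does not define how quality is measured, so the sketch assumes that quality is the fraction of records passing a validity check; the function name `monitor_quality` and its parameters are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def monitor_quality(records: Iterable[T],
                    is_valid: Callable[[T], bool],
                    threshold: float,
                    alert: Callable[[str], None]) -> Tuple[List[T], float]:
    """Pass records through unchanged and alert when quality is too low.

    Quality is assumed here to be the fraction of valid records.
    """
    passed = list(records)
    valid = sum(1 for record in passed if is_valid(record))
    quality = valid / len(passed) if passed else 1.0
    if quality < threshold:
        alert(f"data quality {quality:.2f} is below threshold {threshold:.2f}")
    return passed, quality


alerts: List[str] = []
rows = [{"id": 1}, {"id": None}, {"id": 3}, {"id": None}]
out, quality = monitor_quality(rows, lambda r: r["id"] is not None,
                               0.75, alerts.append)
```

Because the records are returned unchanged, the monitor can sit on a flow without affecting the downstream component.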

Furthermore, although insertions have been described in the context of graphs, in some embodiments insertions can be used with other executable applications. For example, data sources, output data recipients, or format-specific processes of a generic executable application can be identified through automated analysis of the application. One or more of the identified data sources, output data recipients, or format-specific processes can be replaced by an appropriate test source, probe, or replacement process, respectively. In this way, the executable application can process data from the test source, can output data to the probe, or can be enabled to process data of a different format. This configuration can be useful for testing or debugging the executable application.
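As a hedged illustration of replacing a format-specific process in a generic application, the Python sketch below models an application as an ordered tuple of processes. Several simplifications are assumed: each process declares its input format in a hypothetical `input_format` attribute rather than being discovered by analysis of executable code, and the names `Process`, `find_format_specific`, `generate_second_version`, and `run_app` are invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Process:
    name: str
    run: Callable[[Any], Any]
    input_format: Optional[str] = None  # None means format-independent


def year_from_compact(date: str) -> int:
    """First operation: expects the format YYYYMMDD."""
    return int(date[:4])


def year_from_iso(date: str) -> int:
    """Second operation: expects YYYY-MM-DD and builds on the first."""
    return year_from_compact(date.replace("-", ""))


def find_format_specific(app: Tuple[Process, ...], fmt: str) -> List[str]:
    return [p.name for p in app if p.input_format == fmt]


def generate_second_version(app: Tuple[Process, ...],
                            replacements: Dict[str, Process]) -> Tuple[Process, ...]:
    """Build a new tuple; the first version is left as it was."""
    return tuple(replacements.get(p.name, p) for p in app)


def run_app(app: Tuple[Process, ...], value: Any) -> Any:
    for process in app:
        value = process.run(value)
    return value


app_v1 = (Process("strip", str.strip),
          Process("year", year_from_compact, "YYYYMMDD"))
app_v2 = generate_second_version(
    app_v1, {"year": Process("year_iso", year_from_iso, "YYYY-MM-DD")})
```

Here `run_app(app_v1, " 20170215 ")` and `run_app(app_v2, " 2017-02-15 ")` both return `2017`, so the second version performs the equivalent operation on data of the new format.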

The techniques described above can be implemented using a computing system executing suitable software. For example, the software may include procedures in one or more computer programs that execute on one or more programmed or programmable computing systems (which may be of various architectures, such as distributed, client/server, or grid), each including at least one processor, at least one data storage system (including volatile and/or non-volatile memory and/or storage elements), and at least one user interface (for receiving input using at least one input device or port, and for providing output using at least one output device or port). The software may include one or more modules of a larger program that, for example, provides services related to the design, configuration, and execution of graphs. The modules of the program (e.g., elements of a graph) can be implemented as data structures or other organized data conforming to a data model stored in a data repository.

The software may be provided on a tangible, non-transitory medium, such as a CD-ROM or other computer-readable medium (e.g., readable by a general-purpose or special-purpose computing system or device), or may be delivered (e.g., encoded in a propagated signal) over a communication medium of a network to a tangible, non-transitory medium of a computing system where it is executed. Some or all of the processing may be performed on a special-purpose computer, or using special-purpose hardware, such as coprocessors, field-programmable gate arrays (FPGAs), or dedicated application-specific integrated circuits (ASICs). The processing may be implemented in a distributed manner in which different parts of the computation specified by the software are performed by different computing elements. Each such computer program is preferably stored on or downloaded to a computer-readable storage medium (e.g., solid-state memory or media, or magnetic or optical media) of a storage device accessible by a general-purpose or special-purpose programmable computer, for configuring and operating the computer when the storage medium is read by the computer to perform the processing described herein. The inventive system may also be considered to be implemented as a tangible, non-transitory medium configured with a computer program, where the medium so configured causes a computer to operate in a specific and predefined manner to perform one or more of the processing steps described herein.

A number of embodiments have been described. Nevertheless, it is to be understood that the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the appended claims. Accordingly, other embodiments are also within the scope of the appended claims. For example, various modifications may be made without departing from the scope of the invention. Additionally, some of the steps described above may be order independent, and thus can be performed in an order different from that described.

Claims

1. A method comprising:
analyzing, by a processor, a first version of a computer program, the analyzing including identifying a first process included in the first version of the computer program, the first process configured to perform a first operation on data having a first format; and
generating, by the processor, a second version of at least part of the computer program, including omitting the first process and including in the second version of the at least part of the computer program one or more second processes configured to perform a second operation on data having a second format different from the first format, wherein the second operation is based on the first operation.

2. The method of claim 1, wherein identifying a first process comprises identifying a first process for which said first operation depends on said format of said data.

3. The method of claim 1 or 2, wherein identifying a first process comprises identifying a first process incapable of performing said first operation on data in said second format.

4. The method of any one of claims 1 to 3, comprising determining the format of data to be processed by said first process.

5. The method of claim 4, wherein identifying a first process comprises identifying a first process incapable of performing said first operation on data having the format of the data to be processed by said first process.

6. The method of any one of claims 1 to 5, wherein identifying a first process comprises identifying a first data processing element of said computer program, said first data processing element configured to execute said first process.

7. The method of claim 6, wherein including said one or more second processes in said computer program comprises including one or more second data processing elements in said second version of said at least part of said computer program, said one or more second data processing elements configured to execute said one or more second processes.

8. The method of any preceding claim, wherein said first format comprises a data type.

9. The method of any one of claims 1 to 8, wherein said first format comprises a size of data elements.

10. The method of any preceding claim, wherein the first process is configured to perform the first operation on data records of a first record format, and the one or more second processes are configured to perform the second operation on data records of a second record format.

11. The method of claim 10, wherein said first record format includes names of fields in said record.

12. The method of any one of claims 1 to 11, comprising presenting, in a user interface, an identifier of the first set of one or more operations.

13. The method of any one of claims 1 to 12, wherein generating said second version of at least part of said computer program comprises generating a copy of said part of said computer program.

14. The method of claim 13, comprising modifying the copy of the portion of the computer program to omit the first process and include the one or more second processes.

15. The method of any one of claims 1 to 14, comprising executing said second version of said computer program.

16. The method of any preceding claim, wherein said one or more second processes are defined by an overlay specification.

17. The method of claim 16, wherein generating the second version of the computer program comprises generating the second version based on the first version of the computer program and the overlay specification.

18. The method of claim 16 or 17, wherein the overlay specification identifies one or more of a process upstream of the first process and a process downstream of the first process.

19. The method of any one of claims 16 to 18, comprising identifying the first process based on analysis of executable code defining the first process.

20. The method of claim 1, wherein said computer program comprises a graph.

21. The method of any one of claims 1 to 20, wherein the first process is an executable process represented by a first component of the graph, and the one or more second processes are executable processes represented by one or more second components of the graph.

22. The method of claim 21, wherein the one or more second components are configured to receive data records from upstream components of the graph.

23. A method according to claim 21 or 22, wherein said one or more second components are arranged to provide data records to downstream components of said graph.

24. A system comprising:
means for analyzing, by a processor, a first version of a computer program, the analyzing including identifying a first process included in the first version of the computer program, the first process configured to perform a first operation on data having a first format; and
means for generating, by a processor, a second version of at least part of the computer program, including omitting the first process and including in the second version of the at least part of the computer program one or more second processes configured to perform a second operation on data having a second format different from the first format, wherein the second operation is based on the first operation.

25. A system comprising:
a processor coupled to a memory, the processor and memory configured to:
analyze a first version of a computer program, the analyzing including identifying a first process included in the first version of the computer program, the first process configured to perform a first operation on data having a first format; and
generate a second version of at least part of the computer program, including omitting the first process and including in the second version of the at least part of the computer program one or more second processes configured to perform a second operation on data having a second format different from the first format, wherein the second operation is based on the first operation.

26. A computer-readable medium storing instructions for causing a computing system to:
analyze a first version of a computer program, the analyzing including identifying a first process included in the first version of the computer program, the first process configured to perform a first operation on data having a first format; and
generate a second version of at least part of the computer program, including omitting the first process and including in the second version of the at least part of the computer program one or more second processes configured to perform a second operation on data having a second format different from the first format, wherein the second operation is based on the first operation.