JP2012032986A

JP2012032986A - Compile method and program

Info

Publication number: JP2012032986A
Application number: JP2010171661A
Authority: JP
Inventors: Masaki Arai; 正樹新井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-07-30
Filing date: 2010-07-30
Publication date: 2012-02-16

Abstract

PROBLEM TO BE SOLVED: To provide a compile method and a program that reduce the amount of storage medium needed to collect profile information compared with conventional techniques. SOLUTION: A CPU in a front-end node executes an insertion step and a compile step. The insertion step inserts two kinds of code into a parallel computation program: an aggregation code, which aggregates the profile information that each parallel computation device acquires for a designated range of the program into the parallel computation device allocated to that designated range, and a compression and integration code, which compresses and integrates the aggregated profile information. The compile step compiles the parallel computation program into which the aggregation code and the compression code have been inserted.

Description

The present invention relates to a profiling method and a program.

Techniques are conventionally known for analyzing the execution performance of a parallel program running on a parallel computer composed of a plurality of processors and for outputting profile information. Techniques for deleting duplicated portions of trace data and for designating the number of executions of trace data are also known.

SPMD (Single-Program, Multiple-Data) programs are known as programs that run on large-scale parallel computers. A large-scale parallel computer includes one front-end node and a plurality of (for example, tens of thousands of) computation nodes, each of which is a computer. The front-end node and the computation nodes are connected to one another via a network. The front-end node holds an SPMD program and a compiler. The compiler creates an executable file by compiling the SPMD program.

A large-scale parallel computer of this kind executes the executable file by the following procedure. First, the front-end node determines the maximum computation node number to be used, which fixes the maximum number of computation nodes to be used. Next, the front-end node copies the executable file created by the compiler to each computation node. Each computation node then executes the executable file.

To investigate the run-time behavior of an executable file created from an SPMD program, the front-end node may acquire profile information. Existing techniques acquire profile information by the following procedure. First, while executing the executable file, each computation node acquires the number of executions at a given position of the SPMD program and the execution time of a given range, and when execution of the executable file ends, it stores the acquired execution counts and execution times as profile information (procedure 1). Next, when execution of the executable file ends, the front-end node acquires the profile information stored in each computation node and creates a single piece of profile information (procedure 2). In this way, the front-end node obtains one piece of profile information.

JP-A-10-63550

JP 2001-147833 A

JP-A-10-11321

However, the detailed profile information collected at each computation node is very large, so the single piece of profile information created in procedure 2 becomes an enormous amount of data. In SPMD code in particular, many computation nodes often execute the same processing, so the profile information collected individually at each computation node contains a large amount of redundant information. As a result, the storage capacity needed to collect the profile information, that is, memory or hard disk capacity, may be wasted at each computation node and at the front-end node.

In view of the above problem, an object of the disclosed compile method, program execution method, and program is to reduce the amount of storage medium needed to collect profile information compared with conventional techniques.

To achieve the above object, the disclosed compile method causes a computer to execute an insertion step and a compile step. The insertion step inserts into a parallel computation program (a) an aggregation code that aggregates the profile information acquired by each parallel computer for a designated range of the parallel computation program into the parallel computer allocated to that designated range, and (b) a compression and integration code that compresses and integrates the aggregated profile information. The compile step compiles the parallel computation program into which the aggregation code and the compression code have been inserted.

The disclosed program causes a computer to execute an insertion step and a compile step. The insertion step inserts into a parallel computation program (a) an aggregation code that aggregates the profile information acquired by each parallel computer for a designated range of the parallel computation program into the parallel computer allocated to that designated range, and (b) a compression and integration code that compresses and integrates the aggregated profile information. The compile step compiles the parallel computation program into which the aggregation code and the compression code have been inserted.

The disclosed compile method and program reduce the amount of storage medium needed to collect profile information compared with conventional techniques.

FIG. 1 is a schematic configuration diagram of a parallel computer according to the present embodiment.
FIG. 2 is a schematic configuration diagram of the front-end node 2 and the computation nodes 3-0 to 3-N.
FIG. 3 is a flowchart showing the processing executed by the compiler 12.
FIG. 4 is a diagram showing an example of the SPMD program 11.
FIG. 5 is a diagram showing an example of the function call graph corresponding to the SPMD program 11 and the control flow graph of each function.
FIG. 6 is a diagram showing an example of the function call graph corresponding to the SPMD program 11 and the control flow graph of each function.
FIG. 7 is a diagram showing an example of the function call graph corresponding to the SPMD program 11 and the control flow graph of each function.
FIG. 8 is a diagram showing an example of the function call graph corresponding to the SPMD program 11 and the control flow graph of each function.
FIG. 9 is a diagram showing an example of the function call graph corresponding to the SPMD program 11 and the control flow graph of each function.
FIG. 10 is a flowchart showing the profile information acquisition process.
FIG. 11 is a diagram showing the state of the profile information held by each of the computation nodes 3-0 to 3-3.
FIG. 12 is a diagram showing the state of each of the computation nodes 3-0 to 3-3 after a plurality of pieces of profile information have been compressed and integrated.
FIG. 13 is a diagram showing a state in which aggregated profile information exists at the computation node 3-0 and the computation node 3-3.
FIG. 14 is a diagram showing a state in which the profile information compressed and integrated at the computation node 3-0 and the computation node 3-3 has been aggregated into one piece of profile information.
FIG. 15 is a diagram showing an example of code containing the profile information processing functions (profile_start and profile_end).
FIG. 16 is a diagram showing an example of the executable file 13.
FIG. 17 is a diagram showing an example of code containing the function PART.
FIG. 18 is a diagram showing an example of code containing the function CHOOSE_MAX.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a schematic configuration diagram of a parallel computer according to the present embodiment.

In FIG. 1, a parallel computer 1 includes a front-end node 2 and computation nodes 3-0 to 3-N (N: a natural number) serving as parallel computation devices. The front-end node 2 and the computation nodes 3-0 to 3-N are hardware such as computers and are connected to one another via a network 4. The front-end node 2 holds an SPMD (Single-Program, Multiple-Data) program 11, which is a program for parallel computation, a compiler 12, and an executable program 13. "K" in FIG. 1 denotes the node number of each computation node. For example, "K = 0" indicates that the node number of the computation node 3-0 is 0, and "K = N" indicates that the node number of the computation node 3-N is N (N: a natural number).

Each computation node has a CPU 21-K and a memory 22-K, where K is the node number of that computation node. The detailed configuration of each computation node is described later.

The SPMD program 11 is a program for parallel computation. The SPMD program 11 uses the node numbers by dividing them all into several groups. Consequently, the number of executable programs created on the basis of the node-number-related processing in the SPMD program 11 is smaller than the number of computation nodes.

The compiler 12 compiles the SPMD program 11 and creates the executable program 13. The number of executable programs created by the compiler 12 is not limited to one; a plurality of executable programs may be created. The created executable program 13 is transmitted to the computation nodes 3-0 to 3-N and executed by each computation node.

FIG. 2 is a schematic configuration diagram of the front-end node 2 and the computation nodes 3-0 to 3-N. Each of the computation nodes 3-0 to 3-N has the same configuration as the front-end node 2, so a separate description of the computation nodes is omitted.

The front-end node 2 includes a CPU 21 that controls the entire front-end node 2 and a memory 22 that holds various information and programs. The front-end node 2 also includes a hard disk drive (HDD) 23 that holds an OS and various information and programs, and a communication interface 24 for connecting to each computation node. In addition, the front-end node 2 includes an input unit 25 that receives data from an input device 31 such as a mouse or keyboard, and an output unit 26 that outputs data and messages to an output device 32 such as a monitor or printer. The CPU 21 is connected to the memory 22, the HDD 23, the communication interface 24, the input unit 25, and the output unit 26 via a bus 27.

The SPMD program 11, the compiler 12, and the executable program 13 of FIG. 1 are stored in the memory 22 or the HDD 23. The compiler 12 operates under the control of the CPU 21. In addition to these elements, the front-end node 2 may include a data read/write device such as a CD-R drive, a nonvolatile ROM, and the like.

FIG. 3 is a flowchart showing the processing executed by the compiler 12. FIG. 4 is a diagram showing an example of the SPMD program 11. In the SPMD program 11 of FIG. 4, the function H contains the communication processes SEND() and RECEIVE().

In FIG. 3, the compiler 12 creates a function call graph for the whole program corresponding to the SPMD program 11 and a control flow graph for each function (step C1). The function call graph shows the call relationships among the functions written in the source code of the SPMD program 11. A control flow graph represents all the paths that execution of the program may take.

FIG. 5 shows the function call graph and the per-function control flow graphs corresponding to the SPMD program 11 of FIG. 4. In FIG. 5, control flow is drawn with solid lines, and function calls and returns are drawn with dotted lines. The main function contains basic blocks 41 to 44 (code piece A, the call to function G, the call to function H, and code piece B). The function G contains basic blocks 421 to 424, and the function H contains basic blocks 431 to 434. A basic block is the minimum unit on which compiler optimization is performed.

Next, the compiler 12 determines the ranges for which profile information is to be acquired (hereinafter, target ranges) (step C2). Profile information summarizes the number of executions at a given position of the SPMD program 11 and the execution time of a given range. In the present embodiment it consists of a computation node number, a range name, position identification information, and an execution time, each of which is described in detail later. A target range is, for example, a function or a basic block. The programmer specifies the granularity at which profile information is acquired through compile options. A compile option is an option given when the compiler is run so that information known to the programmer is conveyed accurately to the compiler. Here, the target ranges are assumed to be set in units of functions and units of basic blocks.

The compiler 12 then assigns a range name to each target range, inserts code that starts the measurement of profile information at the start point of the target range, and inserts code that ends the measurement at the end point of the target range (step C3). In the example of FIG. 6, the compiler 12 assigns the range names Rn (n = 0 to 11) to the target ranges and inserts the measurement start code (profile_start) and the measurement end code (profile_end) into the basic blocks.
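Step C3 can be pictured with a small sketch. The following Python fragment is only an illustration under simplifying assumptions: each target range is modeled as a list of source lines, `insert_measurement_code` is a hypothetical helper name, and the literal `...` stands for the remaining arguments that the text abbreviates in the same way.

```python
def insert_measurement_code(target_ranges):
    """Wrap each target range in profile_start/profile_end calls.

    target_ranges: list of ranges, each a list of source-line strings.
    Range names R0, R1, ... are assigned in order (an assumption that
    mirrors the Rn naming used in the FIG. 6 example).
    """
    instrumented = []
    for n, body in enumerate(target_ranges):
        range_name = f"R{n}"
        # Measurement starts at the start point of the range ...
        instrumented.append(f"profile_start({range_name}, ...)")
        instrumented.extend(body)
        # ... and ends at its end point.
        instrumented.append(f"profile_end({range_name}, ...)")
    return instrumented


example = insert_measurement_code([["code piece A"], ["call G"]])
```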

Using the function call graph and the control flow graphs, the compiler 12 determines whether each target range contains communication processing. Depending on whether communication processing is present, the compiler 12 changes the round function that is set in the profile_start code inserted in step C3; the round function rounds the execution time of the program contained in the target range (step C4). In FIG. 7, the basic blocks 432 and 433 contain communication processing (SEND() and RECEIVE()), so the function H also contains communication processing. In FIG. 7, the basic blocks 432 and 433 and the function H, which contain communication processing, are shaded.
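The classification in step C4 can be sketched as follows. This is a minimal illustration, not the compiler's actual analysis: the dictionaries below are hypothetical stand-ins for the graphs of FIG. 7, and the sketch covers only the rule stated in the text, namely that a function contains communication processing when one of its basic blocks does.

```python
# Hypothetical stand-in for FIG. 7: basic blocks per function, and the
# basic blocks that directly contain SEND() or RECEIVE().
BLOCKS_OF = {"G": [421, 422, 423, 424], "H": [431, 432, 433, 434]}
COMM_BLOCKS = {432, 433}


def ranges_with_communication(blocks_of, comm_blocks):
    """Return the target ranges (basic blocks and functions) that
    contain communication processing."""
    flagged = set(comm_blocks)
    for function, blocks in blocks_of.items():
        # A function is flagged when any of its basic blocks is flagged.
        if any(block in comm_blocks for block in blocks):
            flagged.add(function)
    return flagged


flagged = ranges_with_communication(BLOCKS_OF, COMM_BLOCKS)
```

With these inputs the flagged set contains basic blocks 432 and 433 and function H, which matches the shaded elements described for FIG. 7.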

Using the function call graph and the control flow graphs, the compiler 12 determines the profile information processing position for each target range (step C5). A profile information processing position is a position at which the profile information obtained from the target ranges is dumped (output). The profile information processing position for a target range R is a position X in the program that satisfies all of the following conditions (1) to (4).
(1) Position X is reached after the target range R has been executed.
(2) Whenever the target range R is executed, execution always passes through position X.
(3) During execution of the program, execution passes through position X only once for the same position identification information (position identification information is described later).
(4) After passing through position X, execution never passes through the target range R again for the same position identification information.
In general, one profile information processing position covers a plurality of target ranges. FIG. 8 shows an example in which the profile information processing positions of the target ranges have been inserted into the function call graph and the control flow graphs. In FIG. 8, the profile information processing positions are indicated by profile_dump code.

The compiler 12 groups a plurality of profile information processing positions (step C6). FIG. 9 shows an example of this grouping. In FIG. 9, the compiler 12 groups the profile information processing positions so that a position is set immediately below the loop formed by the basic blocks that call function G or function H, or immediately below the last basic block of function G or function H.

Finally, the compiler 12 compiles the SPMD program 11 into which the profile_start, profile_end, and profile_dump code has been inserted, and creates the executable file 13 (step C7).

Steps C1 to C6 are the processing by which the compiler 12 automatically builds the mechanism for acquiring profile information into the SPMD program 11. Instead of being executed by the compiler 12, steps C1 to C6 can also be carried out directly by the programmer. That is, the target ranges and the profile information processing positions can be specified in any of the following three ways.
(1) The compiler automatically inserts the range specification functions and processing positions into the source code of the SPMD program 11 (automatic).
(2) The programmer writes the range specification functions and processing positions in the source code of the SPMD program 11 (manual).
(3) The programmer writes the range specification functions and processing positions for the parts that need them in the source code of the SPMD program 11, and the compiler processes the remaining parts automatically (semi-automatic).

Next, the process by which the computation nodes 3-0 to 3-N execute the executable program 13 and acquire profile information is described. FIG. 10 is a flowchart showing the profile information acquisition process.

First, each of the computation nodes 3-0 to 3-N starts executing the executable program 13 (step E1). The executable program 13 is the program obtained by compiling the SPMD program 11 according to the processing of FIG. 3.

When a computation node 3-0 to 3-N reaches a profile information processing function, it determines the type of that function (step E2). The profile information processing function is profile_start, profile_end, or profile_dump.

If the profile information processing function is profile_start, each of the computation nodes 3-0 to 3-N starts measuring the execution time of the program contained in the target range (step E3).

If the profile information processing function is profile_end, each of the computation nodes 3-0 to 3-N ends the measurement of the execution time of the program contained in the target range and stores the profile information in its memory 22-0 to 22-N (step E4).
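Steps E3 and E4 can be sketched as a small per-node recorder. This is an illustrative sketch only: the class name `RangeProfiler`, the method signatures, and the injectable `clock` parameter are assumptions made for this example and are not the signatures shown in FIG. 15.

```python
import time


class RangeProfiler:
    """Per-node recorder for target-range execution times (hypothetical)."""

    def __init__(self, node_number, clock=time.perf_counter):
        self.node_number = node_number
        self.clock = clock          # injectable so the sketch is testable
        self._started = {}          # range name -> start timestamp
        self.records = []           # profile information kept in node memory

    def profile_start(self, range_name):
        # Step E3: start measuring the target range.
        self._started[range_name] = self.clock()

    def profile_end(self, range_name, position_id):
        # Step E4: stop measuring and store one profile record consisting of
        # node number, range name, position identification, execution time.
        elapsed = self.clock() - self._started.pop(range_name)
        self.records.append(
            (self.node_number, range_name, position_id, elapsed)
        )


# Usage with a fake clock that returns 1.0 and then 4.0.
ticks = iter([1.0, 4.0])
profiler = RangeProfiler(node_number=0, clock=lambda: next(ticks))
profiler.profile_start("R0")
profiler.profile_end("R0", "main")
```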

If the profile information processing function is profile_dump, each of the computation nodes 3-0 to 3-N transmits the profile information of the target ranges to the computation node predetermined for each range, and the acquired pieces of profile information are compressed and integrated (step E5). Here, the computation nodes 3-0 to 3-N are used in order as the nodes that store profile information. For example, the profile information for range name R0 is transmitted to the computation node 3-0, and the profile information for range name R1 is transmitted to the computation node 3-1; in general, the profile information for range name Rn is transmitted to the computation node with the matching node number. When the number of target ranges n exceeds the total number of computation nodes N, the profile information of each target range beyond the Nth is transmitted to the computation node whose node number equals the remainder obtained by dividing that range's ordinal by N. For example, with 102 target ranges and 100 computation nodes, the profile information of the 101st and 102nd target ranges is transmitted to the computation nodes with node numbers 1 and 2, respectively.
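The assignment of target ranges to storing nodes amounts to a remainder calculation. The sketch below assumes that the number attached to a target range is used directly as the dividend, which reproduces both examples in the text (R0 and R1 go to nodes 0 and 1; ranges 101 and 102 go to nodes 1 and 2 when there are 100 nodes). The function name `destination_node` is hypothetical.

```python
def destination_node(range_number: int, node_count: int) -> int:
    """Node number that stores the profile information of a target range.

    Ranges are spread over the nodes in order and wrap around when there
    are more ranges than nodes.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return range_number % node_count


# Examples taken from the description.
first = destination_node(0, 100)      # range R0
second = destination_node(1, 100)     # range R1
wrapped = [destination_node(r, 100) for r in (101, 102)]
```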

Each of the computation nodes 3-0 to 3-N repeats steps E2 to E5 until the executable program 13 terminates (step E6). Finally, when execution of the executable program 13 ends, the front-end node 2 aggregates the profile information compressed and integrated at the computation nodes 3-0 to 3-N into one (step E7).

Next, the profile information acquisition process is described in detail with reference to FIGS. 9 and 10. Here, four computation nodes 3-0 to 3-3 are assumed to be used.

After execution of the executable program 13 starts, each of the computation nodes 3-0 to 3-3 determines that the first profile information processing function it reaches is, as shown in FIG. 9, profile_start(R0,...) (steps E1 and E2). Each of the computation nodes 3-0 to 3-3 therefore starts measuring the execution time of the program contained in the target range named R0 (step E3). Each of the computation nodes 3-0 to 3-3 next reaches the profile information processing function profile_end(R0,...) and ends the measurement of the execution time of the program contained in the target range named R0 (step E4). The profile information obtained in steps E3 and E4 is shown in Table 1.

The computation node number in Table 1 is the number K of the computation node. The range name is the name of the target range designated by the functions profile_start and profile_end. The position identification information is history information that records how the target range was reached. Here, the position identification information is (main), which indicates that the target range named R0 lies within the function main(). The execution time is the time each of the computation nodes 3-0 to 3-3 took to execute the program in the target range. FIG. 11 shows the state of the profile information held by each of the computation nodes 3-0 to 3-3 when each node rounds off the ones digit of the execution time in Table 1 so that the execution time becomes a multiple of 10.
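Rounding the ones digit so that the execution time becomes a multiple of 10 can be written with integer arithmetic. The sketch below assumes non-negative integer execution times and half-up rounding (a ones digit of 5 rounds upward); the sample values are hypothetical because Table 1 is not reproduced here. Python's built-in `round` is avoided because it rounds halves to the nearest even value.

```python
def round_to_tens(execution_time: int) -> int:
    """Round a non-negative integer to the nearest multiple of 10,
    rounding a ones digit of 5 upward."""
    if execution_time < 0:
        raise ValueError("execution_time must be non-negative")
    return (execution_time + 5) // 10 * 10


# Hypothetical execution times, not the values of Table 1.
rounded = [round_to_tens(t) for t in (12, 15, 96, 100)]
```

Rounding makes nearly equal execution times identical, which is what allows records from different nodes to be merged in the later compression step.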

When the computation nodes 3-0 to 3-3 reach the profile information processing function profile_dump(R0,R1), the computation node 3-0 aggregates the profile information for the target range named R0 and then compresses and integrates the aggregated profile information (step E5). FIG. 12 shows the state of each of the computation nodes 3-0 to 3-3 after the pieces of profile information have been compressed and integrated. In FIG. 12, the computation node 3-0 holds the profile information that has the same execution time in a compressed form. At this point, all the profile information for the target range named R0 has been aggregated at the computation node 3-0, so the other computation nodes 3-1 to 3-3 hold no profile information for the target range named R0.
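One way to picture the compression and integration in step E5 is to merge records that agree on range name, position identification information, and rounded execution time, keeping only the list of node numbers that produced them. This is a sketch under assumptions: the compressed format of FIG. 12 is not reproduced in the text, so the dictionary layout and the function name `compress_and_integrate` are illustrative, and the sample records are hypothetical.

```python
from collections import defaultdict


def compress_and_integrate(records):
    """Merge profile records that differ only in the node number.

    records: iterable of (node, range_name, position_id, rounded_time).
    Returns {(range_name, position_id, rounded_time): [node, ...]}.
    """
    merged = defaultdict(list)
    for node, range_name, position_id, rounded_time in records:
        merged[(range_name, position_id, rounded_time)].append(node)
    return {key: sorted(nodes) for key, nodes in merged.items()}


# Hypothetical records gathered at node 3-0 for range R0.
gathered = [
    (0, "R0", "main", 10),
    (1, "R0", "main", 10),
    (2, "R0", "main", 20),
    (3, "R0", "main", 10),
]
compressed = compress_and_integrate(gathered)
```

In this example four records shrink to two entries, one per distinct execution time, which illustrates why redundant SPMD profile data compresses well.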

The processing of steps E1 to E5 described above is also executed for the programs in the target ranges named R1 to R3 (step E6). Consequently, when execution of the executable program 13 ends, the aggregated profile information is distributed across the computation nodes 3-0 to 3-3. For example, FIG. 13 shows a state in which aggregated profile information exists at the computation node 3-0 and the computation node 3-3. In FIG. 13, the computation node 3-3 holds the profile information for the target range named R3 in a compressed form.

When execution of the executable program 13 ends, the front-end node 2 aggregates the multiple pieces of profile information that were compressed and integrated on the calculation nodes 3-0 and 3-3 into a single piece of profile information (step E7). FIG. 14 shows the state after the profile information compressed and integrated on the calculation nodes 3-0 and 3-3 has been aggregated into one.

As described above, because each of the calculation nodes 3-0 to 3-3 partially compresses and integrates the profile information in step E5, the amount of profile information data finally created in step E7 can be greatly reduced. In addition, because multiple calculation nodes are used for the processing of steps E1 to E5, the aggregation, compression, and integration of profile information can be executed in parallel.

Next, the profile information processing functions will be described.

In step C2, the compiler 12 determines the ranges (target ranges) in the SPMD program 11 for which execution-time profile information is to be acquired. The compiler 12 designates each target range using code that includes the profile information processing functions (profile_start and profile_end) shown in FIG. 15.

The parameters of the profile information processing functions used to designate a target range have the following meanings. REGION_NAME in FIG. 15 is a name that distinguishes the target range. SELECTOR in FIG. 15 specifies a function for selecting which profile information to keep; its argument is the position identification information. If the calculation nodes 3-0 to 3-3 are not to select profile information, NULL is specified for SELECTOR.

COMPRESSOR in FIG. 15 specifies a function for compressing and integrating profile information. Code that includes the function specified as COMPRESSOR of the function profile_start functions as the compression integration code. The argument of the function is the execution time. For example, when the function ROUND10 is specified as COMPRESSOR, the ones digit of the measured execution time t is rounded off, so that t is rounded to a multiple of 10. When the function ROUND10000 is specified as COMPRESSOR, the thousands digit of the measured execution time t is rounded off, so that t is rounded to a multiple of 10,000.
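The rounding performed by ROUND10 and ROUND10000 can be sketched as follows. This is an illustration only: it assumes that execution times are non-negative integers and that "rounding off" means rounding half up. The helper round_to_multiple is a hypothetical name introduced here.

```python
def round_to_multiple(t, unit):
    """Round a non-negative integer t to the nearest multiple of unit (halves round up)."""
    return (t + unit // 2) // unit * unit

def ROUND10(t):
    """Round off the ones digit, giving a multiple of 10."""
    return round_to_multiple(t, 10)

def ROUND10000(t):
    """Round off the thousands digit, giving a multiple of 10,000."""
    return round_to_multiple(t, 10000)

# Two nearby measurements collapse to the same value after rounding.
same_after_rounding = ROUND10(12) == ROUND10(14)
```

Under these assumptions, ROUND10(14) gives 10 and ROUND10(15) gives 20, while ROUND10000(14999) gives 10000 and ROUND10000(15000) gives 20000.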

ANALYZER in FIG. 15 specifies a function for extracting the profile information to be analyzed from the compressed profile information. The argument of the function is a table of compressed profile information. If no profile information is to be extracted from the compressed profile information for analysis, NULL is specified for ANALYZER.

The compiler 12 can use default values for these parameters. By specifying options to the compiler 12, the parameters can also be replaced with user-specified processing.

The compiler 12 also designates the profile information processing position with "profile_dump(REGION_NAME)" (step C5). The function profile_dump is a library function that performs the following processes (1) and (2) at run time.

Process (1): Collect the profile information of the target range with range name REGION_NAME, which is distributed across the calculation nodes 3-0 to 3-N. To do this, the calculation nodes 3-0 to 3-N communicate with one another. For example, in FIG. 11, the calculation node 3-0 collects the profile information of the target range with range name R0 that is distributed across the calculation nodes 3-1 to 3-3. In this way, code that includes the function profile_dump functions as the aggregation code that aggregates the profile information of the designated target range of the SPMD program 11.

Process (2): Compress and integrate the profile information collected by each calculation node 3-N, using the functions specified by the parameters of profile_start. For example, in FIG. 11, the calculation node 3-0 collects the profile information of the target range with range name R0 that is distributed across the calculation nodes 3-1 to 3-3, and compresses and integrates the collected profile information. The profile information after compression and integration is as shown in FIG. 12.

The compiler 12 may also specify a list of range names (that is, multiple range names) as the range name REGION_NAME. In this case, processes (1) and (2) above are executed for multiple target ranges. When multiple range names are specified, the calculation nodes 3-0 to 3-3 compress and integrate the profile information of the target ranges with range names R0 to R3, respectively, so the compression and integration of the multiple pieces of profile information can be processed in parallel. Therefore, compared with the conventional method in which all profile information is collected and processed on the front-end node, the calculation node resources used (that is, the memory and disk usage on each calculation node) can be reduced. In addition, parallel processing by multiple calculation nodes improves processing speed.
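Process (1) with a list of range names can be sketched as a routing step. The sketch assumes that the i-th range name in the list is handled by calculation node 3-i, which matches the example of R0 to R3 on nodes 3-0 to 3-3 but is otherwise an assumption. It replaces inter-node communication with an in-memory dictionary, and the function name gather_by_owner and the sample records are hypothetical.

```python
def gather_by_owner(per_node_records, range_names):
    """Route every record to the node that owns its range name.

    per_node_records: {node_number: [(range_name, position, time), ...]}
    Returns {owner_node: [(source_node, range_name, position, time), ...]}.
    """
    # Assumption: the i-th range name is owned by calculation node 3-i.
    owners = {name: node for node, name in enumerate(range_names)}
    gathered = {node: [] for node in owners.values()}
    for source_node in sorted(per_node_records):
        for range_name, position, time in per_node_records[source_node]:
            if range_name in owners:
                gathered[owners[range_name]].append(
                    (source_node, range_name, position, time)
                )
    return gathered

# Hypothetical records measured on two calculation nodes for ranges R0 and R1.
per_node_records = {
    0: [("R0", ("main",), 10), ("R1", ("main",), 30)],
    1: [("R0", ("main",), 10), ("R1", ("main",), 50)],
}
gathered = gather_by_owner(per_node_records, ["R0", "R1"])
```

Each owner can then compress and integrate only its own range, so the work of process (2) is spread across the calculation nodes rather than concentrated on one of them.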

Next, the parameter COMPRESSOR in FIG. 15 will be described.

For example, when the four calculation nodes 3-0 to 3-3 execute the executable file 13 as shown in FIG. 16, the calculation node 3-0 acquires the profile information of Table 1 described above. When profile information is analyzed, fine differences in execution time are not important, so the calculation node 3-0 stores not the actual execution times shown in Table 1 but the execution times rounded by the parameter COMPRESSOR (here, the function ROUND10). Table 2 shows the result of the calculation node 3-0 rounding the execution times according to the parameter COMPRESSOR.

As Table 2 shows, rounding increases the likelihood that many calculation nodes hold profile information with the same execution time.

In step C4, the compiler 12 determines whether the target range includes communication processing and, depending on the result, can set a different rounding function for the parameter COMPRESSOR of the profile_start inserted in step C3. For example, for the target ranges in FIG. 9 that include communication processing (basic blocks 432 and 433, and basic block 43, which includes a call to the function H), the compiler 12 can change the parameter COMPRESSOR so that execution times are rounded with the function ROUND10000 instead of the function ROUND10. In this case, in step C4, the compiler 12 sets the function ROUND10000 as the parameter COMPRESSOR of profile_start when the target range includes communication processing, and sets the function ROUND10 when it does not. As a result, execution times in the profile information of target ranges that include communication processing are multiples of 10,000, and execution times in the profile information of target ranges that do not include communication processing are multiples of 10.
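The decision made in step C4 can be sketched as a simple selection. The sketch assumes that the compiler has already determined, for each target range, whether it contains communication processing; the function name choose_compressor and the range names RA and RB are hypothetical.

```python
def choose_compressor(contains_communication):
    """Return the name of the rounding function to pass as COMPRESSOR of profile_start."""
    return "ROUND10000" if contains_communication else "ROUND10"

# Hypothetical target ranges: name -> whether the range contains communication.
target_ranges = {"RA": False, "RB": True}
compressors = {
    name: choose_compressor(flag) for name, flag in target_ranges.items()
}
```

Because the two rounding units differ by three orders of magnitude, the granularity of a recorded execution time also indicates which kind of target range it came from.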

As a result, an analyst or analysis apparatus examining the aggregated profile information can distinguish at a glance between target ranges that include communication processing and those that do not, and can efficiently grasp the characteristics of the SPMD program 11. The function for rounding the execution time t is not limited to ROUND10 and ROUND10000; ROUND100, ROUND1000, or ROUND100000 may also be used.

Next, the parameter SELECTOR in FIG. 15 will be described.

For example, in FIG. 9, the target range with range name R4 (basic block 421) contained in the function G may be executed multiple times by the loop in the main function. Position identification information is therefore used to identify the loop iteration (that is, which execution of the target range is meant). Because the target range with range name R4 may be executed from the loop in the main function (hereinafter L0) as many times as the number of iterations designated by the loop index i, the position identification information is expressed as (main, L0, i) so that the loop iteration can be identified. For example, Table 3 shows the profile information when the loop in the main function iterates three times on the calculation nodes 3-0 to 3-3.

When the loop iterates many times, the amount of profile information data becomes enormous, so saving all of it is not realistic. In such a case, the compiler 12 sets the function PART as a selection function for the parameter SELECTOR of the profile_start inserted in step C3. The function PART uses the position identification information to decide which profile information to keep. For example, to acquire profile information for the first two loop iterations, the compiler 12 sets code that includes the function PART, as shown in FIG. 17, in the SPMD program 11.

In FIG. 17, the position information P is in list form, and P[x] extracts the x-th element. Also in FIG. 17, when the function PART set as the parameter SELECTOR returns 1, the execution time of the target range is measured; when it returns 0, the execution time is not measured. When the compiler 12 sets the function PART shown in FIG. 17 for the profile_start of the target range with range name R4, the calculation nodes 3-0 to 3-3 acquire the profile information shown in Table 4 (step E4).

Finally, the front-end node 2 acquires the profile information shown in Table 5 (step E7).

The function PART, as the selection function, is used when each of the calculation nodes 3-0 to 3-N starts measuring the execution time of the executable program 13 in step E3, to determine whether the node should start acquiring the execution time of the target range for the specified loop iteration.
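A selection function of the kind described for PART can be sketched as follows. FIG. 17 is not reproduced here, so the sketch makes two assumptions: the position identification information is the tuple (function, loop, loop index), indexed from zero, and the loop index i starts at 0, so that the first two iterations are i = 0 and i = 1.

```python
def PART(P):
    """Return 1 to measure the target range, 0 to skip it.

    P is position identification information such as ("main", "L0", i).
    Assumption: P[2] is the zero-based loop index i.
    """
    return 1 if P[2] < 2 else 0

# Decisions for a loop that iterates three times.
decisions = [PART(("main", "L0", i)) for i in range(3)]
```

With these assumptions, only the first two iterations are measured and the third is skipped, so the amount of profile information no longer grows with the iteration count.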

Next, the parameter ANALYZER in FIG. 15 will be described.

Consider a case in which the front-end node 2 is to acquire, instead of the profile information shown in Table 5, only the profile information for which the loop iteration is 0 or 1 and the execution time of the target range with range name R4 is the longest. In this case, the compiler 12 sets the function CHOOSE_MAX as an analysis function for the parameter ANALYZER of the profile_start inserted in step C3, and sets code that includes the function CHOOSE_MAX, as shown in FIG. 18, in the SPMD program 11. The function CHOOSE_MAX deletes the profile information whose execution time is 10 from the data shown in Table 4, so the profile information shown in Table 6 is finally created.

The function CHOOSE_MAX, as the analysis function, can be used when each of the calculation nodes 3-0 to 3-N integrates the profile information of a target range in step E5.
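An analysis function in the spirit of CHOOSE_MAX can be sketched as a filter over a table of compressed profile information. FIG. 18 and Table 4 are not reproduced here, so the row layout (nodes, range name, position, execution time) and the sample values are hypothetical; the sketch only assumes that rows with the longest execution time are kept and all others are dropped.

```python
def CHOOSE_MAX(table):
    """Keep only the rows whose execution time equals the maximum in the table.

    Each row is (nodes, range_name, position, time).
    """
    if not table:
        return []
    longest = max(row[3] for row in table)
    return [row for row in table if row[3] == longest]

# Hypothetical compressed table for range R4 over loop iterations 0 and 1.
table = [
    ([0, 1], "R4", ("main", "L0", 0), 10),
    ([2, 3], "R4", ("main", "L0", 0), 20),
    ([0, 1, 2], "R4", ("main", "L0", 1), 10),
    ([3], "R4", ("main", "L0", 1), 20),
]
selected = CHOOSE_MAX(table)
```

Applying the filter on each calculation node before the results are sent onward means the rows that will not be analyzed never have to be stored or transferred.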

The selection function is used in step E3, and the analysis function is used in step E5. These functions reduce the profile information acquired by each of the calculation nodes 3-0 to 3-N. When each of the calculation nodes 3-0 to 3-N uses a selection function or an analysis function, the temporary memory and disk space needed to acquire profile information can be reduced compared with a method in which the front-end node 2 aggregates all profile information and only then extracts the specific profile information of interest.

The following appendices are further disclosed regarding the embodiment described above.

(Appendix 1) A compiling method that causes a computer to execute: an insertion step of inserting, into a parallel computation program, an aggregation code for aggregating profile information, which is acquired by each parallel computation device and corresponds to a designated range of the parallel computation program, on the parallel computation device assigned to each designated range of the parallel computation program, and a compression integration code for compressing and integrating the aggregated profile information; and a compile step of compiling the parallel computation program into which the aggregation code and the compression code have been inserted. With this configuration, when each parallel computation device executes the parallel computation program as an executable file, each parallel computation device compresses and integrates the multiple pieces of profile information it has acquired, so the amount of storage medium needed to collect profile information can be reduced compared with the related art.

(Appendix 2) The compiling method according to appendix 1, wherein the insertion step further inserts, into the parallel computation program, at least one of a selection code for selecting each piece of profile information and an analysis code for extracting profile information to be analyzed from the compressed and integrated profile information. With this configuration, the profile information acquired by each parallel computation device is reduced, so the amount of storage medium needed to collect profile information can be reduced compared with the related art.

(Appendix 3) The compiling method according to appendix 1 or 2, wherein the compression integration code rounds the execution time information included in the profile information to a different number of digits depending on whether the designated range of the parallel computation program includes communication processing. With this configuration, an analyst or analysis apparatus examining the profile information can clearly distinguish ranges that include communication processing from ranges that do not, and can therefore efficiently grasp the characteristics of the parallel computation program.

(Appendix 4) The compiling method according to any one of appendices 1 to 3, further including a designation step of creating a function call graph of the entire parallel computation program and a control flow graph of each function, and designating the range for which the profile information is acquired in units of at least one of the functions and the basic blocks included in the function call graph and the control flow graphs. With this configuration, the range for which profile information is acquired when each parallel computation device executes the parallel computation program can be designated automatically.

(Appendix 5) A program that causes a computer to execute: an insertion step of inserting, into a parallel computation program, an aggregation code for aggregating profile information, which is acquired by each parallel computation device and corresponds to a designated range of the parallel computation program, on the parallel computation device assigned to each designated range of the parallel computation program, and a compression integration code for compressing and integrating the aggregated profile information; and a compile step of compiling the parallel computation program into which the aggregation code and the compression code have been inserted. This provides the same effect as appendix 1 above.

1: parallel computer; 2: front-end node; 3-1 to 3-N: calculation nodes; 11: SPMD program; 12: compiler; 13: executable program; 21: CPU; 22: memory; 23: hard disk drive (HDD); 24: communication interface; 25: input unit; 26: output unit

Claims

A compiling method that causes a computer to execute:
an insertion step of inserting, into a parallel computation program, an aggregation code for aggregating profile information, which is acquired by each parallel computation device and corresponds to a designated range of the parallel computation program, on the parallel computation device assigned to each designated range of the parallel computation program, and a compression integration code for compressing and integrating the aggregated profile information; and
a compile step of compiling the parallel computation program into which the aggregation code and the compression code have been inserted.

The compiling method according to claim 1, wherein the insertion step further inserts, into the parallel computation program, at least one of a selection code for selecting each piece of profile information and an analysis code for extracting profile information to be analyzed from the compressed and integrated profile information.

The compiling method according to claim 1 or 2, wherein the compression integration code rounds the execution time information included in the profile information to a different number of digits depending on whether the designated range of the parallel computation program includes communication processing.