JPH0744507A - Compile method for parallel program

Info

Publication number: JPH0744507A
Application number: JP21000693A
Authority: JP
Inventors: Takayuki Nakagawa; 貴之中川; Machiko Asaya; 真知子朝家; Toshiaki Tarui; 俊明垂井; Tokuyasu Imon; 徳安井門
Original assignee: Agency of Industrial Science and Technology
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 1993-08-03
Filing date: 1993-08-03
Publication date: 1995-02-14

Abstract

PURPOSE: To realize efficient load distribution in a parallel computer system without imposing a burden on the programmer, by adding the number of executions of each subprogram in past executions as input information at compile time and mapping the subprograms to processors at compile time. CONSTITUTION: A source program 1 is input to a compiler 2 and passed to a hardware execution step 3, and the subprogram execution counts 4 are obtained. A precompile processing 5 then produces a new source program from the subprogram execution counts 4, a maximum number of execution processors 6, and the source program 1, and compilation and execution are performed again. Because the subprograms are mapped to processors at compile time, the run-time load is reduced, and because past execution information is used, the programmer's burden is not increased. Furthermore, the tuning procedure is simplified by designating a maximum number of processors.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a compile method aimed at speeding up program execution in a parallel computer system. The speed-up is achieved by specifying a suitable number of processors and, at compile time, directing the program to make effective use of the processors within that range on the basis of information from past executions.

[0002]

[Prior Art] Equalizing the load executed by each processor in a parallel computer system has been discussed in "Stack-splitting dynamic load balancing method and its evaluation on the Multi-PSI" (Furuichi et al., KL1 Programming Workshop Proceedings, pp. 51 ff., ICOT, 1991).

[0003] In the above prior art, each time a subprogram that the program has designated in advance as suitable for distribution needs to be executed, a load-balancing subprogram is queried. This subprogram decides the destination processor according to the state of the processors at run time, and the subprogram is distributed to an idle processor. Each processor registers its unexecuted subprograms in management tables kept per priority and executes them in descending order of priority. A subprogram of the lowest priority registers its own processor number with the load-balancing subprogram, which makes it possible to detect idle processors.
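The idle-detection idea in this prior art can be pictured with a small Python sketch. It is only an illustration of the mechanism as summarized above, not the cited system: the class names, the numeric priority convention, and the single shared `LoadBalancer` object are all assumptions made for the example.

```python
import heapq
from collections import deque

IDLE_PRIORITY = 0  # assumed convention: larger numbers run first, 0 is the lowest


class LoadBalancer:
    """Holds the numbers of processors that have reported themselves idle."""

    def __init__(self):
        self.idle = deque()

    def register_idle(self, proc_id):
        if proc_id not in self.idle:
            self.idle.append(proc_id)

    def pick_destination(self, default):
        # Hand out a registered idle processor, otherwise keep the work local.
        return self.idle.popleft() if self.idle else default


class Processor:
    def __init__(self, proc_id, balancer):
        self.proc_id = proc_id
        self.balancer = balancer
        self.queue = []  # max-heap on priority, built with negated keys
        self.seq = 0     # tie-breaker so equal priorities run in FIFO order
        self.log = []

    def submit(self, priority, name):
        heapq.heappush(self.queue, (-priority, self.seq, name))
        self.seq += 1

    def step(self):
        """Run the highest-priority pending subprogram, if there is one."""
        if not self.queue:
            return False
        neg_priority, _, name = heapq.heappop(self.queue)
        if -neg_priority == IDLE_PRIORITY:
            # Reached only when nothing more important is pending here.
            self.balancer.register_idle(self.proc_id)
        self.log.append(name)
        return True
```

A processor that has real work never reaches its lowest-priority entry, so only processors with nothing else to do end up registered. The sketch also makes the cost visible: every registration is extra work performed at run time, which is the overhead the invention sets out to avoid.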

[0004] In the above prior art, the processing for detecting idle processors is added on top of the program's own processing.

[0005]

[Problems to be Solved by the Invention] The above prior art gives no consideration to the urgency of load balancing. Because the subprogram that registers idle processors is first distributed to all processors and the registration information is then concentrated on a specific processor, distribution is delayed in the early phase of program execution. In addition, because this registration is performed at run time, it increases the amount of processing. Furthermore, the programmer has to take the trouble of specifying which subprograms may be distributed. An object of the present invention is to solve all of these problems and to realize load balancing in a parallel computer system that is efficient and places no burden on the programmer.

[0006]

[Means for Solving the Problems] To achieve the above object, the present invention adds the execution counts of subprograms in past executions as input information at compile time, and maps the subprograms to processors at compile time. A means is also provided for specifying the maximum number of processors and reflecting it in the compilation result.

[0007]

[Operation] Because the present invention maps subprograms to processors at compile time, the run-time burden can be reduced. Because past execution information can be used, the burden on the programmer hardly increases. Furthermore, specifying the maximum number of processors simplifies the otherwise complicated tuning procedure for finding the optimum number of processors.

[0008]

[Embodiment] An embodiment of the present invention is described below with reference to FIGS. 1 to 6. FIG. 1 is a system configuration diagram of an embodiment of the present invention. Source program 1 is input to compiler 2 and passes through hardware execution 3, which yields subprogram execution counts 4. Precompile processing 5, using subprogram execution counts 4, maximum number of execution processors 6, and source program 1, produces a new source program, which is then compiled and executed again.
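The text does not say how the execution counts 4 are gathered during the first run. As one possible illustration, the Python sketch below tallies calls with a decorator. The decorator `counted`, the helper `profile`, and the toy subprograms `main`, `p`, and `leaf` are all hypothetical names introduced for this example.

```python
from collections import Counter
from functools import wraps

execution_counts = Counter()  # plays the role of item 4 in FIG. 1


def counted(fn):
    """Wrap a subprogram so that each invocation is tallied under its name."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        execution_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


@counted
def leaf(n):
    return n * n


@counted
def p(n):
    return leaf(n) + 1


@counted
def main(k):
    return sum(p(i) for i in range(k))


def profile(entry, *args):
    """First pass of the flow: run once and return a snapshot of the counts."""
    execution_counts.clear()
    entry(*args)
    return dict(execution_counts)


counts = profile(main, 3)
print(counts)  # {'main': 1, 'p': 3, 'leaf': 3}
```

The resulting dictionary is the kind of data that precompile processing 5 would take as input, together with the maximum number of execution processors 6 and the original source.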

[0009] This embodiment is described using a precompiler as an example, but the precompiler may be integrated into the compiler proper.

[0010] As shown in the precompiler control flow of FIG. 2, the precompiler can be implemented by performing subprogram call relation tree creation 21 and subprogram execution count input 22, followed by maximum execution processor count input 23, processor allocation 24, and output generation 25. These data are used as follows.

[0011] (1) Subprogram call relation tree creation 21 reads a source program such as the one in FIG. 5 and creates a subprogram call relation tree such as the one in FIG. 3.

[0012] (2) Subprogram execution count input 22 obtains the execution counts of the subprograms in a past execution, such as those in FIG. 4.

[0013] (3) Maximum execution processor count input 23 obtains the maximum number of execution processors, for example "30".

[0014] (4) Processor allocation 24 allocates processors, without exceeding the maximum number of execution processors "30", in order starting from the subprograms closest to the root of the subprogram call relation tree. In the case of FIG. 3, subprogram p is executed 21 times according to FIG. 4, so each call of p is assigned to a different processor, and the other subprograms are executed on the processor that called them. Because a is called only from p, it is regarded as part of p and is not a load balancing target. If b or c were load-balanced, FIG. 4 gives 21 + 101 = 122 execution processors, which exceeds the maximum number of execution processors "30", so b and c are not load balancing targets.

[0015] (5) Output generation 25 produces output such as that in FIG. 6. Here, call(pd(N1))@node(N) specifies that subprogram pd(N1) is executed on processor number N. N starts at 0 and is incremented by 1 on each call of the precompiler-generated subprogram pd, and the remainder operation written as mod keeps it an integer that does not exceed the maximum number of execution processors "30".
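Steps (4) and (5) can be sketched in Python as follows. This is one reading of the allocation rule, not the patent's own code. FIGS. 3 to 5 are not reproduced here, so the call tree below, the count for a, and assigning 101 to both b and c are assumptions; only p = 21, the figure 101, and the limit 30 come from the text. The function names are hypothetical.

```python
from collections import deque
from itertools import count, islice


def choose_distributed(calls, counts, root, max_processors):
    """Walk the call tree breadth-first from the root and pick the
    subprograms whose calls are spread over different processors."""
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    chosen, used = [], 0
    seen, queue = {root}, deque(calls.get(root, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        queue.extend(calls.get(name, []))
        called_from = callers[name]
        # Called only from an already distributed subprogram: treat as part of it.
        if len(called_from) == 1 and called_from <= set(chosen):
            continue
        # Distribute only while the total stays within the processor limit.
        if used + counts[name] <= max_processors:
            chosen.append(name)
            used += counts[name]
    return chosen


def node_numbers(max_processors):
    """Yield 0, 1, 2, ... reduced by mod so each value stays below the limit."""
    for n in count():
        yield n % max_processors


# Hypothetical call tree and counts (see the assumptions stated above).
calls = {"main": ["p", "b"], "p": ["a", "b", "c"], "b": ["c"]}
counts = {"p": 21, "a": 21, "b": 101, "c": 101}

print(choose_distributed(calls, counts, "main", 30))  # ['p']
print(list(islice(node_numbers(30), 21)) == list(range(21)))  # True
```

With these inputs only p is distributed: a is folded into p, and adding b or c would bring the total to 21 + 101 = 122, above the limit of 30. The 21 calls of p then receive node numbers 0 through 20, and the mod wrap-around only takes effect once the number of calls exceeds the limit.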

[0016] As described above, this embodiment achieves simple and efficient load balancing control in a parallel computer system.

[0017]

[Effects of the Invention] According to the present invention, run-time information can be reflected at compile time, so load balancing control can be kept simple and the overhead of parallel processing is reduced. In addition, the maximum number of execution processors is a simple guide for specifying the program's degree of parallelism, so using it as the first step of tuning adds little to the programmer's effort.

[Brief description of drawings]

FIG. 1 is a system configuration diagram of an embodiment of the present invention.

FIG. 2 is a diagram showing the precompiler control flow according to the present invention.

FIG. 3 is a diagram showing an example of a subprogram call relation tree according to the present invention.

FIG. 4 is a diagram showing an example of subprogram execution count output.

FIG. 5 is a diagram showing an example of a source program.

FIG. 6 is a diagram showing an example of precompiler output according to the present invention.

[Explanation of symbols]

21: subprogram call relation tree creation; 22: subprogram execution count input; 23: maximum execution processor count input; 24: processor allocation; 25: output generation.

Continued from front page: (72) Inventor Tokuyasu Imon, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo

Claims

[Claims]

1. A compile method for a parallel program, wherein, when compiling a program composed of a plurality of subprograms to be executed in parallel by a plurality of processors, each subprogram is allocated to a different processor according to its number of executions, on the basis of the number of executions, which indicates how many times each subprogram was executed when the program was executed, and the number of the plurality of processors.