JPH0744507A - Compile method for parallel program - Google Patents

Compile method for parallel program

Info

Publication number
JPH0744507A
Authority
JP
Japan
Prior art keywords
program
processor
implementation
execution
subprogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP21000693A
Other languages
Japanese (ja)
Inventor
Takayuki Nakagawa
貴之 中川
Machiko Asaya
真知子 朝家
Toshiaki Tarui
俊明 垂井
Tokuyasu Imon
徳安 井門
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Institute of Advanced Industrial Science and Technology AIST
Original Assignee
Agency of Industrial Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agency of Industrial Science and Technology
Priority to JP21000693A
Publication of JPH0744507A
Legal status: Pending

Landscapes

  • Devices For Executing Special Programs (AREA)
  • Multi Processors (AREA)

Abstract

PURPOSE: To realize efficient load distribution in a parallel computer system without burdening the programmer, by adding the number of times each subprogram was executed in a past run as input information at compile time and by mapping subprograms to processors at compile time. CONSTITUTION: A source program 1 is input to a compiler 2 and passed through hardware execution 3 to obtain the subprogram execution counts 4. A new source program is then obtained by precompile processing 5 based on the subprogram execution counts 4, the maximum number of execution processors 6, and the source program 1, and it is compiled and executed again. Because subprograms are mapped to processors at compile time, the run-time load is reduced, and because past execution information is used, the programmer's burden does not increase. Furthermore, the tuning procedure is simplified by specifying a maximum number of processors.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a compiling method aimed at speeding up program execution in a parallel computer system. The speed-up is achieved by specifying a moderate number of processors and, at compile time, directing that the processors within this range be used effectively on the basis of information from past executions.

[0002]

[Prior Art] Equalizing the load executed by each processor in a parallel computer system has been discussed in "Stack-splitting dynamic load balancing method and its evaluation on the Multi-PSI" (Furuichi et al., KL1 Programming Workshop Proceedings, pp. 51 ff., ICOT, 1991).

[0003] In the above prior art, each time a subprogram that the program has designated in advance as suitable for distribution needs to be executed, a load balancing subprogram, which decides the destination processor according to the state of the processors at run time, is queried, and the subprogram is distributed to an idle processor. Each processor registers its not-yet-executed subprograms in a management table organized by priority and executes them in descending order of priority. The lowest-priority subprogram registers its own processor number with the load balancing subprogram, which makes it possible to detect idle processors.
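The run-time scheme described above can be pictured with a small simulation. This is only an illustrative sketch in Python, not the KL1 implementation from the cited paper: the class names `LoadBalancer` and `Processor`, the task name `register_idle`, and the priority values are all assumptions introduced here.

```python
import heapq
from collections import deque


class LoadBalancer:
    """Central registry of processors that have reported themselves idle."""

    def __init__(self):
        self.idle = deque()

    def register_idle(self, pid):
        if pid not in self.idle:
            self.idle.append(pid)

    def pick_target(self, fallback):
        # Queried at run time whenever a distributable subprogram is called.
        return self.idle.popleft() if self.idle else fallback


class Processor:
    REGISTER_IDLE = "register_idle"  # hypothetical name of the lowest-priority task

    def __init__(self, pid, balancer):
        self.pid = pid
        self.balancer = balancer
        self.pending = []    # heap of (-priority, sequence, name)
        self.sequence = 0
        self.executed = []

    def submit(self, priority, name):
        heapq.heappush(self.pending, (-priority, self.sequence, name))
        self.sequence += 1

    def run_one(self):
        """Run the highest-priority pending subprogram; return False if none."""
        if not self.pending:
            return False
        _, _, name = heapq.heappop(self.pending)
        if name == self.REGISTER_IDLE:
            # Only reached when nothing more important is pending.
            self.balancer.register_idle(self.pid)
        else:
            self.executed.append(name)
        return True


def demo():
    balancer = LoadBalancer()
    procs = [Processor(pid, balancer) for pid in range(2)]
    for proc in procs:
        proc.submit(0, Processor.REGISTER_IDLE)  # extra task on every processor
    procs[0].submit(5, "work")                   # only processor 0 is busy
    for proc in procs:
        proc.run_one()
    return balancer.pick_target(fallback=0), procs[0].executed
```

In this sketch processor 1 has nothing but the registration task, so it reports itself idle and becomes the next distribution target. The registration task and the balancer query are both extra run-time work, which is the overhead the invention aims to avoid.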

[0004] In the above prior art, the processing for detecting idle processors is added on top of the program's own processing.

[0005]

[Problems to Be Solved by the Invention] The above prior art gives no consideration to the urgency of load balancing: the subprogram that registers idle processors is first distributed to all processors, and the registration information is then concentrated on a specific processor, so distribution is delayed in the early stage of program execution. In addition, because this registration is performed at run time, it increases the amount of processing. Furthermore, the programmer has to specify which subprograms may be distributed. An object of the present invention is to solve all of the above problems and to realize load balancing in a parallel computer system that is efficient and places no burden on the programmer.

[0006]

[Means for Solving the Problems] To achieve the above object, the present invention adds the number of times each subprogram was executed in a past run as input information at compile time, and maps subprograms to processors at compile time. A means is also provided for specifying a maximum number of processors and reflecting it in the compilation result.

[0007]

[Operation] In the present invention, subprograms are mapped to processors at compile time, so the run-time load can be reduced. Because past execution information can be used, little is added to the programmer's burden. In addition, specifying a maximum number of processors simplifies the otherwise complicated tuning procedure for finding the optimum number of processors.

[0008]

[Embodiment] An embodiment of the present invention will be described below with reference to FIGS. 1 to 6. FIG. 1 is a system configuration diagram of an embodiment of the present invention. A source program 1 is input to a compiler 2 and passes through hardware execution 3, yielding the subprogram execution counts 4. Precompile processing 5, using the subprogram execution counts 4, the maximum number of execution processors 6, and the source program 1, produces a new source program, which is compiled and executed again.
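The first pass of FIG. 1 (compile, execute, collect subprogram execution counts) resembles ordinary call-count profiling. The following Python sketch illustrates the idea under stated assumptions: the decorator `counted`, the helper `profile`, and the toy subprograms `main`, `p`, and `a` are hypothetical and are not taken from the patent's figures. Only the figure of 21 calls to p comes from the description.

```python
import functools
from collections import Counter

# Execution counts gathered during a profiling run (item 4 in FIG. 1).
execution_counts = Counter()


def counted(fn):
    """Wrap a subprogram so that every call is counted under its name."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        execution_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


@counted
def a(x):
    return x + 1


@counted
def p(x):
    return a(x) * 2


@counted
def main():
    # p is called 21 times, matching the count quoted for FIG. 4.
    return sum(p(i) for i in range(21))


def profile(entry):
    """Run the program once and return a snapshot of the counts."""
    execution_counts.clear()
    entry()
    return dict(execution_counts)
```

Calling `profile(main)` gives one count per subprogram; a table of this kind is what the precompile step would take as input on the second pass.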

[0009] This embodiment is described using a precompiler as an example, but the precompiler may be integrated into the compiler itself.

[0010] The precompiler can be realized as shown in the precompiler control flow of FIG. 2: after subprogram call relation tree creation 21 and subprogram execution count input 22, it performs maximum execution processor count input 23, processor allocation 24, and output generation 25. These steps use their data as follows.

[0011] (1) Subprogram call relation tree creation 21 reads a source program such as that of FIG. 5 and creates a subprogram call relation tree such as that of FIG. 3.
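As an illustration of step (1), the sketch below extracts a caller-to-callee relation from source text. It uses Python and its standard `ast` module as a stand-in for the patent's source language, and the program in `EXAMPLE_SOURCE` is a hypothetical one: FIGS. 3 and 5 are not reproduced in this text, so the shape of the graph is an assumption.

```python
import ast


def build_call_graph(source):
    """Map each top-level function to the sorted names of functions it calls."""
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    graph = {}
    for func in functions:
        callees = set()
        for node in ast.walk(func):
            # Keep only direct calls to functions defined in this program.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    callees.add(node.func.id)
        graph[func.name] = sorted(callees)
    return graph


# Hypothetical program: main calls p and b, p calls a and c, b calls c.
EXAMPLE_SOURCE = '''
def main():
    for i in range(21):
        p(i)
    b(0)

def p(x):
    a(x)
    c(x)

def a(x):
    pass

def b(x):
    c(x)

def c(x):
    pass
'''
```

`build_call_graph(EXAMPLE_SOURCE)` returns a dictionary from each subprogram to its callees. Built-in calls such as `range` are filtered out because only subprograms defined in the program are candidates for distribution.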

[0012] (2) Subprogram execution count input 22 obtains the number of times each subprogram was executed in a past run, as in FIG. 4.

[0013] (3) Maximum execution processor count input 23 obtains the maximum number of execution processors, for example "30".

[0014] (4) Processor allocation 24 allocates processors to subprograms in order, starting from the subprograms nearest the root of the call relation tree, without exceeding the maximum number of execution processors "30". In the case of FIG. 3, FIG. 4 shows that subprogram p is executed 21 times, so each call of p is allocated to a different processor, and the other subprograms are executed on the processor that called them. Since a is called only from p, it is regarded as part of p and is not a load balancing target. If b or c were load-balanced, FIG. 4 shows that the number of execution processors would be 21 + 101 = 122, which exceeds the maximum number of execution processors "30", so b and c are not load balancing targets.
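The decision in step (4) can be sketched as a breadth-first walk from the root of the call relation tree that keeps a running total of execution counts. This is one possible reading of the paragraph, not the patent's stated algorithm. The call graph, the counts for a, b, and c, and the rule that a subprogram is folded into its callers when all of them are already distributed are assumptions made for illustration. Only p = 21, the value 101, and the limit 30 come from the description.

```python
from collections import deque


def choose_distributed(call_graph, counts, root, max_procs):
    """Pick subprograms to distribute, nearest the root first, within max_procs."""
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    distributed = []   # subprograms whose calls go to different processors
    used = 0           # execution processors consumed so far
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for callee in call_graph.get(node, []):
            if callee in seen:
                continue
            seen.add(callee)
            queue.append(callee)
            # Called only from distributed subprograms: treat it as part of them.
            if callers[callee] <= set(distributed):
                continue
            if used + counts[callee] <= max_procs:
                distributed.append(callee)
                used += counts[callee]
    return distributed, used


# Hypothetical inputs; only p = 21, the 101 figure, and the limit 30 are from the text.
CALL_GRAPH = {"main": ["p", "b"], "p": ["a", "c"], "b": ["c"]}
COUNTS = {"main": 1, "p": 21, "a": 21, "b": 101, "c": 101}
```

With a limit of 30, `choose_distributed(CALL_GRAPH, COUNTS, "main", 30)` selects only p: adding b or c would bring the total to 21 + 101 = 122, and a is folded into p because p is its only caller.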

[0015] (5) Output generation 25 produces output such as that of FIG. 6. Here, call(pd(N1))@node(N) specifies that the subprogram pd(N1) is to be executed on processor number N. N starts at 0 and is incremented by 1 on each call of the precompiler-generated subprogram pd, and the remainder operation written as mod keeps it an integer that does not exceed the maximum number of execution processors "30".
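The numbering of N described in step (5) is a round-robin assignment. A minimal sketch in Python, assuming the counter is taken modulo the maximum number of execution processors (the exact expression in FIG. 6 is not reproduced here, and the function names are hypothetical):

```python
import itertools


def make_node_selector(max_processors):
    """Return a function giving the processor number for each successive call."""
    calls = itertools.count()  # 0, 1, 2, ...

    def next_node():
        # The remainder keeps the processor number in 0 .. max_processors - 1.
        return next(calls) % max_processors

    return next_node


def assign_calls(num_calls, max_processors):
    """Processor numbers for num_calls successive calls of the subprogram."""
    next_node = make_node_selector(max_processors)
    return [next_node() for _ in range(num_calls)]
```

For the 21 calls of p with a limit of 30, `assign_calls(21, 30)` yields processors 0 through 20, so every call lands on a different processor. With more than 30 calls the numbers would wrap around to 0.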

[0016] As described above, this embodiment has the effect of realizing simple and efficient load balancing control in a parallel computer system.

[0017]

[Effects of the Invention] According to the present invention, run-time information can be reflected at compile time, so simple load balancing control can be performed and the overhead of parallel processing is reduced. In addition, the maximum number of execution processors is a simple guideline for specifying the degree of parallelism of the program, so using it as the first step of tuning adds little to the programmer's effort.

[Brief Description of the Drawings]

FIG. 1 is a system configuration diagram of an embodiment of the present invention.

FIG. 2 is a diagram showing the precompiler control flow according to the present invention.

FIG. 3 is a diagram showing an example of a subprogram call relation tree according to the present invention.

FIG. 4 is a diagram showing an example of subprogram execution count output.

FIG. 5 is a diagram showing an example of a source program.

FIG. 6 is a diagram showing an example of precompiler output according to the present invention.

[Explanation of Symbols]

21: subprogram call relation tree creation; 22: subprogram execution count input; 23: maximum execution processor count input; 24: processor allocation; 25: output generation.

Continuation of front page: (72) Inventor Tokuyasu Imon, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo

Claims (1)

[Claims] 1. A method of compiling a parallel program, wherein, when a program composed of a plurality of subprograms to be executed in parallel on a plurality of processors is compiled, each subprogram is allocated to a different processor for each of its executions, within a range not exceeding the number of processors, on the basis of an execution count indicating how many times each subprogram is executed when the program runs and the number of the plurality of processors.
JP21000693A 1993-08-03 1993-08-03 Compile method for parallel program Pending JPH0744507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP21000693A JPH0744507A (en) 1993-08-03 1993-08-03 Compile method for parallel program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP21000693A JPH0744507A (en) 1993-08-03 1993-08-03 Compile method for parallel program

Publications (1)

Publication Number Publication Date
JPH0744507A (en) 1995-02-14

Family

ID=16582284

Family Applications (1)

Application Number Title Priority Date Filing Date
JP21000693A Pending JPH0744507A (en) 1993-08-03 1993-08-03 Compile method for parallel program

Country Status (1)

Country Link
JP (1) JPH0744507A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009169862A (en) * 2008-01-18 2009-07-30 Panasonic Corp Program conversion device, method, program and recording medium
KR101281625B1 (en) * 2011-09-02 2013-07-03 고려대학교 산학협력단 The method for allocating input data and the apparatus for the same

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02183833A (en) * 1989-01-11 1990-07-18 Agency Of Ind Science & Technol Compiling system for multiprocessor
