JP2910676B2

JP2910676B2 - Load equalizer

Info

Publication number: JP2910676B2
Application number: JP8144929A
Authority: JP
Inventors: 彰一左近
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1996-05-15
Filing date: 1996-05-15
Publication date: 1999-06-23
Anticipated expiration: 2016-05-15
Also published as: JPH09305552A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a load equalization device that equalizes the load on each node in a distributed parallel processing system composed of a plurality of nodes, in which the nodes execute a single job in parallel.

[0002]

[Prior art] In systems having a plurality of nodes, each composed of a processor, memory, and the like, the load on the nodes has conventionally been equalized. For example, in Japanese Patent Application Laid-Open No. 3-262074, in order to dynamically equalize the load in a loosely coupled parallel computer, a lowest-priority job is kept running on the slave processors other than the master processor, the load of each slave processor is dynamically monitored based on the responses from this job, and jobs are allocated to idle slave processors. In Japanese Patent Application Laid-Open No. 1-152571, the iterations of a loop are divided equally by the total number of processors and the resulting portions are allocated to the processors, thereby equalizing the load.

[0003]

[Problems to be solved by the invention] Of the prior art described above, the former uses the job as the unit of allocation and therefore cannot be applied to a distributed parallel processing system in which a plurality of nodes execute a single job (a distributed parallel processing program) in parallel. The latter technique, by contrast, can equalize the amount of work performed by each node in a distributed parallel processing system and thereby equalize the load on each node. However, the latter technique presupposes that all nodes have the same performance. When node performance differs, the amount of work performed by each node can be equalized, but the load on each node cannot. Consequently, when nodes of different performance are mixed, the high-performance nodes spend much of their time idle and the performance of the distributed parallel processing system cannot be fully exploited. In other words, the execution time of the program becomes longer.
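The idle-time problem described above can be illustrated with a small calculation. The following is a minimal sketch under stated assumptions: the function name `makespan`, the three per-unit times, and the 400-unit shares are hypothetical values chosen for illustration and do not appear in the publication.

```python
def makespan(work_units, seconds_per_unit):
    """Return the overall finish time and the idle time of each node."""
    finish = [w * s for w, s in zip(work_units, seconds_per_unit)]
    end = max(finish)                    # the job ends when the slowest node ends
    idle = [end - f for f in finish]     # time each node waits for the slowest one
    return end, idle

# Hypothetical nodes needing 1, 2 and 4 seconds per unit of work.
seconds_per_unit = [1.0, 2.0, 4.0]

# Equal division of 1200 units, as in the loop-splitting prior art.
end, idle = makespan([400, 400, 400], seconds_per_unit)
print(end, idle)
```

With these assumed figures the fastest node is idle for three quarters of the run, which is the loss of performance the paragraph describes.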

[0004] Accordingly, an object of the present invention is to provide a load equalization device that, in a distributed parallel processing system composed of a plurality of nodes, can fully exploit the performance of the system by equalizing the load on each node even when the nodes differ in performance.

[0005]

[Means for solving the problems] To achieve the above object, the present invention provides a load equalization device in a distributed parallel processing system composed of a plurality of nodes, the device comprising: a compiler that translates a source program to generate an execution program that can be processed in parallel and that also generates, based on the source program, a performance measurement program for measuring the performance of each node; and a scheduler that causes each node to execute the performance measurement program to measure the performance of each node and then assigns the processing of the execution program to the nodes according to the performance ratio of the nodes.

[0006] With the above configuration, the compiler translates the source program to generate an execution program that can be processed in parallel and also generates a performance measurement program based on the source program, and the scheduler assigns the processing of the execution program to the nodes according to the performance ratio of the nodes, measured by having each node execute the performance measurement program.

[0007] Further, so that the performance of each node can be measured with a performance measurement program suited to the program to be executed, the compiler of the present invention comprises a most frequently executed part extraction unit that extracts the most frequently executed part of the source program, and a performance measurement program generation unit that generates the performance measurement program based on the most frequently executed part extracted by the extraction unit.

[0008] With the above configuration, the most frequently executed part extraction unit extracts the most frequently executed part of the source program, and the performance measurement program generation unit generates the performance measurement program based on the part so extracted.

[0009]

[Embodiments of the invention] Embodiments of the present invention will now be described in detail with reference to the drawings.

[0010] FIG. 1 is a block diagram showing an embodiment of the present invention, which comprises a load equalization device 1 including a compiler 11 and a scheduler 14, and a distributed parallel processing system 2 including nodes 21-1 to 21-N and a network 22 connecting the nodes. Each of the nodes 21-1 to 21-N is composed of a processor and a memory.

[0011] The compiler 11 has a function of translating the input source program 3 to generate an execution program 12 that can be processed in parallel, and a function of generating a performance measurement program 13 based on the most frequently executed part of the source program 3.

[0012] The scheduler 14 has a function of causing each of the nodes 21-1 to 21-N to execute the performance measurement program 13 and thereby measuring the performance of each node, and a function of assigning the processing of the execution program 12 to the nodes 21-1 to 21-N according to their performance.

[0013] Next, the operation will be described.

[0014] When the source program 3 is input, the compiler 11 first performs lexical and syntactic analysis and then extracts the most frequently executed part of the source program 3. Various methods can be used to extract the most frequently executed part, for example: statically examining the program structure and, based on the nesting of procedure calls, extracting the statements in the innermost loop of a loop structure; collecting an execution profile of the program and extracting the most frequently executed part from it; or extracting the part designated by a user directive line.
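The first, static method can be sketched as follows. This is only an illustration under stated assumptions: it analyzes Python source with the standard `ast` module as a stand-in for the HPF front end, the function name `innermost_loop_statements` is hypothetical, and it ignores the procedure-call nesting that the publication also mentions.

```python
import ast

LOOP_TYPES = (ast.For, ast.While)

def innermost_loop_statements(source: str) -> list[str]:
    """Return the statements found in loops that contain no other loop."""
    tree = ast.parse(source)
    result = []
    for node in ast.walk(tree):
        if isinstance(node, LOOP_TYPES):
            # A loop is innermost when none of its descendants is itself a loop.
            has_nested_loop = any(
                isinstance(child, LOOP_TYPES)
                for child in ast.walk(node)
                if child is not node
            )
            if not has_nested_loop:
                result.extend(ast.unparse(stmt) for stmt in node.body)
    return result

example = """
for j in range(m):
    for i in range(n):
        a[i] = b[i] + c[i]
"""
print(innermost_loop_statements(example))
```

When a program has several innermost loops this sketch returns the statements of all of them, so a profile or a user directive would still be needed to pick one.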

[0015] The compiler 11 then generates, based on the operations used in the extracted most frequently executed part, the performance measurement program 13 for measuring the computational performance of each of the nodes 21-1 to 21-N. The compiler 11 further generates, based on the result of the lexical and syntactic analysis, the execution program 12 that can be processed in parallel by the distributed parallel processing system 2.

[0016] When the execution program 12 generated by the compiler 11 is to be executed on the distributed parallel processing system 2, the scheduler 14 first has each of the nodes 21-1 to 21-N execute the performance measurement program 13 and measures the performance of each node. The scheduler 14 then obtains the performance ratio of the nodes 21-1 to 21-N and, based on that ratio, assigns the processing of the execution program 12 to the nodes. As a result, more processing is assigned to high-performance nodes and less to low-performance nodes, so the load on the nodes 21-1 to 21-N can be equalized. The execution times of the nodes 21-1 to 21-N are thereby equalized and no node sits idle, so the execution time of the program can be shortened.
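The effect of assigning work in proportion to performance can be checked with a short calculation. This is a minimal sketch, assuming hypothetical nodes that need 1, 2 and 4 seconds per unit of work (relative performance 4:2:1) and a hypothetical total of 700 units; none of these figures come from the publication.

```python
def finish_times(units_per_node, seconds_per_unit):
    """Time at which each node completes its assigned share."""
    return [u * s for u, s in zip(units_per_node, seconds_per_unit)]

# Hypothetical per-unit times; the performance ratio is therefore 4:2:1.
seconds_per_unit = [1.0, 2.0, 4.0]

# 700 units divided in the ratio 4:2:1.
proportional_split = [400, 200, 100]

print(finish_times(proportional_split, seconds_per_unit))
```

All three nodes finish at the same moment under these assumptions, so none of them waits for another.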

[0017] FIG. 2 is a block diagram of a working example of the present invention, which comprises a load equalization device 1 including a compiler 11 and a scheduler 14, and a distributed parallel processing system 2 including three nodes A, B, and C and a network 22 connecting the nodes.

[0018] The compiler 11 comprises: a lexical and syntactic analysis unit 111 that reads the source program 3 and performs lexical and syntactic analysis on it; a most frequently executed part extraction unit 112 that extracts the most frequently executed part of the source program 3; a performance measurement program generation unit 113 that generates, based on the operations used in the part extracted by the extraction unit 112, the performance measurement program 13 for measuring the computational performance of each of the nodes A, B, and C; and a code generation unit 114 that generates, based on the analysis result of the lexical and syntactic analysis unit 111, the execution program 12 composed of a plurality of processes that can be processed in parallel.

[0019] The scheduler 14 comprises: a node performance measurement unit 141 that measures the performance of each of the nodes A, B, and C by having each node execute the performance measurement program 13; a performance ratio calculation unit 142 that calculates the performance ratio of the nodes A, B, and C based on performance information indicating the performance of each node; a program allocation unit 143 that decides, according to the performance ratio calculated by the performance ratio calculation unit 142, how the plurality of parallel-processable processes constituting the execution program 12 are to be assigned to the nodes A, B, and C; and a program execution unit 144 that assigns the processes to the nodes A, B, and C in accordance with the decision of the program allocation unit 143.

[0020] FIG. 3 shows an example of the source program 3. The HPF (High Performance Fortran) source program shown in FIG. 3 instructs that the summation of 10,000 numbers be processed in a distributed parallel manner by six MPI (Message Passing Interface) processes. Its third line specifies that the number of processors is six, and its fourth to sixth lines specify that the arrays a(n), b(n), and c(n) are distributed across the six processors. Note that the number of processors specified in the third line must be larger than the number of nodes in the distributed parallel processing system 2.

[0021] FIG. 4 is a flowchart showing an example of the processing of the compiler 11, and FIG. 5 is a flowchart showing an example of the processing of the scheduler 14. The operation of this example is described below with reference to these figures.

[0022] When the source program 3 of FIG. 3 is input to the compiler 11, the lexical and syntactic analysis unit 111 performs lexical and syntactic analysis on the source program 3 and passes the analysis result to the most frequently executed part extraction unit 112 and the code generation unit 114 (FIG. 4, S1).

[0023] On receiving the analysis result from the lexical and syntactic analysis unit 111, the most frequently executed part extraction unit 112 extracts the innermost-loop statement a(i) = b(i) + c(i) as the most frequently executed part (S2). If the source program 3 contains a plurality of loops, the most frequently executed part is determined either from run-time profile information or by user designation.

[0024] Once the most frequently executed part has been extracted, the performance measurement program generation unit 113 generates, based on the operation a(i) = b(i) + c(i) used in that part, the performance measurement program 13 for measuring the computational performance of each of the nodes A, B, and C (S3). The performance measurement program 13 is, for example, as shown in FIG. 6. Because the performance measurement program 13 only needs to measure the execution time of the operation, the loop does not need many iterations; in the example of FIG. 6 the loop runs 10 times.
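FIG. 6 is not reproduced in this text, so the following is only a sketch of what such a measurement program might look like, written in Python rather than HPF. The function name `measure_node` and the outer `repeats` loop are assumptions added here so that the elapsed time is large enough for the timer to resolve; only the 10-iteration loop over a(i) = b(i) + c(i) comes from the description.

```python
import time

def measure_node(iterations: int = 10, repeats: int = 1000) -> float:
    """Time the most frequently executed operation a(i) = b(i) + c(i)."""
    b = [float(i) for i in range(iterations)]
    c = [float(2 * i) for i in range(iterations)]
    a = [0.0] * iterations

    start = time.perf_counter()
    for _ in range(repeats):            # assumed outer repetition for timer resolution
        for i in range(iterations):     # the short 10-iteration loop from the text
            a[i] = b[i] + c[i]
    return time.perf_counter() - start  # elapsed seconds on this node

elapsed = measure_node()
print(f"elapsed: {elapsed:.6f} s")
```

Each node would run the same program, and the scheduler would collect the elapsed times as the performance information.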

[0025] On receiving the analysis result from the lexical and syntactic analysis unit 111, the code generation unit 114 generates the HPF execution program 12 (S4).

[0026] When the execution program 12 is to be executed on the distributed parallel processing system 2, the scheduler 14 first takes the performance measurement program 13 as input. When the performance measurement program 13 is input, the node performance measurement unit 141 causes each of the nodes A, B, and C to execute it and thereby measures the performance of each node (FIG. 5, S11).

[0027] Once the performance information of the nodes A, B, and C has been obtained, the performance ratio calculation unit 142 calculates the performance ratio of the nodes (S12). Suppose, for example, that 20 seconds, 30 seconds, and 60 seconds are obtained as the performance information of the nodes A, B, and C, respectively. Performance is inversely proportional to the measured time, so the performance ratio is A:B:C = 1/20 : 1/30 : 1/60 = 3:2:1.
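The ratio calculation in this paragraph can be reproduced exactly with rational arithmetic. This is a minimal sketch; the function name `performance_ratio` is hypothetical, and the reduction to the smallest whole-number ratio is one straightforward way to do it rather than the method stated in the publication. It requires Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm

def performance_ratio(measured_seconds):
    """Convert measured times into the smallest whole-number performance ratio."""
    # Performance is taken as the reciprocal of the measured time.
    rates = [Fraction(1, s) for s in measured_seconds]
    # Scale by the least common denominator so every rate becomes an integer.
    common = reduce(lcm, (r.denominator for r in rates))
    scaled = [int(r * common) for r in rates]
    # Divide out any common factor to obtain the simplest ratio.
    divisor = reduce(gcd, scaled)
    return [v // divisor for v in scaled]

print(performance_ratio([20, 30, 60]))  # nodes A, B, C from the example
```

For the measured times of 20, 30 and 60 seconds this yields 3, 2 and 1, matching the ratio given in the text.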

[0028] Once the performance ratio of the nodes A, B, and C has been obtained, the program allocation unit 143 decides, because the performance ratio is A:B:C = 3:2:1 and the execution program 12 consists of six processes, to assign three processes, two processes, and one process to the nodes A, B, and C, respectively (S13).
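The allocation decision can be sketched as a proportional split of the process count. The function name `allocate_processes` is hypothetical, and the largest-remainder rounding used when the count does not divide evenly is an assumption made here; the publication only describes the evenly divisible case of six processes and a 3:2:1 ratio.

```python
def allocate_processes(ratio, total_processes):
    """Split total_processes among nodes in proportion to ratio."""
    weight = sum(ratio)
    # Integer share and remainder for each node, computed without floating point.
    shares = [divmod(total_processes * r, weight) for r in ratio]
    counts = [quotient for quotient, _ in shares]
    leftover = total_processes - sum(counts)
    # Assumed tie-breaking rule: hand leftover processes to the largest remainders.
    by_remainder = sorted(range(len(ratio)), key=lambda i: shares[i][1], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts

print(allocate_processes([3, 2, 1], 6))  # nodes A, B, C from the example
```

With six processes and the 3:2:1 ratio no rounding is needed, and the result is three, two and one process for nodes A, B and C.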

[0029] The program execution unit 144 then assigns three processes, two processes, and one process to the nodes A, B, and C, respectively, in accordance with the decision of the program allocation unit 143, and has the program executed (S14). Although in the example described above the compiler 11 creates the performance measurement program 13 and node performance is measured with it, node performance may instead be measured with a performance measurement program prepared in advance. Measuring as in the example, however, allows node performance to be measured more accurately.

[0030]

[Effects of the invention] As described above, the present invention assigns the processing of the execution program to the nodes according to the performance of each node, so the load on the nodes can be equalized even when the nodes differ in performance. As a result, the program execution times of the nodes are equalized and no node sits idle, so the execution time of the program can be shortened.

[0031] Further, the present invention generates a performance measurement program based on the most frequently executed part of the source program and measures the performance of each node with that program. Because the performance of each node is measured with a performance measurement program suited to the program to be executed, the load on the nodes can be equalized regardless of which program is executed.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 is a block diagram of a working example of the present invention.

FIG. 3 is a diagram showing an example of the source program 3.

FIG. 4 is a flowchart showing an example of the processing of the compiler 11.

FIG. 5 is a flowchart showing an example of the processing of the scheduler 14.

FIG. 6 is a diagram showing an example of the performance measurement program 13.

[Explanation of symbols]

1 ... load equalization device
11 ... compiler
111 ... lexical and syntactic analysis unit
112 ... most frequently executed part extraction unit
113 ... performance measurement program generation unit
114 ... code generation unit
12 ... execution program
13 ... performance measurement program
14 ... scheduler
141 ... node performance measurement unit
142 ... performance ratio calculation unit
143 ... program allocation unit
144 ... program execution unit
2 ... distributed parallel processing system
21-1 to 21-N, A, B, C ... nodes
22 ... network

Continuation of the front page: (58) Fields searched (Int.Cl.⁶, DB name): G06F 15/16 630; G06F 9/45; G06F 15/177 674

Claims

(57) [Claims]

1. A load equalization device in a distributed parallel processing system composed of a plurality of nodes, comprising: a compiler that translates a source program to generate an execution program that can be processed in parallel and that generates, based on the source program, a performance measurement program for measuring the performance of each node; and a scheduler that causes each node to execute the performance measurement program to measure the performance of each node and then assigns the processing of the execution program to the nodes according to the performance ratio of the nodes.

2. The load equalization device according to claim 1, wherein the compiler comprises: a most frequently executed part extraction unit that extracts the most frequently executed part of the source program; and a performance measurement program generation unit that generates the performance measurement program based on the most frequently executed part extracted by the most frequently executed part extraction unit.

3. The load equalization device according to claim 2, wherein the most frequently executed part extraction unit is configured to extract an innermost loop portion of the source program as the most frequently executed part.