JP2580525B2

JP2580525B2 - Load balancing method for parallel computers

Info

Publication number: JP2580525B2
Application number: JP5309645A
Authority: JP
Inventors: 俊明垂井; 徳安井門
Original assignee: Agency of Industrial Science and Technology
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 1993-11-17
Filing date: 1993-11-17
Publication date: 1997-02-12
Anticipated expiration: 2012-02-12
Also published as: JPH07141302A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a dynamic load distribution method for a parallel computer composed of a plurality of processing elements, and more particularly to a load distribution method suited to distributing load evenly in a hierarchically structured parallel computer in which clusters of processors sharing main memory are connected by a network.

[0002]

[Prior Art] Parallel computers, in which a large number of processing elements operate in parallel, are regarded as a promising route to dramatic improvements in computer performance. Hereinafter, a processing element is abbreviated as PE. In a parallel computer it is important to distribute the processes that make up the load evenly across the PEs. Performance is highest when all PEs bear an equal share of the processing, and it falls as the load imbalance grows. Reducing the parallelization overhead, that is, the communication processing that accompanies the distribution of processes among PEs, is also an important issue. If the PEs frequently exchange load values (the number of processes waiting to be executed) and always distribute processes to the least loaded PE, the load on each PE is equalized, but the communication overhead is high and overall execution efficiency actually falls. A load distribution method is therefore required that equalizes the load while reducing the amount of communication between PEs.

[0003] In the technique disclosed in Japanese Patent Application Laid-Open No. Hei 2-208731, a PE that is about to distribute a process checks, through the network, the load value of a randomly selected PE, and distributes the load only when the load value of that PE is small.

[0004]

[Problems to Be Solved by the Invention] The above prior art is effective when every PE has a communication path to the network. However, when a cluster configuration with shared main memory is adopted and the PEs are therefore not directly connected to the network, a PE cannot easily check the load values of other PEs, and efficient load distribution is not possible.

[0005] In recent years, with the development of highly integrated LSI technology, a growing number of systems employ tightly coupled multiprocessors (TCMP) with shared main memory. Within a TCMP the PEs share a single address space, which allows efficient processing. Because the throughput of the shared bus limits the number of PEs that can be connected in a TCMP to about ten, building a larger system requires connecting TCMP clusters by a network to form a hierarchical system configuration. To reduce the hardware scale of the network, it is not usual to provide a communication path to the network from every PE; instead, a single communication processor is generally provided in each cluster. Only the communication processor is connected to the network, and the other PEs have no communication path to it. For a PE in the cluster to communicate, it must ask the communication processor, via the TCMP, to communicate on its behalf.

[0006] Previously proposed dynamic load distribution assumes an architecture in which every PE is connected to the network, and proceeds as follows. First, a PE is selected at random as the distribution candidate, and the load value of the candidate PE is read. The load value of each PE is stored in a dedicated register in the network, so the load value can be read quickly. The load value of the candidate PE is then compared with the load value of the own PE. If the candidate's load value is smaller than the own PE's load value, the load is distributed; if it is larger, the load is not distributed. Because only the load value of the candidate PE needs to be checked, load distribution can be performed without global information such as the load information of the entire system. This load distribution method is called the smart random method.
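The smart random procedure described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `smart_random_target`, the list `load_registers` standing in for the dedicated registers in the network, and the injectable `pick` callable are all hypothetical. The sketch also assumes that a PE never picks itself as the candidate and that a tie keeps the process local, neither of which this paragraph specifies.

```python
import random


def smart_random_target(own_id, load_registers, pick=random.choice):
    """Return the id of the PE that should receive a newly created process.

    load_registers[i] is a hypothetical stand-in for the load value of PE i
    (number of processes waiting to be executed) held in the network.
    """
    # Assumption: the candidate is drawn from the other PEs only.
    candidates = [pe for pe in range(len(load_registers)) if pe != own_id]
    if not candidates:
        return own_id  # single-PE system: nothing to distribute to

    candidate = pick(candidates)  # random selection of one candidate

    # Only the candidate's load value is consulted; no global information.
    if load_registers[candidate] < load_registers[own_id]:
        return candidate  # candidate is less loaded: distribute
    return own_id  # otherwise keep the process (tie handling is an assumption)
```

Passing a deterministic `pick` makes the decision easy to inspect, while the default `random.choice` gives the random selection the method is named for.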

[0007] Attempting dynamic load distribution by the smart random method between the clusters of a cluster configuration raises the following problems. When a PE in a cluster wants to distribute load to another cluster and read the load value of the distribution candidate cluster, it is not directly connected to the network, so it must ask the communication processor, via the TCMP, to read the load value. After the load value has been read from the network, it must then be transferred from the communication processor to the requesting PE. Moreover, if the communication processor is busy with other processing, it cannot always respond immediately to a PE's load value read request. There is therefore a risk that the waiting time before a PE in the cluster obtains the load value becomes very long. A longer wait for the load value not only lowers the utilization of the PEs but also delays the distribution of the load itself, which is a major obstacle to effective dynamic load distribution.

[0008] An object of the present invention is to provide a dynamic load distribution method that can reduce the waiting time incurred when a PE in a cluster performs load distribution in a cluster-configured parallel system.

[0009]

[Means for Solving the Problems] To achieve the above object, in a parallel computer whose constituent clusters each comprise one or more processing elements, a communication processor, and a main memory shared among them, and in which a plurality of such clusters are connected by a network, the communication processor of each cluster periodically selects at random a candidate cluster to which the load of its cluster may be distributed, stores the number of that distribution candidate cluster in the shared memory, and compares the load value of the distribution candidate cluster with the load value of its own cluster. When the load value of the distribution candidate cluster is smaller than that of its own cluster, it stores in the shared memory a flag indicating that distribution is permitted; when the load value of the distribution candidate cluster is greater than or equal to that of its own cluster, it stores in the shared memory a flag indicating that distribution is not permitted. When a processing element in the cluster distributes load, it reads the stored flag from the shared memory. If the flag permits load distribution, it distributes the load to the distribution candidate cluster having the stored number; if the flag does not permit load distribution, it distributes the load to its own cluster.

[0010] In addition, each PE is provided with means for reading the distribution candidate cluster number determined by the communication processor and the flag indicating whether distribution is permitted. When a PE in the cluster is about to distribute load, it distributes the load to the distribution candidate cluster determined by the communication processor if the flag permits distribution, and links the load into the process pool of its own cluster if it does not.

[0011]

[Operation] According to the present invention, the communication processor determines in advance the cluster that is the load distribution candidate and checks whether load distribution is permitted. When a PE in the cluster distributes load, it can therefore do so quickly, simply by reading the predetermined distribution candidate cluster number and the flag indicating whether load distribution is permitted.

[0012] FIG. 1 is a block diagram of the parallel system of the present invention. In the figure, 110 and 120 are PEs, and 150 is the communication processor. The communication processor holds a randomly determined distribution candidate cluster number 157. It also holds a distribution availability flag 156, which is the result of comparing the load value 154 of the distribution candidate cluster, read by the load value reading mechanism 153, with the load value 161 of its own cluster. The pair of distribution candidate cluster number 157 and distribution availability flag 156 is called load distribution information 170. When the CPU of a PE 110, 120 creates a process, the process management mechanism 112, 122 examines the load distribution information 170 and, if load distribution is permitted, distributes the process to the distribution candidate cluster. If load distribution is not permitted, it places the process in the process pool of its own cluster. Thus, when a processor wants to distribute load, it can do so immediately just by reading the load distribution information 170, and the waiting time for load distribution is eliminated.

[0013] The distribution candidate cluster number 157 is updated periodically by the timer interrupt 151. Consequently, the load value 154 of the distribution candidate cluster and the distribution availability flag 156 are also updated at the timer interval. If the timer interrupt interval is made sufficiently shorter than the average execution time of a process, the load distribution information always reflects the latest state, and accurate load distribution can be performed.

[0014]

[Embodiment] FIGS. 1 to 4 show one embodiment of the present invention. FIG. 1 is a configuration diagram of a parallel computer with a dynamic load distribution mechanism, consisting of n clusters × m PEs. FIG. 2 is a flowchart of the operation of each PE in a cluster. FIGS. 3 and 4 are operation flows of the communication processor in a cluster: FIG. 3 shows the process distribution processing, and FIG. 4 shows the load distribution information update processing.

[0015] In FIG. 1, 100 and 200 are clusters and 500 is the network. Only the interior of cluster 0 (100) is shown in detail; the other clusters have exactly the same configuration. Inside the cluster, 110 and 120 are PEs (for computation) and 150 is the communication processor. Inside the PEs, 111 and 121 are CPUs that execute processes, and 112 and 122 are process management mechanisms. 160 is the cluster's process pool, and 161 is the cluster's load value (the number of processes waiting to be executed). Inside the communication processor, 151 is the timer interrupt mechanism, 152 is the random number generation mechanism, 157 is the distribution candidate cluster number determined at random by the random number generation mechanism, 153 is the mechanism for reading the load value of the distribution candidate cluster, 154 is the load value of the distribution candidate cluster, 155 is the comparator that compares the load value of the own cluster with that of the distribution candidate cluster, and 156 is the flag, determined by the comparison, indicating whether load distribution is permitted. The distribution candidate cluster number 157 and the distribution availability flag 156 are together called load distribution information 170. In the network, 502 is the router that carries communication such as processes, and 501 is the load value register that stores the load value of each cluster.

[0016] The present invention is characterized in that the communication processor has a function of periodically updating the load distribution information 170 and each PE in the cluster has a function of reading the load distribution information.

[0017] First, the overall system configuration is described. In the system, PEs (110, 120) that execute programs are grouped to form cluster 100. Clusters 100 and 200 are joined by network 500 to form the entire system. The PEs in a cluster form a tightly coupled multiprocessor system with shared main memory. Each cluster has a communication processor 150, and only the communication processor connects the cluster to the network. Because the other PEs have no communication path to the network, a PE must go through the communication processor 150 whenever it communicates.

[0018] Next, the process distribution scheme (load distribution method) is described. Because the PEs in a cluster share main memory, they can share the process pool 160. Load distribution among the PEs within a cluster therefore happens automatically when processes are managed in the shared process pool 160. The clusters, by contrast, are joined by a network and cannot have a shared process pool, so distributing load between clusters requires explicitly handing processes to other clusters. The present invention is a method for performing this inter-cluster load distribution efficiently. Specifically, the communication processor decides in advance whether inter-cluster load distribution should be performed and to which cluster the load should be distributed, which reduces the waiting time when each PE in the cluster performs load distribution. The detailed operation of each constituent unit that realizes this function is described below.

[0019] FIG. 2 shows the operation flow of a PE. The PE first checks whether there is a process in the process pool 160 (step 10). If there is, it takes a process out of the process pool (step 11), updates the load value 161 (step 12), and executes the process (step 13). When execution of the process is suspended, the PE checks whether a child process has been created (step 14). If a child process has been created, load distribution processing is required (steps 15 to 19). In the load distribution processing, the load distribution information 170 determined by the communication processor is first read (step 15). Because this read only accesses the shared memory, it takes almost no waiting time. The distribution availability flag 156 in the information that was read is then tested (step 16). When the flag is ON, that is, when load distribution is permitted, the PE asks the communication processor to distribute the process to another cluster (step 17). When the flag is OFF, the process is not distributed to another cluster; it is written back to the process pool of the own cluster (step 18) and the load value is updated (step 19). The suspended process is then resumed or a new process is taken out (step 20). Through this processing, each PE can quickly distribute the processes it creates by reading the load distribution information 170 that the communication processor has determined in advance.
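Steps 15 to 19 of the PE flow can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the `Cluster` class, its field names, and the `outbox` list (standing in for the request a PE hands to the communication processor) are hypothetical, and the locking a real shared-memory system would need is omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of the state shared by all PEs of one cluster."""
    process_pool: deque = field(default_factory=deque)  # process pool 160
    load_value: int = 0            # load value 161 (waiting processes)
    candidate_cluster: int = 0     # distribution candidate cluster number 157
    can_distribute: bool = False   # distribution availability flag 156
    outbox: list = field(default_factory=list)  # requests to the comm. processor


def place_child(cluster, child):
    """Decide where a newly created child process goes (steps 15 to 19).

    Returns the target cluster number, or None if the child stays local.
    """
    # Step 15: read load distribution information 170 from shared memory.
    target, allowed = cluster.candidate_cluster, cluster.can_distribute

    if allowed:  # step 16: flag is ON
        # Step 17: ask the communication processor to send the process away.
        cluster.outbox.append((target, child))
        return target

    # Step 18: flag is OFF, so write the process back to the local pool.
    cluster.process_pool.append(child)
    # Step 19: update the load value.
    cluster.load_value += 1
    return None
```

The PE never touches the network in this path; both branches cost only a few shared-memory accesses, which is the point of precomputing the flag.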

[0020] Next, the operation of the communication processor is described. FIG. 3 shows the operation flow of the inter-cluster process distribution processing unit 158 of the communication processor. When a process arrives from the network (step 30), the process is linked into the process pool of the own cluster (step 32) and the load value is updated (step 33). When there is a process distribution request from a PE in the own cluster (step 31), the process is sent out to the router 502 of the network (step 34).
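The FIG. 3 flow can be approximated by the short Python sketch below. All names are hypothetical: `incoming` models processes that have arrived from the network, `requests` models distribution requests from PEs in the own cluster, and `send` stands in for handing a process to the router. Draining both queues completely in one pass is an assumption; the flowchart may handle one event per pass.

```python
from collections import deque


def comm_processor_dispatch(incoming, requests, process_pool, load_value, send):
    """Run one pass of the inter-cluster process distribution flow.

    Returns the updated load value of the own cluster.
    """
    # Step 30: a process has arrived from the network.
    while incoming:
        process_pool.append(incoming.popleft())  # step 32: link into local pool
        load_value += 1                          # step 33: update load value

    # Step 31: a PE in the own cluster has requested distribution.
    while requests:
        target, process = requests.popleft()
        send(target, process)                    # step 34: send to the router

    return load_value
```

A caller would typically invoke this in the communication processor's main loop, passing a `send` function that wraps whatever network interface is in use.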

[0021] In addition to the above, the communication processor updates the load distribution information 170 used by the PEs. The update processing is started periodically by the timer interrupt 151. FIG. 4 shows the operation of the communication processor 150 on a timer interrupt. On receiving a timer interrupt, it first writes the load value 161 of its own cluster to the load value register 501 on the network (step 40), so that other clusters learn an accurate load value. It then selects a cluster at random using the random number generation mechanism 152 and so determines the distribution candidate cluster number 157 (step 41). Using the load value reading mechanism 153, it reads the load value of the distribution candidate cluster from the load value register 501 in the network (step 42). Because the load value of each cluster is managed on the network, the read is fast. The comparator 155 compares the load value 154 of the distribution candidate cluster with the load value 161 of the own cluster (step 44). If the candidate's load value is smaller than the own cluster's, the distribution availability flag is set ON (step 45) so that PEs in the cluster can distribute processes to the distribution candidate cluster. If the candidate's load value is greater than or equal to the own cluster's, the flag is set OFF (step 46) so that processes are not distributed to other clusters. Because the load distribution information is determined in advance in this way, a PE in the cluster that wants to distribute load can decide what to do immediately. For PEs always to obtain fresh load distribution information when distributing load, and to avoid concentrating load on a particular cluster, the interval between the timer interrupts that update the load distribution information must be sufficiently shorter than the average execution time of a process.
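The timer-driven update of steps 40 to 46 might look like the following Python sketch. The function name, the `load_registers` list standing in for load value register 501, and the injectable `pick` callable standing in for the random number generation mechanism 152 are hypothetical. Excluding the own cluster from the random choice is also an assumption that the text does not state.

```python
import random


def update_load_info(own_id, own_load, load_registers, pick=random.choice):
    """Recompute the load distribution information on a timer interrupt.

    Returns (candidate_cluster, can_distribute), i.e. the pair the text
    calls load distribution information 170.
    """
    # Step 40: publish the own cluster's load value to the network register.
    load_registers[own_id] = own_load

    # Step 41: choose a distribution candidate cluster at random.
    others = [c for c in range(len(load_registers)) if c != own_id]
    if not others:
        return own_id, False  # single-cluster system: never distribute
    candidate = pick(others)

    # Step 42: read the candidate's load value from the network register.
    candidate_load = load_registers[candidate]

    # Steps 44 to 46: flag is ON only if the candidate is strictly less
    # loaded; greater or equal turns it OFF.
    can_distribute = candidate_load < own_load
    return candidate, can_distribute
```

The returned pair would be written to shared memory, where PEs read it without involving the communication processor. How often the function runs is governed by the timer interval discussed above.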

[0022] As described above, the communication processor 150 always supplies fresh load distribution information to the PEs in the cluster, and the PEs in the cluster can use that information to perform dynamic load distribution quickly.

[0023]

[Effects of the Invention] According to the present invention, in a hierarchically configured parallel computer in which clusters of TCMP PEs are connected to a network through dedicated communication processors, the waiting time incurred when a PE in a cluster, which is not directly connected to the network, performs load distribution can be greatly reduced.

[Brief description of the drawings]

FIG. 1 is a block diagram of a parallel computer having the dynamic load distribution method of the present invention.

FIG. 2 is a flowchart of the operation of a processing element (PE) in the cluster of FIG. 1.

FIG. 3 is a flowchart of process distribution by the communication processor of FIG. 1.

FIG. 4 is a flowchart of load information updating by the communication processor of FIG. 1.

[Explanation of symbols]

100, 200: clusters; 500: network; 110, 120: processing elements (PEs); 150: communication processor; 160: process pool; 161: cluster load value (number of processes waiting to be executed); 111, 121: CPUs; 112, 122: process management mechanisms; 151: timer interrupt mechanism; 152: random number generation mechanism; 153: load value reading mechanism; 154: load value of the distribution candidate cluster; 155: comparator; 156: load distribution availability flag; 157: load distribution candidate cluster number; 158: inter-cluster process distribution mechanism; 501: load value register; 502: router.

Claims

(57) [Claims]

1. A load distribution method for a parallel computer whose constituent clusters each have one or more processing elements, a communication processor, and a main memory shared among them, a plurality of the clusters being connected by a network, characterized in that: the communication processor of each cluster periodically selects at random a candidate cluster to which the load of its cluster is to be distributed, stores the number of the distribution candidate cluster in the shared memory, compares the load value of the distribution candidate cluster with the load value of its own cluster, stores in the shared memory, as a flag indicating whether load distribution is permitted, a flag indicating that distribution is permitted when the load value of the distribution candidate cluster is smaller than the load value of its own cluster, and stores in the shared memory a flag indicating that load distribution is not permitted when the load value of the distribution candidate cluster is greater than or equal to the load value of its own cluster; and a processing element in the cluster, when distributing load, reads the stored flag from the shared memory, distributes the load to the distribution candidate cluster having the stored number if the read flag permits load distribution, and distributes the load to its own cluster if the flag does not permit load distribution.