JPH09293057A

JPH09293057A - Task allocation method in hierarchical structure type multiprocessor system

Info

Publication number: JPH09293057A
Application number: JP8131124A
Authority: JP
Inventors: Katsuaki Fundou; 勝昭分銅
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-04-26
Filing date: 1996-04-26
Publication date: 1997-11-11

Abstract

PROBLEM TO BE SOLVED: To perform efficient parallel processing on a hierarchical structure type multiprocessor system. SOLUTION: When translating a source program 10, a compiler 11 predicts the execution time of each task that can be processed in parallel and the overhead time incurred when a task is executed in a cluster other than the main cluster, and embeds these as prediction results 120 in an object program 12. When allocating tasks to processors, a scheduler 13 checks whether the main cluster 2 has enough free processors for all the tasks. If it does not, the scheduler judges whether a task would complete earlier by waiting for a processor in the main cluster 2 to become free or by assigning the task to a free processor in another cluster, and allocates the task to the cluster in which it will complete earlier.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a task allocation method for performing parallel processing in a hierarchical structure type multiprocessor system.

[0002]

[Description of the Related Art] One kind of multiprocessor system is the hierarchical structure type multiprocessor system shown in FIG. 6. It has a plurality of clusters 2, each composed of a plurality of processors 4 and a local memory 3 shared by them, and the clusters 2 are connected by a common system bus 1 so that data can be transferred between them. Because each cluster 2 has its own local memory 3, memory access contention occurs less often than in an ordinary tightly coupled multiprocessor system in which all processors 4 share a single memory. However, when a processor needs data held in the local memory of another cluster, that data must first be transferred to the local memory of its own cluster, which takes time.

[0003] Because of these characteristics, when a user program is executed on a hierarchical structure type multiprocessor system, it is common to select one of the clusters as a main cluster and to allocate each task constituting the user program to processors in the main cluster from the start of the user program to its end.

[0004]

[Problems to Be Solved by the Invention] When a certain process of a user program is divided into several parts and each part is processed in parallel as a separate task on a multiprocessor system, there are two ways to divide it. In the first, disclosed for example in Japanese Unexamined Patent Publication No. H1-152571, the number of usable processors is taken as given, the processing of the user program is divided equally by that number, and each part becomes one task. In the second, the user specifies the degree of parallelism in the source program in advance, the processing is divided according to that parallelization instruction, and each part becomes one task.

[0005] With the former division method, the conventional task allocation method causes no problem: the number of tasks to be processed in parallel is determined from the number of processors that can be used simultaneously, so all tasks are processed in parallel. With the latter division method, however, free processors equal in number to the tasks are not always available, so there is no guarantee that all tasks will be processed in parallel. Because the conventional task allocation method always allocates tasks to processors in the single main cluster, when free processors run short the remaining tasks must wait until a processor becomes free.

[0006] One conceivable remedy is that, when the main cluster has no free processor, a task is immediately allocated to a free processor in another cluster if one exists. As noted above, however, data needed from the local memory of another cluster must be transferred to the local memory of the executing cluster, and this takes a certain amount of time. Allocating a task to a processor in another cluster can therefore delay the completion of that task and the start of subsequent processing rather than advance them, which is a problem.

[0007] An object of the present invention is therefore to shorten the processing time of a user program. Tasks that can be processed in parallel are basically allocated to the processor group of one cluster, but when there are not enough free processors for all the tasks, the allocation destination is not limited to one cluster, and a task is allocated to another cluster if doing so speeds up processing.

[0008]

[Means for Solving the Problems] The present invention is a task allocation method in a hierarchical structure type multiprocessor system that has a plurality of clusters, each composed of a memory and a plurality of processors sharing the memory, the clusters being connected through a common system bus so that data can be transferred between them. In a compiler that translates a source program to generate an object program, when a plurality of tasks that can be processed in parallel are generated according to a parallelization instruction, the execution time of each task is predicted, the overhead time incurred when a task is executed in a cluster other than the main cluster that mainly executes the object program is predicted, and these prediction results are attached to the object program. In a scheduler that allocates the tasks of the object program to processors, tasks that can be processed in parallel are basically allocated to the processor group of the main cluster. When there are not enough free processors for all the tasks, the scheduler uses the prediction results to judge whether it is faster to wait for a processor in the main cluster to become free or to allocate the task to a free processor in another cluster, and allocates the task to the cluster on which processing is faster.

[0009] The method is further characterized in that the overhead time incurred when a task is executed in a cluster other than the main cluster is predicted by taking into account at least (i) the time to transfer, from the local memory of the main cluster to the local memory of the other cluster, the data that must be in place for a processor of the other cluster to execute the task, and (ii) the time to transfer the data in the local memory of the other cluster that was rewritten by the task back to the local memory of the main cluster.

[0010] The method is further characterized in that the overhead time of the task to be allocated next, among the plurality of tasks that can be processed in parallel, is compared with the waiting time until a processor becomes free, which is determined from the execution times and allocation times of those parallel tasks already allocated to the main cluster, in order to judge whether it is faster to wait for a processor in the main cluster to become free or to allocate the task to a free processor in another cluster.

[0011]

[Embodiments of the Invention] Next, an embodiment of the present invention will be described in detail with reference to the drawings.

[0012] FIG. 1 is a system configuration diagram of one embodiment of the present invention. In the figure, MPS is the hierarchical structure type multiprocessor system described with reference to FIG. 6, in which a plurality of clusters 2, each composed of a plurality of processors 4 and a local memory 3 shared by them, are connected by a common system bus 1 so that data can be transferred between them. Reference numeral 10 denotes a source program written by the user. The source program 10 describes its processing in a high-level language, and for the portions that can be processed in parallel the user has embedded parallelization instructions that specify the degree of parallelism in advance. Reference numeral 11 denotes a compiler that translates the source program 10 to generate an object program 12 for the hierarchical structure type multiprocessor system MPS. Through processing by the compiler 11 described later, the object program 12 carries embedded prediction results 120 giving the execution time and overhead time of each task that can be processed in parallel. Reference numeral 13 denotes a scheduler that allocates each task constituting the object program 12 to the processors 4 of the clusters 2 so that the object program 12 is executed on the hierarchical structure type multiprocessor system MPS. To execute the object program 12, the scheduler 13 selects as the main cluster one cluster 2 that has at least one free processor, and basically allocates the tasks of the object program 12 to the processor group in the main cluster. When allocating a plurality of tasks that can be processed in parallel, however, it does not limit the allocation destination to the main cluster and allocates a task to another cluster if that speeds up processing.

[0013] FIG. 2 shows a configuration example of the compiler 11 and an example of its processing. The compiler 11 comprises an analysis unit 110 that reads the input source program 10 and parses it to generate an intermediate text 113; a parallelization unit 111 that reads the intermediate text 113 and parallelizes the processing according to the parallelization instructions given by the user to generate a parallelized intermediate text 114; and a generation unit 112 that generates the object program 12 from the parallelized intermediate text 114. The parallelization unit 111 in particular is provided with execution time/overhead time prediction means 1110.

[0014] The execution time/overhead time prediction means 1110 operates when the parallelization unit 111 has divided a processing portion of the source program 10 into several parts according to a parallelization instruction and made each part one task. It performs steps S1 to S3 of FIG. 2 to predict, for each task, the execution time when the task is executed by a processor of the main cluster and the overhead time when it is executed by a processor of a cluster other than the main cluster. The prediction results, containing the predicted execution time and overhead time of each task, are embedded in the object program 12 as prediction results 120 through the generation unit 112.

[0015] In this embodiment, the execution time/overhead time prediction means 1110 obtains, for each task, the sum of the execution times of the instructions generated for that task (for a loop, the iteration count is also taken into account) and uses this sum as the execution time of the task (S1).
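Step S1 can be illustrated with a minimal sketch. The `Instruction` record, its `cost` and `repeat` fields, and the sample numbers are hypothetical and are not taken from the patent; the sketch only assumes that each generated instruction has a known cost and that instructions inside a loop are weighted by the iteration count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instruction:
    """Hypothetical record for one generated instruction of a task."""
    cost: float       # assumed execution time of one run of the instruction
    repeat: int = 1   # loop iteration count (1 when not inside a loop)


def predict_execution_time(instructions: List[Instruction]) -> float:
    """S1: sum of instruction execution times, weighted by loop iterations."""
    return sum(ins.cost * ins.repeat for ins in instructions)


# Illustrative task: one setup instruction plus a two-instruction loop body
# that runs 100 times.
task_tb0 = [
    Instruction(cost=4.0),
    Instruction(cost=2.0, repeat=100),
    Instruction(cost=1.0, repeat=100),
]
print(predict_execution_time(task_tb0))  # 304.0
```

The time unit is left abstract here; any unit works as long as the overhead time is predicted in the same unit.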

[0016] In this embodiment, considering that data transfer time accounts for most of the overhead time, the execution time/overhead time prediction means 1110 also calculates, for each task, the size of the data that would have to be transferred between the main cluster and another cluster if the task were allocated to a processor of that other cluster (S2). It multiplies this data size by the data transfer time per unit size to obtain the total data transfer time and uses this as the overhead time of the task (S3).

[0017] When a task is allocated to a processor of a cluster other than the main cluster, the data that must be transferred between the main cluster and that cluster includes, for example, the following.

[0018] The first kind is what must be in place before the task is processed: the program text and the minimum input data needed to perform the processing. This data resides in the local memory of the main cluster, which has carried out the processing of the object program up to that point, so it must be transferred from that local memory to the local memory of the cluster in which the task will be executed.

[0019] The second kind is data that the task is predicted to rewrite. Such data is rewritten in the local memory of the cluster in which the task is executed, so it must be transferred to the local memory of the main cluster for the subsequent processing of the object program in the main cluster.

[0020] Data of these kinds is known when the source program 10 is compiled, so the execution time/overhead time prediction means 1110 obtains the total size of the data. Because data transfer time is roughly proportional to the size of the data transferred, multiplying the calculated size by the data transfer time per unit size gives the overhead time incurred when the task is allocated to another cluster.
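Steps S2 and S3 can be sketched as follows. The function name, parameter names, and sample sizes are hypothetical; the only thing taken from the description is the model itself, namely that the overhead time is the total size of the data moved in both directions multiplied by the transfer time per unit size.

```python
def predict_overhead_time(text_size: int,
                          input_size: int,
                          rewritten_size: int,
                          time_per_unit: int) -> int:
    """S2 and S3: total transfer size times transfer time per unit size.

    text_size and input_size go from the main cluster to the other cluster
    before the task runs; rewritten_size comes back afterwards.
    """
    total_size = text_size + input_size + rewritten_size   # S2
    return total_size * time_per_unit                      # S3


# Illustrative numbers: 4 units of program text, 8 units of input data,
# 4 units of rewritten data, and 2 time units per unit of data.
print(predict_overhead_time(4, 8, 4, 2))  # 32
```

Integer units are used only to keep the arithmetic exact in the example; a real estimate would presumably use measured transfer rates for the common system bus.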

[0021] Of the data that must be transferred between clusters, the data described above is the largest, so the overhead time may be obtained from it alone. However, if a task needs to synchronize with other tasks partway through its processing, for example, the data to be synchronized must also be transferred, and the transfer time of such data may be taken into account as well. Data of this kind may be transferred several times rather than once, and in both directions, so the data transfer time must be calculated accordingly.
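One way to fold synchronization data into the estimate is sketched below. The patent does not give a formula for this case, so the model here is an assumption: every synchronization round moves the same amount of data once in each direction, and the function and parameter names are hypothetical.

```python
def overhead_with_sync(base_size: int,
                       sync_size_per_round: int,
                       sync_rounds: int,
                       time_per_unit: int) -> int:
    """Overhead time including repeated, bidirectional synchronization traffic.

    base_size is the text, input, and rewritten data moved once;
    each synchronization round is assumed to move sync_size_per_round
    in both directions.
    """
    sync_size = 2 * sync_size_per_round * sync_rounds
    return (base_size + sync_size) * time_per_unit


# Illustrative numbers: 16 units of base data, 3 synchronization rounds of
# 1 unit each way, and 2 time units per unit of data.
print(overhead_with_sync(16, 1, 3, 2))  # 44
```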

[0022] Next, the operation of the scheduler 13 in FIG. 1 will be described. When the object program 12 is to be executed, the scheduler 13 selects, from among the clusters 2, one cluster that has at least one free processor as the main cluster that executes the object program 12. It then basically allocates the tasks of the object program 12 to the processor group in the main cluster. When free processors run short while a plurality of tasks that can be processed in parallel are being allocated, however, it uses the prediction results 120 in the object program 12 to judge whether it is faster to wait for a processor in the main cluster to become free or to allocate the task to a free processor in another cluster, and allocates the task to whichever is faster. The first of the parallel tasks to be allocated is always allocated to a processor of the main cluster, so if no processor is free it waits.

[0023] FIG. 3 shows an example of the processing that the scheduler 13 performs for the second and subsequent tasks when it sequentially allocates a plurality of tasks that can be processed in parallel. As shown in the figure, when allocating the next parallel task, the scheduler 13 first checks the usage status of the processors in the main cluster to see whether a free processor exists (S11). If the main cluster has a free processor, the task is allocated to that processor (S16).

[0024] If the main cluster has no free processor, the scheduler 13 judges whether processing is faster if it waits for a processor in the main cluster to become free and allocates the task there, or if it allocates the task to a free processor in another cluster. To do so, it first predicts the waiting time X until a processor in the main cluster becomes free (S12). The waiting time X is the time remaining until the earliest end time among the parallel tasks of the same group that have already been allocated to the main cluster. The end time of a task is obtained by adding the execution time of that task recorded in the prediction results 120 to the time at which it was allocated. Next, the scheduler determines whether the overhead time of the task now being allocated (also recorded in the prediction results 120) is smaller than the waiting time X (S13). If the overhead time is smaller than the waiting time X, allocating the task to a free processor in another cluster is likely to speed up processing, so the scheduler checks whether another cluster has a free processor (S14) and, if so, allocates the task to that processor (S15). If no other cluster has a free processor, the scheduler returns to step S11 and repeats the above processing. If the overhead time is not smaller than the waiting time X, waiting for a processor in the main cluster to become free is likely to be faster, so the scheduler returns to step S11 and repeats the above processing.
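The decision flow of steps S11 to S16 can be sketched as a single function that is called once per pass through the loop. The class names, the string return values, and the way free processors are counted are all hypothetical; the comparison itself (overhead time against waiting time X) follows the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    """Per-task entry of the prediction results (field names are assumed)."""
    exec_time: float
    overhead: float


@dataclass
class Running:
    """A parallel task already allocated to the main cluster."""
    allocated_at: float
    exec_time: float


def waiting_time(now: float, running: List[Running]) -> float:
    """S12: time until the earliest predicted end among tasks on the main cluster."""
    earliest_end = min(r.allocated_at + r.exec_time for r in running)
    return max(0.0, earliest_end - now)


def decide(now: float, task: Prediction, main_free: int,
           other_free: int, running: List[Running]) -> str:
    """One pass of the flow: returns 'main', 'other', or 'retry'."""
    if main_free > 0:                        # S11
        return "main"                        # S16
    if not running:                          # nothing to predict X from
        return "retry"
    x = waiting_time(now, running)           # S12
    if task.overhead < x and other_free > 0: # S13, S14
        return "other"                       # S15
    return "retry"                           # back to S11


busy = [Running(allocated_at=0, exec_time=10), Running(allocated_at=0, exec_time=8)]
print(decide(0, Prediction(10, 3), main_free=0, other_free=1, running=busy))  # other
print(decide(6, Prediction(10, 3), main_free=0, other_free=1, running=busy))  # retry
```

In the second call the waiting time has shrunk to 2, which is no longer larger than the overhead of 3, so the sketch keeps waiting for the main cluster.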

[0025] For example, suppose the compiler 11 translates a source program 10 composed of process A, process B, and process C as shown in FIG. 4(A), and generates an object program 12 as shown in FIG. 4(B), composed of a task TA that executes process A, three tasks TB0, TB1, and TB2 that execute process B in parallel, and a task TC that executes process C. Suppose further that task TA is executed by a processor of the main cluster and that its completion triggers the allocation of the three tasks TB0, TB1, and TB2. If the main cluster has three free processors, all of TB0, TB1, and TB2 are allocated to processors in the main cluster as shown in FIG. 5(A). If only two processors are free, however, no free processor remains once TB0 and TB1 have been allocated to the main cluster, as shown in FIG. 5(B). In the processing of FIG. 3 for allocating task TB2, the scheduler 13 therefore determines that the main cluster has no free processor (S11) and predicts the waiting time X until one becomes free (S12). Taking the current time to be the moment at which task TB2 comes up for allocation, the waiting time X is the time X shown in FIG. 5(B). The scheduler 13 then compares the waiting time X with the overhead time of task TB2 (the time a + b in FIG. 5(B)) (S13). If the overhead time is smaller than the waiting time X and another cluster has a free processor, task TB2 is allocated to that processor, as shown in FIG. 5(B). Compared with waiting for a processor in the main cluster to become free before allocating TB2, this shortens the processing time by the time Y shown in FIG. 5(B). If no other cluster has a free processor and the loop of steps S11 to S14 keeps repeating, the waiting time X gradually shrinks while the overhead time stays fixed, so X eventually becomes less than or equal to the overhead time. From that point task TB2 is no longer allocated to another cluster and waits for a processor in the main cluster to become free.
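The TB2 example can be worked through with illustrative numbers. FIG. 5(B) is not reproduced in this text, so two things below are assumptions: that TB2 takes the same execution time on either cluster, and that the saving Y equals X minus (a + b). All function names and values are hypothetical.

```python
def finish_if_waiting(now: int, x: int, exec_time: int) -> int:
    """TB2 waits X for the main cluster, then runs."""
    return now + x + exec_time


def finish_if_remote(now: int, a: int, exec_time: int, b: int) -> int:
    """TB2 runs on another cluster: transfer in (a), run, transfer back (b)."""
    return now + a + exec_time + b


def time_saved(x: int, a: int, b: int) -> int:
    """Assumed saving Y from going remote instead of waiting."""
    return x - (a + b)


def offload_window_closes_at(x_at_start: int, overhead: int, step: int = 1) -> int:
    """Elapsed time after which X is no longer larger than the fixed overhead."""
    elapsed = 0
    while x_at_start - elapsed > overhead:
        elapsed += step
    return elapsed


# Illustrative values: X = 8, a = 2, b = 1, TB2 execution time = 10.
print(finish_if_waiting(0, 8, 10))      # 18
print(finish_if_remote(0, 2, 10, 1))    # 13
print(time_saved(8, 2, 1))              # 5
print(offload_window_closes_at(8, 3))   # 5
```

With these numbers, going remote saves 5 time units if a processor in another cluster is free right away. If none becomes free within 5 time units, X has dropped to the overhead of 3 and the task simply waits for the main cluster.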

[0026] An embodiment of the present invention has been described above, but the present invention is not limited to this embodiment, and various additions and modifications are possible. For example, in the embodiment above the overhead time is predicted for each of the tasks that can be processed in parallel. However, when the allocation order of the parallel tasks is fixed and the first task is always allocated to the main cluster as described earlier, the prediction of the overhead time can be omitted for the first of those tasks.

[0027]

[Effects of the Invention] As described above, according to the present invention, when the source program is translated, the execution time of each task that can be processed in parallel and the overhead time incurred when a task is executed in a cluster other than the main cluster are predicted and included in the object program. When the scheduler allocates tasks to processors and the main cluster does not have enough free processors for all of the tasks that can be processed in parallel, the scheduler uses the prediction results included in the object program to judge whether it is faster to wait for a processor in the main cluster to become free or to allocate the task to a free processor in another cluster, and allocates the task to the cluster on which processing is faster. More efficient parallel processing is therefore possible.

[Brief description of drawings]

FIG. 1 is a system configuration diagram of one embodiment of the present invention.

FIG. 2 is a diagram showing a configuration example and a processing example of the compiler.

FIG. 3 is a flowchart showing a processing example of the scheduler.

FIG. 4 is a diagram showing an example of a source program and an object program.

FIG. 5 is a diagram showing the relationship between the allocation of parallel tasks to clusters and the execution time.

FIG. 6 is a block diagram showing an example of a hierarchical structure type multiprocessor system.

[Explanation of symbols]

MPS: multiprocessor system
1: common system bus
2: cluster
3: local memory
4: processor
10: source program
11: compiler
12: object program
120: prediction results
13: scheduler

Claims

[Claims]

1. A task allocation method in a hierarchical structure type multiprocessor system having a plurality of clusters, each composed of a memory and a plurality of processors sharing the memory, the clusters being connected through a common system bus so that data can be transferred between them, wherein: in a compiler that translates a source program to generate an object program, when a plurality of tasks that can be processed in parallel are generated according to a parallelization instruction, the execution time of each task is predicted, the overhead time incurred when a task is executed in a cluster other than a main cluster that mainly executes the object program is predicted, and these prediction results are attached to the object program; and in a scheduler that allocates the tasks of the object program to processors, tasks that can be processed in parallel are basically allocated to the processor group of the main cluster, but when there are not enough free processors to allocate all the tasks, whether it is faster to wait for a processor in the main cluster to become free or to allocate the task to a free processor in another cluster is judged using the prediction results, and the task is allocated to the cluster on which processing is faster.

2. The task allocation method in a hierarchical structure type multiprocessor system according to claim 1, wherein the overhead time incurred when a task is executed in a cluster other than the main cluster is predicted by taking into account at least the time to transfer, from the local memory of the main cluster to the local memory of the other cluster, the data that must be in place for a processor of the other cluster to execute the task, and the time to transfer the data in the local memory of the other cluster that was rewritten by the task to the local memory of the main cluster.

3. The task allocation method in a hierarchical structure type multiprocessor system according to claim 1 or 2, wherein the overhead time of the task to be allocated next among the plurality of tasks that can be processed in parallel is compared with the waiting time until a processor becomes free, which is determined from the execution times and allocation times of those parallel tasks already allocated to the main cluster, to judge whether it is faster to wait for a processor in the main cluster to become free or to allocate the task to a free processor in another cluster.