JP2009151645A

JP2009151645A - Parallel processor and program parallelizing device

Info

Publication number: JP2009151645A
Application number: JP2007330336A
Authority: JP
Inventors: Hideaki Minamide (南出英明); Haruhiko Takeyama (竹山治彦); Kenichi Sasaki (佐々木賢一)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2007-12-21
Filing date: 2007-12-21
Publication date: 2009-07-09
Anticipated expiration: 2027-12-21
Also published as: JP5036523B2

Abstract

PROBLEM TO BE SOLVED: To provide a parallel processing device that improves the execution efficiency of a program and reduces its maximum execution time.
SOLUTION: When the execution result of its own task must be synchronized with the tasks of other process elements, an instruction execution part 9 (13) notifies the other process elements of a synchronization event corresponding to its own task. When it executes a task that notifies tasks executed by other process elements of its execution-condition determination result, it notifies the other process elements of the truth value of a predicate register 7 (12). When it executes a task whose execution depends on the execution-condition determination result of a task executed by another process element, it executes its own task based on the truth value of a predicate reception register 8 (11).
COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a parallel processing device that executes a parallelized program, and to a program parallelizing device that automatically divides a program into a plurality of tasks that can be executed in parallel.

A program parallelizing device generally divides one program into a plurality of tasks for parallel execution on a plurality of process elements. To do so, it analyzes the data dependences among the variables the program operates on (the flow dependence, anti-dependence, and output dependence arising from reads and writes), extracts the precedence constraints between tasks, and inserts synchronization instructions into the divided program. This preserves the precedence constraints between tasks assigned to different process elements and guarantees the consistency of the computation results.
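The following is a minimal sketch of the dependence analysis just described. It assumes each task is summarized only by the sets of variables it reads and writes. The `Task` class and the function names are hypothetical illustrations, not part of the patent. Two tasks in program order receive a precedence constraint whenever any of the three dependences holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    reads: frozenset   # variables the task reads
    writes: frozenset  # variables the task writes


def dependences(earlier: Task, later: Task) -> set:
    """Kinds of data dependence from `earlier` to `later` (program order)."""
    kinds = set()
    if earlier.writes & later.reads:
        kinds.add("flow")    # read after write
    if earlier.reads & later.writes:
        kinds.add("anti")    # write after read (reverse dependence)
    if earlier.writes & later.writes:
        kinds.add("output")  # write after write
    return kinds


def precedence_constraints(tasks):
    """Any dependence between two tasks becomes a precedence edge (earlier, later)."""
    edges = []
    for i, a in enumerate(tasks):
        for b in tasks[i + 1:]:
            if dependences(a, b):
                edges.append((a.name, b.name))
    return edges


# Hypothetical three-task example.
t1 = Task("t1", frozenset(), frozenset({"x"}))        # x = ...
t2 = Task("t2", frozenset({"x"}), frozenset({"y"}))   # y = f(x)
t3 = Task("t3", frozenset({"z"}), frozenset({"x"}))   # x = g(z)
edges = precedence_constraints([t1, t2, t3])
```

Here t1→t2 is a flow dependence, t1→t3 an output dependence, and t2→t3 an anti-dependence. Every pair is therefore constrained, and none of the three tasks could be freely reordered.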

Among such program parallelizing devices, one conventional device treats each virtual machine code instruction as one task. It divides the program into tasks that can be executed in parallel and schedules them onto a plurality of processors. Then, where tasks can be combined, it combines them and schedules again, thereby reducing the overhead of synchronization processing (see, for example, Patent Document 1).

Another conventional program parallelizing device is a static task scheduling system that schedules each task before execution, based on a task graph showing the dependences among the tasks, including tasks whose executing processor is specified. It maintains two lists. In the forward list, a task with a longer path length from that task to the terminal task receives a higher priority, and tasks are registered in descending order of priority. In the backward list, a task with a longer path length from the terminal task to that task receives a higher priority, and tasks are registered in ascending order of priority (see, for example, Patent Document 2). When scheduling, this device selects either the forward list or the backward list and determines the execution time of each task from the selected list. Because it switches between forward and backward task scheduling according to the state of each processor's highest-priority task, it can reduce task waiting and improve processor utilization.

Patent Document 1: Japanese Patent No. 3039953
Patent Document 2: JP 2003-29988 A

However, conventional program parallelizing devices cannot be said to achieve an allocation with high execution efficiency when they divide a program into tasks and allocate them to process elements by task scheduling.

The overall execution time of the parallelized program is determined by the processing time of the process element that takes longest. Consider a program embedded in a device to control it. Each time that program finishes executing, data must be updated through external input/output. Consequently, a process element that finishes its allocated portion early cannot advance to the next processing cycle ahead of the other process elements. It must wait until the process element that requires the most processing time has finished, so the element that finished first sits idle until the next cycle begins. Ideally, task scheduling would allocate the processing so that all process elements finish at the same time. If the scheduling result instead leaves one process element with a longer processing time than the others, the others must wait until it finishes, and the execution efficiency of the process elements as a whole drops.

The above problem concerns the end of the processing assigned to each process element, but the same problem can also occur partway through. A task assigned to one process element may have a data dependence on a variable in a task assigned to another process element (called a preceding task). In that case the first task must wait for the preceding task to complete, and idle time arises during the wait. This idle time could be eliminated by filling it with a subsequent task that has no data dependence and no control-flow constraint. That is possible only when such a task exists among the subsequent tasks, however. When none exists, idle time again occurs in the process element and its execution efficiency decreases.

For example, the device of Patent Document 1 proceeds in four steps:
1. It treats each virtual machine code instruction as one task.
2. It divides the program into tasks that can be executed in parallel.
3. It schedules those tasks onto a plurality of processors.
4. It combines tasks that can be combined and schedules again.
Because the result is scheduled to reduce idle time, this method improves the execution efficiency of the process elements. However, it performs scheduling many times, which increases the amount of computation.

Furthermore, a control program for an embedded device can be composed of a sequence of combinations of an execution condition and the processing performed under that condition. Suppose each machine code instruction of such a program were handled as one task, as in the device of Patent Document 1. Whether two tasks can be combined would then have to be judged by taking the execution condition of each individual machine code instruction into account, because tasks with different execution conditions cannot be combined into one task. As a result, for such control programs, considering execution conditions when combining tasks increases the amount of computation.

The device of Patent Document 2 is a static task scheduling system that schedules tasks before execution based on a task graph showing their dependences. It uses a forward list, in which tasks are registered in descending order of priority, and a backward list, in which tasks are registered in ascending order of priority. It performs task scheduling by choosing between the two lists according to the state of each processor's highest-priority task. This is said to reduce the idle time during which tasks wait. Between tasks that have data dependences, however, idle time still occurs that this method cannot eliminate.

The present invention has been made to solve the problems described above. Its object is to provide a parallel processing device and a program parallelizing device that reduce the idle time of process elements, thereby improving the execution efficiency of a program and shortening its maximum execution time.

In the parallel processing device according to the present invention, each of a plurality of process elements includes the following:
- a predicate register that stores a truth value for determining whether to execute the next instruction;
- a predicate reception register that stores a predicated synchronization event received from another process element, carrying a truth value for determining whether to execute the next instruction;
- an instruction execution unit.
When the instruction execution unit executes a task that notifies tasks executed on other process elements of its execution-condition determination result, it notifies the other process elements of a predicated synchronization event carrying the truth value of the predicate register. When it executes a task whose execution depends on the execution-condition determination result of a task executed on another process element, it executes its own task based on the truth value held in the predicate reception register.

In the parallel processing device of the present invention, each of the plurality of process elements executes its tasks in synchronization based on the values of two registers. The predicate register stores a truth value for determining whether to execute the next instruction. The predicate reception register stores a truth value, received from another process element, for determining whether to execute the next instruction. This improves the execution efficiency of the program and shortens its maximum execution time.

Embodiment 1.
FIG. 1 is a block diagram showing a parallel processing device according to Embodiment 1 of the present invention.
In the figure, the parallel processing device comprises a processor 1, an external memory 5, and a bus 6 connecting them. The processor 1 includes two or more process elements and a built-in memory 4. In Embodiment 1 there are two process elements: process element (PE) #0 (2) and PE #1 (3). Each has a predicate register (7 and 12, respectively), a predicate reception register (8 and 11), an instruction execution unit (9 and 13), and a synchronization mechanism (10 and 14).

The predicate registers 7 and 12 store, in process elements #0 (2) and #1 (3) respectively, a truth value that determines whether the next instruction is executed. Their values change as a result of operations on process elements #0 (2) and #1 (3). The register becomes false when a comparison instruction finds that two variables do not match, or when the condition of a magnitude comparison instruction does not hold. It becomes true when the variables match or the magnitude comparison holds.
The predicate reception registers 8 and 11 receive the result (truth value) of an instruction executed by another process element. Predicate reception register 8 receives instruction results from process element #1 (3), and predicate reception register 11 receives instruction results from process element #0 (2).

The instruction execution units 9 and 13 behave as follows:
- If the execution result of the task assigned to the unit's own process element must be synchronized with a task assigned to another process element, the unit notifies the other process element of a synchronization event corresponding to its own task.
- If the unit executes a task whose execution depends on a synchronization event from a task on another process element, it executes its own task upon receiving that synchronization event.
- If the unit executes a task that notifies tasks on other process elements of its execution-condition determination result, it notifies the other process elements of the truth value in predicate register 7 or 12.
- If the unit executes a task whose execution depends on the execution-condition determination result of a task on another process element, it executes its own task based on the truth value in predicate reception register 8 or 11.

The instruction execution units 9 and 13 thus support conditional execution instructions, whose execution depends on the value of predicate register 7 or 12. The predicate register holds either true or false. If it is true, the unit executes the fetched conditional execution instruction and writes the result to the built-in memory 4. If it is false, the unit skips the fetched conditional execution instruction and fetches the next instruction.
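The conditional-execution behavior of the predicate register can be modeled roughly as follows. This is a toy interpreter under assumed semantics. The opcodes `CMP_EQ`, `CMP_LT` and `STORE_IF`, and the dictionary standing in for built-in memory 4, are hypothetical and are not the patent's instruction set.

```python
class ProcessElement:
    """Toy model of one process element with a predicate register (hypothetical ISA)."""

    def __init__(self, memory):
        self.memory = memory   # stands in for built-in memory 4
        self.predicate = True  # predicate register (7 or 12)

    def run(self, program):
        for op, *args in program:
            if op == "CMP_EQ":      # comparison instruction: sets the predicate
                a, b = args
                self.predicate = self.memory[a] == self.memory[b]
            elif op == "CMP_LT":    # magnitude comparison instruction
                a, b = args
                self.predicate = self.memory[a] < self.memory[b]
            elif op == "STORE_IF":  # conditional execution instruction
                dst, value = args
                if self.predicate:  # true: execute and write the result to memory
                    self.memory[dst] = value
                # false: skip it and go on to the next instruction
            else:
                raise ValueError(f"unknown opcode {op}")


mem = {"a": 3, "b": 3, "c": 5, "r": 0, "s": 0}
pe = ProcessElement(mem)
pe.run([
    ("CMP_EQ", "a", "b"), ("STORE_IF", "r", 1),  # 3 == 3 holds, so r is written
    ("CMP_LT", "c", "a"), ("STORE_IF", "s", 1),  # 5 < 3 fails, so s is untouched
])
```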

The synchronization mechanisms 10 and 14 synchronize the process elements #0 (2) and #1 (3) on the processor 1. Synchronization uses two things: an instruction with which a process element waits for a synchronization event, and the synchronization event notified by another process element.
Accordingly, each process element #0 (2), #1 (3) has an instruction for notifying other process elements of a synchronization event, and each synchronization event is given a unique number. This number is determined statically when the program parallelizing device 20, described later, divides the program 30, and the corresponding instructions are embedded in the machine code 31. The program 30 and the machine code 31 are also described later.

For example, suppose process element #1 (3) waits for a synchronization event from process element #0 (2). The synchronization mechanism 14 of process element #1 (3) waits on an instruction such as "WAITEV0", and process element #0 (2) notifies the synchronization mechanism 14 of the event with an instruction such as "SENDEV0". When the waiting process element #1 (3) receives the corresponding synchronization event, its synchronization mechanism 14 recognizes the event and process element #1 (3) resumes execution. In general, once "WAITEVn" (n is an integer) is executed, the process element waits until another process element in the system issues the corresponding synchronization event.
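Below is a rough software analogue of the numbered WAITEV/SENDEV handshake. It assumes one thread per process element and models the synchronization mechanisms as a single shared table of `threading.Event` objects. This is a simplification, since in the device each PE has its own mechanism 10 or 14. The class and method names are hypothetical. Events in this sketch are one-shot; a cyclic control program would need to reset them every processing cycle.

```python
import threading


class SyncMechanism:
    """Numbered synchronization events shared by all process elements (a sketch)."""

    def __init__(self):
        self._events = {}
        self._lock = threading.Lock()

    def _event(self, n):
        with self._lock:  # create event n on first use, from either side
            return self._events.setdefault(n, threading.Event())

    def sendev(self, n):
        """SENDEVn: notify synchronization event n."""
        self._event(n).set()

    def waitev(self, n, timeout=None):
        """WAITEVn: block until some other element has sent event n."""
        return self._event(n).wait(timeout)


log = []
sync = SyncMechanism()


def pe0():  # process element #0: runs task 1, then SENDEV0
    log.append("task1")
    sync.sendev(0)


def pe1():  # process element #1: WAITEV0, then runs task 3
    sync.waitev(0)
    log.append("task3")


threads = [threading.Thread(target=pe1), threading.Thread(target=pe0)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

`threading.Event.set()` wakes every waiting thread, so one `sendev` call also releases several waiters on the same event number. That matches the broadcast case the document allows for synchronization events.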

The built-in memory 4 of the processor 1 holds programs and data stored in the external memory 5, and the instruction execution units 9 and 13 write their execution results to it. Each process element #0 (2), #1 (3) on the processor 1 reads the machine code 31 from its program area in the external memory 5, reads values from the built-in memory 4, performs operations, and writes the results to variables in the built-in memory 4. The built-in memory 4 has an arbitration function, so accesses to any one variable by process elements #0 (2) and #1 (3) are mutually exclusive.
In this example the machine code 31 is placed in the external memory 5, but it may instead be placed in the built-in memory 4 inside the processor 1. Likewise, the variables placed in the built-in memory 4 may instead be placed in the external memory 5. The external memory 5 also has an arbitration function and is accessed exclusively. The bus 6 connects the processor 1 and the external memory 5.

FIG. 1 shows a processor 1 with two process elements, but three or more may be present. In that case, the synchronization mechanism and the predicate register of each process element are connected to all the other process elements.

Next, the program parallelizing device will be described.
FIG. 2 is a block diagram of the program parallelizing device.
The illustrated program parallelizing device 20 includes an analysis unit 21, a task division unit 22, a scheduling unit 23, and a code generation unit 24.

The analysis unit 21 performs lexical and syntactic analysis on the input program 30 and converts it into intermediate code, for example quadruples. It also analyzes the dependences between variables based on the operations expressed in the program 30 and their order. The task division unit 22 divides the program, now expressed in intermediate code, into a plurality of tasks. Possible task units include basic blocks, and groups each consisting of an execution condition and the processing performed under it. A basic block is a unit of the program that has no branch partway through and is executed sequentially in the order in which it is written. For tasks defined in these units, the task division unit 22 derives the execution order constraints between tasks from the results of the variable dependence analysis.
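To illustrate the basic-block unit, here is the classic leader-based partitioning used in compilers. It is applied to a hypothetical quadruple format `(op, arg1, arg2, result)` in which the branch opcodes `goto` and `if_false` store the jump target's instruction index in `result`. The patent does not say how the task division unit 22 finds basic blocks; this is only one standard way.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: Optional[str]  # destination variable, or jump-target index for branches


BRANCHES = {"goto", "if_false"}  # hypothetical branch opcodes


def basic_blocks(quads):
    """Split quadruples into basic blocks using the 'leader' rule."""
    leaders = {0}                       # the first instruction starts a block
    for i, q in enumerate(quads):
        if q.op in BRANCHES:
            leaders.add(int(q.result))  # a jump target starts a block
            if i + 1 < len(quads):
                leaders.add(i + 1)      # so does the instruction after a jump
    starts = sorted(l for l in leaders if l < len(quads))
    ends = starts[1:] + [len(quads)]
    return [quads[s:e] for s, e in zip(starts, ends)]


# if (a < b) a = a + 1; else b = b - 1; c = a;
quads = [
    Quad("<", "a", "b", "t1"),          # 0
    Quad("if_false", "t1", None, "4"),  # 1
    Quad("+", "a", "1", "a"),           # 2
    Quad("goto", None, None, "5"),      # 3
    Quad("-", "b", "1", "b"),           # 4
    Quad("=", "a", None, "c"),          # 5
]
blocks = basic_blocks(quads)
```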

The scheduling unit 23 allocates the tasks produced by the task division unit 22 to the plurality of process elements. The critical path method is one possible scheduling method, but any scheduling algorithm can be used. Embodiment 1 describes scheduling based on the critical path method, with the processing time of each task as the evaluation function. Scheduling respects the inter-task execution order constraints derived above. The code generation unit 24 generates, for each task allocated by the scheduling unit 23, the machine code 31 for the process element that will execute it. If a dependence on a variable between tasks spans two process elements, the execution order across those elements must be preserved, so the code generation unit 24 inserts a synchronization instruction at that point.
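Here is a sketch of critical-path list scheduling with synchronization insertion. It rests on three stated assumptions:
- A task's priority is its bottom level: the largest sum of processing times along any path from the task to an exit task.
- Each ready task goes to the process element where it can start earliest.
- A dependence that crosses process elements delays the consumer by `sync_cost` and is recorded as a SENDEV/WAITEV pair.

These modeling choices and all names are hypothetical. The patent only states that a critical-path method with processing time as the evaluation function may be used.

```python
def bottom_levels(times, succs):
    """Largest sum of processing times from each task to an exit task."""
    memo = {}

    def bl(t):
        if t not in memo:
            memo[t] = times[t] + max((bl(s) for s in succs.get(t, [])), default=0)
        return memo[t]

    return {t: bl(t) for t in times}


def schedule(times, succs, num_pes, sync_cost=1):
    """Greedy list scheduling; returns placements and cross-PE sync pairs."""
    preds = {t: [] for t in times}
    for t, ss in succs.items():
        for s in ss:
            preds[s].append(t)
    prio = bottom_levels(times, succs)
    pe_free = [0] * num_pes  # time at which each PE becomes free
    placed = {}              # task -> (pe, start, finish)
    syncs = []               # (producer, consumer) pairs needing SENDEV/WAITEV
    while len(placed) < len(times):
        ready = [t for t in times
                 if t not in placed and all(p in placed for p in preds[t])]
        task = max(ready, key=lambda t: (prio[t], -t))  # highest bottom level first
        best = None
        for pe in range(num_pes):
            start = pe_free[pe]
            for p in preds[task]:
                p_pe, _, p_fin = placed[p]
                start = max(start, p_fin + (sync_cost if p_pe != pe else 0))
            if best is None or start < best[1]:
                best = (pe, start)
        pe, start = best
        placed[task] = (pe, start, start + times[task])
        pe_free[pe] = start + times[task]
        syncs += [(p, task) for p in preds[task] if placed[p][0] != pe]
    return placed, syncs


# Hypothetical example: graph 1->2, 1->3, 2->4, 3->4 with made-up times.
times = {1: 10, 2: 30, 3: 20, 4: 24}
succs = {1: [2, 3], 2: [4], 3: [4]}
placed, syncs = schedule(times, succs, num_pes=2)
```

With these made-up times, tasks 1, 2 and 4 land on PE 0, task 3 lands on PE 1, and `syncs` is `[(1, 3), (3, 4)]`. The individual processing times of FIG. 3 are not given in this text. This sketch also charges synchronization cost to the receiving side, so its timings need not match the figures in the text.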

The program parallelizing device 20 is implemented by a computer. It consists of software corresponding to the analysis unit 21 through the code generation unit 24, together with hardware such as a CPU and memory that executes that software. Alternatively, the analysis unit 21 through the code generation unit 24 may be implemented in dedicated hardware.
The program parallelizing device 20 and the parallel processing device shown in FIG. 1 are connected by, for example, a network cable or a USB cable. The program parallelized by the program parallelizing device 20 is transferred over this connection to the external memory 5, which stores the programs of the parallel processing device.

Next, the operation of the program parallelizing device and the parallel processing device of Embodiment 1 will be described.
As an example, consider a program in which each execution condition, together with the processing performed under it, is treated as one task, and whose inter-task dependences are as shown in FIG. 3. FIG. 3 shows only part of the program; other tasks may exist before and after it.
Each node in FIG. 3 represents a task, and the number in each circle is the task number. The value to the right of each circle is that task's processing time. The arrows between nodes represent precedence constraints between tasks. For example, task 2 cannot start until task 1 has finished, and task 4 cannot start until both task 2 and task 3 have finished.

In the following, each task is represented as a pair of an execution condition and the processing performed under that condition, and every task is assumed to begin with a check of its execution condition. The program may be written in any language that can describe execution conditions and the processing under them; the target language is not restricted.

The flow of processing when a program with the task structure of FIG. 3 is input to the program parallelizing device 20 as the program 30 is described below with reference to FIG. 4.
On receiving the program 30, the program parallelizing device 20 proceeds as follows:
- Step ST1: the analysis unit 21 performs lexical and syntactic analysis and generates intermediate code.
- Step ST2: the task division unit 22 determines the inter-task dependences shown in FIG. 3.
- Step ST3: the scheduling unit 23 allocates the tasks to the process elements, for example with the critical path method, based on the dependence analysis results of FIG. 3 and the processing time of each task.
FIG. 5 shows an example in which the tasks of FIG. 3 are allocated to two process elements.

In FIG. 5, tasks 1, 2, and 4 are allocated to process element #0 (2), and task 3 is allocated to process element #1 (3). Task 3 cannot start until task 1 has finished, so instructions that pass a synchronization event are added between these tasks. An instruction that waits for the synchronization event from task 1 is added at the beginning of task 3, and an instruction that notifies task 3 of the synchronization event is added at the end of task 1. The time required for synchronization is taken to be 1 here. The processing time of this schedule is therefore 65: the total processing time of tasks 1, 2, and 4, plus 1 for passing the synchronization event from task 1 to task 3. If all tasks were processed sequentially on a single process element, the processing time would be 84, the sum of the processing times of all tasks, so parallelization shortens the processing time.
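The arithmetic in the paragraph above can be written out as follows. Here t_i is the processing time of task i and t_sync = 1. The value t_3 = 20 follows from the two stated totals; the individual values of t_1, t_2 and t_4 are not given in this text.

```latex
\begin{aligned}
T_{\text{seq}} &= t_1 + t_2 + t_3 + t_4 = 84,\\
T_{\text{par}} &= (t_1 + t_2 + t_4) + t_{\text{sync}} = 65, \qquad t_{\text{sync}} = 1,\\
\Rightarrow\quad t_1 + t_2 + t_4 &= 64, \qquad t_3 = 84 - 64 = 20,\\
\text{speedup} &= T_{\text{seq}} / T_{\text{par}} = 84/65 \approx 1.29 .
\end{aligned}
```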

A synchronization event is expressed by a synchronization event notification instruction "SENDEV n" and a synchronization event wait instruction "WAITEV n", where n is an integer that denotes the event number. The side that waits with a WAITEV instruction suspends processing until the synchronization event with the specified event number has been notified. The side that sends the event identifies the recipient by that event number.
A single synchronization event notification instruction may broadcast one synchronization event to a plurality of synchronization event wait instructions.
A synchronization event with a predicate is expressed, for example, by a predicated synchronization event notification instruction "SENDEV_PD n" and a predicated synchronization event wait instruction "WAITEV_PD n".
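The event semantics above can be sketched in software. These semantics are numbered events, broadcast to several waiters, and an optional predicate carried with the event. The following is a minimal Python model under stated assumptions, not the hardware synchronization mechanism of the device. The class and method names are hypothetical, and an event is assumed to stay set once it has been notified.

```python
import threading


class SyncMechanism:
    """Hypothetical software model of numbered synchronization events.

    sendev(n) stands for SENDEV n, sendev(n, predicate) for SENDEV_PD n, and
    waitev(n) for WAITEV n / WAITEV_PD n. Events stay set once notified, so
    every waiter on the same number is released (the broadcast case).
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._events = {}  # event number -> predicate value (None for plain SENDEV)

    def sendev(self, n, predicate=None):
        with self._cond:
            self._events[n] = predicate
            self._cond.notify_all()  # wake every waiter; each re-checks its own number

    def waitev(self, n, timeout=None):
        # Block until event n has been notified, then return its predicate value.
        with self._cond:
            if not self._cond.wait_for(lambda: n in self._events, timeout):
                raise TimeoutError(f"event {n} was not notified")
            return self._events[n]
```

For example, if two threads block in `waitev(7)`, a single `sendev(7, True)` releases both of them, and each receives `True`. This corresponds to one notification instruction serving several wait instructions.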

When the scheduling unit 23 has finished assigning tasks to the process elements, it checks whether the scheduling result contains idle time (step ST4). Idle time is an interval, longer than the communication overhead between process elements, during which no task is assigned to some process element. Idle time is found by scanning the scheduling result from the beginning and extracting the intervals in which no task is assigned.
In the example of FIG. 5, no task is assigned to process element #0 (2) while process element #1 (3) is executing task 4, so idle time exists. The process therefore branches to task re-division by the task division unit 22 (step ST5), and the task is re-divided.
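Step ST4 amounts to scanning each process element's timeline. The sketch below assumes a hypothetical schedule format: a mapping from a process element name to (task, start, end) intervals. It reports every unassigned interval that is longer than the communication overhead. The example schedule uses made-up numbers, not those of FIG. 5.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (task name, start, end): hypothetical format


def find_idle_gaps(schedule: Dict[str, List[Interval]], makespan: int,
                   overhead: int) -> List[Tuple[str, int, int]]:
    """Scan each process element's timeline from time 0 and return the gaps
    (pe, start, end) with no task assigned that are longer than `overhead`."""
    gaps = []
    for pe, intervals in schedule.items():
        t = 0  # end of the last busy interval seen so far
        for _, start, end in sorted(intervals, key=lambda iv: iv[1]):
            if start - t > overhead:
                gaps.append((pe, t, start))
            t = max(t, end)
        if makespan - t > overhead:  # idle tail up to the end of the program
            gaps.append((pe, t, makespan))
    return gaps


sched = {"PE0": [("A", 0, 10), ("B", 10, 20)], "PE1": [("C", 0, 5), ("D", 8, 30)]}
print(find_idle_gaps(sched, 30, 1))  # [('PE0', 20, 30), ('PE1', 5, 8)]
```

Raising `overhead` to 3 drops the short gap on PE1. This mirrors the rule that only idle intervals exceeding the synchronization cost are worth filling.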

FIG. 6 is a flowchart showing the flow of the task re-division process (step ST5) in FIG. 4.
First, for the process elements that have idle time, the time at which the idle interval starts and the number of process elements that are idle at that time are determined (step ST11). Next, it is checked whether there is a task whose processing can be divided at that time (step ST12). A task may be so short that the overhead of passing the synchronization events required by parallelization would outweigh the gain from dividing it. In that case, it is judged that no divisible task exists, and the process ends. If instead there is a task whose processing time is long relative to the synchronization overhead, such as task 4 in FIG. 5, the process proceeds to step ST13. There, the processing of the divisible task is divided by the number of idle process elements. In the example of FIG. 5, there are two process elements, so the processing of task 4 is divided into two.

When the processing of a task is divided in this way, the data dependencies inside the task must be checked so that the task is not split between operations that depend on each other. If dependent operations were assigned to different process elements, new synchronization-event instructions would have to be added to preserve the dependency. Therefore, only mutually independent operations are assigned to different process elements. Operations that depend on each other are grouped together on one process element, and the remaining operations are placed on another process element.
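Step ST13, together with the rule that dependent operations must stay on one process element, can be sketched as follows. The sketch makes these assumptions:

- The task's internal operations and their costs are given as a dict.
- The internal data dependencies are given as pairs of operation names.
- Dependent operations are grouped as connected components.
- The groups are balanced with a greedy largest-first heuristic.

The balancing heuristic is an illustrative choice; the text does not specify one.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def split_task(costs: Dict[str, int], deps: List[Tuple[str, str]],
               k: int, sync_overhead: int) -> Optional[List[List[str]]]:
    """Divide one task's internal operations among k process elements.

    Operations linked by a data dependency stay in the same group, so no new
    synchronization is needed between the pieces. Returns None when the
    division would not gain more than the synchronization overhead (ST12)."""
    parent = {op: op for op in costs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in deps:                      # union dependent operations
        parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for op in costs:
        clusters.setdefault(find(op), []).append(op)
    ordered = sorted(clusters.values(), key=lambda c: -sum(costs[o] for o in c))

    heap = [(0, i) for i in range(k)]      # (current load, PE index)
    groups: List[List[str]] = [[] for _ in range(k)]
    for cluster in ordered:                # largest cluster to least-loaded PE
        load, i = heapq.heappop(heap)
        groups[i].extend(cluster)
        heapq.heappush(heap, (load + sum(costs[o] for o in cluster), i))

    longest = max(load for load, _ in heap)
    if sum(costs.values()) - longest <= sync_overhead:
        return None                        # not worth the synchronization
    return [g for g in groups if g]
```

For example, with four operations of cost 10 where only `a` and `b` depend on each other, the call `split_task({"a": 10, "b": 10, "c": 10, "d": 10}, [("a", "b")], 2, 1)` keeps `a` and `b` together and returns `[["a", "b"], ["c", "d"]]`. A fully dependent chain returns `None`, because it cannot be split without new synchronization.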

Following the processing flow shown in FIG. 6, task 4 in FIG. 5 is divided into two, which generates task 4-1 and task 4-2. Task 4-1 is the first half of the processing of task 4, and task 4-2 is the second half. The execution condition determination of task 4 is included in task 4-1. The predicate value that task 4-1 obtains as the execution condition is therefore notified to task 4-2.

When a task is re-divided in this way, a predicated synchronization instruction is added to the source task (the task that starts the divided task), at the point where the synchronization event is issued (step ST14). Furthermore, an instruction that waits for the predicated synchronization event is added to the destination task, at the position where it waits for the synchronization event (step ST15).
This completes the processing performed by the task division unit 22 when re-dividing a task.
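Steps ST14 and ST15 insert a matched SENDEV_PD/WAITEV_PD pair. The sketch below makes these assumptions:

- Each half of the divided task is a list of instruction strings.
- `cond_end` is the index of the last instruction of the execution condition determination.
- Event numbers come from a single allocator, so every synchronization site gets a unique number.

The unique numbering follows what the code generation unit 24 is described as doing. The helper name and the mnemonics `CMP`, `ADD`, and `MUL` are hypothetical.

```python
import itertools
from typing import Iterator, List, Tuple


def insert_predicated_sync(first_half: List[str], second_half: List[str],
                           cond_end: int, alloc: Iterator[int]) -> Tuple[List[str], List[str]]:
    """Put SENDEV_PD n right after the condition determination in the first
    half (ST14) and WAITEV_PD n at the head of the second half (ST15)."""
    n = next(alloc)  # one unique event number per synchronization site
    sender = first_half[:cond_end + 1] + [f"SENDEV_PD {n}"] + first_half[cond_end + 1:]
    receiver = [f"WAITEV_PD {n}"] + second_half
    return sender, receiver


if __name__ == "__main__":
    events = itertools.count(1)  # shared allocator for the whole program
    t4_1, t4_2 = insert_predicated_sync(["CMP", "ADD"], ["MUL"], 0, events)
    print(t4_1)  # ['CMP', 'SENDEV_PD 1', 'ADD']
    print(t4_2)  # ['WAITEV_PD 1', 'MUL']
```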

After the tasks have been re-divided, the scheduling unit 23 schedules the new task configuration (step ST3 in FIG. 4). It then checks whether the scheduling result contains idle time on any process element (step ST4). If idle time exists in step ST4, the task re-division process (step ST5) is performed again.
If there is no idle time, the code generation unit 24 generates the machine code 31 (step ST6). Based on the dependencies between tasks, the code generation unit 24 adds synchronization event notification instructions and synchronization event wait instructions for passing synchronization events. Some tasks received a predicated synchronization instruction during re-division. For those tasks, that instruction is incorporated into the machine code 31 when the machine code 31 is generated.
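The loop of FIG. 4 (steps ST3 to ST6) can be summarized as a driver. The scheduling, idle-check, re-division, and code-generation steps are passed in as hypothetical callbacks. When `redivide` returns `None`, this models step ST12 finding no divisible task, and the sketch assumes the flow then falls through to code generation. `max_rounds` is an added safeguard that is not in the text.

```python
def parallelize(tasks, schedule, find_idle, redivide, generate_code, max_rounds=100):
    """Steps ST3-ST6 of FIG. 4 as a loop over hypothetical callbacks."""
    for _ in range(max_rounds):
        result = schedule(tasks)              # ST3: assign tasks to PEs
        if not find_idle(result):             # ST4: no idle time worth filling
            break
        new_tasks = redivide(tasks, result)   # ST5: None if no divisible task (ST12)
        if new_tasks is None:
            break
        tasks = new_tasks
    else:
        result = schedule(tasks)              # safeguard: schedule the last division
    return generate_code(tasks, result)       # ST6: emit code with sync instructions
```

Keeping each step as a callback lets the same driver serve Embodiments 1 and 2, which differ only in which task is divided.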

Synchronization events may be passed redundantly, for example when one process element repeatedly notifies a particular process element. In such cases, the synchronization events may be reduced to the necessary minimum when the machine code 31 is generated.
Executing the processing shown in FIGS. 4 and 6 to shorten the idle time in the scheduling result of FIG. 5 yields the scheduling result shown in FIG. 7.
In FIG. 7, the processing of task 4 in FIG. 5 is divided between the two process elements. This reduces the idle time and shortens the processing time of the program as a whole. If the condition determination of task 4 takes time 2, this part cannot be divided, so it is processed on process element #1 (3). The predicate value obtained by this condition determination is notified to task 4-2, which is assigned to process element #0 (2). As a result of dividing the task, the overall processing time becomes 47. This is the sum of the processing times of tasks 1 and 2, the processing time of task 4-2 (19), the condition determination time (2), and the time for the two synchronization instructions.

Next, the operation of the parallel processing device is described for the case where the parallelized program generated by the program parallelizing device 20 (the program in the state shown in FIG. 7) is stored in the external memory 5.
Process element #0 (2) of the parallel processing device executes, in order, the programs of tasks 1, 2, and 4-2, which are the tasks on the left side of FIG. 7. Process element #1 (3) executes, in order, the programs of tasks 3 and 4-1, which are the tasks on the right side of FIG. 7.
First, process element #0 (2) reads the program of task 1 from memory (the built-in memory 4 or the external memory 5) and starts processing. At the same time, process element #1 (3) reads the program of task 3 and starts processing. However, the first instruction of task 3 waits for the synchronization event that process element #0 (2) issues when task 1 finishes. Process element #1 (3) therefore configures its synchronization mechanism 14 to wait for that event from process element #0 (2) and enters the wait state. This synchronization event is assigned a unique number, and process element #1 (3) waits until an event with that number is issued. The code generation unit 24 of the program parallelizing device 20 assigns these numbers. Whenever it adds synchronization event send/receive instructions at a point where a synchronization event must be passed, it gives that point its own unique number.

When the instruction execution unit 9 of process element #0 (2) completes task 1, it notifies the synchronization mechanism 14 of the synchronization event for which task 3 on process element #1 (3) is waiting. The instruction execution unit 9 of process element #0 (2) then reads the program of task 2 and starts executing it.
Process element #1 (3) receives the synchronization event from process element #0 (2), and its synchronization mechanism 14 determines whether this is the event required to leave the wait state. Here it is, so process element #1 (3) leaves the wait state, the instruction execution unit 13 resumes, and processing of task 3 begins. When the instruction execution unit 9 of process element #0 (2) completes task 2, it reads the program of task 4-2. The first instruction of task 4-2 waits for a predicated synchronization event from process element #1 (3). The instruction execution unit 9 of process element #0 (2) therefore registers the awaited synchronization event in the synchronization mechanism 10 and enters the wait state.

When the instruction execution unit 13 of process element #1 (3) finishes task 3, it proceeds to task 4-1. After it executes the execution condition determination at the beginning of task 4-1, the instruction execution unit 13 executes the next instruction in the program. This is the predicated synchronization instruction that the program parallelizing device 20 inserted at that position. The predicated synchronization instruction sends the target process element a synchronization event together with the true/false value held in the predicate register 12 at the moment the instruction executes. Because the instruction is issued right after the execution condition determination of task 4-1, it carries the value of the predicate register 12 at the end of that determination. This value is sent with the synchronization event to process element #0 (2), where it is stored in the predicate reception register 8. The notification instructs task 4-2 to start. It also tells each conditionally executed instruction in task 4-2 whether or not to execute.
Process element #0 (2) receives the synchronization event and the predicate value from process element #1 (3). Its instruction execution unit 9 then starts processing task 4-2 based on the received predicate value.
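On the receiving side, the value in the predicate reception register decides which conditionally executed instructions of task 4-2 actually run. The following is a minimal model that assumes a hypothetical instruction record with a `predicated` flag. In the device itself, predicated execution is carried out by the instruction execution unit, not by software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    name: str
    predicated: bool = False  # hypothetical flag: conditional-execution instruction


def run_after_waitev_pd(instrs: List[Instr], predicate_rx: Optional[bool]) -> List[str]:
    """Return the names of the instructions that actually execute, given the
    value delivered into the predicate reception register by WAITEV_PD."""
    if predicate_rx is None:
        raise RuntimeError("predicate reception register is empty; WAITEV_PD must wait")
    # Unpredicated instructions always run; predicated ones only if the value is true.
    return [i.name for i in instrs if not i.predicated or predicate_rx]
```

With `[Instr("LD"), Instr("ADD", True), Instr("ST", True)]`, a received value of `True` runs all three instructions, while `False` runs only `LD`.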

In this way, the processing time of the program as a whole can be shortened. This applies when, after tasks have been assigned to process elements, some process element has idle time and some task's processing after its execution condition determination can be divided during that time. Three measures are then taken:
- That part of the task's processing is assigned to the idle process element.
- An instruction that passes a synchronization event notifying the start timing of the assigned program is added.
- A mechanism that also notifies the result of the execution condition determination is provided.

As described above, the parallel processing device of Embodiment 1 includes a plurality of process elements and executes tasks in parallel on them. Each process element includes the following components:
- A predicate register that stores a true/false value for determining whether to execute the next instruction.
- A predicate reception register that stores the true/false value received from another process element with a predicated synchronization event. This value also determines whether to execute the next instruction.
- An instruction execution unit with two behaviors. When it executes a task that notifies tasks on other process elements of its execution condition determination result, it sends those process elements a predicated synchronization event carrying the true/false value of the predicate register. When it executes a task that depends on the execution condition determination result of a task on another process element, it executes that task based on the true/false value in the predicate reception register.

A parallel processing device that improves program execution efficiency and shortens the maximum execution time can thus be realized.

Further, the program parallelizing device of Embodiment 1 divides a program into tasks to be executed in parallel on the plurality of process elements of the parallel processing device of Embodiment 1. It includes the following units:
- An analysis unit that analyzes the data dependencies contained in the program.
- A task division unit that divides the program into tasks corresponding to the plurality of process elements, based on the analysis result.
- A scheduling unit that schedules the divided tasks for execution on the plurality of process elements.
- A code generation unit that generates the scheduled tasks of each process element as executable code for that process element, adding synchronization instructions according to the data dependencies.

If the scheduling unit determines, as a result of scheduling, that some process element has idle time in task execution, the task division unit divides the processing of a task among a plurality of process elements. The code generation unit then adds instructions that pass predicated synchronization events to synchronize the divided processing. This improves program execution efficiency on the parallel processing device and shortens the maximum execution time.

Embodiment 2.
The configurations of the parallel processing device and the program parallelizing device in Embodiment 2 are the same as in Embodiment 1, so they are described using the drawings of Embodiment 1.

The processing flow of the program parallelizing device 20 and the operation of the parallel processing device according to Embodiment 2 are described using an example program. Its inter-task dependencies are shown in FIG. 8, where each execution condition together with the processing performed under that condition is treated as one task.
FIG. 8 shows part of a program; other tasks may precede or follow it.
In FIG. 8, as in Embodiment 1, each node represents a task, the number inside the circle is the task number, and the number to the right of the circle is the processing time of that task. The arrows between nodes represent precedence constraints between tasks. In this example, task 3 (processing time 4) can be executed only after both task 1 (processing time 40) and task 2 (processing time 20) have completed.

When a program with the task configuration shown in FIG. 8 is input to the program parallelizing device 20, the device performs the following steps. It analyzes the program, generates intermediate code, and divides the program into the task units shown in FIG. 8. It then schedules the tasks onto the process elements. When scheduling is complete, the scheduling result shown in FIG. 9 is obtained. Tasks 1 and 3 are executed on process element #0 (2), and task 2 is executed on process element #1 (3). The processing time of the whole program is then that of process element #0 (2): 44, the sum of the processing times of task 1 and task 3.

After scheduling, it is checked whether any process element has idle time exceeding the communication overhead between process elements (step ST4 in FIG. 4). In this example, process element #1 (3) is idle after it has executed task 2, so the program parallelizing device 20 proceeds to the task re-division process.

In task re-division (see FIG. 6), the time at which idle time occurs and the number of process elements (PEs) idle at that time are determined (step ST11). Then it is checked whether there is a task whose processing can be divided at that time (step ST12). In this example, process element #0 (2) is still processing task 1 after process element #1 (3) completes task 2. This processing continues for 20 time units after task 2 completes, so it qualifies as divisible task processing. The processing of the divisible task is therefore divided by the number of PEs to which it can be assigned (step ST13). In this example, task 1 is divided into task 1-1 and task 1-2.
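Steps ST11 and ST12 ask, for a given time, which process elements are idle and which running task still has enough work left to be worth dividing. The sketch below uses the same hypothetical (task, start, end) schedule format. The intervals are read off the FIG. 9 description: task 1 (40) followed by task 3 (4) on PE#0, and task 2 (20) on PE#1.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (task name, start, end): hypothetical format


def redivision_candidates(schedule: Dict[str, List[Interval]], t: int, overhead: int):
    """At time t, return (idle PEs, running tasks worth dividing).

    A running task qualifies when the work left after t exceeds the
    synchronization overhead (step ST12)."""
    idle = [pe for pe, ivs in schedule.items()
            if not any(s <= t < e for _, s, e in ivs)]
    running = [(pe, name, e - t) for pe, ivs in schedule.items()
               for name, s, e in ivs if s <= t < e and e - t > overhead]
    return idle, running


fig9 = {"PE0": [("task1", 0, 40), ("task3", 40, 44)], "PE1": [("task2", 0, 20)]}
print(redivision_candidates(fig9, 20, 1))  # (['PE1'], [('PE0', 'task1', 20)])
```

At time 20, one PE is idle and task 1 still has 20 units left. Counting the idle PE plus the PE already running task 1 gives the two PEs among which task 1 is divided.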

Then, a predicated synchronization instruction is added to task 1-1, the source task that issues the synchronization event, at the point where its execution condition determination is completed (step ST14). Next, an instruction that waits for the predicated synchronization event is added at the beginning of task 1-2, the destination of the synchronization event (step ST15). With task 1 divided into a first and a second part in this way, scheduling is performed again.
As a result, the program is assigned to the process elements as shown in FIG. 10. Here, the divided task 1-2 is scheduled after task 2. In this example, there is no dependency between task 1 and task 2, so it does not matter which is executed first. However, task 1-2 needs the predicate value that results from the execution condition determination in task 1-1. Starting task 1-2 only after that predicate value has been determined therefore keeps idle time on the process elements to a minimum. The tasks are assigned so that process element #1 (3) processes task 2 while process element #0 (2) performs the execution condition determination.

In the scheduling result of FIG. 10, the processing of task 1 in FIG. 9 is divided between the two process elements. This reduces the idle time and shortens the processing time of the program as a whole. Take the processing time of the execution condition determination of task 1 to be 4. The whole program then takes 44 in the scheduling result of FIG. 9. Dividing the processing of task 1 and running it in parallel, as shown in FIG. 10, shortens this to 35.
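The 44-versus-35 comparison can be checked with a small model of the two schedules. The model makes these assumptions:

- Synchronization costs 1, as in Embodiment 1, and there are no other overheads.
- Task 3 runs on process element #1 (3) after task 1-2, as described for FIG. 10, and must also wait for the end of task 1-1.
- The text does not state the split point, so the 36 units remaining after the condition determination are split at every integer point and the best one is taken.

```python
def fig9_makespan(t1=40, t2=20, t3=4, sync=1):
    # PE#0 runs task 1 then task 3; task 3 also waits for task 2 from PE#1.
    return max(t1, t2 + sync) + t3


def fig10_makespan(x, t1=40, t2=20, t3=4, cond=4, sync=1):
    """Task 1 is split after its condition determination: task 1-1 (cond + x)
    stays on PE#0, task 1-2 (the remaining t1 - cond - x) goes to PE#1."""
    pred_arrival = cond + sync              # SENDEV_PD right after the condition check
    end_1_1 = cond + x                      # PE#0 finishes task 1-1
    start_1_2 = max(t2, pred_arrival)       # PE#1: task 2 first, then WAITEV_PD
    end_1_2 = start_1_2 + (t1 - cond - x)
    start_3 = max(end_1_2, end_1_1 + sync)  # task 3 needs all of task 1
    return start_3 + t3


print(fig9_makespan())                                    # 44
print(min(fig10_makespan(x) for x in range(40 - 4 + 1)))  # 35
```

Under these assumptions, the best split gives 35, the same figure the text reports. The predicate arrives at time 5, long before task 2 ends at 20, so the WAITEV_PD at the head of task 1-2 never stalls.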

Next, the operation of the parallel processing device in this embodiment is described.
Process element #0 (2) of the parallel processing device executes the program of task 1-1 in FIG. 10. Process element #1 (3) starts by executing the program of task 2 in FIG. 10.
When process element #0 (2) completes the execution condition determination of task 1-1, it sends a predicated synchronization event to process element #1 (3). It then continues with the rest of task 1-1.
Process element #1 (3) starts by executing the program of task 2, and it receives the predicated synchronization event from process element #0 (2) while task 2 is still running. At this point, the receive instruction corresponding to this notification has not yet been executed. That instruction is located at the beginning of task 1-2, which runs only after task 2 completes. In such a case, process element #1 (3) holds the number of the received synchronization event in the synchronization mechanism 14. It also holds the received predicate value in the predicate reception register 11.

When process element #1 (3) completes task 2, it goes on to start task 1-2. The first instruction of task 1-2 receives the predicated synchronization event that process element #0 (2) sent earlier. The instruction execution unit 13 of process element #1 (3) therefore checks with the synchronization mechanism 14 whether that event has already been received. In this example, it has already been received, so processing continues without waiting. When process element #1 (3) finishes task 1-2, it reads the program of task 3 and continues processing to the end.
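The early-arrival case occurs when the predicated event reaches process element #1 (3) before its WAITEV_PD has been executed. It comes down to latching the event. The following single-threaded sketch uses hypothetical names. A set of received event numbers stands in for the synchronization mechanism 14, and one attribute stands in for the predicate reception register 11. The assumption that a successful wait consumes the event is illustrative.

```python
class ReceiveSide:
    """Hypothetical model of one process element's receive side."""

    def __init__(self):
        self.pending_events = set()  # synchronization mechanism: event numbers already received
        self.predicate_rx = None     # predicate reception register

    def on_sendev_pd(self, n, value):
        # The event may arrive while another task is still running: latch it.
        self.pending_events.add(n)
        self.predicate_rx = value

    def waitev_pd(self, n):
        """True: continue immediately. False: the PE must enter the wait state."""
        if n in self.pending_events:
            self.pending_events.discard(n)
            return True
        return False
```

In the FIG. 10 scenario, `on_sendev_pd` is called during task 2. The later `waitev_pd` at the head of task 1-2 then returns `True` at once, and `predicate_rx` already holds the value sent by task 1-1.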

Thus, Embodiment 2 also shortens the processing time of the program as a whole. This applies when, after tasks have been assigned to process elements, some process element has idle time and some task's processing after its execution condition determination can be divided during that time. Three measures are then taken:
- That part of the task's processing is assigned to the idle process element.
- An instruction that passes a synchronization event notifying the start timing of the assigned program is added.
- A mechanism that also notifies the result of the execution condition determination is provided.

Embodiment 3.
In Embodiment 3, a task that has many successor tasks under the precedence constraints is divided.
The program parallelizing device and the parallel processing device in Embodiment 3 have the same configuration in the drawings as in Embodiment 1, so those drawings are used in the description.

In the program parallelizing device 20 of Embodiment 3, the task division unit 22 divides the program into tasks corresponding to the plurality of process elements based on the analysis result of the analysis unit 21. In addition, it divides any target task that has at least a predetermined number of successor tasks in the data-dependency relationship. If the scheduling performed by the scheduling unit 23 shows that this division shortens the processing time, the target task is divided and the machine code 31 is generated. The rest of the configuration is the same as the program parallelizing device 20 of Embodiment 1. The configuration of the parallel processing device is also the same as in Embodiment 1, so its description is omitted here.

The processing flow of the program parallelizing device 20 according to Embodiment 3 is described below using an example program. Its inter-task dependencies are shown in FIG. 11, where each execution condition together with the processing performed under it is treated as one task.
In FIG. 11, as in Embodiment 1, each node represents a task, the number inside the circle is the task number, and the number to the right of the circle is the processing time of that task. The arrows between nodes represent precedence constraints between tasks. In this example, tasks 3 to 6 become executable after task 1, whose processing time is 18, has been executed. Task 2 can be executed at any time.

After scheduling, idle time may remain on the process elements even when, as in Embodiments 1 and 2, dividing any task in the program does not improve the scheduling result. In this situation, the synchronization overhead means that leaving the idle time as it is gives a shorter overall processing time than dividing a task and adding synchronization instructions.
However, such small idle intervals may occur at several places in the scheduling result. In that case, dividing a task that has many successor tasks may improve the scheduling conditions and shorten the processing time of the program as a whole.

If a task with many successor tasks is divided so that its processing ends earlier, the tasks that, because of the precedence constraints, could not be scheduled until that predecessor completed become schedulable earlier than before, so the number of candidate tasks for scheduling increases. As a result, slots that were previously left idle because of precedence constraints can be filled with the newly schedulable tasks, which reduces idle time and shortens the processing time of the whole program.
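The effect described above can be illustrated with a minimal sketch. It is not the scheduling unit 23 of this document: it assumes a simple greedy list scheduler (no insertion into earlier gaps) and a hypothetical split model in which a task is replaced by two halves that run in parallel, each charged a synchronization overhead, with every successor waiting for both halves. The task names (`T1` to `T6`), the processing times, and the functions `list_schedule` and `split_task` are made up for illustration and do not reproduce FIG. 11.

```python
def list_schedule(times, preds, num_pes):
    """Greedy list scheduling (no insertion into earlier gaps).

    times: task name -> processing time
    preds: task name -> list of predecessor names (precedence constraints)
    Returns (makespan, placement) with placement[task] = (pe, start, end).
    """
    pe_free = [0] * num_pes  # time at which each PE becomes free
    finish = {}              # finish time of every task placed so far
    placement = {}
    remaining = set(times)
    while remaining:
        # A task is ready once all of its predecessors have been placed.
        ready = sorted(t for t in remaining
                       if all(p in finish for p in preds.get(t, [])))
        best = None
        for t in ready:
            data_ready = max((finish[p] for p in preds.get(t, [])), default=0)
            pe = min(range(num_pes), key=lambda i: max(pe_free[i], data_ready))
            start = max(pe_free[pe], data_ready)
            if best is None or start < best[2]:
                best = (t, pe, start)  # keep the ready task that can start earliest
        t, pe, start = best
        end = start + times[t]
        pe_free[pe] = finish[t] = end
        placement[t] = (pe, start, end)
        remaining.remove(t)
    return max(finish.values()), placement


def split_task(times, preds, task, sync_overhead):
    """Hypothetical split model: `task` becomes two parallel halves.

    Both halves inherit the task's predecessors, each successor waits for
    both halves, and each half is charged `sync_overhead`.
    """
    a, b = task + "a", task + "b"
    new_times = {t: c for t, c in times.items() if t != task}
    new_times[a] = new_times[b] = times[task] / 2 + sync_overhead
    new_preds = {a: list(preds.get(task, [])), b: list(preds.get(task, []))}
    for t, ps in preds.items():
        if t != task:
            new_preds[t] = [q for p in ps for q in ((a, b) if p == task else (p,))]
    return new_times, new_preds


# Made-up example: T1 has four successors, T2 is independent.
times = {"T1": 18, "T2": 8, "T3": 6, "T4": 6, "T5": 6, "T6": 6}
preds = {t: ["T1"] for t in ("T3", "T4", "T5", "T6")}
print(list_schedule(times, preds, 2)[0])                        # 30
print(list_schedule(*split_task(times, preds, "T1", 1), 2)[0])  # 28.0
```

With these assumed numbers, PE 1 is idle from time 8 to 18 in the undivided schedule because T3 to T6 cannot start before T1 ends; dividing T1 lets them start at time 10, and the makespan drops from 30 to 28.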

For example, one method detects tasks that have four or more successor tasks and whose processing time can be shortened by division, compares the processing time of the program before and after dividing each such task in turn, and adopts the division only for the tasks where it is effective. In this case, the applicable tasks are detected in order from the beginning of the program, and the process ends when the end of the program is reached.

When a program with the task configuration of FIG. 11 is allocated to two process elements, the result is as shown in FIG. 12. The number to the right of each task arrow in FIG. 12 is the processing time of that task. Here, the time for the execution condition determination of task 1 is taken to be 1. Since the time difference between task 1 and task 2 is 3, dividing task 1 brings no benefit once the synchronization overhead is taken into account, so task 1 is not divided. The processing time of the program in this case is 26.

In the third embodiment, on the other hand, a task with many successor tasks is forcibly divided. In this example, task 1 has four successor tasks, so task 1 is divided into two. To start the latter half of task 1's processing, processing to send and receive a synchronization event with a predicate is added.
FIG. 13 shows the scheduling result when task 1 is divided. With the scheduling of FIG. 13, the processing time of the program is 25, which is shorter than when task 1 is handled as a single task. The number to the right of each task arrow in FIG. 13 is the processing time of that task.

FIG. 14 shows the flow of the above processing.
First, the task dividing unit 22 takes the tasks that have, for example, three or more successor tasks, in order starting from the first task in the program, and divides each such task (steps ST21 and ST22); the scheduling unit 23 then performs scheduling (step ST23). If step ST24 finds that the division shortens the processing time, the scheduling result is adopted and the task is kept divided. If step ST24 finds no benefit from the division, the task is restored to its undivided state (step ST25).
Next, the task dividing unit 22 checks whether the task is the last task in the program (step ST26). If the task does not have three or more successor tasks in step ST21, the process also proceeds to step ST26. If the task is the last task in step ST26, the process ends; otherwise, the process returns to step ST21 and moves on to the next task that has three or more successor tasks.
In this example, the threshold is three or more successor tasks, but it is not limited to three; it may be two, or four or more.
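The control flow of steps ST21 to ST26 can be sketched as a greedy loop. In this hypothetical sketch, `makespan` stands in for the scheduling unit 23 and `split` for the division performed by the task dividing unit 22; both are passed in as callables, and the toy cost table in the usage example is invented purely to exercise the loop.

```python
def divide_by_successor_count(order, successors, program, makespan, split,
                              threshold=3):
    """Greedy division loop in the spirit of FIG. 14 (hypothetical sketch).

    order:      task names in program order, first to last
    successors: task -> list of successor tasks
    program:    current program representation (opaque to this loop)
    makespan:   program -> processing time after scheduling
    split:      (program, task) -> program with `task` divided
    """
    best = makespan(program)
    for task in order:                                  # ST26 advances to the next task
        if len(successors.get(task, [])) < threshold:   # ST21: too few successors
            continue
        candidate = split(program, task)                # ST22: divide the task
        cost = makespan(candidate)                      # ST23: reschedule
        if cost < best:                                 # ST24: shorter processing time?
            program, best = candidate, cost             # adopt the division
        # Otherwise `program` is left unchanged, i.e. the division is undone (ST25).
    return program


# Toy stand-ins (made-up numbers): the program is the set of divided tasks.
EFFECT = {"T1": -1, "T4": +2}  # change in makespan if a task is divided


def toy_makespan(divided):
    return 26 + sum(EFFECT.get(t, 0) for t in divided)


def toy_split(divided, task):
    return divided | {task}


successors = {"T1": ["T3", "T4", "T5", "T6"], "T2": [], "T4": ["T7", "T8", "T9"]}
result = divide_by_successor_count(["T1", "T2", "T4"], successors, frozenset(),
                                   toy_makespan, toy_split)
print(sorted(result), toy_makespan(result))  # ['T1'] 25
```

Keeping the scheduler behind a callable mirrors the flowchart: the loop only decides whether to keep each division, while the cost of a candidate comes entirely from rescheduling.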

The operation of the parallel processing device is the same as in the first and second embodiments, so its description is omitted here.
Applying the task division of the third embodiment after carrying out the first or second embodiment can shorten the processing time of the program further.

As described above, the program parallelizing apparatus according to the third embodiment divides a program into tasks to be executed in parallel on the plurality of process elements of the parallel processing device according to the third embodiment. It comprises: an analysis unit that analyzes the dependencies of the data contained in the program; a task dividing unit that divides the program into tasks corresponding to the plurality of process elements on the basis of the analysis result of the analysis unit, and that divides a target task when the target task has at least a predetermined number of successor tasks in the data dependency relationship; a scheduling unit that schedules the tasks divided by the task dividing unit for execution on the plurality of process elements; and a code generation unit that adds synchronization instructions according to the data dependencies to the scheduled tasks of each process element and generates them as executable code for each process element. When the scheduling unit determines from its scheduling result that the processing time can be shortened, the task dividing unit divides the target task. For a period left idle because of data precedence constraints, the number of tasks that can be assigned at that time therefore increases, and when a task that can be assigned to the idle period appears, the processing time of the entire program can be shortened.

Embodiment 4.
In the fourth embodiment, the execution condition determination process of a task is divided among a plurality of process elements.
The program parallelizing apparatus has the same configuration as shown in FIG. 2, so it is described with reference to FIG. 2.
In the program parallelizing apparatus 20 according to the fourth embodiment, the task dividing unit 22 divides the program into tasks corresponding to the plurality of process elements on the basis of the analysis result of the analysis unit 21. If the scheduling performed by the scheduling unit 23 shows that any process element has idle time in which no task is executing, the task dividing unit 22 divides the execution condition determination process of a task among a plurality of process elements. When the task dividing unit 22 has divided an execution condition determination process, the code generation unit 24 adds synchronization instructions for synchronizing the divided parts of that process. The remaining configuration is the same as in the first embodiment.

FIG. 15 is a configuration diagram of a process element in the parallel processing device according to the fourth embodiment.
The processor according to the fourth embodiment includes N process elements (PEs), where N is an arbitrary integer; FIG. 15 shows the N-th (#N) process element 50.
The process element 50 shown includes a predicate register 51, N−1 predicate reception registers 52, an instruction execution unit 53, and a synchronization mechanism 54. The predicate register 51 holds the predicate value for the program processed by this process element, that is, the true/false value of this element's own processing result. Each of the N−1 predicate reception registers 52 stores the predicate value of a connected process element, received from that element by a synchronization event with a predicate. The instruction execution unit 53 executes tasks on the basis of the value of the predicate register 51 and the values of the predicate reception registers 52. The synchronization mechanism 54 synchronizes with the other process elements, like the synchronization mechanisms 10 and 14 of the first embodiment.
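A small software model of the process element of FIG. 15 may help fix the roles of the registers. It is a sketch under assumptions: the class and method names are hypothetical, the N−1 predicate reception registers are modeled as a dictionary keyed by the sending PE's index, and a synchronization event with a predicate is modeled as a direct write into the destination's reception register.

```python
import operator
from dataclasses import dataclass, field


@dataclass
class ProcessElement:
    """Hypothetical software model of process element #pe_id (FIG. 15)."""
    pe_id: int
    num_pes: int
    predicate: bool = True                                   # predicate register 51
    received: dict = field(default_factory=dict, init=False)  # reception registers 52

    def __post_init__(self):
        # One reception register per other PE: N - 1 registers in total.
        self.received = {i: None for i in range(self.num_pes) if i != self.pe_id}

    def send_predicate_sync(self, dest):
        """Synchronization event with a predicate: copy our predicate value
        into the destination's reception register for this sender."""
        dest.received[self.pe_id] = self.predicate

    def combine(self, sender, op):
        """Operation between predicate registers; the result goes back into
        this PE's own predicate register."""
        self.predicate = op(self.predicate, self.received[sender])

    def execute_if(self, instruction):
        """Predicated execution: run `instruction` only when the predicate is true."""
        return instruction() if self.predicate else None


pe0, pe1 = ProcessElement(0, 2), ProcessElement(1, 2)
pe0.predicate, pe1.predicate = False, True
pe1.send_predicate_sync(pe0)
pe0.combine(1, operator.or_)
print(pe0.predicate, pe0.execute_if(lambda: "ran"))  # True ran
```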

In the fourth embodiment, the execution condition determination process of a divisible task is divided by the number of PEs that have idle time. To combine the results of the execution condition determinations computed in parallel, synchronization events with predicates are exchanged between the process elements.
The processing flow of the program parallelizing apparatus 20 according to the fourth embodiment and the operation of the parallel processing device are described below. The example is a program whose inter-task dependencies are as shown in FIG. 16 when each execution condition and the processing performed under that condition are treated together as one task.

In FIG. 16, as in the first embodiment, each node represents a task and the number inside the circle is the task number. The number to the right of the circle is the processing time of that task, and the arrows between nodes represent precedence constraints between tasks. In this example, tasks 2 and 3 become executable after task 1, whose processing time is 40, has been executed.

When a program with the task configuration shown in FIG. 16 is input to the program parallelizing apparatus 20, the program is analyzed, intermediate code is generated, and the program is divided into the task units shown in FIG. 16; when scheduling is complete, the scheduling result shown in FIG. 17 is obtained. The number to the right of each arrow indicating a task's operation in FIG. 17 is the processing time of that task. Tasks 1 and 2 run on process element #0 (2), and task 3 runs on process element #1 (3). The processing time of the whole program is then the processing time on process element #0 (2), namely 61, the sum of the processing times of task 1 and task 2.

After scheduling, the scheduling unit 23 checks whether any process element has idle time longer than the overhead of communication between process elements. In this example, process element #1 (3) is idle while process element #0 (2) is executing task 1, so the program parallelizing apparatus 20 starts the task re-division process.

FIG. 18 is a flowchart showing the task re-division process.
In task re-division, the times at which idle periods occur and the number of PEs that are idle at those times are determined (step ST31). Next, it is checked whether there is a task whose processing can be divided at those times (step ST32). Here, task 1 has a processing time of 40, of which the execution condition determination takes 30 time units.
In this example, while process element #0 (2) is executing task 1 (processing time 40), no processing is assigned to process element #1 (3), so it has idle time. The execution condition determination of task 1, which takes 30 time units, can therefore be divided. The divided parts of task 1 can be assigned to two process elements, #0 (2) and #1 (3). The execution condition determination of task 1 on process element #0 (2) is therefore divided into two parts, which are assigned to process element #0 (2) and process element #1 (3), respectively (step ST33).
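Steps ST31 to ST33 can be sketched as follows. The schedule format, the function names, and the even division of the condition-determination time are assumptions for illustration, and synchronization overhead is ignored. Task 1 (0 to 40) and task 2 (40 to 61) follow FIG. 17 as described above; the busy interval of task 3 on PE #1 is a made-up value, since the document does not give task 3's processing time.

```python
def idle_intervals(busy_by_pe, horizon):
    """Return pe -> list of (start, end) idle gaps up to `horizon`.

    busy_by_pe: pe -> list of (start, end) busy intervals from the schedule
    """
    gaps = {}
    for pe, busy in busy_by_pe.items():
        t, free = 0, []
        for start, end in sorted(busy):
            if start > t:
                free.append((t, start))  # gap before this busy interval
            t = max(t, end)
        if t < horizon:
            free.append((t, horizon))    # trailing gap up to the makespan
        gaps[pe] = free
    return gaps


def idle_pes_at(gaps, time):
    """Step ST31: the PEs that are idle at `time`."""
    return [pe for pe, free in gaps.items() if any(s <= time < e for s, e in free)]


def share_condition(cond_time, owner, idle):
    """Step ST33 (assumed even split): divide the condition-determination
    time between the owner PE and the idle PEs."""
    pes = [owner] + [pe for pe in idle if pe != owner]
    return {pe: cond_time / len(pes) for pe in pes}


schedule = {0: [(0, 40), (40, 61)], 1: [(40, 55)]}
gaps = idle_intervals(schedule, horizon=61)
idle = idle_pes_at(gaps, 0)  # PEs idle when task 1 starts
print(gaps, idle, share_condition(30, owner=0, idle=idle))
```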

The execution condition determination process consists of value comparisons, logical operations, functions that bundle the processing used to obtain execution conditions, and combinations of logical ORs and ANDs of various execution conditions. When the execution condition determination takes a long time, it may compute several individual execution conditions and then take their logical AND or OR. In such cases, the processing time can be shortened by assigning the computation of the individual execution conditions to several process elements and performing it in parallel.
Each process element then sends the result of its execution condition computation, as a predicate value accompanying a synchronization event, to another process element by means of a synchronization instruction with a predicate. A receive instruction for the synchronization event with a predicate is added to each process element that receives such an event from another process element, and the logical operation of the original program is replaced by an operation instruction between predicate registers.
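The idea of splitting a condition at its logical OR and recombining the partial results can be checked with a small sketch. Each chunk of OR terms stands for the work assigned to one PE, and folding the partial results plays the role of the operation between predicate registers. The function names and the even, contiguous chunking are assumptions, not the document's method of choosing the split.

```python
import operator
from functools import reduce
from itertools import product


def partition(terms, k):
    """Split `terms` into k contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(terms), k)
    chunks, i = [], 0
    for j in range(k):
        n = q + (1 if j < r else 0)
        chunks.append(terms[i:i + n])
        i += n
    return chunks


def evaluate_split(terms, k, env, op=operator.or_, identity=False):
    """Evaluate op(term_1, ..., term_m) as k partial results, one per PE.

    Each partial result plays the role of the predicate value a PE sends with
    its synchronization event; the final fold stands for the operation between
    predicate registers on the receiving PE.
    """
    partials = [reduce(op, (t(env) for t in chunk), identity)
                for chunk in partition(terms, k)]
    return reduce(op, partials, identity)


# (A and B) or (C and D), divided at the logical OR.
terms = [lambda e: e["A"] and e["B"], lambda e: e["C"] and e["D"]]
for a, b, c, d in product([False, True], repeat=4):
    env = {"A": a, "B": b, "C": c, "D": d}
    assert evaluate_split(terms, 2, env) == ((a and b) or (c and d))
```

Because OR (like AND) is associative and commutative, any grouping of the terms across PEs gives the same final predicate, which is what makes this kind of division safe.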

That is, the task dividing unit 22 adds a synchronization instruction with a predicate at the synchronization-event issue point of the activation-source task (step ST34) and adds a synchronization wait instruction at the synchronization-event wait point of the activation-destination task (step ST35). It then adds a logical operation instruction that uses the predicate value received after the synchronization wait and the value of the predicate register (step ST36). FIG. 19 shows the scheduling result after the task has been re-divided in this way.
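Steps ST34 to ST36 amount to inserting three instructions into the generated code. The sketch below operates on plain lists of instruction strings; the mnemonics `PAND`, `PSYNC_SEND`, `PSYNC_WAIT` and `POR`, the register names `p` and `rN`, and the function `add_predicate_sync` are all hypothetical, since the document does not define an instruction set.

```python
def add_predicate_sync(src_code, dst_code, src_pe, dst_pe, send_at, wait_at,
                       op="POR"):
    """Steps ST34 to ST36 applied to instruction lists (hypothetical mnemonics).

    src_code / dst_code: instructions of the activation-source and
                         activation-destination tasks
    send_at / wait_at:   indices of the synchronization-event issue point and
                         the synchronization wait point
    Returns new lists; the inputs are left unchanged.
    """
    src, dst = list(src_code), list(dst_code)
    src.insert(send_at, f"PSYNC_SEND pe{dst_pe}")     # ST34: sync instruction with predicate
    dst.insert(wait_at, f"PSYNC_WAIT pe{src_pe}")     # ST35: sync wait instruction
    dst.insert(wait_at + 1, f"{op} p, p, r{src_pe}")  # ST36: p <- p op received predicate
    return src, dst


pe1_code = ["PAND p, C, D"]           # PE #1: predicate <- C AND D
pe0_code = ["PAND p, A, B", "(p) E"]  # PE #0: predicate <- A AND B; E runs if p
src, dst = add_predicate_sync(pe1_code, pe0_code, src_pe=1, dst_pe=0,
                              send_at=1, wait_at=1)
print(src)
print(dst)
```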

An operation instruction between predicate registers performs a logical operation between the predicate register that holds the predicate value for the program processed by this process element and the predicate reception register that holds the predicate value received from another process element by a synchronization event with a predicate. The result of the operation is stored in the predicate register that holds this process element's own predicate value.

With the above configuration, the execution condition determination process is divided among, assigned to, and executed in parallel on a plurality of process elements; the results of the assigned determinations are gathered by synchronization instructions with predicates, and logical operations are performed between predicate registers. This shortens the time needed to obtain the execution condition in the execution condition determination process.

Next, the operation of the parallel processing device according to the fourth embodiment is described, using the program of the scheduling result shown in FIG. 19 as an example.
Process element #0 (2) of the parallel processing device executes the divided task 1-1, which is the first half of the execution condition determination of task 1. Process element #1 (3) executes the divided task 1-2, which is the second half of the execution condition determination of task 1. At the end of the program of task 1-1, an instruction to receive a synchronization event with a predicate and an instruction to perform a logical operation between the notified predicate value and this process element's own predicate value have been added. At the end of the program of task 1-2, an instruction to send a synchronization event with a predicate has been added.

If task 1-1 on process element #0 (2) finishes before task 1-2 on process element #1 (3), process element #0 (2) waits for the synchronization event with a predicate from task 1-2; that is, a wait instruction is stored in the synchronization mechanism 54 of process element #0 (2). When process element #1 (3) finishes task 1-2, its instruction execution unit 53 sends the synchronization event with a predicate to the synchronization mechanism 54 of process element #0 (2). On receiving this event from task 1-2, the instruction execution unit 53 of process element #0 (2) performs the operation between predicates.

Conversely, if task 1-2 on process element #1 (3) finishes before task 1-1 on process element #0 (2), the predicate value of process element #1 (3) at that point is sent to the predicate reception register 52 of process element #0 (2) by a synchronization event with a predicate. The instruction execution unit 53 of process element #1 (3) then reads the program of task 3 and starts executing it. Because the first instruction of task 3 is a receive instruction for a synchronization event from process element #0 (2), process element #1 (3) waits for that event.

At the end of task 1-1, process element #0 (2) executes the receive instruction for the synchronization event with a predicate. At this point the event has already been received from process element #1 (3). The instruction execution unit 53 of process element #0 (2) therefore computes the overall predicate value of task 1's execution condition determination from its own predicate register 51 and the predicate value of process element #1 (3), which was received with the synchronization event and stored in the predicate reception register 52.
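The two orderings described above differ only in whether the synchronization event arrives before or after the receive instruction executes. The following sequential sketch models the synchronization mechanism 54 together with the predicate reception registers; the class `SyncMechanism` and the function `run` are hypothetical, the combining operation is assumed to be the OR used in FIG. 21, and real hardware would handle the two cases concurrently rather than in a scripted order.

```python
class SyncMechanism:
    """Hypothetical model of synchronization mechanism 54 plus the predicate
    reception registers 52 of one PE."""

    def __init__(self):
        self.reception = {}      # sender PE -> received predicate value
        self.waiting_for = None  # sender this PE is blocked on, if any

    def wait(self, sender):
        """Receive instruction: True if the event is already buffered; otherwise
        record a wait entry and return False (the PE stalls)."""
        if sender in self.reception:
            return True
        self.waiting_for = sender
        return False

    def deliver(self, sender, predicate):
        """Arrival of a synchronization event with a predicate. Returns True if
        it releases a PE that was waiting for this sender."""
        self.reception[sender] = predicate
        if self.waiting_for == sender:
            self.waiting_for = None
            return True
        return False


def run(pe0_pred, pe1_pred, pe0_finishes_first):
    """Script one ordering of task 1-1 (PE #0) and task 1-2 (PE #1).
    Returns (combined predicate on PE #0, whether PE #0 had to wait)."""
    sync0 = SyncMechanism()
    if pe0_finishes_first:
        stalled = not sync0.wait(sender=1)  # PE #0 reaches its receive first
        sync0.deliver(1, pe1_pred)          # the later arrival releases PE #0
    else:
        sync0.deliver(1, pe1_pred)          # event buffered in the reception register
        stalled = not sync0.wait(sender=1)  # PE #0 continues immediately
    # Operation between predicate registers (OR, as in FIG. 21).
    return pe0_pred or sync0.reception[1], stalled


print(run(False, True, True), run(False, True, False))  # (True, True) (True, False)
```

Either way, the combined predicate is the same; only whether PE #0 stalls differs.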

In this way, the instruction execution unit 53 of process element #0 (2) gathers the results of the execution condition determinations computed in parallel on several process elements by means of synchronization instructions with predicates, performs an operation between the predicate reception register 52 and the predicate register 51, and obtains the predicate value of the execution condition determination part. Task 1-3 is processed according to the true/false value of this predicate register.
After finishing task 1-3, the instruction execution unit 53 of process element #0 (2) issues a synchronization event to task 3 on process element #1 (3) and starts task 2. Process element #1 (3), having received the synchronization event, starts task 3, so tasks 2 and 3 are processed in parallel.

The above flow is explained with reference to FIGS. 20 to 22.
FIG. 20 outlines the program of task 1 in the fourth embodiment before division. FIGS. 21 and 22 show the result of dividing the execution condition determination of process E in FIG. 20 at its logical OR. In this example, the processing of FIG. 21 is assigned to process element #0 (2) and the processing of FIG. 22 to process element #1 (3).
The program shown in FIG. 21 stores the logical AND of condition A and condition B in the predicate register 51 and then waits to receive a predicate value from process element #1 (3). When process element #0 (2) receives the synchronization event with a predicate, it takes the logical OR of the value stored in its predicate reception register 52 and its own predicate register 51 and stores the result in the predicate register 51. Process E is then executed according to the true/false value of the predicate register 51. Next, based on the task precedence relationships shown in FIG. 16, a synchronization event is sent to process element #1 (3) to start task 3, after which processing proceeds to task 2.

The program shown in FIG. 22, on the other hand, stores the logical AND of condition C and condition D in the predicate register 51 and sends that predicate value to process element #0 (2) together with a synchronization event. It then waits for the synchronization event that starts task 3, which is assigned to process element #1 (3).
With this processing, the program's processing time, which was 61 with the scheduling of FIG. 17 before task 1 was divided, is shortened to 48, as shown in FIG. 19, by parallelizing the execution condition determination.
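The programs of FIGS. 21 and 22 can be mimicked with two threads standing in for the process elements and two queues standing in for the synchronization event channels. This is a hypothetical software analogue, not the hardware mechanism, and `run_fig21_22` is an invented name. A `queue.Queue` put does not block, so a sender's event is buffered if the receiver has not yet reached its wait, which covers both orderings discussed for FIG. 19.

```python
import queue
import threading


def run_fig21_22(A, B, C, D):
    """Run the FIG. 21 program (PE #0) and FIG. 22 program (PE #1) as threads.
    Returns the ordered log of processing steps that were executed."""
    to_pe0 = queue.Queue()  # PE #1 -> PE #0: synchronization event with predicate
    to_pe1 = queue.Queue()  # PE #0 -> PE #1: synchronization event starting task 3
    log, lock = [], threading.Lock()

    def record(step):
        with lock:
            log.append(step)

    def pe0():
        p = A and B          # predicate register <- A AND B
        r = to_pe0.get()     # always wait for PE #1's event (no short-circuit)
        p = p or r           # predicate register <- own OR received
        if p:
            record("E")      # process E is predicated on the combined value
        to_pe1.put(None)     # start task 3 on PE #1
        record("task 2")

    def pe1():
        p = C and D          # predicate register <- C AND D
        to_pe0.put(p)        # send the predicate with the synchronization event
        to_pe1.get()         # wait for the event that starts task 3
        record("task 3")

    threads = [threading.Thread(target=pe0), threading.Thread(target=pe1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


print(run_fig21_22(False, True, True, True)[0])  # E
```

The receive is done in its own statement so that PE #0 consumes the synchronization event even when `A and B` is already true, as the receive instruction in FIG. 21 would.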

As described above, the program parallelizing apparatus according to the fourth embodiment divides a program into tasks to be executed in parallel on the plurality of process elements of the parallel processing device according to the fourth embodiment. It comprises: an analysis unit that analyzes the dependencies of the data contained in the program; a task dividing unit that divides the program into tasks corresponding to the plurality of process elements on the basis of the analysis result of the analysis unit; a scheduling unit that schedules the divided tasks for execution on the plurality of process elements; and a code generation unit that adds synchronization instructions according to the data dependencies to the scheduled tasks of each process element and generates them as executable code for each process element. When the scheduling unit determines from its scheduling result that any process element has idle time in which no task is executing, the task dividing unit divides the execution condition determination process of a task among a plurality of process elements, and the code generation unit adds instructions that exchange synchronization events with predicates to synchronize the divided execution condition determination processes. This improves the execution efficiency of the program on the parallel processing device and reduces the maximum execution time.

FIG. 1 is a block diagram showing the parallel processing device according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing the program parallelizing apparatus according to Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram showing the precedence relationships of tasks in Embodiment 1 of the present invention.
FIG. 4 is a flowchart showing the operation of the program parallelizing apparatus according to Embodiment 1 of the present invention.
FIG. 5 is an explanatory diagram showing the task configuration of Embodiment 1 of the present invention allocated to process elements.
FIG. 6 is a flowchart showing the task re-division process of the program parallelizing apparatus according to Embodiment 1 of the present invention.
FIG. 7 is an explanatory diagram showing the task configuration after task re-division in Embodiment 1 of the present invention allocated to process elements.
FIG. 8 is an explanatory diagram showing the precedence relationships of tasks in Embodiment 2 of the present invention.
FIG. 9 is an explanatory diagram showing the task configuration of Embodiment 2 of the present invention allocated to process elements.
FIG. 10 is an explanatory diagram showing the task configuration after task re-division in Embodiment 2 of the present invention allocated to process elements.
FIG. 11 is an explanatory diagram showing the precedence relationships of tasks in Embodiment 3 of the present invention.
FIG. 12 is an explanatory diagram showing the task configuration of Embodiment 3 of the present invention allocated to process elements.
FIG. 13 is an explanatory diagram showing the task configuration after task re-division in Embodiment 3 of the present invention allocated to process elements.
FIG. 14 is a flowchart showing the task division process of the program parallelizing apparatus according to Embodiment 3 of the present invention.
FIG. 15 is a block diagram showing a process element of the parallel processing device according to Embodiment 4 of the present invention.
FIG. 16 is an explanatory diagram showing the precedence relationships of tasks in Embodiment 4 of the present invention.
FIG. 17 is an explanatory diagram showing the task configuration of Embodiment 4 of the present invention allocated to process elements.
FIG. 18 is a flowchart showing the task re-division process of the program parallelizing apparatus according to Embodiment 4 of the present invention.
FIG. 19 is an explanatory diagram showing the task configuration after task re-division in Embodiment 4 of the present invention allocated to process elements.
FIG. 20 is an explanatory diagram showing the program before division in Embodiment 4 of the present invention.
FIG. 21 is an explanatory diagram showing the divided program of Embodiment 4 of the present invention as allocated to one process element.
FIG. 22 is an explanatory diagram showing the divided program of Embodiment 4 of the present invention as allocated to the other process element.

Explanation of symbols

1 processor
2, 3, 50 process element
4 internal memory
5 external memory
6 bus
7, 12, 51 predicate register
8, 11, 52 predicate receive register
9, 13, 53 instruction execution unit
10, 14, 54 synchronization mechanism
20 program parallelizing device
21 analysis unit
22 task division unit
23 scheduling unit
24 code generation unit
30 program
31 machine code

Claims

A parallel processing device that includes a plurality of process elements and processes tasks in parallel on the plurality of process elements, wherein
each of the process elements comprises:
a predicate register that stores a Boolean value determining whether the next instruction is executed;
a predicate receive register that stores a predicate-carrying synchronization event received from another process element, the event carrying a Boolean value that determines whether the next instruction is executed; and
an instruction execution unit that, when executing a task that notifies a task executed by another process element of the determination result of that task's execution condition, notifies the other process element of a synchronization event carrying the true/false value of the predicate register, and that, when executing a task that is executed based on the execution-condition determination result of a task executed by another process element, executes the task based on the true/false value held in the predicate receive register.
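Claim 1 describes a hardware mechanism; the following Python sketch only models its observable behavior in software, under loudly stated assumptions: threads stand in for process elements, a `queue.Queue` stands in for the synchronization mechanism, and every name (`ProcessElement`, `send_predicate_event`, the event label `cond_x_positive`) is hypothetical. One PE evaluates a condition into its predicate register and sends it as a predicate-carrying synchronization event; the other PE latches the value in its predicate receive register and executes its guarded work accordingly.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class ProcessElement:
    """Hypothetical software model of one process element (PE)."""
    name: str
    predicate_register: bool = False          # Boolean guarding the next instruction
    predicate_receive_register: bool = False  # Boolean received from another PE
    inbox: queue.Queue = field(default_factory=queue.Queue)

    def send_predicate_event(self, other: "ProcessElement", event: str) -> None:
        # Synchronization event that carries this PE's predicate value.
        other.inbox.put((event, self.predicate_register))

    def wait_predicate_event(self, event: str) -> None:
        # Block until the event arrives, then latch the Boolean it carries.
        received, value = self.inbox.get()
        if received != event:
            raise RuntimeError(f"{self.name}: expected {event}, got {received}")
        self.predicate_receive_register = value


def deciding_task(pe: ProcessElement, peer: ProcessElement, x: int) -> None:
    # Task that determines the execution condition and notifies the peer PE.
    pe.predicate_register = x > 0
    pe.send_predicate_event(peer, "cond_x_positive")


def guarded_task(pe: ProcessElement, results: list) -> None:
    # Task executed on another PE according to the received condition result.
    pe.wait_predicate_event("cond_x_positive")
    if pe.predicate_receive_register:
        results.append("guarded body executed")
    else:
        results.append("guarded body skipped")


def run(x: int) -> list:
    pe0, pe1 = ProcessElement("PE0"), ProcessElement("PE1")
    results: list = []
    threads = [
        threading.Thread(target=guarded_task, args=(pe1, results)),
        threading.Thread(target=deciding_task, args=(pe0, pe1, x)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

`run(5)` returns `["guarded body executed"]` and `run(-1)` returns `["guarded body skipped"]`. In actual hardware the predicate would guard individual instructions rather than a Python `if`.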

A program parallelizing device that divides a program into tasks to be executed in parallel on the plurality of process elements of the parallel processing device according to claim 1, comprising:
an analysis unit that analyzes the data dependencies in the program;
a task division unit that divides the program into tasks for the plurality of process elements based on the analysis result of the analysis unit;
a scheduling unit that schedules the divided tasks for execution on the plurality of process elements; and
a code generation unit that converts the scheduled tasks of each process element into code in that process element's executable format, adding synchronization instructions according to the data dependencies;
wherein, when the scheduling unit determines from its scheduling result that any process element has idle time during task execution, the task division unit divides the processing of a task among the plurality of process elements, and the code generation unit adds an instruction for passing a predicate-carrying synchronization event that synchronizes the processing of the divided tasks.
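A minimal sketch of the idle-time check in claim 2, assuming a simple greedy list scheduler as a stand-in (not necessarily the scheduling method of the patent). The task graph, costs, and the even two-way split of task `B` are hypothetical, and synchronization overhead and the predicate-carrying events that the code generation unit would add are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cost: int            # execution time in arbitrary units
    preds: tuple = ()    # names of tasks this task depends on


def list_schedule(tasks, n_pe):
    """Greedy list scheduling: tasks are given in topological order and each
    is placed on the PE where it can start earliest."""
    finish, pe_free, placement = {}, [0] * n_pe, {}
    for t in tasks:
        ready = max((finish[p] for p in t.preds), default=0)
        pe = min(range(n_pe), key=lambda i: max(pe_free[i], ready))
        start = max(pe_free[pe], ready)
        finish[t.name] = start + t.cost
        pe_free[pe] = finish[t.name]
        placement[t.name] = (pe, start, finish[t.name])
    return placement


def idle_times(placement, n_pe):
    """Return (makespan, idle time of each PE up to the makespan)."""
    makespan = max(end for _, _, end in placement.values())
    busy = [0] * n_pe
    for pe, start, end in placement.values():
        busy[pe] += end - start
    return makespan, [makespan - b for b in busy]


# Original task graph: A -> B -> C, with a long middle task B.
before = [Task("A", 2), Task("B", 8, ("A",)), Task("C", 2, ("B",))]
# After dividing B's processing into two halves that can run on different PEs.
after = [Task("A", 2), Task("B1", 4, ("A",)), Task("B2", 4, ("A",)),
         Task("C", 2, ("B1", "B2"))]
```

With two PEs, `idle_times(list_schedule(before, 2), 2)` gives `(12, [0, 12])`: PE1 is idle for the whole schedule, which is the trigger the claim describes. After the split, the result is `(8, [0, 4])`, a shorter makespan before any synchronization cost is added.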

A program parallelizing device that divides a program into tasks to be executed in parallel on the plurality of process elements of the parallel processing device according to claim 1, comprising:
an analysis unit that analyzes the data dependencies in the program;
a task division unit that divides the program into tasks for the plurality of process elements based on the analysis result of the analysis unit and that, when more than a predetermined number of tasks are in a data dependency relationship with a target task, divides the target task;
a scheduling unit that schedules the tasks divided by the task division unit for execution on the plurality of process elements; and
a code generation unit that converts the scheduled tasks of each process element into code in that process element's executable format, adding synchronization instructions according to the data dependencies;
wherein the task division unit divides the target task when the scheduling unit determines from its scheduling result that doing so shortens the processing time.
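Claim 3 combines a structural trigger with a scheduling check. The sketch below assumes one reading of the translated claim: a task becomes a division candidate when more than a threshold number of tasks depend on it, and the division is kept only if rescheduling shortens the makespan. The threshold, the example dependency graph, and all function names are hypothetical; in the claimed device the two makespans would come from the scheduling unit.

```python
from collections import Counter


def successor_counts(deps):
    """deps maps each task to the tasks it depends on (its predecessors).
    Returns how many tasks depend on each task."""
    counts = Counter()
    for preds in deps.values():
        counts.update(preds)
    return counts


def division_candidates(deps, threshold):
    # Assumption: "more than a predetermined number" is read as a strict '>'.
    return sorted(t for t, n in successor_counts(deps).items() if n > threshold)


def should_divide(makespan_before, makespan_after):
    # Keep the division only if the rescheduled program finishes sooner.
    return makespan_after < makespan_before


# Hypothetical dependency graph: T1, T2 and T3 all consume T0's output.
deps = {
    "T0": [],
    "T1": ["T0"], "T2": ["T0"], "T3": ["T0"],
    "T4": ["T1", "T2"],
}
```

Here `division_candidates(deps, 2)` selects `["T0"]`, since three tasks depend on it; `should_divide(12, 8)` accepts the division, while `should_divide(8, 8)` rejects it because nothing is gained.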

A program parallelizing device that divides a program into tasks to be executed in parallel on the plurality of process elements of the parallel processing device according to claim 1, comprising:
an analysis unit that analyzes the data dependencies in the program;
a task division unit that divides the program into tasks for the plurality of process elements based on the analysis result of the analysis unit;
a scheduling unit that schedules the divided tasks for execution on the plurality of process elements; and
a code generation unit that converts the scheduled tasks of each process element into code in that process element's executable format, adding synchronization instructions according to the data dependencies;
wherein, when the scheduling unit determines from its scheduling result that any process element has idle time during task execution, the task division unit divides the execution-condition determination processing of a task, and the code generation unit adds an instruction for passing a predicate-carrying synchronization event that synchronizes the execution-condition determination processing of the divided tasks.
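Claim 4 applies the division to the execution-condition determination itself. A minimal sketch, assuming the condition is a conjunction `x > 0 and y > 0` whose two halves are evaluated on different PEs: each PE sends its partial result in a predicate-carrying synchronization event and combines it with the half it receives, so both PEs end up holding the full condition. Threads and queues stand in for process elements and the synchronization mechanism; `split_condition` and the event names are hypothetical.

```python
import queue
import threading


def split_condition(x: int, y: int) -> dict:
    """Hypothetical: evaluate (x > 0 and y > 0) split across two PEs."""
    to_pe0, to_pe1 = queue.Queue(), queue.Queue()
    outcome = {}

    def pe0():
        pred = x > 0                        # PE0's half of the condition
        to_pe1.put(("cond_part0", pred))    # predicate-carrying sync event
        event, other = to_pe0.get()         # wait for PE1's half
        assert event == "cond_part1"
        outcome["pe0"] = pred and other     # full condition now known on PE0

    def pe1():
        pred = y > 0                        # PE1's half of the condition
        to_pe0.put(("cond_part1", pred))
        event, other = to_pe1.get()
        assert event == "cond_part0"
        outcome["pe1"] = pred and other     # full condition now known on PE1

    threads = [threading.Thread(target=pe0), threading.Thread(target=pe1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome
```

Because each PE sends before it waits and the queues are unbounded, the exchange cannot deadlock. `split_condition(1, 2)` yields `{"pe0": True, "pe1": True}`, and any negative argument makes both entries `False`.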