JPH0689188A

JPH0689188A - Automatic parallel processing system

Info

Publication number: JPH0689188A
Application number: JP23948392A
Authority: JP
Inventors: Chiharu Kori; 千治郡
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 1992-09-08
Filing date: 1992-09-08
Publication date: 1994-03-29

Abstract

PURPOSE: To suppress overhead and improve the processing efficiency of the whole program by having the compiler's automatic parallelization function parallelize loops that require a final-value guarantee for local data in microtasks, without exclusive control or synchronous control. CONSTITUTION: A loop recognition means 21 recognizes loops, and a data dependency analysis means 22 analyzes their data dependencies. A parallelism determination means 23 identifies the loops that can be executed in parallel. A final-value guarantee determination means 24 identifies the data local to each task and determines whether its final value must be guaranteed after the loop ends. A memory allocation means 25 determines the data hierarchies and allocates memory to each hierarchy. A parallel code generation means 26 generates code for parallel execution and for the final-value guarantee. Data shared among tasks executed on the same process is thus used to guarantee the final values of local data in microtasks without exclusive control or synchronous control, shortening the execution time of the whole program.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an automatic parallelization processing system, and more particularly to an automatic parallelization processing system for a shared-memory multiprocessor computer system comprising a compiler, a linker, and a task scheduler.

[0002]

[Prior Art] In scientific and engineering computing fields such as fluid analysis, weather analysis, plasma analysis, structural analysis, resource exploration, and optimization problems, large arrays are often processed using several loops. If such computation follows the loop structure, fetching data from the array one element at a time and executing scalar instructions sequentially, it is not unusual for a run to take several tens of days even with exclusive use of a high-performance very large computer, which is impractical in terms of cost performance.

[0003] Even when vector instructions are executed in parallel on a supercomputer or similar machine equipped with a vector processor having a vector operation function, a single processor can itself be made only so fast, so a dramatic improvement in processing speed cannot be expected.

[0004] For this reason, multitasking, which allows parallel processing on multiple processors, has come into use. Parallel processing takes two forms: macrotasking, which treats a subroutine or a group of subroutines as the unit of processing, and microtasking, which treats a loop, a statement, or a group of statements as the unit.

[0005] To use macrotasking, the user must review top-down whether the program's algorithm is suited to parallel processing and analyze the parallelism of the processing, the independence of the data, and so on, which demands very advanced skill. To use microtasking, the user must focus bottom-up on local parts of the program and analyze the parallelism of processing and the independence of data for each loop iteration or group of statements. Because this demands very meticulous attention, automatic parallelization by the compiler is indispensable.

[0006] Automatic parallelization analyzes control structures such as loops together with information such as the definition-reference (def-use) relationships of data, identifies the parts that have no data dependencies and can therefore run in parallel, and generates code to execute them in parallel. Parts that do have data dependencies can also be executed in parallel by applying exclusive control or synchronous control, or by transforming the source. In general, however, the unit of parallel processing in a microtask is small, so overhead has a large effect on the processing time of the whole program. Parallelization should therefore avoid exclusive control and synchronous control wherever possible in order to suppress overhead.

[0007] In the conventional automatic parallelization processing system, when data local to a microtask is referenced after the loop being parallelized ends, a final-value guarantee is required: the final iteration of the loop is placed under serial control so that only one task (the master task) executes it, and the value that task's local data holds after that final iteration is stored in the location referenced after the loop. The overhead of doing so prevents a sufficient speedup.

[0008]

[Problems to Be Solved by the Invention] In the conventional automatic parallelization processing system based on microtasking described above, enlarging the unit of parallel processing and suppressing exclusive control and synchronous control have a large effect on the efficiency of parallelization.

[0009] An object of the present invention is to provide an automatic parallelization processing system that improves the processing efficiency of the whole program by parallelizing loops requiring a final-value guarantee for local data in microtasks without exclusive control or synchronous control, thereby suppressing overhead.

[0010]

[Means for Solving the Problems] The automatic parallelization processing system of the present invention applies to a shared-memory multiprocessor computer system comprising a compiler that translates a source program to generate an object program for parallel execution, a linker that combines the object program with runtime routines to generate an executable program with a multitask structure, and a task scheduler that manages the tasks for parallel execution. The compiler comprises: loop recognition means for recognizing control structures such as loops in the source program after the source program to be translated has been input; data dependency analysis means for analyzing the definition-reference relationships of data in the recognized loops and recognizing data dependencies; parallelism determination means for determining, from the control structure and data dependencies of the recognized loops, which parts can be executed in parallel, and for recognizing the exclusive control and synchronous control needed to preserve the definition-reference relationships of data during parallel execution; final-value guarantee determination means for identifying, for each loop that the parallelism determination means has determined to be executable in parallel, the data local to each task and determining whether a final-value guarantee is required when the loop ends; memory allocation means for determining, from the declaration information of the source program, the data dependency information for parallel execution, the final-value guarantee information, and the maximum number of processors available in the system, the data hierarchy of inter-task shared data, same-process inter-task shared data, and task-private data, and for allocating them to memory; and parallel code generation means for generating, for the parallel-executable parts obtained by the parallelism determination means, code for parallel execution (including that for exclusive control and synchronous control) and code for the final-value guarantee. The task scheduler comprises: task generation means for generating tasks for parallel execution from the information requested when the object program containing the parallel code generated by the compiler is executed; task allocation means for securing the processors available at the time parallel execution is requested, assigning the secured processors to the tasks for parallel execution, and notifying the program of the number of secured processors and of an identification number that identifies each processor; task control means for scheduling parallel execution and performing exclusive control and synchronous control between tasks; and task release means for releasing the tasks and processors when parallel execution ends. By using the number of processors available for parallel execution and the processor identification numbers notified by the task scheduler, a data hierarchy called same-process inter-task shared data is provided, and the compiler automatically performs the final-value guarantee for data local to tasks that can be executed in parallel, without exclusive control or synchronous control.

[0011]

[Operation] Using the number of processors available for parallel execution and the processor identification numbers notified by the task scheduler, the automatic parallelization processing system of the present invention provides a data hierarchy called same-process inter-task shared data. This data has the same data scope as inter-task shared data but is shared only among tasks executed sequentially on the same process, so the order of its definition-reference relationships is guaranteed. On this basis the compiler automatically generates parallel code, and allocates memory, in a way that allows parallel execution without exclusive control or synchronous control.

[0012]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the automatic parallelization processing system of the present invention. FIG. 2 illustrates the concept of same-process inter-task shared data. FIG. 3 shows a FORTRAN program as an example of the source program 1. FIG. 4 shows an example of the program of FIG. 3 as parallelized by the conventional technique. FIG. 5 shows an example of the data hierarchy and memory allocation when the program of FIG. 3 is parallelized by the conventional technique. FIG. 6 shows an example of the execution status of each task when the program of FIG. 3 is parallelized by the conventional technique. FIG. 7 shows an example of the program of FIG. 3 as parallelized by the automatic parallelization processing system of this embodiment. FIG. 8 shows an example of the data hierarchy and memory allocation when the program of FIG. 3 is parallelized by the automatic parallelization processing system of this embodiment. FIG. 9 shows an example of the execution status of each task when the program of FIG. 3 is parallelized by the automatic parallelization processing system of this embodiment.

[0013] As shown in FIG. 1, this embodiment comprises a compiler 2, a linker 5, and a task scheduler 7. The compiler 2 translates the source program 1 to generate an object program 3 for parallel execution; the linker 5 combines the object program 3 with the runtime routines 4 to generate an executable program 6 with a multitask structure; and the executable program 6 is executed in parallel under the control of the task scheduler 7.

[0014] The concept of same-process inter-task shared data, which is the characteristic feature of the present invention, is explained here with reference to FIG. 2. Units that can be executed in parallel (for example, a single iteration of a loop) are assumed to have no data dependencies among them, and each such unit is processed in parallel as one task. Inter-task shared data 27 is the data hierarchy that these parallel processing units access in common.

[0015] In the example of FIG. 2, arrays A, B, and C declared in a FORTRAN COMMON statement are referenced in a DO loop and are treated as inter-task shared data when that DO loop is executed in parallel. Arrays A, B, and C have no data dependencies, and when the DO loop is executed in parallel on two processors, each parallel processing unit is assigned to the processors in turn as a task.

[0016] In this case, as shown in FIG. 2, the data accessed by one process is elements 1, 3, 5, ..., 99 of arrays A, B, and C, and the order of those accesses is preserved. Likewise, the data accessed by the other process is elements 2, 4, 6, ..., 100 of arrays A, B, and C, and the order of those accesses is also preserved. Among tasks executed on the same process, therefore, the referenced data is accessed in an order guaranteed to match that of sequential execution with scalar instructions.
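The distribution of iterations described for FIG. 2 can be sketched as follows. This is an illustrative Python sketch, not code from the patent: the function name `assign_iterations` is hypothetical, and a fixed round-robin deal is assumed because it reproduces the odd/even split in the text; an actual task scheduler might hand out tasks in a different order.

```python
def assign_iterations(n_iterations, n_processes):
    """Deal loop iterations 1..n_iterations to processes 1..n_processes in turn."""
    schedule = {p: [] for p in range(1, n_processes + 1)}
    for i in range(1, n_iterations + 1):
        # Iteration i goes to the next process in round-robin order (an assumption).
        schedule[(i - 1) % n_processes + 1].append(i)
    return schedule


schedule = assign_iterations(100, 2)

# Within each process the element indices only ever increase, so the accesses
# made on that process keep the order they would have in a sequential run.
in_order = all(s == sorted(s) for s in schedule.values())
```

With 100 iterations and two processes, `schedule[1]` holds the odd iterations and `schedule[2]` the even ones, matching the two element sets named in the paragraph above.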

[0017] On this basis, when a data hierarchy called same-process inter-task shared data 28 is provided, elements 1, 3, 5, ..., 99 of arrays A, B, and C can be assigned to the same-process inter-task shared data of one process, as shown in FIG. 2. Likewise, elements 2, 4, 6, ..., 100 of arrays A, B, and C can be assigned to the same-process inter-task shared data of the other process.

[0018] The same-process inter-task shared data 28 has the same data scope as the inter-task shared data 27, but because it is shared only among tasks executed sequentially on the same process, the order of the definition-reference relationships of the data is guaranteed. Introducing this new data hierarchy into the conventional technique therefore makes it possible, when parallel processing units with data dependencies are executed in parallel, to assign the dependent data to each process and so execute in parallel without exclusive control or synchronous control.

[0019] As shown in FIG. 3, the example source program 1 performs a computation in loop 31 on the given arrays A, B, and C, stores the results in array C and variable X, and returns them. When loop 31 is parallelized, the final value of variable T must be guaranteed.

[0020] When the program of FIG. 3 is translated and executed by the conventional technique, as shown in FIGS. 4 and 6, the RESERVE routine 41 is called and the task scheduler secures the available processors at timing 61; the PARALLEL_DO routine 42 is then called at timing 62, and loop 43 of N-1 iterations is executed in parallel.

[0021] At timing 63, a call to the SERIAL routine 44 synchronizes on the completion of loop 43, and the final (Nth) iteration of loop 31 is executed by the master task (task 1) as serial section 45, so that the final value of the task-private data T 42 can be assigned.

[0022] Then, at timing 64, a call to the END_SERIAL routine 46 releases the slave tasks (task 2 through task M) from the wait state in which they await completion of serial section 45. Finally, at timing 65, the RELEASE routine 47 is called to release the processors.
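The conventional sequence of timings 61 to 65 can be sketched with Python threads. This is a hedged illustration only: the function name, the loop body, and the round-robin split are assumptions, and the patent's runtime routines are represented by ordinary thread operations named in the comments.

```python
import threading


def conventional_parallel(a, b, c, n_tasks=2):
    """Run the first N-1 iterations in parallel, then the Nth iteration serially."""
    n = len(a)

    def worker(task_id):
        # Stands in for PARALLEL_DO: iterations 0..n-2, dealt out round-robin.
        for i in range(task_id, n - 1, n_tasks):
            t = a[i] + b[i]          # t is private to this task
            c[i] = c[i] * t

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_tasks)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()                    # synchronization point: wait for the parallel loop

    # Serial section: the master alone runs the final iteration, so its t
    # is the final value that the code after the loop needs.
    t = a[n - 1] + b[n - 1]
    c[n - 1] = c[n - 1] * t
    x = t
    return c, x
```

The `join` calls before the serial section correspond to the synchronization at timings 63 and 64 that the embodiment sets out to remove.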

[0023] In this embodiment, the following processing is performed to suppress the synchronous control at timings 63 and 64 of FIG. 6 and improve the execution efficiency of the whole program.

[0024] The compiler 2 comprises loop recognition means 21, data dependency analysis means 22, parallelism determination means 23, final-value guarantee determination means 24, memory allocation means 25, and parallel code generation means 26.

[0025] The loop recognition means 21 analyzes the control structure of the input source program 1 and recognizes loop 31. The data dependency analysis means 22 analyzes the definition-reference relationships of the data in loop 31 and recognizes that no dependency spans loop iterations. The parallelism determination means 23 determines that the whole of loop 31 can be executed in parallel.

[0026] The final-value guarantee determination means 24 determines that a final-value guarantee is required because variable T is referenced when loop 31 ends. As shown in FIG. 8, it prepares, as same-process inter-task shared data 83, PDO 84 (which identifies which loop iteration is being executed) and PT 85 (which stands in for variable T, since tasks executed on the same process run sequentially), each with as many elements as the maximum number of processors available in the system. As shown in FIG. 7, it transforms loop 31 into loop 74 and generates text 75, which, after the parallel execution of loop 74 ends, determines which process executed the final iteration of loop 74 and assigns the PT 85 element for that process to variable X as the final value of variable T.
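The effect of this transformation can be sketched in Python. FIG. 7 is not reproduced here, so the loop body and the round-robin split are assumptions; the arrays `pdo` and `pt` play the roles the description gives to PDO 84 and PT 85. The processes are run one after another in this sketch so that only the data layout is shown.

```python
def transformed_loop(a, b, c, pnum):
    """Simulate the transformed loop with one PDO/PT slot per process."""
    n = len(a)
    pdo = [0] * pnum   # per process: number of the last iteration it executed
    pt = [0] * pnum    # per process: stand-in for the temporary T
    x = None
    for pid in range(pnum):
        for i in range(pid + 1, n + 1, pnum):   # iterations numbered 1..n
            pdo[pid] = i
            pt[pid] = a[i - 1] + b[i - 1]       # hypothetical loop body
            c[i - 1] = c[i - 1] * pt[pid]
        # Counterpart of the generated text: only the process that ran
        # iteration n copies its own PT slot into X.
        if pdo[pid] == n:
            x = pt[pid]
    return c, x, pdo
```

Each process writes only its own slots of `pdo` and `pt`, and exactly one process ends with `pdo[pid] == n`, so the final value can be picked up without any process waiting for another.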

[0027] The memory allocation means 25 classifies each data item appearing in the program into the data hierarchies of inter-task shared data 87, same-process inter-task shared data 83, and task-private data 88, as shown in FIG. 8, and allocates memory accordingly. The parallel code generation means 26 generates the code for parallel execution, covering the parallel-executable parts recognized by the parallelism determination means 23 as well as the text transformed and generated by the final-value guarantee determination means 24.
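The three-level classification can be pictured with a small Python sketch. FIG. 8 is not reproduced here, so the table is partly assumed: the description places PNUM in inter-task shared data and PDO, PT, and PID in same-process inter-task shared data, while the placement of A, B, C, X, and the loop index I below is an assumption, as are the names `Tier`, `LAYOUT`, and `allocate`.

```python
from enum import Enum


class Tier(Enum):
    INTER_TASK_SHARED = "inter-task shared data"
    SAME_PROCESS_SHARED = "same-process inter-task shared data"
    TASK_PRIVATE = "task-private data"


LAYOUT = {
    "A": Tier.INTER_TASK_SHARED,      # assumed
    "B": Tier.INTER_TASK_SHARED,      # assumed
    "C": Tier.INTER_TASK_SHARED,      # assumed
    "X": Tier.INTER_TASK_SHARED,      # assumed
    "PNUM": Tier.INTER_TASK_SHARED,   # stated in the description
    "PDO": Tier.SAME_PROCESS_SHARED,  # stated in the description
    "PT": Tier.SAME_PROCESS_SHARED,   # stated in the description
    "PID": Tier.SAME_PROCESS_SHARED,  # stated in the description
    "I": Tier.TASK_PRIVATE,           # assumed
}


def allocate(layout, max_processors):
    """Give shared data one cell, per-process data one slot per processor."""
    shared, per_process, private = {}, {}, []
    for name, tier in layout.items():
        if tier is Tier.INTER_TASK_SHARED:
            shared[name] = None
        elif tier is Tier.SAME_PROCESS_SHARED:
            per_process[name] = [None] * max_processors
        else:
            private.append(name)      # storage is created inside each task
    return shared, per_process, private
```

Sizing the per-process data by the maximum number of processors follows the description of PDO 84 and PT 85; how task-private storage is actually laid out is not stated and is left abstract here.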

[0028] When the program of FIG. 3 is translated and executed by the automatic parallelization processing system of this embodiment, as shown in FIGS. 7 and 9, the RESERVE routine 71 is called at timing 91, the task scheduler secures the available processors, and the number of secured processors is returned in the inter-task shared data PNUM 82. The IPID routine 72 is also called, and the task scheduler returns the identification number of the processor to which the calling task is assigned in the same-process inter-task shared data PID 86.

[0029] Next, at timing 92, the PARALLEL_DO routine 73 is called and loop 74 is executed in parallel. At timing 93, each process, in the order in which it finishes its parallel execution, determines whether it executed the final iteration of loop 74, and the process that did assigns the correct final value to variable X. Then, at timing 94, the RELEASE routine 76 is called to release the processors.
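The sequence of timings 91 to 94 can be sketched with Python threads. This is an illustration under assumptions, not the patent's code: the loop body and round-robin split are hypothetical, `pnum` and `pid` stand in for the values returned by the RESERVE and IPID routines, and the closing `join` stands in for RELEASE.

```python
import threading


def run_parallel(a, b, c, pnum=2):
    """Each process checks for itself whether it ran the final iteration."""
    n = len(a)
    pdo = [0] * pnum      # last iteration number executed by each process
    pt = [0] * pnum       # per-process stand-in for T
    result = {}

    def process(pid):
        for i in range(pid + 1, n + 1, pnum):
            pdo[pid] = i
            pt[pid] = a[i - 1] + b[i - 1]   # hypothetical loop body
            c[i - 1] = c[i - 1] * pt[pid]
        # Timing 93: done as soon as this process finishes its own share,
        # without waiting for the other processes.
        if pdo[pid] == n:
            result["x"] = pt[pid]

    threads = [threading.Thread(target=process, args=(p,)) for p in range(pnum)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()         # stands in for RELEASE at timing 94
    return c, result["x"]
```

Unlike the conventional sequence, nothing separates the loop from the final-value assignment: the check runs inside each process, and the only wait left is the one at the very end.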

[0030]

[Effects of the Invention] As described above, when automatically parallelizing a program containing a loop that requires a final-value guarantee for local data in microtasks, the automatic parallelization processing system of the present invention provides that guarantee by using the data hierarchy of same-process inter-task shared data instead of exclusive control or synchronous control. Suppressing the overhead of exclusive control and synchronous control shortens the execution time of the whole program.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the automatic parallelization processing system of the present invention.

FIG. 2 is a diagram showing the concept of same-process inter-task shared data.

FIG. 3 is a diagram showing a FORTRAN program as an example of the source program 1.

FIG. 4 is a diagram showing an example of the program of FIG. 3 as parallelized by the conventional technique.

FIG. 5 is a diagram showing an example of the data hierarchy and memory allocation when the program of FIG. 3 is parallelized by the conventional technique.

FIG. 6 is a diagram showing an example of the execution status of each task when the program of FIG. 3 is parallelized by the conventional technique.

FIG. 7 is a diagram showing an example of the program of FIG. 3 as parallelized by the automatic parallelization processing system of this embodiment.

FIG. 8 is a diagram showing an example of the data hierarchy and memory allocation when the program of FIG. 3 is parallelized by the automatic parallelization processing system of this embodiment.

FIG. 9 is a diagram showing an example of the execution status of each task when the program of FIG. 3 is parallelized by the automatic parallelization processing system of this embodiment.

[Explanation of symbols]

1 Source program
2 Compiler
3 Object program
4 Runtime routines
5 Linker
6 Executable program
7 Task scheduler
21 Loop recognition means
22 Data dependency analysis means
23 Parallelism determination means
24 Final-value guarantee determination means
25 Memory allocation means
26 Parallel code generation means
27 Inter-task shared data
28 Same-process inter-task shared data

Claims

[Claims]

1. An automatic parallelization processing system for a shared-memory multiprocessor computer system comprising a compiler that translates a source program to generate an object program for parallel execution, a linker that combines the object program with runtime routines to generate an executable program with a multitask structure, and a task scheduler that manages the tasks for parallel execution, wherein the compiler comprises: loop recognition means for recognizing control structures such as loops in the source program after the source program to be translated has been input; data dependency analysis means for analyzing the definition-reference relationships of data in the recognized loops and recognizing data dependencies; parallelism determination means for determining, from the control structure and data dependencies of the recognized loops, which parts can be executed in parallel, and for recognizing the exclusive control and synchronous control needed to preserve the definition-reference relationships of data during parallel execution; final-value guarantee determination means for identifying, for each loop that the parallelism determination means has determined to be executable in parallel, the data local to each task and determining whether a final-value guarantee is required when the loop ends; memory allocation means for determining, from the declaration information of the source program, the data dependency information for parallel execution, the final-value guarantee information, and the maximum number of processors available in the system, the data hierarchy of inter-task shared data, same-process inter-task shared data, and task-private data, and for allocating them to memory; and parallel code generation means for generating, for the parallel-executable parts obtained by the parallelism determination means, code for parallel execution (including that for exclusive control and synchronous control) and code for the final-value guarantee; wherein the task scheduler comprises: task generation means for generating tasks for parallel execution from the information requested when the object program containing the parallel code generated by the compiler is executed; task allocation means for securing the processors available at the time parallel execution is requested, assigning the secured processors to the tasks for parallel execution, and notifying the program of the number of secured processors and of an identification number that identifies each processor; task control means for scheduling parallel execution and performing exclusive control and synchronous control between tasks; and task release means for releasing the tasks and processors when parallel execution ends; and wherein, by using the number of processors available for parallel execution and the processor identification numbers notified by the task scheduler, a data hierarchy called same-process inter-task shared data is provided, and the compiler automatically performs the final-value guarantee for data local to tasks that can be executed in parallel, without exclusive control or synchronous control.