JP3317816B2

JP3317816B2 - Data transfer processing allocation method in the compiler

Info

Publication number: JP3317816B2
Application number: JP15717095A
Authority: JP
Inventors: 正典田村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1995-05-31
Filing date: 1995-05-31
Publication date: 2002-08-26
Anticipated expiration: 2017-08-26
Also published as: JPH08328873A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a compiler that takes as input a source program written in a data-distribution description language and generates an object program for a distributed-memory multiprocessor system.

[0002]

2. Description of the Related Art

Among the loops in a program, a loop with no data dependence between iterations can execute each iteration independently, so its iterations can be distributed across multiple processors and executed in parallel. If such a loop runs on a shared-memory multiprocessor system, in which the processors share memory, no data transfer between processors is needed, as described for example in Japanese Unexamined Patent Application Publication No. H4-296962. However, because every processor accesses the same shared memory, memory contention becomes more frequent as the number of processors grows, and parallelization efficiency falls. To eliminate the contention problem and raise the degree of parallelism, distributed-memory multiprocessor systems are used, in which multiple processors, each with its own local memory, are connected by a network or the like.

[0003] In a distributed-memory multiprocessor system, however, the arrays used in a loop are distributed across the local memories of the processors. Depending on how the arrays and the loop are distributed, data transfer between processors may therefore be required even for a loop with no data dependence between iterations. Specifically, when loop processing is parallelized on a distributed-memory multiprocessor system, the arrays that the program references and updates are distributed across the local memories of the processors, for example as specified by directive lines of the data-distribution description language. Each loop in the program, in turn, is divided and assigned to the processors, for example as specified by a directive line for that loop. Consequently, depending on the array distribution method and the loop division method, an array element that a processor must access in its assigned loop iterations may not reside in that processor's local memory.

[0004] A conventional compiler that takes a source program written in a data-distribution description language and generates an object program for a distributed-memory multiprocessor system first determines the distribution of each array and the division of each loop. If an array element accessed in the loop iterations assigned to a processor is not distributed to that processor, the compiler inserts data transfer processing that moves array elements between processors before and after the loop, so that within the loop's processing section the array distribution is temporarily changed to match the loop distribution.

[0005] For example, consider a program that declares array A consisting of elements A(1) to A(12), array B consisting of elements B(1) to B(12), and array C consisting of elements C(1) to C(12), as shown in FIG. 5. To execute the program in parallel on four processors P(1), P(2), P(3), and P(4) of a distributed-memory multiprocessor system, arrays A, B, and C are distributed to processors P(1) to P(4) three elements at a time, starting from the first element. Suppose the program contains a DO loop such as

      DO 20 J=1,12
        A(J)=B(J)+C(J)
   20 CONTINUE

and that the loop is distributed to the processors as follows:

  Processor P(1):
      DO 20 J=1,3
        A(J)=B(J)+C(J)
   20 CONTINUE

  Processor P(2):
      DO 20 J=4,6
        A(J)=B(J)+C(J)
   20 CONTINUE

  Processor P(3):
      DO 20 J=7,9
        A(J)=B(J)+C(J)
   20 CONTINUE

  Processor P(4):
      DO 20 J=10,12
        A(J)=B(J)+C(J)
   20 CONTINUE

In this case the array distribution and the loop distribution match, so the array distribution need not be changed in the loop's processing section.
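The block distribution used in this example can be sketched in Python. This is only an illustration, not part of the patent: the names `owner`, `loop20`, and `is_aligned` are hypothetical, and the check assumes every array subscript in the loop is J itself, as in DO loop 20.

```python
def owner(index, block=3):
    """Return the 1-based processor number that holds a 1-based array index
    when elements are dealt out `block` at a time from the first element."""
    return (index - 1) // block + 1

# Division of DO loop 20 given in the text: processor -> values of J.
loop20 = {1: range(1, 4), 2: range(4, 7), 3: range(7, 10), 4: range(10, 13)}

def is_aligned(partition):
    """True if every J runs on the processor that owns index J.
    Valid only for loops whose subscripts are all J (an assumption here)."""
    return all(owner(j) == proc for proc, js in partition.items() for j in js)

print([owner(i) for i in range(1, 13)])  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
print(is_aligned(loop20))                # True
```

Because A(J), B(J), and C(J) all use the subscript J, a loop division that follows the same three-element blocks keeps every access local.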

[0006] However, suppose the same program contains a DO loop such as

      DO 30 J=2,12
        A(J)=B(J-1)
        B(J-1)=C(J)
   30 CONTINUE

and that the loop is distributed to the processors as follows:

  Processor P(1):
      DO 30 J=2,3
        A(J)=B(J-1)
        B(J-1)=C(J)
   30 CONTINUE

  Processor P(2):
      DO 30 J=4,6
        A(J)=B(J-1)
        B(J-1)=C(J)
   30 CONTINUE

  Processor P(3):
      DO 30 J=7,9
        A(J)=B(J-1)
        B(J-1)=C(J)
   30 CONTINUE

  Processor P(4):
      DO 30 J=10,12
        A(J)=B(J-1)
        B(J-1)=C(J)
   30 CONTINUE

In this case array element B(3), which processor P(2) accesses when the DO control variable J is 4, is not distributed to P(2); array element B(6), which processor P(3) accesses when J is 7, is not distributed to P(3); and array element B(9), which processor P(4) accesses when J is 10, is not distributed to P(4). The array distribution and the loop distribution therefore do not match.

[0007] For DO loop 30, therefore, data transfer processing was inserted immediately before the loop's processing section to transfer array element B(3) from processor P(1) to processor P(2), array element B(6) from processor P(2) to processor P(3), and array element B(9) from processor P(3) to processor P(4). Data transfer processing was also inserted immediately after the loop's processing section to transfer array element B(3) from processor P(2) to processor P(1), array element B(6) from processor P(3) to processor P(2), and array element B(9) from processor P(4) to processor P(3).

[0008] If arrays A, B, and C were instead distributed in advance to match DO loop 30, the array distribution would not need to be changed in the processing section of DO loop 30, but it would then have to be changed in the processing section of DO loop 20.

[0009]

PROBLEMS TO BE SOLVED BY THE INVENTION

As described above, in the conventional approach data transfer processing is always performed both before and after a loop for any array element in the loop whose distribution does not match the loop distribution. This data transfer is overhead introduced by parallelization and is not directly related to the actual loop processing. How to reduce this overhead and raise parallelization efficiency is one of the problems in parallelization.

[0010] For example, suppose the same program contains DO loop 10,

      DO 10 J=2,12
        A(J)=B(J-1)+C(J)
   10 CONTINUE

and that DO loop 10 is distributed to the processors as follows:

  Processor P(1):
      DO 10 J=2,3
        A(J)=B(J-1)+C(J)
   10 CONTINUE

  Processor P(2):
      DO 10 J=4,6
        A(J)=B(J-1)+C(J)
   10 CONTINUE

  Processor P(3):
      DO 10 J=7,9
        A(J)=B(J-1)+C(J)
   10 CONTINUE

  Processor P(4):
      DO 10 J=10,12
        A(J)=B(J-1)+C(J)
   10 CONTINUE

Array element B(3), which processor P(2) accesses when the DO control variable J is 4, is not distributed to P(2); array element B(6), which processor P(3) accesses when J is 7, is not distributed to P(3); and array element B(9), which processor P(4) accesses when J is 10, is not distributed to P(4). The array distribution and the loop distribution therefore do not match. Conventionally, as with DO loop 30, data transfer processing is inserted immediately before the processing section of DO loop 10 to transfer array element B(3) from processor P(1) to processor P(2), array element B(6) from processor P(2) to processor P(3), and array element B(9) from processor P(3) to processor P(4). Data transfer processing is also inserted immediately after the loop's processing section to transfer array element B(3) from processor P(2) to processor P(1), array element B(6) from processor P(3) to processor P(2), and array element B(9) from processor P(4) to processor P(3).

[0011] In DO loop 10, however, array elements B(3), B(6), and B(9) are only referenced in the iterations, and their values do not change across the loop, so the data transfer processing immediately after the loop is unnecessary. The conventional approach cannot detect this and control where data transfer processing is inserted, so data transfer processing was inserted both before and after the loop, just as for DO loop 30.
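The saving for DO loop 10 can be counted with a short Python sketch. The three remote elements and their owning and accessing processors are taken from the text; the function names `conventional` and `reference_only` are hypothetical labels for the two insertion policies, not terms from the patent.

```python
# Remote elements of B in DO loop 10, as identified in the text:
# (element, owning processor, accessing processor).
remote = [("B(3)", 1, 2), ("B(6)", 2, 3), ("B(9)", 3, 4)]

def conventional(remote):
    """Transfers before the loop (owner -> accessor) and after it (accessor -> owner)."""
    before = [(elem, own, acc) for elem, own, acc in remote]
    after = [(elem, acc, own) for elem, own, acc in remote]
    return before, after

def reference_only(remote):
    """Elements that are only referenced need the pre-loop transfer alone."""
    before = [(elem, own, acc) for elem, own, acc in remote]
    return before, []

for policy in (conventional, reference_only):
    before, after = policy(remote)
    print(policy.__name__, len(before) + len(after))
# conventional 6
# reference_only 3
```

For this loop, recognizing that B is only referenced halves the number of element transfers.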

[0012] An object of the present invention is therefore to improve the execution performance of data-distributed programs by examining how the array elements accessed in the loop iterations are accessed and deciding where to insert data transfer processing according to the result, thereby reducing the data transfers that arise in loop parallelization and cutting overhead.

[0013]

MEANS FOR SOLVING THE PROBLEMS

A first method of allocating data transfer processing in a compiler according to the present invention applies to a compiler that distributes the arrays declared in a source program written in a data-distribution description language across the local memories of the processors constituting a distributed-memory multiprocessor system, and that transforms each loop having no data dependence between its iterations into a form executable in parallel by those processors. The method distributes the arrays according to the directive lines on array distribution in the source program; examines, for each iteration of the loop, the processors to which the accessed array elements are distributed; and distributes the loop so that each iteration is assigned to the processor to which the largest number of the array elements accessed in that iteration are distributed. It then detects, among the array elements accessed in the loop iterations, those not distributed to the processor that executes the iteration, and examines how the array of each such element is accessed. When the access is reference only, data transfer processing that transfers the contents of the array element from the processor to which it is distributed to the processor that accesses it is inserted immediately before the loop. When the access is update only, data transfer processing that transfers the contents of the array element after the access from the processor that accessed it to the processor to which the element is distributed is inserted immediately after the loop.

A second method of allocating data transfer processing in a compiler according to the present invention applies to the same kind of compiler: one that distributes the arrays declared in a source program written in a data-distribution description language across the local memories of the processors constituting a distributed-memory multiprocessor system, and that transforms each loop having no data dependence between its iterations into a form executable in parallel by those processors. The method distributes the arrays according to the directive lines on array distribution in the source program; examines, for each iteration of the loop, the processors to which the accessed array elements are distributed; and distributes the loop so that each iteration is assigned to the processor to which the largest number of the array elements accessed in that iteration are distributed. It then detects, among the array elements accessed in the loop iterations, those not distributed to the processor that executes the iteration, and examines how the array of each such element is accessed. When the access is reference only, data transfer processing that transfers the contents of the array element from the processor to which it is distributed to the processor that accesses it is inserted immediately before the loop. When the access is update only, data transfer processing that transfers the contents of the array element after the access from the processor that accessed it to the processor to which the element is distributed is inserted immediately after the loop. In any other case, the former data transfer processing (from the processor to which the element is distributed to the processor that accesses it) is inserted immediately before the loop, and the latter (from the processor that accessed the element to the processor to which it is distributed) is inserted immediately after the loop.

[0014]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the present invention will now be described in detail with reference to the drawings.

[0015] FIG. 1 is a block diagram showing an example of a compiler to which the present invention is applied. Compiler 1 in this example takes as input source program 2, written in a data-distribution description language that allows the distribution of data and processing to be specified per processor, and generates object program 3 for a distributed-memory multiprocessor system. It comprises an analysis unit 11 that reads source program 2, parses it, and generates intermediate text 4; a parallelization unit 12 that reads intermediate text 4, parallelizes the processing, and generates parallelized intermediate text 5; and a generation unit 13 that generates object program 3 from parallelized intermediate text 5. Because the invention is characterized by parallelization unit 12 of compiler 1, parallelization unit 12 is described in detail below.

[0016] As shown in FIG. 1, parallelization unit 12 comprises array distribution means 121, loop distribution means 122, distribution comparison means 123, access analysis means 124, data transfer processing generation means 125, and loop transformation means 126.

[0017] Array distribution means 121 determines which elements of each array declared in source program 2 are distributed to which of the processors that will execute the program in parallel. If source program 2 contains a directive line specifying how an array is to be distributed, the distribution follows that directive; otherwise it follows a predetermined default. When the generated object program 3 is executed, the array elements are allocated to the local memories of the processors executing the program in parallel according to the distribution determined here.

[0018] For example, consider a distributed-memory multiprocessor system, as shown in FIG. 2, in which four processors P(1) to P(4), each having a local memory, a CPU, and a data transfer device, are connected through a network such as a transmission line. When array A, consisting of elements A(1) to A(12) as shown in FIG. 3, is distributed evenly over the four processors P(1) to P(4), a storage area large enough to hold the whole of array A is reserved in the local memory of each processor, as indicated by M1 to M4 in FIG. 3. Array elements A(1) to A(3) are loaded into the 1st to 3rd entries of storage area M1 of processor P(1), array elements A(4) to A(6) into the 4th to 6th entries of storage area M2 of processor P(2), array elements A(7) to A(9) into the 7th to 9th entries of storage area M3 of processor P(3), and array elements A(10) to A(12) into the 10th to 12th entries of storage area M4 of processor P(4). Alternatively, instead of a storage area for the whole of array A, each processor may reserve only the minimum storage area needed for its own share of the array elements plus the array elements transferred between processors.
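The first storage scheme described here, a full-length area per processor with only the local block loaded, can be illustrated with a small Python sketch. The function name `storage_area` and the use of `None` for entries that are not loaded are assumptions made for illustration only.

```python
N_ELEMENTS = 12                   # array A has elements A(1)..A(12)
N_PROCS = 4                       # processors P(1)..P(4)
BLOCK = N_ELEMENTS // N_PROCS     # 3 elements per processor

def storage_area(proc):
    """Model storage area M<proc>: room for all of A, but only the
    processor's own block is loaded; other entries stay empty (None)."""
    area = [None] * N_ELEMENTS
    first = (proc - 1) * BLOCK + 1
    for i in range(first, first + BLOCK):
        area[i - 1] = f"A({i})"   # entry i holds A(i)
    return area

print(storage_area(2))
# [None, None, None, 'A(4)', 'A(5)', 'A(6)', None, None, None, None, None, None]
```

Keeping the global index as the entry position means a transferred element such as A(3) would land in the 3rd entry on any processor, at the cost of reserving space for the whole array everywhere.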

[0019] Taking the array distribution determined by array distribution means 121 as given, loop distribution means 122, distribution comparison means 123, access analysis means 124, data transfer processing generation means 125, and loop transformation means 126 perform the following processing for each loop written in source program 2. The loop currently under consideration is called the current loop.

[0020] When the current loop in source program 2 is a parallelizable loop with no data dependence between iterations, loop distribution means 122 determines which iterations of the loop are distributed to which of the processors executing the program in parallel. If source program 2 contains a directive line specifying how the current loop is to be distributed, the distribution follows that directive; otherwise it is determined from the distribution, as decided by array distribution means 121, of all arrays appearing in the current loop.

[0021] Next, for each processor executing the program in parallel, distribution comparison means 123 detects every array element that is accessed in the portion of the current loop assigned to that processor by loop distribution means 122 but that array distribution means 121 has not distributed to that processor. Such an element is hereinafter called an unallocated array element.
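The detection of unallocated array elements can be sketched in Python as follows. The sketch assumes the BLOCK(3) distribution and the division of DO loop 10 described earlier in the text; the names `owner`, `unallocated_elements`, `assignment`, and `accesses` are hypothetical.

```python
def owner(index, block=3):
    """1-based processor number holding a 1-based array index (block distribution)."""
    return (index - 1) // block + 1

def unallocated_elements(assignment, accesses):
    """For each processor, collect accessed (array, index) pairs it does not hold.

    assignment: dict mapping each value of J to the processor that executes it.
    accesses:   list of (array name, function from J to the subscript used).
    """
    result = {}
    for j, proc in assignment.items():
        for name, subscript in accesses:
            idx = subscript(j)
            if owner(idx) != proc:
                result.setdefault(proc, set()).add((name, idx))
    return result

# DO loop 10: A(J) = B(J-1) + C(J), J = 2..12, divided as in the text.
assignment = {j: owner(j) for j in range(2, 13)}
accesses = [("A", lambda j: j), ("B", lambda j: j - 1), ("C", lambda j: j)]

print(unallocated_elements(assignment, accesses))
# {2: {('B', 3)}, 3: {('B', 6)}, 4: {('B', 9)}}
```

The result matches the elements named in the text: B(3) on P(2), B(6) on P(3), and B(9) on P(4).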

[0022] Next, access analysis means 124 examines how each unallocated array element detected by distribution comparison means 123 is accessed in the current loop and classifies the access into one of the following three cases.
1) The unallocated array element is only referenced.
2) The unallocated array element is only updated.
3) Neither 1) nor 2); that is, the unallocated array element is both referenced and updated.
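The three-way classification can be written as a small Python function. The labels "ref" and "upd" and the function name `classify` are assumptions for illustration; the patent does not prescribe how accesses are represented.

```python
def classify(kinds):
    """Map the kinds of access made to an unallocated array element to case 1, 2, or 3.

    kinds: iterable of "ref" (reference) and "upd" (update) labels.
    """
    kinds = set(kinds)
    if not kinds or not kinds <= {"ref", "upd"}:
        raise ValueError("expected a non-empty collection of 'ref' and 'upd'")
    if kinds == {"ref"}:
        return 1   # referenced only
    if kinds == {"upd"}:
        return 2   # updated only
    return 3       # both referenced and updated

# DO loop 10 only reads B(J-1); DO loop 30 reads B(J-1) and then assigns to it.
print(classify(["ref"]))         # 1
print(classify(["ref", "upd"]))  # 3
```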

[0023] Next, based on the unallocated array elements detected by distribution comparison means 123 and the access method determined for each by access analysis means 124, data transfer processing generation means 125 generates descriptions of data transfer processing and adds them to the current loop as follows.
A) For an unallocated array element whose access method is 1), a description of data transfer processing that transfers the contents of the element from the processor to which it is distributed to the processor on which it is unallocated (that is, the processor that accesses it) is inserted immediately before the current loop.
B) For an unallocated array element whose access method is 2), a description of data transfer processing that transfers the updated contents of the element from the processor on which it is unallocated to the processor to which it is distributed is inserted immediately after the current loop.
C) For an unallocated array element whose access method is 3), a description of data transfer processing that transfers the contents of the element from the processor to which it is distributed to the processor on which it is unallocated is inserted immediately before the current loop, and a description of data transfer processing that transfers the updated contents of the element from the processor on which it is unallocated to the processor to which it is distributed is inserted immediately after the current loop.
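Rules A) to C) amount to a small decision table, sketched below in Python. The function name `place_transfers` and the tuple layout (element, source processor, destination processor) are hypothetical; the sketch only records where each transfer would go and does not generate intermediate text.

```python
def place_transfers(element, owner_proc, accessor_proc, case):
    """Return (before, after): transfers to insert immediately before and after the loop.

    case 1 (reference only): owner -> accessor before the loop.
    case 2 (update only):    accessor -> owner after the loop.
    case 3 (both):           both of the above.
    """
    if case not in (1, 2, 3):
        raise ValueError("case must be 1, 2, or 3")
    before, after = [], []
    if case in (1, 3):
        before.append((element, owner_proc, accessor_proc))
    if case in (2, 3):
        after.append((element, accessor_proc, owner_proc))
    return before, after

# B(3) is held by P(1) and accessed by P(2).
print(place_transfers("B(3)", 1, 2, 1))  # ([('B(3)', 1, 2)], [])
print(place_transfers("B(3)", 1, 2, 3))  # ([('B(3)', 1, 2)], [('B(3)', 2, 1)])
```

Case 3 reproduces the conventional placement, so the reduction in transfers comes entirely from cases 1 and 2.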

[0024] Next, loop transformation means 126 transforms the loop control statement and related parts of the current loop so that each iteration is executed on the processor determined by loop distribution means 122. If source program 2 still contains loops that have not yet been considered, control returns to loop distribution means 122; otherwise processing ends.

[0025] The operation of this embodiment is described below using a concrete example: compiling source program 2 written in the data-distribution description language HPF (High Performance Fortran), as shown in FIG. 4. In source program 2 of FIG. 4, the third line, "!HPF$ PROCESSORS P(4)", is an HPF directive line specifying the number of processors that execute the program in parallel, and the fourth line, "!HPF$ DISTRIBUTE (BLOCK(3)) ONTO P :: A, B, C", is an HPF directive line specifying how arrays A to C are distributed.

[0026] When source program 2 of FIG. 4 is input, analysis unit 11 of compiler 1 parses it and generates intermediate text 4, and parallelization unit 12 reads intermediate text 4 and performs the following processing.

[0027] First, because a directive line specifies how arrays A to C declared in the program are to be distributed, array distribution means 121 follows that directive and assigns arrays A, B, and C, three elements at a time in order from the first element, to the four processors (denoted P(1), P(2), P(3), and P(4)). FIG. 5 shows the resulting distribution of arrays A, B, and C over processors P(1), P(2), P(3), and P(4).

[0028] Next, loop distribution means 122 takes up one loop in source program 2 and, if the loop has no data dependence between iterations, treats it as a parallelization target and determines its distribution. Suppose it takes up DO loop 10 of FIG. 4, which contains array accesses:

      DO 10 J=2,12
        A(J)=B(J-1)+C(J)
   10 CONTINUE

Because there is no data dependence between the iterations, the loop is judged parallelizable. Because no directive line specifies the loop's distribution, loop distribution means 122 determines which iteration is assigned to which processor from the distribution of arrays A, B, and C shown in FIG. 5.

[0029] In making this determination, to avoid data transfer of array elements between processors as far as possible, the array elements accessed in each iteration and the processors to which they are distributed are examined, and the iteration is assigned to the processor to which the largest number of the accessed array elements are distributed (in other words, the processor with the fewest unallocated array elements).

[0030] For example, examining DO loop 10 for each value of the DO control variable J (that is, for each iteration) to find the relationship between the array elements accessed in that iteration and the processors to which they are distributed gives the result shown in FIG. 6(A), so the processor to which each iteration is assigned is as shown in FIG. 6(B). Specifically, in the first iteration, where J is 2, the accessed array elements are A(2), B(1), and C(2); all of them are distributed to processor P(1), so the first iteration is assigned to P(1). Likewise, the second iteration, where J is 3, is assigned to P(1). In the third iteration, where J is 4, one of the three accessed elements, B(3), is distributed to P(1) and the other two, A(4) and C(4), are distributed to P(2), so the iteration is assigned to P(2). In the same way, the fourth and fifth iterations (J = 5, 6) are assigned to P(2), the sixth through eighth iterations (J = 7, 8, 9) to P(3), and the ninth through eleventh iterations (J = 10, 11, 12) to P(4).
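The assignment rule and the worked example above can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the patented implementation: the helper names are hypothetical, the block size of three comes from the distribution described earlier, and the tie-breaking rule (prefer the lower processor number) is an assumption that never comes into play for DO loop 10.

```python
from collections import Counter


def owner(index, block_size=3):
    """1-based processor number holding the 1-based element `index`."""
    return (index - 1) // block_size + 1


def accessed_elements(j):
    """Elements touched by A(J) = B(J-1) + C(J), as (array name, index) pairs."""
    return [("A", j), ("B", j - 1), ("C", j)]


def assign_iteration(j):
    """Pick the processor that holds the most elements accessed in iteration J."""
    counts = Counter(owner(index) for _, index in accessed_elements(j))
    best = max(counts.values())
    # Assumed tie-break: lowest processor number among the best candidates.
    return min(p for p, c in counts.items() if c == best)


# Iteration-to-processor mapping for J = 2 .. 12.
assignment = {j: assign_iteration(j) for j in range(2, 13)}
```

For J = 4 the counts are one element on P(1) and two on P(2), so the iteration goes to P(2), matching the description.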

[0031] Next, for each of the processors P(1) to P(4), the distribution comparison means 123 detects every array element that is accessed in the loop iterations assigned to that processor but was not distributed to that processor by the array distribution means 121 (an unallocated array element). In loop 10 above, for example, array element B(3), accessed by P(2) in the third iteration (J = 4), is not distributed to P(2); B(6), accessed by P(3) in the sixth iteration (J = 7), is not distributed to P(3); and B(9), accessed by P(4) in the ninth iteration (J = 10), is not distributed to P(4). The three elements B(3), B(6), and B(9) of array B shown in FIG. 6(C) are therefore detected as unallocated array elements.
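A minimal Python sketch of this detection step follows. The names are hypothetical, and the iteration-to-processor table is copied from the worked example in the text rather than recomputed.

```python
def owner(index, block_size=3):
    """1-based processor number holding the 1-based element `index`."""
    return (index - 1) // block_size + 1


# Iteration J -> processor, as given for DO loop 10 in the description.
ASSIGNMENT = {2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 4, 11: 4, 12: 4}


def unallocated_elements(assignment):
    """List (processor, array name, index) for every access to a non-local element."""
    found = []
    for j, proc in sorted(assignment.items()):
        # Accesses made by the loop body A(J) = B(J-1) + C(J).
        for name, index in (("A", j), ("B", j - 1), ("C", j)):
            if owner(index) != proc:
                found.append((proc, name, index))
    return found


missing = unallocated_elements(ASSIGNMENT)
```

Only the three accesses to array B at the block boundaries are reported; every access to A and C is local.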

[0032] Next, the access analysis means 124 analyzes how each unallocated array element detected by the distribution comparison means 123 is accessed within the loop and classifies it into one of the three cases 1), 2), and 3) described above. The unallocated array elements B(3), B(6), and B(9) detected for DO loop 10 are only referenced, so they are judged to fall under case 1).
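The classification can be sketched as below. The text here confirms only that case 1) means "referenced only"; the sketch assumes that the other two cases are "updated only" and "anything else", which is consistent with the claims and the stated effects but is not spelled out in this passage. The enum and function names are hypothetical.

```python
from enum import Enum


class Access(Enum):
    REFERENCE_ONLY = 1  # case 1): the element is only read in the loop
    UPDATE_ONLY = 2     # assumed case 2): the element is only written
    OTHER = 3           # assumed case 3): any other pattern, e.g. read and written


def classify(is_read, is_written):
    """Classify how an unallocated element is accessed inside the loop."""
    if is_read and not is_written:
        return Access.REFERENCE_ONLY
    if is_written and not is_read:
        return Access.UPDATE_ONLY
    return Access.OTHER


# Arrays read and written by the loop body A(J) = B(J-1) + C(J).
READS = {"B", "C"}
WRITES = {"A"}

kind_of_b = classify("B" in READS, "B" in WRITES)
```

Because B appears only on the right-hand side of the assignment, its unallocated elements come out as reference-only.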

[0033] Next, the data transfer processing generation means 125 adds descriptions of data transfer processing to the loop by the methods A) to C) described above, on the basis of the unallocated array elements detected by the distribution comparison means 123 and the access method determined for them by the access analysis means 124. For the unallocated array elements B(3), B(6), and B(9) of DO loop 10, the access method corresponds to case 1), so method A) is applied and the data transfer descriptions are inserted immediately before DO loop 10. The inserted descriptions comprise one that transfers the contents of array element B(3) on processor P(1) to processor P(2), one that transfers the contents of B(6) on P(2) to P(3), and one that transfers the contents of B(9) on P(3) to P(4).
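The placement of transfers can be illustrated with the following Python sketch. It assumes, in line with the claims, that reference-only elements get a transfer before the loop, update-only elements get a transfer after the loop, and any other access pattern gets both. The function name, the string labels, and the tuple layout are hypothetical.

```python
def owner(index, block_size=3):
    """1-based processor number holding the 1-based element `index`."""
    return (index - 1) // block_size + 1


def plan_transfers(unallocated, access_kind):
    """Return (before_loop, after_loop) lists of (source, destination, element)."""
    before, after = [], []
    for proc, name, index in unallocated:
        home = owner(index)
        element = f"{name}({index})"
        if access_kind in ("reference_only", "other"):
            # Fetch the value from its home processor before the loop runs.
            before.append((home, proc, element))
        if access_kind in ("update_only", "other"):
            # Write the new value back to its home processor after the loop.
            after.append((proc, home, element))
    return before, after


# Unallocated elements detected for DO loop 10: (accessing processor, array, index).
UNALLOCATED = [(2, "B", 3), (3, "B", 6), (4, "B", 9)]

before_loop, after_loop = plan_transfers(UNALLOCATED, "reference_only")
```

With the reference-only classification, three transfers are planned before the loop and none after it. Passing `"other"` for the same elements would plan six transfers, three on each side of the loop, which corresponds to the uniform insertion that the description attributes to the prior art.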

[0034] Next, the loop transformation means 126 transforms the loop so that each iteration is processed on the processor determined by the loop distribution means 122. For DO loop 10, for example, the loop is transformed so that the start and end values given to the DO control variable J are obtained from the functions start() and end(). The functions start() and end() are generated in the target program 3 by the compiler 1; they return 2 and 3, respectively, when called from processor P(1), 4 and 6 when called from P(2), 7 and 9 when called from P(3), and 10 and 12 when called from P(4).
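As a rough illustration, the two generated functions could behave like the Python sketch below. In the description the functions take no arguments and the calling processor is implicit; here the processor number is passed explicitly, which is an assumption made to keep the example self-contained.

```python
# Per-processor (start, end) values of the DO control variable J, from the text.
BOUNDS = {1: (2, 3), 2: (4, 6), 3: (7, 9), 4: (10, 12)}


def start(proc):
    """First value of J executed on processor `proc`."""
    return BOUNDS[proc][0]


def end(proc):
    """Last value of J executed on processor `proc`."""
    return BOUNDS[proc][1]


# Iterations each processor would execute with DO 10 J = start(), end().
iterations = {p: list(range(start(p), end(p) + 1)) for p in BOUNDS}
```

Taken together, the four ranges cover J = 2 to 12 exactly once, so no iteration of the original loop is lost or duplicated.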

[0035] Consequently, DO loop 10 is ultimately transformed into the loop shown in FIG. 7. In FIG. 7, the data transfer description on the first line transfers the contents of array element B(3) on processor P(1) to processor P(2), the one on the second line transfers the contents of B(6) on P(2) to P(3), and the one on the third line transfers the contents of B(9) on P(3) to P(4). Each is executed only by the processors concerned.
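The behavior of the transformed loop can be checked with a small simulation. The sketch below does not reproduce FIG. 7; it models each processor's local memory as a Python dictionary, runs the processors one after another instead of in parallel, and uses hypothetical names throughout. Its only purpose is to show that the three transfers before the loop are enough for every processor to compute its share from local data alone.

```python
def owner(index, block_size=3):
    """1-based processor number holding the 1-based element `index`."""
    return (index - 1) // block_size + 1


BOUNDS = {1: (2, 3), 2: (4, 6), 3: (7, 9), 4: (10, 12)}
TRANSFERS = [(1, 2, 3), (2, 3, 6), (3, 4, 9)]  # (source, destination, index of B)


def run_sequential(b, c):
    """Reference result of DO 10 J=2,12: A(J) = B(J-1) + C(J)."""
    return {j: b[j - 1] + c[j] for j in range(2, 13)}


def run_distributed(b, c):
    """Simulate the transformed loop using only per-processor local memories."""
    local_b = {p: {i: v for i, v in b.items() if owner(i) == p} for p in BOUNDS}
    local_c = {p: {i: v for i, v in c.items() if owner(i) == p} for p in BOUNDS}
    local_a = {p: {} for p in BOUNDS}
    # Data transfers inserted immediately before the loop.
    for src, dst, i in TRANSFERS:
        local_b[dst][i] = local_b[src][i]
    # Each processor runs J = start .. end against its own memory.
    for p, (lo, hi) in BOUNDS.items():
        for j in range(lo, hi + 1):
            local_a[p][j] = local_b[p][j - 1] + local_c[p][j]
    merged = {}
    for p in BOUNDS:
        merged.update(local_a[p])
    return merged


b = {i: 10 * i for i in range(1, 13)}
c = {i: i for i in range(1, 13)}
same = run_distributed(b, c) == run_sequential(b, c)
```

If the transfer list is emptied, the lookup of B(3) in the local memory of P(2) raises a `KeyError`, which is the simulated counterpart of accessing an unallocated array element.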

[0036] When the loop transformation means 126 has finished transforming the current loop, it returns control to the loop distribution means 122 if loops that have not yet been parallelized remain in the program, and otherwise ends the processing.

[0037] After the parallelization unit 12 has completed the processing described above, the generation unit 13 generates the target program 3 from the parallelized intermediate text 5.

[0038]

[Effects of the Invention] As described above, when the present invention detects, among the array elements accessed in the iterations of a loop, an array element that is not distributed to the processor executing the iteration, it further examines how that array element is accessed. If the element is only referenced, data transfer processing is inserted only immediately before the loop; if it is only updated, data transfer processing is inserted only immediately after the loop. Compared with the prior art, in which data transfer processing was uniformly inserted both before and after the loop in every case, the number of data transfer operations can be reduced, and the overhead of parallelizing a data-distributed program can be lowered.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of a compiler to which the present invention is applied.

FIG. 2 is an explanatory diagram of an example of a method of allocating an array to the local memory of each processor.

FIG. 3 is a block diagram showing an example of a memory distributed multiprocessor system.

FIG. 4 is a diagram showing an example of a source program described in a data distribution description language.

FIG. 5 is a diagram showing an example of array distribution.

FIG. 6 is a diagram showing an example in which the iterations of a loop are distributed to a plurality of processors, and an example of array elements that are accessed in the loop iterations but are not distributed to the processor executing the iteration.

FIG. 7 is a diagram showing an example of a loop after parallelization.

[Explanation of symbols]

1: compiler
11: analysis unit
12: parallelization unit
121: array distribution means
122: loop distribution means
123: distribution comparison means
124: access analysis means
125: data transfer processing generation means
126: loop transformation means
13: generation unit
2: source program
3: target program
4: intermediate text
5: parallelized intermediate text

Continuation of the front page

(56) References
JP-A-7-114516 (JP, A)
"Parallel Processing Symposium JSPP '95 Proceedings" (1995-5-15), pp. 361-368
"Information Processing", Vol. 34, No. 9 (1993-9), pp. 1179-1185
"Parallel Processing Symposium JSPP '94" (1994-5), pp. 156-158

(58) Field searched (Int. Cl.7, DB name): G06F 9/45

Claims

(57) [Claims]

1. A data transfer processing allocation method in a compiler that distributes arrays declared in a source program described in a data distribution description language to the local memories of a plurality of processors constituting a memory distributed multiprocessor system, and that converts each loop having no data dependence between its iterations into a form executable in parallel by the plurality of processors, the method comprising: distributing the arrays in accordance with directive lines in the source program concerning the distribution of the arrays; examining, for each iteration of the loop, the processors to which the array elements accessed in that iteration are distributed, and distributing the loop so that each iteration is executed by the processor to which the largest number of the accessed array elements are distributed; detecting, among the array elements accessed in the iterations of the loop, array elements that are not distributed to the processor executing the iteration, and examining how those array elements are accessed; when such an array element is only referenced, inserting immediately before the loop a data transfer process that transfers the contents of the array element from the processor to which it is distributed to the accessing processor; and when such an array element is only updated, inserting immediately after the loop a data transfer process that transfers the contents of the array element, after the access, from the accessing processor to the processor to which it is distributed.

2. A data transfer processing allocation method in a compiler that distributes arrays declared in a source program described in a data distribution description language to the local memories of a plurality of processors constituting a memory distributed multiprocessor system, and that converts each loop having no data dependence between its iterations into a form executable in parallel by the plurality of processors, the method comprising: distributing the arrays in accordance with directive lines in the source program concerning the distribution of the arrays; examining, for each iteration of the loop, the processors to which the array elements accessed in that iteration are distributed, and distributing the loop so that each iteration is executed by the processor to which the largest number of the accessed array elements are distributed; detecting, among the array elements accessed in the iterations of the loop, array elements that are not distributed to the processor executing the iteration, and examining how those array elements are accessed; when such an array element is only referenced, inserting immediately before the loop a data transfer process that transfers the contents of the array element from the processor to which it is distributed to the accessing processor; when such an array element is only updated, inserting immediately after the loop a data transfer process that transfers the contents of the array element, after the access, from the accessing processor to the processor to which it is distributed; and when such an array element is accessed in any other way, inserting immediately before the loop a data transfer process that transfers the contents of the array element from the processor to which it is distributed to the accessing processor, and inserting immediately after the loop a data transfer process that transfers the contents of the array element from the accessing processor to the processor to which it is distributed.