JPH08305580A

JPH08305580A - Language processing unit for parallel program

Info

Publication number: JPH08305580A
Application number: JP10730695A
Authority: JP
Inventors: Kenji Suehiro; 謙二末広
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1995-05-01
Filing date: 1995-05-01
Publication date: 1996-11-22

Abstract

PURPOSE: To provide a parallel program language processing unit that converts a program so that a chained calculation process runs as efficient parallel processing with fewer waits for synchronization. CONSTITUTION: A source program converted into an internal representation by a syntax analyzer 1 is transformed so that it runs in parallel on a parallel computer system comprising plural processors each having a local memory, and is then converted into an object program by an object code generator 3 and output. In this process, a data transfer code insert device 4 transforms the program so that the operand arrays each processor needs for its allocated part of the processing are transferred into its local memory before the processing starts, which eliminates operand transfers midway through the processing. In addition, a calculation processing division allocation device 5 has plural processors calculate part of the intermediate results in duplicate, which eliminates transfers of intermediate results midway through the processing.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a parallel program language processing apparatus that converts a source program into an object program that runs in parallel on a parallel computer system composed of a plurality of processors, each having a local memory.

[0002]

[Prior Art] When a program is executed on a parallel computer system composed of a plurality of processors each having a local memory, parallel processing based on the owner computes rule is widely used. Under this rule, the arrays that hold the operands and results of the program's calculations are divided into disjoint subarrays, the subarrays are assigned to the respective processors, and each processor executes only the update calculations for the subarray it owns. A programming language processing system based on the owner computes rule is proposed, for example, in "Compiler Support for Machine-Independent Parallel Programming in Fortran D", pp. 139-176 of "Languages, Compilers and Run-Time Environments for Distributed Memory Machines" (Saltz and Mehrotra, eds., 1992).
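The owner computes rule described above can be illustrated with a small sketch. It assumes a block distribution of 1000 elements over 10 processors (the sizes used later in the description of FIG. 7); the function names and the doubling update are hypothetical and are not taken from the patent.

```python
# Illustrative sketch of the owner computes rule with a block distribution.
# N, P, and all helper names are assumptions made for this example.
N = 1000   # array length (1-based element numbers 1..N)
P = 10     # number of processors, numbered 1..P


def owned_range(p, n=N, procs=P):
    """Return the inclusive (lo, hi) element range owned by processor p."""
    block = n // procs          # assumes procs divides n evenly
    lo = (p - 1) * block + 1
    return lo, lo + block - 1


def owner(i, n=N, procs=P):
    """Return the number of the processor that owns element i."""
    block = n // procs
    return (i - 1) // block + 1


def local_update(p, a):
    """Processor p updates only the elements it owns (owner computes)."""
    lo, hi = owned_range(p)
    # The doubling is a placeholder for whatever update the program performs.
    return {i: 2 * a[i] for i in range(lo, hi + 1)}
```

With these definitions, `owned_range(2)` gives `(101, 200)`, which matches the second processor discussed in the description of FIG. 7.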

[0003] In this type of parallel processing, when a processor needs operands that lie in a subarray assigned to another processor in order to perform its calculation, the operands must be transferred between those processors.

[0004]

[Problems to Be Solved by the Invention] In the conventional parallel processing described above, when a processor needs operands that lie in a subarray assigned to another processor, the operands must be transferred between those processors, and the calculation cannot start until the transfer has finished, so a synchronization wait occurs. In a chained calculation process of the kind found in ordinary programs, that is, a process in which the result of one calculation becomes an operand of the next, such a data transfer occurs at every stage of the process. The number of synchronization waits between processors grows accordingly, and the efficiency of parallel processing is impaired.

[0005] FIG. 7 shows the data flow when the program of FIG. 4, described later, is processed by the conventional programming language processing apparatus shown in FIG. 6 and executed on a parallel computer system composed of a plurality of processors each having a local memory. FIG. 7 depicts the second processor (which holds array elements 101 to 200) when arrays A, B, C, and D are each divided into 10 equal parts and 100 consecutive elements are assigned to each of 10 processors. Each rectangular cell represents an array element, and the number inside it is the element number. The arrows show the data flow: in each step, the lower elements are calculated from the upper elements.

[0006] In FIG. 7, the calculation of B and C in step 1 requires no data transfer, but calculating D in step 2 requires four elements in total, two each of B and C, to be transferred from other processors. Because the transfer cannot begin until these elements have been calculated on the source processors, a synchronization wait between processors occurs between step 1 and step 2, and the efficiency of parallel processing is impaired.
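The following sketch shows how the four remote elements arise. The program of FIG. 4 is not reproduced in this text, so the dependence pattern here is an assumption chosen to be consistent with the description: step 1 computes B(i) and C(i) from A(i), and step 2 computes D(i) from B(i-1), B(i+1), C(i-1), and C(i+1). The table and function names are hypothetical.

```python
# Assumed dependence pattern for step 2 (FIG. 4 is not reproduced here):
#   D(i) reads B(i-1), B(i+1), C(i-1), C(i+1)
STEP2_READS = {"B": (-1, +1), "C": (-1, +1)}


def remote_reads(lo, hi, reads, n=1000):
    """List the elements read by the owner of lo..hi that it does not own."""
    needed = set()
    for name, offsets in reads.items():
        for i in range(lo, hi + 1):
            for off in offsets:
                j = i + off
                # Keep only valid elements that fall outside the owned range.
                if 1 <= j <= n and not (lo <= j <= hi):
                    needed.add((name, j))
    return sorted(needed)


print(remote_reads(101, 200, STEP2_READS))
# prints [('B', 100), ('B', 201), ('C', 100), ('C', 201)]
```

Under this assumed pattern, the second processor must wait for B(100), B(201), C(100), and C(201) to be produced elsewhere before step 2 can start, which is the synchronization wait the paragraph describes.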

[0007] In view of the above, an object of the present invention is to provide a parallel program language processing apparatus that converts a program so that a chained calculation process can be executed as efficient parallel processing with few synchronization waits.

[0008]

[Means for Solving the Problems] The parallel program language processing apparatus of the first invention is characterized in that it converts the program so that, for each chained processing unit in the source program, data is transferred between processors such that the arrays each processor needs for that processing are prepared in its local memory before the processing unit starts, and so that part of the calculation processing that handles those arrays is executed redundantly by a plurality of processors.

[0009] The parallel program language processing apparatus of the second invention is characterized in that, when dividing the arrays of the source program and placing them on the processors, it places part of each array redundantly on a plurality of processors, and converts the program so that part of the calculation processing that handles those arrays is executed redundantly by a plurality of processors.

[0010]

[Operation] In the first invention, for each chained processing unit in the source program, data is transferred between processors so that the arrays each processor needs for that processing are prepared in its local memory before the processing unit starts. This eliminates the need to transfer operands midway through a series of calculations. In addition, part of the calculation processing that handles those arrays is executed redundantly by a plurality of processors. Each processor thereby calculates, using only the data in its own local memory, the portion of the intermediate results of the chained calculation process that its later stages require, which eliminates the need to transfer intermediate results midway through the series of calculations.

[0011] In the second invention, part of each array is placed redundantly on a plurality of processors, so that all operands required for the calculation are available in local memory before the processing starts. This eliminates the need to transfer operands midway through the calculation. In addition, part of the calculation processing that handles those arrays is executed redundantly by a plurality of processors. Each processor thereby calculates, using only the data in its own local memory, the portion of the intermediate results of the chained calculation process that its later stages require, which eliminates the need to transfer intermediate results midway through the series of calculations.

[0012]

[Embodiments] Embodiments of the present invention are described with reference to FIGS. 1 to 3. FIG. 1 shows the schematic configuration of an embodiment of the first invention. This embodiment comprises: a syntax analysis device 1 that converts the input Fortran source program into the internal representation of the apparatus; a program conversion device 2 that converts the internally represented source program into an internally represented parallelized program that runs in parallel on a parallel computer system composed of a plurality of processors each having a local memory; an object code generation device 3 that converts the internal representation into an equivalent Fortran object program and outputs it; a data transfer code insertion device 4 that, for each processing unit of the internally represented program, inserts data transfer code that causes data to be transferred between processors so that the arrays each processor needs for that processing are prepared in its local memory before the processing unit starts; and a calculation processing division allocation device 5 that divides and allocates the calculation processing so that part of the calculation processing handling the arrays of the internally represented program is executed redundantly by a plurality of processors.

[0013] The Fortran source program that is the input to this apparatus is given to the syntax analysis device 1. The syntax analysis device 1 checks the syntax of the source program while converting it into the internal representation used throughout the apparatus, and passes the result to the program conversion device 2. The program conversion device 2 analyzes the source program it receives and determines, by a conventional method, how the arrays in the program are to be divided and placed. Once the array division is determined, the division of the calculation processing is also largely determined automatically at this point by the owner computes rule.

[0014] Next, the program conversion device 2 passes the program to the data transfer code insertion device 4. The data transfer code insertion device 4 divides the program it receives into subroutines or other logical processing units. For each processing unit, it determines the minimum set of arrays each processor needs to perform that processing, that is, the input data of the processing, and inserts code that transfers data between processors so that the input data is prepared in each local memory before the processing unit starts. When determining the input data, an intermediate result of the processing is treated as being calculated within the processor itself, even if it would otherwise be calculated by another processor. This eliminates the need for data transfer midway through the processing unit.

[0015] The program into which the data transfer code insertion device 4 has inserted data transfer code is then passed to the calculation processing division allocation device 5. Based on the division of the calculation processing determined by the owner computes rule, this device converts the program so that each processor actually executes only its own portion. It also adjusts the program so that the intermediate results that the data transfer code insertion device 4 treated as calculated within the processor itself, as described above, are actually calculated by that processor. Because these calculations are naturally also performed by the processor originally responsible for them, they end up being executed redundantly by a plurality of processors.
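One way to picture the adjustment made by the calculation processing division allocation device 5 is as a widening of each step's loop bounds. The sketch below is a simplification under stated assumptions: every step is a loop over a contiguous index range, each step reads only the results of the preceding step at fixed offsets, and the offsets follow an assumed pattern for FIG. 4 (which is not reproduced in this text). The table layout and function name are hypothetical.

```python
# Assumed two-step chain: step 1 reads A(i); step 2 reads B and C at i-1, i+1.
STEPS = [
    {"defs": ("B", "C"), "read_offsets": (0,)},
    {"defs": ("D",), "read_offsets": (-1, 1)},
]


def loop_bounds(steps, lo, hi, n=1000):
    """Return inclusive loop bounds per step, widened for redundant work.

    The last step covers exactly the owned range lo..hi. Walking backward,
    each earlier step is widened by the offsets the later step reads, so
    the processor computes the halo elements itself.
    """
    bounds = []
    cur_lo, cur_hi = lo, hi
    for step in reversed(steps):
        bounds.append((cur_lo, cur_hi))
        cur_lo = max(1, cur_lo + min(step["read_offsets"]))
        cur_hi = min(n, cur_hi + max(step["read_offsets"]))
    return list(reversed(bounds))
```

For the processor owning elements 101 to 200, `loop_bounds(STEPS, 101, 200)` returns `[(100, 201), (101, 200)]`: step 1 runs over 100 to 201, so B(100), B(201), C(100), and C(201) are computed locally as well as by their owners.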

[0016] The program parallelized by the above processing is returned to the program conversion device 2, where it undergoes the kind of adjustments performed in an ordinary language processing apparatus. Finally, it is passed to the object code generation device 3 and converted into the Fortran object program that is the output.

[0017] FIG. 2 shows the configuration of the data transfer code insertion device 4. The data transfer code insertion device 4 comprises: a calculation processing sectioning unit 41 that divides the program passed from the program conversion device 2 into subroutine units or processing units indicated in the text of the source program; a data flow analysis unit 42 that, for each processing unit produced by the calculation processing sectioning unit 41, traces the control flow of the program while taking the definition and reference relationships of the data into account, and thereby obtains the minimum data set needed to calculate each processor's portion of the arrays that are the final results of the processing unit; and a data transfer instruction insertion unit 43 that, for each processing unit produced by the calculation processing sectioning unit 41, inserts at the head of the processing unit instructions that transfer, from the processors that own them, those data in the data set obtained by the data flow analysis unit 42 that are not allocated in the local memory of the respective processor.

[0018] The calculation processing sectioning unit 41 divides the program into subroutine units or processing units indicated in the text of the source program, and passes each processing unit to the data flow analysis unit 42. The data flow analysis unit 42 analyzes the processing unit it receives using the data flow analysis techniques detailed in Chapter 10 of "Compilers: Principles, Techniques, and Tools" (Aho et al., 1986). From the set of definitions that reach the exit of the processing unit, it determines the arrays that are the final results of the processing unit and their index ranges. Then, for each processor, it starts from the elements of the final results that the processor is responsible for and traces the data flow backward, thereby obtaining the set of array elements that must be available at the entrance of the processing unit in order to calculate those elements. Any intermediate results encountered along the way that are not assigned to the processor are recorded, and the calculation processing division allocation device 5 uses that record to adjust the code so that the processor also calculates those intermediate results itself. The data transfer instruction insertion unit 43 takes the set of array elements that must be available at the entrance of the processing unit, as obtained by the data flow analysis unit 42, and, for those elements not assigned to the processor, inserts at the head of the processing unit instructions that transfer the data from the processors that own them.
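The backward trace performed by the data flow analysis unit 42 can be sketched at the level of individual array elements. This is a toy model, not the analysis of the cited textbook: the processing unit is given as an explicit dependence table, and that table is an assumption standing in for the program of FIG. 4, which is not reproduced in this text. All names are hypothetical.

```python
# Each step maps a defined array to the (array, offset) pairs it reads.
# This table is an assumed stand-in for the program of FIG. 4.
STEPS = [
    {"B": [("A", 0)], "C": [("A", 0)]},                     # step 1
    {"D": [("B", -1), ("B", 1), ("C", -1), ("C", 1)]},      # step 2
]


def backward_requirements(steps, final, lo, hi, n=1000):
    """Trace backward from the owned part lo..hi of the final array.

    Returns (inputs, extra):
      inputs - elements that must be present at the entrance of the unit
      extra  - intermediate elements outside lo..hi that this processor
               must calculate itself (the recorded redundant work)
    """
    need = {(final, i) for i in range(lo, hi + 1)}
    extra = set()
    for step in reversed(steps):
        next_need = set()
        for name, i in need:
            if name not in step:
                next_need.add((name, i))   # defined earlier, or an input
                continue
            if name != final and not (lo <= i <= hi):
                extra.add((name, i))       # non-owned intermediate result
            for src, off in step[name]:
                j = i + off
                if 1 <= j <= n:
                    next_need.add((src, j))
        need = next_need
    return need, extra


inputs, extra = backward_requirements(STEPS, "D", 101, 200)
# Elements of the input set that the processor does not own must be
# transferred at the head of the processing unit.
to_transfer = sorted(e for e in inputs if not (101 <= e[1] <= 200))
```

For the processor owning elements 101 to 200, `to_transfer` is `[('A', 100), ('A', 201)]` and `extra` holds B(100), B(201), C(100), and C(201), which agrees with the transfers and redundant calculations the description attributes to FIG. 5.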

[0019] FIG. 3 shows the schematic configuration of an embodiment of the second invention. This embodiment differs from the embodiment of the first invention in FIG. 1 in that it uses an array data division allocation device 6 in place of the data transfer code insertion device 4. Instead of transferring data between processors during execution, the array data division allocation device 6 changes the way the arrays themselves are divided, so that the arrays each processor needs for its processing are already present in its local memory when program execution starts. The array division is determined either by the data flow analysis technique described above or by a method indicated in the text of the source program.

[0020] The effect of the present invention is described using the Fortran program shown in FIG. 4 as an example. This program has two stages: step 1 calculates the intermediate results B and C from the operand A, and step 2 calculates the final result D from B and C.

[0021] FIG. 5 shows the data flow when the program of FIG. 4 is processed by the programming language processing apparatus of the first invention and executed with the same assignment to processors. Elements A(100) and A(201), which do not reside in the processor, are transferred before step 1, and in step 1 the processor additionally performs calculations not originally assigned to it (the parts enclosed in ellipses in the figure). As a result, all calculations can be performed within the processor without any data transfer midway through the processing. No synchronization wait therefore occurs during the processing, and the invention improves the efficiency of parallel processing.
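A small simulation can check that this scheme reproduces the serial result. The program of FIG. 4 is not reproduced in this text, so the formulas below are assumptions chosen only to match the described dependence pattern (B and C from A at the same index, D from the left and right neighbours of B and C). The dictionary `A` stands in for the distributed array, and copying from it stands in for a receive from the owning processor.

```python
N = 1000
A = {i: float(i) for i in range(1, N + 1)}   # global operand array (assumed data)


def serial():
    """Reference result computed without any distribution."""
    B = {i: 2.0 * A[i] for i in A}
    C = {i: A[i] + 1.0 for i in A}
    return {i: B[i - 1] + B[i + 1] + C[i - 1] + C[i + 1] for i in range(2, N)}


def processor(lo, hi):
    """Simulate one processor that owns elements lo..hi."""
    local_A = {i: A[i] for i in range(lo, hi + 1)}
    # Single transfer phase before step 1: fetch the halo elements of A.
    transfers = [i for i in (lo - 1, hi + 1) if 1 <= i <= N]
    for i in transfers:
        local_A[i] = A[i]
    # Step 1, widened to the halo: the extra elements are redundant work.
    B = {i: 2.0 * local_A[i] for i in local_A}
    C = {i: local_A[i] + 1.0 for i in local_A}
    # Step 2 reads only local data, so no mid-unit transfer is needed.
    D = {i: B[i - 1] + B[i + 1] + C[i - 1] + C[i + 1]
         for i in range(max(lo, 2), min(hi, N - 1) + 1)}
    return D, transfers


D_local, transfers = processor(101, 200)
reference = serial()
matches = all(D_local[i] == reference[i] for i in range(101, 201))
```

In this sketch `transfers` is `[100, 201]` and `matches` is `True`: the only communication is the fetch of A(100) and A(201) before step 1, at the cost of computing four intermediate elements twice across the system.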

[0022] Furthermore, if the second invention is used to assign array A to the processor over the range A(100) to A(201) in advance, the data transfer before the start of step 1 becomes unnecessary, and the efficiency of parallel processing improves further.

[0023] The present invention is not limited to the embodiments described above. For example, the input language may be any programming language that has array structures, and the output may be a programming language, a machine object, or executable code. The components of the embodiments also need not be clearly separated, and a configuration in which a component shares part of itself with other components is acceptable. The present invention can otherwise be implemented with various modifications without departing from its gist.

[0024]

[Effects of the Invention] As described above, according to the first invention, all calculations can be performed within each processor without any data transfer midway through the processing. No synchronization wait therefore occurs during the processing, and the efficiency of parallel processing improves.

[0025] According to the second invention, the data transfer before the start of step 1 also becomes unnecessary, so the efficiency of parallel processing improves further.

[Brief description of drawings]

FIG. 1 is a configuration diagram showing an embodiment of the first invention.

FIG. 2 is a configuration diagram showing an example of the data transfer code insertion device shown in FIG. 1.

FIG. 3 is a configuration diagram showing an embodiment of the second invention.

FIG. 4 is an example of a program for explaining the effect of the present invention.

FIG. 5 is a diagram showing the execution of a program processed by the apparatus of the present invention.

FIG. 6 is a configuration diagram of a conventional apparatus.

FIG. 7 is a diagram showing the execution of a program processed by a conventional apparatus.

[Description of Reference Numerals]
1 syntax analysis device
2 program conversion device
3 object code generation device
4 data transfer code insertion device
5 calculation processing division allocation device
6 array data division allocation device
41 calculation processing sectioning unit
42 data flow analysis unit
43 data transfer instruction insertion unit

Claims

[Claims]

1. A parallel programming language processing apparatus, being a programming language processing apparatus comprising a program conversion device that converts a source program into a parallelized program that runs in parallel on a parallel computer system comprising a plurality of processors each having a local memory, characterized by comprising: a data transfer code insertion device that inserts into the program data transfer code that causes data to be transferred between processors so that, for each processing unit in which the result of one calculation becomes an operand of the next calculation (chained processing unit), the group of data (array) each processor needs to perform that processing is prepared in its local memory before the processing unit starts; and a device that converts the program so that part of the calculation processing that handles the array is executed redundantly by a plurality of processors.

2. The parallel programming language processing apparatus according to claim 1, wherein the data transfer code insertion device comprises: a calculation processing sectioning unit that divides the program passed from the program conversion device into subroutine units or processing units indicated in the text of the source program; a data flow analysis unit that, for each processing unit produced by the calculation processing sectioning unit, traces the control flow of the program while taking the definition and reference relationships of the data into account, and thereby obtains the minimum data set needed to calculate each processor's portion of the array that is the final result of the processing unit; and a data transfer instruction insertion unit that, for each processing unit produced by the calculation processing sectioning unit, inserts at the head of the processing unit an instruction that transfers, from the processor that owns it, data in the data set obtained by the data flow analysis unit that is not allocated in the local memory of the respective processor.

3. A parallel programming language processing apparatus that converts a source program into an object program that runs in parallel on a parallel computer system comprising a plurality of processors each having a local memory, characterized by comprising: a device that converts the program so that, when the arrays of the source program are divided and placed on the processors, part of each array is placed redundantly on a plurality of processors; and a device that converts the program so that part of the calculation processing that handles the array is executed redundantly by a plurality of processors.