JPH07114516A

JPH07114516A - Program parallelizing method

Info

Publication number: JPH07114516A
Application number: JP15327394A
Authority: JP
Inventors: Toshio Okochi; 俊夫大河内; Chisato Konno; 千里金野; Mitsuyoshi Ikai; 光祥猪貝
Original assignee: Hitachi ULSI Engineering Corp; Hitachi Ltd
Current assignee: Hitachi ULSI Engineering Corp; Hitachi Ltd
Priority date: 1993-07-06
Filing date: 1994-07-05
Publication date: 1995-05-02

Abstract

PURPOSE: To reduce the inter-processor communication performed by a program for parallel computers when a program described in a high-level language is converted into a program for parallel computers. CONSTITUTION: The variables of the input program 2 are divided and assigned to plural processors. The input program 2 is analyzed, references to variable values defined by other processors are detected for each processor, and code is generated for the program part that performs the inter-processor communication for those references. The analysis also generates code for a program part that decides, when each processor executes the parallelized program 5, whether or not the reference to the variable value has already been performed. A program part is generated that performs control so that the inter-processor communication for the reference is executed when the reference has not yet been performed, and is not executed when the reference has already been done.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a technique for automatically converting a single-processor program written in a high-level language into a program for a distributed-memory parallel processor, and more particularly to a technique by which a computer automatically generates the program parts that perform the inter-processor communication required by a program for a parallel processor.

[0002]

[Prior Art] To parallelize a program, that is, to generate a program for a distributed-memory parallel processor (hereinafter called a "parallelized program") from a single-processor program written in a high-level language such as FORTRAN, program parts that perform communication must be added. This is because, in a distributed-memory parallel processor system composed of multiple processors, variables, for example, are distributed among and assigned to the individual processors, so that when one processor references a variable defined by another processor, it must obtain the value of that variable through inter-processor communication.

[0003] For this reason, when a computer takes as input a program written in a high-level language such as FORTRAN and automatically converts it into a parallelized program, the computer automatically generates the code of the program parts that communicate data between processors and adds it to the parallelized program. It is also desirable that the generated parallelized program have nearly identical text for every processor when the program parts for the individual processors are compared.

[0004] A technique for automatically generating, from a program written in a high-level language, a parallelized program that includes code for communicating data between processors is described, for example, in the "Proceedings of the 1992 ACM International Conference on Supercomputing". In that technique, a computer analyzes the program written in a high-level language, detects which data should be communicated at which timing, and automatically inserts communication code into the parallelized program.

[0005] A technique in which a computer takes as input a program that describes a physical phenomenon in a high-level problem description language and converts it into a parallelized program for a parallel processor that includes inter-processor communication is described, for example, in JP-A-4-190422.

[0006]

[Problems to Be Solved by the Invention] In the prior art described above, when generating the parallelized program, the computer generates and adds communication code by considering only the static situation, that is, what can be known without the processors actually executing the generated parallelized program. When the parallelized program actually runs, however, the situation changes dynamically in various ways, so whether inter-processor communication is needed at a given position is sometimes determined only when the parallelized program is executed. The prior art could therefore perform unnecessary, wasted communication.

[0007] To improve the execution performance of the generated parallelized program, it is desirable to keep inter-processor communication to the necessary minimum, that is, to avoid wasted inter-processor communication. The prior art does not address how to avoid wasted communication in cases where the need for communication is determined only when the parallelized program is executed (for example, when the original program contains a conditional statement and the need for inter-processor communication depends on that condition).

[0008] An object of the present invention is to provide a method that can automatically convert a single-processor program into a parallelized program in which no wasted inter-processor communication occurs when each processor executes the parallelized program, even in cases where the need for inter-processor communication is determined only at execution time.

[0009]

[Means for Solving the Problems] To achieve the above object, the present invention is a method in which a computer takes a single-processor program as input and converts the input program into a parallelized program for a parallel processor, the method comprising: a first step of analyzing the input program to detect, for each processor, the inter-processor communication that may become necessary when that processor executes the converted parallelized program, and generating code for that communication; a second step of analyzing the input program to generate, for each processor, code that, when that processor executes the parallelized program, determines whether communication with another processor is needed and controls execution so that communication is performed only when it is determined to be needed; and a third step of generating a parallelized program in which the code for the communication and the code for the control are added to the input program.

[0010] For program parts in which the computer determines that inter-processor communication is always necessary, it is preferable not to generate the control code of the third step and to perform the inter-processor communication unconditionally.

[0011] It is also preferable to newly provide a flag for managing whether the reference to the variable value has already been executed by the processor, and to generate code that updates the value of the flag and code that controls execution of the inter-processor communication according to the value of the flag.

[0012] For example, the first step may comprise: a control flow analysis step of creating, from the input program, a control flow graph that represents the flow of execution of the input program as a directed graph; a data dependence analysis step of creating, from the control flow graph, a data dependence graph that represents the definition/reference relationships of the variables in the input program as a directed graph; a communication data extraction step of selecting, from the definition/reference relationships of the data in the data dependence graph, those that require communication between processors and creating a communication target data dependence graph; and a communication position determination step of determining, from the communication target data dependence graph, the places in the program where communication is performed, generating and inserting at those positions the code for the communication, and creating a communication information list.

[0013] The second step may comprise: a variable definition flag generation step of generating code that sets a predetermined flag at the place where the data communicated by a communication code in the communication information list is defined; and a communication condition expression generation step of generating the code of a communication condition expression that controls the communication code so that it is executed only when the flag is set.

[0014] The method may further comprise a variable definition flag initialization information generation step of generating code that resets the flag after the communication has been executed, or a communication condition simplification step of tracing the control flow graph backward for each flag referenced in a communication condition expression, examining the expressions that define the flag, and deleting the communication condition expression and the flag when the communication condition expression always holds.
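The simplification of communication conditions described in [0014] can be sketched as follows. This is only an illustration under stated assumptions: the function name `flag_always_set`, the block names, and the graph shapes are hypothetical, flag resets are not modelled, and the sketch uses a forward "must" data-flow pass rather than the backward trace over the control flow graph that the text describes.

```python
def flag_always_set(blocks, preds, sets_flag, entry, target):
    """Return True if the flag is set on every path from `entry` to `target`.

    `preds` maps a block to its predecessors; `sets_flag` is the set of
    blocks that assign 1 to the flag. Resets of the flag are not modelled.
    """
    # Optimistic start: assume "set" everywhere except at program entry.
    set_on_entry = {b: b != entry for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            # The flag is set on entry to b only if it is set at the end
            # of every predecessor.
            new = all(set_on_entry[p] or p in sets_flag for p in preds[b])
            if new != set_on_entry[b]:
                set_on_entry[b] = new
                changed = True
    return set_on_entry[target]


# Hypothetical graph 1: the defining block always runs before the communication.
straight = {"entry": [], "update": ["entry"], "comm": ["update"]}
# Hypothetical graph 2: the defining block is skipped when a condition is false.
guarded = {"entry": [], "test": ["entry"], "update": ["test"],
           "comm": ["test", "update"]}

removable = flag_always_set(list(straight), straight, {"update"}, "entry", "comm")
must_keep = not flag_always_set(list(guarded), guarded, {"update"}, "entry", "comm")
```

In the first graph the condition always holds, so the condition expression and the flag could be dropped; in the second the run-time test has to stay.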

[0015]

[Operation] According to the program conversion method of the present invention, even when the need for inter-processor communication is determined only at execution time, a computer can automatically generate a parallelized program that judges the need for inter-processor communication at execution time and communicates selectively, so that no wasted communication is performed.

[0016]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0017] FIG. 1 shows the schematic configuration and an outline of the processing of an embodiment of a computer system that implements the program parallelization method according to the present invention. FIG. 2 is a chart showing the detailed procedure of the communication code generation process 4 shown in FIG. 1. These procedures are described in detail later.

[0018] As shown in FIG. 1, the computer 1 takes as input the original single-processor program 2, executes on the input program 2 the program parallelization processing of this embodiment, which includes a process 3 that divides the program among multiple processors and a process 4 that generates communication code for each processor, and outputs the generated parallelized program 5. The program division process 3 includes a data division process 31, which divides the data among the processors, and a processing division process 32, which divides the processing among the processors. The computer 1 stores the data division information 6, the parallelized program 7 without communication code, and the parallel execution control information 8, all generated in the course of the processing, in respective files.

[0019] FIG. 3 shows an example of a single-processor program. FIG. 4 shows an example of the program obtained when the single-processor program of FIG. 3 is parallelized by a conventional program parallelization method. FIG. 5 shows an example of the program obtained when the single-processor program of FIG. 3 is parallelized by the program parallelization method of the present invention. FIGS. 10, 12, and 14 show the data at intermediate stages of the communication code generation process 4 when the single-processor program of FIG. 3 is parallelized according to the embodiment of the present invention.

[0020] First, taking the original single-processor program 200 shown in FIG. 3 as an example, the program 501 parallelized by this embodiment (FIG. 5) is described in comparison with the program 500 parallelized by the conventional method (FIG. 4).

[0021] In the original program 200 of FIG. 3, the numbers (10), (11), (20), and so on at the right are statement numbers. The original program 200 first initializes every element of array B(I) to 0.0 for array index I in the range 1 to 100 (statements 10 and 11). It then repeats the processing of statements 21 through 29 the number of times set in the variable TIME (statements 20 and 30).

[0022] In statements 21 to 24, when the value of array element B(1) exceeds 1.0, the value of each element of array A(I) is updated for array index I in the range 1 to 100. Then, in statements 25 to 29, for array index I in the range 1 to 99, the program repeatedly updates array C(I) (statement 26) and then updates the value of array element B(I) (statement 27). Next, statement 29 adds 1 to the variable NT.

[0023] Program parallelization is the process of generating, from such a single-processor program, a program that is executed in parallel by multiple processors. For example, the original program 200 shown in FIG. 3 uses arrays A, B, and C, so these arrays are divided into suitable ranges and the computation for each range is assigned to a processor. That is, a parallelized program in which multiple processors execute the processing in parallel is generated by having each processor take charge of the part of the update processing of arrays A, B, and C (statements 11, 23, 26, and 27) whose array index lies in the range $lb to $ub (where $lb and $ub denote variables that are set to different values for each processor).
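The per-processor index range can be sketched as follows. This is a minimal illustration that assumes a contiguous block distribution; the function name `block_bounds` and the choice of four processors are hypothetical, and the patent itself leaves the division method open.

```python
def block_bounds(n, nprocs, rank):
    """Return the 1-based inclusive range (lb, ub) owned by processor `rank`.

    Assumes a contiguous block distribution of indices 1..n over `nprocs`
    processors; leftover elements go to the lowest-numbered processors.
    """
    base, extra = divmod(n, nprocs)
    lb = rank * base + min(rank, extra) + 1
    ub = lb + base - 1 + (1 if rank < extra else 0)
    return lb, ub


# Every processor runs the same loop body; only lb and ub differ.
ranges = [block_bounds(100, 4, rank) for rank in range(4)]
```

Under these assumptions, processor 0 would run its loops over indices 1 to 25, processor 1 over 26 to 50, and so on, which plays the role of the $lb and $ub variables in the text.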

[0024] First, the program generated by the program parallelization processing of the prior art is described. FIG. 4 shows the parallelized program 500 generated by applying the prior-art program parallelization processing to the original program of FIG. 3. The program 500 is the program executed by each of the multiple processors. As described above, however, the values of the variables $lb and $ub differ from processor to processor.

[0025] In the original program 200 shown in FIG. 3, the processing of statements 11, 23, 26, and 27, which update arrays A, B, and C, is divided and shared among the processors. To that end, in the parallelized program 500 shown in FIG. 4, the loop ranges of the DO statements 10, 22, and 25 of the original program 200 are changed, and the statements are replaced with statements 10-a, 22-a, and 25-a, respectively.

[0026] In the program 500 shown in FIG. 4, every processor references the value of array element B(1) when executing the conditional statement 21. The program parallelization processing therefore generates communication code (statements 201 and 202) that transfers the value of array element B(1) from the processor that owns B(1) to all other processors. In addition, when executing statement 27, each processor references the value of array element A($ub+1), which it does not own. The program parallelization processing therefore generates communication code (statements 242 and 243) that transfers this value.

[0027] In the program 500 of FIG. 4, whether statement 23, which updates the values of array A, is executed depends on the result of evaluating the conditional statement 21. If statement 23 is not executed as a result of that evaluation, the values of array A have not been updated. The value of A(I+1) referenced in statement 27 is then still the value referenced during the preceding execution of statement 27 in the repetition of the DO statement 20, so the inter-processor communication at that point (statements 242 and 243) is in fact unnecessary.

[0028] In conventional parallelization processing, however, whether that communication is needed cannot be determined at the time the program is parallelized. The conventional method therefore generates a program for the parallel computer that performs the communication regardless of whether it is needed, as shown in FIG. 4.

[0029] Next, the program generated by the program parallelization processing of this embodiment is described. FIG. 5 shows the parallelized program 501 generated by applying the program parallelization processing of this embodiment (FIGS. 1 and 2) to the original program 200 of FIG. 3.

[0030] In the program 501 of FIG. 5, the value of the flag A_update_flag records whether the block consisting of statements 22-a and 23 has been executed. That is, when the block consisting of statements 22-a and 23 is executed, statement 240 updates the value of the flag A_update_flag to 1. Conditional statements 241 and 245 are inserted before the block from statement 25-a to statement 28 so that the inter-processor communication (statements 242 and 243) is executed when A_update_flag is 1 and is not executed when A_update_flag is 0. As a result, the program 501 parallelized by this embodiment performs no wasted inter-processor communication.
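The effect of the flag can be sketched with a small count of how often the exchange of A($ub+1) would run on one processor. This is an illustration, not the code of FIG. 5: the function name `count_halo_exchanges` is hypothetical, the initial flag value of 1 is an assumption, and the reset after communication follows the flag-reset step mentioned in [0014]. Comments map lines to the statement numbers given in the text.

```python
def count_halo_exchanges(condition_per_step, use_flag):
    """Count how often the A($ub+1) exchange (statements 242, 243) runs.

    `condition_per_step` holds, for each iteration of the outer loop, whether
    the conditional statement 21 evaluated to true.
    """
    a_update_flag = 1   # assumption: the first reference must always fetch A
    exchanges = 0
    for condition in condition_per_step:
        if condition:
            # statements 22-a and 23 would update the local part of A here
            a_update_flag = 1                    # statement 240
        if not use_flag or a_update_flag == 1:   # statements 241, 245
            exchanges += 1                       # statements 242, 243
            a_update_flag = 0                    # reset after communicating
        # statements 25-a to 28 would update C and B here
    return exchanges


steps = [False, True, False, False, True]
conventional = count_halo_exchanges(steps, use_flag=False)
with_flag = count_halo_exchanges(steps, use_flag=True)
```

With this hypothetical sequence of conditions, the unconditional version communicates in all five iterations, while the flag-guarded version communicates only in the iterations that follow an update of A (plus the assumed initial fetch).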

[0031] Strictly speaking, in the program 501 of FIG. 5, whether a processor needs to use, in the block consisting of statements 26 and 27 (basic block 4 in FIG. 10), the values of array A obtained by inter-processor communication should be judged by whether the other processor that owns the array values to be referenced has executed the block of statement 23 (basic block 3 in FIG. 10). However, the program parallelization method of the present invention relies on the fact that the processors executing in parallel each execute the basic blocks of the parallelized program in the same order. Whether another processor has executed basic block 3 can therefore be determined by checking instead whether the processor itself has executed basic block 3. Whether the processor itself has executed basic block 3 can be determined from the value of a flag variable by inserting, at the end of basic block 3, an executable statement that sets the flag variable. In this way, the need for inter-processor communication can be determined when the parallelized program is executed, as described above.

[0032] Next, the processing procedure of this embodiment for automatically generating a parallelized program that performs no such wasted communication is described in detail.

[0033] The procedure of the program parallelization processing of this embodiment is described with reference to FIG. 1. The computer 1 first reads the program 2 and performs the program division process 3. This process determines the data division 31 and the processing division 32, that is, it decides how the processing and data of the input program 2 are divided and assigned to the multiple processors. The data division 31 creates a table, such as the one shown in FIG. 6, that indicates how each array in the program is divided and assigned to the processors. The processing division 32 decides how each executable statement in the program is divided and which processor executes it; it generates parallel execution control information 8, such as that shown in FIG. 7, which indicates how the processors share the execution of each DO loop, and then, referring to the parallel execution control information 8, generates the common parallelized source program 7 (FIG. 9) executed by every processor, storing each of these in a file. Any method may be used to decide how the processing and data are divided. Next, the communication code generation process 4 is executed to generate the parallelized program 5.

[0034] FIG. 8 shows the configuration of the parallel processor that executes the generated parallelized program 5. The parallel processor has the following functions.

[0035] (1) It has multiple processors 40a to 40d and a network 42 that enables data transfer between those processors.

[0036] (2) The processors 40a to 40d have memories 41a to 41d, respectively, for holding data and programs.

[0037] (3) Each processor has a function for sending data to and receiving data from other processors via the communication network 42. Data is sent and received by the following two system calls.

[0038] (3.1) Data transmission is performed by the send system call. The send system call is executed with the variable to be sent, the destination processor number, and a data identifier as arguments.

[0039] (3.2) Data reception is performed by the receive system call. The receive system call is executed with the variable to which the received data is assigned and a data identifier as arguments. If the data corresponding to the identifier has not yet arrived when the receive system call is executed, the processor suspends execution of other processing and waits until the data arrives.
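The behaviour of the two calls in (3.1) and (3.2) can be imitated on a single machine as follows. This is a toy sketch, not the system calls of the parallel processor: the class name `Network`, the explicit `me` argument of `receive` (the real call identifies the receiver implicitly), and the identifier string `"A_halo"` are all hypothetical, and threads stand in for processors.

```python
import queue
import threading
from collections import defaultdict


class Network:
    """Toy stand-in for network 42: one mailbox per (processor, identifier)."""

    def __init__(self):
        self._boxes = defaultdict(queue.Queue)
        self._lock = threading.Lock()

    def _box(self, proc, ident):
        with self._lock:              # guard creation of a new mailbox
            return self._boxes[(proc, ident)]

    def send(self, value, dest, ident):
        """Send `value` to processor `dest`, labelled with `ident`."""
        self._box(dest, ident).put(value)

    def receive(self, me, ident):
        """Block until data labelled `ident` has arrived for processor `me`."""
        return self._box(me, ident).get()


net = Network()
result = []
# "Processor 1" waits for the value; "processor 0" (the main thread) sends it.
receiver = threading.Thread(target=lambda: result.append(net.receive(1, "A_halo")))
receiver.start()
net.send(3.5, dest=1, ident="A_halo")
receiver.join()
```

The blocking `get` mirrors the rule in (3.2) that a processor waits when the data for the identifier has not yet arrived.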

[0040] Next, the communication code generation process 4 is described in detail with reference to the chart of FIG. 2. The case described here adds communication code to the parallelized program (without communication code) 7 shown in FIG. 9. However, as indicated by the dashed arrow in FIG. 1, the communication code generation process may instead be performed directly after the division process 3 has been applied to the original program 200 shown in FIG. 3.

[0041] In FIG. 2, the communication code generation process 4 first performs control flow analysis 401 on the program and creates a control flow graph 411, which represents the flow of program execution as a directed graph. FIG. 10 is a conceptual diagram of the control flow graph 411 obtained by performing control flow analysis 401 on the parallelized source program 7 of FIG. 9, and FIG. 11 shows the data structure of the control flow graph 411. In FIG. 10, 4100 to 4103 denote basic blocks, each consisting of a sequence of assignment statements and expressions that are executed sequentially. Arrows 4110 to 4116 denote edges that represent the flow of control between basic blocks.
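A control flow graph of this kind can be held in a very small data structure, sketched below. FIGS. 10 and 11 are not reproduced here, so this is not their actual content: only the placement of statement 23 in basic block 3 and of statements 26 and 27 in basic block 4 comes from the text, while the class name, the contents of blocks B1 and B2, and every edge are assumptions made for illustration.

```python
class ControlFlowGraph:
    """Basic blocks plus successor and predecessor lists."""

    def __init__(self):
        self.statements = {}   # block name -> list of statement numbers
        self.succ = {}         # block name -> names of successor blocks
        self.pred = {}         # block name -> names of predecessor blocks

    def add_block(self, name, statements):
        self.statements[name] = list(statements)
        self.succ[name] = []
        self.pred[name] = []

    def add_edge(self, src, dst):
        """Record that control may flow from block `src` to block `dst`."""
        self.succ[src].append(dst)
        self.pred[dst].append(src)


cfg = ControlFlowGraph()
cfg.add_block("B1", ["10-a", "11"])        # assumed: initialization of B
cfg.add_block("B2", ["20", "21"])          # assumed: loop header and condition
cfg.add_block("B3", ["22-a", "23"])        # update of A
cfg.add_block("B4", ["25-a", "26", "27"])  # update of C and B

cfg.add_edge("B1", "B2")
cfg.add_edge("B2", "B3")   # condition true
cfg.add_edge("B2", "B4")   # condition false: the update of A is skipped
cfg.add_edge("B3", "B4")
cfg.add_edge("B4", "B2")   # next iteration of the outer loop
```

Keeping predecessor lists alongside successor lists makes it cheap to walk the graph backward, which the later processing steps rely on.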

[0042] Next, the computer 1 performs the data dependence analysis 402 of FIG. 2 and creates a data dependence graph 412, which represents the definition/reference relationships of the variables in the program as a directed graph. FIG. 12 is a conceptual diagram of the data dependence graph 412 for the original program 200 shown in FIG. 3, and FIG. 13 shows the data structure of the data dependence graph 412. Arrows 4120 to 4124 denote definition/reference relationships of data. The start of each arrow indicates where the data is defined, and the end indicates where the data is referenced. That is, from table 412 of FIG. 13, obtained by the data dependence analysis illustrated in FIG. 12, one can find, for each array, all the blocks that define the array and all the blocks that reference it. Specifically, FIG. 13 shows, for example, that array A is defined by block 4102 (basic block 3) and referenced at two places in block 4103 (basic block 4).
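The pairing of definitions with references can be sketched as follows. This is a simplified, flow-insensitive illustration: the function name `dependence_edges` is hypothetical, the pairing ignores which definitions actually reach which references, and the example tables list only the accesses named in the text (the placement of statements 11 and 21 in blocks B1 and B2 is an assumption).

```python
def dependence_edges(defs, uses):
    """Pair every definition of a variable with every reference to it.

    `defs` and `uses` map a block name to the variables it defines or
    references. Each edge is (variable, defining block, referencing block).
    """
    edges = []
    for def_block, defined in defs.items():
        for var in defined:
            for use_block, used in uses.items():
                if var in used:
                    edges.append((var, def_block, use_block))
    return edges


# Hypothetical tables: B is defined by statements 11 and 27 and read by
# statement 21; A is defined by statement 23 and read in basic block 4.
defs = {"B1": ["B"], "B3": ["A"], "B4": ["B"]}
uses = {"B2": ["B"], "B4": ["A"]}
edges = dependence_edges(defs, uses)
```

Each resulting tuple corresponds to one arrow of a data dependence graph: its start is the defining block and its end is the referencing block.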

[0043] Next, the communication data extraction process 403 of FIG. 2 is performed. For the arrows that appear in the data dependence graph 412, that is, the definition/reference relationships of the data in the data dependence graph, this process selects those that may involve communication between processors and creates the communication target data dependence graph 413. The communication target data dependence graph 413 is a subgraph of the data dependence graph 412.

【００４４】 FIG. 14 is a conceptual diagram of the communication target data dependence graph 413 created from the data dependence graph 412 of FIG. 12, and FIG. 15 shows the data structure of the communication target data dependence graph 413. For arrows 4121 and 4123 in the data dependence graph 412 of FIG. 12, the processor that defines the data and the processor that references it are found to be the same. These arrows 4121 and 4123 therefore require no inter-processor communication and are removed. Arrows 4120, 4122, and 4124 of FIG. 12, on the other hand, are extracted as requiring inter-processor communication. The result is the communication target data dependence graph 413 shown in FIG. 14.
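
The extraction step can be sketched as a filter over the arrows. This is a minimal sketch under stated assumptions, not the patent's data structure: the `Arrow` record and its `owner_shift` field (0 for "same processor", a nonzero value for a neighboring processor, `None` for "needed on every processor") are hypothetical, and the values assigned to each arrow are chosen only to agree with the outcome described in the text.

```python
from typing import NamedTuple, Optional


class Arrow(NamedTuple):
    number: int                  # reference numeral of the arrow in FIG. 12
    owner_shift: Optional[int]   # assumed encoding, see the lead-in


ALL = None  # the value is needed on every processor

# Arrows of data dependence graph 412; the owner_shift values are assumptions.
GRAPH_412 = [
    Arrow(4120, ALL),
    Arrow(4121, 0),
    Arrow(4122, 1),
    Arrow(4123, 0),
    Arrow(4124, ALL),
]


def extract_communication_arrows(graph):
    """Keep only arrows whose definer and referencer may be different processors."""
    return [a for a in graph if a.owner_shift is None or a.owner_shift != 0]


GRAPH_413 = extract_communication_arrows(GRAPH_412)
print([a.number for a in GRAPH_413])  # [4120, 4122, 4124]
```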

【００４５】 Next, the communication position determination processing 404 in FIG. 2 is performed. For each data dependence extracted by the communication data extraction processing 403 as possibly involving inter-processor communication, this processing determines the point in the parallelized program at which the communication is performed and inserts the communication code (communication information) required for that communication at that position. In this embodiment, the insertion position is chosen so that communication takes place immediately before the basic block that references the data to be communicated.

【００４６】 FIG. 16 shows the communication information list 414 obtained from the communication target data dependence graph 413 of FIG. 14, and FIG. 17 shows the data structure of the communication information list 414. Reference numeral 4141 denotes the communication information generated from data dependences 4120 and 4124 in the communication target data dependence graph 413 of FIG. 14. Communication information 4141 is placed immediately before basic block 4101. It performs the following processing: when the local processor owns array element B(1), it sends the value to all other processors; when the local processor does not own B(1), it receives the value from the processor that does.

【００４７】 Communication information 4142 shown in FIG. 16 is generated from data dependence 4122 in the communication target data dependence graph 413 of FIG. 14. It sends array element A($lb) to the processor immediately preceding the local processor (the one whose processor ID is smaller by 1) and receives array element A($ub+1) from the processor immediately following the local processor (the one whose processor ID is larger by 1).
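
The two kinds of communication information described above can be sketched as per-processor message plans. The Python below is a sketch under assumptions, not code from the patent: the function names, the tuple format, zero-based processor IDs, and the treatment of the first and last processors (no predecessor and no successor) are all hypothetical.

```python
def plan_4141(my_id, num_procs, owner_of_b1):
    """Messages for B(1): the owner sends to everyone else, the others receive."""
    if my_id == owner_of_b1:
        return [("send", p, "B(1)") for p in range(num_procs) if p != my_id]
    return [("recv", owner_of_b1, "B(1)")]


def plan_4142(my_id, num_procs):
    """Messages for A: send A($lb) to ID-1, receive A($ub+1) from ID+1."""
    plan = []
    if my_id > 0:                  # assumed: processor 0 has no predecessor
        plan.append(("send", my_id - 1, "A($lb)"))
    if my_id < num_procs - 1:      # assumed: the last processor has no successor
        plan.append(("recv", my_id + 1, "A($ub+1)"))
    return plan


print(plan_4141(0, 4, owner_of_b1=0))
print(plan_4141(2, 4, owner_of_b1=0))
print(plan_4142(1, 4))
```

The first plan is a one-to-all transfer and the second a nearest-neighbor shift, which is why the two are later treated differently when their run-time conditions are derived.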

【００４８】 Next, the computer 1 performs the communication condition determination processing 405 shown in FIG. 2. For each piece of communication information inserted by the communication position determination processing 404 of FIG. 2 (4141 and 4142 in FIG. 16), this processing derives, from the data dependence graph 412 (FIG. 12) and the control flow graph 411 (FIG. 10), the condition under which the communication is actually required, and creates control information for controlling execution of the communication. The condition under which communication is required is the condition under which the data dependence represented by an arrow in the communication target data dependence graph 413 (FIG. 14) actually arises during execution, and it can be found by tracing the control flow graph 411 (FIG. 10).

【００４９】 FIG. 18 is a detailed chart of the communication condition determination processing 405 shown in FIG. 2. FIG. 19 shows information 4055 at an intermediate stage of the communication condition determination processing 405 of FIG. 18. The processing 405 is described below with reference to the chart of FIG. 18 and the information 4055 of FIG. 19.

【００５０】 The communication condition determination processing 405 performs the following processing for each piece of communication information in a communication information list 414 such as that of FIG. 16.

【００５１】 In FIG. 18, the variable definition flag generation processing 4051 first creates code information that sets a flag variable indicating whether the value of the variable to be communicated has been defined during program execution. Next, the communication conditional expression generation processing 4052 creates a conditional expression for deciding at run time whether the communication is needed.

【００５２】 Further, the variable definition flag initialization code generation processing 4053 creates code information that initializes the variable definition flag once communication has been performed. The communication condition reduction processing 4054 then simplifies the conditional expression created by the communication conditional expression generation processing 4052 by referring to the control flow graph 411 and removing trivially satisfied conditions.

【００５３】 Specifically, communication information 4141 in the communication information list 414 of FIG. 16 arises from data dependences 4120 and 4124 (FIG. 14), so the communication target data B(1) is defined whenever execution passes through basic block 4100 or 4103. The variable definition flag generation processing 4051 therefore provides a flag B_update_flag and creates code information 4154 and 4157 (see FIG. 19) that sets the flag to 1 when execution passes through basic block 4100 or 4103.

【００５４】 Next, because the inter-processor communication needs to be executed only when B_update_flag is 1, the communication conditional expression generation processing 4052 of FIG. 18 creates conditional expression 4155. The variable definition flag initialization information generation processing 4053 then creates code information 4156 that resets B_update_flag to 0 after the communication is executed.

【００５５】 Further, the communication condition reduction processing 4054 traces the control flow graph 411 (FIG. 10) backward for the flag B_update_flag referenced in the conditional expression and examines the expressions that define its value. In this example, tracing paths 4110 and 4114 of the control flow graph 411 backward shows that B_update_flag is 1 in both cases. The same conclusion follows from the data structure of the control flow graph 411 shown in FIG. 11, in which the only predecessor blocks of block 2 are blocks 1 and 4. Conditional expression 4155 is therefore known to hold always, so it can be judged unnecessary. Conditional expression 4155 and the variable definition flag B_update_flag used in it are accordingly deleted.

【００５６】 Communication information 4142 of FIG. 16 is handled in the same way: a flag A_update_flag is provided, and code information 4151 (FIG. 19) is created that sets the flag to 1 when execution passes through basic block 4102. Conditional expression 4152 and code information 4153 that initializes A_update_flag are also created.

【００５７】 Next, the control flow graph 411 (FIG. 10) is traced backward to examine the expressions that define the value of A_update_flag. Here, paths 4112 and 4113 are examined. Tracing path 4113 backward shows that A_update_flag is 1. Tracing path 4112 backward, however, leaves the value of A_update_flag undetermined. This also follows from the fact that the predecessor blocks of block 4 in FIG. 11 are blocks 2, 3, and 4, whereas in FIG. 15 the only defining block for the reference to array A in block 4 is block 3. Conditional expression 4152 and the related code are therefore left in place.
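
The reduction decision for the two flags can be sketched as a check on predecessor blocks. This is a deliberately simplified sketch, not the patent's procedure: it looks only at the immediate predecessors listed for FIG. 11 rather than tracing the control flow graph backward in full, and the names `PREDECESSORS` and `condition_always_true` are hypothetical.

```python
# Predecessor lists stated in the text for FIG. 11 (basic block numbers).
PREDECESSORS = {2: [1, 4], 4: [2, 3, 4]}


def condition_always_true(ref_block, defining_blocks, predecessors):
    """True when every predecessor of the referencing block sets the flag."""
    preds = predecessors[ref_block]
    return bool(preds) and all(p in defining_blocks for p in preds)


# B(1) is referenced in basic block 2 and defined in basic blocks 1 and 4.
b_condition_redundant = condition_always_true(2, {1, 4}, PREDECESSORS)
# A is referenced in basic block 4 and defined only in basic block 3.
a_condition_redundant = condition_always_true(4, {3}, PREDECESSORS)

print(b_condition_redundant)  # True: conditional expression 4155 can be deleted
print(a_condition_redundant)  # False: conditional expression 4152 is kept
```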

【００５８】 In this way, the communication control information list 415 shown in FIG. 20 is generated from the communication information list 414 of FIG. 16. In the communication control information list 415 of FIG. 20, conditional statement 4152, which uses A_update_flag, lets each processor decide during execution of the parallelized program whether communication code 4142 needs to be executed. Inter-processor communication is thus skipped when it is unnecessary, which reduces the number of communications and the communication load. Such a conditional statement is not attached to every communication code: a communication code judged to be always necessary, such as communication code 4141, performs inter-processor communication unconditionally, with no conditional statement attached. No conditional statement is therefore executed needlessly. FIG. 21 shows the data structure of the communication control information list 415.
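
The run-time effect of the guarded communication code can be illustrated with a small simulation. This is a sketch under assumptions: the block trace is a made-up execution order, the initial flag value of 0 is assumed, and the function name `count_a_communications` is hypothetical.

```python
def count_a_communications(trace, guarded):
    """Count executions of communication code 4142 along a trace of basic blocks."""
    a_update_flag = 0              # assumed initial value
    count = 0
    for block in trace:
        if block == 4:             # code 4142 sits immediately before basic block 4
            if not guarded or a_update_flag == 1:
                count += 1
                a_update_flag = 0  # reset after communicating (code information 4153)
        if block == 3:             # basic block 3 defines array A (code information 4151)
            a_update_flag = 1
    return count


TRACE = [1, 2, 3, 4, 4, 4, 2, 4]   # hypothetical execution order of basic blocks
print(count_a_communications(TRACE, guarded=False))  # 4
print(count_a_communications(TRACE, guarded=True))   # 1
```

On this trace the unguarded program communicates on every entry to basic block 4, while the guarded program communicates only on the first entry after array A has been redefined.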

【００５９】 Referring again to FIG. 2, the code generation processing 460 outputs a parallelized program for parallel processors, such as that shown in FIG. 5, based on the division of processing and the communication information determined so far.

【００６０】 In the embodiment described above, the array was one-dimensional. A two-valued reference flag, indicating whether or not inter-processor communication is to be performed, was therefore sufficient. When the array has two or more dimensions, however, an array element has four or more adjacent elements, so a reference flag that takes only the two values "1" and "0" is insufficient.

【００６１】 Another embodiment of the present invention that addresses this problem is described next. In this embodiment, the inter-processor communication code generation processing of the present invention is applied to a case in which one array is referenced several times and the referenced array element indices differ. In such a case, deciding array by array whether inter-processor communication is needed cannot adequately eliminate wasted communication; the portion of each array subject to communication must be divided, and the need for inter-processor communication decided for each part separately. The following describes the communication code generation processing of this embodiment as applied to the parallelized program 310 without communication code shown in FIG. 22.

【００６２】 Program 310 updates array A in block 311 and references the updated values in blocks 312 and 313. Block 312 is a conditional block and is executed only when the condition of the conditional statement (statement 324) holds. Array A is referenced in two blocks of program 310, but the array element indices referenced in each differ. Block 312 references A(I-1,J) and A(I+1,J) to update B(I,J), whereas block 313 references A(I-1,J), A(I+1,J), A(I,J-1), and A(I,J+1).

【００６３】 Here, the way a two-dimensional array is referenced is expressed as a data reference pattern, drawn as a graph as in FIG. 23. In FIG. 23, STENCIL1, for example, is the reference pattern that references the four points A(I-1,J), A(I+1,J), A(I,J-1), and A(I,J+1). Similarly, STENCIL2 is the reference pattern that references the two points A(I-1,J) and A(I+1,J), and STENCIL3 is the reference pattern that references the two points A(I,J-1) and A(I,J+1). In these terms, program 310 references array A with reference pattern STENCIL2 in block 312 and with reference pattern STENCIL1 in block 313. The data to be communicated between processors can be determined from the reference pattern. A reference pattern can be represented as an array whose elements are the relative indices of the reference points. The functions that take the array representing the reference pattern and the name of the array to be communicated as arguments and send and receive the required data are called sendp and recvp, respectively.
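
The three reference patterns can be written down as sets of relative indices. The Python sketch below is illustrative only: representing a pattern as a `frozenset` of `(dI, dJ)` offsets and the helper name `referenced_elements` are assumptions, and the functions sendp and recvp are not modeled.

```python
# Each pattern is a set of (dI, dJ) offsets relative to the element A(I, J).
STENCIL1 = frozenset({(-1, 0), (1, 0), (0, -1), (0, 1)})
STENCIL2 = frozenset({(-1, 0), (1, 0)})
STENCIL3 = frozenset({(0, -1), (0, 1)})


def referenced_elements(pattern, i, j):
    """Absolute indices of A touched when the pattern is applied at (i, j)."""
    return sorted((i + di, j + dj) for di, dj in pattern)


print(referenced_elements(STENCIL2, 5, 7))   # [(4, 7), (6, 7)]
print(STENCIL1 - STENCIL2 == STENCIL3)       # True
print(STENCIL2 | STENCIL3 == STENCIL1)       # True
```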

【００６４】 The following first describes the program parallelization scheme of this embodiment using program 310 as an example, and then describes the method for generating such a parallelized program.

【００６５】 FIG. 24 shows program 310 parallelized according to this embodiment. The reference patterns that apply to array A in the program are the two patterns STENCIL1 and STENCIL2 of FIG. 23. The part of an array that a processor only references and does not update is called the reference area. To reference data in the reference area, a processor must receive the data from other processors through inter-processor communication. The variable A_status_flag holds the number of the reference pattern corresponding to the part of array A's reference area that has not been brought up to date (such a part is called Dirty below). Immediately after array A is updated, the whole reference area is Dirty, and the Dirty part corresponds to STENCIL1. In this case the program sets A_status_flag to 1. Because block 312 references array A with pattern STENCIL2, the corresponding inter-processor communication is performed (statements 3241 and 3242).

【００６６】 This inter-processor communication brings part of array A's reference area up to date, leaving only the part corresponding to STENCIL3 Dirty. To record this, the control variable A_status_flag is set to 3 (statement 3243). Block 313 references array A with pattern STENCIL1, but when block 313 starts executing there are two possibilities: either block 312 has been executed, so that inter-processor communication with reference pattern STENCIL2 has already taken place, or it has not. If block 312 has not been executed, inter-processor communication with reference pattern STENCIL1 is required. If block 312 has been executed, inter-processor communication with reference pattern STENCIL3, the difference between STENCIL1 and STENCIL2, is required. Which case applies is determined from the value of the control variable A_update_flag, and the communication is performed accordingly (statements 3291 to 3297).

【００６７】 For each basic block with an inter-processor data dependence, the communication code generation processing 4 determines which data must be communicated when the basic block is executed and generates a program that communicates only that data. The steps from the control flow analysis 401 through the communication position determination processing 404 in FIG. 2 are performed exactly as in the first embodiment.

【００６８】 The communication condition determination processing 405 and the code generation processing 406 of FIG. 2 first enumerate, for each array to be communicated, all reference patterns that appear in the program together with all reference patterns obtained by taking sums and intersections of those patterns, and assign serial numbers to the reference patterns. Here, the sum and difference of reference patterns mean the sum and difference of the patterns regarded as sets of reference points. The reference pattern obtained by taking the sum of all reference patterns is denoted U. In the example of program 310, the three reference patterns shown in FIG. 23 are obtained.

【００６９】 Next, for each element of the communication information list, given the reference pattern S indicated by the state management variable and the data reference pattern R of the block, an execution statement sequence is generated that performs inter-processor communication with reference pattern R - (U - S) (FIG. 5, statements 241 to 242 and 291 to 297). An execution statement sequence that sets the state management variable to the number of reference pattern S - R is then generated (FIG. 5, statements 243 and 298). The minus sign here denotes set difference. As a result, the state management variable always holds the number of the reference pattern corresponding to the part of the array's communication area that has not been updated, and each communication transfers only the part that is really needed.
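
The set arithmetic on reference patterns can be checked with a short sketch. This is an illustration under assumptions, not the generated program of FIG. 24: patterns are modeled as sets of relative offsets, the number 0 for the empty pattern ("nothing Dirty") is an assumed convention, and the names `STENCILS`, `NUMBER_OF`, and `communicate` are hypothetical.

```python
STENCILS = {
    0: frozenset(),                                    # assumed number for "nothing Dirty"
    1: frozenset({(-1, 0), (1, 0), (0, -1), (0, 1)}),  # STENCIL1
    2: frozenset({(-1, 0), (1, 0)}),                   # STENCIL2
    3: frozenset({(0, -1), (0, 1)}),                   # STENCIL3
}
NUMBER_OF = {pattern: n for n, pattern in STENCILS.items()}
U = frozenset().union(*STENCILS.values())              # sum of all reference patterns


def communicate(status, ref_number):
    """Return (number of the pattern to communicate, new status) for one reference."""
    s, r = STENCILS[status], STENCILS[ref_number]
    to_send = r - (U - s)      # the part of R that is still Dirty
    new_status = s - r         # the Dirty part that remains afterwards
    return NUMBER_OF[to_send], NUMBER_OF[new_status]


# Right after array A is updated the status is 1 (everything Dirty).
print(communicate(1, 2))  # block 312 executed: (2, 3)
print(communicate(3, 1))  # block 313 after block 312: (3, 0)
print(communicate(1, 1))  # block 313 without block 312: (1, 0)
```

The three calls reproduce the cases described for program 310: STENCIL2 is communicated in block 312, and block 313 then needs either STENCIL3 or the full STENCIL1.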

【００７０】 In this way, a parallelized program such as that shown in FIG. 24 is generated.

【００７１】[0071]

[Effects of the Invention] According to the program parallelization method of the present invention, the number of communications during parallel execution can be reduced compared with conventional methods, and the execution load required for communication can be reduced accordingly. This improves the execution performance of the generated program for parallel computers.

[Brief description of drawings]

FIG. 1 is a diagram showing the schematic configuration and schematic processing of an embodiment of a computer system that implements the program parallelization method of the present invention.

FIG. 2 is a chart showing an example of the communication code generation processing performed by the computer system shown in FIG. 1.

FIG. 3 is a program list showing an example of a program for a single processor.

FIG. 4 is a program list in which the single-processor program of FIG. 3 is parallelized by a conventional program parallelization method.

FIG. 5 is a program list in which the single-processor program of FIG. 3 is parallelized by the program parallelization method of the present invention.

FIG. 6 is a table showing an example of the data division information 6 shown in FIG. 1.

FIG. 7 is a table showing an example of the parallel execution control information 8 shown in FIG. 1.

FIG. 8 is a schematic diagram showing a configuration example of a parallel processor system that executes a parallelized program generated by the present invention.

FIG. 9 is a diagram showing an example of the parallelized program (without communication code) 7 shown in FIG. 1.

FIG. 10 is a diagram showing an example of the control flow graph 411 shown in FIG. 2.

FIG. 11 is a table showing the data structure of the control flow graph 411 shown in FIG. 10. Communication condition determination processing flowchart.

FIG. 12 is a diagram showing an example of the data dependence graph 412 shown in FIG. 2.

FIG. 13 is a table showing the data structure of the data dependence graph 412 shown in FIG. 12.

FIG. 14 is a diagram showing an example of the communication target data dependence graph 413 shown in FIG. 2.

FIG. 15 is a table showing the data structure of the communication target data dependence graph 413 shown in FIG. 14. A program obtained by parallelizing the single-processor program of FIG. 13 according to this embodiment.

FIG. 16 is a diagram showing an example of the communication control information list 414 shown in FIG. 2.

FIG. 17 is a table showing the data structure of the communication control information list 414 shown in FIG. 16.

FIG. 18 is a chart showing the detailed processing of the communication condition determination processing 405 shown in FIG. 2.

FIG. 19 is a diagram showing an example of the information at an intermediate stage of the communication condition determination processing 405 shown in FIG. 18.

FIG. 20 is a diagram showing an example of the communication control information list 415 shown in FIG. 2.

FIG. 21 is a table showing the data structure of the communication control information list 415 shown in FIG. 20.

FIG. 22 is a program list showing an example of a program for a single processor that includes a two-dimensional array.

FIG. 23 is an explanatory diagram showing an example of reference patterns for a two-dimensional array.

FIG. 24 is a program list in which the single-processor program shown in FIG. 22 is parallelized according to an embodiment of the present invention.

[Explanation of symbols]

1 ... Program parallelization processing, 2 ... FORTRAN program, 3 ... Program division processing, 4 ... Communication code generation processing, 5 ... Program for parallel computers, 403 ... Communication data extraction processing, 404 ... Communication position determination processing, 405 ... Communication condition determination processing, 411 ... Control flow graph, 412 ... Data dependence graph, 413 ... Communication target data dependence graph, 414 ... Communication information list, 415 ... Communication control information list.

Continuation of front page: (72) Inventor Chisato Konno, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo. (72) Inventor Mitsuyoshi Ikai, c/o Hitachi ULSI Engineering Corp., 5-20-1 Josuihon-cho, Kodaira-shi, Tokyo.

Claims

1. A program parallelization method in which a program for a single processor is input and the input program is converted by a computer into a parallelized program for parallel processors, the method comprising: a first step of analyzing the input program to detect inter-processor communication that may become necessary when the converted parallelized program is executed by each processor, and generating code for that communication; a second step of analyzing the input program to generate code that, when each processor executes the parallelized program, determines whether communication with another processor is necessary and controls execution so that the communication is performed only when it is determined to be necessary; and a third step of generating the parallelized program by adding the code for the communication and the code for the control to the code of the input program.

2. The program parallelization method according to claim 1, wherein, in the second step, for a program part in which the inter-processor communication always needs to be performed, the code for the control to be added in the third step is not generated, so that the communication by the code for the communication is performed unconditionally.

3. The program parallelization method according to claim 1, wherein the second step includes a step of newly providing, for the inter-processor communication detected in the first step that may become necessary during parallel execution by the plurality of processors, a flag for managing whether the condition requiring the communication holds, and generating code for updating the value of the flag and code for controlling execution of the communication according to the value of the flag.

4. The program parallelization method according to claim 1, comprising: a control flow analysis step of creating, from the input program, a control flow graph that represents the flow of execution of the input program as a directed graph; a data dependence analysis step of creating, from the control flow graph and the input program, a data dependence graph that expresses the definition/reference relationships of variables as a directed graph; a communication data extraction processing step of selecting, from the definition/reference relationships of the data in the data dependence graph, those that require communication between processors, and creating a communication target data dependence graph; and a communication position determination processing step of determining, from the communication target data dependence graph, the points in the program at which communication is performed, and generating the communication code to be inserted at those positions and inserting it to create a communication information list.

5. The program parallelization method according to claim 4, wherein the second step includes: a variable definition flag generation processing step of generating code that sets a predetermined flag at the location where the communication target data of a communication code in the communication information list is defined; and a communication conditional expression generation processing step of generating code of a communication conditional expression that controls the communication code so that it is executed only when the flag is set.

6. The program parallelization method according to claim 5, wherein the second step further includes a variable definition flag initialization information generation processing step of generating code that resets the flag after the communication is executed.

7. The program parallelization method according to claim 6, wherein the second step further includes a communication condition reduction processing step of examining, for a flag referred to in the communication conditional expression, the expressions that define the flag by tracing the control flow graph backward, and deleting the communication conditional expression and the flag when the communication conditional expression always holds.

8. A method of inputting a program for a single processor and converting, by a computer, the input program into a parallelized program for a parallel processor, comprising the steps of: (a) dividing the variables of the input program and assigning them to a plurality of processors; (b) dividing the processing of the input program and assigning it to the plurality of processors; (c) analyzing the input program to detect, for each processor, whether there is a reference to a variable value defined in another processor, and generating code for a program part that performs inter-processor communication for the reference; (d) analyzing the input program to generate, for each processor, code for a program part that determines, when the processor executes the parallelized program, whether the reference to the variable value has already been executed; and (e) generating, for each processor, a program part that, when the processor executes the parallelized program, performs control so that the inter-processor communication for the reference to the variable value is performed if the reference has not yet been executed, and is not performed if the reference has already been executed.

9. A program parallelization method for converting, by a computer, a program for a single processor into a program to be executed by parallel processors, characterized by: providing, for data that is referred to by each processor and updated by another processor, a control variable indicating whether the processor has acquired the latest value; generating, in the program part that updates the data, an execution statement sequence that sets the control variable to a value indicating that communication has not been executed; generating, in the program part that communicates the data, an execution statement sequence that sets the control variable to a value indicating that communication has been executed; and generating, in the program part that refers to the data, an execution statement sequence that checks the value of the control variable and, when communication of the data has not been executed, acquires the latest value of the data by inter-processor communication.

10. The program parallelization method according to claim 9, wherein the variable values that are referred to by each processor and updated by another processor are divided into a direct sum of variable value groups, each of which is always referred to together during execution; a control variable is provided for each variable value group; and an execution statement sequence is generated that, when inter-processor communication of a variable value group is performed, sets the control variable of that group to a value indicating that communication has been executed.

11. A program parallelization method for converting a program for a single processor into a program to be executed by parallel processors, characterized by: providing, for the portion of array data distributed among a plurality of processors that is referred to by each processor and updated by another processor, a control variable indicating whether the processor has acquired the latest value; generating, in the program part that updates the array, an execution statement sequence that sets the control variable to a value indicating that communication has not been executed; generating, in the program part that communicates the data, an execution statement sequence that sets the control variable to a value indicating that communication has been completed; and generating, in the program part that refers to the data, an execution statement sequence that checks the value of the control variable and, when communication of the data has not been completed, acquires the latest value of the data by inter-processor communication.

12. The program parallelization method according to claim 11, wherein the portion of the array data that is referred to by each processor and updated by another processor is divided into a direct sum of element groups, each of which is always referred to together during execution; a control variable is provided for each element group; and an execution statement sequence is generated that, when an element group of the array data is communicated, sets the control variable of that group to a value indicating that communication has been completed.

13. The program parallelization method according to claim 11, wherein the portion of the array data that is referred to by each processor and updated by another processor is divided into a direct sum of element groups, each of which is always referred to together during execution; a control variable that is associated with the array and manages whether each element group of the array data has been communicated is provided; an execution statement sequence that sets the control variable to a value indicating that the element group has been communicated is generated in the program part that communicates the element group of the array data; and an execution statement sequence is generated in the program part that refers to the element group, which checks the value of the control variable and, when communication of the element group has not been completed, acquires the latest value of the element group by inter-processor communication.
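
As a rough sketch of the element-group scheme in claims 12 and 13, the remotely owned part of an array can be split into disjoint groups with one control variable per group. The `RemoteArrayPart` class, its method names, and the `fetch` callback (a stand-in for inter-processor communication) are assumptions made for illustration and do not appear in the specification.

```python
class RemoteArrayPart:
    """Remotely owned array elements, partitioned into disjoint groups."""

    def __init__(self, groups, fetch):
        # groups: name -> list of indices; a direct sum requires disjointness
        all_idx = [i for idx in groups.values() for i in idx]
        if len(all_idx) != len(set(all_idx)):
            raise ValueError("element groups must be disjoint")
        self.groups = groups
        self.fetch = fetch                                # hypothetical transfer
        self.communicated = {g: False for g in groups}    # control variables
        self.local = {}
        self.fetch_count = 0

    def invalidate(self, group):
        """Program part that updates the group: mark it as not communicated."""
        self.communicated[group] = False

    def read(self, group):
        """Program part that refers to the group: communicate only if needed."""
        if not self.communicated[group]:
            for i in self.groups[group]:
                self.local[i] = self.fetch(i)
            self.fetch_count += 1
            self.communicated[group] = True   # mark the group as communicated
        return [self.local[i] for i in self.groups[group]]


owner_data = list(range(100, 108))
part = RemoteArrayPart({"left": [0, 1], "right": [6, 7]},
                       fetch=lambda i: owner_data[i])
first = part.read("left")        # triggers one group transfer
second = part.read("left")       # served from the local copy
owner_data[0] = -1               # the owner updates an element of "left"
part.invalidate("left")
third = part.read("left")        # transfers the group again
right = part.read("right")       # independent control variable
```

Keeping one control variable per group rather than per element means a whole group is transferred in one step, and an update to one group does not force the other groups to be communicated again.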