JPH09167144A

JPH09167144A - Method and system for program generation

Info

Publication number: JPH09167144A
Application number: JP7330619A
Authority: JP
Inventors: Kozo Doi; 幸蔵土井; Asao Yamamoto; 朝男山本
Original assignee: Hitachi Engineering Co Ltd
Current assignee: Hitachi Engineering Co Ltd
Priority date: 1995-12-19
Filing date: 1995-12-19
Publication date: 1997-06-24

Abstract

PROBLEM TO BE SOLVED: To provide means for generating a parallel program in which the communication performed by each processor is optimized, so that parallel processing on a plurality of processors can be executed efficiently. SOLUTION: The method modifies a parallel program 10 for parallel processing that describes plural kinds of computation procedures and plural kinds of communication procedures corresponding to communication between processors. If the time from the start to the end of the parallel processing would become shorter on the assumption that the communication amount of the communication performed under the currently used communication procedure is increased, the communication procedures in the parallel program 10 are reordered and the description is changed so that two or more communication procedures are merged.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a technique for creating parallel programs used when a multiprocessor system, a distributed-memory parallel computer in which each processor has its own local memory, or the like performs parallel processing.

[0002]

[Prior Art] When parallel processing is performed on a multiprocessor system or on a distributed-memory parallel computer in which each processor has its own local memory, each processor refers to the parallel program and performs so-called inter-processor communication. Because this communication handles large volumes of data at high speed, transferring data in reasonably large batches gives good communication efficiency. If an extremely large amount of data is transferred at once, however, time is lost before the parallel computation can start, and the parallel processing rate deteriorates.

[0003] Program conversion systems that convert a program written for a sequential computer into a program for a parallel computer have been proposed. Most of them rely on data partitioning, which focuses on iterative computations such as DO loops that appear in the sequential program.

[0004] Data partitioning here means assigning the arrays that appear in the sequential program to the processors of the parallel computer according to the number of processors. The array section assigned to each processor exchanges data with the sections assigned to the other processors through data communication, and the parallel computation is carried out on that basis.
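
The partitioning described above can be illustrated with a minimal Python sketch. It assumes a simple block distribution, in which each processor owns one contiguous range of a 1-based array; the function name `block_range` is hypothetical and does not come from the specification, which does not fix a particular allocation pattern.

```python
def block_range(n_elements, n_procs, pe):
    """Return the 1-based inclusive range (ist, iend) owned by processor `pe`.

    Assumption: a block distribution; any remainder is spread over the
    lowest-numbered processors.
    """
    base, extra = divmod(n_elements, n_procs)
    ist = pe * base + min(pe, extra) + 1
    iend = ist + base - 1 + (1 if pe < extra else 0)
    return ist, iend


# 100 array elements split over 4 processors (PE0 to PE3).
ranges = [block_range(100, 4, pe) for pe in range(4)]
```

Elements on the boundary of a range, such as element 25 on PE0 and element 26 on PE1, are the ones that have to be exchanged by data communication when a neighboring processor needs them.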

[0005] In this case, converting to a program for a parallel computer requires inserting communication functions into the program and specifying the processor layout, the array allocation pattern, and so on. Understanding the structure of the sequential program, choosing a parallelization method, and carrying out the actual conversion take many development man-hours.

[0006] At present, tools such as the Fortran-D compiler are used as conversion means that automatically set the processor layout, the array allocation pattern, and the like described above, so that data partitioning can be automated to some extent.

[0007] In practice, however, when the result of such automatic conversion is run on a parallel computer, only certain patterns are parallelized, and the result is inferior to a program parallelized by careful manual work. Manual parallelization, on the other hand, can be expected to speed up the parallel computation considerably, but it demands a great deal of man-hours and labor cost. Moreover, converting a program so that it suits several kinds of systems means understanding the characteristics of each system before parallelizing, which is extremely difficult to do by hand.

[0008]

[Problems to Be Solved by the Invention] When a parallel processing program is written for a distributed-memory parallel computer or the like, it is usually not written with the computer's own processing performance, such as its computation speed and communication speed, in mind, and a program developed for one parallel computer is sometimes used on another parallel computer with different performance.

[0009] In that case, even if the program has been optimized for the parallel computer it was developed for, it is not necessarily optimized for another parallel computer whose performance differs greatly. This is a particular problem when the same program is used to evaluate the performance of different parallel computers.

[0010] As one optimization technique for parallel processing, communication data can be transferred in a batch in order to reduce the number of communications between parallel computers (processors). Depending on the parallel computing performance, however, splitting the communication data into several transfers can sometimes give faster inter-processor communication instead. Deciding how inter-processor communication should be carried out to improve the efficiency of parallel processing is therefore very difficult.

[0011] For example, to perform large-volume data communication at high speed, it is good to transfer reasonably large batches of data at once, but transferring an extremely large amount of data delays the start of parallel processing and lowers its efficiency.

[0012] The technique described in Japanese Unexamined Patent Publication No. 5-89070 and elsewhere improves the efficiency of parallel processing by transferring in a batch communication data that is located at scattered positions in a processor's memory. It does not, however, disclose how to set the communication amount, how to decide the order of communication, or how to use split transfers to speed up parallel processing. In addition, the choice of communication method for parallel processing depends largely on the long experience and intuition of experts, and writing by hand a parallel program tailored to each parallel computer is very difficult.

[0013] An object of the present invention is therefore to provide means for creating a parallel program, used when a multiprocessor system, a distributed-memory parallel computer in which each processor has its own local memory, or the like performs parallel processing, in such a way that the efficiency of the communication performed between processors is improved.

[0014]

[Means for Solving the Problems] To solve the above problems and achieve the object of the present invention, the following means are provided.

[0015] Namely, there is provided a program creation method for modifying a parallel program for parallel processing that describes plural kinds of computation procedures and plural kinds of communication procedures corresponding to communication between processors. In this method, the communication amount for a given described communication procedure is determined with reference to the computation time of each computation procedure and the communication amount of each communication procedure, and the description of that communication procedure is changed so that communication is performed with the determined communication amount.

[0016] More specifically, there is provided a program creation method for modifying a parallel program for parallel processing that describes plural kinds of computation procedures and plural kinds of communication procedures corresponding to communication between processors. In this method, if the time from the start to the end of the parallel processing would become shorter on the assumption that the communication amount of the communication performed under the currently used communication procedure is increased, the communication procedures in the parallel program are reordered and the description is changed so that two or more communication procedures are merged.
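
The merge decision can be sketched in Python as follows. This is only an illustration under stated assumptions: the cost model in `comm_time` (a fixed start-up latency plus a per-byte cost) and its numbers are hypothetical, computation and communication are assumed not to overlap, and ordering constraints between units are ignored. None of the function names come from the specification.

```python
def comm_time(nbytes, latency=50.0, per_byte=0.01):
    """Hypothetical cost model: fixed start-up latency plus a per-byte cost."""
    return latency + per_byte * nbytes


def total_time(units):
    """units: list of ("CAL", time) or ("COM", nbytes) entries in program order."""
    return sum(v if kind == "CAL" else comm_time(v) for kind, v in units)


def merge_if_faster(units, i, j):
    """Merge communication unit j into unit i if that shortens the whole run."""
    if not (i < j and units[i][0] == "COM" and units[j][0] == "COM"):
        raise ValueError("i and j must index two communication units with i < j")
    merged = list(units)
    merged[i] = ("COM", units[i][1] + units[j][1])  # larger communication amount
    del merged[j]                                   # one fewer communication
    return merged if total_time(merged) < total_time(units) else list(units)


units = [("CAL", 100.0), ("COM", 8), ("CAL", 100.0), ("COM", 8), ("CAL", 100.0)]
merged = merge_if_faster(units, 1, 3)
```

With these assumed costs, two 8-byte transfers pay the start-up latency twice, so merging them into one 16-byte transfer shortens the run and the merged schedule is returned.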

[0017] When the communication procedures are reordered, this is preferably done with reference to a predetermined constraint, the constraint being that a communication procedure is moved to a position before, or after, a specific computation procedure.

[0018] According to another aspect of the present invention, there is provided a program creation method for modifying a parallel program for parallel processing that describes plural kinds of computation procedures and plural kinds of communication procedures corresponding to communication between processors. In this method, if the time from the start to the end of the parallel processing would become shorter on the assumption that the communication amount of the communication performed under the currently used communication procedure is decreased, the communication data to be transferred is divided and the description of that communication procedure is changed so that communication is performed for each divided piece of data.
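
The split decision can be sketched in the same way. The cost model below is a hypothetical one in which the transfer time is flat up to a certain length and then grows steeply; the specification does not give such a formula, and the names `comm_time` and `best_split` are illustrative only. Chunks are costed at the rounded-up chunk size.

```python
def comm_time(nbytes, flat=50.0, knee=1024, slope=0.5):
    """Hypothetical cost model: flat up to `knee` bytes, then growing steeply."""
    return flat if nbytes <= knee else flat + slope * (nbytes - knee)


def best_split(nbytes, max_chunks=8):
    """Return (number of chunks, total time) for the cheapest equal-size split."""
    best = (1, comm_time(nbytes))
    for k in range(2, max_chunks + 1):
        size = -(-nbytes // k)          # ceiling division: size of each chunk
        t = k * comm_time(size)
        if t < best[1]:
            best = (k, t)
    return best


large = best_split(4096)   # a transfer that is worth splitting under this model
small = best_split(512)    # a transfer that is not
```

Under these assumed costs a 4096-byte transfer is cheapest as four 1024-byte transfers, while a 512-byte transfer is left as a single transfer.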

[0019] The present invention further provides the following system aspect.

[0020] Namely, there is provided a program creation system for modifying a parallel program for parallel processing that describes plural kinds of computation procedures and plural kinds of communication procedures corresponding to communication between processors. The system comprises means for judging whether the time from the start to the end of the parallel processing would become shorter on the assumption that the communication amount of the communication performed under the currently used communication procedure is increased, and means for reordering the communication procedures in the parallel program and changing the description so that two or more communication procedures are merged when the judging means determines that the time would become shorter.

[0021]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0022] FIG. 1 shows the configuration of a communication optimization system that is one embodiment of the present invention.

[0023] The communication optimization system 12 comprises a communication analysis unit 13, a syntax analysis unit 14, a scheduling unit 15, a communication optimization unit 16, and a target program generation unit 17. Using a given parallel program 10, the communication optimization system 12 generates a target program 11. It also has a communication database 18 that stores information used to calculate communication times and the like.

[0024] The communication analysis unit 13 has a function for reading the given parallel program 10 and a function for generating the communication database 18.

[0025] The syntax analysis unit 14 performs a computation time measurement process that measures the time taken by the computations executed under the parallel program 10, a communication time measurement process that measures the time taken by the communications executed under the parallel program 10, a dependency analysis process that examines dependencies, such as the ordering between the computations and communications executed under the parallel program 10, and similar processes.

[0026] The scheduling unit 15 performs a measurement result display process, which displays the results of the measurement processes performed by the syntax analysis unit 14.

[0027] The communication optimization unit 16 optimizes communication, for example by reordering plural communication procedures or dividing the data to be communicated. In doing so, it refers to the communication database 18.

[0028] The target program generation unit 17 generates the target program 11 with reference to the result of the optimization performed by the communication optimization unit 16.

[0029] First, a concrete example of the parallel program 10 given to this system is described with reference to FIG. 5. FORTRAN is used here as an example programming language; that is, FIG. 5 shows a FORTRAN source program. For simplicity, the array declarations, the statements specifying each processor's computation range, the statements setting the various variables, and similar statements normally written at the top of a source program are omitted.

[0030] In the following, parallel processing is assumed to use four processors, each of which processes one quarter of all the array elements.

[0031] The four processors are given device numbers 0 to 3 and are referred to as PE0, PE1, PE2, and PE3, respectively.

[0032] The computation and communication performed by the program shown in FIG. 5 are described next.

[0033] Steps 50, 53, 56, and 59 each perform an array computation, setting the contents of arrays A(I), B(I), C(I), D(I), and E(I). These computation procedures are called CAL1, CAL2, CAL3, and CAL4, respectively.

[0034] Steps 51, 52, 54, 55, 57, and 58 perform communication between processors. These communication procedures are called COM1, COM2, and COM3.

[0035] When a computation that uses an array is performed, communication takes place, for example, as follows.

[0036] When a processor performs a computation that uses an array but does not own a required element of that array, it receives the element from another processor that does own it. This is one example of the communication that takes place when a computation using an array is performed.

[0037] The communication functions are SEND for sending and RECV for receiving. The arguments of SEND are the kind of data to send, the transfer length (each data item is assumed to be 8 bytes), and the processor number of the destination. The arguments of RECV are the kind of data to receive, the transfer length, and the processor number of the peer from which the data is received.

[0038] Step 50 of FIG. 5 is programmed to perform an array computation that assigns to array A the value of array A multiplied by Y and assigns to array B the value of array A multiplied by Z. A DO statement carries out these assignments to arrays A and B from ist to iend in increments of 1.

[0039] In other words, the assignments to arrays A and B are repeated as the variable I runs from ist to iend.
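
The loop of step 50 (CAL1) can be written as a short Python sketch. FIG. 5 is not reproduced in this text, so the statement order is an assumption: A is updated first and B is then computed from the updated A. Lists are padded at index 0 so that indices match the 1-based FORTRAN arrays.

```python
def cal1(a, b, ist, iend, y, z):
    """CAL1: A(I) = A(I) * Y, then B(I) = A(I) * Z, for I = ist .. iend."""
    for i in range(ist, iend + 1):   # DO I = ist, iend (increment 1)
        a[i] = a[i] * y
        b[i] = a[i] * z              # assumption: uses the updated A(I)


a = [None, 1.0, 2.0, 3.0]            # index 0 unused (1-based indexing)
b = [None, 0.0, 0.0, 0.0]
cal1(a, b, ist=1, iend=3, y=2.0, z=10.0)
```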

[0040] Step 51 of FIG. 5 is programmed so that, when the PE (processor number) is not 3 (NE means "not equal"), the processor sends the last element of array A to the processor whose number is its own processor number (MY) plus 1. The last element here is the last element of array A on each processor, not the last element of array A in the parallel processing system as a whole.

[0041] Step 52 of FIG. 5 is programmed so that, when the PE is not 0, the processor receives element ist-1 of array A (the element immediately before its own first element) from the processor whose number is its own processor number minus 1.
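
Steps 51 and 52 (COM1) amount to a one-element shift to the right-hand neighbor. The following Python sketch simulates the four processors in a single process with in-memory queues. The `send` and `recv` helpers are hypothetical stand-ins for SEND and RECV with a simplified argument list, and the array ranges are example values.

```python
from collections import defaultdict, deque

N_PE = 4
ELEMENT_BYTES = 8                      # transfer length of one array element
channels = defaultdict(deque)          # (source PE, destination PE) -> messages


def send(me, value, length, dest):
    """Hypothetical stand-in for SEND(data, length, destination)."""
    channels[(me, dest)].append((value, length))


def recv(me, length, source):
    """Hypothetical stand-in for RECV(data, length, source)."""
    value, sent_length = channels[(source, me)].popleft()
    assert sent_length == length, "transfer length mismatch"
    return value


def com1(a, ranges):
    for my in range(N_PE):             # step 51: PE .NE. 3 sends A(iend) to MY+1
        if my != N_PE - 1:
            _, iend = ranges[my]
            send(my, a[my][iend], ELEMENT_BYTES, my + 1)
    for my in range(N_PE):             # step 52: PE .NE. 0 receives A(ist-1) from MY-1
        if my != 0:
            ist, _ = ranges[my]
            a[my][ist - 1] = recv(my, ELEMENT_BYTES, my - 1)


ranges = [(1, 4), (5, 8), (9, 12), (13, 16)]
a = [{i: float(i) for i in range(ist, iend + 1)} for ist, iend in ranges]
com1(a, ranges)
```

After the exchange, each processor other than PE0 holds a copy of its left neighbor's last element, which is what a computation reading A(I-1) needs at I = ist.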

[0042] Step 53 of FIG. 5 is programmed to multiply the I-th value of array C by the (I-1)-th value of array A and assign the result to array C.

[0043] A DO statement carries out this assignment from ist to iend in increments of 1.
[0044] Step 54 of FIG. 5 is programmed so that, when the PE is not 0, the processor sends the first element of array A to the processor whose number is its own processor number minus 1. Step 55 of FIG. 5 is programmed so that, when the PE is not 3, the processor receives element iend+1 of array A (the element immediately after its own last element) from the processor whose number is its own processor number plus 1.

[0045] Step 56 of FIG. 5 is programmed to multiply the I-th value of array D by the (I+1)-th value of array A and assign the result to array D. A DO statement carries out this assignment from ist to iend in increments of 1.

[0046] Step 57 of FIG. 5 is programmed so that, when the PE is not 3, the processor sends the last element of array B to the processor whose number is its own processor number plus 1. Step 58 of FIG. 5 is programmed so that, when the PE is not 0, the processor receives element ist-1 of array B from the processor whose number is its own processor number minus 1.

[0047] Step 59 of FIG. 5 is programmed to multiply the I-th value of array E by the (I-1)-th value of array B and assign the result to array E. A DO statement carries out this assignment from ist to iend in increments of 1.

[0048] This concludes the description of an example of the parallel program 10 shown in FIG. 1, written as a source program.

[0049] Next, the operation of the communication optimization system 12 shown in FIG. 1 is described with reference to FIGS. 1 to 12.

[0050] First, the communication analysis unit 13 accepts the parallel program 10 (an example source program is shown in FIG. 5) and determines the number of processors used for parallel processing.

[0051] Here four processors are used, so when inter-processor communication takes place, it follows one of the inter-processor communication patterns shown in FIG. 2.

[0052] The inter-processor communication patterns shown in FIG. 2 are now described. FIG. 2 shows four kinds of patterns. The circled numbers near the arrows indicate the order of communication.

[0053] In the pattern of FIG. 2(A), processor 0 (PE0) receives data from all the other processors (PE1, PE2, PE3). The circled numbers show an example of the order of reception.

[0054] The pattern of FIG. 2(B) is the reverse of the pattern of FIG. 2(A): processor 0 (PE0) sends data to all the other processors (PE1, PE2, PE3). The circled numbers show an example of the order of transmission.

[0055] In the pattern of FIG. 2(C), neighboring processors send to and receive from each other. In the illustrated example, each processor first sends to the processor on its right (circled number 1) and then sends to the processor on its left (circled number 2).

[0056] In the pattern of FIG. 2(D), the processors send in sequence. Specifically, processor 0 (PE0) first sends to processor 1 (PE1); after processor 1 (PE1) has finished receiving, it sends to processor 2 (PE2); and after processor 2 (PE2) has finished receiving, it finally sends to processor 3 (PE3).
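
The four patterns of FIG. 2 can be encoded, for illustration, as lists of communication steps, where each step is a list of (source, destination) transfers that take place together. This representation and the function names are assumptions. The sketch also assumes that PE1, PE2, and PE3 are served in ascending order in patterns (A) and (B) and that the end processors in pattern (C) do not wrap around; FIG. 2 itself is not reproduced in this text.

```python
def pattern_a(n):
    """(A) PE0 receives from every other processor, one after another."""
    return [[(pe, 0)] for pe in range(1, n)]


def pattern_b(n):
    """(B) PE0 sends to every other processor, one after another."""
    return [[(0, pe)] for pe in range(1, n)]


def pattern_c(n):
    """(C) Neighbor exchange: all send right first, then all send left."""
    to_right = [(pe, pe + 1) for pe in range(n - 1)]
    to_left = [(pe, pe - 1) for pe in range(1, n)]
    return [to_right, to_left]


def pattern_d(n):
    """(D) Relay: PE0 -> PE1, then PE1 -> PE2, then PE2 -> PE3."""
    return [[(pe, pe + 1)] for pe in range(n - 1)]


patterns = {name: f(4) for name, f in
            [("A", pattern_a), ("B", pattern_b), ("C", pattern_c), ("D", pattern_d)]}
```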

[0057] These are merely examples of communication patterns.

[0058] The communication analysis unit 13 has a function that measures in advance the communication time of each inter-processor communication pattern for each data transfer length (0, 8, 16, ..., 8n bytes, where n is a natural number) and stores the results in a database. This database is the communication database 18 of FIG. 1. By referring to the communication database 18, the communication time corresponding to the communication pattern and data transfer length in use can therefore be determined.

[0059] Specifically, the communication database 18 stores, as associated entries, the communication pattern name corresponding to the kind of communication pattern, the data transfer length, and the communication time.
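
One way to picture such a database is a mapping from (pattern name, data transfer length) to communication time. The Python sketch below is an assumption about the layout, not the structure used in the embodiment; the time values are made up for illustration, and rounding a requested length up to the next multiple of 8 bytes is also an assumption.

```python
comm_db = {}   # (pattern name, transfer length in bytes) -> communication time


def register(pattern, length, time):
    """Store one measured entry: pattern name, transfer length, communication time."""
    comm_db[(pattern, length)] = time


def lookup(pattern, length):
    """Return the stored time, rounding the length up to a multiple of 8 bytes."""
    key_length = -(-length // 8) * 8
    return comm_db[(pattern, key_length)]


# Hypothetical measurements for pattern "C" at lengths 0, 8, 16, 24, 32 bytes.
for n in range(5):
    register("C", 8 * n, 50 + n)

t_exact = lookup("C", 16)
t_rounded = lookup("C", 12)   # 12 bytes is looked up as 16 bytes
```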

[0060] As an example of the values stored in the communication database 18, FIG. 3 shows the relationship between data transfer length and communication time for processor 0 under each communication pattern. The labels (A), (B), (C), and (D) at the right of FIG. 3 correspond to the communication patterns described for FIG. 2(A) to (D). As FIG. 3 shows, the communication time for each data transfer length is set in the communication database 18 in advance for each communication pattern.

[0061] FIG. 3 shows that, for the communication patterns of FIG. 2(A) to (D), the communication time changes little while the data transfer length is reasonably small (in general, 1024 Kbytes or less).

[0062] Next, the processing performed by the syntax analysis unit 14 is described with reference to FIGS. 4 and 5.

[0063] FIG. 4 is a flowchart of the processing performed by the syntax analysis unit 14. This processing is described below.

[0064] First, in step 40, the parallel program 10 accepted by the communication analysis unit is obtained and searched for DO loops, GOTO statements, subroutines, and the like, and the parallel program 10 is divided into computation units and communication units.

[0065] A computation unit is a part of the parallel program in which a series of computations is performed; examples are steps 50, 53, 56, and 59 of FIG. 5, which are computation procedures.

[0066] A communication unit is a part of the parallel program in which communication is performed; examples are steps 51 and 52 of FIG. 5, which are communication procedures.

[0067] Each computation unit and communication unit obtained by dividing the parallel program is then given a name (steps 42 and 46).

[0068] Specifically, this can be done as follows. Step 50 of FIG. 5 performs a computation using arrays, so it is named CAL1 (short for "calculation"). Steps 51 and 52 of FIG. 5 perform inter-processor communication, so together they are named COM1 (short for "communication").

[0069] Similarly, 53 in FIG. 5 is named "CAL2", 54 and 55 are named "COM2", 56 is named "CAL3", 57 and 58 are named "COM3", and 59 is named "CAL4".
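The division and naming of steps 40, 42, and 46 can be sketched as follows. This is only a minimal illustration under assumptions: the statement strings, the `SEND`/`RECV` keywords, and the `is_comm` rule are hypothetical stand-ins, since the source text of FIG. 5 is not reproduced here.

```python
from itertools import groupby

def is_comm(stmt: str) -> bool:
    # Hypothetical rule: a statement is communication if it starts with SEND or RECV.
    return stmt.split()[0] in ("SEND", "RECV")

def split_and_name(statements):
    """Group consecutive statements of the same kind and name them CAL1, COM1, ..."""
    counters = {"CAL": 0, "COM": 0}
    units = []
    for comm, group in groupby(statements, key=is_comm):
        prefix = "COM" if comm else "CAL"
        counters[prefix] += 1
        units.append((f"{prefix}{counters[prefix]}", list(group)))
    return units

# Hypothetical program fragment: calculation, two communication statements, calculation.
program = [
    "DO I = 1, N",
    "SEND A TO MY+1",
    "RECV A FROM MY-1",
    "DO J = 1, N",
]
names = [name for name, _ in split_and_name(program)]
```

Consecutive communication statements fall into one unit, which mirrors how 51 and 52 together become COM1.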

[0070] First, the communication time measurement processing 41 is described. This processing calculates the communication time of each communication processing unit named in step 42. That is, in step 43 the data transfer length and communication pattern of each communication processing unit are examined, and in step 44 the communication time of each communication processing unit is obtained by referring to the communication database 18.
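A lookup of the kind performed in step 44 might look like the sketch below. The layout of the communication database 18 is not given in this section, so the table structure, the pattern name `one_to_one`, and every tabulated value other than the 8-byte entry are assumptions made purely for illustration.

```python
from bisect import bisect_left

# Hypothetical contents of the communication database 18:
# pattern -> sorted list of (transfer length in bytes, time in microseconds).
COMM_DB = {
    "one_to_one": [(8, 412.0), (16, 420.0), (1024, 900.0)],
}

def comm_time(pattern: str, length: int) -> float:
    """Return the tabulated time for the smallest tabulated length >= `length`."""
    table = COMM_DB[pattern]
    lengths = [n for n, _ in table]
    i = bisect_left(lengths, length)
    if i == len(table):
        raise ValueError("transfer length exceeds the tabulated range")
    return table[i][1]
```

Rounding up to the next tabulated length is one possible design choice; interpolating between entries would be another.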

[0071] Next, the calculation time measurement processing 45 is described.

[0072] This processing calculates, for each calculation processing unit named in step 46, the calculation time, that is, the time required for the calculation. In step 47, the calculation amount of each calculation processing unit is computed. In step 48, the calculation time is derived from that calculation amount by referring to the arithmetic processing performance (a predetermined time required for one operation). The calculation amount may be defined as the number of iterations for a Do loop, the number of branches for a GoTo statement, and the number of calls for a Subroutine, or it may be defined as the total number of program steps in each calculation processing unit; it is not limited to these definitions. Any parameter may be adopted as long as it allows the time required by each calculation processing unit to be calculated.
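The relation used in step 48 can be written as a one-line function. The sketch below assumes a per-operation time of 0.025 microseconds; that value is not stated in the source and is chosen only so that a calculation amount of 200 yields the 5.00 microseconds described for CAL1 in FIG. 6.

```python
def calculation_time(amount: int, time_per_op_us: float) -> float:
    """Calculation time = calculation amount x predetermined time per operation."""
    return amount * time_per_op_us

# Assumed arithmetic processing performance (microseconds per operation).
TIME_PER_OP_US = 0.025

cal_time_us = calculation_time(200, TIME_PER_OP_US)
```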

[0073] The arithmetic processing performance may be obtained by defining a routine for measuring it in advance and executing that routine only once in step 48.

[0074] Finally, the dependency investigation processing 49 examines the dependencies between communication processing units and calculation processing units. Here, a dependency means that a specific calculation processing unit and a specific communication processing unit have a placement relationship in the program that must be observed. Such a relationship arises because, for a given communication processing unit, there are calculation processing units that must be placed before or after it, or because there is a communication processing unit that is its reception partner or transmission partner.

[0075] The dependency investigation processing 49 examines whether dependencies exist between communication processing units and calculation processing units, and passes information on the dependencies that exist to the scheduling unit 15.

[0076] Next, the scheduling unit 15 performs result display processing, which displays the measurement and investigation results obtained by the syntax analysis unit 14. The results may be displayed on a display device not shown in FIG. 1. The scheduling unit 15 also has a function of passing the measurement and investigation results obtained by the syntax analysis unit 14 to the communication optimization unit 16.

[0077] FIG. 6 shows the display data, namely the schedule-format data for processor 0. The communication optimization unit 16 receives this display data and performs processing based on it, as described later.

[0078] The schedule-format data in FIG. 6 is now described with reference to FIGS. 4, 5, and 6.

[0079] First, the "NAME" column 62 in FIG. 6 lists, in order, the names assigned in steps 42 and 46 of FIG. 4. The "TIME" column 63 in FIG. 6 shows the communication time of each communication processing unit calculated in step 44 of FIG. 4, or the calculation time of each calculation processing unit calculated in step 48 of FIG. 4.

[0080] The "calculation amount" column 64 in FIG. 6 shows the calculation amount computed in step 47 of FIG. 4, and the "transfer length" column 65 shows the data transfer length examined in step 43 of FIG. 4. Under "dependency" 66 in FIG. 6, "pre-calculation" gives the name of the calculation processing unit that must be calculated before a given communication processing is performed, and "post-calculation" gives the name of the calculation processing unit that must be calculated after that communication processing is performed. "Reception" indicates the processor from which data is received, and "transmission" indicates the processor to which data is sent.

[0081] FIG. 6 is described in more detail below.

[0082] "CAL1" has a calculation time (TIME) of "5.00 (μs)" and a calculation amount of "200". Since "CAL1" is not a communication processing procedure, its "transfer length" and "dependency" columns are blank.

[0083] "COM1" has a communication time (TIME) of "412.00 (μs)" and a "transfer length" of "8 (bytes)". Its "dependency" shows that "CAL1" must be calculated before the communication processing of "COM1" is performed and that "CAL2" must be calculated after it. Since "transmission" is "1", processor 0 sends to processor 1. Since "reception" is blank, processor 0 performs no receive operation.

[0084] "CAL2" is similar to "CAL1", so its description is omitted.

[0085] "COM2" has a communication time (TIME) of "1025.00 (μs)" and a "transfer length" of "8 (bytes)". Its "dependency" shows that "CAL1" must be calculated before the communication processing of "COM2" is performed and that "CAL3" must be calculated after it. Since "reception" is "1", processor 0 receives data sent from processor 1. Since "transmission" is blank, processor 0 performs no send operation.

[0086] "CAL3" and "CAL4" are the same as "CAL1", and "COM3" is the same as "COM1" except that the "post-calculation" entry of its "dependency" is "CAL4". The description of "CAL3", "CAL4", and "COM3" is therefore omitted.
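One way to hold the schedule-format data of FIG. 6 in memory is a list of records, as sketched below. The class and field names are hypothetical; only the rows whose values are stated explicitly in the text above (CAL1, COM1, COM2) are filled in.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScheduleRow:
    name: str                          # "NAME" column 62
    time_us: float                     # "TIME" column 63
    amount: Optional[int] = None       # "calculation amount" 64 (CAL rows only)
    length: Optional[int] = None       # "transfer length" 65 in bytes (COM rows only)
    pre: Optional[str] = None          # "pre-calculation"
    post: Optional[str] = None         # "post-calculation"
    recv_from: Optional[int] = None    # "reception"
    send_to: Optional[int] = None      # "transmission"

# Rows for processor 0 as described in the text.
rows = [
    ScheduleRow("CAL1", 5.00, amount=200),
    ScheduleRow("COM1", 412.00, length=8, pre="CAL1", post="CAL2", send_to=1),
    ScheduleRow("COM2", 1025.00, length=8, pre="CAL1", post="CAL3", recv_from=1),
]

def is_comm_row(row: ScheduleRow) -> bool:
    """A row describes a communication processing unit when it has a transfer length."""
    return row.length is not None
```

Blank table cells map to `None`, which keeps the distinction between "no dependency" and "not applicable" out of the numeric fields.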

[0087] Looking at FIG. 6, the communication time (TIME) of "COM2" is far larger than that of "CAL1". This is because processor 0 incurs a waiting time when receiving the data sent from processor 1.

[0088] FIG. 7 shows an example of the "calculation time", "communication time", and "waiting time" of each processor, based on scheduling-format data such as that shown in FIG. 6.

[0089] The "calculation time" is the time spent processing calculation processing units, and the "communication time" is the time spent processing communication processing units. The time other than the "calculation time" and the "communication time" is the "waiting time".
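The definition above amounts to a subtraction, sketched here. The function name and the sample figures in the usage line are hypothetical and are not taken from FIG. 7.

```python
def waiting_time(total_us: float, calc_us: float, comm_us: float) -> float:
    """Waiting time = elapsed time minus calculation time minus communication time."""
    wait = total_us - calc_us - comm_us
    if wait < 0:
        raise ValueError("calculation and communication time exceed the elapsed time")
    return wait

# Invented example: 2000 us elapsed, 20 us calculating, 1437 us communicating.
example_wait_us = waiting_time(2000.0, 20.0, 1437.0)
```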

[0090] FIG. 7 shows the "calculation time", "communication time", and "waiting time" for each of four processors. Referring to the data in FIG. 6, the white portion 70 of processor 0 (PE0), for example, corresponds to the waiting time in the processing of "COM2". The present invention is characterized by optimizing the communication processing procedures so that such scattered "waiting time" is suppressed as far as possible.

[0091] The communication optimization processing performed by the communication optimization unit 16 is now described with reference to FIG. 8. The scheduling-format data before optimization is that shown in FIG. 6, and the scheduling-format data after optimization is shown in FIGS. 9 and 10.

[0092] First, for each calculation processing unit or communication processing unit in FIG. 6, the calculation amount or transfer length immediately before and after it is examined (step 80). Specifically, for "CAL1" in FIG. 6 (a calculation processing unit), nothing precedes it, so the preceding value is "0", and the transfer length of the communication processing procedure that follows it is found to be "8".

[0093] For "COM1" (a communication processing unit), the calculation amount of the preceding calculation processing is found to be "200", and the calculation amount of the following calculation processing is also found to be "200".

[0094] In the same way, the calculation amount or transfer length before and after each of "CAL2", "CAL3", "CAL4", "COM2", and "COM3" is examined.
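Step 80 can be sketched as a neighbour lookup over the ordered schedule. The tuple layout and the function name are assumptions; the three entries use the values given above for CAL1, COM1, and CAL2.

```python
# (name, calculation amount or transfer length), in program order.
schedule = [("CAL1", 200), ("COM1", 8), ("CAL2", 200)]

def neighbours(schedule, index):
    """Return (value before, value after) the unit at `index`; 0 when no neighbour exists."""
    before = schedule[index - 1][1] if index > 0 else 0
    after = schedule[index + 1][1] if index + 1 < len(schedule) else 0
    return before, after

cal1_neighbours = neighbours(schedule, 0)   # nothing before, transfer length after
com1_neighbours = neighbours(schedule, 1)   # calculation amounts on both sides
```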

[0095] Next, from the relationship between the calculation amount and the transfer length, it is determined whether increasing the current transfer length would make the parallel processing faster (step 81). Here, the transfer lengths of "COM1", "COM2", and "COM3" are all small, so increasing the transfer length makes the parallel processing faster in each case. The communication time for each transfer length is determined in advance, as in FIG. 2, which makes it possible to determine whether increasing the transfer length would make the parallel processing faster.

[0096] If it is determined in step 81 that increasing the current transfer length would not make the parallel processing faster, the communication data is divided in step 84 and communication processing is performed on the divided data. The unit of division may be determined in advance.

[0097] Next, steps 82 and 83 determine whether the order of "COM1" to "COM3" can be changed, that is, whether their positions in FIG. 6 can be moved, and, if so, the range over which they can be moved. According to the dependencies, "COM1" to "COM3" each need only be located after "CAL1" and before "CAL2", so each communication processing procedure can be moved within the range that satisfies its dependencies.
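The movable range checked in steps 82 and 83 can be sketched from the "pre-calculation" and "post-calculation" entries described for FIG. 6. The slot numbering (slot k means "immediately after the k-th CAL unit") and the function name are assumptions introduced for this illustration.

```python
CAL_ORDER = ["CAL1", "CAL2", "CAL3", "CAL4"]

# name -> ("pre-calculation", "post-calculation") as described for FIG. 6.
DEPS = {
    "COM1": ("CAL1", "CAL2"),
    "COM2": ("CAL1", "CAL3"),
    "COM3": ("CAL1", "CAL4"),
}

def movable_slots(com):
    """Slots k (right after the k-th CAL unit) in which `com` may be placed."""
    pre, post = DEPS[com]
    first = CAL_ORDER.index(pre) + 1   # must come after its pre-calculation
    last = CAL_ORDER.index(post)       # must come before its post-calculation
    return set(range(first, last + 1))

# Slots in which all three procedures can sit together.
common_slots = set.intersection(*(movable_slots(c) for c in DEPS))
```

The only slot shared by all three procedures is the one between CAL1 and CAL2, which agrees with the statement above.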

[0098] This processing is easy to understand by comparing FIG. 6 with FIG. 9.

[0099] Note, however, that FIGS. 9 and 10 show the scheduling-format data after optimization, not data partway through optimization.

[0100] That is, FIG. 9 shows an example of moving "COM2" and "COM3" of FIG. 6. Both "COM2" and "COM3" have been moved so as to satisfy their dependencies, and the moves have been made so that the "COM" entries are placed consecutively wherever possible. In FIG. 9, "COM1" and "COM3" are shown as communication processing unit 91, and "COM2" is shown as communication processing unit 92.

[0101] "COM1" and "COM3", which make up communication processing unit 91, both send to processor 1. "COM2", by contrast, receives from processor 1.

[0102] In FIG. 10, "COM1" and "COM3" are combined into communication processing unit 101, "TRA1", and "COM2" is shown as communication processing unit 102.

[0103] Next, among the communication processing units that have been moved, those that can communicate simultaneously are identified.
Determine what can be communicated at the same time.

[0104] This is repeated in step 85 of FIG. 8 until the target transfer length is reached. Here, "target transfer length" means continuing until the data transfer length is as long as possible. Communication processing unit 91 shown in FIG. 9 corresponds to the state in which the predetermined transfer length has been reached.

[0105] In communication processing unit 91, both procedures have the same destination, processor 1, so the two communication processing procedures "COM1" and "COM3" can be executed simultaneously.

[0106] However, communication processing unit 92, "COM2", in FIG. 9 differs from the other communication processing procedures in that it performs reception, so it cannot be executed simultaneously with the other communication processing procedures "COM1" and "COM3". The simultaneously executable communication processing procedure 91 is therefore handled as in communication processing procedure 101 shown in FIG. 10: the two communication processing procedures "COM1" and "COM3" are combined into a single communication processing procedure "TRA1", its transfer length is set to "16", the sum of the two, and the communication time of "TRA1" is obtained by referring to FIG. 2.
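The combination of "COM1" and "COM3" into "TRA1" can be sketched as follows. The dictionary layout and the rule in `can_merge` (same direction and same partner processor) are assumptions that generalize the single case described above; they are not stated as a general rule in the source.

```python
def can_merge(a, b):
    """Assumed rule: combinable when both send to, or both receive from, the same processor."""
    return a["dir"] == b["dir"] and a["peer"] == b["peer"]

def merge(name, a, b):
    """Combine two procedures into one whose transfer length is the sum of both."""
    return {"name": name, "dir": a["dir"], "peer": a["peer"],
            "length": a["length"] + b["length"]}

com1 = {"name": "COM1", "dir": "send", "peer": 1, "length": 8}
com2 = {"name": "COM2", "dir": "recv", "peer": 1, "length": 8}
com3 = {"name": "COM3", "dir": "send", "peer": 1, "length": 8}

tra1 = merge("TRA1", com1, com3) if can_merge(com1, com3) else None
```

The communication time of the combined procedure would then be looked up for the new transfer length rather than added from the two original times.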

[0107] In step 86, it is determined whether the above determination has been made for all of the programs for the respective processors. If it has, the first rearrangement ends. The processing time of each processor is then calculated and, for example, the maximum of these values is extracted and taken as the processing time Tn for that rearrangement, where n is a variable indicating the iteration (step 87).

[0108] For example, the maximum of the processing times of the processors is extracted, and the processing time after the first rearrangement is denoted T1.

[0109] Next, processing returns to the beginning of the program (step 88), and it is determined whether the position of a communication processing procedure can be changed to another position. If it can, a second rearrangement is performed (step 89), and the processing time after that rearrangement is likewise denoted T2.

[0110] As noted above, FIGS. 9 and 10 show the scheduling-format data after optimization, not data partway through optimization. In practice, therefore, in step 89 each "COM" in the program for each processor is moved (rearranged) so as to satisfy its dependencies, and the optimal communication pattern is sought.

[0111] In this way, as long as the position of a communication processing procedure can be changed, a further possible rearrangement is performed and its processing time is denoted Tn. Among the processing times T1 to Tn after rearrangement, the rearrangement with the minimum value is taken as the optimal communication pattern (step 810).

[0112] That is, in the program for each processor, all patterns in which the "COM" entries can be moved while satisfying the dependencies are examined, and the pattern with the smallest Tn among them is taken as the optimal communication pattern. The scheduling-format data shown in FIG. 6 contains only a few "COM" entries, but in practice there are considerably more, so all patterns in which the "COM" entries can be moved need to be examined.
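The selection in steps 87 to 810 can be sketched as "take the maximum over processors for each rearrangement, then the minimum over rearrangements". The function names and all processing times below are hypothetical; generating the candidate rearrangements themselves is outside this sketch.

```python
def pattern_time(per_processor_times):
    """T_n for one rearrangement: the largest per-processor processing time."""
    return max(per_processor_times)

def best_pattern(candidates):
    """Return (index, T_n) of the rearrangement with the smallest T_n."""
    times = [pattern_time(p) for p in candidates]
    n = min(range(len(times)), key=times.__getitem__)
    return n, times[n]

# Invented processing times (microseconds) of four processors for three rearrangements.
candidates = [
    [1900.0, 1850.0, 1875.0, 1800.0],   # gives T_1
    [1500.0, 1520.0, 1490.0, 1510.0],   # gives T_2
    [1600.0, 1450.0, 1700.0, 1480.0],   # gives T_3
]
best_index, best_time = best_pattern(candidates)
```

Using the maximum per rearrangement reflects that the parallel processing finishes only when the slowest processor finishes.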

[0113] The communication pattern in this case is represented by scheduling-format data such as that shown in FIG. 10.

[0114] Finally, the target program generation unit 17 converts the scheduling-format data created by the communication optimization unit 16 into the target program 11.

[0115] Here, the scheduling-format data before optimization is compared with the scheduling-format data after optimization, and the positions and contents of the communication processing procedures "COM" in the program before optimization are changed to create the target program, which is the parallel program after optimization.

[0116] This processing is described concretely with reference to FIGS. 5, 6, 10, and 12.

[0117] First, in FIG. 10, the communication processing procedures "COM3" and "COM2" have been moved relative to the scheduling-format data shown in FIG. 6, and "COM2" 102 is placed after the new communication processing procedure "TRA1" 101. That is, communication processing procedure 101 combines "COM1" and "COM3" into "TRA1".

[0118] Accordingly, communication processing procedures 57, 58, 54, and 55 in FIG. 5 are moved so that they are placed below communication processing procedure 52. Communication processing procedures 51, 52, 57, and 58 are then replaced with other communication procedures, and the result is communication processing procedures 120 and 121 shown in FIG. 12. As a comparison of FIG. 12 with FIG. 5 shows, the other portions are unchanged.

[0119] Here, a local array "BUF" is used to temporarily store the data to be sent or received in communication processing. Described concretely with reference to FIG. 12, communication processing procedure 120 stores the final value of array A and the final value of array B as the first and second elements of array BUF, respectively, and sends the values of array BUF in a single transfer to the processor whose number (MY+1) is the own processor number (MY) plus "1".

[0120] Communication processing procedure 121, conversely to communication processing procedure 120, receives the values of array BUF in a single transfer from the processor whose number (MY-1) is the own processor number (MY) minus "1", and assigns the first and second elements of array BUF to the "initial value - 1"-th element of array A and the "initial value - 1"-th element of array B, respectively.
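The packing and unpacking performed by procedures 120 and 121 can be simulated in a single process, as sketched below. The `mailbox` queues stand in for the actual inter-processor transfer, and `NUM_PE`, the boundary checks, and the index arguments `first` and `last` are assumptions; the source of FIG. 12 is not reproduced here.

```python
from collections import deque

NUM_PE = 4
mailbox = [deque() for _ in range(NUM_PE)]   # hypothetical per-processor receive queues

def send_boundaries(my, a, b, last):
    """Like procedure 120: pack the final values of A and B into BUF, send once to MY+1."""
    buf = [a[last], b[last]]
    if my + 1 < NUM_PE:                      # assumed: the last processor has no successor
        mailbox[my + 1].append(buf)

def recv_boundaries(my, a, b, first):
    """Like procedure 121: receive BUF once from MY-1, unpack into index first-1 of A and B."""
    if my - 1 >= 0:                          # assumed: processor 0 has no predecessor
        buf = mailbox[my].popleft()
        a[first - 1], b[first - 1] = buf

# Index 0 is the "initial value - 1" position; indices 1..3 hold each processor's own data.
a_pe0, b_pe0 = [0, 1, 2, 3], [0, 10, 20, 30]
a_pe1, b_pe1 = [0, 4, 5, 6], [0, 40, 50, 60]
send_boundaries(0, a_pe0, b_pe0, last=3)
recv_boundaries(1, a_pe1, b_pe1, first=1)
```

One transfer carries both boundary values, where the original program needed one transfer per array.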

[0121] With this program description, the communication processing performed at "51, 52, 57, 58" in FIG. 5 is performed at "120, 121" in FIG. 12.

[0122] FIG. 11 shows, for each processor, the scheduling pattern of "calculation time", "communication time", and "waiting time", based on the optimized scheduling-format data shown in FIG. 10.

[0123] As a comparison with FIG. 7 shows, the waiting time and communication time are reduced relative to the state before optimization. This is because combining communication processing procedures so that the communication data can be sent in a single transfer reduces the time required for communication processing.
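The reduction can be illustrated with a simple cost model. The model below is an assumption made only for illustration and is not stated in the source, which obtains communication times from the communication database: suppose that sending n bytes costs a fixed start-up time α plus a per-byte time β.

```latex
\begin{align*}
T(n) &= \alpha + \beta n \\
T_{\text{separate}} &= 2\,T(8) = 2\alpha + 16\beta \\
T_{\text{combined}} &= T(16) = \alpha + 16\beta \\
T_{\text{separate}} - T_{\text{combined}} &= \alpha
\end{align*}
```

Under that assumption, combining two 8-byte transfers into one 16-byte transfer saves one start-up time α, and the saving grows with the number of procedures combined.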

[0124] Conversely, when the relationship between the transfer length and the calculation amount is examined and the transfer length is so long that it is determined that increasing it would not make the parallel processing faster, the time required for communication processing is reduced by shortening the transfer length, that is, by dividing the communication data.

[0125] As described above, a program creation method is provided for a parallel program for performing parallel processing that describes a plurality of types of arithmetic procedures and a plurality of types of communication procedures corresponding to communication processing between processors. In this method, the communication amount for a described communication procedure is determined by referring to the calculation time for each arithmetic procedure and the communication amount for each communication procedure, and the description of that communication procedure is changed so that communication processing is performed with the determined communication amount.

[0126] If the system shown in FIG. 1 is provided in a parallel computer or the like, and each processor performs communication processing and arithmetic procedures by referring to the target program generated from a supplied parallel program, a parallel processing system that performs fast inter-processor communication can be realized.

【０１２７】[0127]

[Effects of the Invention] As described above, according to the present invention, the communication processing pattern is changed to an optimal one in consideration of the communication time required for inter-processor communication and the calculation time, and communication processing is performed accordingly. This makes it possible to create a parallel program that reduces communication waiting time, improves the transfer efficiency of communication data, and improves the efficiency of parallel processing.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of a communication optimization system.

FIG. 2 is an explanatory diagram of communication patterns.

FIG. 3 is an explanatory diagram showing the relationship between data transfer length and communication time for processor 0.

FIG. 4 is an explanatory diagram showing the processing performed by the syntax analysis unit.

FIG. 5 is an explanatory diagram of a parallel program described at the source level.

FIG. 6 is an explanatory diagram of the data handled by the scheduling unit.

FIG. 7 is an explanatory diagram of an example pattern of calculation time, communication time, and waiting time for each processor.

FIG. 8 is an explanatory diagram showing the processing performed by the communication optimization unit.

FIG. 9 is an explanatory diagram of data handled in the communication optimization processing.

FIG. 10 is an explanatory diagram of data handled in the communication optimization processing.

FIG. 11 is an explanatory diagram of an example pattern of calculation time, communication time, and waiting time for each processor after the communication optimization processing.

FIG. 12 is an explanatory diagram of the parallel program (target program) after the communication optimization processing, described at the source level.

[Explanation of symbols]

10 ... Parallel program, 11 ... Target program, 12 ... Communication optimization system, 13 ... Communication analysis unit, 14 ... Syntax analysis unit, 15 ... Scheduling unit, 16 ... Communication optimization unit, 17 ... Target program generation unit, 18 ... Communication database

Claims

[Claims]

1. A program creation method for modifying a parallel program for performing parallel processing, the parallel program describing a plurality of types of arithmetic procedures and a plurality of types of communication procedures corresponding to communication processing between processors, wherein the communication amount for a described communication procedure is determined by referring to the calculation time for each arithmetic procedure and the communication amount for each communication procedure, and the description of that communication procedure is changed so that communication processing is performed with the determined communication amount.

2. A program creation method for modifying a parallel program for performing parallel processing, the parallel program describing a plurality of types of arithmetic procedures and a plurality of types of communication procedures corresponding to communication processing between processors, wherein, if the time from the start to the end of the parallel processing would become shorter on the assumption that the communication amount of the communication processing performed according to the communication procedure currently in use is increased, the communication procedures in the parallel program are rearranged and the description is changed so that two or more communication procedures are combined.

3. The program creation method according to claim 2, wherein the communication procedures are rearranged with reference to a predetermined constraint condition, the predetermined constraint condition being that a communication procedure is placed before or after a specific arithmetic procedure.

4. A program creation method for modifying a parallel program for performing parallel processing, the parallel program describing a plurality of types of arithmetic procedures and a plurality of types of communication procedures corresponding to communication processing between processors, wherein, if the time from the start to the end of the parallel processing would become shorter on the assumption that the communication amount of the communication processing performed according to the communication procedure currently in use is reduced, the communication data to be communicated is divided and the description of the communication procedure is changed so that communication processing is performed for each piece of divided data.

5. A program creation system for modifying a parallel program for performing parallel processing, the parallel program describing a plurality of types of arithmetic procedures and a plurality of types of communication procedures corresponding to communication processing between processors, the system comprising: means for determining whether the time from the start to the end of the parallel processing would become shorter on the assumption that the communication amount of the communication processing performed according to the communication procedure currently in use is increased; and means for rearranging the communication procedures in the parallel program and changing the description so that two or more communication procedures are combined when the determining means determines that the time from the start to the end of the parallel processing would become shorter.