JP3239963B2

JP3239963B2 - Multi-processor computer system

Info

Publication number: JP3239963B2
Application number: JP28151792A
Authority: JP
Inventors: 修一中村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-10-20
Filing date: 1992-10-20
Publication date: 2001-12-17
Anticipated expiration: 2016-12-17
Also published as: JPH06131313A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

【産業上の利用分野】本発明は複数のプロセッサから構
成される計算機システムにおいて、データを分割して割
り当て、複数のプロセッサにより並列実行するためマル
チ・プロセッサ計算機システムに関し、特に、本発明は
各プロセッサがそれぞれ固有のメモリを持つ分散メモリ
方式のマルチ・プロセッサ計算機システムに関するもの
である。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a computer system comprising a plurality of processors for dividing and dividing data.
Against Ri, those relates Mar <br/> Chi processor computer system for parallel execution by a plurality of processors, in particular, the present invention is about the multi-processor computer system of distributed memory type such that the processor, each with its own memory It is.

【０００２】[0002]

【従来の技術】複数のプロセッサからなる計算機システ
ムとして、従来から図６（ａ），（ｂ）に示すシステム
が知られている。同図（ａ），（ｂ）において、１１
ａ，１１ｂ…はプロセッサ、１２，１２ａ，１２ｂ…は
メモリ、１３はネットワークを示す。同図（ａ）は、共
有メモリ方式のマルチ・プロセッサ・システムを示し、
各プロセッサ１１ａ，１１ｂはメモリ１２を共有してお
り、メモリ１２は各プロセッサ１１ａ，１１ｂの命令か
ら直接アクセス可能である。2. Description of the Related Art As a computer system including a plurality of processors, a system shown in FIGS. 6A and 6B is conventionally known. 11A and 11B, 11
.. represent processors, 12, 12a, 12b... memory, and 13 a network. FIG. 1A shows a multi-processor system of a shared memory system.
The processors 11a and 11b share the memory 12, and the memory 12 is directly accessible from the instructions of the processors 11a and 11b.

【０００３】同図（ｂ）は分散メモリ方式のマルチ・プ
ロセッサ・システムを示し、各プロセッサ１１ａ，１１
ｂ…が固有のメモリ１２ａ，１２ｂ…を有し、各プロセ
ッサ間の通信はネットワーク１３を介して行われる。と
ころで、上記したマルチ・プロセッサ・システムにおい
て、複数のプロセッサによりプログラムを並列実行する
場合、従来、次の２つの方法が用いられていた。並
列実行可能部分にその旨の指示を行った１つのソース・
プログラムを変換して並列実行する方法。FIG. 1B shows a multi-processor system of a distributed memory system, in which each processor 11a, 11
have unique memories 12a, 12b,..., and communication between the processors is performed via a network 13. By the way, in the above-mentioned multi-processor system, when a program is executed in parallel by a plurality of processors, the following two methods have conventionally been used. One source that instructs the parallel executable part to that effect
A method of converting programs and executing them in parallel.

【０００４】図７（ａ）は同図（ｃ）に示すソース・プ
ログラムＰ１をプロセッサ１１ａ，１１ｂ、メモリ１２
より構成されるメモリ共有方式のマルチ・プロセッサ・
システムにおいて実行する場合を示した図である。同図
におけるソース・プログラムＰ１は、ファイルより配列
Ａにデータを読込み２台のプロセッサによりＤＯループ
を並列実行する例を示したものであり、「READ A」によ
りファイルよりデータを配列Ａに読み込み、「PARALLEL
DO I=1,1000…」「=A(I) 」により並列実行を行う。FIG. 7A shows a case where a source program P1 shown in FIG.
Memory-sharing multi-processor
It is a figure showing the case where it is performed in a system. The source program P1 in the figure shows an example in which data is read from a file into an array A and a DO loop is executed in parallel by two processors. Data is read from a file into an array A by "READ A". "PARALLEL
DO I = 1,1000 ... "and" = A (I) "for parallel execution.

【０００５】上記ソース・プログラムＰ１は共有メモリ
方式のマルチ・プロセッサ・システムにおいて、同図
（ａ）に示すように実行される。プロセッサ１１ａによ
り、メモリ１２に配列の領域を確保したのち、「READ
A」を実行して、メモリ１２に確保された領域にファイ
ルよりデータを読み込む。ついで、プロセッサ１１ａ，
１１ｂにより並列実行が開始される。すなわち、ＤＯル
ープの１〜１０００のループを２つに分割し、プロセッ
サ１１ａにより１〜５００のＤＯループを、また、プロ
セッサ１１ｂにより５０１〜１０００のＤＯループを並
列に実行し、参照データは、それぞれのプロセッサ１１
ａ，１１ｂによりメモリ１２から読み込まれる。The source program P1 is executed in a shared memory multiprocessor system as shown in FIG. After securing an array area in the memory 12 by the processor 11a, "READ"
A "is executed to read data from the file into the area secured in the memory 12. Then, the processors 11a,
11b starts parallel execution. That is, the DO loop of 1 to 1000 is divided into two, and the processor 11a executes the DO loops of 1 to 500 in parallel, and the processor 11b executes the DO loops of 501 to 1000 in parallel. Processor 11
a and 11b are read from the memory 12.

【０００６】同図（ｂ）は上記ソース・プログラムＰ１
を、メモリ１２ａを持つプロセッサ１１ａ，メモリ１２
ｂを持つプロセッサ１１ｂより構成される分散メモリ方
式のマルチ・プロセッサ・システムにより実行する場合
を示す図である。同図（ｂ）において、プロセッサ１１
ａにより、メモリ１２ａに配列の領域を確保したのち、
「READ A」を実行して、メモリ１２ａに確保された領域
にファイルよりデータを読み込む。FIG. 1B shows the source program P1.
To the processor 11a having the memory 12a and the memory 12
9 is a diagram showing a case where the processing is executed by a distributed memory type multi-processor system composed of a processor 11b having the processor 11b. In FIG. 2B, the processor 11
a, after securing an array area in the memory 12a,
Execute "READ A" to read data from the file into the area secured in the memory 12a.

【０００７】ついで、ＤＯループの１〜１０００のルー
プを２つに分割し、同図（ａ）に示したように、プロセ
ッサ１１ａ，プロセッサ１１ｂにより１〜５００のＤＯ
ループ、５０１〜１０００のＤＯループを並列に実行す
る。また、この場合には、配列データがメモリ１２ａに
格納されているため、プロセッサ１１ｂがＤＯループの
命令を実行するごとに、データがプロセッサ１１ａ，１
１ｂ間で転送される。このため転送のオーバ・ヘッドが
生じて、実行時間が増大する。プロセッサごとにソ
ース・プログラムを記述し、それぞれを実行形式プログ
ラムに変換して並列実行する方法。Next, as shown in FIG. 1A, the DO loops 1 to 1000 of the DO loop are divided into two, and the DO loops 1 to 500 are provided by the processors 11a and 11b.
Loops: Execute 501 to 1000 DO loops in parallel. In this case, since the array data is stored in the memory 12a, every time the processor 11b executes the instruction of the DO loop, the data is stored in the processor 11a, 1a.
1b. This causes transfer overhead and increases execution time. A method of writing a source program for each processor, converting each to an executable program, and executing the program in parallel.

【０００８】図８（ａ）はメモリ１２ａを持つプロセッ
サ１１ａ、メモリ１２ｂを持つプロセッサ１１ｂから構
成される分散メモリ方式のマルチ・プロセッサ・システ
ムにより、それぞれのプロセッサ用に記述された図８
（ｂ）に示すソース・プログラムＰ２ａ、ソース・プロ
グラムＰ２ｂを実行する場合を示す図である。ソース・
プログラムＰ２ａ，ｐ２ｂは、ファイルよりそれぞれ配
列ＡおよびＢにデータを読込み、ソース・プログラムＰ
１と同様に、２台のプロセッサによりＤＯループを並列
実行する例を示したものである。FIG. 8A shows a multi-processor system of a distributed memory system composed of a processor 11a having a memory 12a and a processor 11b having a memory 12b.
It is a figure which shows the case where the source program P2a shown in (b) and the source program P2b are executed. Source·
The programs P2a and p2b read data from the file into the arrays A and B, respectively, and
As in the case of FIG. 1, an example in which a DO loop is executed in parallel by two processors is shown.

【０００９】ソース・プログラムＰ２ａはプロセッサ１
１ａ用に記述されたプログラムであり、ソース・プログ
ラムＰ２ｂはプロセッサ１１ｂ用に記述されたプログラ
ムである。上記ソース・プログラムＰ２ａ，Ｐ２ｂは分
散メモリ方式のマルチ・プロセッサ・システムにおい
て、同図（ａ）に示すように実行される。The source program P2a is a processor 1
The source program P2b is a program described for the processor 11b. The source programs P2a and P2b are executed in a distributed memory multiprocessor system as shown in FIG.

【００１０】プロセッサ１１ａ，１１ｂにより、それぞ
れのメモリ１２ａ，１２ｂに配列の領域を確保したの
ち、プロセッサ１１ａにおいては「READ A」を実行し
て、配列データを５００要素分を読み込んで配列Ａに格
納し、その後、”前半完了”という事象を通知する。プ
ロセッサ１１ｂにおいては、”前半完了”という事象が
起きることを待ち続け、それが満足したのちに「READ
B」を実行して配列データを５００要素分読み込み配列
Ｂへ格納する。After the processor 11a, 11b secures an array area in each of the memories 12a, 12b, the processor 11a executes "READ A" to read 500 pieces of array data and store it in the array A. Then, an event of “first half completed” is notified. The processor 11b keeps waiting for the event of "first half completed", and after it is satisfied, "READ"
B "to read 500 elements of array data and store it in array B.

【００１１】以上のようにして、メモリ１２ａに１〜５
００までのデータが格納されるとプロセッサ１１ａによ
り、また、メモリ１２ｂに５０１〜１０００のデータが
格納されるとプロセッサ１１ｂにより並列にＤＯループ
を実行し、参照データは、それぞれのプロセッサ１１
ａ，１１ｂによりメモリ１２ａ，１２ｂから読み込まれ
る。As described above, 1 to 5 are stored in the memory 12a.
When data of up to 00 is stored, the processor 11a executes a DO loop in parallel by storing the data of 501 to 1000 in the memory 12b, and the processor 11b executes a DO loop in parallel.
a and 11b are read from the memories 12a and 12b.

【００１２】[0012]

【発明が解決しようとする課題】ところで、上記した
の方法は、図６（ａ）に示したような、均一の共有メモ
リ方式（各プロセッサからのメモリに対するアクセス速
度にあまり差がない共有メモリ方式）のマルチ・プロセ
ッサ・システムに適用するには適しているが、図６
（ｂ）に示した分散メモリにおいては、データの参照時
に他のプロセッサとのデータ転送が生じるため、実行時
間が増大するという欠点があった。By the way, the above-mentioned method uses a uniform shared memory system as shown in FIG. 6A (a shared memory system in which there is not much difference in the access speed to the memory from each processor). 6) is suitable for application to a multi-processor system.
The distributed memory shown in (b) has a drawback that the execution time increases because data transfer with another processor occurs when data is referred to.

【００１３】また、上記したの方法は、各プロセッサ
ごとにソース・プログラムを記述しなければならないた
め、例えば、ＲＥＡＤ文のような並列実行できない部分
のために、ＰＯＳＴ，ＷＡＩＴ文による同期制御等の考
慮をプログラム中でするなど、並列実行できない部分の
ための考慮をプログラム中でしなければならず、プログ
ラムの記述がむずかしいという欠点があった。In the above-mentioned method, since a source program must be described for each processor, for example, for a part that cannot be executed in parallel such as a READ statement, synchronization control using a POST or WAIT statement or the like is required. For example, in a program, consideration must be given to a part that cannot be executed in parallel, and there is a disadvantage that the description of the program is difficult.

【００１４】本発明は上記した従来技術の欠点に鑑みな
されたものであって、複数のプロセッサからなる分散方
式のマルチ・プロセッサ・システムにおいて、プロセッ
サを複数台利用するための命令の分割と、データを各プ
ロセッサの固有のメモリに割り当てるためのデータ分割
を、１つの分割指示により共通に行えるようにすること
により、並列実行用プログラムを容易に記述できるよう
にするとともに、データ参照時及び格納時のデータ転送
時間を削減し、並列実行における実行時間を短縮するこ
とができるマルチ・プロセッサ計算機システムを提供す
ることを目的とする。SUMMARY OF THE INVENTION The present invention has been made in view of the above-mentioned drawbacks of the prior art. In a distributed multiprocessor system including a plurality of processors, the present invention divides an instruction for using a plurality of processors, and executes data division. Can be shared by a single division instruction so that a parallel execution program can be easily described, and data reference and storage at the time of data reference and storage can be performed. An object of the present invention is to provide a multi-processor computer system capable of reducing data transfer time and executing time in parallel execution.

【００１５】[0015]

【課題を解決するための手段】図１は本発明の原理図で
ある。上記課題を解決するため、本発明は、図１に示す
ように、分散メモリ方式、もしくは、プロセッサとメモ
リ間の距離によってプロセッサからメモリへのアクセス
速度に差がある共有メモリ方式のマルチ・プロセッサ計
算機システムにおいて、マルチ・プロセッサ計算機シス
テム２の複数のプロセッサ２ａ，２ｂ，２ｃにより一つ
のソース・プログラム１を分担して並列実行する際、ソ
ース・プログラム中に記述されたプロセッサ数と、命令
及びデータの分割基準となるインデックス区間と、イン
デックスの分割方法を示す情報とに基づき、ソース・プ
ログラム中の一つの一次元配列データおよび命令を、上
記ソース・プログラム中の命令部の各プロセッサへの分
割と、一次元配列データの各プロセッサのメモリへの分
割とが一致するように複数個に分割したオブジェクト・
プログラムを各プロセッサ毎に生成する。そして、マル
チ・プロセッサ計算機システム２の各プロセッサ２ａ，
２ｂ，２ｃに当該オブジェクト・プログラムを割り当
て、上記各プロセッサ２ａ，２ｂ，２ｃにより、上記分
割された上記割り当てられたオブジェクト・プログラム
を並列実行するようにしたものである。FIG. 1 is a diagram illustrating the principle of the present invention. To solve the above problems, the present invention provides a multi-processor computer of a distributed memory system or a shared memory system in which the access speed from a processor to a memory varies depending on the distance between the processor and the memory, as shown in FIG. in the system, a plurality of processors 2a of the multi-processor computer system 2, 2b, when the parallel execution by sharing one of the source program 1 by 2c, Seo
Number of processors and instructions written in the source program
And one-dimensional array data and an instruction in the source program based on an index section serving as a data division reference and information indicating an index division method. The object divided into a plurality of objects so that the division matches the division of the one-dimensional array data into the memory of each processor.
A program is generated for each processor. Then, each processor 2a of the multi-processor computer system 2
2b, 2c, the object program is assigned , and each of the processors 2a, 2b, 2c assigns the divided object program.
The is obtained so as to parallel execution.

【００１６】[0016]

【作用】本発明においては、マルチ・プロセッサ計算機
システム２の複数のプロセッサ２ａ，２ｂ，２ｃにより
一つのソース・プログラム１を分担して並列実行する
際、ソース・プログラム１中に記述されたプロセッサ数
と、命令及びデータの分割基準となるインデックス区間
と、インデックスの分割方法を示す情報とに基づき、ソ
ース・プログラム中の一つの一次元配列データおよび命
令を、上記ソース・プログラム中の命令部の各プロセッ
サへの分割と、一次元配列データの各プロセッサのメモ
リへの分割とが一致するように複数個に分割したオブジ
ェクト・プログラムを各プロセッサ２ａ，２ｂ，２ｃ毎
に生成し、マルチ・プロセッサ計算機システムの各プロ
セッサ２ａ，２ｂ，２ｃに当該オブジェクト・プログラ
ムを割り当て、上記割り当てられたオブジェクト・プロ
グラムを並列実行するようにしたので、並列実行用プロ
グラムを容易に記述することが可能となるとともに、デ
ータ参照時のデータ転送時間を削減し、並列実行におけ
る実行時間を短縮することが可能となる。また、プログ
ラム１中の命令部の各プロセッサ２ａ，２ｂ，２ｃへの
分割と配列データの各プロセッサ２ａ，２ｂ，２ｃのメ
モリへの分割とを一致させるようにしたので、配列デー
タの数と命令における繰り返し回数が異なっていても、
各プロセッサにおいて、問題なく並列実行を行うことが
でき、データ参照時のデータ転送時間を削減し、並列実
行における実行時間を短縮することが可能となる。According to the present invention, when a plurality of processors 2a, 2b, 2c of a multi-processor computer system 2 share and execute one source program 1 in parallel, the number of processors described in the source program 1
And an index section that serves as a reference for dividing instructions and data
And information indicating how to divide the index,
One-dimensional array data and
Instructions in each processor of the instruction section in the above source program.
And memos for each processor for one-dimensional array data
Objects divided so that the division into
Project program for each processor 2a, 2b, 2c
Generated by each processor of the multi-processor computer system.
The object program is stored in the processors 2a, 2b, and 2c.
Assign the object program
Since such executed in parallel grams, it becomes possible to easily describe the parallel execution program, to reduce the data transfer time for data reference, it is possible to shorten the execution time of parallel execution . In addition, the division of the instruction part in the program 1 into the processors 2a, 2b, and 2c and the division of the array data into the memories of the processors 2a, 2b, and 2c are matched with each other. Even if the number of repetitions in is different,
In each processor, parallel execution can be performed without any problem, and the data transfer time when referring to data can be reduced, and the execution time in parallel execution can be reduced.

【００１７】[0017]

【実施例】図２は本発明におけるソース・プログラムの
一例と、そのプログラムを図６（ｂ）に示した分散メモ
リ方式のマルチ・プロセッサ・システムにより実行する
場合の動作を示す図である。同図において、１１ａ，１
１ｂはプロセッサ、１２ａ，１２ｂは、それぞれプロセ
ッサ１１ａ，１１ｂに固有のメモリである。また、Ｐ３
は同図のシステムにより実行されるソース・プログラム
の一例であり、前記したソース・プログラムＰ１と同様
に、ファイルより配列Ａにデータを読込み２台のプロセ
ッサによりＤＯループを並列実行する例を示したもので
ある。なお、ソース・プログラムＰ３は後述するコンパ
イラによりコンパイルされプロセッサ１１ａ，１１ｂに
より実行される実行形式のプログラムに変換される。FIG. 2 is a diagram showing an example of a source program according to the present invention and the operation when the program is executed by a distributed memory type multiprocessor system shown in FIG. 6B. In the figure, 11a, 1
1b is a processor, and 12a and 12b are memories specific to the processors 11a and 11b, respectively. Also, P3
Is an example of a source program executed by the system shown in the figure, and shows an example in which data is read from a file into an array A and a DO loop is executed in parallel by two processors, similarly to the source program P1 described above. Things. The source program P3 is compiled by a compiler described later and converted into an executable program executed by the processors 11a and 11b.

【００１８】また、実行形式に変換されたプログラムの
プロセッサ１１ａ，１１ｂにおける実行は、前記した図
８の場合と同様であり、プロセッサ１１ａにおいて「RE
AD A」によりファイルより配列Ａにデータを読み込み、
プロセッサ１１ｂにデータを転送して、並列実行を行
う。ソース・プログラムＰ３において、「PARTITION P=
(2,1:1000,BAND) 」は、命令部とデータ部を分割し複数
のプロセッサに割り当てるための分割情報であり、上記
記述において、括弧内の「2 」は命令又はデータを何台
のプロセッサへ割り当てるかを示すプロセッサ数、「1:
1000」は命令又はデータの分割基準となるインデックス
区間、「BAND」はインデックスの分割方法を示す。The execution of the program converted into the execution format in the processors 11a and 11b is the same as in the case of FIG.
AD A ”reads data from the file into array A,
The data is transferred to the processor 11b and is executed in parallel. In the source program P3, "PARTITION P =
(2,1: 1000, BAND) "is division information for dividing the instruction part and the data part and assigning them to a plurality of processors.In the above description," 2 "in parentheses indicates how many instructions or data Number of processors indicating whether to assign to processors, "1:
“1000” indicates an index section serving as an instruction or data division reference, and “BAND” indicates an index division method.

【００１９】また、「PARALLEL DO …」は図７のソース
・プログラムＰ１で示したものと同様、並列実行開始を
記述したものである。ここで、上記インデックス区間、
インデックスの分割方法について、説明する。インデックス区間インデックス区間は上記したように命令又はデータの分
割基準となるものであり、図２に示したソース・プログ
ラムＰ３においては、配列宣言が「REAL A(1000)」、ま
た、ＤＯループにおける繰り返し回数が「I=1,1000」で
あり、これらはインデックス区間「1:1000」と一致して
いる。このため、配列宣言の上下限値、あるいは、ＤＯ
ループにおける繰り返し回数により命令、データを分割
しても、問題は生じない。"PARALLEL DO ..." describes the start of parallel execution in the same manner as the source program P1 shown in FIG. Here, the index section,
A method of dividing an index will be described. Index section The index section serves as a reference for dividing an instruction or data as described above. In the source program P3 shown in FIG. 2, the array declaration is "REAL A (1000)" and the repetition in the DO loop is performed. The number of times is “I = 1,1000”, which coincides with the index section “1: 1000”. For this reason, the upper and lower limits of the array declaration or DO
Even if instructions and data are divided according to the number of repetitions in the loop, no problem occurs.

【００２０】しかしながら、上記配列宣言やＤＯループ
における繰り返し回数が一致しない場合、配列宣言の上
下限値、あるいは、ＤＯループにおける繰り返し回数い
ずれか一方により命令、データを分割すると、他方によ
る分割値と一致しないという問題が生ずる。例えば、下
記の例において、配列宣言においては、Ａ（４００）であるが、ＤＯルー
プにおける繰り返し回数はＩ＝３，３９８であり、配列
宣言の両端が「２」ずつ計算対象外となっている。上記
例において、配列Ａを４分割すると、配列ＡはＡ（１：
１００）Ａ（１０１：２００）Ａ（２０１：３０
０）Ａ（３０１：４００）に分割される。また、ＤＯ
ループのＩ＝３，３９８により、繰り返し回数を４分割
すると、Ｉ＝３，１０１Ｉ＝１０２，２００Ｉ＝２
０１，２９９Ｉ＝３００，３９８に分割され、１０１
と３００の所で両者は一致しない。However, if the above-mentioned array declaration or the number of repetitions in the DO loop does not match, if the instruction or data is divided by either the upper or lower limit value of the array declaration or the number of repetitions in the DO loop, the divided value by the other will match the divided value by the other. The problem of not doing so arises. For example, in the following example: In the array declaration, it is A (400), but the number of repetitions in the DO loop is I = 3,398, and both ends of the array declaration are excluded from the calculation by "2". In the above example, when array A is divided into four, array A becomes A (1:
100) A (101: 200) A (201: 30)
0) It is divided into A (301: 400). DO
When the number of repetitions is divided into four by I = 3,398 of the loop, I = 3,101 I = 102,200 I = 2
01,299 I = 300,398, 101
And 300 do not agree.

【００２１】一方、分割情報のインデックス区間を基準
として分割すると、ＤＯループにおける繰り返し回数
は、Ｉ＝３，１００Ｉ＝１０１，２００Ｉ＝２０
１，３００Ｉ＝３０１，３９８に分割され、データの
分割と一致する。以上のように、命令や配列を、その繰
り返しの始値、終値あるいは配列宣言の添字の上下限値
ではなく、分割情報のインデックス区間を基準として分
割することにより、上記した例のように、境界値が計算
対象外であるとき等、縮小区間においても、命令と配列
の分割を一致させることができる。インデックスの
分割方法図３はインデックスの分割方法を示す図であ
り、同図において、縦軸はプロセッサ番号を示し、横軸
はインデックス値を示している。同図に示すように、イ
ンデックスは「均等分割」、「循環分割」、「三角分
割」等の種々の方法で分割することができる。On the other hand, when the division is performed based on the index section of the division information, the number of repetitions in the DO loop is I = 3,100 I = 101,200 I = 20
1,300 I = 301,398, which matches the data division. As described above, by dividing an instruction or an array based on the index section of the division information instead of the start value and the end value of the repetition or the upper and lower limits of the subscript of the array declaration, the boundary as shown in the above example is obtained. Even when the value is out of the calculation target, the division of the instruction and the array can be matched even in the reduced section. FIG. 3 is a diagram showing a method of dividing an index. In FIG. 3, the vertical axis indicates a processor number and the horizontal axis indicates an index value. As shown in the figure, the index can be divided by various methods such as “equal division”, “circular division”, “triangular division” and the like.

【００２２】上記分割方法において、例えば、三角分割
は、ＤＯループが下記のような２重ループになっている
場合など、後ろの方ほど処理量が多い場合に採用するこ
とにより、処理量を均一化することができる。図４は、並列実行を行うためのソース・プログラムをコ
ンパイルするためのコンパイラの構成の一例を示す図で
あり、同図において、２１は並列実行するためのソース
・プログラム、２２はソース・プログラムをコンパイル
するためのコンパイラ、２３はソース・プログラムを解
析するソース・プログラム解析部、２４はソース・プロ
グラムより自動的に並列化可能部分を見つけ出す最適化
部、２５は解析されたソース・プログラムより実行形式
の並列オブジェクト・プログラムを生成するコード生成
部、２６はソース・プログラム解析部において構文解析
を行うための構文、２７は作成された並列オブジェクト
・プログラムである。In the above dividing method, for example, the triangular division is employed when the processing amount is larger toward the rear, such as when the DO loop is a double loop as described below, so that the processing amount is uniform. Can be FIG. 4 is a diagram showing an example of the configuration of a compiler for compiling a source program for performing parallel execution. In FIG. 4, reference numeral 21 denotes a source program for parallel execution, and 22 denotes a source program. Compiler for compiling, 23 is a source program analyzing unit for analyzing a source program, 24 is an optimizing unit for automatically finding a parallelizable part from the source program, and 25 is an executable form from the analyzed source program , A code generator for generating a syntax analysis in the source program analysis unit; and 27, a created parallel object program.

【００２３】なお、図４における最適化部２４は必要に
応じて設けられるものであって、図２の例に示したよう
に、ソース・プログラム中に並列化部分が記述されてい
る場合には必要ない。同図において、ソース・プログラ
ム２１がコンパイラ２２に与えられると、ソース・プロ
グラム２１は、まず、ソース・プログラム解析部２３に
おいてソース・プログラムの解析が行われる。The optimizing unit 24 shown in FIG. 4 is provided as needed. As shown in the example of FIG. 2, when the parallelized part is described in the source program, unnecessary. In the figure, when a source program 21 is provided to a compiler 22, the source program 21 is first analyzed by a source program analysis unit 23.

【００２４】ソース・プログラム解析部２３において構
文解析を行う際には、通常のソース・プログラムを解析
する場合に用いられる構文２６の文法に加え、並列実行
用ソース・プログラムを解釈するための並列用構文２６
ａの文法を参照して、構文解析が行われる。図５はＦｏ
ｒｔｒａｎのソース・プログラムに並列実行用の分割記
述を採用した構文の一例を示す図であり、同図に示した
構文は、図２のソース・プログラムＰ３をＢＮＦ（Ｂａ
ｃｋｕｓ−ＮａｕｒＦｏｒｍ）により記述した一例を
示している（ＢＮＦについては、例えば、A.V.ホセ外２
名著「コンパイラＩ−原理・技法・ツール」株式会社サ
イエンス社1990/10/10初版発行などコンパイラ関係の文
献を参照されたい）。When the syntax analysis is performed by the source program analysis unit 23, in addition to the grammar of the syntax 26 used for analyzing a normal source program, a parallel program for interpreting a parallel execution source program is used. Syntax 26
Parsing is performed with reference to the grammar of a. FIG. 5 shows Fo
3 is a diagram illustrating an example of a syntax that employs a split description for parallel execution in a source program of rtran. The syntax illustrated in FIG.
ckus-Naur Form).
(See Compiler I-Principles, Techniques and Tools, Compiler-related literature such as the first edition of Science Co., Ltd. 1990/10/10).

【００２５】ソース・プログラム解析部２３の出力は、
必要に応じて最適化部２４に与えられる。なお、最適化
部２４は、前記したように、ソース・プログラム中に並
列化部分が記述されている場合には必要ない。最適化部
２４においては、与えられたソース・プログラム中で並
列化可能な部分を選択する。ソース・プログラム中で並
列化可能な部分としては、ＤＯループのように繰り返し
演算が行われる部分が選択されるが、その選択にあたっ
ては、例えば、多重ＤＯループの場合にはＤＯループの
交換を行い出来るだけ外側のループを分割することによ
り並列処理制御の回数を少なくしてオーバ・ヘッドを削
減するなど、その実行性能を考慮する。The output of the source program analysis unit 23 is
It is provided to the optimization unit 24 as needed. Note that, as described above, the optimization unit 24 is not necessary when the parallelized part is described in the source program. The optimizing unit 24 selects a parallelizable part in a given source program. As a part that can be parallelized in the source program, a part where repetitive operations are performed, such as a DO loop, is selected. For this selection, for example, in the case of a multiple DO loop, the DO loop is exchanged. The execution performance is taken into consideration, such as dividing the outer loop as much as possible to reduce the number of parallel processing controls and reduce overhead.

【００２６】以上のようにして、ソース・プログラム解
析部において、解析が行われたプログラム、あるいは、
最適化部２４において、並列化対象ループが選択された
プログラムはコード生成部２５に与えられる。コード生
成部２５においては、上記プログラムより最終的なオブ
ジェクト・プログラムが生成される。As described above, the program analyzed by the source program analysis unit, or
In the optimizing unit 24, the program in which the loop to be parallelized is selected is provided to the code generating unit 25. The code generator 25 generates a final object program from the above program.

【００２７】コード生成部２５においてオブジェクト・
プログラムを生成するに際して、命令については、繰り
返し演算の始値、終値と分割情報のインデックス値とが
比較され、実行するプロセッサが決定されるとともに、
データについては、配列宣言の添字の下限、上限と分割
情報のインデックス値とが比較され、データを割りつけ
るプロセッサが決定される。In the code generation unit 25, the object
When generating a program, for an instruction, a start value and an end value of a repetitive operation are compared with an index value of division information, and a processor to be executed is determined.
For the data, the lower limit and upper limit of the subscript in the array declaration are compared with the index value of the division information, and the processor to which the data is allocated is determined.

【００２８】図５に示した例の場合には、「locate-sta
tement」に対しては、「array-name」の宣言における上
下限値を、分割情報のインデックス範囲と照らし合わせ
て複数の配列に分割し、それぞれのプロセッサ用の実行
形式プログラムに組み込む。また、「parallel-do-stat
ement 」に対しては、「start-value 」および「end-va
lue 」を分割情報のインデックス範囲に照らし合わせ
て、それぞれのプロセッサごとの新しい「start-value
」、「end-value 」による繰り返しを実行形式のプロ
グラムに組み込む。In the example shown in FIG. 5, "locate-sta
For "tement", the upper and lower limit values in the declaration of "array-name" are divided into a plurality of arrays according to the index range of the division information, and incorporated into the executable program for each processor. Also, "parallel-do-stat
ement "for" start-value "and" end-va
lue "against the index range of the split information, and a new" start-value
, "End-value" repetition into the executable program.

【００２９】以上のような処理により、コンパイラ２２
は並列実行用の並列オブジェクト・プログラム２７を生
成して出力し、並列オブジェクト・プログラム２７ａな
いし２７ｃが各プロセッサにロードされ実行される。実
際には、各プロセッサ用の並列オブジェクト・プログラ
ム２７ａないし２７ｃは一連のプログラムとして出力さ
れて各プロセッサにロードされ、各プロセッサにおける
実行時、各プロセッサが自プロセッサに対応したプログ
ラム部分を選んで実行する。With the above processing, the compiler 22
Generates and outputs a parallel object program 27 for parallel execution, and the parallel object programs 27a to 27c are loaded into each processor and executed. Actually, the parallel object programs 27a to 27c for each processor are output as a series of programs and loaded into each processor. When each processor executes, each processor selects and executes a program portion corresponding to its own processor. .

【００３０】なお、以上の実施例においては、分散メモ
リ方式を持つマルチ・プロセッサ・システムに本発明を
適用した例を示したが、本発明は上記実施例に限定され
るものではなく、例えば、共有メモリ方式をもつマルチ
・プロセッサ・システムであっても、メモリと各プロセ
ッサとの間の距離によりメモリへのアクセス速度に差が
あり、各プロセッサの命令実行部が参照するデータを得
るためのデータ転送時間が演算時間より大きく、データ
転送時間の削減がプログラム全体の実行速度の向上に寄
与するシステムに適用することにより、分散メモリ方式
の場合と同様な効果を得ることが可能である。In the above embodiment, an example is shown in which the present invention is applied to a multiprocessor system having a distributed memory system. However, the present invention is not limited to the above embodiment. Even in a multi-processor system having a shared memory system, there is a difference in the access speed to the memory depending on the distance between the memory and each processor, and the data for obtaining the data referred to by the instruction execution unit of each processor. By applying the present invention to a system in which the transfer time is longer than the calculation time and the reduction of the data transfer time contributes to the improvement of the execution speed of the entire program, the same effect as in the case of the distributed memory system can be obtained.

【００３１】また、上記実施例においては、ソース・プ
ログラムをコンパイラによりコンパイルして実行する例
を示したが、本発明は上記実施例に限定されるものでは
なく、例えば、ソース・プログラムを計算機上でそのま
ま実行するインタプリータ方式、あるいは、一つのソー
ス・プログラムから複数個の新ソース・プログラムを生
成するプリプロセッサ方式にも適用することができる。Further, in the above embodiment, an example in which a source program is compiled by a compiler and executed is described. However, the present invention is not limited to the above embodiment. The present invention can also be applied to an interpreter system which executes the program as it is, or a preprocessor system which generates a plurality of new source programs from one source program.

【００３２】[0032]

【発明の効果】以上説明したことから明らかにように、
本発明においては、ソース・プログラム中の一つの配列
データを複数個の配列に分割し複数のプロセッサのメモ
リへ割り当て、上記プログラム中の命令部と配列データ
の分割・割り当てを一致させるようにしたので、並列実
行用プログラムを容易に記述することができるととも
に、データ参照時のデータ転送時間を削減し、並列実行
における実行時間を短縮することが可能となる。As apparent from the above description,
In the present invention, one array data in the source program is divided into a plurality of arrays and allocated to the memories of a plurality of processors, so that the instruction part in the program and the division / allocation of the array data coincide with each other. In addition, it is possible to easily describe a program for parallel execution, reduce the data transfer time when referring to data, and reduce the execution time in parallel execution.

[Brief description of the drawings]

【図１】本発明の原理図である。FIG. 1 is a principle diagram of the present invention.

【図２】本発明の実施例における並列実行処理を示す図
である。FIG. 2 is a diagram illustrating a parallel execution process according to the embodiment of the present invention.

【図３】インデックスの分割方法を示す図である。FIG. 3 is a diagram showing a method of dividing an index.

【図４】本発明の実施例におけるコンパイラの構成を示
す図である。FIG. 4 is a diagram showing a configuration of a compiler according to the embodiment of the present invention.

【図５】本発明の実施例における構文の一例を示す図で
ある。FIG. 5 is a diagram showing an example of a syntax in the embodiment of the present invention.

【図６】複数のプロセッサからなる計算機システムの構
成を示す図である。FIG. 6 is a diagram illustrating a configuration of a computer system including a plurality of processors.

【図７】従来例（その１）を示す図である。FIG. 7 is a diagram showing a conventional example (part 1).

【図８】従来例（その２）を示す図である。FIG. 8 is a diagram showing a conventional example (part 2).

[Explanation of symbols]

１，２１ソース・プログラム２マルチ・プロセッサ・システム２２コンパイラ２７オブジェクト・プログラム１１ａ，１１ｂプロセッサ１２ａ，１２ｂメモリ２３ソース・プログラム解析部２４最適化部２５コード生成部 1, 21 Source program 2 Multi-processor system 22 Compiler 27 Object program 11a, 11b Processor 12a, 12b Memory 23 Source program analysis unit 24 Optimization unit 25 Code generation unit

Claims

(57) [Claims]

1. A multi-processor computer system of a distributed memory system or a shared memory system having a difference in access speed from a processor to a memory depending on a distance between the processor and the memory, wherein the multiprocessor computer system is described in a source program. based on the number of processors, and the index section serving as a division criterion of instructions and data, and information indicating the method of division index, source
One one-dimensional array data and instructions in the program are
An object divided into a plurality of parts so that the division of the instruction part in the source program into each processor and the division of the one-dimensional array data into the memory of each processor match.
A program is generated for each processor, and the object is stored in each processor of the multi-processor computer system.
Assign the door program, each processor is assigned above object
.Multiple programs characterized by executing programs in parallel
Processor computing system.