JPH06131313A

JPH06131313A - System for dividing/allocating data

Info

Publication number: JPH06131313A
Application number: JP28151792A
Authority: JP
Inventors: Shuichi Nakamura; 修一中村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-10-20
Filing date: 1992-10-20
Publication date: 1994-05-13
Anticipated expiration: 2016-12-17
Also published as: JP3239963B2

Abstract

PURPOSE: To provide a method for dividing and allocating instructions and data that makes parallel programs easy to write, reduces data transfer time when data is referenced, and shortens execution time in parallel execution. CONSTITUTION: When one source program 1 is divided among and executed in parallel by multiple processors 2a, 2b, and 2c, one array in program 1 is divided and allocated to the memories of processors 2a, 2b, and 2c. Because the array data is allocated in this way, parallel programs are easy to write and the data transfer time when referencing data is reduced. Further, because the division of the instruction part among processors 2a, 2b, and 2c is made to coincide with the division of the array data, each processor can execute in parallel without problems even when the number of array elements differs from the number of loop iterations in the instructions.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a data division/allocation method for executing a program in parallel on multiple processors in a computer system composed of multiple processors. In particular, it relates to a data division/allocation method suited to distributed-memory multiprocessor computer systems in which each processor has its own memory.

[0002]

[Prior Art] The systems shown in FIGS. 6(a) and 6(b) are conventionally known as computer systems composed of multiple processors. In these figures, 11a, 11b, ... are processors; 12, 12a, 12b, ... are memories; and 13 is a network. FIG. 6(a) shows a shared-memory multiprocessor system: processors 11a and 11b share memory 12, which can be accessed directly by the instructions of each processor.

[0003] FIG. 6(b) shows a distributed-memory multiprocessor system: each processor 11a, 11b, ... has its own memory 12a, 12b, ..., and communication between processors takes place over network 13. When a program is executed in parallel by multiple processors on such multiprocessor systems, the following two methods have conventionally been used.

(1) A method in which a single source program, with its parallel-executable portions marked as such, is converted and executed in parallel.

[0004] FIG. 7(a) shows the case where source program P1, shown in FIG. 7(c), is executed on a shared-memory multiprocessor system consisting of processors 11a and 11b and memory 12. Source program P1 is an example that reads data from a file into array A and executes a DO loop in parallel on two processors: "READ A" reads the data from the file into array A, and "PARALLEL DO I=1,1000 ..." with "=A(I)" performs the parallel execution.

[0005] On a shared-memory multiprocessor system, source program P1 is executed as shown in FIG. 7(a). Processor 11a allocates an area for the array in memory 12 and then executes "READ A" to read data from the file into that area. Processors 11a and 11b then begin parallel execution. That is, the DO loop over 1 to 1000 is divided in two: processor 11a executes iterations 1 to 500 and processor 11b executes iterations 501 to 1000 in parallel, and each processor reads the data it references from memory 12.

[0006] FIG. 7(b) shows the case where source program P1 is executed on a distributed-memory multiprocessor system consisting of processor 11a with memory 12a and processor 11b with memory 12b. In FIG. 7(b), processor 11a allocates an area for the array in memory 12a and then executes "READ A" to read data from the file into that area.

[0007] Next, the DO loop over 1 to 1000 is divided in two and, as in FIG. 7(a), processors 11a and 11b execute iterations 1 to 500 and 501 to 1000 in parallel. In this case, however, the array data is stored in memory 12a, so data is transferred between processors 11a and 11b every time processor 11b executes an instruction of the DO loop. This transfer overhead increases the execution time.

(2) A method in which a source program is written for each processor, each is converted into an executable program, and the programs are executed in parallel.

[0008] FIG. 8(a) shows the case where source programs P2a and P2b, shown in FIG. 8(b) and written for the respective processors, are executed on a distributed-memory multiprocessor system consisting of processor 11a with memory 12a and processor 11b with memory 12b. Source programs P2a and P2b read data from a file into arrays A and B, respectively, and, like source program P1, are an example in which two processors execute a DO loop in parallel.

[0009] Source program P2a is written for processor 11a, and source program P2b is written for processor 11b. On a distributed-memory multiprocessor system, source programs P2a and P2b are executed as shown in FIG. 8(a).

[0010] After processors 11a and 11b allocate areas for the arrays in their respective memories 12a and 12b, processor 11a executes "READ A" to read 500 elements of array data into array A and then posts the event "first half complete". Processor 11b waits for the "first half complete" event to occur; once it has, processor 11b executes "READ B" to read 500 elements of array data into array B.

[0011] In this way, once data 1 to 500 has been stored in memory 12a and data 501 to 1000 in memory 12b, processors 11a and 11b execute their DO loops in parallel, each reading the data it references from its own memory 12a or 12b.

[0012]

[Problems to Be Solved by the Invention] Method (1) above is well suited to a uniform shared-memory multiprocessor system such as that of FIG. 6(a) (a shared-memory system in which memory access speed differs little from processor to processor). On the distributed memory shown in FIG. 6(b), however, referencing data causes data transfers with other processors, which increases execution time.
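The transfer cost of running the single-source program on distributed memory can be made concrete with a small counting model. The sketch below is illustrative only: the helper names `block_owner` and `count_remote_reads` are hypothetical, processors are numbered from 0, and "remote read" simply means that the iteration runs on a processor other than the one holding the referenced element.

```python
def block_owner(i, lo, hi, nprocs):
    """Processor (0-based) that owns index i when lo..hi is split into equal contiguous blocks."""
    size = -(-(hi - lo + 1) // nprocs)  # ceiling division
    return (i - lo) // size


def count_remote_reads(data_owner, lo=1, hi=1000, nprocs=2):
    """Count loop iterations whose A(I) lives in another processor's memory."""
    remote = 0
    for i in range(lo, hi + 1):
        executor = block_owner(i, lo, hi, nprocs)  # processor that runs iteration i
        if data_owner(i) != executor:
            remote += 1
    return remote


# Layout of FIG. 7(b): the whole array sits in the memory of the first processor.
all_on_first = count_remote_reads(lambda i: 0)

# Layout in which the array is divided the same way as the loop.
co_located = count_remote_reads(lambda i: block_owner(i, 1, 1000, 2))

print(all_on_first, co_located)
```

Under these assumptions, every iteration assigned to the second processor needs a transfer in the first layout, and none does once the data is divided along the same boundaries as the loop.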

[0013] Method (2) above requires a source program to be written for each processor. Parts that cannot be executed in parallel, such as a READ statement, must therefore be handled explicitly in the program, for example through synchronization control with POST and WAIT statements, which makes the program difficult to write.
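The kind of hand-written synchronization that the per-processor-program approach requires can be sketched with two threads standing in for processors 11a and 11b. This is an analogy, not the patent's POST/WAIT implementation: `threading.Event` plays the role of the "first half complete" event, and the in-memory list `source` is a hypothetical stand-in for the input file.

```python
import threading

first_half_done = threading.Event()  # stands in for the "first half complete" event
source = list(range(1, 1001))         # stands in for the 1000-element input file
cursor = 0                            # shared sequential read position in the "file"
A, B = [], []


def read_next(n):
    """Read the next n elements from the shared sequential 'file'."""
    global cursor
    chunk = source[cursor:cursor + n]
    cursor += n
    return chunk


def processor_11a():
    A.extend(read_next(500))  # READ A
    first_half_done.set()     # POST: announce "first half complete"


def processor_11b():
    first_half_done.wait()    # WAIT: block until the first half has been read
    B.extend(read_next(500))  # READ B


# Start 11b first to show that it really waits for 11a.
threads = [threading.Thread(target=processor_11b),
           threading.Thread(target=processor_11a)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(A[0], A[-1], B[0], B[-1])
```

Even in this tiny case the programmer has to decide who posts, who waits, and in what order the sequential read happens, which is the burden the paragraph above describes.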

[0014] The present invention was made in view of these drawbacks of the prior art. Its object is to provide a data division/allocation method for a distributed multiprocessor system composed of multiple processors in which a single division directive governs both the division of instructions for using multiple processors and the division of data for allocating it to each processor's own memory, so that parallel programs can be written easily, the data transfer time when referencing and storing data is reduced, and the execution time in parallel execution is shortened.

[0015]

[Means for Solving the Problems] FIG. 1 shows the principle of the present invention. To solve the above problems, the invention of claim 1 applies, as shown in FIG. 1, to a multiprocessor computer system that uses either distributed memory or shared memory in which the access speed from a processor to memory differs depending on the distance between the processor and the memory. When one source program 1 is divided among and executed in parallel by processors 2a, 2b, and 2c of multiprocessor computer system 2, one array in source program 1 is divided into multiple arrays, which are allocated to the memories of processors 2a, 2b, and 2c.

[0016] The invention of claim 2 applies to a multiprocessor computer system that uses either distributed memory or shared memory in which the access speed from a processor to memory differs depending on the distance between the processor and the memory. When one source program 1 is divided among and executed in parallel by processors 2a, 2b, and 2c of multiprocessor computer system 2, the division of the instruction part of source program 1 among processors 2a, 2b, and 2c is made to match the division of the array data among the memories of those processors.

[0017]

[Operation] In the invention of claim 1, when one source program 1 is divided among and executed in parallel by processors 2a, 2b, and 2c of multiprocessor computer system 2, one array in source program 1 is divided into multiple arrays, which are allocated to the memories of processors 2a, 2b, and 2c. This makes parallel programs easy to write, reduces the data transfer time when referencing data, and shortens the execution time in parallel execution.

[0018] In the invention of claim 2, when one source program 1 is divided among and executed in parallel by processors 2a, 2b, and 2c of multiprocessor computer system 2, the division of the instruction part of program 1 among processors 2a, 2b, and 2c is made to match the division of the array data among their memories. As a result, each processor can execute in parallel without problems even when the number of array elements differs from the number of loop iterations in the instructions. As with the invention of claim 1, parallel programs are easy to write, the data transfer time when referencing data is reduced, and the execution time in parallel execution is shortened.

[0019]

[Embodiment] FIG. 2 shows an example of a source program according to the present invention and the operation when that program is executed on the distributed-memory multiprocessor system shown in FIG. 6(b). In the figure, 11a and 11b are processors, and 12a and 12b are the memories belonging to processors 11a and 11b, respectively. P3 is an example of a source program executed on this system; like source program P1 above, it reads data from a file into array A and executes a DO loop in parallel on two processors. Source program P3 is compiled by the compiler described later and converted into an executable program that is run by processors 11a and 11b.

[0020] The executable program runs on processors 11a and 11b in the same way as in FIG. 8 above: processor 11a reads data from the file into array A with "READ A", transfers data to processor 11b, and parallel execution then proceeds. In source program P3, "PARTITION P=(2,1:1000,BAND)" is the division information used to divide the instruction part and the data part and assign them to multiple processors. Within the parentheses, "2" is the number of processors to which instructions or data are assigned, "1:1000" is the index range that serves as the basis for dividing instructions or data, and "BAND" is the index division method.
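To make the three fields of the division information concrete, the sketch below parses a directive of the form quoted above into a small record. It is an illustration under assumptions: the patent's actual grammar is given in FIG. 5 and is not reproduced here, so the regular expression, the `Partition` class, and `parse_partition` are hypothetical and cover only the single form "PARTITION name=(count,lo:hi,method)".

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str      # name of the division information, e.g. "P"
    nprocs: int    # number of processors to divide among
    lo: int        # lower end of the index range
    hi: int        # upper end of the index range
    method: str    # index division method, e.g. "BAND"


# Hypothetical pattern for the one directive form shown in the text.
_PATTERN = re.compile(
    r"PARTITION\s+(\w+)\s*=\s*\(\s*(\d+)\s*,\s*(-?\d+)\s*:\s*(-?\d+)\s*,\s*(\w+)\s*\)"
)


def parse_partition(line):
    """Parse a PARTITION directive into a Partition record."""
    m = _PATTERN.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a PARTITION directive: {line!r}")
    name, nprocs, lo, hi, method = m.groups()
    return Partition(name, int(nprocs), int(lo), int(hi), method)


p = parse_partition("PARTITION P=(2,1:1000,BAND)")
print(p)
```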

[0021] "PARALLEL DO ..." marks the start of parallel execution, as in source program P1 of FIG. 7. The index range and the index division methods are explained below.

Index range: As stated above, the index range is the basis for dividing instructions or data. In source program P3 of FIG. 2, the array declaration is "REAL A(1000)" and the DO loop iterates over "I=1,1000", both of which coincide with the index range "1:1000". No problem therefore arises even if instructions and data are divided according to the upper and lower bounds of the array declaration or the iteration range of the DO loop.

[0022] However, when the array declaration and the DO loop iteration range do not coincide, dividing instructions and data according to only one of them (the bounds of the array declaration or the iteration range of the DO loop) yields boundaries that do not match those obtained from the other. Consider the following example:

    REAL A(400)
    ...
    DO I=3,398
    ...=A(I)
    END DO

The array is declared as A(400), but the DO loop iterates over I=3,398, so two elements at each end of the declared array are excluded from the computation. If array A is divided into four, it becomes (1) A(1:100), (2) A(101:200), (3) A(201:300), (4) A(301:400). If instead the iteration range I=3,398 is divided into four, it becomes (1) I=3,101, (2) I=102,200, (3) I=201,299, (4) I=300,398, and the two divisions disagree at 101 and 300.
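The boundary mismatch in this example can be reproduced directly. The sketch below divides the declared range 1..400 and the loop range 3..398 independently into four contiguous pieces and lists the indices for which the processor executing the iteration differs from the processor holding the element. The helper names `split_evenly` and `owner` are hypothetical.

```python
def split_evenly(lo, hi, parts):
    """Split the inclusive range lo..hi into `parts` contiguous, near-equal pieces."""
    n = hi - lo + 1
    base, extra = divmod(n, parts)
    pieces, start = [], lo
    for k in range(parts):
        length = base + (1 if k < extra else 0)
        pieces.append((start, start + length - 1))
        start += length
    return pieces


def owner(blocks, i):
    """Index of the block that contains i."""
    return next(p for p, (a, b) in enumerate(blocks) if a <= i <= b)


data_blocks = split_evenly(1, 400, 4)  # derived from REAL A(400)
loop_blocks = split_evenly(3, 398, 4)  # derived from DO I=3,398

# Iterations that run on one processor while A(I) is stored on another.
mismatched = [i for i in range(3, 399)
              if owner(data_blocks, i) != owner(loop_blocks, i)]

print(data_blocks)
print(loop_blocks)
print(mismatched)
```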

[0023] If, on the other hand, the division is based on the index range of the division information, the DO loop iterations are divided into (1) I=3,100, (2) I=101,200, (3) I=201,300, (4) I=301,398, which matches the division of the data. Thus, by dividing instructions and arrays on the basis of the index range in the division information, rather than the start and end values of the loop or the upper and lower bounds of the subscripts in the array declaration, the divisions of instructions and arrays can be made to match even for reduced ranges, such as when the boundary values are excluded from the computation as in the example above.

Index division methods: FIG. 3 shows index division methods; the vertical axis is the processor number and the horizontal axis is the index value. As the figure shows, the index can be divided in various ways, such as "equal division", "cyclic division", and "triangular division".
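Division based on the index range can be sketched as clipping each processor's index block against the array bounds and against the loop bounds. The sketch assumes an index range of 1:400 for the A(400) example (the text does not state it explicitly), equal contiguous blocks, and hypothetical helper names `index_blocks` and `clip`.

```python
def index_blocks(lo, hi, nprocs):
    """Equal contiguous blocks of the index range lo..hi (assumed evenly divisible)."""
    size = (hi - lo + 1) // nprocs
    return [(lo + p * size, lo + (p + 1) * size - 1) for p in range(nprocs)]


def clip(block, lo, hi):
    """Intersect a block with lo..hi; None if the intersection is empty."""
    a, b = max(block[0], lo), min(block[1], hi)
    return (a, b) if a <= b else None


blocks = index_blocks(1, 400, 4)                 # assumed index range 1:400
array_parts = [clip(b, 1, 400) for b in blocks]  # REAL A(400)
loop_parts = [clip(b, 3, 398) for b in blocks]   # DO I=3,398

print(array_parts)
print(loop_parts)
```

Because both the array and the loop are cut at the same block boundaries, each processor's loop range falls inside the part of the array it holds.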

[0024] Among these methods, triangular division, for example, can even out the workload when the amount of processing grows toward later indices, as when the DO loop is a doubly nested loop like the following:

    DO I=1,1000
    DO J=1,I
    ...

FIG. 4 shows an example of the configuration of a compiler that compiles a source program for parallel execution. In the figure, 21 is the source program to be executed in parallel, 22 is the compiler that compiles the source program, 23 is the source program analysis unit that analyzes the source program, 24 is the optimization unit that automatically finds parallelizable portions of the source program, 25 is the code generation unit that generates an executable parallel object program from the analyzed source program, 26 is the syntax used for parsing in the source program analysis unit, and 27 is the generated parallel object program.

[0025] The optimization unit 24 in FIG. 4 is provided only as needed; it is unnecessary when the parallel portions are written explicitly in the source program, as in the example of FIG. 2. When source program 21 is given to compiler 22, it is first analyzed by the source program analysis unit 23.

[0026] When the source program analysis unit 23 parses the program, it refers to the grammar of syntax 26, which is used for analyzing ordinary source programs, and additionally to the grammar of parallel syntax 26a, which is used for interpreting source programs for parallel execution. FIG. 5 shows an example of a syntax in which division directives for parallel execution are added to Fortran source programs; it describes source program P3 of FIG. 2 in BNF (Backus-Naur Form). For BNF, see compiler literature such as A. V. Aho and two co-authors, "Compilers I: Principles, Techniques, and Tools", Saiensu-sha, first edition published 1990/10/10.

[0027] The output of the source program analysis unit 23 is passed to the optimization unit 24 as needed. As noted above, the optimization unit 24 is unnecessary when the parallel portions are written in the source program. The optimization unit 24 selects the parallelizable portions of the given source program. Portions that perform repeated computation, such as DO loops, are selected, and the selection takes execution performance into account: for a nested DO loop, for example, the loops are interchanged so that the outermost possible loop is divided, which reduces the number of parallel-processing control operations and cuts overhead.

[0028] The program analyzed by the source program analysis unit, or the program in which the optimization unit 24 has selected the loops to parallelize, is then passed to the code generation unit 25, which generates the final object program from it.

[0029] When the code generation unit 25 generates the object program, for instructions it compares the start and end values of the repeated computation with the index values of the division information to determine which processor executes them; for data it compares the lower and upper bounds of the subscripts in the array declaration with the index values of the division information to determine which processor the data is allocated to.

[0030] In the example of FIG. 5, for a "locate-statement", the upper and lower bounds in the declaration of "array-name" are checked against the index range of the division information, the array is divided into multiple arrays, and these are embedded in the executable program for each processor. For a "parallel-do-statement", "start-value" and "end-value" are checked against the index range of the division information, and a loop with new per-processor "start-value" and "end-value" is embedded in the executable program.
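The code-generation step described above can be sketched as a function that, for each processor, emits a local array declaration and a local loop header. This is a simplified illustration: `generate_sections` is a hypothetical name, the emitted strings are Fortran-like text for readability rather than the object code the patent's compiler produces, and the sketch assumes equal contiguous blocks that each overlap both the array and the loop.

```python
def generate_sections(array_name, decl, loop, index_range, nprocs):
    """Return, per processor, a local array declaration and a local loop header."""
    lo, hi = index_range
    size = (hi - lo + 1) // nprocs  # assumes an evenly divisible index range
    sections = []
    for p in range(nprocs):
        b_lo, b_hi = lo + p * size, lo + (p + 1) * size - 1
        # Array bounds checked against this processor's index block.
        a_lo, a_hi = max(b_lo, decl[0]), min(b_hi, decl[1])
        # Loop start/end values checked against the same block.
        l_lo, l_hi = max(b_lo, loop[0]), min(b_hi, loop[1])
        sections.append({
            "processor": p,
            "declare": f"REAL {array_name}({a_lo}:{a_hi})",
            "loop": f"DO I={l_lo},{l_hi}",
        })
    return sections


sections = generate_sections("A", decl=(1, 1000), loop=(1, 1000),
                             index_range=(1, 1000), nprocs=2)
for s in sections:
    print(s)
```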

[0031] Through the above processing, compiler 22 generates and outputs parallel object program 27 for parallel execution, and parallel object programs 27a to 27c are loaded onto the processors and executed. In practice, the parallel object programs 27a to 27c for the individual processors are output as a single program that is loaded onto every processor, and at run time each processor selects and executes the portion of the program that corresponds to itself.
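The arrangement in which every processor loads the same program and picks its own portion at run time can be sketched as follows. Threads stand in for processors and a dictionary of lists stands in for the per-processor memories; `SECTIONS`, `memories`, and `program` are hypothetical names, and the loop body (a sum) is a placeholder for the real computation.

```python
from concurrent.futures import ThreadPoolExecutor

# Per-processor loop bounds, as a compiler might have embedded them.
SECTIONS = {0: (1, 500), 1: (501, 1000)}

# Each processor's own memory holds only its part of array A.
full = list(range(1, 1001))
memories = {0: full[0:500], 1: full[500:1000]}


def program(rank):
    """The single program loaded on every processor; rank selects the portion to run."""
    lo, hi = SECTIONS[rank]
    local = memories[rank]
    total = 0
    for i in range(lo, hi + 1):
        total += local[i - lo]  # A(I) is found in local memory, no transfer needed
    return total


with ThreadPoolExecutor(max_workers=2) as pool:
    partial_sums = list(pool.map(program, [0, 1]))

print(partial_sums, sum(partial_sums))
```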

[0032] The above embodiment applies the invention to a distributed-memory multiprocessor system, but the invention is not limited to that embodiment. The same effect as with distributed memory can be obtained, for example, on a shared-memory multiprocessor system in which memory access speed differs with the distance between the memory and each processor, the data transfer time needed to obtain the data referenced by each processor's instruction execution unit exceeds the computation time, and reducing data transfer time helps speed up the program as a whole.

[0033] The above embodiment compiles the source program with a compiler and then executes it, but the invention is not limited to that embodiment. It can also be applied, for example, to an interpreter scheme that executes the source program directly on the computer, or to a preprocessor scheme that generates multiple new source programs from one source program.

[0034]

[Effects of the Invention] As is clear from the above description, in the present invention one array in the source program is divided into multiple arrays and allocated to the memories of multiple processors, and the division and allocation of the program's instruction part and of the array data are made to match. Parallel programs can therefore be written easily, the data transfer time when referencing data is reduced, and the execution time in parallel execution is shortened.

[Brief description of drawings]

FIG. 1 is a principle diagram of the present invention.

FIG. 2 is a diagram showing parallel execution processing in the embodiment of the present invention.

FIG. 3 is a diagram showing index division methods.

FIG. 4 is a diagram showing the configuration of a compiler in the embodiment of the present invention.

FIG. 5 is a diagram showing an example of a syntax in the embodiment of the present invention.

FIG. 6 is a diagram showing the configuration of computer systems composed of multiple processors.

FIG. 7 is a diagram showing a conventional example (No. 1).

FIG. 8 is a diagram showing a conventional example (No. 2).

[Explanation of symbols]

1, 21: Source program
2: Multi-processor system
22: Compiler
27: Object program
11a, 11b: Processors
12a, 12b: Memories
23: Source program analysis unit
24: Optimization unit
25: Code generation unit

Claims

[Claims]

1. A data division/allocation method for a multiprocessor computer system that uses either distributed memory or shared memory in which the access speed from a processor to memory differs depending on the distance between the processor and the memory, characterized in that, when one source program (1) is divided among and executed in parallel by the processors (2a, 2b, 2c) of the multiprocessor computer system (2), one array in the source program (1) is divided into multiple arrays, which are allocated to the memories of the processors (2a, 2b, 2c).

2. A data division/allocation method for a multiprocessor computer system that uses either distributed memory or shared memory in which the access speed from a processor to memory differs depending on the distance between the processor and the memory, characterized in that, when one source program (1) is divided among and executed in parallel by the processors (2a, 2b, 2c) of the multiprocessor computer system (2), the division of the instruction part of the source program (1) among the processors (2a, 2b, 2c) is made to match the division of the array data among the memories of the processors (2a, 2b, 2c).