JPH0573607A

JPH0573607A - Vector instruction generation processing method

Info

Publication number: JPH0573607A
Application number: JP23067191A
Authority: JP
Inventors: Eiji Yamanaka; 栄次山中; Koichiro Hotta; 耕一郎堀田; Hiroshi Nagakura; 浩士長倉; Hideki Nozaki; 英樹野崎
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-09-11
Filing date: 1991-09-11
Publication date: 1993-03-26

Abstract

PURPOSE: To efficiently generate a vectorized parallel processing program in compiler processing on a computer. CONSTITUTION: In computer processing that generates, from a given source program, an object program having parallelized vector instruction sequences, a first processing stage 1 generates a vector instruction sequence, as a single process, for the required part of the source program. A second processing stage 2 copies the vector instruction sequence to obtain the required number of parallel vector instruction sequences, divides the data to be processed, and distributes it among those sequences. The operands of each vector instruction sequence are then changed to operands corresponding to the distributed data, yielding a parallel processing program.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a vector instruction generation processing method for generating, in the program generation processing of a computer, a vectorized parallel processing program that is executed in parallel by a plurality of vector processing devices.

[0002]

[Prior Art and Problems to Be Solved by the Invention] An automatic parallelizing compiler, which generates a parallelized object program from a source program, uses the so-called loop slicing method: a program portion that forms a loop is divided into a plurality of loops, and a plurality of processing devices execute those loops in parallel.

[0003] That is, in the loop slicing method, as shown in FIG. 5, when the source program contains a loop that repeats from 1 to N on, for example, a control variable I, the loop is divided into, for example, two loops: one from 1 to N/2 and one from N/2+1 to N.
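The division of an index range described above can be sketched as follows. This is a minimal illustration, not the compiler's actual procedure: the function name `slice_range` and the rule for distributing a remainder when N is not evenly divisible are assumptions introduced here.

```python
def slice_range(first, last, parts):
    """Split the inclusive range first..last into `parts` contiguous sub-ranges.

    Hypothetical helper: when the range does not divide evenly, the leading
    slices each receive one extra iteration (an assumption, not from the source).
    """
    total = last - first + 1
    base, extra = divmod(total, parts)
    slices = []
    start = first
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        slices.append((start, start + size - 1))
        start += size
    return slices


# A loop over I = 1..10 sliced in two gives the ranges 1..N/2 and N/2+1..N.
print(slice_range(1, 10, 2))
```

Each resulting pair is the (lower, upper) bound of one loop that a separate processing device would execute.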

[0004] In this case, parallelization changes the processing order compared with single sequential processing. Prior to parallelization, therefore, the known so-called data dependency analysis is performed so that the order can be checked and any necessary measures taken.

[0005] Data dependency analysis checks whether the change in processing order would contradict the original program logic (for example, the order changes so that a value is referenced before it is updated, although the updated value should have been referenced). The slicing described above is performed only if no contradiction arises, or after measures have been taken to eliminate the contradiction.
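As a rough illustration of the kind of check involved, the sketch below models a loop body of the form A(i + write_offset) = f(A(i + read_offset)) and reports which slices write an element that another slice reads. The function name and the single-array, constant-offset model are assumptions made here for illustration; real dependency analysis is considerably more general, and this sketch ignores write-write conflicts.

```python
def cross_slice_conflicts(slices, write_offset, read_offset):
    """Return (writer, reader) slice-number pairs that touch a common element.

    `slices` is a list of inclusive (lower, upper) index ranges. The loop body
    is assumed to be A(i + write_offset) = f(A(i + read_offset)).
    """
    conflicts = []
    for a, (a_lo, a_hi) in enumerate(slices):
        written = {i + write_offset for i in range(a_lo, a_hi + 1)}
        for b, (b_lo, b_hi) in enumerate(slices):
            if a == b:
                continue
            read = {i + read_offset for i in range(b_lo, b_hi + 1)}
            if written & read:
                conflicts.append((a, b))
    return conflicts


halves = [(1, 5), (6, 10)]
# A(I) = f(A(I)): each slice touches only its own elements.
print(cross_slice_conflicts(halves, 0, 0))
# A(I) = f(A(I-1)): the second slice reads A(5), which the first slice writes.
print(cross_slice_conflicts(halves, 0, -1))
```

A non-empty result means the slices cannot simply run in parallel without a measure such as synchronization.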

[0006] In the processing of a so-called automatic vectorizing compiler, which generates vector instruction sequences for execution on a so-called vector processing device, each innermost loop of the source program can, as is well known, be replaced with a sequence of vector instructions that performs the processing in place of the loop.

[0007] Therefore, as shown in FIG. 5, a vectorized program to be processed in parallel by a plurality of vector processing devices can be generated by passing each loop produced by loop slicing through the automatic vectorizing compiler.

[0008] However, automatic vectorization requires a relatively large amount of compilation (translation) time. When both parallelization and vectorization are performed by the above method, the vectorization time is further multiplied by the number of parallel processes, which greatly increases the compiler's processing time.

[0009] An object of the present invention is to provide a vector instruction generation processing method that can efficiently generate a vectorized parallel processing program.

[0010]

[Means for Solving the Problems] FIG. 1 is a block diagram showing the configuration of the present invention. The figure shows the configuration of the vector instruction generation processing method, which is computer processing that generates, from a given source program, an object program having parallelized vector instruction sequences.

[0011] In the first processing stage 1, a vector instruction sequence is generated, as a single process, for the required part of the source program. In the second processing stage 2, the vector instruction sequence is copied to obtain the required number of parallel vector instruction sequences, the data to be processed is divided and distributed among those sequences, and the operands of each vector instruction sequence are changed to operands corresponding to the distributed data, yielding a parallel processing program.
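The second stage can be pictured with a small sketch: a sequence that was vectorized once over elements 1..N is copied, and each copy's operands are narrowed to its share of the data. The classes `VectorOperand` and `VectorInstr` and the function `parallelize` are hypothetical names introduced for illustration, and the sketch assumes every operand spans the full range 1..N, which a real compiler could not take for granted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VectorOperand:
    array: str   # array name, e.g. "A"
    first: int   # first element, as in A(first:last)
    last: int    # last element


@dataclass(frozen=True)
class VectorInstr:
    opcode: str
    operands: tuple


def parallelize(sequence, n, parts):
    """Copy `sequence` (vectorized once for 1..n) `parts` times, narrowing operands."""
    base, extra = divmod(n, parts)
    copies = []
    start = 1
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        lo, hi = start, start + size - 1
        # The instructions are copied unchanged; only the operand ranges differ.
        copies.append([
            replace(ins, operands=tuple(replace(op, first=lo, last=hi)
                                        for op in ins.operands))
            for ins in sequence
        ])
        start += size
    return copies


vectorized = [
    VectorInstr("VLOAD", (VectorOperand("A", 1, 8),)),
    VectorInstr("VSTORE", (VectorOperand("B", 1, 8),)),
]
for copy in parallelize(vectorized, 8, 2):
    print(copy)
```

The point of the design is that the expensive step, producing `vectorized`, happens once; the per-copy work is a cheap structural rewrite.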

[0012]

[Operation] With the processing method of the present invention, automatic vectorization, which requires a relatively long compilation time, is performed only once regardless of the number of parallel processes. The vectorized program is then copied as many times as there are parallel processes, and the operands are replaced with the required values to generate the parallel processing program. The time needed to generate a vectorized parallel processing program can therefore be reduced substantially.

[0013]

[Embodiment] FIG. 2 shows the processing flow of an embodiment of the present invention. In processing stage 1, processing step 10 first applies automatic vectorization to the program as it stands, as a single sequential process, and generates a vector instruction sequence.

[0014] Next, processing step 11 performs data dependency analysis on this vector instruction sequence and determines, for the data appearing in the range of the vector operations, the dependencies in the sense described above.

[0015] Processing step 12 inspects the vector instructions on the basis of the data dependency results and other information. It gathers the non-parallelizable portions into as small a part as possible and separates them, so that the parallelizable portion extracted is as large as possible. Where parallel processes access the same data, it also designates the positions at which the necessary synchronization and exclusive-control instructions are to be inserted.

[0016] Processing step 13 checks whether the preceding steps have yielded a parallelizable vector instruction sequence. If so, the vector instruction sequence and the necessary control information are passed to the second processing stage, where processing step 14 copies the vector instruction sequence of the parallelizable portion as many times as the specified number of parallel processes.

[0017] Processing step 15 replaces the operand contents of the copied vector instructions so that the data is divided and assigned to each parallel process. Processing step 16 inserts, at the designated positions, the instructions for the synchronization and exclusive control that the assigned data requires, and processing step 17 adds the procedures for starting and ending parallel processing before and after each parallel processing portion.
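Steps 16 and 17 can be sketched as a simple list transformation on one copied instruction sequence. Everything named here is an assumption for illustration: the function `finalize_copy`, the `PAR_BEGIN` and `PAR_END` markers standing in for the start and end procedures, and the representation of the designated positions as a mapping from instruction index to the instructions inserted before it.

```python
def finalize_copy(instructions, insertions, begin="PAR_BEGIN", end="PAR_END"):
    """Insert control instructions at designated positions, then wrap the copy.

    `insertions` maps an index in `instructions` to the list of instructions
    to place immediately before it (index len(instructions) means "at the end").
    """
    out = [begin]                                   # step 17: start procedure
    for pos, ins in enumerate(instructions):
        out.extend(insertions.get(pos, []))         # step 16: designated position
        out.append(ins)
    out.extend(insertions.get(len(instructions), []))
    out.append(end)                                 # step 17: end procedure
    return out


left_copy = ["VLOAD E(1:4)", "VSTORE F(1:4)"]
print(finalize_copy(left_copy, {1: ["POST"]}))
```

The input list is left unmodified, so the same vectorized sequence can be finalized separately for each parallel process.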

[0018] FIGS. 3 and 4 illustrate the above processing with a program example. FIG. 3(a) is an example of a loop in a source program, and FIG. 3(b) is the so-called intermediate-representation program obtained by converting it into intermediate instructions whose form is closer to the object program. The reference marks in the figure indicate the correspondence between the two programs.

[0019] FIG. 3(c) shows the result of applying known vectorization processing to the program of FIG. 3(b): the non-vectorizable portion is separated, and the remaining vectorizable portions are converted into a vector instruction sequence.

[0020] In FIG. 3(c), "SVCT" and "EVCT" are instructions inserted to mark the start and end of a vector operation, respectively. "VSUM" is a vector instruction that computes a sum, and "VRC" is a vector instruction for a first-order recurrence operation. The other instructions prefixed with "V", such as "VLOAD", are likewise vector instructions; each performs on vector data the operation named after the "V", such as "LOAD".

[0021] Among the operands of each vector instruction, an expression such as "A(1:N)" is an operand that designates vector data; this example denotes the vector data from A(1) to A(N).

[0022] Two of the marked portions cannot be vectorized straightforwardly because of data dependencies, so they are vectorized through special conversion processing. Starting from the result of vectorization for the single-process case obtained in this way, the portion that cannot be parallelized is separated on the basis of the data dependency analysis, the parallelizable portions are copied to form two parallel processes, the operands are replaced, and synchronization and exclusive control are added, generating the instruction sequence of FIG. 4.

[0023] In FIG. 4, "SMUTEX" and "EMUTEX" are the instructions that start and end exclusive control. They are inserted where both parallel processes assign a value to the same variable SUM, which makes exclusive control necessary.
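The role of the exclusive-control pair can be illustrated with ordinary threads. This is an analogy under stated assumptions, not the vector hardware's mechanism: a `threading.Lock` stands in for SMUTEX/EMUTEX, Python's built-in `sum` stands in for VSUM over one slice, and the function name `parallel_sum` is hypothetical.

```python
import threading


def parallel_sum(values, parts=2):
    """Sum `values` in `parts` threads, guarding the shared total with a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)     # stands in for VSUM over this process's slice
        with lock:               # entering the block plays the role of SMUTEX
            total += partial     # both processes assign to the same variable
        # leaving the block plays the role of EMUTEX

    size = max(1, (len(values) + parts - 1) // parts)
    threads = [
        threading.Thread(target=worker, args=(values[i:i + size],))
        for i in range(0, len(values), size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


print(parallel_sum(list(range(1, 101))))
```

Only the update of the shared variable is inside the protected region; the per-slice summation runs without any locking.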

[0024] In another portion, "POST", inserted in the left-hand process, and "WAIT", inserted in the right-hand process, are synchronization-control instructions. They synchronize the two processes because the load of array data E by the left-hand process (the VLOAD instruction immediately before POST) must be executed before the store to E by the right-hand process (the VSTORE instruction immediately after WAIT).
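The ordering constraint can be mimicked with a `threading.Event`, where `set()` plays the role of POST and `wait()` the role of WAIT. This is only an analogy: the function name `run_post_wait` is hypothetical, and the list copy and element assignments stand in for the VLOAD and VSTORE instructions.

```python
import threading


def run_post_wait(e):
    """Left process loads E then posts; right process waits, then stores to E."""
    posted = threading.Event()
    loaded = []

    def left():
        loaded.extend(e)          # stands in for VLOAD of E (must see old values)
        posted.set()              # POST

    def right():
        posted.wait()             # WAIT
        for i in range(len(e)):
            e[i] = 0              # stands in for VSTORE to E

    t_right = threading.Thread(target=right)
    t_left = threading.Thread(target=left)
    t_right.start()               # started first on purpose; it still blocks at WAIT
    t_left.start()
    t_left.join()
    t_right.join()
    return loaded


data = [1, 2, 3]
print(run_post_wait(data), data)
```

Even though the storing thread is started first, it cannot overwrite E until the loading thread has posted.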

[0025] The method of the present invention is particularly effective for single loops. For nested loops, the innermost loop is vectorized first, and the loop slicing method is then applied to the outer loops.

[0026]

[Effect of the Invention] As is apparent from the above description, the present invention has the significant industrial effect that a vectorized parallel processing program can be generated efficiently in the compiler processing of a computer.

[Brief description of drawings]

FIG. 1 is a process flow chart showing the configuration of the present invention.

FIG. 2 is a process flow chart of an embodiment of the present invention.

FIG. 3 is a diagram illustrating a program example.

FIG. 4 is a diagram illustrating a program example.

FIG. 5 is a diagram illustrating an example of loop slicing.

[Explanation of symbols]

1: first processing stage; 2: second processing stage; 10 to 17: processing steps

Continuation of the front page: (72) Inventor Hideki Nozaki, c/o Fujitsu Limited, 1015 Kamiodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

1. A vector instruction generation processing method in computer processing that generates, from a given source program, an object program having parallelized vector instruction sequences, the method comprising: a first processing stage (1) that generates a vector instruction sequence, as a single process, for a required part of the source program; and a second processing stage (2) that copies the vector instruction sequence to obtain the required number of parallel vector instruction sequences, divides the data to be processed and distributes it among those sequences, and changes the operands of each vector instruction sequence to operands corresponding to the distributed data, thereby forming a parallel processing program.