JPH11242598A

JPH11242598A - Compiling method and device, object program executing method and device and program storage medium

Info

Publication number: JPH11242598A
Application number: JP10042422A
Authority: JP
Inventors: Kiyobumi Suzuki; 清文鈴木; Takeshi Soga; 武史曽我; Masaki Aoki; 正樹青木
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1998-02-24
Filing date: 1998-02-24
Publication date: 1999-09-07
Anticipated expiration: 2018-02-24
Also published as: JP3887097B2

Abstract

PROBLEM TO BE SOLVED: To speed up vector arithmetic with mask data that arises when loops are collapsed into a single loop or merged (fused) during compilation. SOLUTION: A mask generation instruction vgsm (vector generate subarray mask) is output when the source program part corresponding to a vector operation with mask data is compiled. vgsm is a machine instruction that hardware can execute directly. Its basic pattern 1 consists of as many false values (0) as the value of general-purpose register gr1, followed by true values (1), for a total length equal to the value of general-purpose register gr2. From the sequence formed by repeating this pattern, vgsm sets in the mask register mr the portion that starts at the position given by the value of general-purpose register gr3 and whose length equals the vector length (the length of the range accessed). The pattern 1 may instead consist of a prescribed number of true values followed by a prescribed number of false values.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention: The present invention relates to data processing that outputs a mask generation instruction by compiling a specific portion of a source program corresponding to a vector operation with mask data, and to data processing that executes a vector operation with mask data based on a mask generation instruction in an object program. It uses a mask data generation instruction to shorten the time needed to create the mask data.

[0002] In this specification, a mask data generation instruction is abbreviated as a "mask generation instruction", and the term "instruction" is used in a sense that also covers an instruction sequence made up of a plurality of instructions.

[0003] A vector-processing computer executes vector operations with mask data when, for example, a multiple (nested) loop is collapsed into a single loop or parallel loops are fused. For such vector operations to pay off, preparing the mask data must take almost no time, and the present invention meets this requirement.

[0004]

Description of the Related Art: FIG. 14 is an explanatory diagram outlining the conventional compile process for loop collapsing with a mask; (a) shows the assumed source program format, and (b) shows the compiled content as a source image.

[0005] The source program multiplies the corresponding elements of arrays A and B, each of which has 100 × 100 = 10,000 elements, over the 9,604 elements from (2, 2) to (99, 99).

[0006] FIG. 15 is an explanatory diagram showing the processing procedure corresponding to the compiled content of FIG. 14(b). It proceeds as follows.
(61) Create an array mask to hold the mask data.
(62) Create a single loop with an iteration count of (100 × 100), and output instructions that set a false value (0) in every element of mask.
(63) Create a double loop with the same structure as the original double loop, and inside it output instructions that set a true value (1) only in the required elements of mask.
(64) Convert the original double loop into a single-loop structure, again with an iteration count of (100 × 100).
(65) Insert an IF statement (that is, the corresponding instructions) so that the executable statement of the original double loop runs only when the element of mask corresponding to each array element is true (1).
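To make the cost of this procedure concrete, the following Python sketch mimics steps (61) to (65). It is an illustration under stated assumptions, not the compiler's actual output: column-major (Fortran-style) storage, 1-based loop bounds, and a multiply-in-place loop body are assumed, and the function names are hypothetical.

```python
# Sketch of the conventional mask preparation of FIG. 15 (steps (61)-(65)).
# Assumptions: column-major storage, 1-based loop bounds, and a
# multiply-in-place loop body chosen only for illustration.

def conventional_mask(m, n, i1, i2, j1, j2):
    """Steps (61)-(63): build the mask array with ordinary loops."""
    mask = [0] * (m * n)                  # (61)-(62): m*n stores of false
    for j in range(j1, j2 + 1):           # (63): copy of the original nest
        for i in range(i1, i2 + 1):
            mask[(i - 1) + (j - 1) * m] = 1
    return mask


def collapsed_multiply(a, b, mask):
    """Steps (64)-(65): one loop over all m*n elements, guarded by mask."""
    for k in range(len(a)):
        if mask[k]:                       # the inserted IF statement
            a[k] = a[k] * b[k]


m = n = 100
mask = conventional_mask(m, n, 2, 99, 2, 99)
print(sum(mask))  # 9604 true elements, matching [0005]
```

Both loop nests inside conventional_mask run before any useful arithmetic is done; that preparation time is the overhead described in [0008].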

[0007] FIG. 16 is an explanatory diagram showing the instruction image corresponding to the compiled content of FIG. 14(b): 71 corresponds to steps (61) and (62), 72 to step (63), and 73 to steps (64) and (65).

[0008]

Problem to be Solved by the Invention: As described above, conventional vector processing with mask data prepares the mask data by executing loop processing. Creating the mask data therefore takes a long time, and vector operations with mask data cannot be performed efficiently.

[0009] Accordingly, an object of the present invention is to provide a machine instruction for creating mask data that the hardware can execute directly, to compile using this mask creation instruction, and to execute the mask creation instruction contained in the object program, thereby speeding up the vector operations with mask data that accompany loop collapsing and loop fusion.

[0010] A further object is to improve the effective performance of the vector computer still more when the mask creation instruction is applied to loop fusion, by performing optimizations such as common subexpression elimination and instruction scheduling on the program after fusion.

[0011]

Means for Solving the Problems: To achieve these objects, the present invention uses a compiling method and a compiling apparatus that, when compiling a source program, output a mask generation instruction vgsm (see FIG. 1). vgsm sets as mask data a predetermined range of a data sequence built from a basic pattern made of a first data portion, in which one of the false and true values occurs a predetermined number of times in succession, followed by a second data portion, in which the other value occurs a predetermined number of times in succession.

[0012] The mask generation instruction vgsm is output when compiling a vector operation portion that involves, for example, collapsing a multiple loop into a single loop or fusing parallel loops.

[0013] A compiling apparatus that outputs vgsm includes at least a vectorizing unit, which outputs vector operation instructions, and a mask generation instruction output unit. It also has a masked collapsing unit, which outputs loop operation instructions in which multiple-loop processing has been converted into single-loop processing with mask data, and a masked fusion unit, which outputs loop operation instructions in which parallel loop processing has been fused and optimized.

[0014] The present invention also uses an object program execution method that executes vgsm to create mask data, and an object program execution apparatus that performs vector operations with mask data based on the mask generation instruction in the object program and that includes at least a mask data creation unit, which executes vgsm.

[0015] The present invention further uses two computer-readable program storage media: one storing a program that is used in data processing that outputs a mask generation instruction by compiling a specific portion of a source program corresponding to a vector operation with mask data, and that causes a computer to output vgsm as that mask generation instruction; and one storing a program that is used in data processing that executes vector operations with mask data based on a mask generation instruction in an object program, and that causes a computer to execute vgsm.

[0016] Because vgsm is used, when a program that has undergone processing such as masked collapsing or masked fusion at compile time is executed, the time to create the mask data is reduced and the entire program runs faster.

[0017] When vgsm is applied to loop fusion, the vector length becomes longer, and further optimizations can be expected: the scope of instruction scheduling widens, and loads from the same array can be shared so that relatively slow memory accesses are reduced.

[0018]

Embodiments of the Invention: Embodiments of the present invention are described with reference to FIGS. 1 to 13. For convenience, the following embodiments number the first position of the target sequence as position "1".

[0019] FIG. 1 is an explanatory diagram showing the mask generation instruction. The mask generation instruction vgsm (vector generate subarray mask) is a machine instruction that hardware can execute directly. Consider the sequence formed by repeating basic pattern 1, which consists of as many false values (e.g., 0) as the value of gr1 (a general-purpose register) followed by true values (e.g., 1), for a total of as many values as the value of gr2 (a general-purpose register). vgsm sets in the mask register mr the portion of this sequence that starts at the position indicated by the value of gr3 (a general-purpose register) and whose length equals the vector length (the length of the range accessed). Basic pattern 1 may instead be a sequence of a predetermined number of true values followed by a predetermined number of false values.
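A small software model can make the definition above concrete. The sketch below computes only the bits that vgsm would place in mr; it assumes gr3 is a 1-based position counted from the first element of the basic pattern (consistent with [0018] and [0022]), and it treats the vector length as a plain parameter because the instruction encoding and the way the vector length is supplied are not described. The function name vgsm is reused here only as a label for this model.

```python
# Model of the bits that vgsm places in the mask register mr, as read from
# [0019]. gr3 is treated as a 1-based position counted from the first element
# of the basic pattern; vl (vector length) is passed in explicitly because the
# real mechanism for supplying it is not described.

def vgsm(gr1, gr2, gr3, vl):
    """Return vl mask bits from the repeated pattern (gr1 zeros, then ones)."""
    if gr2 <= 0 or not 0 <= gr1 <= gr2 or gr3 < 1:
        raise ValueError("need gr2 > 0, 0 <= gr1 <= gr2 and gr3 >= 1")
    bits = []
    for k in range(vl):
        offset = (gr3 - 1 + k) % gr2      # 0-based position inside the pattern
        bits.append(0 if offset < gr1 else 1)
    return bits


print(vgsm(2, 5, 2, 8))  # [0, 1, 1, 1, 0, 0, 1, 1]
```

For the alternative basic pattern (true values first), the output is the bitwise complement of this model's output for the same operands.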

[0020] FIG. 2 is an explanatory diagram showing the concept of the mask generation instruction in loop collapsing, assuming a source program format like that of FIG. 14.

[0021] Array A consists of (M * N) elements. The hatched portion 2 is the part processed by the loop and consists of (I2 - I1 + 1) * (J2 - J1 + 1) elements. Element 3 corresponds to the start position of basic pattern 1, element 4 to the position specified by gr1, and element 5 to the position specified by gr3. Element 6 corresponds to the position specified by gr2, which is also the last position of basic pattern 1, and element 7 is the last element of the range of one vector length that starts at element 5.

[0022] For the mask generation instruction vgsm of FIG. 2:
- the vector length is the number of elements in the solid-line portion from element 5 (I1, J1) to element 7 (I2, J2), namely M * (J2 - J1 + 1) - (I1 - 1) - (M - I2);
- the value of gr1 is the number of elements from element 3 (I2 + 1, J1 - 1) to element 4 (I1 - 1, J1), namely (M - I2) + (I1 - 1);
- the value of gr2 is the number of elements from element 3 (I2 + 1, J1 - 1) to element 6 (I2, J1), namely M;
- the value of gr3 is the number of elements from element 3 (I2 + 1, J1 - 1) to element 5 (I1, J1), namely (M - I2) + I1.
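The four formulas above can be checked mechanically. The sketch below computes them and compares the resulting vgsm mask with a direct walk over array A from element 5 to element 7. It assumes column-major storage with 1-based indices and the vgsm model read from [0019]; collapse_operands and region_mask are hypothetical helper names.

```python
# Check of the [0022] operand formulas against a direct walk over array A.
# Assumptions: A is stored column-major with 1-based indices, and vgsm
# behaves as modeled from [0019] (repeated pattern, 1-based gr3).

def vgsm(gr1, gr2, gr3, vl):
    return [0 if (gr3 - 1 + k) % gr2 < gr1 else 1 for k in range(vl)]


def collapse_operands(m, i1, i2, j1, j2):
    """Return (VL, gr1, gr2, gr3) as given in [0022]."""
    vl = m * (j2 - j1 + 1) - (i1 - 1) - (m - i2)   # element 5 .. element 7
    gr1 = (m - i2) + (i1 - 1)                      # element 3 .. element 4
    gr2 = m                                        # element 3 .. element 6
    gr3 = (m - i2) + i1                            # element 3 .. element 5
    return vl, gr1, gr2, gr3


def region_mask(m, i1, i2, j1, j2):
    """Reference: 1 where the row index is in I1..I2, walking (I1,J1)..(I2,J2)."""
    start = (i1 - 1) + (j1 - 1) * m                # linear offset of element 5
    end = (i2 - 1) + (j2 - 1) * m                  # linear offset of element 7
    return [1 if i1 <= (lin % m) + 1 <= i2 else 0 for lin in range(start, end + 1)]


vl, gr1, gr2, gr3 = collapse_operands(100, 2, 99, 2, 99)
print(vl, gr1, gr2, gr3)  # 9798 2 100 3
assert vgsm(gr1, gr2, gr3, vl) == region_mask(100, 2, 99, 2, 99)
```

Because gr2 equals the column length M, each repetition of the basic pattern lines up with one column of A, which is why a single vgsm suffices for the whole collapsed loop.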

[0023] In other words, the instruction generates as mask data the portion, from element 5 to element 7, of the sequence formed by repeating a basic pattern of false values for elements 3 through 4 followed by true values for elements 5 through 6.

[0024] FIG. 3 is an explanatory diagram outlining the compile procedure for the double loop of FIG. 2. It proceeds as follows.
(11) Output an instruction that sets the vector length to M * (J2 - J1 + 1) - (I1 - 1) - (M - I2).
(12) Output an instruction that sets gr1 of the vgsm instruction to (M - I2) + (I1 - 1).
(13) Output an instruction that sets gr2 of the vgsm instruction to M.
(14) Output an instruction that sets gr3 of the vgsm instruction to (M - I2) + I1.
(15) Output the vgsm instruction.
(16) Add (the value of) mr to the operands of every instruction in the loop, so that the mr generated by the vgsm instruction is used as mask data.
(17) Convert the double-loop structure into a single-loop structure. Processing of the converted one-dimensional array starts at the position of the original element 5, and the converted loop structure corresponds to inserting the IF statement of Equation 1 below (see FIG. 4).
(18) Output vector instructions by vectorization.
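The effect of steps (11) to (17) can be illustrated by comparing the original double loop with a single masked loop that starts at element 5 and runs for one vector length. This is a sketch under assumptions: column-major storage, 1-based bounds, a hypothetical loop body A(I,J) = A(I,J) * B(I,J), and the vgsm model read from [0019]. The scalar loop guarded by mr stands in for one vector instruction of length VL executed under the mask.

```python
# Sketch of what the code emitted by steps (11)-(17) computes, compared with
# the original double loop. Assumptions: column-major storage, 1-based
# bounds, a hypothetical body A(I,J) = A(I,J) * B(I,J), and the vgsm model
# read from [0019].

def vgsm(gr1, gr2, gr3, vl):
    return [0 if (gr3 - 1 + k) % gr2 < gr1 else 1 for k in range(vl)]


def double_loop(a, b, m, i1, i2, j1, j2):
    """Original nest over the hatched portion 2 of FIG. 2."""
    for j in range(j1, j2 + 1):
        for i in range(i1, i2 + 1):
            k = (i - 1) + (j - 1) * m
            a[k] = a[k] * b[k]


def collapsed_loop(a, b, m, i1, i2, j1, j2):
    vl = m * (j2 - j1 + 1) - (i1 - 1) - (m - i2)           # step (11)
    gr1, gr2, gr3 = (m - i2) + (i1 - 1), m, (m - i2) + i1  # steps (12)-(14)
    mr = vgsm(gr1, gr2, gr3, vl)                           # step (15)
    base = (i1 - 1) + (j1 - 1) * m                         # step (17): element 5
    for k in range(vl):
        if mr[k]:                                          # step (16): masked op
            a[base + k] = a[base + k] * b[base + k]


m, n = 6, 5
a1 = [float(x) for x in range(m * n)]
a2 = list(a1)
b = [2.0] * (m * n)
double_loop(a1, b, m, 2, 5, 2, 4)
collapsed_loop(a2, b, m, 2, 5, 2, 4)
print(a1 == a2)  # True
```

The collapsed version replaces the inner-loop restarts with a single pass of length VL; the false bits in mr cover exactly the rows outside I1..I2 that the pass crosses between columns.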

[Equation 1] (image not reproduced in this text)

[0025] FIG. 4 is an explanatory diagram showing an example of loop collapsing using the vgsm instruction; (a) shows the assumed source program format, and (b) shows the compiled output as an instruction image.

[0026] This is the case of FIG. 2 with M = 100, N = 100, I1 = 2, I2 = 99, J1 = 2, and J2 = 99.
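Working the [0022] formulas through for these values gives the operands below. The last line checks that the resulting mask, which consists of 98 runs of 98 true values separated by 97 runs of 2 false values, has length VL and contains the 9,604 elements of [0005]. This arithmetic is derived here from the stated formulas; it is not quoted from FIG. 4.

```latex
\begin{aligned}
VL  &= 100\,(99-2+1) - (2-1) - (100-99) = 9800 - 1 - 1 = 9798\\
gr1 &= (100-99) + (2-1) = 2\\
gr2 &= 100\\
gr3 &= (100-99) + 2 = 3\\
VL  &= \underbrace{98 \cdot 98}_{\text{true}} + \underbrace{97 \cdot 2}_{\text{false}} = 9604 + 194 = 9798
\end{aligned}
```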

[0027] FIG. 5 is a configuration diagram of the compiler for loop collapsing: 11 is a source program, 12 is a compiler (compiling apparatus), 13 is an object program, 14 is a source program analysis unit, 15 is a loop collapsing unit, 16 is a masked collapsing recognition unit, 17 is a mask generation instruction output unit, 18 is a masked collapsing unit, 19 is a vectorizing unit, and 20 is an object program generation unit.

[0028] The source program analysis unit 14 converts the source program 11 into intermediate text, and the masked collapsing recognition unit 16 examines the instruction types and operand forms in that text to pick out loops that can be collapsed with a mask.

[0029] The mask generation instruction output unit 17 executes steps (11) to (15) of FIG. 3, and the masked collapsing unit 18 executes steps (16) and (17) of FIG. 3. The vectorizing unit 19 and the object program generation unit 20 operate as in the related art.

[0030] FIG. 6 is an explanatory diagram outlining the compile procedure for loop collapsing. It proceeds as follows.
(21) Take out a loop to be compiled and go to the next step.
(22) Determine whether the loop has a multiple-loop structure. If YES, go to the next step; if NO, go to step (25).
(23) Determine whether masked collapsing is possible. If YES, go to the next step; if NO, go to step (25).
(24) Output the mask generation instruction, perform masked collapsing, and go to the next step.
(25) Perform normal vectorization and go to the next step.
(26) Determine whether all loops have been processed. If YES, end the processing; if NO, return to step (21).
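The branching in steps (21) to (26) can be summarized as a small driver loop. The Loop record, its fields, and the emitted operation names below are hypothetical placeholders, since the patent does not specify the compiler's data structures; "vgsm" here stands for the whole mask-generation sequence of steps (11) to (15).

```python
# Control flow of FIG. 6 (steps (21)-(26)) as a compiler driver.
# The Loop record and the emitted operation names are hypothetical
# placeholders; the patent does not specify data structures.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Loop:
    name: str
    depth: int               # nesting depth; > 1 means a multiple loop
    collapsible: bool        # outcome of the masked-collapsing check (23)
    emitted: List[str] = field(default_factory=list)


def compile_loops(loops):
    for loop in loops:                                       # (21) and (26)
        if loop.depth > 1 and loop.collapsible:              # (22) and (23)
            loop.emitted += ["vgsm", "collapse_with_mask"]   # (24)
        loop.emitted.append("vectorize")                     # (25)
    return loops


done = compile_loops([Loop("L1", 2, True), Loop("L2", 2, False), Loop("L3", 1, True)])
for loop in done:
    print(loop.name, loop.emitted)
```

Every loop reaches step (25); only multiple loops that pass the check in step (23) get the mask generation and collapsing of step (24) first.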

[0031] FIG. 7 is an explanatory diagram showing the source program format assumed in the description of loop fusion. For convenience, it is assumed that the range of elements processed by each loop is not completely contained in the range processed by the other loop, that is, that either I3 < I1 < I4 < I2 or I1 < I3 < I2 < I4 holds.

[0032] The example source program has two parallel loops over arrays A and B, each of 1000 elements: a loop that multiplies elements 101 to 1000 (I1 to I2), and a loop that adds elements 1 to 900 (I3 to I4).

[0033] FIG. 8 is an explanatory diagram outlining the compile procedure for the parallel loops of FIG. 7. It proceeds as follows.
(31) Perform normal vectorization.
(32) Output an instruction that sets the vector length VL to MAX(I2, I4) - MIN(I1, I3) + 1.
(33) Output an instruction that sets gr1 of the vgsm instruction to VL - (I2 - I1 + 1).
(34) Output an instruction that sets gr2 of the vgsm instruction to VL.
(35) Output an instruction that sets gr3 of the vgsm instruction to 1 when I1 > I3, or to (I4 - I2 + 1) when I1 < I3.
(36) Output the vgsm instruction.
(37) Output an instruction that sets gr1 of the vgsm instruction to VL - (I4 - I3 + 1).
(38) Output an instruction that sets gr2 of the vgsm instruction to VL.
(39) Output an instruction that sets gr3 of the vgsm instruction to (I2 - I4 + 1) when I1 > I3, or to 1 when I1 < I3.
(40) Output the vgsm instruction.
(41) Add (the value of) mr1, generated by the vgsm instruction of step (36), to the operands of every instruction in the first loop, so that mr1 is used as mask data.
(42) Add (the value of) mr2, generated by the vgsm instruction of step (40), to the operands of every instruction in the second loop, so that mr2 is used as mask data.
(43) Convert the parallel-loop structure into a single-loop structure. The new loop's initial value is MIN(I1, I3) and its final value is MAX(I2, I4), and the converted loop structure corresponds to inserting the IF statements of Equation 2 below.
(44) Perform optimizations such as common subexpression elimination and instruction scheduling.
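The operand formulas of steps (32) to (39) can be checked against direct masks over the fused index range. The sketch assumes the partial overlap stated in [0031] and the vgsm model read from [0019]; fusion_operands and range_mask are hypothetical helper names.

```python
# Check of the fusion operands of steps (32)-(39) against direct masks.
# Assumptions: the two loops overlap partially (I3 < I1 < I4 < I2 or
# I1 < I3 < I2 < I4, as stated in [0031]) and vgsm behaves as modeled
# from [0019].

def vgsm(gr1, gr2, gr3, vl):
    return [0 if (gr3 - 1 + k) % gr2 < gr1 else 1 for k in range(vl)]


def fusion_operands(i1, i2, i3, i4):
    """Return (start, VL, (gr1, gr2, gr3) for mr1, (gr1, gr2, gr3) for mr2)."""
    start = min(i1, i3)
    vl = max(i2, i4) - start + 1                                   # (32)
    mr1 = (vl - (i2 - i1 + 1), vl, 1 if i1 > i3 else i4 - i2 + 1)  # (33)-(35)
    mr2 = (vl - (i4 - i3 + 1), vl, i2 - i4 + 1 if i1 > i3 else 1)  # (37)-(39)
    return start, vl, mr1, mr2


def range_mask(start, vl, lo, hi):
    """Reference: 1 where the fused index start + k lies in lo..hi."""
    return [1 if lo <= start + k <= hi else 0 for k in range(vl)]


start, vl, mr1, mr2 = fusion_operands(101, 1000, 1, 900)
print(start, vl, mr1, mr2)  # 1 1000 (100, 1000, 1) (100, 1000, 101)
```

For the FIG. 7 example this reproduces the gr values given in [0035]. Since gr2 equals VL, the pattern never repeats within one vector, so gr3 simply rotates the run of true values to the correct end of the fused range.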

[Equation 2] (image not reproduced in this text)

[0034] FIG. 9 is an explanatory diagram showing an example of loop fusion using the vgsm instruction. It shows the compiled output, as an instruction image, after normal vectorization, after loop fusion using the vgsm instruction, and after optimization. The individual instructions are the same as those of FIG. 4.

[0035] Here, the multiplication loop over elements 101 to 1000 uses a vgsm instruction (mr1) with gr1 = 100, gr2 = 1000, and gr3 = 1, and the addition loop over elements 1 to 900 uses a vgsm instruction (mr2) with gr1 = 100, gr2 = 1000, and gr3 = 101.

[0036] In this case, gr1 and gr2 are also optimized at the stage where the vgsm instructions for mr1 and mr2 are output.

[0037] The optimization after loop fusion:
- deletes one of the two vload instructions that load array B into a vector register vr;
- deletes one of the two vload instructions that load array C into a vector register vr; and
- changes the order of the multiplication instruction vmult, the addition instruction vadd, and the instruction vstore that stores the multiplication result.

[0038] When mr is added to the store instruction vstore to make it an instruction with mask data, whether to also add mr to the preceding multiplication instruction vmult or addition instruction vadd is optional.
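The effect of [0037] and [0038] can be sketched as a fused loop that loads B and C once, computes the product and the sum without masks, and applies mr1 and mr2 only to the stores. The statement bodies A(I) = B(I) * C(I) and D(I) = B(I) + C(I), including the array D, are hypothetical, chosen only because [0037] mentions loads of arrays B and C; lists are 1-based with index 0 unused, and vgsm follows the model read from [0019].

```python
# Fused loop after the [0037] optimization, with masks only on the stores
# as permitted by [0038]. Assumptions: hypothetical statements
# A(I) = B(I) * C(I) for I = I1..I2 and D(I) = B(I) + C(I) for I = I3..I4,
# 1-based lists with index 0 unused, and the vgsm model from [0019].

def vgsm(gr1, gr2, gr3, vl):
    return [0 if (gr3 - 1 + k) % gr2 < gr1 else 1 for k in range(vl)]


def separate_loops(a, d, b, c, i1, i2, i3, i4):
    for i in range(i1, i2 + 1):
        a[i] = b[i] * c[i]
    for i in range(i3, i4 + 1):
        d[i] = b[i] + c[i]


def fused_loop(a, d, b, c, i1, i2, i3, i4):
    lo = min(i1, i3)
    vl = max(i2, i4) - lo + 1
    mr1 = vgsm(vl - (i2 - i1 + 1), vl, 1 if i1 > i3 else i4 - i2 + 1, vl)
    mr2 = vgsm(vl - (i4 - i3 + 1), vl, i2 - i4 + 1 if i1 > i3 else 1, vl)
    vb = b[lo:lo + vl]                        # one vload of B, shared
    vc = c[lo:lo + vl]                        # one vload of C, shared
    prods = [x * y for x, y in zip(vb, vc)]   # vmult without a mask
    sums = [x + y for x, y in zip(vb, vc)]    # vadd without a mask
    for k in range(vl):                       # masked vstores
        if mr1[k]:
            a[lo + k] = prods[k]
        if mr2[k]:
            d[lo + k] = sums[k]


size = 1001
b = [float(i) for i in range(size)]
c = [float(2 * i) for i in range(size)]
a1, d1 = [0.0] * size, [0.0] * size
a2, d2 = [0.0] * size, [0.0] * size
separate_loops(a1, d1, b, c, 101, 1000, 1, 900)
fused_loop(a2, d2, b, c, 101, 1000, 1, 900)
print(a1 == a2 and d1 == d2)  # True
```

Leaving vmult and vadd unmasked is harmless in this sketch because the extra products and sums are simply never stored.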

[0039] FIG. 10 is a configuration diagram of the compiler for loop fusion: 21 is a source program, 22 is a compiler (compiling apparatus), 23 is an object program, 24 is a source program analysis unit, 25 is a vectorizing unit, 26 is a loop fusion unit, 27 is a masked fusion recognition unit, 28 is a mask generation instruction output unit, 29 is a masked fusion unit, 30 is an optimization unit, and 31 is an object program generation unit.

[0040] The source program analysis unit 24 converts the source program 21 into intermediate text, and the masked fusion recognition unit 27 examines the instruction types and operand forms in that text to pick out loops that can be fused with a mask.

[0041] The mask generation instruction output unit 28 executes steps (33) to (40) of FIG. 8, the masked fusion unit 29 executes steps (41) to (43) of FIG. 8, and the optimization unit 30 executes step (44) of FIG. 8. The vectorizing unit 25 and the object program generation unit 31 operate as in the related art.

[0042] FIG. 11 is an explanatory diagram outlining the compile procedure for loop fusion. It proceeds as follows.
(51) Take out a loop to be compiled and go to the next step.
(52) Perform vectorization and go to the next step.
(53) Determine whether the loops have a parallel-loop structure. If YES, go to the next step; if NO, go to step (57).
(54) Determine whether masked fusion is possible. If YES, go to the next step; if NO, go to step (57).
(55) Output the mask generation instructions, perform masked fusion, and go to the next step.
(56) Perform optimizations after loop fusion, such as common subexpression elimination and instruction scheduling, and go to the next step.
(57) Determine whether all loops have been processed. If YES, end the processing; if NO, return to step (51).

[0043] FIG. 12 is an explanatory diagram showing an object program execution apparatus that executes the mask generation instruction: 41 is an object program, 42 is the object program execution apparatus, 43 is a storage unit that holds instructions, operand data, and the like, 44 is a mask data creation unit that executes the mask generation instruction (vgsm) and outputs mask data, 45 is a vector operation unit that executes vector operations with mask data, and 46 is a vector operation result holding unit such as a register.

[0044] FIG. 13 is an explanatory diagram outlining a computer system that reads a program from a computer-readable recording medium and executes it: 51 is the computer system, 52 is a main unit containing a CPU, a disk drive, and the like, 53 is a display that shows images as instructed by the main unit 52, 54 is the display screen, 55 is a keyboard for entering information into the computer system 51, 56 is a mouse for pointing to any position on the display screen 54, 57 is an external database (memory at the other end of a line, such as a DASD), 58 is a modem for accessing the external database 57, and 59 is a portable storage medium such as a CD-ROM or a floppy disk.

[0045] The storage medium holding the program may be the database 57 on the program provider's side (memory at the other end of a line), the portable storage medium 59, the memory in the main unit 52, or the like. The program is loaded into the main unit 52 and executed in its main memory.

[0046]

Effects of the Invention: Because the present invention compiles using the mask creation instruction vgsm and executes the mask creation instruction in the object program, it can speed up the vector operations with mask data that accompany loop collapsing and loop fusion.

[0047] Moreover, when the mask creation instruction is applied to loop fusion, optimizations such as common subexpression elimination and instruction scheduling on the program after fusion can further improve the effective performance of the vector computer.

[Brief description of the drawings]

FIG. 1 is an explanatory diagram showing the mask generation instruction of the present invention.

FIG. 2 is an explanatory diagram showing the concept of the mask generation instruction in loop collapsing according to the present invention.

FIG. 3 is an explanatory diagram outlining the compile procedure for the double loop of FIG. 2.

FIG. 4 is an explanatory diagram showing an example of loop collapsing using the vgsm instruction according to the present invention.

FIG. 5 is a configuration diagram of the compiler for loop collapsing according to the present invention.

FIG. 6 is an explanatory diagram outlining the compile procedure for loop collapsing according to the present invention.

FIG. 7 is an explanatory diagram showing the source program format assumed in the description of loop fusion according to the present invention.

FIG. 8 is an explanatory diagram outlining the compile procedure for the parallel loops of FIG. 7.

FIG. 9 is an explanatory diagram showing an example of loop fusion using the vgsm instruction according to the present invention.

FIG. 10 is a configuration diagram of the compiler for loop fusion according to the present invention.

FIG. 11 is an explanatory diagram outlining the compile procedure for loop fusion according to the present invention.

FIG. 12 is an explanatory diagram showing an object program execution device that executes the mask generation instruction according to the present invention.

FIG. 13 is an explanatory diagram outlining a computer system that reads a program from a computer-readable recording medium and executes it, according to the present invention.

FIG. 14 is an explanatory diagram outlining a conventional compilation process for loop unification with a mask.

FIG. 15 is an explanatory diagram showing the processing procedure corresponding to the compiled code of FIG. 14(b).

FIG. 16 is an explanatory diagram showing the instruction image corresponding to the compiled code of FIG. 14(b).

[Explanation of symbols]

In FIGS. 1 and 2:
1: basic pattern
gr1: value (register) specifying the number of false values (0) in the basic pattern
gr2: value (register) specifying the total number of elements in the basic pattern
gr3: value (register) specifying the start position of the mask data
2: portion subject to loop processing
3: element corresponding to the start position of the basic pattern
4: element corresponding to the position specified by gr1
5: element corresponding to the position specified by gr3
6: element at the position specified by gr2, that is, the last element of the basic pattern
7: element that follows element 5 by one vector length
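Taken together, these symbols fully determine the mask. The Python sketch below models that behavior in software, for illustration only. The name `vgsm` follows the document. The 0-based indexing of gr3, the explicit vector-length argument `vl`, and the `leading` parameter are assumptions. The `leading` parameter covers the claims' variant in which the first data portion holds true values. On the hardware described, a single instruction would write the equivalent result to a mask register.

```python
def vgsm(gr1: int, gr2: int, gr3: int, vl: int, leading: int = 0) -> list[int]:
    """Hypothetical software model of the vgsm mask generation instruction.

    Basic pattern (length gr2): gr1 copies of `leading`, then gr2 - gr1
    copies of the opposite value. The pattern repeats indefinitely, and the
    vl elements starting at position gr3 are returned as the mask.
    Assumptions: gr3 is 0-based and vl is passed explicitly.
    """
    if gr2 <= 0 or not 0 <= gr1 <= gr2:
        raise ValueError("require gr2 > 0 and 0 <= gr1 <= gr2")
    if leading not in (0, 1):
        raise ValueError("leading must be 0 (false) or 1 (true)")
    if gr3 < 0 or vl < 0:
        raise ValueError("gr3 and vl must be non-negative")
    trailing = 1 - leading
    # Position p of the repeated sequence lies at offset p % gr2 within its
    # basic pattern; offsets below gr1 belong to the first data portion.
    return [leading if (gr3 + k) % gr2 < gr1 else trailing for k in range(vl)]


if __name__ == "__main__":
    print(vgsm(gr1=1, gr2=4, gr3=0, vl=8))  # [0, 1, 1, 1, 0, 1, 1, 1]
    print(vgsm(gr1=1, gr2=4, gr3=2, vl=6))  # [1, 1, 0, 1, 1, 1]
```

With gr1 = 1 and gr2 = 4, the model yields one false value in every group of four elements. That is the mask needed when an inner loop of extent 4 skips its first iteration.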

Claims

[Claims]

1. A compiling method, being a data processing method that outputs a mask generation instruction by compiling a specific portion of a source program corresponding to a vector operation with mask data. The method uses, as the mask generation instruction, an instruction that sets a predetermined range of a data sequence as the mask data. The data sequence is composed of a basic pattern consisting of a first data portion and a second data portion. In the first data portion, one of a false value and a true value continues for a predetermined number of elements. In the second data portion, the other of the false value and the true value continues for a predetermined number of elements.

2. The compiling method according to claim 1, wherein the vector operation relates to processing that converts multiple loops into a single loop.

3. The compiling method according to claim 1, wherein the vector operation relates to fusion processing of parallel loops.

4. A compiling device, being a data processing apparatus that outputs a mask generation instruction by compiling a specific portion of a source program corresponding to a vector operation with mask data. The device comprises a mask generation instruction output unit that outputs, for the vector operation instruction of the specific portion, the mask generation instruction. The mask generation instruction sets a predetermined range of a data sequence as the mask data. The data sequence is composed of a basic pattern consisting of a first data portion and a second data portion. In the first data portion, one of a false value and a true value continues for a predetermined number of elements. In the second data portion, the other of the false value and the true value continues for a predetermined number of elements.

5. The compiling device according to claim 4, further comprising a masked unification execution unit that outputs loop operation instructions in a form in which multiple-loop processing is converted into single-loop processing with mask data.

6. The compiling device according to claim 4, further comprising a masked fusion execution unit that outputs loop operation instructions in a form optimized by fusing parallel loop processing.

7. A computer-readable program storage medium storing a program that causes a computer to realize a data processing function. The function outputs a mask generation instruction by compiling a specific portion of a source program corresponding to a vector operation with mask data. The mask generation instruction sets a predetermined range of a data sequence as the mask data. The data sequence is composed of a basic pattern consisting of a first data portion and a second data portion. In the first data portion, one of a false value and a true value continues for a predetermined number of elements. In the second data portion, the other of the false value and the true value continues for a predetermined number of elements.

8. An object program execution method, being a data processing method that executes a vector operation with mask data based on a mask generation instruction in an object program. The mask generation instruction is an instruction that sets a predetermined range of a data sequence as the mask data. The data sequence is composed of a basic pattern consisting of a first data portion and a second data portion. In the first data portion, one of a false value and a true value continues for a predetermined number of elements. In the second data portion, the other of the false value and the true value continues for a predetermined number of elements.

9. An object program execution device, being a data processing apparatus that executes a vector operation with mask data based on a mask generation instruction in an object program. The device comprises at least a mask data creation unit that executes the mask generation instruction. The mask generation instruction sets a predetermined range of a data sequence as the mask data. The data sequence is composed of a basic pattern consisting of a first data portion and a second data portion. In the first data portion, one of a false value and a true value continues for a predetermined number of elements. In the second data portion, the other of the false value and the true value continues for a predetermined number of elements.

10. A computer-readable program storage medium storing a program that causes a computer to realize a function of executing a mask generation instruction. The program is used for data processing that executes a vector operation with mask data based on the mask generation instruction in an object program. The mask generation instruction sets a predetermined range of a data sequence as the mask data. The data sequence is composed of a basic pattern consisting of a first data portion and a second data portion. In the first data portion, one of a false value and a true value continues for a predetermined number of elements. In the second data portion, the other of the false value and the true value continues for a predetermined number of elements.
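The following minimal Python sketch illustrates the execution side (claims 8 to 10) together with the single-loop conversion of claim 2. It works under stated assumptions. A hypothetical double loop updates a column-major n-by-m array, and its inner loop skips the first `skip` rows of each column. The sketch runs this loop as one collapsed loop, processed in strips of at most `vlmax` elements; `vlmax` stands in for the machine's maximum vector length. For each strip, one call to a vgsm-like function, with gr3 set to the strip's starting position, yields that strip's mask directly. The function names and the strip-mining scheme are illustrative assumptions, not the patent's object code.

```python
def vgsm(gr1, gr2, gr3, vl):
    """Hypothetical model: mask bits for positions gr3 .. gr3+vl-1 of the
    repeated pattern [0] * gr1 + [1] * (gr2 - gr1), 0-based."""
    return [1 if (gr3 + k) % gr2 >= gr1 else 0 for k in range(vl)]


def double_loop(a, b, c, n, m, skip):
    """Original nest: for each column j, update rows skip..n-1."""
    for j in range(m):
        for i in range(skip, n):
            x = j * n + i          # column-major linear index
            a[x] = b[x] + c[x]


def collapsed_masked_loop(a, b, c, n, m, skip, vlmax=8):
    """Single loop over all n*m elements, in strips of at most vlmax."""
    total = n * m
    for start in range(0, total, vlmax):
        vl = min(vlmax, total - start)
        mr = vgsm(gr1=skip, gr2=n, gr3=start, vl=vl)  # one mask per strip
        for k in range(vl):        # models one masked vector add
            if mr[k]:
                a[start + k] = b[start + k] + c[start + k]


if __name__ == "__main__":
    n, m, skip = 5, 3, 2
    b = list(range(n * m))
    c = [10] * (n * m)
    a1 = [-1] * (n * m)
    a2 = [-1] * (n * m)
    double_loop(a1, b, c, n, m, skip)
    collapsed_masked_loop(a2, b, c, n, m, skip, vlmax=4)
    print(a1 == a2)  # True
```

The strip length does not need to divide n. Because the mask depends only on each element's absolute position, advancing gr3 by vl per strip keeps the pattern aligned with column boundaries.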