JP3196625B2

JP3196625B2 - Parallel compilation method

Info

Publication number: JP3196625B2
Application number: JP34489395A
Authority: JP
Inventors: 淳嗣酒井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1995-12-06
Filing date: 1995-12-06
Publication date: 2001-08-06
Anticipated expiration: 2015-12-06
Also published as: JPH09160784A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a compiler that takes a source program as input and outputs an object program. More specifically, it relates to a parallelizing compilation method for a processor whose instruction set applies the same kind of operation to each of multiple fields within a word, the method generating an object program that uses that instruction set.

[0002]

[Prior Art] The vector processor is one kind of single processor that performs the same kind of operation on multiple data items. A vector processor applies the same kind of operation to each element of array data in a pipelined manner. The array data subject to a vector operation may be placed at any location in storage. In general, a section to which vector instructions are applied runs tens of times faster, or more, than the same section executed with ordinary scalar instructions. For this reason, various compilers have been proposed that take a source program written on the assumption of scalar execution and generate an object program for a vector processor that uses vector instructions; these are known as vectorizing compilers.

[0003] A vectorizing compiler closely examines the array manipulation portions of the source program, which are written mainly as loops, and converts them into vector instruction sequences wherever possible. Because this conversion may change the execution order of the operations from their order in the source program, the vectorizing compiler performs dependence analysis so that the conversion does not produce incorrect results. Dependence analysis examines the definition-reference relationships of variables such as array elements; details are given, for example, in Chapter 4 of "Supercompilers for Parallel and Vector Computers" (H. Zima et al., 1991, Addison-Wesley Publishing Company). In addition, because a vector instruction takes time to start up, performance degrades when the array to be processed is short. A vectorizing compiler therefore decides whether to vectorize by considering whether the target array is long enough.

[0004] On the other hand, as another type of single processor that performs the same kind of operation on multiple data items, processors have been proposed whose instruction set applies the same kind of operation to multiple fields within a word. In this specification, such an instruction is called a bit-slice operation instruction, and a processor whose instruction set includes bit-slice operation instructions is called a bit-slice operation processor. One example of a bit-slice operation instruction, shown in FIG. 3, treats a one-word, 64-bit data block as a set of four 16-bit fields and performs an addition on each field. In FIG. 3, registers 6 and 7 are the sources of the operation and register 8 is its destination; all three are 64 bits long. Arithmetic unit 9 in the processor consists of four equivalent arithmetic subunits 91, 92, 93, and 94, each of which independently performs the same kind of 16-bit operation. The bit-slice operation instruction stores in the first field D₁ of register 8 the result of the operation whose sources are the first field S₁₁ of register 6 and the first field S₂₁ of register 7, stores in the second field D₂ of register 8 the result of the operation whose sources are the second field S₁₂ of register 6 and the second field S₂₂ of register 7, and processes the third and fourth fields in the same way.
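The field-wise addition of FIG. 3 can be imitated in software. The following Python sketch models one 64-bit word as four 16-bit fields; the function names, the placement of the first field in the low-order bits, and the wraparound on overflow are all assumptions made for illustration, since the text specifies none of them.

```python
FIELD_BITS = 16                      # width of one field, as in FIG. 3
FIELDS = 4                           # fields per 64-bit word
FIELD_MASK = (1 << FIELD_BITS) - 1


def pack(values):
    """Pack four 16-bit values into one word (assumed: first field in the low bits)."""
    word = 0
    for i, value in enumerate(values):
        word |= (value & FIELD_MASK) << (i * FIELD_BITS)
    return word


def unpack(word):
    """Split one 64-bit word back into its four 16-bit fields."""
    return [(word >> (i * FIELD_BITS)) & FIELD_MASK for i in range(FIELDS)]


def bitslice_add(src1, src2):
    """Add two words field by field; a carry never crosses a field boundary.

    Overflow wraps within the field here, which is an assumption.
    """
    sums = [(a + b) & FIELD_MASK for a, b in zip(unpack(src1), unpack(src2))]
    return pack(sums)
```

Each list element plays the role of one arithmetic subunit (91 to 94): the four additions are independent, so an overflow in one field leaves its neighbors untouched.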

[0005] A bit-slice operation completes in almost the same time as an ordinary word-to-word operation and, unlike a vector instruction, incurs no start-up delay. For this reason, attempts have been made to apply it to fields such as real-image processing and computer graphics, where high-precision arithmetic is not always required.

[0006]

[Problems to Be Solved by the Invention] Until now, however, there has been no compiler that generates an object program for a bit-slice operation processor from a source program written for an ordinary processor. Programmers have therefore had to code the object program directly in assembly language or machine language.

[0007] The present invention has been proposed in view of these circumstances. Its object is to provide a parallelizing compilation method that can generate an object program for a bit-slice operation processor from a source program.

[0008] Bit-slice operations and vector operations are similar in that both are usually written as array manipulations in the source program, so a compiler for bit-slice operation programs can borrow some of the techniques and ideas of vectorizing compilers. For example, using bit-slice operation instructions also changes the order of operations from that of the original source program, so the dependence analysis described above for vectorizing compilers is equally essential when generating bit-slice operation instructions. The two kinds of operation nevertheless execute very differently. For example, array data subject to a vector operation may be placed anywhere in storage, whereas for a bit-slice operation, accesses to storage must be aligned on word boundaries. That is, the start address of the data to be loaded into registers 6 and 7 of FIG. 3 must lie on a word boundary. If it does not, extra processing such as realignment is required. Because bit-slice operation instructions usually improve execution speed by only a factor of a few, the overhead of that extra processing cancels the gain, and an efficient object program cannot be obtained. Owing to the characteristics of vector operations described earlier, a vectorizing compiler considers whether the target array is long enough but does not consider how the array is laid out in storage. Moreover, because vectorization yields a large speedup, vectorizing compilers adopt a policy of vectorizing even if the number of instruction steps increases somewhat.

[0009] Thus a compiler that generates object programs for a bit-slice operation processor resembles a vectorizing compiler in that it converts the portions of the source program that are written as loops. However, because of the differences between vector instructions and bit-slice operation instructions described above, applying a vectorization method unchanged to the generation of bit-slice operation instructions does not yield an efficient object program: the speed gained from the bit-slice operation instructions is cancelled by the overhead of storage accesses and program transformation. In particular, in fields where bit-slice operation instructions are used for low-precision parallel computation, the start address of the data needed for a computation often changes dynamically and does not necessarily fall on a word boundary, so replacing code uniformly with bit-slice operation instructions is problematic.

[0010] Accordingly, another object of the present invention is to provide a parallelizing compilation method that takes the characteristics of bit-slice operation instructions into account and proceeds with compilation while determining whether the code can actually be executed efficiently with bit-slice operation instructions.

[0011]

[Means for Solving the Problems] The present invention is a compiler for a processor having bit-slice operation instructions that divide a single word into multiple fields and apply the same kind of operation to each field. The compiler comprises loop unrolling means for unrolling a loop that describes array manipulation, code sequence reordering means for reordering the unrolled instruction sequence while preserving dependences, and bit-slice code generation means for generating bit-slice operation instructions from the reordered instruction sequence by pattern matching.

[0012] The loop unrolling means is configured to assign a first sequence number to the code sequence before unrolling and, when unrolling by the number of fields of the bit-slice operation instruction, to assign to the code sequence generated by the k-th unrolling a second sequence number corresponding to k. The code sequence reordering means is configured to reorder the instruction sequence in the loop using the first sequence number as the first key and the second sequence number as the second key, while preserving the definition-reference relationships of the variables that appear in the loop.

[0013] Furthermore, the bit-slice code generation means is configured so that, even when an instruction sequence matches a bit-slice operation instruction, it does not convert the sequence into bit-slice operation instructions if the start address of the data accessed in that sequence does not lie on a word boundary.

[0014] Still further, a preferred embodiment of the present invention comprises a syntax analysis unit that reads the source program and parses it to generate an intermediate code sequence; an optimization unit that detects loop structures in the intermediate code sequence and converts it into an intermediate code sequence structured for parallel processing with bit-slice operation instructions; and a code generation unit that generates and outputs the object program from the converted intermediate code sequence. The optimization unit in turn comprises loop unrolling means that detects, in the intermediate code sequence, a loop structure to be parallelized and unrolls it by the number of fields of the bit-slice operation instruction; code sequence reordering means that reorders the unrolled code sequence so that codes performing the same kind of operation are adjacent, within the range that does not change the definition-reference relationships of the variables defined in the loop structure; and bit-slice code generation means that detects, by pattern matching on the reordered code sequence, code patterns that match bit-slice operation instructions and converts them into the corresponding bit-slice data type intermediate code.

[0015] In the parallelizing compilation method of the present invention configured as described above, the loop unrolling means first unrolls a loop structure in the source program. The code sequence reordering means then reorders the code sequence in the loop to the extent that the correctness of the results is preserved. Finally, the bit-slice code generation means extracts, from the reordered instruction sequence, patterns that match bit-slice operation instructions and generates those instructions.

[0016] The number of times the loop is unrolled is matched to the number of fields of the bit-slice operation instructions of the processor that will execute the object program output by this compiler. For example, if the processor has an instruction that operates on four fields in parallel as shown in FIG. 3, the loop is unrolled four times.

[0017] The unrolled code sequence is reordered so that codes performing the same kind of operation are adjacent. Here, operations of the same kind are intermediate codes that have the same opcode and differ only in their operands. Instructions are reordered only to the extent that the program's results do not change from those of the original program, with reference to the definition-reference relationships of variables obtained by dependence analysis. To carry out this reordering, in a preferred embodiment of the present invention the loop unrolling means assigns a first sequence number to the code sequence before unrolling and, during unrolling, assigns to the code sequence generated by the k-th unrolling a second sequence number corresponding to k. The code sequence reordering means then reorders the instruction sequence in the loop using the first sequence number as the first key and the second sequence number as the second key, while preserving the definition-reference relationships of the variables that appear in the loop.

[0018] After reordering, operations of the same kind appear next to one another in the code sequence, so code patterns that can be converted into bit-slice operation instructions are easy to detect. A detected code pattern is represented internally as an intermediate code sequence that uses a bit-slice data type rather than a simple integer type or array element type. In an intermediate code sequence expressed with the bit-slice data type, two things are explicit: the correspondence with the bit-slice operation instructions to be generated as object code, and the relationship between the storage addresses accessed and the loop control variable. This makes it easy to apply controls such as declining to convert an instruction sequence into bit-slice operation instructions, even though it matches one, when the start address of the data it accesses does not lie on a word boundary. An efficient object program can therefore be generated.

[0019]

[Embodiments of the Invention] Examples of embodiments of the present invention will now be described in detail with reference to the drawings.

[0020] FIG. 1 is a block diagram of one embodiment of the present invention. Compiler A of this embodiment takes as input a source program B written in a high-level language such as C, generates and outputs an object program C for a bit-slice operation processor, and produces an intermediate code sequence D in the course of that processing. Compiler A consists of a syntax analysis unit E, which reads source program B and parses it to generate intermediate code sequence D; an optimization unit F, which detects loop structures in intermediate code sequence D and converts it into an intermediate code sequence D structured for parallel processing with bit-slice operation instructions; and a code generation unit G, which generates and outputs object program C from the converted intermediate code sequence D.

[0021] FIG. 2 is a block diagram showing an example configuration of optimization unit F. In this example, optimization unit F has a preprocessing device 1, a loop unrolling device 2, a code sequence reordering device 3, and a bit-slice code generation device 5 connected in that order, and code sequence reordering device 3 is also connected to a dependence analysis device 4.

[0022] Source program B given to compiler A is converted by syntax analysis unit E into a suitable intermediate code sequence D and then input to preprocessing device 1 of optimization unit F.

[0023] In addition to conventional scalar optimizations, preprocessing device 1 performs loop normalization and induction variable substitution. Loop normalization transforms the loop structure so that the loop control variable increases by 1 from 0 up to some natural number N. Induction variable substitution replaces each occurrence of an induction variable in the loop with a linear expression in the loop control variable. Conventional scalar optimization and induction variables are described, for example, in Chapters 9 and 10 of "Compilers: Principles, Techniques, and Tools" (A. V. Aho et al., 1986, Addison-Wesley Publishers).
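The effect of these two preprocessing steps can be illustrated with a small Python sketch. It is not the compiler's implementation: the function names, the starting value 5, and the increment 3 of the induction variable `p` are invented for the example, and a positive step is assumed.

```python
def original_loop(lo, hi, step):
    """Loop as written: i runs lo, lo+step, ...; p is an induction variable."""
    trace = []
    p = 5
    for i in range(lo, hi, step):
        trace.append((i, p))
        p += 3                     # p advances by a constant every iteration
    return trace


def normalized_loop(lo, hi, step):
    """Same loop after normalization and induction variable substitution."""
    trace = []
    trip_count = max(0, (hi - lo + step - 1) // step)   # assumes step > 0
    for n in range(trip_count):    # control variable now counts 0, 1, 2, ...
        i = lo + step * n          # original control variable, linear in n
        p = 5 + 3 * n              # induction variable as a linear form in n
        trace.append((i, p))
    return trace
```

Once every subscript is a linear expression in a single counter `n`, the address of each array access in a given iteration can be computed directly, which is what the later alignment and access-interval checks rely on.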

[0024] Loop unrolling device 2 searches the intermediate code sequence output by preprocessing device 1 for loop structures to be parallelized and unrolls each of them U times, where U is the number of fields of the target processor's bit-slice operation instructions. The criterion for deciding whether a loop is a candidate for parallelization need not be exact. For example, the decision may be based on the conditions that the loop structure contains array manipulation and contains no procedure calls or function calls that are unsuited to parallelization. The exact determination of whether the loop structure is suited to parallelization with bit-slice operation instructions is made at a later stage.

[0025] If loop-independent simple variables are used in the loop, they can be expanded into arrays at this point; doing so can improve the execution efficiency of the bit-slice operation instructions. The expansion of simple variables into arrays (scalar expansion) is described, for example, in Section 6.5 of the aforementioned "Supercompilers for Parallel and Vector Computers".
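As an illustration of expanding a simple variable into an array, the Python sketch below shows a loop before and after the transformation. The loop body and the names `t`, `a`, `b`, and `out` are hypothetical; the text gives no concrete example at this point.

```python
def with_scalar(a, b):
    """Before expansion: the temporary t is redefined in every iteration."""
    out = [0] * len(a)
    for i in range(len(a)):
        t = a[i] + b[i]
        out[i] = t * 2
    return out


def with_expanded_scalar(a, b):
    """After expansion: t has one slot per iteration, like any other array."""
    n = len(a)
    t = [0] * n
    out = [0] * n
    for i in range(n):
        t[i] = a[i] + b[i]
        out[i] = t[i] * 2
    return out
```

A plausible reading of why this helps: after unrolling, the copies of `t` become neighboring elements `t[i]`, `t[i+1]`, and so on, which can be treated as the fields of one word instead of separate scalars that each need their own register.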

[0026] Code sequence reordering device 3 reorders the unrolled intermediate code sequence so that codes performing the same kind of operation are adjacent within it. When reordering, it refers to the analysis results of dependence analysis device 4 and guarantees that the results of the program after reordering are identical to those before. That is, it reorders only within the range that does not change the definition-reference relationships of the variables defined in the loop.

[0027] Dependence analysis device 4 examines the definition-reference relationships of the instructions in the loop. The analysis technique may be one conventionally used in vectorizing compilers and the like. However, because the instruction reordering performed by code sequence reordering device 3 is confined to the body of the unrolled loop, it suffices to examine definition-reference relationships within (U-1) iterations before and after a given loop iteration.
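The (U-1) bound can be made concrete with a deliberately simplified Python sketch. It assumes a single array whose subscripts have the form `i + constant`, so the dependence distance is just the difference of the two constants; the function name and this restriction are assumptions, and real dependence analysis handles far more general subscripts.

```python
def constrains_reordering(write_offset, read_offset, unroll):
    """Can a write to a[i + write_offset] and a read of a[i + read_offset]
    touch the same element inside one unrolled body of `unroll` iterations?

    Only such pairs restrict reordering within the unrolled body.
    """
    distance = write_offset - read_offset   # iterations between write and read
    return abs(distance) <= unroll - 1
```

With U = 4, a loop that writes `a[i+1]` and reads `a[i]` has distance 1, so both accesses can fall in the same unrolled body and their order must be kept. A loop that writes `a[i+4]` and reads `a[i]` has distance 4; the two accesses always land in different unrolled bodies, so the pair does not limit reordering inside a body.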

[0028] Bit-slice code generation device 5 scans the reordered intermediate code sequence, detects intermediate code patterns that can match bit-slice operation instructions, and converts them into the corresponding bit-slice data type intermediate code. As a result of the reordering, the patterns that match bit-slice operation instructions lie at adjacent positions in the intermediate code sequence, so they are easy to detect.
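Because matching codes are adjacent after reordering, detection can be a single linear scan. The Python sketch below shows one possible form of that scan; the `(k1, k2, opcode)` tuple layout and the function name are hypothetical stand-ins for the compiler's intermediate code, with `k1` and `k2` being the two sequence numbers assigned during unrolling.

```python
from itertools import groupby


def find_bitslice_groups(codes, fields):
    """Return the K1 values whose `fields` unrolled copies sit side by side.

    codes: reordered list of (k1, k2, opcode) tuples (assumed layout).
    A run qualifies when it has a single opcode and copies 1..fields in order.
    """
    matches = []
    # groupby only merges *consecutive* items, so copies that the reordering
    # could not bring together form separate runs and fail the check below.
    for k1, run in groupby(codes, key=lambda code: code[0]):
        run = list(run)
        same_opcode = len({opcode for _, _, opcode in run}) == 1
        copies = [k2 for _, k2, _ in run]
        if same_opcode and copies == list(range(1, fields + 1)):
            matches.append(k1)
    return matches
```

A run that passes this scan is only a candidate: whether it is actually converted still depends on the instruction-availability and memory-layout checks described in the text.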

[0029] Along with instruction pattern detection, bit-slice code generation device 5 checks the following items.
(a) A bit-slice operation instruction exists that operates on the intermediate code sequence at its bit length.
(b) When array variables stored in main memory or the like are accessed, the start address of the data lies on an address boundary (alignment) suitable for the bit-slice operation instruction. Here, the start address of the data is computed from the start address at which the array variable is placed and from the array subscript expression, which is a linear expression in the loop control variable and similar terms.
(c) In case (b), the access interval (stride) of the data elements equals a divisor of the number of fields of the target processor's bit-slice operation instructions, and is preferably 1.
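Checks (a) to (c) can be sketched as one predicate. In the Python sketch below, the function name, the parameters, and the default values (four fields, 8-byte words, only 16-bit operations supported) are assumptions chosen to match the FIG. 3 example, not values prescribed by the text.

```python
def can_use_bitslice(element_bits, base_address, first_index, stride,
                     fields=4, word_bytes=8, supported_bits=(16,)):
    """Decide whether a matched pattern may become a bit-slice instruction.

    element_bits  : operand width of the intermediate codes
    base_address  : address where the array variable starts
    first_index   : value of the linear subscript at the first copy
    stride        : access interval between consecutive elements
    """
    # (a) an instruction for this operand width must exist
    if element_bits not in supported_bits:
        return False
    # (b) the first element accessed must lie on a word boundary
    start_address = base_address + first_index * (element_bits // 8)
    if start_address % word_bytes != 0:
        return False
    # (c) the access interval must divide the field count (1 is preferred)
    if stride < 1 or fields % stride != 0:
        return False
    return True
```

For instance, a 16-bit array placed at address 0x1000 passes when it is read from index 0 with interval 1, but fails check (b) when read from index 1, because the first element then sits at 0x1002.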

[0030] Item (a) is checked because the replacement is impossible if no such bit-slice operation instruction exists. Item (b) is checked because, unless that condition is met, execution is not efficient even if bit-slice operation instructions are actually used. Item (c) is checked for two reasons: bit-slice operation instructions cannot be applied unless the access interval of the data elements is a divisor of the number of fields, and even when it is a divisor, a large value prevents efficient execution even if bit-slice operation instructions are actually used.

[0031] Depending on the bit-slice operation instruction set of the target processor, however, some further check items may be added.

[0032] If the target loop partly contains intermediate code sequences that cannot be converted into bit-slice data type intermediate code, conversion of that loop structure into bit-slice data type intermediate code is abandoned, because mixing bit-slice operations with ordinary operations incurs overhead at execution time. However, if the convertible part and the non-convertible part of the loop are each contiguous, the analysis results of dependence analysis device 4 are consulted again and loop distribution (splitting the loop) is attempted. Loop distribution is described, for example, in Section 6.2 of the aforementioned "Supercompilers for Parallel and Vector Computers".
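The decision described in this paragraph can be summarized as a small Python sketch. The function name, the list-of-flags input, and the returned labels are hypothetical; the sketch covers only the contiguity test, not the dependence check that must still approve an actual loop distribution.

```python
def plan_conversion(convertible):
    """Choose how to treat a loop body.

    convertible: one boolean per intermediate code, in body order, telling
    whether that code can become bit-slice data type intermediate code.
    """
    if all(convertible):
        return "convert"
    # Count places where a convertible code neighbors a non-convertible one.
    changes = sum(1 for a, b in zip(convertible, convertible[1:]) if a != b)
    if changes == 1:
        # Both parts are contiguous: distribution is worth attempting.
        return "try_distribution"
    return "abandon"
```

A body flagged `[True, True, False]` has one boundary, so it could be split into a bit-slice loop followed by an ordinary loop. A body flagged `[True, False, True]` interleaves the two kinds of code and is left unconverted.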

[0033] Next, the method by which loop unrolling device 2 and code sequence reordering device 3 reorder intermediate code using two sort keys is described. FIG. 4 shows the flow of processing from the point at which a loop structure expressed in intermediate code is given until it is converted into bit-slice data type intermediate code. The procedure of FIG. 4 is divided broadly into two parts: loop unrolling process 20 and code sequence reordering process 30. Loop unrolling process 20 is performed by loop unrolling device 2 of FIG. 2, and code sequence reordering process 30 is performed by code sequence reordering device 3 of FIG. 2. The procedure is described below with reference to the code examples of FIGS. 5 through 9.

[0034] FIG. 5 is an example of a source program written in C. FIG. 6 shows the part that forms the body of the for loop, extracted from the intermediate code sequence output by preprocessing device 1 when the program of FIG. 5 is given to this compiler. t₁, t₂, and t₃ are intermediate terms that temporarily hold operation results, and a left arrow denotes assignment of the result of the right-hand side to the left-hand side.

[0035] In loop unrolling process 20, it is first determined whether the loop is suited to conversion for bit-slice operations (21). This determination is made to improve compilation speed by excluding from processing those loops for which conversion is entirely impossible or pointless. The following processing is applied only to loop structures judged to be conversion candidates.

【００３６】For unrolling, two kinds of sequence numbers are assigned to each intermediate code that makes up the loop body. First, a sequence number K₁ is assigned to each intermediate code of the loop body (22). This number is assigned uniquely, in ascending order from the upstream side of the intermediate code sequence to the downstream side. In the example of FIG. 6, the values in the first column are the sequence numbers K₁.

【００３７】Next, the loop is unrolled U times, and each intermediate code generated by the k-th unrolling is assigned the sequence number k (23). This sequence number is denoted K₂, so that K₂ = k. As a result of the unrolling, each intermediate code carries two sequence numbers, K₁ and K₂. FIG. 7 shows the result of applying this unrolling procedure to the program of FIG. 5.
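The numbering in steps (22) and (23) can be sketched in Python as follows. This is only an illustration under stated assumptions: the contents of FIGS. 5 to 7 are not reproduced in this text, so the four-line loop body, the `Code` record, the `unroll` function, and the per-copy renaming of temporaries are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Code:
    k1: int    # K1: position within the original loop body (step 22)
    k2: int    # K2: index k of the unrolled copy that produced it (step 23)
    text: str  # intermediate code as text (hypothetical notation)

def unroll(body, u):
    """Emit u copies of the loop body, tagging every code with (K1, K2)."""
    out = []
    for k in range(1, u + 1):
        for k1, template in enumerate(body, start=1):
            # {i} is the element index, {k} keeps temporaries of each copy apart
            out.append(Code(k1, k, template.format(i=f"i+{k - 1}", k=k)))
    return out

# Hypothetical loop body standing in for FIG. 6: c[i] = a[i] + b[i]
body = [
    "t1_{k} <- a[{i}]",
    "t2_{k} <- b[{i}]",
    "t3_{k} <- t1_{k} + t2_{k}",
    "c[{i}] <- t3_{k}",
]
unrolled = unroll(body, 4)
```

Immediately after unrolling, the codes are ordered by K₂ first and K₁ second, which is the opposite of the order wanted for packing several iterations into one word.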

【００３８】The code sequence rearranging process 30 begins with a dependency analysis 31. The purpose of this analysis is to determine how far the order of the code sequence within the unrolled loop body can be changed, so an analysis that does not span loop iterations is sufficient.

【００３９】Next, within the range that does not change the result of the computation, the intermediate codes in the loop are sorted in ascending order with K₁ as the first key and K₂ as the second key (32). This rearranges the intermediate codes in the loop into the order in which the operations would be performed if they were executed with bit slice operation instructions. FIG. 8 shows the code sequence of FIG. 7 after rearrangement. In the example of FIG. 7 no dependency prevents the rearrangement, so in FIG. 8 the codes are sorted completely into ascending key order.
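One way to sort on (K₁, K₂) without violating dependencies is a topological sort that always emits the ready code with the smallest key. The sketch below takes that approach; it is an assumption made for illustration, not necessarily the procedure used in the embodiment. Each code is a hypothetical tuple `(k1, k2, defs, uses)`, and a code depends on an earlier one when there is a flow, anti, or output dependence between them.

```python
import heapq

def reorder(codes):
    """Sort codes by (K1, K2) as far as dependencies allow.

    codes: list of (k1, k2, defs, uses) in program order, where defs and
    uses are sets of names written and read by the code.
    """
    n = len(codes)
    succs = [[] for _ in range(n)]
    pending = [0] * n  # number of unemitted predecessors per code
    for j in range(n):
        _, _, dj, uj = codes[j]
        for i in range(j):
            _, _, di, ui = codes[i]
            # flow (write then read), anti (read then write), output (write then write)
            if di & uj or ui & dj or di & dj:
                succs[i].append(j)
                pending[j] += 1
    ready = [(codes[i][0], codes[i][1], i) for i in range(n) if pending[i] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, _, i = heapq.heappop(ready)
        order.append(codes[i])
        for j in succs[i]:
            pending[j] -= 1
            if pending[j] == 0:
                heapq.heappush(ready, (codes[j][0], codes[j][1], j))
    return order

# Two unrolled copies with separate temporaries: fully sortable.
free = [
    (1, 1, {"t1_1"}, {"a[i]"}),   (2, 1, {"c[i]"}, {"t1_1"}),
    (1, 2, {"t1_2"}, {"a[i+1]"}), (2, 2, {"c[i+1]"}, {"t1_2"}),
]
# Same codes sharing one temporary t1: the dependencies pin the original order.
shared = [
    (1, 1, {"t1"}, {"a[i]"}),   (2, 1, {"c[i]"}, {"t1"}),
    (1, 2, {"t1"}, {"a[i+1]"}), (2, 2, {"c[i+1]"}, {"t1"}),
]
free_keys = [(c[0], c[1]) for c in reorder(free)]
shared_keys = [(c[0], c[1]) for c in reorder(shared)]
```

In the first input the result is in full key order, as in the FIG. 8 case described above; in the second, reuse of a single temporary forces the original order to be kept.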

【００４０】After the rearrangement, the efficiency of execution with bit slice operation instructions is examined, and conversion is abandoned for loops found unsuitable (33). The items checked when estimating execution efficiency were described earlier. One of them is whether, when an array stored in main memory or the like is accessed, the starting address of the data lies on an access boundary (alignment) suitable for bit slice operation instructions. As noted earlier, for an array variable the starting address of the data is computed from the address at which the array variable is placed and from the array subscript expression, which is a linear expression of the loop control variable and similar quantities. If the address at which the array variable is placed is unknown at this stage, the starting address of the data may be computed, and the check made, on the assumption that later code generation will place the first array element on a word boundary. If part of the intermediate code sequence cannot be converted to bit slice data type intermediate code, either the conversion to bit slice operation instructions is abandoned entirely or the loop splitting process described earlier is attempted.
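The alignment check can be illustrated with a small Python sketch. The function name, its parameters, and the numeric values (2-byte elements, an 8-byte word) are assumptions chosen for the example; the text only states that the starting address is derived from the array's placement address and a linear subscript expression, and that an unknown placement address is treated as word aligned.

```python
def start_is_aligned(elem_bytes, coef, const, i0, word_bytes, base=None):
    """Check whether the first element accessed lies on a word boundary.

    The subscript is assumed to be coef * i + const, and i0 is the initial
    value of the loop control variable.
    """
    if base is None:
        # Placement unknown at this stage: assume code generation will put
        # the first array element on a word boundary.
        base = 0
    addr = base + elem_bytes * (coef * i0 + const)
    return addr % word_bytes == 0

aligned = start_is_aligned(elem_bytes=2, coef=1, const=0, i0=0, word_bytes=8)
shifted = start_is_aligned(elem_bytes=2, coef=1, const=1, i0=0, word_bytes=8)
known = start_is_aligned(elem_bytes=2, coef=1, const=0, i0=2, word_bytes=8, base=0x1004)
```

Here `a[i]` starting at `i = 0` passes, `a[i+1]` fails because its first element sits 2 bytes past the boundary, and a known base of `0x1004` with `i0 = 2` passes because the two offsets add up to a multiple of 8.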

【００４１】For a loop structure judged to pose no problem in execution efficiency, the intermediate code sequence is scanned from the top, patterns that conform to bit slice operation instructions are found among adjacent intermediate codes, and these are converted into an intermediate code sequence that uses the bit slice data type (34). FIG. 10 shows an example of the patterns detected and the corresponding generation rules for bit slice data type intermediate code, and FIG. 11 explains the bit slice data type intermediate term notation used there. The intermediate code sequence generated from the code sequence of FIG. 8 is shown in FIG. 9.
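Step (34) can be sketched as a single pass over the sorted sequence that merges each run of U adjacent codes with the same K₁, the same operation, and K₂ = 1..U. The rules of FIG. 10 are not reproduced in this text, so the rule below, the `(k1, k2, op)` tuples, and the `.xU` suffix for a packed operation are hypothetical stand-ins.

```python
def pack(codes, u):
    """Merge runs of u matching adjacent codes into one packed code.

    codes: list of (k1, k2, op) after rearrangement; u: fields per word.
    Returns a list of (op_name, k1); packed operations get a ".x<u>" suffix.
    """
    out, i = [], 0
    while i < len(codes):
        group = codes[i:i + u]
        k1, _, op = codes[i]
        same_stmt = all(c[0] == k1 and c[2] == op for c in group)
        all_copies = [c[1] for c in group] == list(range(1, u + 1))
        if len(group) == u and same_stmt and all_copies:
            out.append((f"{op}.x{u}", k1))  # one bit slice operation for u fields
            i += u
        else:
            out.append((op, k1))            # no pattern here: keep the scalar code
            i += 1
    return out

sorted_codes = [
    (1, 1, "load"), (1, 2, "load"),
    (2, 1, "add"), (2, 2, "add"),
    (3, 1, "store"), (3, 2, "store"),
]
packed = pack(sorted_codes, 2)
unpacked = pack([(1, 1, "load"), (2, 1, "add"), (1, 2, "load")], 2)
```

Because matching looks only at adjacent codes, it depends on the preceding rearrangement having brought the U copies of each statement together; where dependencies prevented that, the codes stay scalar.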

【００４２】The compiler described above can readily generate bit slice data type intermediate code through unrolling and code sequence rearrangement. The generated intermediate code corresponds closely to the bit slice operation instruction set, so whether conversion to bit slice operation instructions is worthwhile can be judged properly with overhead and similar costs taken into account. The unrolling and code sequence rearrangement are themselves easily carried out by a sort on two sort keys.

【００４３】An embodiment of the present invention has been described above, but the invention is not limited to this embodiment, and various additions and modifications are possible. For example, although the code sequence was rearranged at the intermediate code stage, the rearrangement may instead be performed on the code sequence (instruction sequence) generated by the code generation unit G, with the matching sequences then replaced by the corresponding bit slice operation instructions.

【００４４】[0044]

[Effects of the Invention] As described above, the present invention provides a compiler that can automatically generate, from a source program, a target program for a bit slice operation type processor. This frees users from the work of writing target programs for a bit slice operation type processor in assembly language or machine language.

【００４５】In the configuration in which rearrangement is performed by assigning a first sequence number and a second sequence number, the rearrangement that brings codes performing the same kind of operation next to one another is easily achieved by a sort on two sort keys.

【００４６】Furthermore, because conversion to bit slice operation instructions is not performed when the starting address of data accessed in the instruction sequence is not on a word boundary, the conversion is prevented from lowering execution efficiency instead of raising it.

[Brief description of the drawings]

FIG. 1 is a block diagram of one embodiment of the present invention.

FIG. 2 is a block diagram showing a configuration example of the optimization processing unit.

FIG. 3 is a diagram explaining the operation of a processor having bit slice operation instructions.

FIG. 4 is a flowchart showing the procedure by which the loop unrolling device and the code sequence rearranging device rearrange intermediate code using two sort keys.

FIG. 5 is a diagram showing an example of a source program.

FIG. 6 is a diagram showing part of the intermediate code sequence obtained after the source program of FIG. 5 is processed by the preprocessing device.

FIG. 7 is a diagram showing the intermediate code sequence obtained by unrolling the intermediate code sequence of FIG. 6.

FIG. 8 is a diagram showing the intermediate code sequence obtained by rearranging the intermediate code sequence of FIG. 7.

FIG. 9 is a diagram showing the bit slice data type intermediate code generated from the intermediate code sequence of FIG. 8.

FIG. 10 is a diagram showing the patterns and conversion rules used to convert an intermediate code sequence into bit slice data type intermediate code.

FIG. 11 is a diagram explaining the intermediate term notation used in bit slice data type intermediate code.

[Explanation of symbols]

A: Source program
B: Compiler
C: Target program
D: Intermediate code sequence
E: Syntax analysis unit
F: Optimization processing unit
G: Code generation unit
1: Preprocessing device
2: Loop unrolling device
3: Code sequence rearranging device
4: Dependency analysis device
5: Bit slice code generation device
6, 7: Source registers
8: Destination register
9: Arithmetic unit
20: Loop unrolling process
30: Code sequence rearranging process
91, 92, 93, 94: Arithmetic subunits

Continuation of front page

(56) References
JP-A-6-28324 (JP, A)
U.S. Pat. No. 5,121,498 (US, A)
Kohn L. et al., "The Visual Instruction Set (VIS) in UltraSPARC", Proc. of IEEE COMPCON Spring 1995, pp. 462-469
Bauer B. E., "Parallel C Extensions (Parallelized Code Retains Its Serial Structure)", Dr. Dobb's Journal (1992.8)
Fisher A. L. et al., "Design and Performance of an Optimizing SIMD Compiler", Proc. of 3rd Symp. on the Frontiers of Massively Parallel Computation (1990), pp. 507-510

(58) Fields searched (Int. Cl.⁷, DB name)
G06F 9/45, 15/16, 17/16

Claims

(57) [Claims]

1. A parallelizing compilation method for a processor having bit slice operation instructions that divide a single word into a plurality of fields and perform the same kind of operation on each field, the method comprising an optimization processing unit that includes: loop unrolling means for unrolling a loop that describes an array operation; code sequence rearranging means for rearranging the unrolled instruction sequence while preserving dependencies; and bit slice code generating means for generating bit slice operation instructions from the rearranged instruction sequence by pattern matching.

2. The parallelizing compilation method according to claim 1, wherein the loop unrolling means assigns a first sequence number to the code sequence before unrolling and, when unrolling by the number of fields of the bit slice operation instruction, assigns to the code sequence generated by the k-th unrolling a second sequence number corresponding to k, and the code sequence rearranging means rearranges the instruction sequence in the loop using the first sequence number as a first key and the second sequence number as a second key, while preserving the definition-reference relationships of the variables appearing in the loop.

3. The parallelizing compilation method according to claim 1, wherein the bit slice code generating means does not convert an instruction sequence to bit slice operation instructions, even if the sequence conforms to a bit slice operation instruction, when the starting address of data accessed in the instruction sequence is not on a word boundary.

4. A parallelizing compilation method comprising: a syntax analysis unit that reads a source program and performs syntax analysis to generate an intermediate code sequence; an optimization processing unit that detects a loop structure in the intermediate code sequence and converts it into an intermediate code sequence structured for parallel processing with bit slice operation instructions; and a code generation unit that generates and outputs a target program from the converted intermediate code sequence, wherein the optimization processing unit includes: loop unrolling means for detecting, in the intermediate code sequence, a loop structure to be processed in parallel and unrolling the loop structure by the number of fields of the bit slice operation instruction; code sequence rearranging means for rearranging the unrolled code sequence in the loop structure so that codes performing the same kind of operation are adjacent, within the range that does not change the definition-reference relationships of the variables defined; and bit slice code generating means for detecting, by pattern matching, code patterns in the rearranged code sequence that conform to bit slice operation instructions and converting them into the corresponding bit slice data type intermediate code.