JP2009070070A

JP2009070070A - Compiler and compile method

Info

Publication number: JP2009070070A
Application number: JP2007236818A
Authority: JP
Inventors: Yuji Yokoya; 雄司横谷
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-09-12
Filing date: 2007-09-12
Publication date: 2009-04-02

Abstract

PROBLEM TO BE SOLVED: To generate, without restriction, an object program of high execution efficiency for a vector processor, even for a double loop whose loop iteration counts are smaller than the maximum vector length.

SOLUTION: In a compiler, when the iteration count N of the inner loop of a double loop (an outer loop and an inner loop) in a source program is half or less of the maximum vector length VL, a loop conversion part 23 converts the double loop into two loops. The first has a vector operation on N elements as its loop body, and its iteration count is the remainder of dividing the outer-loop iteration count M by the smaller of VL/N and M. The second has a vector operation on N multiplied by the smaller of VL/N and M elements as its loop body, and its iteration count is the value obtained by dividing M by the smaller of VL/N and M.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a compiler and a compiling method that take a program written in a high-level language such as Fortran or C as input and generate a target program for a vector processor.

A compiler that generates a target program for a vector processor from a high-level language such as Fortran or C generally has a vectorization function that converts loops in the source program into sequences of vector instructions. The number of data elements processed by one vector instruction generated by this vectorization is at most the number of loop iterations.

When the number of loop iterations is smaller than the maximum vector length of the vector processor, hardware resources of the vector processor, such as its arithmetic units, are left idle. As a result, the execution efficiency of the generated target program is low.

Therefore, as a technique for increasing the iteration count of the loop to be vectorized, loop collapsing (merging a nested loop into a single loop) has been considered (see, for example, Patent Document 1).

A technique for increasing the iteration count of the loop to be vectorized by loop interchange within a nested loop has also been considered (see, for example, Non-Patent Document 1).

In addition, a method of generating one vector instruction for a plurality of identical operations in a loop has been considered (see, for example, Patent Document 2).
Patent Document 1: JP-A-6-4300
Patent Document 2: JP-A-2-236775
Non-Patent Document 1: Hans Zima and Barbara Chapman, translated by Yoichi Muraoka, "Super Compiler", Ohmsha, April 1995, pp. 223-230, 262

However, loop collapsing (merging a nested loop into a single loop) using the technique described in Patent Document 1 requires that all arrays used in the nested loop be contiguous. The nested loops to which it can be applied are therefore quite limited.

Loop interchange using the technique described in Non-Patent Document 1 cannot raise the iteration count of the vectorized loop above the largest iteration count among the loops that make up the nest. Therefore, when the iteration counts of all the loops are smaller than the maximum vector length, it is difficult to improve execution efficiency.

The technique described in Patent Document 2 can be applied only to loops that contain a plurality of identical operations.

The present invention has been made in view of the above problems of the conventional techniques. Its object is to provide a compiler and a compiling method that can generate, without restriction, a target program of high execution efficiency for a vector processor, even for a double loop whose loop iteration counts are smaller than the maximum vector length of the vector processor.

In order to achieve the above object, the present invention provides:
A compiler for converting a source program having a double loop of an outer loop and an inner loop into a target program, wherein,
when the number of iterations of the outer loop is M, the number of iterations of the inner loop is N, the maximum vector length of the source program is VL, and N is 1/2 or less of VL,
the double loop is converted into a double loop consisting of:
a loop whose loop body is a vector operation on N elements and whose number of iterations is the remainder of dividing M by the smaller of M and the value obtained by dividing VL by N; and
a loop whose loop body is a vector operation on a number of elements equal to N multiplied by the smaller of M and the value obtained by dividing VL by N, and whose number of iterations is the value obtained by dividing M by the smaller of M and the value obtained by dividing VL by N.

As described above, in the present invention, for a double loop in the source program, let M be the number of iterations of the outer loop, N the number of iterations of the inner loop, and VL the maximum vector length of the source program. When N is 1/2 or less of VL, the double loop is converted into a double loop consisting of two loops. The first has a vector operation on N elements as its loop body, and its number of iterations is the remainder of dividing M by the smaller of M and the value obtained by dividing VL by N. The second has as its loop body a vector operation on a number of elements equal to N multiplied by that smaller value, and its number of iterations is the value obtained by dividing M by that smaller value. With this configuration, a target program of high execution efficiency for a vector processor can be generated without restriction, even for a double loop whose loop iteration counts are smaller than the maximum vector length of the vector processor.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a diagram showing an embodiment of the compiler of the present invention.

As shown in FIG. 1, this embodiment comprises a source program storage device 1, a compiler 2, and a target program storage device 3.

The source program storage device 1 stores a source program written in a high-level language such as Fortran or C.

The compiler 2 generates object code (the target program), which is in a format that a computer can execute, from the source program stored in the source program storage device 1. That is, it converts the source program into the target program. It then transmits the generated target program to the target program storage device 3.

The target program storage device 3 stores the target program transmitted from the compiler 2.

The compiler 2 comprises a source program analysis unit 21, a loop analysis unit 22, a loop conversion unit 23, a target program generation unit 24, a first intermediate text storage unit 25, a loop information storage unit 26, and a second intermediate text storage unit 27.

The source program analysis unit 21 takes as input the source program stored in the source program storage device 1 and generates a first intermediate text for use in the analysis and conversion processing. It then stores the generated first intermediate text in the first intermediate text storage unit 25.

The first intermediate text storage unit 25 stores the first intermediate text generated by the source program analysis unit 21.

The loop analysis unit 22 takes as input the first intermediate text generated by the source program analysis unit 21 and stored in the first intermediate text storage unit 25, and generates loop information that includes the loop nesting structure, the loop iteration counts, the loop indices, and the data dependence and control dependence information of the statements in each loop. It then stores the generated loop information in the loop information storage unit 26.

The loop information storage unit 26 stores the loop information generated by the loop analysis unit 22.

The loop conversion unit 23 generates the second intermediate text based on the loop information generated by the loop analysis unit 22 and stored in the loop information storage unit 26. Specifically, it selects a double loop in which the inner loop is vectorizable, the inner-loop iteration count N (N is an integer of 1 or more) is 1/2 or less of the maximum vector length VL (VL is an integer of 1 or more), the outer-loop iteration count M (M is an integer of 1 or more) is 2 or more, and the convertibility condition is satisfied. Here, the maximum vector length is the maximum number of elements that can be processed by one vector instruction. The unit then generates the second intermediate text by converting the selected double loop in the first intermediate text, read from the first intermediate text storage unit 25, into two loops: a loop with mod(M, min(M, VL/N)) iterations whose loop body is a vector operation on N elements, and a loop with M/min(M, VL/N) iterations whose loop body is a vector operation on N*min(M, VL/N) elements, that is, N multiplied by min(M, VL/N). It stores the generated second intermediate text in the second intermediate text storage unit 27. Here, mod(x,y) denotes the remainder of dividing x by y, and min(x,y) denotes the smaller of x and y. The divisions are integer divisions; in the example given later, 100/30 evaluates to 3.
VL/N is referred to as the first division process, and M/min(M, VL/N) as the second division process.
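
The loop parameters defined above can be sketched in a few lines of Python. This is a minimal illustration, not the compiler's implementation: the function name `plan_loops` and the dictionary keys are assumptions made for this example, and floor division is assumed because the patent's own example evaluates 100/30 as 3.

```python
def plan_loops(M, N, VL):
    """Compute the loop parameters of the conversion (illustrative names).

    M: outer-loop iteration count, N: inner-loop iteration count,
    VL: maximum vector length. Floor division is an assumption taken
    from the patent's example, where 100/30 is evaluated as 3.
    """
    if not (N >= 1 and M >= 2 and 2 * N <= VL):
        raise ValueError("conversion precondition not met")
    inc = min(M, VL // N)  # outer iterations merged into one vector operation
    return {
        "inc": inc,
        "first_loop_iterations": M % inc,    # mod(M, min(M, VL/N))
        "second_loop_iterations": M // inc,  # M / min(M, VL/N)
        "elements_per_vector_op": N * inc,   # N * min(M, VL/N)
    }


print(plan_loops(10, 30, 100))
```

With M = 10, N = 30 and VL = 100 the second loop runs three times on 90 elements each, and the first loop handles the single leftover outer iteration.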

The second intermediate text storage unit 27 stores the second intermediate text generated by the loop conversion unit 23.

The target program generation unit 24 takes as input, from the second intermediate text storage unit 27, the second intermediate text generated by the loop conversion unit 23, generates the target program, and transmits it to the target program storage device 3. This input may take the form of the target program generation unit 24 reading the second intermediate text from the second intermediate text storage unit 27.

The compiling method used by the compiler 2 shown in FIG. 1 is described below.

FIG. 2 is a flowchart for explaining the compiling method used by the compiler 2 shown in FIG. 1.

First, the source program stored in the source program storage device 1 is input to the source program analysis unit 21. The source program analysis unit 21 performs lexical analysis, syntax analysis, and the like on the input source program, and generates a first intermediate text for use in the analysis and conversion processing. It then stores the generated first intermediate text in the first intermediate text storage unit 25.

Next, the first intermediate text stored in the first intermediate text storage unit 25 is input to the loop analysis unit 22. This input may take the form of the loop analysis unit 22 reading the first intermediate text from the first intermediate text storage unit 25.

The loop analysis unit 22 analyzes all loop structures that appear in the program of the input first intermediate text, and examines the loop indices, the loop iteration counts, the nesting relationships between loops, and the data dependences and control dependences of the statements in each loop. It then generates loop information containing the results of this examination and stores it in the loop information storage unit 26.

After that, the loop conversion unit 23 reads the loop information stored in the loop information storage unit 26 and, based on it, selects the double loops to be converted. The loop conversion unit 23 operates as follows.

First, loops in which the data dependences and control dependences of the statements satisfy the vectorizable condition are selected (step S101).

From the loops selected in step S101, those whose iteration count N is 1/2 or less of the maximum vector length VL of the vector processor are selected (step S102).

From the loops selected in step S102, those that are the inner loop of another loop are selected (step S103).

From the loops selected in step S103, those whose outer loop has an iteration count of 2 or more are selected (step S104).

From the loops selected in step S104, those for which no data dependence or control dependence exists between the statements in the outer loop are selected (step S105).

The first intermediate text corresponding to the double loop that consists of a loop selected in steps S101 to S105 and its outer loop is converted to produce the second intermediate text (step S106). The generated second intermediate text is then stored in the second intermediate text storage unit 27.
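
The selection in steps S101 to S105 amounts to a chain of filters, which the following sketch expresses as a single predicate. The `LoopInfo` record and its field names are hypothetical; the patent says only that the loop information holds nesting, iteration counts, indices and dependence information, and does not define a data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopInfo:
    """Hypothetical loop-information record; field names are illustrative."""
    iterations: int
    vectorizable: bool = True                # dependences satisfy the vectorizable condition
    has_statement_dependence: bool = False   # dependence between statements in this loop
    outer: Optional["LoopInfo"] = None       # enclosing loop, if any


def is_conversion_candidate(loop: LoopInfo, VL: int) -> bool:
    """Apply the filters of steps S101 to S105 to one candidate inner loop."""
    if not loop.vectorizable:                # S101: loop must be vectorizable
        return False
    if 2 * loop.iterations > VL:             # S102: N must be at most VL/2
        return False
    if loop.outer is None:                   # S103: must be the inner loop of another loop
        return False
    if loop.outer.iterations < 2:            # S104: outer loop must iterate at least twice
        return False
    if loop.outer.has_statement_dependence:  # S105: no dependence in the outer loop
        return False
    return True


L1 = LoopInfo(iterations=10)
L2 = LoopInfo(iterations=30, outer=L1)
print(is_conversion_candidate(L2, VL=100))
```

The values for `L1` and `L2` mirror the iteration counts used in the patent's example (10 and 30 with VL = 100).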

Note that even for a nested loop of depth three or more, if its innermost double loop satisfies the conditions of steps S101 to S105, that innermost double loop can be taken as the conversion target.

The conversion process generates a first loop with mod(M, min(M, VL/N)) iterations, followed by a second loop whose loop index J2 has an initial value of mod(M, min(M, VL/N))+1, a limit value of M, and an increment INC of min(M, VL/N).
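
To make the loop bounds concrete, the sketch below lists the index values the two generated loops would take, using the same 1-based numbering as the text. The function name `loop_index_ranges` is an assumption for illustration, and floor division is assumed for VL/N.

```python
def loop_index_ranges(M, N, VL):
    """Return the J1 values, the J2 values, and the increment INC (1-based)."""
    inc = min(M, VL // N)                    # INC = min(M, VL/N), floor division assumed
    rem = M % inc                            # mod(M, INC)
    first = list(range(1, rem + 1))          # J1 = 1 .. mod(M, INC)
    second = list(range(rem + 1, M + 1, inc))  # J2 = mod(M, INC)+1 .. M, step INC
    return first, second, inc


first, second, inc = loop_index_ranges(10, 30, 100)
# Each J2 stands for the original outer iterations J2 .. J2+INC-1,
# so together the two loops cover every original outer iteration once.
covered = first + [j for j2 in second for j in range(j2, j2 + inc)]
print(first, second, inc, covered == list(range(1, 11)))
```

Because M minus mod(M, INC) is a multiple of INC, the second loop never runs past the limit value M.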

The loop body of the first loop is a vector operation obtained from the loop body of the inner loop of the original double loop by replacing the outer-loop index J referenced there with the loop index of the first loop, and by operating on the elements in the range from the initial value to the limit value of the inner-loop index I.

The loop body of the second loop consists of two kinds of vector transfers and one kind of vector operation. The first vector transfer, for each array Xn(I,J) referenced in the loop body of the inner loop of the original double loop, transfers Xn(1:N,J2) to WXn(1:N), the 1st through Nth elements of a work array WXn of size N*INC, Xn(1:N,J2+1) to WXn(N+1:2*N), ..., and Xn(1:N,J2+INC-1) to WXn((INC-1)*N+1:INC*N). The vector operation is the operation in the inner loop with each operand array Xn(I,J) replaced by the corresponding work array WXn(1:N*INC) and each result array Yn(I,J) replaced by a work array WYn(1:N*INC) of size N*INC. The second vector transfer, for each array Yn(I,J) defined in the loop body, transfers the corresponding work array section WYn(1:N) to Yn(1:N,J2), WYn(N+1:2*N) to Yn(1:N,J2+1), ..., and WYn((INC-1)*N+1:INC*N) to Yn(1:N,J2+INC-1).
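
One iteration of the second loop therefore packs INC columns into work arrays, performs a single long vector operation, and unpacks the result. The sketch below models that sequence with plain Python lists. It is an illustration under assumptions: arrays are stored as lists of columns with 0-based indices, the vector operation is modelled as an element-wise map, and the name `second_loop_body` is invented for this example.

```python
def second_loop_body(op, inputs, output, j2, inc, n):
    """Model one iteration of the second loop.

    inputs: list of operand arrays, each a list of columns (X[j][i], 0-based).
    output: result array with the same layout, updated in place.
    j2:     0-based index of the first column handled by this iteration.
    """
    # First vector transfer: pack INC columns of each operand into a work array WXn.
    work_in = [[x[j2 + k][i] for k in range(inc) for i in range(n)] for x in inputs]
    # Vector operation on N*INC elements, modelled as an element-wise map.
    work_out = [op(*elems) for elems in zip(*work_in)]
    # Second vector transfer: unpack WYn into the INC result columns.
    for k in range(inc):
        output[j2 + k][:n] = work_out[k * n:(k + 1) * n]


B = [[10 * j + i for i in range(4)] for j in range(6)]
C = [[1] * 4 for _ in range(6)]
A = [[0] * 4 for _ in range(6)]
second_loop_body(lambda b, c: b + c, [B, C], A, j2=1, inc=3, n=4)
print(A)
```

Only columns 1 to 3 of `A` are written by this call; the remaining columns would be handled by other iterations of the first and second loops.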

Then, the target program generation unit 24 reads the second intermediate text from the second intermediate text storage unit 27, generates a target program for the vector processor based on it, and transmits the target program to the target program storage device 3.
(Example)
An example of the present invention is shown below.

FIG. 3 is a diagram showing an example of a source program containing a double loop written in Fortran.

The source program shown in FIG. 3 adds array B and array C and stores the result in array A. The following takes as an example the case where this source program is input and a target program is generated for a vector processor whose maximum vector length VL is 100.

The loop analysis unit 22 analyzes the first intermediate text that the source program analysis unit 21 generated from the source program shown in FIG. 3, and generates loop information. The generated loop information states that loop L1 has 10 iterations, loop L2 has 30 iterations, neither loop L1 nor loop L2 has data or control dependences, and loop L1 is the outer loop of loop L2.

Next, from the loop information generated by the loop analysis unit 22, the loop conversion unit 23 determines that loop L2 is vectorizable, that its iteration count of 30 is 1/2 or less of the maximum vector length of 100, that loop L1 exists outside loop L2, and that the iteration count of loop L1 is 2 or more. It then selects the double loop consisting of loop L1 and loop L2 as the conversion target and generates the second intermediate text.

FIG. 4 is a diagram showing an example of the second intermediate text generated by the loop conversion unit 23 shown in FIG. 1.

The second intermediate text shown in FIG. 4 contains loops L11 and L12 produced by the loop conversion unit 23.

Loop L11 has mod(10, min(10, 100/30)) = mod(10, 3) = 1 iteration, and its loop index is J1. The loop body of loop L11 is the vector operation A(1:30,J1)←B(1:30,J1)+C(1:30,J1), obtained by replacing the loop index J of the outer loop L1 that appears in the statement in loop L2 with the loop index J1 of loop L11, and by replacing the loop index I of loop L2 with an access to elements 1 through 30.

Loop L12 has an initial value of mod(10, min(10, 100/30))+1 = mod(10, 3)+1 = 2, a limit value of 10, the same as loop L11, and an increment of min(10, 100/30) = 3; its loop index is J2. The loop body of loop L12 consists of three kinds of vector transfers and one kind of vector addition.

The first vector transfer consists of three vector transfers that store B(1:30,J2) in the first 30 elements of a work array WX1 of size 30*3=90, B(1:30,J2+1) in elements 31 to 60 of WX1, and B(1:30,J2+2) in elements 61 to 90 of WX1. The second vector transfer consists of three vector transfers that store C(1:30,J2) in the first 30 elements of a work array WX2, C(1:30,J2+1) in elements 31 to 60 of WX2, and C(1:30,J2+2) in elements 61 to 90 of WX2. The vector addition adds the 90 elements of WX1 and WX2 and places the result in a temporary array WY1. The third vector transfer consists of three vector transfers that store the first 30 elements of WY1 in A(1:30,J2), elements 31 to 60 of WY1 in A(1:30,J2+1), and elements 61 to 90 of WY1 in A(1:30,J2+2).
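
The worked example can be checked end to end with a small simulation. The sketch below is not the content of FIG. 3 or FIG. 4, which are not reproduced in this text; it is a Python model of the loops as described, with 0-based column indices, arbitrary test data, and list operations standing in for vector instructions.

```python
M, N, VL = 10, 30, 100  # outer count, inner count, maximum vector length

# Arrays are stored as lists of columns, B[j][i] (0-based); values are arbitrary.
B = [[j * 100 + i for i in range(N)] for j in range(M)]
C = [[j - i for i in range(N)] for j in range(M)]

# Reference result of the original double loop: A(I,J) = B(I,J) + C(I,J).
A_ref = [[B[j][i] + C[j][i] for i in range(N)] for j in range(M)]

A = [[0] * N for _ in range(M)]
inc = min(M, VL // N)  # 3
rem = M % inc          # 1

# Loop L11: one 30-element vector addition per iteration (a single iteration here).
for j1 in range(rem):
    A[j1] = [b + c for b, c in zip(B[j1], C[j1])]

# Loop L12: J2 = 2, 5, 8 in the text's 1-based numbering.
for j2 in range(rem, M, inc):
    wx1 = [v for k in range(inc) for v in B[j2 + k]]  # three transfers into WX1
    wx2 = [v for k in range(inc) for v in C[j2 + k]]  # three transfers into WX2
    wy1 = [x + y for x, y in zip(wx1, wx2)]           # one 90-element vector addition
    for k in range(inc):                              # three transfers out of WY1
        A[j2 + k] = wy1[k * N:(k + 1) * N]

assert A == A_ref
```

The final assertion confirms that, for this data, the converted loops compute the same array A as the original double loop.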

The loop conversion unit 23 stores the converted second intermediate text in the second intermediate text storage unit 27.

Then, the target program generation unit 24 reads the second intermediate text from the second intermediate text storage unit 27, generates a target program for the vector processor based on it, and transmits the target program to the target program storage device 3. Any commonly used method may be used to generate the target program from the second intermediate text.

As described above, let VL be the maximum vector length, M the iteration count of the outer loop, and N the iteration count of the inner loop. Processing that the conventional technique performs with min(M, VL/N) vector operation instructions, each operating on N elements, can in the present invention be performed with a single vector operation instruction operating on N*min(M, VL/N) elements, because the loop conversion unit 23 makes it possible to generate such a target program.
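
The effect on the number of vector arithmetic instructions can be estimated with a short calculation. The sketch below counts only the vector operation instructions for one arithmetic statement and deliberately ignores the added vector transfers, whose cost the text does not quantify; the function name `vector_op_counts` is an assumption for illustration.

```python
def vector_op_counts(M, N, VL):
    """Count vector operation instructions executed before and after conversion.

    Counts arithmetic vector operations only; the vector transfers added by
    the conversion are not included (an assumption of this sketch).
    """
    inc = min(M, VL // N)       # floor division assumed, as in 100/30 = 3
    before = M                  # one N-element vector operation per outer iteration
    after = M % inc + M // inc  # first-loop operations plus second-loop operations
    return before, after


print(vector_op_counts(10, 30, 100))
```

For the example values the count drops from 10 operations on 30 elements each to 4 operations: one on 30 elements and three on 90 elements.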

As a result, for a double loop in the program whose inner-loop iteration count is 1/2 or less of the maximum vector length, a target program of high execution efficiency for a vector processor can be generated.

FIG. 1 is a diagram showing an embodiment of the compiler of the present invention.
FIG. 2 is a flowchart for explaining the compiling method used by the compiler shown in FIG. 1.
FIG. 3 is a diagram showing an example of a source program containing a double loop written in Fortran.
FIG. 4 is a diagram showing an example of the second intermediate text generated by the loop conversion unit shown in FIG. 1.

Explanation of symbols

1 Source program storage device
2 Compiler
3 Target program storage device
21 Source program analysis unit
22 Loop analysis unit
23 Loop conversion unit
24 Target program generation unit
25 First intermediate text storage unit
26 Loop information storage unit
27 Second intermediate text storage unit
L1, L2, L11, L12 Loops

Claims

1. A compiler for converting a source program having a double loop of an outer loop and an inner loop into a target program, wherein,
when the number of iterations of the outer loop is M, the number of iterations of the inner loop is N, the maximum vector length of the source program is VL, and N is 1/2 or less of VL,
the compiler converts the double loop into a double loop consisting of:
a loop whose loop body is a vector operation on N elements and whose number of iterations is the remainder of dividing M by the smaller of M and the value obtained by dividing VL by N; and
a loop whose loop body is a vector operation on a number of elements equal to N multiplied by the smaller of M and the value obtained by dividing VL by N, and whose number of iterations is the value obtained by dividing M by the smaller of M and the value obtained by dividing VL by N.

2. The compiler according to claim 1,
wherein the compiler generates the target program based on the converted double loop.

3. A compiling method for converting a source program having a double loop of an inner loop and an outer loop into a target program, the method comprising,
when the number of iterations of the outer loop is M, the number of iterations of the inner loop is N, the maximum vector length of the source program is VL, and N is 1/2 or less of VL:
a first division process of dividing VL by N;
a second division process of dividing M by the smaller of M and the result of the first division process;
a multiplication process of multiplying the smaller of M and the result of the first division process by N; and
a process of converting the double loop into a double loop consisting of a loop whose loop body is a vector operation on N elements and whose number of iterations is the remainder of the second division process, and a loop whose loop body is a vector operation on the number of elements given by the result of the multiplication process and whose number of iterations is the result of the second division process.

4. The compiling method according to claim 3,
further comprising a process of generating the target program based on the converted double loop.