JP2010244204A

JP2010244204A - Compiler program and compiler device

Info

Publication number: JP2010244204A
Application number: JP2009090479A
Authority: JP
Inventors: Shuichi Chiba; 修一千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-04-02
Filing date: 2009-04-02
Publication date: 2010-10-28
Anticipated expiration: 2029-04-02
Also published as: JP5251688B2

Abstract

PROBLEM TO BE SOLVED: To improve the execution performance of a program by using a cache effectively. SOLUTION: A loop structure analysis part 31, an array analysis part 32, and an access pattern analysis part 33 analyze the order of the loops in a source program, the array data used in each loop, and the access pattern of that data. A dependency relation analysis part 34 analyzes, from the access pattern, the dependency between the data processed in preceding and following loops. A partial cache instruction part 35 determines, based on the dependency, that the storage destination of the data processed in the preceding loop is allocated to either the memory or the cache. An optimization part 36 generates a new source program in which the preceding loop whose storage destination has been determined is replaced with a loop that uses the cache and a loop that uses block store instructions. An object file generation part 37 generates an object file from the new source program. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a compiler program and a compiler apparatus.

Conventionally, in grid computing and in business systems built across enterprises, application programs generated by a compiler from source programs run on systems with various hardware configurations and are used for scientific computation and business operations. Application programs used for scientific computation and business operations are required to execute at high speed.

A common way to improve processing speed is to place a small-capacity cache that can be accessed at high speed between the CPU (Central Processing Unit) and the memory, and to use the data stored in the cache.

In recent years, however, the data processed by application programs is increasingly a large volume of data that does not fit in the cache, such as stream data for video distribution. Consequently, even when data is stored in the cache, the probability of a cache hit when the CPU accesses the data is very low.

The state of the cache when the CPU repeatedly accesses stream data in accordance with a program is described here with reference to FIG. 26. FIG. 26 is a diagram for explaining the state of the cache when stream data is accessed repeatedly.

For example, in accordance with the program, the CPU sequentially reads stream data from the memory into registers, executes predetermined processing, stores the processed stream data in the cache, and then stores it in the memory. In this case, as shown in (A) of FIG. 26, the CPU stores processed stream data 1 in the cache and then in the memory, and subsequently overwrites the cache with processed stream data 2 through stream data n in order, storing each in the memory.

However, when the CPU loads the stream data in the next process in accordance with the program, the data held in the cache is, as shown in (B) of FIG. 26, the final portion of the data from the previous process (stream data n). Therefore, when the CPU first loads stream data 1 in the next process, a cache miss occurs, as shown in (B) of FIG. 26.
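The situation in FIG. 26 can be illustrated with a toy model. The sketch below is not part of the patent: it assumes a fully associative cache with LRU replacement that holds a fixed number of blocks, and the names `LruCache` and `run_two_passes` are hypothetical.

```python
from collections import OrderedDict


class LruCache:
    """Toy fully associative cache with LRU replacement (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def store(self, block_id):
        # An ordinary store places the block in the cache.
        self.blocks[block_id] = True
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def load(self, block_id):
        # Report hit or miss, then allocate or refresh the block.
        hit = block_id in self.blocks
        self.store(block_id)
        return hit


def run_two_passes(n_blocks, capacity):
    cache = LruCache(capacity)
    for block in range(1, n_blocks + 1):  # first process: store stream data 1..n
        cache.store(block)
    resident = sorted(cache.blocks)       # what is left in the cache afterwards
    first_load_hit = cache.load(1)        # next process starts with stream data 1
    return resident, first_load_hit


resident, hit = run_two_passes(n_blocks=8, capacity=2)
print(resident, hit)  # [7, 8] False
```

With eight blocks and room for two, only blocks 7 and 8 remain resident after the first process, so the first load of block 1 in the next process misses.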

Thus, when the size of the data processed by a program, such as stream data, does not fit in the cache, a cache hit cannot be expected when the data is accessed again, the cache provides little data reuse, and the execution performance of the program degrades.

One solution for improving program execution performance is a method that uses prefetch instructions (see, for example, Patent Documents 1 and 2). Specifically, the data to be used in processing is identified in advance and prefetch instructions are inserted into the source program, so that the data is loaded from the memory into the cache beforehand and cache misses are reduced.

However, for contiguous data such as stream data, it is difficult to bring the data that will be reused into the cache accurately with prefetch instructions. Another solution for improving program execution performance is therefore a method that uses block store instructions.

As shown in FIG. 27, a block store instruction writes data (stream data 1, stream data 2, ..., stream data n) directly to the memory without storing it in the cache, which shortens the transfer time and improves program execution performance. Using block store instructions improves execution performance without the complicated calculation required to insert prefetch instructions into a program. FIG. 27 is a diagram for explaining the block store instruction.

Patent Document 1: JP-A-10-283192
Patent Document 2: JP 2006-330813 A

The conventional techniques described above, however, cannot use the cache effectively, so program execution performance may degrade. For example, if all store instructions for stream data are replaced with block store instructions using the conventional technique, the CPU cannot use the cache at all when executing load instructions, and load performance drops significantly. In particular, when a program repeatedly updates stream data, execution performance degrades if the use of block store instructions prevents the cache from being exploited.

The above describes the problem that, when stream data is processed, the cache cannot be used effectively and program execution performance may therefore degrade. The same problem arises with the conventional techniques whenever the data to be processed is a large volume of contiguous data that does not fit in the cache.

The disclosed technique has been made to solve the problems of the conventional techniques described above, and its object is to provide a compiler program and a compiler apparatus that can improve program execution performance by using the cache effectively.

To solve the problems described above and achieve the object, this program causes a computer to execute: an access pattern analysis procedure of analyzing the access pattern to data in each loop of a source program; a dependency analysis procedure of analyzing, from the access pattern analyzed by the access pattern analysis procedure, the dependency between the data processed in a current loop and the data processed in a next loop; a storage destination determination procedure of determining, based on the dependency analyzed by the dependency analysis procedure, that the storage destination of the data processed in the current loop is allocated to either the memory or the cache; a new source program generation procedure of generating a new source program from the source program by replacing the current loop whose storage destination has been determined by the storage destination determination procedure with a first loop for storing processed data in the memory and a second loop for storing processed data in the cache; and an object file generation procedure of generating an object file from the new source program generated by the new source program generation procedure.

According to the disclosed program, program execution performance can be improved by using the cache effectively.

FIG. 1 is a diagram for explaining the concept of processing by the compiler apparatus of this embodiment.
FIG. 2 is a block diagram showing the configuration of the compiler apparatus of this embodiment.
FIG. 3 is a diagram for explaining the loop structure analysis unit.
FIG. 4 is a diagram for explaining the loop data storage unit.
FIG. 5 is a flowchart for explaining processing by the loop structure analysis unit.
FIG. 6 is a diagram for explaining types of loops.
FIG. 7 is a diagram for explaining the array analysis unit.
FIG. 8 is a diagram for explaining the loop data storage unit and the array data storage unit after processing by the array analysis unit.
FIG. 9 is a flowchart for explaining processing by the array analysis unit.
FIG. 10 is a diagram for explaining the access pattern analysis unit.
FIG. 11 is a diagram for explaining the array data storage unit after processing by the access pattern analysis unit.
FIG. 12 is a flowchart for explaining processing by the access pattern analysis unit.
FIG. 13 is a diagram for explaining the dependency analysis unit.
FIG. 14 is a diagram for explaining the array data storage unit after processing by the dependency analysis unit.
FIG. 15 is a flowchart for explaining the direction determination processing by the dependency analysis unit.
FIG. 16 is a diagram for explaining the hardware information storage unit.
FIG. 17 is a flowchart for explaining processing by the partial cache instruction unit.
FIG. 18 is a diagram for explaining the outline of the loop transformation instructions for patterns 1 to 4.
FIG. 19 is a diagram for explaining the loop transformation instruction for pattern 1.
FIG. 20 is a diagram for explaining the loop transformation instruction for pattern 2.
FIG. 21 is a diagram for explaining the loop transformation instruction for pattern 3.
FIG. 22 is a diagram for explaining the loop transformation instruction for pattern 4.
FIG. 23 is a flowchart for explaining the processing of the compiler apparatus of this embodiment.
FIG. 24 is a flowchart for explaining a modification of the processing by the partial cache instruction unit.
FIG. 25 is a diagram showing a computer that executes the compiler program of this embodiment.
FIG. 26 is a diagram for explaining the state of the cache when stream data is accessed repeatedly.
FIG. 27 is a diagram for explaining the block store instruction.

Embodiments of the compiler program and the compiler apparatus disclosed in the present application are described in detail below with reference to the accompanying drawings. In the following, a compiler apparatus that executes the compiler program disclosed in the present application is described as an embodiment.

First, the concept of the processing performed by the compiler apparatus of this embodiment is described. FIG. 1 is a diagram for explaining the concept of processing by the compiler apparatus of this embodiment.

Among the loop-level processes described in the source program, the compiler apparatus of this embodiment judges that the cache can be used effectively when a “following loop” loads data stored by a “preceding loop”. For the data requested (loaded) at the beginning of the processing executed in the “following loop”, the compiler apparatus transforms the “preceding loop” so that this data is accessed at the end of the processing executed in the “preceding loop” and is stored in the cache. When the source program with the transformed “preceding loop” is executed, the data loaded at the beginning of the processing in the “following loop” results in a cache hit.

For example, as shown in (A) of FIG. 1, the compiler apparatus of this embodiment transforms the “preceding loop” so that stream data 1, which is processed at the beginning of the “following loop”, is processed at the end of the “preceding loop” and only stream data 1 is stored in the cache. The compiler apparatus also transforms the “preceding loop” so that stream data 2 to n, that is, the stream data other than stream data 1 processed in the “preceding loop”, is written with block store instructions that store directly to the memory. As a result, as shown in (A) of FIG. 1, stream data 1, which is loaded at the beginning of the processing in the “following loop”, results in a cache hit.

The processing described above is explained more concretely using the example source program shown in (B) of FIG. 1. When the source program shown in (B) of FIG. 1 is executed, the data of array a, array b, and array c is initialized in loop X, the data of array a is finalized in loop Y, and the data of array a is referenced in loop Z. Suppose that the data of array a, array b, and array c is stream data that does not fit in the cache. If the cache is used through ordinary store instructions, the data of array a remains in the cache only in the range “a(n - cache size) to a(n)”. Loop Z, which is executed next, loads array a from the beginning of the data, so the cached data cannot be used. If block store instructions are used instead, the data of array a is not stored in the cache at all, so the cache again cannot be used effectively.

The compiler apparatus of this embodiment therefore analyzes the dependency of data access patterns between preceding and following loops in the source program and determines, for example, that loop Z of the source program shown in (B) of FIG. 1 references the data of array a processed in loop Y. As shown in (B) of FIG. 1, the compiler apparatus then decides to use the cache partially between loop Y and loop Z, and transforms loop Y into two loops: a loop containing cache store instructions and a loop containing block store instructions.
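The effect of splitting loop Y can be illustrated with a small simulation. This is a hedged sketch rather than the patent's actual transformation: it assumes an LRU cache measured in array elements, assumes the iterations of loop Y are independent so that they can be reordered, and uses the hypothetical name `count_hits`.

```python
from collections import OrderedDict


def count_hits(store_plan, n, capacity):
    """Run loop Y's stores, then count loop Z's cache hits on a(1)..a(n).

    store_plan is a list of (index, kind) pairs, where kind is "cache" for an
    ordinary store or "block" for a block store that bypasses the cache.
    """
    cache = OrderedDict()

    def touch(i):
        cache[i] = True
        cache.move_to_end(i)
        if len(cache) > capacity:
            cache.popitem(last=False)  # LRU eviction (an assumption)

    for i, kind in store_plan:
        if kind == "cache":
            touch(i)
        # A block store writes to memory only, so the cache is untouched.

    hits = 0
    for i in range(1, n + 1):  # loop Z loads a(1)..a(n) in order
        if i in cache:
            hits += 1
        touch(i)
    return hits


n, capacity = 8, 2
all_cache = [(i, "cache") for i in range(1, n + 1)]
all_block = [(i, "block") for i in range(1, n + 1)]
# Split loop Y: block-store a(capacity+1)..a(n) first, then cache-store the
# head a(1)..a(capacity) last so that it is resident when loop Z starts.
split = ([(i, "block") for i in range(capacity + 1, n + 1)]
         + [(i, "cache") for i in range(1, capacity + 1)])

print(count_hits(all_cache, n, capacity),
      count_hits(all_block, n, capacity),
      count_hits(split, n, capacity))  # 0 0 2
```

In this model neither the all-cache plan nor the all-block-store plan gives loop Z any hits, while the split plan turns the loads at the head of loop Z into hits.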

The compiler apparatus of this embodiment then optimizes the loop-transformed source program and generates an object file. In an information processing apparatus that executes the object file output by the compiler apparatus, the cache is used effectively, albeit partially, so access to the stream data becomes more efficient and program execution performance can be improved.

Next, the configuration of the compiler apparatus of this embodiment is described with reference to FIG. 2. FIG. 2 is a block diagram showing the configuration of the compiler apparatus of this embodiment.

As shown in FIG. 2, the compiler apparatus 10 of this embodiment includes a source program input unit 11, an object file output unit 12, a communication unit 13, an input/output control I/F unit 14, a storage unit 20, and a processing unit 30. As shown in FIG. 2, the compiler apparatus 10 is connected to an information processing apparatus 40.

The information processing apparatus 40 is an apparatus, such as a PC (personal computer), that executes the object file output from the compiler apparatus 10. This embodiment describes the case where the compiler apparatus 10 and the information processing apparatus 40 are independent apparatuses, but the embodiment is not limited to this, and the compiler apparatus 10 may be incorporated in the information processing apparatus 40.

The source program input unit 11 receives a source program created by a programmer, and the object file output unit 12 outputs the object file generated by the processing unit 30 to the information processing apparatus 40. The communication unit 13 receives hardware information, described later, from the information processing apparatus 40.

The input/output control I/F unit 14 controls data transfer between the source program input unit 11, the object file output unit 12, and the communication unit 13 on one side and the storage unit 20 and the processing unit 30 on the other.

The storage unit 20 stores the source program received by the source program input unit 11 and the results of various processes performed by the processing unit 30, described later. As components closely related to this embodiment, the storage unit 20 includes, as shown in FIG. 2, a source program storage unit 21, a loop data storage unit 22, an array data storage unit 23, a hardware information storage unit 24, and a new source program storage unit 25.

The source program storage unit 21 stores the source program received by the source program input unit 11. For example, the source program storage unit 21 stores a source program such as the one shown in (B) of FIG. 1.

The loop data storage unit 22, the array data storage unit 23, the hardware information storage unit 24, and the new source program storage unit 25 are described in detail in the description of the processing unit 30.

The processing unit 30 executes various processes when the source program transferred from the input/output control I/F unit 14 has been stored in the source program storage unit 21. As components closely related to this embodiment, the processing unit 30 includes, as shown in FIG. 2, a loop structure analysis unit 31, an array analysis unit 32, an access pattern analysis unit 33, a dependency analysis unit 34, and a partial cache instruction unit 35, as well as an optimization unit 36 and an object file generation unit 37. The loop structure analysis unit 31, the array analysis unit 32, the access pattern analysis unit 33, the dependency analysis unit 34, and the partial cache instruction unit 35 are processing units that divide the analysis of the source program among themselves.

The loop structure analysis unit 31 analyzes the loop structure of the source program stored in the source program storage unit 21. Specifically, it extracts DO loops, FOR loops, and the like from the source program and analyzes the rough flow of processing between the extracted loops. When an extracted loop is a nested loop, the loop structure analysis unit 31 takes the innermost loop among the loops that contain operations on data arrays as the loop subject to subsequent processing (the target loop). Processing for nested loops is described in detail later.

For example, the loop structure analysis unit 31 analyzes the source program shown in (B) of FIG. 1 and extracts the innermost loops “loop X (do j)”, “loop Y (do j)”, and “loop Z (do j)”. As shown in FIG. 3, it then determines that the source program shown in (B) of FIG. 1 has a loop structure in which processing proceeds from “loop X” to “loop Y” and from “loop Y” to “loop Z”. FIG. 3 is a diagram for explaining the loop structure analysis unit.

Based on the analysis result, the loop structure analysis unit 31 then initializes the loop data stored in the loop data storage unit 22. For example, as shown in FIG. 4, it initializes the loop data for each of “loop X”, “loop Y”, and “loop Z”. FIG. 4 is a diagram for explaining the loop data storage unit.

That is, as shown in FIG. 4, the loop structure analysis unit 31 initializes three pieces of loop data in the loop data storage unit 22: “loop name: X”, “loop name: Y”, and “loop name: Z”. Each piece of loop data has the items “prev” and “next”, which hold the result of the analysis by the loop structure analysis unit 31 and indicate the flow of processing between loops. Setting “prev” and “next” stores the information on how the loops are chained in each piece of loop data.

For example, the loop data of “loop name: X” stores “prev: NULL” because no loop is executed before “loop X”, and stores “next: *Y” because the loop executed after “loop X” is “loop Y”.

Likewise, the loop data of “loop name: Y” stores “prev: *X” and “next: *Z”, and the loop data of “loop name: Z” stores “prev: *Y” and “next: NULL”.

The data stored in the “array data” item of the loop data shown in FIG. 4 is determined by processing described later, so at the time of initialization by the loop structure analysis unit 31 it is “NULL” in every piece of loop data.
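The initialized loop data described above can be pictured as a doubly linked chain of records. The following is a minimal sketch under assumptions: the names `LoopData` and `init_loop_data` are hypothetical, and loop names stand in for the pointers (“*X”, “*Y”) shown in FIG. 4.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LoopData:
    name: str
    prev: Optional[str] = None         # loop executed before this one (None = NULL)
    next: Optional[str] = None         # loop executed after this one (None = NULL)
    array_data: Optional[list] = None  # filled in by later analysis (None = NULL)


def init_loop_data(names: List[str]) -> Dict[str, LoopData]:
    """Create one record per target loop and chain them in execution order."""
    records = {name: LoopData(name) for name in names}
    for before, after in zip(names, names[1:]):
        records[before].next = after
        records[after].prev = before
    return records


loop_data = init_loop_data(["X", "Y", "Z"])
for record in loop_data.values():
    print(record)
```

For the loops X, Y, and Z this yields “prev: NULL, next: Y” for X, “prev: X, next: Z” for Y, and “prev: Y, next: NULL” for Z, with the array data left unset.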

The detailed processing procedure of the loop structure analysis unit 31 is described here with reference to FIGS. 5 and 6. FIG. 5 is a flowchart for explaining processing by the loop structure analysis unit, and FIG. 6 is a diagram for explaining types of loops.

As shown in FIG. 5, when a source program is input (Yes at step S101), the loop structure analysis unit 31 reads the source program from the source program storage unit 21 and searches for the outer loops in each function (step S102). That is, the loop structure analysis unit 31 searches the source program for loops that lie in the scope directly under a function.

The loop structure analysis unit 31 then analyzes whether a found outer loop is a nested loop (step S103). The loop structure analysis unit 31 processes the found outer loops in the order in which they appear in the source program.

When the found outer loop is not a nested loop (No at step S103), the loop structure analysis unit 31 includes the found loop (the outer loop) in the target loops (step S104) and creates loop data (step S107). Specifically, the loop structure analysis unit 31 creates loop data in which the loop name of the target loop is set.

When the found outer loop is a nested loop (Yes at step S103), the loop structure analysis unit 31 analyzes whether the nested loop is a tightly nested loop (step S105).

Here, a tightly nested loop is a nested loop in which no processing is performed between one loop and the next. For example, the outer loop “loop X” shown on the left side of (A) of FIG. 6 is a nested loop containing “loop Y”, which executes “A = A + B”; because no processing is executed between “loop X” and “loop Y”, it is a tightly nested loop. In contrast, the outer loop “loop X” shown on the right side of (A) of FIG. 6 is also a nested loop containing “loop Y”, which executes “A = A + B”, but because “C = C + D” is executed in “loop X”, it is not a tightly nested loop.

Returning to FIG. 5, when the found outer loop is a nested loop but is not tightly nested (No at step S105), the loop structure analysis unit 31 excludes the found outer loop from subsequent processing and determines whether all the found outer loops have been processed (step S108). For example, “loop X” shown on the right side of (B) of FIG. 6 is excluded.

When the found outer loop is a nested loop and is tightly nested (Yes at step S105), the loop structure analysis unit 31 includes the innermost loop of the tightly nested loop in the target loops (step S106) and creates loop data (step S107). The loop structure analysis unit 31 then determines whether all the found outer loops have been processed (step S108).

For example, in the outer "loop X" shown on the left side of FIG. 6(A), "loop Y" is the target loop. When the loops inside a tightly nested loop are sibling loops, there are multiple target loops. For example, as shown on the left side of FIG. 6(B), "loop A" and "loop B" at the same nesting level are sibling loops, and the loop structure analysis unit 31 treats both "loop A" and "loop B" as target loops. By contrast, as shown on the right side of FIG. 6(B), "loop A" and "loop B" that are not at the same nesting level are not sibling loops, and the loop structure analysis unit 31 treats "loop A", which lies inside "loop B", as the target loop.
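The selection of target loops described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Loop` class, the `own_statements` counter, and both function names are hypothetical, and treating a single (non-nested) outer loop as its own target loop is an assumption made only so the recursion has a base case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Loop:
    """Hypothetical loop-tree node used only for this illustration."""
    name: str
    own_statements: int = 0  # statements directly in the body, excluding child loops
    children: List["Loop"] = field(default_factory=list)


def is_tightly_nested(loop: Loop) -> bool:
    """True if no loop that contains child loops also executes statements itself."""
    if not loop.children:
        return True
    if loop.own_statements > 0:
        return False
    return all(is_tightly_nested(child) for child in loop.children)


def target_loops(outer: Loop) -> List[str]:
    """Names of the innermost loops of a tightly nested outer loop, else an empty list."""
    if not is_tightly_nested(outer):
        return []  # excluded from subsequent processing (No at Step S105)
    if not outer.children:
        return [outer.name]  # assumption: a leaf loop is its own target
    names: List[str] = []
    for child in outer.children:
        names.extend(target_loops(child))
    return names


# Shapes modeled on the FIG. 6 examples described in the text.
tight = Loop("X", children=[Loop("Y", own_statements=1)])
not_tight = Loop("X", own_statements=1, children=[Loop("Y", own_statements=1)])
siblings = Loop("X", children=[Loop("A", 1), Loop("B", 1)])
nested = Loop("X", children=[Loop("B", children=[Loop("A", 1)])])
```

Under these assumptions, `target_loops(siblings)` yields both sibling loops, while `target_loops(nested)` yields only the loop that lies inside the other.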

Returning to FIG. 5, when an unprocessed outer loop remains (No at Step S108), the loop structure analysis unit 31 returns to Step S103 and processes the next outer loop.

On the other hand, when all retrieved outer loops have been processed (Yes at Step S108), the loop structure analysis unit 31 ends the processing. The loop structure analysis unit 31 also sets the names of the preceding and following loops in the "prev" and "next" items of the loop data, according to the order in which the loop names of the target loops were set.

Returning to FIG. 2, the array analysis unit 32 re-analyzes the target loops that were set in the loop data as a result of the analysis by the loop structure analysis unit 31, and enumerates the data arrays (hereinafter, "arrays") used in each target loop.

For example, the array analysis unit 32 analyzes the source program shown in FIG. 1(B) and, as shown in FIG. 7, enumerates three arrays, "array a", "array b", and "array c", as the arrays used in "loop X". Likewise, it enumerates the three arrays "array a", "array b", and "array c" as the arrays used in "loop Y", and the single array "array a" as the array used in "loop Z". FIG. 7 is a diagram for explaining the array analysis unit.

The array analysis unit 32 then sets the analysis result in the loop data in the loop data storage unit 22 and initializes the array data in the array data storage unit 23.

For example, as shown in FIG. 8, the array analysis unit 32 sets "array[3]" in the "array data" item of the loop data for "loop name: loop X", indicating that three arrays are used. Similarly, as shown in FIG. 8, the array analysis unit 32 sets "array[3]" in the "array data" item of the loop data for "loop name: loop Y", and "array[1]" in the "array data" item of the loop data for "loop name: loop Z". FIG. 8 is a diagram for explaining the loop data storage unit and the array data storage unit after processing by the array analysis unit.

Furthermore, as shown in FIG. 8, the array analysis unit 32 initializes array data whose "array name" entries are "a", "b", and "c", chained to the loop data for "loop name: X". Similarly, as shown in FIG. 8, the array analysis unit 32 initializes array data whose "array name" entries are "a", "b", and "c", chained to the loop data for "loop name: Y", and array data whose "array name" is "a", chained to the loop data for "loop name: loop Z".

As shown in FIG. 8, the array data has the items "initial value", "final value", and "increment". Because these items are set later by the access pattern analysis unit 33, all of them are set to "NULL" when the array analysis unit 32 initializes the array data.

The detailed processing procedure of the array analysis unit 32 is described next with reference to FIG. 9. FIG. 9 is a flowchart for explaining the processing by the array analysis unit.

As shown in FIG. 9, when target loops have been extracted by the loop structure analysis unit 31 (Yes at Step S201), the array analysis unit 32 starts setting array data from the array names that appear in the target loop (Step S202). That is, the array analysis unit 32 starts the following processing for each extracted target loop.

The array analysis unit 32 then extracts the name of the variable used as the first-dimension subscript of the array (Step S203) and determines whether that variable is the control variable of the target loop (Step S204). For example, in the source program shown in FIG. 1(B), the array analysis unit 32 determines whether "variable name: j" appears as the control variable of "array a" in the target loop "loop X (do j)".

If the variable is not the control variable of the target loop (No at Step S204), the array analysis unit 32 deletes the target loop from the chain of loop data (Step S206).

On the other hand, if the variable is the control variable of the target loop (Yes at Step S204), the array analysis unit 32 sets array data chained to the loop data (Step S205). For example, in the source program shown in FIG. 1(B), the array analysis unit 32 determines that "variable name: j" appears as the control variable of "arrays a to c" in the target loop "loop X (do j)". The array analysis unit 32 therefore initializes array data whose "array name" entries are "a", "b", and "c", chained to the loop data for "loop name: loop X".

After Step S205 or Step S206, the array analysis unit 32 determines whether all target loops have been processed (Step S207).

If an unprocessed target loop remains (No at Step S207), the array analysis unit 32 returns to Step S202 and starts processing the next target loop.

On the other hand, when all target loops have been processed (Yes at Step S207), the array analysis unit 32 ends the processing.
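The flow of FIG. 9 can be sketched roughly as below. The class names, field names, and the input format are all hypothetical; in particular, the sketch assumes that a loop is dropped from the chain as soon as any one of its arrays has a first-dimension subscript other than the loop's control variable, a granularity the text does not spell out. The loop "W" in the sample data is invented to exercise the removal branch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ArrayData:
    """One array data entry; the three pattern items start out unset (NULL)."""
    name: str
    initial: Optional[str] = None
    final: Optional[str] = None
    increment: Optional[int] = None


@dataclass
class LoopData:
    name: str
    control_var: str
    arrays: List[ArrayData] = field(default_factory=list)


def analyze_arrays(
    loops: List[Tuple[str, str]],
    first_subscripts: Dict[str, Dict[str, str]],
) -> List[LoopData]:
    """loops: (loop name, control variable) pairs in program order.
    first_subscripts: loop name -> {array name: first-dimension subscript variable}."""
    chain: List[LoopData] = []
    for loop_name, control_var in loops:  # Step S202: one pass per target loop
        subs = first_subscripts[loop_name]  # Step S203
        if all(var == control_var for var in subs.values()):  # Step S204
            arrays = [ArrayData(name) for name in subs]  # Step S205
            chain.append(LoopData(loop_name, control_var, arrays))
        # otherwise the loop is left out of the chain (Step S206)
    return chain


loops = [("X", "j"), ("Y", "j"), ("Z", "j"), ("W", "i")]
first_subscripts = {
    "X": {"a": "j", "b": "j", "c": "j"},
    "Y": {"a": "j", "b": "j", "c": "j"},
    "Z": {"a": "j"},
    "W": {"a": "k"},  # subscript is not the control variable "i"
}
chain = analyze_arrays(loops, first_subscripts)
```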

Returning to FIG. 2, the access pattern analysis unit 33 analyzes the data access pattern of each array in the target loops for which array data has been set. For example, the access pattern analysis unit 33 analyzes the access pattern of "array a" in "loop X (do j)" of the source program shown in FIG. 1(B), for which the array data "array name: a" has been set.

As shown in FIG. 10, the access pattern analysis unit 33 determines that the control variable "j" of "array a" in "loop X" follows a pattern that changes from "1" to "n" in increments of 1. Similarly, the access pattern analysis unit 33 determines that the control variable "j" for "arrays b and c" in "loop X", for "arrays a, b, and c" in "loop Y", and for "array a" in "loop Z" each follows a pattern that changes from "1" to "n" in increments of 1. FIG. 10 is a diagram for explaining the access pattern analysis unit.

Then, as shown in FIG. 11(A), the access pattern analysis unit 33 sets, for example, "initial value: 1", "final value: n", and "increment: 1" in the array data that is stored in the array data storage unit 23 and chained to the loop data for "loop name: loop X". FIG. 11 is a diagram for explaining the array data storage unit after processing by the access pattern analysis unit.

To handle the case in which arrays with different names occupy the same region, the access pattern analysis unit 33 stores an address translation table in the array data storage unit 23, separately from the array data. The access pattern analysis unit 33 obtains the memory address that identifies each region by analyzing the source program. For example, as shown in FIG. 11(B), the access pattern analysis unit 33 obtains the address "0xaaaaaaa" for "array name: a", the address "0xbbbbbbb" for "array name: b", and the address "0xcccccccc" for "array name: c". The access pattern analysis unit 33 then stores the obtained addresses in the array data storage unit 23 as an address translation table that associates each address with its array name. When an address cannot be obtained, the access pattern analysis unit 33 stores "address: NULL" in the address translation table in association with the array name.
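As a rough illustration of how such a table lets two differently named arrays be recognized as the same region, consider the sketch below. The function name `same_region`, the alias entry "d", and the entry "e" are hypothetical, and falling back to a name comparison when an address is NULL is an assumption; the text only says that NULL is stored in that case.

```python
from typing import Dict, Optional


def same_region(table: Dict[str, Optional[int]], x: str, y: str) -> bool:
    """Compare two arrays by address; fall back to names when an address is unknown."""
    addr_x, addr_y = table.get(x), table.get(y)
    if addr_x is None or addr_y is None:
        return x == y  # assumption: without an address only the name can be compared
    return addr_x == addr_y


# Addresses for a, b, c follow FIG. 11(B) as quoted in the text.
address_table: Dict[str, Optional[int]] = {
    "a": 0xAAAAAAA,
    "b": 0xBBBBBBB,
    "c": 0xCCCCCCCC,
    "d": 0xAAAAAAA,  # hypothetical alias of array a
    "e": None,       # hypothetical array whose address could not be obtained
}
```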

When multiple data access patterns exist for the same array within a target loop, the access pattern analysis unit 33 sets array data with the same array name for each access pattern. For example, suppose that within a loop "array name: a" is accessed with a pattern in which the control variable changes from "1" to "n" in increments of 1, a pattern in which it changes from "n" to "1" in decrements of 1, and a pattern in which it changes from "2" to "n" in increments of 1. In this case, as shown in FIG. 11(C), the access pattern analysis unit 33 sets three array data entries for "array name: a": "initial value: 1, final value: n, increment: 1", "initial value: n, final value: 1, increment: -1", and "initial value: 2, final value: n, increment: 1".
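The per-pattern array data can be pictured with the short sketch below. The tuple layout and the function `collect_patterns` are hypothetical, and dropping exact duplicates is an assumption; the text only states that one entry is set per access pattern.

```python
from typing import Dict, List, Tuple

# (initial value, final value, increment); bounds are kept symbolic, as in the text.
Pattern = Tuple[str, str, int]


def collect_patterns(refs: List[Tuple[str, str, str, int]]) -> Dict[str, List[Pattern]]:
    """Group array references by array name, keeping one entry per distinct pattern."""
    patterns: Dict[str, List[Pattern]] = {}
    for name, initial, final, increment in refs:
        entries = patterns.setdefault(name, [])
        pattern = (initial, final, increment)
        if pattern not in entries:  # assumption: identical patterns share one entry
            entries.append(pattern)
    return patterns


refs = [
    ("a", "1", "n", 1),
    ("a", "n", "1", -1),
    ("a", "2", "n", 1),
    ("a", "1", "n", 1),  # repeated reference with the first pattern
]
array_patterns = collect_patterns(refs)
```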

The detailed processing procedure of the access pattern analysis unit 33 is described next with reference to FIG. 12. FIG. 12 is a flowchart for explaining the processing by the access pattern analysis unit.

As shown in FIG. 12, when array data has been set by the array analysis unit 32 (Yes at Step S301), the access pattern analysis unit 33 starts processing for each array data entry.

First, the access pattern analysis unit 33 obtains the address from the array name and builds the address translation table in the array data storage unit 23 (Step S302).

The access pattern analysis unit 33 then obtains the initial value, final value, and increment of the control variable for each array and updates the array data (Step S303). When multiple data access patterns exist for the same array within a target loop, the access pattern analysis unit 33 sets and updates array data with the same array name for each access pattern.

After that, the access pattern analysis unit 33 determines whether all array data set by the array analysis unit 32 has been updated (Step S304).

If unprocessed array data remains (No at Step S304), the access pattern analysis unit 33 returns to Step S302 and starts processing the next array data entry.

On the other hand, when all array data has been updated (Yes at Step S304), the access pattern analysis unit 33 ends the processing.

Returning to FIG. 2, the dependency relationship analysis unit 34 uses the data access patterns of the arrays in the target loops, as analyzed by the access pattern analysis unit 33, to analyze the dependency relationship of the array data processed in consecutive loops. For example, the dependency relationship analysis unit 34 analyzes the dependency relationship between loops for "array a" in the source program shown in FIG. 1(B).

That is, as shown in FIG. 13(A), the dependency relationship analysis unit 34 determines that, between loop X and loop Y, the dependency is one in which the data of "array a" processed in order from 1 to n in loop X is processed in order from 1 to n in loop Y. Likewise, as shown in FIG. 13(A), it determines that, between loop Y and loop Z, the dependency is one in which the data of "array a" processed in order from 1 to n in loop Y is processed in order from 1 to n in loop Z. FIG. 13 is a diagram for explaining the dependency relationship analysis unit.

Specifically, the dependency relationship analysis unit 34 refers to the loop data set for each of the two consecutive loops, the array data chained to that loop data, and the address translation table, and analyzes the dependency relationship between the loops for the data of the same array.

For example, as shown in FIG. 13(B), the dependency relationship analysis unit 34 reads the loop data for "loop name: X" and "loop name: Y", the array data for "array name: a" chained to the loop data for "loop name: X", the array data for "array name: a" chained to the loop data for "loop name: Y", and the address translation table. Then, as shown in FIG. 13(B), the dependency relationship analysis unit 34 refers to the address translation table to confirm from the addresses that the arrays match, and compares the array data.

At this point, the dependency relationship analysis unit 34 refers to the increment of each array data entry and, as shown in FIG. 13(B), determines that the access direction of each is "+", that is, increasing. It therefore determines that the access directions of loop X and loop Y match, which is the forward direction. Then, as shown in FIG. 13(B), the dependency relationship analysis unit 34 newly sets "next loop: forward" in the array data for "array a" chained to the loop data for "loop name: X".

Through the processing described with reference to FIG. 13(B), the dependency relationship analysis unit 34 newly sets the "next loop" item in the array data for "array a" chained to the loop data of loops X, Y, and Z. For example, as shown in FIG. 14, the dependency relationship analysis unit 34 sets "next loop: forward" in the array data for "array a" chained to the loop data of loop X, and "next loop: forward" in the array data for "array a" chained to the loop data of loop Y. As also shown in FIG. 14, it sets "next loop: none" in the array data for "array a" chained to the loop data of loop Z. FIG. 14 is a diagram for explaining the array data storage unit after processing by the dependency relationship analysis unit.

When multiple array data entries exist for the same array because there are multiple access patterns, the dependency relationship analysis unit 34 searches for entries with the same access pattern and compares them to decide between the forward and reverse directions. Even when the initial value, final value, and increment are variables, they are compared by the addresses of those variables, so it can still be established whether they are identical.

The detailed procedure by which the dependency relationship analysis unit 34 determines the direction is described next with reference to FIG. 15. FIG. 15 is a flowchart for explaining the direction determination processing by the dependency relationship analysis unit.

As shown in FIG. 15, when the access pattern analysis unit 33 has updated the initial value, final value, and increment in all array data (Yes at Step S401), the dependency relationship analysis unit 34 initializes the forward and reverse counters (Step S402). In doing so, the dependency relationship analysis unit 34 first extracts from the array data storage unit 23 the array data chained to the loop data of the loop being processed (hereinafter, the "own loop"), and then initializes the forward and reverse counters.

The dependency relationship analysis unit 34 then determines whether the array data chained to the next loop, that is, the loop that follows the own loop, contains the same array name (Step S403).

If the next loop has the same array name (Yes at Step S403), the dependency relationship analysis unit 34 determines whether the access direction of the array data with that array name in the next loop is the forward direction (Step S404). The processing from Step S404 onward is executed in turn for each array data entry with the same array name.

If the direction is forward (Yes at Step S404) and the initial value of the own loop equals the initial value of the next loop (Yes at Step S405), the dependency relationship analysis unit 34 increments the forward counter (Step S406).

If the direction is reverse (No at Step S404) and the initial value of the own loop equals the initial value of the next loop (Yes at Step S407), the dependency relationship analysis unit 34 increments the reverse counter (Step S408).

If the direction is forward (Yes at Step S404) but the initial value of the own loop differs from the initial value of the next loop (No at Step S405), the dependency relationship analysis unit 34 returns to Step S403 and determines whether the next loop has any unprocessed array data with the same array name.

Likewise, if the direction is reverse (No at Step S404) and the initial value of the own loop differs from the initial value of the next loop (No at Step S407), the dependency relationship analysis unit 34 returns to Step S403 and determines whether the next loop has any unprocessed array data with the same array name.

The dependency relationship analysis unit 34 also returns to Step S403 after completing Step S406 or Step S408, and determines whether the next loop has any unprocessed array data with the same array name.

On the other hand, when the next loop has no unprocessed array data with the same array name (No at Step S403), the dependency relationship analysis unit 34 determines whether the forward and reverse counters are both "0" (Step S409).

If the forward and reverse counters are both "0" (Yes at Step S409), the dependency relationship analysis unit 34 updates the next-loop item of the array data to "none" (Step S410) and ends the processing.

If at least one of the forward and reverse counters is not "0" (No at Step S409), the dependency relationship analysis unit 34 determines whether the forward counter is greater than or equal to the reverse counter (Step S411).

If the forward counter is greater than or equal to the reverse counter (Yes at Step S411), the dependency relationship analysis unit 34 updates the next-loop item of the array data to "forward" (Step S412) and ends the processing.

On the other hand, if the forward counter is less than the reverse counter (No at Step S411), the dependency relationship analysis unit 34 updates the next-loop item of the array data to "reverse" (Step S413) and ends the processing.
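The counter-based decision of FIG. 15 can be sketched as follows. The function name and tuple layout are hypothetical. Two simplifying assumptions are made: the own loop contributes a single pattern for the array in question, and "forward" is read as "the increments of the own loop and the next loop have the same sign", following the "+" comparison described for FIG. 13(B). The equal-initial-value test is applied to both directions because that is what Steps S405 and S407 state.

```python
from typing import List, Tuple

Pattern = Tuple[str, str, int]  # (initial value, final value, increment)


def next_loop_direction(own: Pattern, next_patterns: List[Pattern]) -> str:
    """Decide the 'next loop' item for one array from its patterns in the next loop."""
    forward = 0  # Step S402: initialize both counters
    reverse = 0
    own_initial, _own_final, own_increment = own
    for nxt_initial, _nxt_final, nxt_increment in next_patterns:  # Step S403
        same_direction = (own_increment > 0) == (nxt_increment > 0)  # Step S404
        if nxt_initial != own_initial:  # No at Step S405 / S407: skip this entry
            continue
        if same_direction:
            forward += 1  # Step S406
        else:
            reverse += 1  # Step S408
    if forward == 0 and reverse == 0:  # Step S409
        return "none"  # Step S410
    return "forward" if forward >= reverse else "reverse"  # Steps S411 to S413


own_pattern: Pattern = ("1", "n", 1)
```

Note that a tie between the two counters resolves to "forward", because Step S411 tests "greater than or equal to".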

Returning to FIG. 2, the hardware information storage unit 24 stores the hardware information of the information processing apparatus 40 received by the communication unit 13. The hardware information is stored in the hardware information storage unit 24 when the compiler apparatus 10 starts up: as an extension of the initialization processing, the communication unit 13 queries the OS of the information processing apparatus 40.

For example, as shown in FIG. 16, the hardware information storage unit 24 stores hardware information indicating that the capacity of the cache installed in the information processing apparatus 40 (the cache size) is "cashe". FIG. 16 is a diagram for explaining the hardware information storage unit.

As also shown in FIG. 16, the hardware information storage unit 24 stores hardware information of "available" or "unavailable", indicating whether the CPU of the information processing apparatus 40 can perform backward hardware prefetch. Backward hardware prefetch is a hardware prefetch function that supports data access in the direction of decreasing addresses.

Returning to FIG. 2, the partial cache instruction unit 35 decides, from the dependency analysis result produced by the dependency relationship analysis unit 34, whether the data processed in the own loop is to be stored in memory or in the cache. That is, based on the dependency analysis result, the partial cache instruction unit 35 instructs the optimization unit 36, described later, to transform the loop into an access pattern that takes the cache size into account. Specifically, the partial cache instruction unit 35 determines the form of the transformation from the direction of the next loop, the cache size, and whether the backward hardware prefetch function is available.

The partial cache instruction unit 35 is described in detail next with reference to FIGS. 17 to 22. FIG. 17 is a flowchart for explaining the processing by the partial cache instruction unit, and FIG. 18 is a diagram for explaining the outline of the loop transformation instructions of patterns 1 to 4. FIG. 19 is a diagram for explaining the loop transformation instruction of pattern 1, and FIG. 20 is a diagram for explaining the loop transformation instruction of pattern 2. FIG. 21 is a diagram for explaining the loop transformation instruction of pattern 3, and FIG. 22 is a diagram for explaining the loop transformation instruction of pattern 4.

As shown in FIG. 17, when the dependency relationship of the array data (item: next loop) has been updated (Yes at Step S501), the partial cache instruction unit 35 determines whether the next-loop direction updated in the array data being processed is the forward direction (Step S502).

If the direction of the next loop is forward (Yes at Step S502), the partial cache instruction unit 35 refers to the hardware information and determines whether the information processing apparatus 40 supports backward hardware prefetch (Step S503).

If backward hardware prefetch is available (Yes at Step S503), the partial cache instruction unit 35 instructs the optimization unit 36 to perform the pattern 1 loop transformation (Step S504) and ends the processing.

If backward hardware prefetch is unavailable (No at Step S503), the partial cache instruction unit 35 instructs the optimization unit 36 to perform the pattern 2 loop transformation (Step S505) and ends the processing.

On the other hand, if the direction of the next loop is reverse (No at Step S502), the partial cache instruction unit 35 refers to the hardware information and determines whether the information processing apparatus 40 supports backward hardware prefetch (Step S506).

If backward hardware prefetch is available (Yes at Step S506), the partial cache instruction unit 35 instructs the optimization unit 36 to perform the pattern 3 loop transformation (Step S507) and ends the processing.

If backward hardware prefetch is unavailable (No at Step S506), the partial cache instruction unit 35 instructs the optimization unit 36 to perform the pattern 4 loop transformation (Step S508) and ends the processing.
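The branching of FIG. 17 amounts to a two-by-two decision table, sketched below. The function name and the string encoding of the direction are hypothetical. The flowchart as described only distinguishes "forward" from "reverse", so rejecting any other value (such as "none") is an assumption of this sketch rather than something the text specifies.

```python
def select_pattern(next_loop: str, backward_prefetch: bool) -> int:
    """Return the loop transformation pattern number (1 to 4)."""
    if next_loop == "forward":  # Yes at Step S502
        return 1 if backward_prefetch else 2  # Steps S503 to S505
    if next_loop == "reverse":  # No at Step S502
        return 3 if backward_prefetch else 4  # Steps S506 to S508
    # Assumption: other values are outside the flow described in the text.
    raise ValueError(f"no transformation pattern described for {next_loop!r}")
```

For instance, `select_pattern("forward", True)` selects pattern 1, the case examined in detail with FIG. 19.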

The outline of the loop transformation instructions of patterns 1 to 4 is described with reference to FIG. 18. In FIG. 18, "A" is the initial value of the loop to be transformed, and "B" is that initial value plus a value corresponding to the cache size (cashe). "C" is the final value of the loop to be transformed minus a value corresponding to the cache size (cashe), and "D" is the final value of the loop to be transformed. In FIG. 18, the "new loop that uses the cache", created from the loop being transformed, is labeled "separate loop".

In the Pattern 1 loop transformation instruction, which is issued when the next loop runs in the forward direction and reverse-direction hardware prefetching is available, the separate loop is created at the "head", as shown in FIG. 18. Specifically, the instruction specifies that the first part of the data processed by the loop being transformed, corresponding to the cache size, is processed by the cache-using loop. The initial value of the separate loop is "B" and its final value is "A"; because the reverse-direction hardware prefetch function is available, the control variable of the separate loop iterates in the "reverse" direction.

A specific example of the Pattern 1 loop transformation instruction is described with reference to FIG. 19. When the input source program is written as shown on the left side of FIG. 19(A), the subsequent loop processes array a in the same (forward) direction as the preceding loop. If the data of array a does not fit in the cache, then, as shown on the left side of FIG. 19(B), the data at the tail of the cache after the preceding loop finishes is the data for control variable "ej". That is, the data for control variable "sj", which the subsequent loop processes first, has already been evicted from the cache, and a cache miss occurs. Conventionally, therefore, the cache was treated as unnecessary and block store instructions were used.

Therefore, in the Pattern 1 loop transformation instruction, as shown on the right side of FIG. 19(A), a new loop is created that uses the cache for the cache-size portion of data at the head (first part) of the preceding loop and decrements the control variable by one (-1) on each iteration. The remaining latter part of the data, that is, everything other than the first part corresponding to the cache size, is written with block store instructions in the preceding loop.

The partial cache instruction unit 35 also specifies that the block store loop is executed before the new cache-using loop.

As a result, as shown on the right side of FIG. 19(B), at program execution the preceding loop first processes the latter part of the data (everything other than the first part corresponding to the cache size) and stores it directly in memory with block stores, which do not use the cache. Next, the new reverse-direction loop processes the first part of the data corresponding to the cache size and stores it in the cache. Consequently, the initial-value data processed by the preceding loop is stored at the end of the cache, and the subsequent loop reliably gets cache hits. Data that does not use the cache can be loaded by accessing memory directly, so processing is faster.
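The effect described for FIG. 19(B) can be illustrated with a toy cache model. The sketch below assumes a fully associative LRU cache that holds whole array elements, and it assumes block stores bypass the cache entirely. These modelling choices and the name `next_loop_hits` are illustrative assumptions, not details taken from the specification.

```python
from collections import OrderedDict

def next_loop_hits(cached_writes, next_loop_reads, capacity):
    """Count cache hits seen by the next loop in a toy LRU cache.

    cached_writes   -- indices the preceding loop writes through the cache
                       (block-store writes are simply left out)
    next_loop_reads -- indices the next loop reads, in order
    capacity        -- number of elements the cache can hold (assumption)
    """
    cache = OrderedDict()

    def touch(index):
        hit = index in cache
        if hit:
            cache.move_to_end(index)       # refresh LRU position
        else:
            cache[index] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
        return hit

    for index in cached_writes:
        touch(index)
    return sum(touch(index) for index in next_loop_reads)

n, capacity = 10, 4
# Untransformed: the preceding loop writes the whole array through the cache.
untransformed = next_loop_hits(range(n), range(n), capacity)
# Pattern 1: only the first `capacity` elements go through the cache, in reverse.
pattern1 = next_loop_hits(reversed(range(capacity)), range(n), capacity)
```

In this model, with a 10-element array and a 4-element cache, the untransformed loop pair gets no hits in the next loop, while the Pattern 1 ordering gets 4, one for each element kept in the cache.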

Returning to FIG. 18, in the Pattern 2 loop transformation instruction, which is issued when the next loop runs in the forward direction and reverse-direction hardware prefetching is not available, the separate loop is created at the "head", as in Pattern 1. That is, as in Pattern 1, the instruction specifies that the first part of the data processed by the loop being transformed, corresponding to the cache size, is processed by the cache-using loop. The initial value of the separate loop is "A" and its final value is "B"; because there is no reverse-direction hardware prefetch function, the control variable of the separate loop iterates in the "forward" direction.

A specific example of the Pattern 2 loop transformation instruction is described with reference to FIG. 20. The source program shown on the left side of FIG. 20(A) is the same as that on the left side of FIG. 19(A), so its description is omitted. The left side of FIG. 20(B) shows that, when that source program is executed using the cache, cache misses occur in the subsequent loop; this is the same as described for the left side of FIG. 19(B), so its description is also omitted.

Pattern 2 differs from Pattern 1 in that there is no reverse-direction hardware prefetch function. In the Pattern 2 loop transformation instruction, as shown on the right side of FIG. 20(A), a new loop is created that uses the cache for the cache-size portion of data at the head (first part) of the preceding loop and increments the control variable by one on each iteration. The remaining latter part of the data, that is, everything other than the first part corresponding to the cache size, is written with block store instructions in the preceding loop.

As in Pattern 1, the partial cache instruction unit 35 specifies that the block store loop is executed before the new cache-using loop.

As a result, as shown on the right side of FIG. 20(B), at program execution the preceding loop first processes the latter part of the data (everything other than the first part corresponding to the cache size) and stores it directly in memory with block stores, which do not use the cache. Next, the new forward-direction loop processes the first part of the data corresponding to the cache size and stores it in the cache. Consequently, even if the initial-value data processed by the preceding loop is stored first in the cache, it is prevented from being evicted, and the subsequent loop reliably gets cache hits. Data that does not use the cache can be loaded by accessing memory directly, so processing is faster.

Returning to FIG. 18, in the Pattern 3 loop transformation instruction, which is issued when the next loop runs in the reverse direction and reverse-direction hardware prefetching is available, the separate loop is created at the "tail". Specifically, the Pattern 3 instruction specifies that the latter part of the data processed by the loop being transformed, corresponding to the cache size, is processed by the cache-using loop. The initial value of the separate loop is "D" and its final value is "C"; because the reverse-direction hardware prefetch function is available, the control variable of the separate loop iterates in the "reverse" direction.

A specific example of the Pattern 3 loop transformation instruction is described with reference to FIG. 21. When the input source program is written as shown on the left side of FIG. 21(A), the subsequent loop processes array a in the direction opposite to the preceding loop. If the data of array a does not fit in the cache, then, as shown on the left side of FIG. 21(B), the data at the tail of the cache after the preceding loop finishes is the data for control variable "ej". That is, the data for control variable "ej", which the subsequent loop processes first, hits the cache, but cache hits cannot be expected for any data other than the latter part, corresponding to the cache size, of the data processed by the preceding loop. Conventionally, the cache was treated as unnecessary and block store instructions were used.

Therefore, in the Pattern 3 loop transformation instruction, as shown on the right side of FIG. 21(A), a new loop is created that uses the cache for the cache-size portion of data at the tail (latter part) of the preceding loop and decrements the control variable by one (-1) on each iteration. The remaining first part of the data, that is, everything other than the latter part corresponding to the cache size, is written with block store instructions in the preceding loop.

As in Patterns 1 and 2, the partial cache instruction unit 35 specifies that the block store loop is executed before the new cache-using loop.

As a result, as shown on the right side of FIG. 21(B), at program execution the preceding loop first processes the first part of the data (everything other than the latter part corresponding to the cache size) and stores it directly in memory with block stores, which do not use the cache. Next, the new reverse-direction loop processes the latter part of the data corresponding to the cache size and stores it in the cache. Consequently, the final-value data processed by the preceding loop is stored at the end of the cache, and the subsequent loop reliably gets cache hits. Data that does not use the cache can be loaded by accessing memory directly, so processing is faster.

Returning to FIG. 18, in the Pattern 4 loop transformation instruction, which is issued when the next loop runs in the reverse direction and reverse-direction hardware prefetching is not available, the separate loop is created at the "tail", as in Pattern 3. That is, as in Pattern 3, the instruction specifies that the latter part of the data processed by the loop being transformed, corresponding to the cache size, is processed by the cache-using loop. The initial value of the separate loop is "C" and its final value is "D"; because there is no reverse-direction hardware prefetch function, the control variable of the separate loop iterates in the "forward" direction.
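The four rows of FIG. 18 as described in the text can be collected into one lookup. This is a sketch under stated assumptions: the names `SeparateLoop` and `separate_loop` are hypothetical, the cache size is expressed in loop iterations, and B and C are computed literally as A + cache and D - cache without deciding whether the bounds are inclusive.

```python
from typing import NamedTuple

class SeparateLoop(NamedTuple):
    position: str  # "head" or "tail" of the loop being transformed
    initial: int   # initial value of the separate (cache-using) loop
    final: int     # final value of the separate loop
    step: int      # +1 for forward iteration, -1 for reverse

def separate_loop(pattern: int, a: int, d: int, cache: int) -> SeparateLoop:
    """Return the separate-loop parameters for Patterns 1 to 4.

    a, d  -- initial and final values of the loop to be transformed ("A", "D")
    cache -- cache size in loop iterations (an assumption about units)
    """
    b = a + cache  # "B": initial value plus cache size
    c = d - cache  # "C": final value minus cache size
    table = {
        1: SeparateLoop("head", b, a, -1),
        2: SeparateLoop("head", a, b, +1),
        3: SeparateLoop("tail", d, c, -1),
        4: SeparateLoop("tail", c, d, +1),
    }
    return table[pattern]
```

For a loop running from 1 to 100 with a cache size of 16 iterations, Pattern 1 gives a separate loop at the head that runs from 17 down to 1.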

A specific example of the Pattern 4 loop transformation instruction is described with reference to FIG. 22. The source program shown on the left side of FIG. 22(A) is the same as that on the left side of FIG. 21(A), so its description is omitted. The left side of FIG. 22(B) shows, as described for the left side of FIG. 21(B), that when the source program on the left side of FIG. 22(A) is executed using the cache, the subsequent loop gets cache hits, but cache hits cannot be expected for any data other than the latter part, corresponding to the cache size, of the data processed by the preceding loop.

Pattern 4 differs from Pattern 3 in that there is no reverse-direction hardware prefetch function. In the Pattern 4 loop transformation instruction, as shown on the right side of FIG. 22(A), a new loop is created that uses the cache for the cache-size portion of data at the tail (latter part) of the preceding loop and increments the control variable by one on each iteration. The remaining first part of the data, that is, everything other than the latter part corresponding to the cache size, is written with block store instructions in the preceding loop.

As in Patterns 1 to 3, the partial cache instruction unit 35 specifies that the block store loop is executed before the new cache-using loop.

As a result, as shown on the right side of FIG. 22(B), at program execution the preceding loop first processes the first part of the data (everything other than the latter part corresponding to the cache size) and stores it directly in memory with block stores, which do not use the cache. Next, the new forward-direction loop processes the latter part of the data corresponding to the cache size and stores it in the cache. Consequently, even if the final-value data processed by the preceding loop is stored first in the cache, it is prevented from being evicted, and the subsequent loop reliably gets cache hits. Data that does not use the cache can be loaded by accessing memory directly, so processing is faster.

Returning to FIG. 2, the optimization unit 36 generates a source program in which the new loop has been inserted, in accordance with the loop transformation instruction from the partial cache instruction unit 35. The optimization unit 36 then applies conventional optimizations such as software prefetching to that source program to generate a new source program, and stores the new source program in the new source program storage unit 25.

The object file generation unit 37 generates an object file from the new source program stored in the new source program storage unit 25 and outputs the generated object file to the object file output unit 12.

Next, the processing flow of the compiler apparatus 10 in this embodiment is described with reference to FIG. 23. FIG. 23 is a flowchart illustrating the processing of the compiler apparatus in this embodiment.

As shown in FIG. 23, in the compiler apparatus 10 of this embodiment, when a source program is input from the source program input unit 11 (Yes at step S601), the loop structure analysis unit 31 extracts the target loops and sets the loop data (step S602). The compiler apparatus 10 acquires the hardware information from the information processing apparatus 40, for example when the compiler apparatus itself starts up.

The array analysis unit 32 then analyzes the target loops and enumerates the arrays of data used in them, thereby setting the array data linked to the loop data (step S603).

After that, the access pattern analysis unit 33 builds the address conversion table and updates the initial value, final value, and increment of the array data, based on the array names in the array data that has been set (step S604).

Further, the dependency analysis unit 34 refers to the loop data, the array data linked to the loop data, and the address conversion table, and updates the dependency (the "next loop" item) of array data that has the same array name across loops (step S605).

The partial cache instruction unit 35 then issues a loop transformation instruction according to the array data dependency, the cache size, and the reverse-direction hardware prefetch function (step S606).

After that, the optimization unit 36 inserts the new loop into the source program in accordance with the loop transformation instruction, performs optimization, and generates a new source program (step S607).

Finally, the object file generation unit 37 generates an object file from the new source program (step S608), and the process ends.

As described above, according to this embodiment, the loop structure analysis unit 31, the array analysis unit 32, and the access pattern analysis unit 33 analyze the ordering of the loops in the source program, the arrays used in each loop, and the data access patterns, and the dependency analysis unit 34 analyzes, from the access patterns, the dependency between the data processed in a preceding loop and the data processed in the next loop. Based on the dependency, the partial cache instruction unit 35 decides whether the data processed in the preceding loop is to be stored in memory or in the cache. The optimization unit 36 generates a new source program in which the preceding loop whose storage destination has been decided is replaced with a cache-using loop and a loop of block store instructions, and the object file generation unit 37 generates an object file from the new source program.

Therefore, at program execution, the data that should use the cache and the data that is better stored directly in memory can be separated within the same loop, and the execution performance of the program can be improved by using the cache effectively.

In this embodiment, the dependency analysis unit 34 analyzes, as the dependency, whether the order in which data is processed in the preceding loop and the order in which that processed data is processed in the next loop are in the forward direction or the reverse direction. If the dependency is in the forward direction, the partial cache instruction unit 35 decides that the first part of the data processed by the preceding loop, corresponding to the cache capacity, uses the cache; if the dependency is in the reverse direction, it decides that the latter part, corresponding to the cache capacity, uses the cache.

Therefore, at program execution, the data stored in the cache reliably hits in the next loop, and the execution performance of the program can be further improved.

In this embodiment, the partial cache instruction unit 35 also instructs the optimization unit 36 to place the cache-using loop after the block store loop, so the data stored in the cache hits even more reliably in the next loop, and the execution performance of the program can be further improved.

In this embodiment, if reverse-direction hardware prefetching can be executed, the partial cache instruction unit 35 specifies that the order in which data is processed in the cache-using loop is reversed. If reverse-direction hardware prefetching cannot be executed, the partial cache instruction unit 35 specifies that the order in which data is processed in the cache-using loop is kept the same.

Therefore, a cache-using loop can be created such that the data stored in the cache reliably hits in the next loop, according to the capabilities of the apparatus that executes the program, and the execution performance of the program can be improved appropriately.

The above describes the case in which the loop transformation instruction sets the iteration direction of the new cache-using loop to forward or reverse according to the array data dependency, the cache size, and the reverse-direction hardware prefetch function. However, this embodiment is not limited to this. For example, when it is desirable for data to be processed in forward order, as with stream data such as video, the loop transformation instruction may be issued on the basis of different criteria. This is described with reference to FIG. 24, which is a flowchart illustrating a variation of the processing by the partial cache instruction unit.

As shown in FIG. 24, when the array data dependency has been updated (Yes at step S701), the partial cache instruction unit 35 in this variation determines whether a sequential access pattern must be guaranteed for the data of the array data (step S702).

If the sequential access pattern does not need to be guaranteed (No at step S702), the partial cache instruction unit 35 executes the processing from step S502 onward described with reference to FIG. 17, and ends the process.

On the other hand, if the sequential access pattern must be guaranteed (Yes at step S702), the partial cache instruction unit 35 sets the initial value, final value, and increment of the cache-using loop for the target loop to be transformed (step S704). That is, the partial cache instruction unit 35 sets the initial value, final value, and increment of the cache-using loop so that the sequential access pattern is guaranteed.

The partial cache instruction unit 35 then corrects the loop index calculation (step S705) and removes from the target loop the iterations that are now covered by the newly created cache-using loop (step S706).

After that, the partial cache instruction unit 35 changes the stores in the target loop, which processes the data other than the data processed by the cache-using loop, to block stores (step S707), and ends the process.
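Steps S704 to S707 can be pictured as splitting one iteration range into two forward-running ranges. The following is only a sketch: the specification does not state where the cached part is placed in this variation, so the code assumes it follows the next loop's direction as in Patterns 1 to 4, and it assumes inclusive loop bounds with exactly `cache` iterations in the cached part. The name `split_sequential` is hypothetical.

```python
def split_sequential(a: int, d: int, cache: int, next_loop_forward: bool):
    """Split iterations a..d (inclusive) into (block_store, cached) ranges.

    Both ranges iterate forward (step +1) so that sequential access is kept.
    Placement of the cached part by next-loop direction is an assumption.
    """
    if next_loop_forward:
        cached = range(a, a + cache)           # head part uses the cache
        block_store = range(a + cache, d + 1)  # remainder uses block stores
    else:
        cached = range(d - cache + 1, d + 1)   # tail part uses the cache
        block_store = range(a, d - cache + 1)  # remainder uses block stores
    return block_store, cached
```

For a loop over 0 to 9 with a cache of 4 iterations and a forward next loop, the block store range covers 4 to 9 and the cached range covers 0 to 3.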

As a result, even when a sequential access pattern must be guaranteed for stream data that is reused, the cache can be used effectively and the execution performance of the program can be improved.

Of the processes described in this embodiment, all or part of those described as being performed automatically may instead be performed manually (for example, a programmer may enter the hardware information by hand). Conversely, all or part of the processes described as being performed manually may be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and the drawings may be changed as desired unless otherwise noted.

The components of each illustrated apparatus are functional and conceptual, and do not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of each apparatus is not limited to what is illustrated, and all or part of it may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the loop structure analysis unit 31, the array analysis unit 32, the access pattern analysis unit 33, and the dependency analysis unit 34 may be integrated into a single source program analysis unit. Furthermore, all or any part of the processing functions performed by each apparatus may be implemented by a CPU and a program that is analyzed and executed by that CPU, or may be implemented as hardware using wired logic.

ところで上記の実施例では、ハードウェアロジックによって各種の処理を実現する場合を説明したが、本発明はこれに限定されるものではなく、あらかじめ用意されたプログラムをコンピュータで実行するようにしてもよい。そこで以下では、図２５を用いて、上記の実施例に示したコンパイラ装置１０と同様の機能を有するコンパイラプログラムを実行するコンピュータの一例を説明する。図２５は、本実施例のコンパイラプログラムを実行するコンピュータを示す図である。 In the above embodiment, the case where various processes are realized by hardware logic has been described. However, the present invention is not limited to this, and a program prepared in advance may be executed by a computer. . In the following, an example of a computer that executes a compiler program having the same function as that of the compiler apparatus 10 shown in the above embodiment will be described with reference to FIG. FIG. 25 is a diagram illustrating a computer that executes the compiler program of this embodiment.

図２５に示すように、情報処理装置としてのコンピュータ１００は、キーボード１０１、ディスプレイ１０２、ＣＰＵ１０３、ＲＯＭ１０４、ＨＤＤ１０５、ＲＡＭ１０６を有する。そして、キーボード１０１、ディスプレイ１０２、ＣＰＵ１０３、ＲＯＭ１０４、ＨＤＤ１０５およびＲＡＭ１０６は、バス１０７などで接続される。また、コンピュータ１００は、情報処理装置４０と接続される。 As shown in FIG. 25, a computer 100 as an information processing apparatus includes a keyboard 101, a display 102, a CPU 103, a ROM 104, an HDD 105, and a RAM 106. The keyboard 101, the display 102, the CPU 103, the ROM 104, the HDD 105, and the RAM 106 are connected by a bus 107 or the like. The computer 100 is connected to the information processing apparatus 40.

ＲＯＭ１０４には、上記の実施例に示したコンパイラ装置１０と同様の機能を発揮するコンパイラプログラム、つまり、図２５に示すように、ループ構造解析プログラム１０４ａ、配列解析プログラム１０４ｂが予め記憶されている。また、ＲＯＭ１０４には、アクセスパターン解析プログラム１０４ｃ、依存関係解析プログラム１０４ｄ、部分キャッシュ指示プログラム１０４ｅが予め記憶されている。また、ＲＯＭ１０４には、最適化プログラム１０４ｆ、オブジェクトファイル生成プログラム１０４ｇが予め記憶されている。なお、これらのプログラム１０４ａ〜１０４ｇについては、図２に示したコンパイラ装置１０の各構成要素と同様、適宜統合または分散してもよい。 The ROM 104 stores in advance a compiler program that exhibits the same function as the compiler apparatus 10 shown in the above-described embodiment, that is, as shown in FIG. 25, a loop structure analysis program 104a and an array analysis program 104b. The ROM 104 stores in advance an access pattern analysis program 104c, a dependency relationship analysis program 104d, and a partial cache instruction program 104e. The ROM 104 stores an optimization program 104f and an object file generation program 104g in advance. Note that these programs 104a to 104g may be appropriately integrated or distributed in the same manner as each component of the compiler apparatus 10 shown in FIG.

そして、ＣＰＵ１０３が、これらのプログラム１０４ａ〜１０４ｇをＲＯＭ１０４から読み出して実行する。これにより、図２５に示すように、プログラム１０４ａおよびプログラム１０４ｂは、ループ構造解析プロセス１０３ａおよび配列解析プロセス１０３ｂとして機能するようになる。また、プログラム１０４ｃ〜１０４ｅは、アクセスパターン解析プロセス１０３ｃ、依存関係解析プロセス１０３ｄ、部分キャッシュ指示プロセス１０３ｅとして機能するようになる。また、プログラム１０４ｆおよびプログラム１０４ｇは、最適化プロセス１０３ｆおよびオブジェクトファイル生成プロセス１０３ｇとして機能するようになる。なお、各プロセス１０３ａ〜１０３ｇは、図２に示した、ループ構造解析部３１、配列解析部３２、アクセスパターン解析部３３、依存関係解析部３４、部分キャッシュ指示部３５、最適化部３６、オブジェクトファイル生成部３７にそれぞれ対応する。 Then, the CPU 103 reads out these programs 104a to 104g from the ROM 104 and executes them. As a result, as shown in FIG. 25, the program 104a and the program 104b function as a loop structure analysis process 103a and a sequence analysis process 103b. The programs 104c to 104e function as an access pattern analysis process 103c, a dependency relationship analysis process 103d, and a partial cache instruction process 103e. In addition, the program 104f and the program 104g function as an optimization process 103f and an object file generation process 103g. Each of the processes 103a to 103g includes the loop structure analysis unit 31, the sequence analysis unit 32, the access pattern analysis unit 33, the dependency relationship analysis unit 34, the partial cache instruction unit 35, the optimization unit 36, the object shown in FIG. Each corresponds to the file generation unit 37.

As shown in FIG. 25, the HDD 105 holds source program data 105a, loop data 105b, array data 105c, hardware information data 105d, and new source program data 105e. The data 105a to 105e correspond, respectively, to the source program storage unit 21, the loop data storage unit 22, the array data storage unit 23, the hardware information storage unit 24, and the new source program storage unit 25 shown in FIG. 2. The CPU 103 registers source program data 106a in the source program data 105a, loop data 106b in the loop data 105b, and array data 106c in the array data 105c. The CPU 103 also registers hardware information data 106d in the hardware information data 105d and new source program data 106e in the new source program data 105e. The CPU 103 then reads the registered data, stores it in the RAM 106, and executes the compile processing based on the source program data 106a, the loop data 106b, the array data 106c, the hardware information data 106d, and the new source program data 106e stored in the RAM 106.

The programs 104a to 104g need not be stored in the ROM 104 from the outset. For example, each program may be stored on a "portable physical medium" inserted into the computer 100, such as a flexible disk (FD), CD-ROM, MO disk, DVD disk, magneto-optical disk, or IC card; on a "fixed physical medium" such as an HDD provided inside or outside the computer 100; or on "another computer (or server)" connected to the computer 100 via a public line, the Internet, a LAN, a WAN, or the like. The computer 100 may then read each program from these and execute it.

The following supplementary notes are further disclosed regarding the embodiments, including the examples described above.

(Supplementary Note 1) A compiler program that causes a computer to execute:
an access pattern analysis procedure for analyzing the access pattern to data in each loop of a source program;
a dependency analysis procedure for analyzing, from the access pattern analyzed by the access pattern analysis procedure, the dependency between data processed in a current loop and data processed in a next loop;
a storage destination determination procedure for determining, based on the dependency analyzed by the dependency analysis procedure, whether the data processed in the current loop is to be stored in a memory or in a cache;
a new source program generation procedure for generating a new source program from the source program by replacing the current loop, whose storage destination has been determined by the storage destination determination procedure, with a first loop for storing processed data in the memory and a second loop for storing processed data in the cache; and
an object file generation procedure for generating an object file from the new source program generated by the new source program generation procedure.

(Supplementary Note 2) The compiler program according to Supplementary Note 1, wherein
the dependency analysis procedure analyzes, as the dependency, whether the order in which data is processed in the current loop and the order in which the processed data of the current loop is processed in the next loop are in the forward direction or in the reverse direction, and
the storage destination determination procedure determines, when the dependency analyzed by the dependency analysis procedure is in the forward direction, that the leading portion of the processed data of the current loop corresponding to the cache capacity is to be stored in the cache, and determines, when the dependency is in the reverse direction, that the trailing portion of the processed data of the current loop corresponding to the cache capacity is to be stored in the cache.
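The rule in Supplementary Note 2 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `dependency_direction` and `split_storage`, the use of element indices, and the expression of the cache capacity as an element count (`cache_elems`) are all assumptions made for illustration.

```python
def dependency_direction(current_order, next_order):
    """Classify the dependency as 'forward' or 'reverse'.

    current_order: indices in the order the current loop processes them.
    next_order:    indices in the order the next loop processes that data.
    Both arguments are hypothetical inputs chosen for this sketch.
    """
    current_order = list(current_order)
    next_order = list(next_order)
    if next_order == current_order:
        return "forward"
    if next_order == current_order[::-1]:
        return "reverse"
    raise ValueError("access pattern is neither forward nor reverse")


def split_storage(n, cache_elems, direction):
    """Return (cached, memory) index ranges for n processed elements.

    Forward dependency: the leading cache_elems elements go to the cache.
    Reverse dependency: the trailing cache_elems elements go to the cache.
    The remainder is assigned to memory.
    """
    k = min(max(cache_elems, 0), n)  # never cache more than exists
    if direction == "forward":
        return range(0, k), range(k, n)
    if direction == "reverse":
        return range(n - k, n), range(0, n - k)
    raise ValueError("direction must be 'forward' or 'reverse'")


if __name__ == "__main__":
    direction = dependency_direction(range(10), range(9, -1, -1))
    cached, memory = split_storage(10, 4, direction)
    print(direction, list(cached), list(memory))
```

In this sketch the portion kept in the cache is always the portion the next loop touches first, which is the reason the choice depends on the direction of the dependency.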

(Supplementary Note 3) The compiler program according to Supplementary Note 2, wherein the storage destination determination procedure instructs the new source program generation procedure to generate a new source program that executes the second loop after the first loop.

(Supplementary Note 4) The compiler program according to Supplementary Note 3, further causing the computer to execute a determination procedure for collecting hardware information of an information processing apparatus that executes the object file and determining whether the information processing apparatus is capable of executing backward hardware prefetch processing, wherein
the storage destination determination procedure instructs the new source program generation procedure to generate the new source program with the data processing order in the second loop reversed relative to the processing order in the current loop when the determination procedure determines that the information processing apparatus is capable of executing backward hardware prefetch processing, and to generate the new source program with the data processing order in the second loop identical to the processing order in the current loop when the determination procedure determines that the information processing apparatus is not capable of executing backward hardware prefetch processing.
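As a rough illustration of Supplementary Notes 3 and 4, the following Python sketch builds the execution order of the two replacement loops. It is a sketch under stated assumptions, not the patent's implementation: the names `second_loop_order` and `new_schedule`, the `backward_prefetch` flag, and the representation of each loop as a list of element indices are hypothetical.

```python
def second_loop_order(cached_in_current_order, backward_prefetch):
    """Order in which the second (cache-storing) loop visits its elements.

    cached_in_current_order: indices of the cached portion, listed in the
    order the original current loop would have processed them.
    backward_prefetch: True if the target machine is assumed to support
    backward hardware prefetch.
    """
    order = list(cached_in_current_order)
    # Reverse the order only when backward hardware prefetch is available;
    # otherwise keep the current loop's order.
    return order[::-1] if backward_prefetch else order


def new_schedule(memory_part, cached_part, backward_prefetch):
    """Return the sequence of (destination, index) steps.

    The first loop (memory destination) runs before the second loop
    (cache destination), as in Supplementary Note 3.
    """
    first = [("memory", i) for i in memory_part]
    second = [("cache", i)
              for i in second_loop_order(cached_part, backward_prefetch)]
    return first + second


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, new_schedule(range(4, 6), range(0, 4), flag))
```

Running the second loop last means the data it writes is the most recently touched when the next loop starts, and the optional reversal lets the second loop's order follow the direction the hardware prefetcher is assumed to handle.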

(Supplementary Note 5) A compiler apparatus comprising:
an access pattern analysis unit that analyzes the access pattern to data in each loop of a source program;
a dependency analysis unit that analyzes, from the access pattern analyzed by the access pattern analysis unit, the dependency between data processed in a current loop and data processed in a next loop;
a storage destination determination unit that determines, based on the dependency analyzed by the dependency analysis unit, whether the data processed in the current loop is to be stored in a memory or in a cache;
a new source program generation unit that generates a new source program in which the current loop, whose storage destination has been determined by the storage destination determination unit, is replaced with a first loop for storing processed data in the memory and a second loop for storing processed data in the cache; and
an object file generation unit that generates an object file from the new source program generated by the new source program generation unit.

(Supplementary Note 6) The compiler apparatus according to Supplementary Note 5, wherein
the dependency analysis unit analyzes, as the dependency, whether the order in which data is processed in the current loop and the order in which the processed data of the current loop is processed in the next loop are in the forward direction or in the reverse direction, and
the storage destination determination unit determines, when the dependency analyzed by the dependency analysis unit is in the forward direction, that the leading portion of the processed data of the current loop corresponding to the cache capacity is to be stored in the cache, and determines, when the dependency is in the reverse direction, that the trailing portion of the processed data of the current loop corresponding to the cache capacity is to be stored in the cache.

(Supplementary Note 7) The compiler apparatus according to Supplementary Note 6, wherein the storage destination determination unit instructs the new source program generation unit to generate a new source program that executes the second loop after the first loop.

(Supplementary Note 8) The compiler apparatus according to Supplementary Note 7, further comprising a determination unit that collects hardware information of an information processing apparatus that executes the object file and determines whether the information processing apparatus is capable of executing backward hardware prefetch processing, wherein
the storage destination determination unit instructs the new source program generation unit to generate the new source program with the data processing order in the second loop reversed relative to the processing order in the current loop when the determination unit determines that the information processing apparatus is capable of executing backward hardware prefetch processing, and to generate the new source program with the data processing order in the second loop identical to the processing order in the current loop when the determination unit determines that the information processing apparatus is not capable of executing backward hardware prefetch processing.

Description of Symbols

10 Compiler apparatus
11 Source program input unit
12 Object file output unit
13 Communication unit
14 Input/output control I/F unit
20 Storage unit
21 Source program storage unit
22 Loop data storage unit
23 Array data storage unit
24 Hardware information storage unit
25 New source program storage unit
30 Processing unit
31 Loop structure analysis unit
32 Array analysis unit
33 Access pattern analysis unit
34 Dependency analysis unit
35 Partial cache instruction unit
36 Optimization unit
37 Object file generation unit
40 Information processing apparatus

Claims

A compiler program that causes a computer to execute:
an access pattern analysis procedure for analyzing the access pattern to data in each loop of a source program;
a dependency analysis procedure for analyzing, from the access pattern analyzed by the access pattern analysis procedure, the dependency between data processed in a current loop and data processed in a next loop;
a storage destination determination procedure for determining, based on the dependency analyzed by the dependency analysis procedure, whether the data processed in the current loop is to be stored in a memory or in a cache;
a new source program generation procedure for generating a new source program from the source program by replacing the current loop, whose storage destination has been determined by the storage destination determination procedure, with a first loop for storing processed data in the memory and a second loop for storing processed data in the cache; and
an object file generation procedure for generating an object file from the new source program generated by the new source program generation procedure.

The compiler program according to claim 1, wherein
the dependency analysis procedure analyzes, as the dependency, whether the order in which data is processed in the current loop and the order in which the processed data of the current loop is processed in the next loop are in the forward direction or in the reverse direction, and
the storage destination determination procedure determines, when the dependency analyzed by the dependency analysis procedure is in the forward direction, that the leading portion of the processed data of the current loop corresponding to the cache capacity is to be stored in the cache, and determines, when the dependency is in the reverse direction, that the trailing portion of the processed data of the current loop corresponding to the cache capacity is to be stored in the cache.

The compiler program, wherein the storage destination determination procedure instructs the new source program generation procedure to generate a new source program that executes the second loop after the first loop.

The compiler program according to claim 3, further causing the computer to execute a determination procedure for collecting hardware information of an information processing apparatus that executes the object file and determining whether the information processing apparatus is capable of executing backward hardware prefetch processing, wherein
the storage destination determination procedure instructs the new source program generation procedure to generate the new source program with the data processing order in the second loop reversed relative to the processing order in the current loop when the determination procedure determines that the information processing apparatus is capable of executing backward hardware prefetch processing, and to generate the new source program with the data processing order in the second loop identical to the processing order in the current loop when the determination procedure determines that the information processing apparatus is not capable of executing backward hardware prefetch processing.

A compiler apparatus comprising:
an access pattern analysis unit that analyzes the access pattern to data in each loop of a source program;
a dependency analysis unit that analyzes, from the access pattern analyzed by the access pattern analysis unit, the dependency between data processed in a current loop and data processed in a next loop;
a storage destination determination unit that determines, based on the dependency analyzed by the dependency analysis unit, whether the data processed in the current loop is to be stored in a memory or in a cache;
a new source program generation unit that generates a new source program in which the current loop, whose storage destination has been determined by the storage destination determination unit, is replaced with a first loop for storing processed data in the memory and a second loop for storing processed data in the cache; and
an object file generation unit that generates an object file from the new source program generated by the new source program generation unit.