JP2011197876A

JP2011197876A - Program, apparatus and method for optimization processing

Info

Publication number: JP2011197876A
Application number: JP2010062312A
Authority: JP
Inventors: Shuichi Chiba; 修一千葉; Tomoko Shoji; 智子庄司
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-03-18
Filing date: 2010-03-18
Publication date: 2011-10-06
Anticipated expiration: 2030-03-18
Also published as: JP5402746B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique that enables optimization by blocking even for multiple loops that are triple-nested or deeper and for multiple loops that contain arrays of three or more dimensions. SOLUTION: A processing target loop analysis part 100 extracts a tight loop from a source program as a blocking target loop. An array analysis part 110 extracts the arrays included in the blocking target loop. When there is an array whose access pattern is a cross, an access pattern analysis part 120 determines that blocking of the blocking target loop is valid. A memory size calculation formula generation part 130 generates a formula for the size of the memory accessed by the arrays. A divided block length calculation part 140 uses the memory size calculation formula to determine a divided block length for which the memory size does not exceed the cache size. An optimization execution part 17 executes blocking using the determined divided block length.

Description

The present invention relates to an optimization processing program, an optimization processing apparatus, and an optimization processing method for optimizing loops included in a source program written in a programming language.

Compilers provide an optimization technique called blocking so that software can make effective use of cache memory. Blocking splits the iteration counts of a multiple loop in a source program so that the memory accesses to the arrays used in its innermost loop fit within the size of the cache memory. When the compiler applies blocking to the source program, the array data is always held in the cache memory and the cache hit rate improves.
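As a rough illustration of the idea (not taken from the patent figures), the following Python sketch blocks a double loop that transposes a matrix. The function names, the `block` parameter, and the choice of a transpose kernel are assumptions made for this example; a compiler would apply the same transformation to intermediate code rather than to Python source.

```python
def transpose_plain(a):
    """Reference double loop: b[j][i] = a[i][j]."""
    n, m = len(a), len(a[0])
    b = [[0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            b[j][i] = a[i][j]
    return b


def transpose_blocked(a, block):
    """Same computation with both loops split into strips of length `block`."""
    n, m = len(a), len(a[0])
    b = [[0] * n for _ in range(m)]
    # The two outer loops step from block to block.
    for ii in range(0, n, block):
        for jj in range(0, m, block):
            # The two inner loops stay inside one block, so the elements of
            # `a` and `b` they touch form a small, reusable working set.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, m)):
                    b[j][i] = a[i][j]
    return b
```

The blocked version visits the same index pairs as the plain version, only in a different order, so the result is unchanged for any positive `block`.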

For a multiple loop to be optimized, a blocking technique is known that optimizes only the innermost loop and the loop immediately outside it. For a multiple loop that contains arrays of three or more dimensions, a blocking technique is known that optimizes only the innermost loop.

Patent Document 1: Japanese Laid-Open Patent Publication No. 5-265770 (JP-A-5-265770)

The conventional techniques for optimizing multiple loops by blocking described above have two problems: blocking cannot be applied to loops nested three or more deep, and blocking cannot be applied to multiple loops that contain arrays of three or more dimensions.

In recent years, more advanced and complex scientific computations tend to be performed as hardware performance improves. Meeting this need requires solving the problems of the prior art described above.

An object of the present invention is to solve the above problems and to provide a technique that enables optimization by blocking to be applied broadly, both to loops nested three or more deep and to multiple loops that contain arrays of three or more dimensions.

In a compiler that compiles a source program written in a programming language, the disclosed program causes a computer that performs optimization by blocking on the multiple loops included in the source program to function as follows.

That is, the program causes the computer on which it is installed and executed to perform: a procedure of extracting, from the multiple loops included in the source program, a loop whose structure has executable statements only in the innermost loop, as a blocking target loop; a procedure of extracting the arrays present in the blocking target loop; a procedure of determining that blocking of the blocking target loop is effective when the extracted arrays include an array in which the order of the control variables appearing as subscripts differs from the order of the control variables appearing from the innermost loop toward the outermost loop of the blocking target loop; a procedure of generating, for each array in a blocking target loop for which blocking has been determined to be effective, a formula that calculates the memory size used by that array within the range optimized by blocking of the blocking target loop; a procedure of generating, from the formulas generated for each array, a formula that calculates the memory size used by all arrays in the blocking target loop within the range optimized by blocking; a procedure of automatically calculating, using the formula for all arrays, a divided block length at which accesses to the arrays cause no cache misses within the range optimized by blocking; and a procedure of performing optimization by blocking on the blocking target loop using the automatically calculated divided block length.

With the above technique, optimization by blocking can be applied broadly to loops nested three or more deep, regardless of the number of dimensions of the arrays contained in the innermost loop. Blocking can therefore be applied to loops for which it was previously impossible, which improves the cache hit rate and thus the execution performance of the optimized program.

A diagram illustrating an example of conventional blocking for a triple loop.
A diagram showing an example of memory access.
A diagram showing an example of the optimization target loop after blocking when all loops of a triple loop are blocked.
A diagram showing an example of memory access.
A diagram illustrating an example of conventional blocking for a multiple loop that contains three-dimensional arrays.
A diagram showing an example of the optimization target loop after blocking when loops other than the innermost loop are also blocked in a multiple loop that contains three-dimensional arrays.
A diagram showing an example of the functional configuration of the compiler according to the present embodiment.
A diagram showing an example of the hardware configuration of the computer according to the present embodiment.
A diagram showing an example of the functional configuration of the source analysis unit according to the present embodiment.
A flowchart of the blocking optimization processing performed by the optimization processing unit of the present embodiment.
A diagram showing an example of a loop to be optimized according to the present embodiment and the loop after optimization by blocking.
A diagram showing an example of the data structure of the loop data according to the present embodiment.
A diagram illustrating a tight loop according to the present embodiment.
A flowchart of the processing target loop analysis performed by the processing target loop analysis unit of the present embodiment.
A diagram illustrating the management of array data according to the present embodiment.
A flowchart of the array analysis performed by the array analysis unit of the present embodiment.
A diagram illustrating access patterns according to the present embodiment.
A diagram showing an example in which the valid flag is set in the loop data according to the present embodiment.
A flowchart of the access pattern analysis performed by the access pattern analysis unit of the present embodiment.
A flowchart of the subscript chain creation performed by the access pattern analysis unit of the present embodiment.
A diagram illustrating the generation of the array-specific memory calculation formulas according to the present embodiment.
A diagram illustrating the generation of the memory size calculation formula from the array-specific memory calculation formulas according to the present embodiment.
A flowchart of the memory size calculation formula generation performed by the memory size calculation formula generation unit of the present embodiment.
A flowchart of the array-specific memory calculation data generation performed by the memory size calculation formula generation unit of the present embodiment.
A diagram illustrating an example of automatically calculating the divided block length using the memory size calculation formula according to the present embodiment.
A flowchart of the divided block length calculation performed by the divided block length calculation unit of the present embodiment.
A flowchart of the blocking instruction processing performed by the blocking instruction unit of the present embodiment.
An example in which optimization by blocking according to the present embodiment is applied to a loop nested three or more deep.
An example in which optimization by blocking according to the present embodiment is applied to a multiple loop that contains arrays of three or more dimensions.
An example in which optimization by blocking according to the present embodiment is applied to a multiple loop that contains an optimization directive line suppressing blocking.
An example in which optimization by blocking according to the present embodiment is applied to a multiple loop that contains an optimization directive line specifying a fixed divided block length.

The present embodiment is described below with reference to the drawings.

Before the blocking technique of the present embodiment is described, the conventional blocking technique is briefly explained.

FIG. 1 is a diagram illustrating an example of conventional blocking for a triple loop.

FIG. 1(A) shows an example of an optimization target loop. The optimization target loop in FIG. 1(A) is a triple loop consisting of loop I, loop J, and loop K. Only its innermost loop contains an executable statement, which performs an operation on two-dimensional arrays.

In the present embodiment, a loop is named after its control variable. Specifically, the loop whose control variable is I is called loop I.

FIG. 1(B) shows the result of applying index exchange to the optimization target loop in FIG. 1(A). An index is a control variable. Index exchange swaps an inner loop and an outer loop so that accesses to an array become accesses to a contiguous region of memory. Because blocking is most effective for memory accesses to contiguous regions, index exchange must be performed beforehand whenever it makes the accesses contiguous. Details of index exchange are given in paragraphs [0034] to [0043] and FIG. 3 of Patent Document 1.
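The effect of index exchange on contiguity can be sketched as follows. This is an illustrative example only: it assumes a row-major layout (Fortran, mentioned later in the description, is column-major, so the roles of the two subscripts would be reversed there), and the helper names `addresses` and `is_contiguous` are hypothetical.

```python
def addresses(n, m, order):
    """Return the flat offsets touched by a double loop over a row-major n x m array.

    order="ji" puts loop J outside and loop I inside (strided access);
    order="ij" is the exchanged nest (contiguous access).
    """
    out = []
    if order == "ji":
        for j in range(m):
            for i in range(n):
                out.append(i * m + j)
    else:
        for i in range(n):
            for j in range(m):
                out.append(i * m + j)
    return out


def is_contiguous(offsets):
    """True when every access is exactly one element after the previous one."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))
```

Both loop orders touch the same set of elements; only the exchanged order walks them as one contiguous run.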

FIG. 1(C) shows the result of applying conventional blocking to the optimization target loop after the index exchange in FIG. 1(B). In the conventional technique, blocking is applied only to the innermost loop and the loop immediately outside it, that is, only to loop I and loop J in the example of FIG. 1. In the blocked optimization target loop in FIG. 1(C), the compiler's blocking function determines an appropriate value for the divided block length block, which is the unit into which the loops are split.

FIG. 2 is a diagram showing an example of memory access.

The memory access example in FIG. 2 corresponds to the blocked optimization target loop in FIG. 1(C). In FIG. 2, the thick-line frames indicate the memory regions in which the data of each array is stored, and the thin-line frames indicate regions whose side length is the divided block length block. The shaded regions indicate, for each array in the executable statement, the region accessed when loop I, loop J, and loop K iterate with the control variables JJ and II held fixed in the blocked optimization target loop in FIG. 1(C).

As FIG. 2 shows, memory accesses to array A, whose subscripts are the control variables I and J of the blocked loops, are localized, so its data can be placed in the cache memory efficiently. In contrast, memory accesses to arrays B and C, whose subscripts include the control variable K of the loop that was not blocked, are not localized, so their data cannot be placed in the cache memory efficiently.

Because the conventional blocking technique cannot block loops nested three or more deep, the set of arrays whose data can be placed in the cache memory efficiently is limited to a narrow range.

FIG. 3 is a diagram showing an example of the optimization target loop after blocking when all loops of a triple loop are blocked.

The blocked optimization target loop in FIG. 3 is the result that would be obtained if all three loops of the index-exchanged optimization target loop in FIG. 1(B) were blocked.

For example, when the control variables of the loops appear about equally often in the subscripts of the arrays in the executable statement, blocking all of the loops may yield better memory efficiency.

FIG. 4 is a diagram showing an example of memory access.

The memory access example in FIG. 4 corresponds to the blocked optimization target loop in FIG. 3. That is, it shows the memory accesses that would occur if all three loops of the index-exchanged optimization target loop in FIG. 1(B) were blocked.

In FIG. 4, the thick-line frames indicate the memory regions in which the data of each array is stored, and the thin-line frames indicate regions whose side length is the divided block length block. In FIG. 4, the shaded regions indicate, for each array in the executable statement, the region accessed when loop I, loop J, and loop K iterate with the control variables JJ, II, and KK held fixed in the blocked optimization target loop in FIG. 3.

As FIG. 4 shows, if all three loops are blocked, memory accesses are localized for all of arrays A, B, and C, so the data of every array can be placed in the cache memory efficiently.
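The text does not reproduce the loop bodies of FIG. 1 and FIG. 3, so the following sketch assumes a matrix-product kernel, A(I,J) = A(I,J) + B(I,K) * C(K,J), which matches the description of A being indexed by I and J while B and C involve K. The function names and the single shared `block` length are likewise assumptions for illustration.

```python
def matmul_plain(b, c, n):
    """Reference triple loop over n x n matrices."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                a[i][j] += b[i][k] * c[k][j]
    return a


def matmul_blocked(b, c, n, block):
    """All three loops split into strips of length `block`."""
    a = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for kk in range(0, n, block):
                # With II, JJ, and KK fixed, each array is touched only
                # inside one block-by-block tile.
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, n)):
                        for k in range(kk, min(kk + block, n)):
                            a[i][j] += b[i][k] * c[k][j]
    return a
```

With integer inputs the two functions produce identical results, since blocking only reorders the additions into each element of `a`.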

If blocking of loops nested three or more deep, as shown in FIG. 3, were possible, loops could be optimized with higher memory efficiency than with the conventional blocking technique.

FIG. 5 is a diagram illustrating an example of conventional blocking for a multiple loop that contains three-dimensional arrays.

FIG. 5(A) shows an example of an optimization target loop. The optimization target loop in FIG. 5(A) is a triple loop consisting of loop I, loop J, and loop K. Only its innermost loop contains an executable statement, which performs an operation involving three-dimensional arrays.

FIG. 5(B) shows the result of applying conventional blocking to the optimization target loop in FIG. 5(A). In the conventional technique, when arrays of three or more dimensions are used in the innermost loop, only the innermost loop, that is, only loop I in the example of FIG. 5, is blocked because of compilation cost. In practice, however, blocking is also effective for loop J, which updates the control variable J.

Because the conventional blocking technique can block only the innermost loop of a multiple loop that contains three-dimensional arrays, it could not optimize such loops very effectively.

FIG. 6 is a diagram showing an example of the optimization target loop after blocking when loops other than the innermost loop are also blocked in a multiple loop that contains three-dimensional arrays.

The blocked optimization target loop in FIG. 6 is the result that would be obtained if loop I and loop J of the optimization target loop in FIG. 5(A), which contains three-dimensional arrays, were blocked.

As FIG. 6 shows, in a multiple loop that contains arrays of three or more dimensions, blocking loops other than the innermost loop as well makes it possible to optimize the loop with higher memory efficiency.

In recent years, more advanced and complex scientific computations tend to be performed as hardware performance improves. To meet this need, the execution performance of programs must be improved by extending the scope of blocking to multiple loops nested three or more deep and to multiple loops that contain arrays of three or more dimensions, as shown in FIG. 3 and FIG. 6.

The following describes the blocking technique of the present embodiment, which addresses the problems of blocking multiple loops nested three or more deep and multiple loops that contain arrays of three or more dimensions.

FIG. 7 is a diagram showing an example of the functional configuration of the compiler according to the present embodiment.

In FIG. 7, solid arrows mainly indicate the flow of control, and broken arrows mainly indicate the flow of data.

The computer 1 includes a compiler 10, a storage unit 20, a linker 30, and an operating system 40. The operating system 40 provides basic functions that applications use in common.

The storage unit 20 is a storage device accessible by the computer 1 that stores a source program 21, an object file 22, an executable file 23, and the like. The source program 21 is an application program written in a programming language such as Fortran. The object file 22 is a file of the application's object code, generated from the source program 21 by the compiler 10. The executable file 23 is a file of the application's executable program, produced by the linker 30 linking the object file 22 with libraries.

The compiler 10 compiles the source program 21 and generates the object file 22. The linker 30 links the object file 22 with libraries and generates the executable file 23.

The compiler 10 includes a source program input unit 11, an input/output control unit 12, an intermediate language generation unit 13, an intermediate language storage unit 14, an optimization processing unit 15, a code generation unit 18, and an object file output unit 19.

The source program input unit 11 opens the source program 21 specified for compilation. The input/output control unit 12 selects the necessary processing according to the options and the file type.

The intermediate language generation unit 13 converts the source program 21 into intermediate code used in the optimization performed by the optimization processing unit 15 and stores it in the intermediate language storage unit 14. The intermediate language storage unit 14 is a storage device accessible by the computer 1 that stores the intermediate code converted from the source program 21 by the intermediate language generation unit 13. Applying optimization directly to the source program 21 written in a high-level language is very difficult. The compiler 10 therefore first converts the source program 21 into intermediate code, which represents it in data structures that are easy to optimize, and then applies optimization to that intermediate code.

The optimization processing unit 15 performs optimization on the intermediate code stored in the intermediate language storage unit 14. The optimization processing unit 15 includes a source analysis unit 16 and an optimization execution unit 17. The source analysis unit 16 analyzes the intermediate code stored in the intermediate language storage unit 14, selects the optimizations that are effective, and instructs the optimization execution unit 17 accordingly. The optimization execution unit 17 applies the instructed optimizations to the intermediate code stored in the intermediate language storage unit 14.

The code generation unit 18 generates assembler code from the optimized intermediate code. The object file output unit 19 generates the object file 22 from the assembler code.

FIG. 8 is a diagram showing an example of the hardware configuration of the computer according to the present embodiment.

The computer 1 includes hardware such as a CPU (Central Processing Unit) 2, a memory 3 such as RAM (Random Access Memory) and ROM (Read Only Memory), and an HDD (Hard Disk Drive) 4. Part of the memory 3 is a cache memory that the CPU 2 can access at high speed. An input device 5 such as a keyboard and mouse and a display device 6 such as a display are connected to the computer 1.

The functions provided by the compiler 10, the storage unit 20, the linker 30, the operating system 40, and the like of the computer 1 shown in FIG. 7 are realized by the hardware of the computer 1 shown in FIG. 8 together with software programs.

FIG. 9 is a diagram showing an example of the functional configuration of the source analysis unit according to the present embodiment.

In the present embodiment, the source analysis unit 16 performs processing related to blocking, which optimizes multiple loops, such as analyzing access patterns, creating the memory size calculation formula, and calculating the divided block length. The optimization processing unit 15 passes the information obtained by the source analysis unit 16 to the optimization execution unit 17, which optimizes the loops included in the source program 21 and thereby speeds up the program generated by the compiler 10.

As processing related to blocking, the source analysis unit 16 analyzes the intermediate code stored in the intermediate language storage unit 14 and automatically calculates the divided block length, which is the unit of loop division by blocking. The intermediate language storage unit 14 stores loop data as the intermediate code data for the loops included in the source program 21. The loop data storage unit 200 shown in FIG. 9 is the storage area for loop data in the intermediate language storage unit 14.

In the present embodiment, it is assumed that optimizations such as index exchange, which swaps inner and outer loops, and loop splitting, which transforms a multiple loop containing parallel loops into a serially nested structure, have already been performed before the optimization processing unit 15 carries out the processing related to blocking.

The source analysis unit 16 has the following functional units related to blocking: a processing target loop analysis unit 100, an array analysis unit 110, an access pattern analysis unit 120, a memory size calculation formula generation unit 130, a divided block length calculation unit 140, a blocking instruction unit 150, and the like. The source analysis unit 16 also has storage units such as an array information storage unit 160, an array-specific memory calculation information storage unit 170, and a memory size calculation information storage unit 180. The functional units and storage units of the source analysis unit 16 are realized by hardware of the computer 1, such as the CPU 2, the memory 3, and the HDD 4, together with software programs.

The processing target loop analysis unit 100 extracts loops with a tight structure from the loops to be optimized. Hereinafter, a loop to be optimized that is included in the source program 21 expressed as intermediate code is called an optimization target loop. A loop with a tight structure that the processing target loop analysis unit 100 extracts from an optimization target loop is called a blocking target loop.
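A simplified check for a tight structure, in the sense stated earlier (executable statements appear only in the innermost loop), might look like the sketch below. The `Stmt` and `Loop` classes are a hypothetical stand-in for the compiler's loop data, whose actual layout is not given in this section, and the function only tests a whole nest rather than extracting a tight sub-nest.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Stmt:
    text: str  # source text of one executable statement


@dataclass
class Loop:
    var: str  # control variable, also used as the loop's name
    body: List[Union["Loop", Stmt]] = field(default_factory=list)


def is_tight(loop):
    """True if executable statements appear only in the innermost loop."""
    inner = [x for x in loop.body if isinstance(x, Loop)]
    stmts = [x for x in loop.body if isinstance(x, Stmt)]
    if not inner:
        # Innermost level: this is where the statements must live.
        return len(stmts) > 0
    # Any other level may hold exactly one nested loop and no statements.
    return len(inner) == 1 and not stmts and is_tight(inner[0])
```

Under this definition a nest with a statement placed between two loop headers, or with two sibling loops at the same level, is rejected.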

The array analysis unit 110 analyzes the innermost loop of the blocking target loop and extracts the arrays existing in the blocking target loop. The array analysis unit 110 stores information on the arrays extracted from the blocking target loop in the array information storage unit 160. The array information storage unit 160 is a computer-accessible storage device that stores information on the arrays extracted from the blocking target loop.

The access pattern analysis unit 120 analyzes the relationship between the order of the control variables appearing from the innermost loop toward the outermost loop of the blocking target loop and the order of the control variables appearing in the subscripts of each array extracted from the blocking target loop. When there is an array in which the order of the control variables appearing in the subscripts differs from the order of the control variables appearing from the innermost loop toward the outermost loop, the access pattern analysis unit 120 determines that optimization by blocking is effective for the blocking target loop.

The memory size calculation formula generation unit 130 generates, for each array in the blocking target loop, a calculation formula for the memory size that the array uses within the range optimized by blocking of the blocking target loop. Hereinafter, this calculation formula is referred to as an array-specific memory calculation formula. The memory size calculation formula generation unit 130 stores information on the array-specific memory calculation formulas in the array-specific memory calculation information storage unit 170. The array-specific memory calculation information storage unit 170 is a computer-accessible storage device that stores information on the array-specific memory calculation formulas.

In addition, the memory size calculation formula generation unit 130 generates, from the array-specific memory calculation formulas, a calculation formula for the memory size that all the arrays in the blocking target loop use within the range optimized by blocking of the blocking target loop. Hereinafter, this calculation formula is referred to as a memory size calculation formula. The memory size calculation formula generation unit 130 stores information on the memory size calculation formula in the memory size calculation information storage unit 180. The memory size calculation information storage unit 180 is a computer-accessible storage device that stores information on the memory size calculation formula.

分割ブロック長計算部１４０は，メモリサイズ計算式を用いて，ブロッキング対象ループに対するブロッキングによって最適化される範囲で，配列に対するアクセスによってキャッシュミスが発生しない分割ブロック長を自動計算する。 The divided block length calculation unit 140 automatically calculates a divided block length that does not cause a cache miss due to access to the array, within a range optimized by blocking the blocking target loop, using a memory size calculation formula.

ブロッキング指示部１５０は，分割ブロック長計算部１４０によって得られた分割ブロック長を用いた，ブロッキング対象ループに対するブロッキングによる最適化の指示を行う。 The blocking instruction unit 150 issues an instruction for optimization by blocking the blocking target loop using the divided block length obtained by the divided block length calculation unit 140.

The optimization execution unit 17 performs optimization by blocking on the blocking target loop using the divided block length obtained by the source analysis unit 16. More specifically, the optimization execution unit 17 generates, outside the original blocking target loop, loops that specify the number of divisions using the divided block length obtained by the source analysis unit 16, and transforms the original blocking target loop accordingly.

図１０は，本実施の形態の最適化処理部によるブロッキング最適化処理フローチャートである。 FIG. 10 is a flowchart of the blocking optimization process performed by the optimization processing unit according to this embodiment.

最適化処理部１５のソース解析部１６において，処理対象ループ解析部１００は，処理対象ループ解析処理を行う（ステップＳ１）。処理対象ループ解析処理は，最適化の対象となるループからタイトな構造を持つループを抽出する処理である。処理対象ループ解析処理の詳細については，後述する。 In the source analysis unit 16 of the optimization processing unit 15, the processing target loop analysis unit 100 performs processing target loop analysis processing (step S1). The processing target loop analysis process is a process for extracting a loop having a tight structure from a loop to be optimized. Details of the processing target loop analysis processing will be described later.

The array analysis unit 110 performs array analysis processing (step S2). The array analysis processing extracts the arrays existing in the blocking target loop. Details of the array analysis processing will be described later.

The access pattern analysis unit 120 performs access pattern analysis processing (step S3). The access pattern analysis processing determines whether optimization by blocking is effective for the blocking target loop, based on the relationship between the order in which the loop control variables appear in the blocking target loop and the order in which the control variables appear in the arrays in the blocking target loop. Details of the access pattern analysis processing will be described later.

The memory size calculation formula generation unit 130 performs memory size calculation formula generation processing (step S4). The memory size calculation formula generation processing generates a calculation formula for the memory size that all the arrays in the blocking target loop use within the range optimized by blocking of the blocking target loop. Details of the memory size calculation formula generation processing will be described later.

The divided block length calculation unit 140 performs divided block length calculation processing (step S5). The divided block length calculation processing uses the memory size calculation formula to automatically calculate a divided block length with which accesses to the arrays cause no cache miss during the processing of the loops in the range optimized by blocking of the blocking target loop. Details of the divided block length calculation processing will be described later.

ブロッキング指示部１５０は，ブロッキング指示処理を行う（ステップＳ６）。ブロッキング指示処理は，求めた分割ブロック長を用いた，ブロッキング対象ループに対するブロッキングによる最適化の指示を行う処理である。ブロッキング指示処理の詳細については，後述する。 The blocking instruction unit 150 performs a blocking instruction process (step S6). The blocking instruction process is a process for instructing an optimization by blocking the blocking target loop using the obtained divided block length. Details of the blocking instruction process will be described later.

最適化処理部１５の最適化実行部１７は，ソース解析部１６により得られた分割ブロック長を用いた，ブロッキング対象ループに対するブロッキングによる最適化を実行する（ステップＳ７）。 The optimization execution unit 17 of the optimization processing unit 15 executes optimization by blocking the blocking target loop using the divided block length obtained by the source analysis unit 16 (step S7).

以下では，本実施の形態によるソース解析部１６が備えるブロッキングに関する各機能部の処理について，具体的な例を用いて説明する。 Hereinafter, the processing of each functional unit related to blocking included in the source analysis unit 16 according to the present embodiment will be described using a specific example.

図１１は，本実施の形態による最適化の対象となるループとブロッキングによる最適化実行後のループとの例を示す図である。 FIG. 11 is a diagram illustrating an example of a loop to be optimized according to the present embodiment and a loop after execution of optimization by blocking.

図１１（Ａ）は，ソースプログラム２１における最適化の対象となるループの例を示す。以下では，最適化の対象となるループを，最適化対象ループと呼ぶものとする。図１１（Ａ）に示す最適化対象ループは，最内ループ内に３次元の配列を持つ，３重の多重ループである。 FIG. 11A shows an example of a loop to be optimized in the source program 21. Hereinafter, a loop to be optimized is referred to as an optimization target loop. The optimization target loop shown in FIG. 11A is a triple multiple loop having a three-dimensional array in the innermost loop.

FIG. 11(B) shows an example of the intended loop after optimization, obtained when the optimization by blocking of the present embodiment is performed on the optimization target loop shown in FIG. 11(A). In the loop after optimization shown in FIG. 11(B), blocking using the divided block length block has been applied to all of loop I, loop J, and loop K of the optimization target loop shown in FIG. 11(A).

As shown in FIG. 11(B), in the loop after optimization, loop II, loop JJ, and loop KK, which specify the number of divisions using the divided block length block, are generated outside the original loop I, loop J, and loop K, and the original loop I, loop J, and loop K are transformed. In the loop after optimization in FIG. 11(B), the multiple loop consisting of the inner loop I, loop J, and loop K is the range of loops optimized by blocking. The source analysis unit 16 according to the present embodiment automatically calculates a divided block length block such that accesses to the arrays cause no cache miss during the processing inside the range of loops optimized by blocking.
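The transformation described for FIG. 11(B) can be sketched as follows. This is a minimal Python illustration, not the code of the figure: the function names, the 0-based half-open ranges, the common upper bound `n`, and the nesting order KK, JJ, II of the generated outer loops are all assumptions made for the example.

```python
def original_order(n):
    """Visit (k, j, i) in the order of a plain triple loop K > J > I."""
    return [(k, j, i) for k in range(n) for j in range(n) for i in range(n)]


def blocked_order(n, block):
    """Visit the same iteration space after blocking all three loops.

    The outer loops KK, JJ, II advance by the divided block length, and the
    inner loops K, J, I cover one block each, clipped at the upper bound n.
    (Assumed shape; the figure itself is not reproduced in the text.)
    """
    visited = []
    for kk in range(0, n, block):
        for jj in range(0, n, block):
            for ii in range(0, n, block):
                # Range optimized by blocking: at most block**3 iterations.
                for k in range(kk, min(kk + block, n)):
                    for j in range(jj, min(jj + block, n)):
                        for i in range(ii, min(ii + block, n)):
                            visited.append((k, j, i))
    return visited
```

Blocking changes only the order of the iterations, not their set, so sorting the output of `blocked_order(5, 2)` yields the same list as `original_order(5)`. The inner three loops touch a bounded index range per block, which is what allows the divided block length to be chosen against the cache size.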

以下では，本実施の形態によるソース解析部１６によって，図１１（Ａ）に示す最適化対象ループに対するブロッキングによる最適化で用いる分割ブロック長ｂｌｏｃｋを，自動的に計算する例を説明する。 Hereinafter, an example in which the source analysis unit 16 according to the present embodiment automatically calculates the divided block length block used in the optimization by blocking with respect to the optimization target loop shown in FIG. 11A will be described.

なお，図１１に示すループのプログラムは，ソースプログラム２１の形式で表現しているが，実際の最適化処理部１５では，中間コードに変換されたループのプログラムに対して処理が行われる。 The loop program shown in FIG. 11 is expressed in the form of the source program 21, but the actual optimization processing unit 15 performs processing on the loop program converted into the intermediate code.

図１２は，本実施の形態によるループデータのデータ構造の例を示す図である。 FIG. 12 is a diagram illustrating an example of a data structure of loop data according to the present embodiment.

The loop data 210 shown in FIG. 12 is an example of a data table into which a loop included in the source program 21 has been converted as intermediate code. The loop data 210k shown in FIG. 12 is the loop data 210 for loop K of the optimization target loop shown in FIG. 11(A). The loop data 210j shown in FIG. 12 is the loop data 210 for loop J of the optimization target loop shown in FIG. 11(A). The loop data 210i shown in FIG. 12 is the loop data 210 for loop I of the optimization target loop shown in FIG. 11(A).

図１２に示すループデータ２１０は，ループ名，ｐｒｅｖ（previous），ｎｅｘｔ，配列データ，ブロッキングフラグ，分割ブロック長，有効フラグ等の情報を有する。なお，実際のループデータ２１０は，ループに関する膨大な量の情報を有するデータである。図１２には，ループに関する膨大な量の情報を有するループデータ２１０の一部のみが記載されている。 The loop data 210 shown in FIG. 12 includes information such as a loop name, prev (previous), next, array data, blocking flag, divided block length, and valid flag. The actual loop data 210 is data having an enormous amount of information regarding the loop. FIG. 12 shows only a part of the loop data 210 having a huge amount of information regarding the loop.

図１２に示すループデータ２１０におけるループ名は，ループで値が更新される制御変数の名前を示す。 The loop name in the loop data 210 shown in FIG. 12 indicates the name of the control variable whose value is updated in the loop.

図１２に示すループデータ２１０におけるｐｒｅｖは，外側のループのループデータ２１０に対するポインタを示す。図１２において，例えばループデータ２１０ｊのｐｒｅｖにおける“＊Ｋ”は，ループＪの外側のループであるループＫのループデータ２１０ｋに対するポインタを表している。なお，図１２において，外側にループを持たないループＫのループデータ２１０ｋのｐｒｅｖは，“ＮＵＬＬ”である。 Prev in the loop data 210 shown in FIG. 12 indicates a pointer to the loop data 210 of the outer loop. In FIG. 12, for example, “* K” in prev of the loop data 210j represents a pointer to the loop data 210k of the loop K that is a loop outside the loop J. In FIG. 12, the prev of the loop data 210k of the loop K having no loop on the outside is “NULL”.

図１２に示すループデータ２１０におけるｎｅｘｔは，内側のループのループデータ２１０に対するポインタを示す。例えば，ループデータ２１０ｊのｎｅｘｔにおける“＊Ｉ”は，ループＪの内側のループであるループＩのループデータ２１０ｉに対するポインタを表している。なお，図１２において，内側にループを持たないループＩのループデータ２１０ｉのｎｅｘｔは，“ＮＵＬＬ”である。 Next in the loop data 210 shown in FIG. 12 indicates a pointer to the loop data 210 of the inner loop. For example, “* I” in the next of the loop data 210j represents a pointer to the loop data 210i of the loop I that is a loop inside the loop J. In FIG. 12, the next of the loop data 210i of the loop I having no loop on the inside is “NULL”.

The array data field in the loop data 210 shown in FIG. 12 holds a pointer to array data, which is information on an array existing in the innermost loop. The default value of the array data field of the loop data 210 is "NULL". The array data field in the loop data 210 is set by the processing of the array analysis unit 110 described later.

The blocking flag in the loop data 210 shown in FIG. 12 indicates whether an instruction permitting or prohibiting blocking has been given. The present embodiment also assumes the case where whether to apply optimization by blocking can be specified externally for each loop. In the present embodiment, "TRUE" is set in the blocking flag of the loop data 210 for a loop that has no particular instruction. For a loop whose blocking is suppressed by an optimization instruction line or the like, "FALSE" is set in the blocking flag of the loop data 210.

図１２に示すループデータ２１０における分割ブロック長は，ブロッキングで用いる分割ブロック長を示す。本実施の形態では，ループごとに，外部的に固定の分割ブロック長を指示できるケースも想定している。本実施の形態では，特に固定の分割ブロック長の指示がないループについては，ループデータ２１０の分割ブロック長に，ブロッキングで用いる分割ブロック長を自動計算することを示す“０”が設定される。最適化指示行などにより固定の分割ブロック長が指示されたループについては，ループデータ２１０の分割ブロック長に指示された値が設定される。なお，ループデータ２１０のブロッキングフラグが“ＦＡＬＳＥ”であるループについては，ループデータ２１０の分割ブロック長に，ブロッキングを実行しない旨を示す“−１”が設定される。 The divided block length in the loop data 210 shown in FIG. 12 indicates the divided block length used for blocking. In this embodiment, it is assumed that a fixed division block length can be externally designated for each loop. In the present embodiment, “0” indicating that the divided block length used for blocking is automatically calculated is set as the divided block length of the loop data 210 for a loop that does not instruct a fixed divided block length. For a loop for which a fixed division block length is designated by an optimization instruction line or the like, the value designated for the division block length of the loop data 210 is set. For a loop in which the blocking flag of the loop data 210 is “FALSE”, “−1” indicating that blocking is not executed is set in the divided block length of the loop data 210.

図１２に示すループデータ２１０における有効フラグは，ブロッキングによる最適化が有効であるか否かを示す。ループデータ２１０の有効フラグのデフォルトは，“ＮＵＬＬ”である。ループデータ２１０の有効フラグは，後述のアクセスパターン解析部１２０の処理によって設定される。 The valid flag in the loop data 210 shown in FIG. 12 indicates whether optimization by blocking is valid. The default valid flag of the loop data 210 is “NULL”. The valid flag of the loop data 210 is set by processing of the access pattern analysis unit 120 described later.
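The fields of the loop data 210 described above can be pictured with the following sketch. It is a hypothetical Python rendering for illustration only: the class name, field names, and helper functions are assumptions, an empty list stands in for the "NULL" array data pointer, and `None` stands in for the "NULL" valid flag. The actual loop data holds far more information, as noted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; the records point at each other
class LoopData:
    name: str                                   # control variable name
    prev: Optional["LoopData"] = None           # outer loop, None if outermost
    next: Optional["LoopData"] = None           # inner loop, None if innermost
    array_data: List[object] = field(default_factory=list)
    blocking_flag: bool = True                  # TRUE unless blocking is suppressed
    block_length: int = 0                       # 0 means "calculate automatically"
    valid_flag: Optional[bool] = None           # set later by access pattern analysis


def link(*loops):
    """Chain loops given from outermost to innermost; return the outermost."""
    for outer, inner in zip(loops, loops[1:]):
        outer.next = inner
        inner.prev = outer
    return loops[0]


def suppress_blocking(loop):
    """Model an optimization instruction line that suppresses blocking."""
    loop.blocking_flag = False
    loop.block_length = -1                      # -1 means "do not block"


# Loop data for the triple loop of FIG. 11(A): K outermost, I innermost.
loop_k, loop_j, loop_i = LoopData("K"), LoopData("J"), LoopData("I")
link(loop_k, loop_j, loop_i)
```

With this layout, `loop_j.prev` is the record for loop K and `loop_j.next` is the record for loop I, mirroring the "*K" and "*I" entries of FIG. 12.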

本実施の形態の処理対象ループ解析部１００による処理の例を説明する。 An example of processing performed by the processing target loop analysis unit 100 according to the present embodiment will be described.

The processing target loop analysis unit 100 first analyzes the nesting relationship of loops such as DO loops and FOR loops included in the program and finds the innermost loop of the multiple loop. More specifically, the processing target loop analysis unit 100 follows the next pointers of the loop data 210 stored in the loop data storage unit 200, starting from the outer loop, and finds the innermost loop, whose next is "NULL". For example, in the optimization target loop shown in FIG. 11(A), the processing target loop analysis unit 100 traces from the outermost loop K to loop J and then loop I, and extracts loop I, which has no child loop, as the innermost loop.

Next, the processing target loop analysis unit 100 extracts the range of loops having a tight structure while extending the range outward, one loop at a time, from the extracted innermost loop. The processing target loop analysis unit 100 takes the extracted range of loops having a tight structure as the loop targeted for optimization by blocking. Hereinafter, the loop targeted for optimization by blocking is referred to as a blocking target loop.

A tight structure is a structure in which the loops are serially nested and only the innermost loop contains executable statements. In the present embodiment, a multiple loop having a tight structure is called a tight loop. When an executable statement lies between two loops partway through the nest, the loops inside the loop that contains that executable statement are regarded as the tight loop.

図１３は，本実施の形態によるタイトなループを説明する図である。 FIG. 13 is a diagram for explaining a tight loop according to the present embodiment.

図１３（Ａ）に示すループは，ループＸとループＹからなる２重ループである。図１３（Ａ）に示すループは，最内のループＹの内側にのみ演算の実行文を有している。すなわち，ループＸとループＹとからなる多重ループは，タイトなループである。 The loop shown in FIG. 13A is a double loop composed of a loop X and a loop Y. The loop shown in FIG. 13A has an execution statement for calculation only inside the innermost loop Y. That is, the multiple loop composed of the loop X and the loop Y is a tight loop.

図１３（Ｂ）に示すループは，ループＸとループＹからなる２重ループである。図１３（Ｂ）に示すループでは，ループＸとループＹとの間に演算の実行文が含まれている。そのため，ループＸとループＹとからなる多重ループは，タイトなループではない。 The loop shown in FIG. 13B is a double loop composed of a loop X and a loop Y. In the loop shown in FIG. 13B, an execution statement for the operation is included between the loop X and the loop Y. Therefore, the multiple loop composed of loop X and loop Y is not a tight loop.

図１３（Ｃ）に示すループは，ループＸ，ループＹ，ループＺからなる３重ループである。図１３（Ｃ）に示すループでは，ループＸとループＹとの間に演算の実行文が含まれており，ループＸ，ループＹ，ループＺからなる多重ループは，タイトなループではない。ただし，ループＹとループＺとの間には実行文が含まれていないので，ループＹとループＺとからなる多重ループは，タイトなループである。処理対象ループ解析部１００は，このようなタイトなループの範囲を抽出する。 The loop shown in FIG. 13C is a triple loop including loop X, loop Y, and loop Z. In the loop shown in FIG. 13C, the execution statement of the operation is included between the loop X and the loop Y, and the multiple loop composed of the loop X, the loop Y, and the loop Z is not a tight loop. However, since an executable statement is not included between the loop Y and the loop Z, the multiple loop composed of the loop Y and the loop Z is a tight loop. The processing target loop analysis unit 100 extracts such a tight loop range.

For example, in the optimization target loop shown in FIG. 11(A), an executable statement for the operation exists only inside the innermost loop I. The processing target loop analysis unit 100 extracts the triple loop of loop K, loop J, and loop I from the optimization target loop shown in FIG. 11(A) as a tight loop and takes it as the blocking target loop. That is, the optimization target loop shown in FIG. 11(A) becomes the blocking target loop as it is.

図１４は，本実施の形態の処理対象ループ解析部による処理対象ループ解析処理フローチャートである。 FIG. 14 is a processing target loop analysis processing flowchart by the processing target loop analysis unit of the present embodiment.

処理対象ループ解析部１００は，処理対象の多重ループから最内ループを抽出する（ステップＳ１０）。処理対象ループ解析部１００は，抽出された最内ループをタイトな最外ループとする（ステップＳ１１）。ここでのタイトな最外ループは，その時点でタイトなループであることが確認されているループのうち，最外のループを示す。 The processing target loop analysis unit 100 extracts the innermost loop from the multiple loops to be processed (step S10). The processing target loop analysis unit 100 sets the extracted innermost loop as a tight outermost loop (step S11). The tight outermost loop here indicates the outermost loop among the loops that are confirmed to be tight at that time.

処理対象ループ解析部１００は，現在のタイトな最外ループの外側に，ループがあるかを判定する（ステップＳ１２）。 The processing target loop analysis unit 100 determines whether there is a loop outside the current tight outermost loop (step S12).

If there is a loop outside the current tight outermost loop (YES in step S12), the processing target loop analysis unit 100 determines whether the multiple loop that includes the loop one level outside the current tight outermost loop is a tight loop (step S13).

If the multiple loop that includes the loop one level outside the current tight outermost loop is a tight loop (YES in step S13), the processing target loop analysis unit 100 updates the tight outermost loop to that loop one level outside (step S14). The processing target loop analysis unit 100 then returns to step S12 and continues with the updated tight outermost loop.

If there is no loop outside the current tight outermost loop (NO in step S12), the processing target loop analysis unit 100 takes the loops from the innermost loop to the tight outermost loop as the blocking target loop (step S15). Likewise, if the multiple loop that includes the loop one level outside the current tight outermost loop is not a tight loop (NO in step S13), the processing target loop analysis unit 100 takes the loops from the innermost loop to the tight outermost loop as the blocking target loop (step S15).
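Steps S10 to S15 can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the `Loop` class, the `own_statements` attribute (true when a loop body contains executable statements besides its nested loop), and the function names are hypothetical and do not come from the embodiment.

```python
class Loop:
    def __init__(self, name, own_statements=False):
        self.name = name
        # True if the body holds executable statements besides the nested loop.
        # The value is ignored for the innermost loop, which always has statements.
        self.own_statements = own_statements
        self.prev = None   # outer loop
        self.next = None   # inner loop


def nest(*loops):
    """Chain loops given from outermost to innermost; return the outermost."""
    for outer, inner in zip(loops, loops[1:]):
        outer.next, inner.prev = inner, outer
    return loops[0]


def blocking_target(outermost):
    """Return the names of the loops forming the blocking target loop."""
    # S10: follow next pointers to the innermost loop.
    innermost = outermost
    while innermost.next is not None:
        innermost = innermost.next
    # S11: the innermost loop is the initial tight outermost loop.
    tight_outermost = innermost
    # S12: continue while there is a loop outside the tight outermost loop.
    while tight_outermost.prev is not None:
        candidate = tight_outermost.prev
        # S13: the wider nest is tight only if the candidate has no statements
        # of its own between itself and the loop it encloses.
        if candidate.own_statements:
            break
        tight_outermost = candidate          # S14
    # S15: collect the loops from the tight outermost loop to the innermost.
    names = []
    loop = tight_outermost
    while loop is not None:
        names.append(loop.name)
        loop = loop.next
    return names
```

For a nest shaped like FIG. 13(C), where loop X contains an executable statement before loop Y, `blocking_target(nest(Loop("X", own_statements=True), Loop("Y"), Loop("Z")))` returns `["Y", "Z"]`; for the nest of FIG. 11(A) it returns all three loops.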

Next, an example of processing by the array analysis unit 110 of the present embodiment will be described.

The array analysis unit 110 extracts the arrays used in the innermost loop of the blocking target loop extracted by the processing of the processing target loop analysis unit 100. At this time, the array analysis unit 110 enumerates all the arrays, both those that update the memory 3 and those that refer to the memory 3 when the operation is executed.

For example, in the blocking target loop of FIG. 11(A), the following six arrays are extracted from the innermost loop.

Array A(I, J, K)
Array A(I, J, K)
Array B(J, K, I)
Array C(K, I, J)
Array D(I, X, J)
Array D(I, Y, J)

Of these, the array A(I, J, K) on the left side of the executable statement in the innermost loop is the array that updates the memory 3 when the operation is executed, and the remaining five arrays on the right side of the executable statement are arrays that refer to the memory 3.

When the extracted arrays include several arrays with exactly the same array name and subscripts, the array analysis unit 110 treats them together as one array. All accesses to arrays with exactly the same array name and subscripts are accesses to the same area of the memory 3. That is, no matter how many times an array with exactly the same array name and subscripts is accessed during the processing in the blocking target loop, the memory 3 that is accessed does not change. Therefore, in the procedure described later for calculating the memory size used by the arrays in the processing of the loop optimized by blocking, it is sufficient to count such an array once even if it appears several times.

For example, in FIG. 11(A), the arrays extracted from the innermost loop include two occurrences of the array A(I, J, K). In this case, the array analysis unit 110 regards a single array A(I, J, K) as having been extracted.

The array analysis unit 110 generates array data for each array extracted from the innermost loop. The array analysis unit 110 stores the generated array data in the array information storage unit 160 as information on the arrays extracted from the blocking target loop.

図１５は，本実施の形態による配列データの管理を説明する図である。 FIG. 15 is a diagram for explaining management of array data according to the present embodiment.

The array data 165 shown in FIG. 15 is an example of the array data 165 that the array analysis unit 110 generates for the arrays extracted from the innermost loop in FIG. 11(A). The array data 165a is the array data of the array A(I, J, K). The array data 165b is the array data of the array B(J, K, I). The array data 165c is the array data of the array C(K, I, J). The array data 165d₁ is the array data of the array D(I, X, J). The array data 165d₂ is the array data of the array D(I, Y, J). In this way, the array analysis unit 110 generates as many pieces of array data 165 as there are extracted arrays.

The array data 165 shown in FIG. 15 holds information such as the array name and the subscripts of the corresponding array. In the array data 165, as many subscript fields are provided as the array has dimensions. That is, as shown in FIG. 15, the array data 165 of a three-dimensional array has three subscript fields: subscript #1, subscript #2, and subscript #3.

As shown in FIG. 15, the array analysis unit 110 also sets pointers to the generated array data 165 in the array data field of the loop data 210 stored in the loop data storage unit 200. Here, the pointers to the generated array data 165 are set only in the loop data 210 of the innermost loop. For example, as shown in FIG. 15, pointers to the five pieces of array data 165 are set in the array data field of the loop data 210i of loop I, which is the innermost loop of the blocking target loop.

FIG. 16 is a flowchart of the array analysis processing performed by the array analysis unit of the present embodiment.

The array analysis unit 110 selects one array existing in the innermost loop of the blocking target loop (step S20). The array analysis unit 110 acquires the array name of the selected array (step S21). The array analysis unit 110 acquires the number of dimensions of the selected array, that is, the number of subscripts (step S22). The array analysis unit 110 acquires the subscripts of the selected array in their order of appearance (step S23).

The array analysis unit 110 determines whether array data 165 already exists for an array whose array name and subscripts are exactly the same as those of the selected array (step S24). If no such array data 165 exists yet (NO in step S24), the array analysis unit 110 generates array data 165 for the selected array (step S25). At this time, the array analysis unit 110 prepares array data 165 having as many subscript fields as the acquired number of dimensions. The generated array data 165 is stored in the array information storage unit 160.

The array analysis unit 110 determines whether all the arrays existing in the innermost loop have been processed (step S26). If not all the arrays have been processed (NO in step S26), the array analysis unit 110 returns to step S20 and proceeds to the next array. If all the arrays have been processed (YES in step S26), the array analysis unit 110 ends the processing.
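The flow of steps S20 to S26 can be sketched as follows. This is a minimal Python illustration, assuming that each array reference in the innermost loop is already available as a pair of array name and subscript tuple; the function name and the dictionary layout of the records are hypothetical.

```python
def analyze_arrays(references):
    """Build one array-data record per distinct (name, subscripts) pair."""
    array_data = []
    seen = set()
    for name, subscripts in references:        # S20: select one array
        key = (name, tuple(subscripts))        # S21-S23: name and subscripts in order
        if key in seen:                        # S24: identical array data already exists
            continue
        seen.add(key)
        # S25: one subscript field per dimension of the array.
        array_data.append({"name": name, "subscripts": list(subscripts)})
    return array_data                          # S26: all arrays processed


# The six references extracted from the innermost loop of FIG. 11(A).
refs = [
    ("A", ("I", "J", "K")),
    ("A", ("I", "J", "K")),
    ("B", ("J", "K", "I")),
    ("C", ("K", "I", "J")),
    ("D", ("I", "X", "J")),
    ("D", ("I", "Y", "J")),
]
records = analyze_arrays(refs)
```

Here `records` holds five entries: the two occurrences of A(I, J, K) collapse into one, while D(I, X, J) and D(I, Y, J) stay separate because their subscripts differ.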

次に，本実施の形態のアクセスパターン解析部１２０による処理の例を説明する。 Next, an example of processing by the access pattern analysis unit 120 of this embodiment will be described.

The access pattern analysis unit 120 analyzes the access patterns of the arrays extracted by the array analysis unit 110.

The access pattern is the pattern of memory accesses made to an array. In the present embodiment, a pattern that accesses the memory 3 sequentially along the addresses is called straight, and a pattern that accesses the memory 3 at scattered locations separated by address gaps is called cross. The access pattern can be determined from the control variables that appear in the subscripts of the array.

図１７は，本実施の形態によるアクセスパターンを説明する図である。 FIG. 17 is a diagram for explaining an access pattern according to the present embodiment.

In the multiple loop consisting of loop K, loop J, and loop I shown in FIG. 17, arranging the control variables K, J, and I in descending order of how frequently they are updated gives I → J → K, which is the order in which the control variables appear from the innermost loop toward the outermost loop. In the present embodiment, the control variables of the blocking target loop arranged in the order from the innermost loop to the outermost loop are called the chain of control variables.

In the multiple loop shown in FIG. 17A, the array A (I, J, K) exists inside the innermost loop I. Arranging the control variables used as subscripts of the array A (I, J, K) in the order of the dimensions in which they appear gives I → J → K. In the present embodiment, for an array in the innermost loop of the blocking target loop, the control variables used as its subscripts arranged in their order of appearance are called the chain of subscripts. The access pattern of an array whose chain of subscripts is the same as the chain of control variables is straight.

図１７（Ｂ）に示すブロッキング対象ループにおいて，上述したように制御変数の連鎖は，Ｉ→Ｊ→Ｋである。また，最内ループ内にある配列Ｂ（Ｊ，Ｋ，Ｉ）の添え字の連鎖は，Ｊ→Ｋ→Ｉである。このように，添え字の連鎖が制御変数の連鎖と異なる配列のアクセスパターンが，クロスである。 In the blocking target loop shown in FIG. 17B, as described above, the chain of control variables is I → J → K. The subscript chain of the array B (J, K, I) in the innermost loop is J → K → I. Thus, an access pattern of an array in which the subscript chain is different from the control variable chain is cross.

Optimization by blocking improves the cache hit rate by localizing the range of accesses to the memory 3 in cases where the memory accesses to an array are scattered rather than following the address order of the memory 3. For this reason, applying optimization by blocking to a multiple loop that contains only arrays with the straight access pattern, which access the memory 3 sequentially, often has no effect.

Therefore, in the present embodiment, the access pattern analysis unit 120 analyzes the access pattern of each array in the blocking target loop to check whether applying optimization by blocking to that loop is effective. The access pattern analysis unit 120 determines that optimization by blocking is effective for a blocking target loop when the loop contains an array whose access pattern is cross, that is, an array that accesses the memory 3 at scattered locations.

If the access pattern analysis unit 120 determines that optimization by blocking is effective for the blocking target loop, it sets the valid flag in the loop data 210 of the innermost loop of the blocking target loop to “TRUE”.

For example, as described above, the array analysis unit 110 extracts five arrays from inside the blocking target loop shown in FIG. 11A and creates five pieces of array data 165. The access pattern analysis unit 120 acquires these five pieces of array data 165 from the array information storage unit 160 and creates the chain of subscripts of each array from them.

このとき，アクセスパターン解析部１２０は，添え字にブロッキング対象ループ内で値が不変のものがあれば，その添え字を省いて添え字の連鎖を作成する。ブロッキング対象ループ内で値が不変のものは，ブロッキング対象ループに含まれるループの制御変数以外のもの，例えば固定値やブロッキング対象ループより外側のループの制御変数などである。 At this time, if there is a subscript whose value does not change in the blocking target loop, the access pattern analysis unit 120 omits the subscript and creates a subscript chain. Those whose values do not change in the blocking target loop are those other than the control variables of the loop included in the blocking target loop, such as a fixed value or a control variable of a loop outside the blocking target loop.

The chains of subscripts for the five arrays extracted from inside the blocking target loop shown in FIG. 11A are as follows.

Array A (I, J, K): I → J → K
Array B (J, K, I): J → K → I
Array C (K, I, J): K → I → J
Array D (I, X, J): I → J
Array D (I, Y, J): I → J

For the arrays D (I, X, J) and D (I, Y, J), X and Y are invariant within the blocking target loop, so they are omitted from the chain of subscripts.
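The chain of subscripts can be built by walking the subscripts in order and keeping only those that are control variables of loops in the blocking target loop. The sketch below illustrates this in Python; the function name `subscript_chain` and the representation of subscripts as single-letter strings are assumptions made for the example.

```python
def subscript_chain(subscripts, loop_variables):
    """Keep only subscripts that are control variables of the target loop."""
    chain = []
    for s in subscripts:            # take the subscripts in order of appearance
        if s in loop_variables:     # fixed values and outer-loop variables drop out
            chain.append(s)
    return chain


loop_variables = {"I", "J", "K"}    # control variables of the blocking target loop
references = [("A", "IJK"), ("B", "JKI"), ("C", "KIJ"),
              ("D", "IXJ"), ("D", "IYJ")]
chains = {ref: subscript_chain(ref[1], loop_variables) for ref in references}
```

With these inputs, both references to array D yield the chain I → J, since X and Y are not control variables of the loop.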

ここで，図１１（Ａ）に示すブロッキング対象ループに含まれるループは，内側から順に，ループＩ，ループＪ，ループＫである。よって，図１１（Ａ）に示すブロッキング対象ループの制御変数の連鎖は，Ｉ→Ｊ→Ｋとなる。 Here, the loops included in the blocking target loop shown in FIG. 11A are loop I, loop J, and loop K in order from the inside. Therefore, the chain of control variables in the blocking target loop shown in FIG. 11A is I → J → K.

図１１（Ａ）に示すブロッキング対象ループの制御変数の連鎖と，各配列の添え字の連鎖とを比較すると，各配列のアクセスパターンは，次のようになる。 When the chain of control variables in the blocking target loop shown in FIG. 11A is compared with the chain of subscripts of each array, the access pattern of each array is as follows.

Array A (I, J, K): straight
Array B (J, K, I): cross
Array C (K, I, J): cross
Array D (I, X, J): straight
Array D (I, Y, J): straight

Thus, since the access patterns of the array B (J, K, I) and the array C (K, I, J) are cross, the access pattern analysis unit 120 determines that optimization by blocking is effective for the blocking target loop shown in FIG. 11A.
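The classification above can be reproduced with a short Python sketch. The text only says that the chains are "the same" or "different"; because the chain I → J of array D is treated as straight against the control variable chain I → J → K, the sketch assumes that "same" means the subscripts appear in the same relative order as in the chain of control variables. That rule and the function name `classify` are assumptions, not statements from the patent.

```python
def classify(subscript_chain, control_chain):
    """Return "straight" or "cross" (assumed rule: same relative order)."""
    used = set(subscript_chain)
    # Control variables that the array actually uses, innermost first.
    expected = [v for v in control_chain if v in used]
    return "straight" if list(subscript_chain) == expected else "cross"


control_chain = ["I", "J", "K"]     # innermost to outermost
patterns = {
    "A(I,J,K)": classify(["I", "J", "K"], control_chain),
    "B(J,K,I)": classify(["J", "K", "I"], control_chain),
    "C(K,I,J)": classify(["K", "I", "J"], control_chain),
    "D(I,X,J)": classify(["I", "J"], control_chain),
    "D(I,Y,J)": classify(["I", "J"], control_chain),
}
# Blocking is judged effective as soon as one array is cross.
blocking_is_effective = "cross" in patterns.values()
```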

図１８は，本実施の形態によるループデータにおける有効フラグを設定した例を示す図である。 FIG. 18 is a diagram showing an example in which a valid flag is set in loop data according to the present embodiment.

As shown in FIG. 18, for the blocking target loop shown in FIG. 11A, for which optimization by blocking has been determined to be effective, the access pattern analysis unit 120 sets the valid flag in the loop data 210i of the innermost loop I to “TRUE”.

図１９は，本実施の形態のアクセスパターン解析部によるアクセスパターン解析処理フローチャートである。 FIG. 19 is a flowchart of access pattern analysis processing by the access pattern analysis unit of the present embodiment.

アクセスパターン解析部１２０は，ブロッキング対象ループの最内ループのループデータ２１０における有効フラグを，無効を示す“ＦＡＬＳＥ”で初期化する（ステップＳ３０）。 The access pattern analysis unit 120 initializes the valid flag in the loop data 210 of the innermost loop of the blocking target loop with “FALSE” indicating invalidity (step S30).

アクセスパターン解析部１２０は，ブロッキング対象ループが多重ループであるかを判定する（ステップＳ３１）。ブロッキング対象ループが多重ループでなければ（ステップＳ３１のＮＯ），アクセスパターン解析部１２０は，処理を終了する。ブロッキング対象ループが１重のループである場合には，そのブロッキング対象ループにブロッキングによる最適化を適用しても，効果が出ない場合が多い。本実施の形態では，このような１重ループであるブロッキング対象ループに対するブロッキングを行わない。なお，上述の処理対象ループ解析部１００の処理において，タイトなループが１重ループである場合に，ブロッキング対象ループから外すようにしてもよい。 The access pattern analysis unit 120 determines whether the blocking target loop is a multiple loop (step S31). If the blocking target loop is not a multiple loop (NO in step S31), the access pattern analysis unit 120 ends the process. When the blocking target loop is a single loop, there is often no effect even if the optimization by blocking is applied to the blocking target loop. In this embodiment, blocking is not performed on such a blocking target loop that is a single loop. In the processing of the processing target loop analysis unit 100 described above, when the tight loop is a single loop, it may be removed from the blocking target loop.

ブロッキング対象ループが多重ループであれば（ステップＳ３１のＹＥＳ），アクセスパターン解析部１２０は，ブロッキング対象ループのループデータ２１０から，制御変数の連鎖を作成する（ステップＳ３２）。 If the blocking target loop is a multiple loop (YES in step S31), the access pattern analysis unit 120 creates a chain of control variables from the loop data 210 of the blocking target loop (step S32).

The access pattern analysis unit 120 selects, from the array information storage unit 160, one piece of array data 165 for an array in the innermost loop of the blocking target loop (step S33). The access pattern analysis unit 120 then performs the subscript chain creation process for the selected array data 165 (step S34). The subscript chain creation process is described later.

The access pattern analysis unit 120 compares the chain of control variables with the chain of subscripts of the selected array data 165 and identifies the access pattern (step S35). If the access pattern is cross (cross in step S35), the access pattern analysis unit 120 sets the valid flag in the loop data 210 of the innermost loop of the blocking target loop to “TRUE”, which indicates valid (step S36), and ends the process.

If the access pattern is straight (straight in step S35), the access pattern analysis unit 120 determines whether processing has been completed for the array data 165 of all arrays in the innermost loop of the blocking target loop (step S37). If not (NO in step S37), the access pattern analysis unit 120 returns to step S33 and proceeds to the next array data 165. If processing has been completed for all the array data 165 (YES in step S37), the access pattern analysis unit 120 ends the process.
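The flow of steps S30 to S37 can be sketched in Python as follows. The function name `analyze_access_patterns` and its list-based inputs are illustrative assumptions, and the cross test uses the assumed rule that an array is straight when its subscripts keep the relative order of the chain of control variables. The chains of subscripts are taken as already built (step S34).

```python
def analyze_access_patterns(control_chain, subscript_chains):
    """Return the valid flag for one blocking target loop.

    control_chain: control variables, innermost to outermost (S32 output).
    subscript_chains: one chain of subscripts per array in the innermost loop.
    """
    valid = False                              # S30: initialize to FALSE
    if len(control_chain) < 2:                 # S31: single loop, nothing to do
        return valid
    for chain in subscript_chains:             # S33: pick the next array
        used = set(chain)
        expected = [v for v in control_chain if v in used]
        if list(chain) != expected:            # S35: cross (assumed rule)
            valid = True                       # S36: mark blocking as effective
            break                              # stop at the first cross array
    return valid                               # S37: all arrays were straight


flag_mixed = analyze_access_patterns(
    ["I", "J", "K"], [["I", "J", "K"], ["J", "K", "I"]])
flag_straight_only = analyze_access_patterns(
    ["I", "J", "K"], [["I", "J", "K"], ["I", "J"]])
flag_single_loop = analyze_access_patterns(["I"], [["I"]])
```

The early exit mirrors the flowchart: one cross array is enough to set the flag, so the remaining arrays need not be examined.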

図２０は，本実施の形態のアクセスパターン解析部による添え字の連鎖作成処理フローチャートである。 FIG. 20 is a flowchart of subscript chain creation processing by the access pattern analysis unit of this embodiment.

The access pattern analysis unit 120 acquires one subscript, in order of appearance, from the array data 165 of the array for which the chain of subscripts is to be created (step S340). The access pattern analysis unit 120 then determines whether the acquired subscript is a control variable of a loop included in the blocking target loop (step S341).

If the acquired subscript is a control variable of a loop included in the blocking target loop (YES in step S341), the access pattern analysis unit 120 adds the acquired subscript to the chain of subscripts (step S342).

アクセスパターン解析部１２０は，配列データ１６５内のすべての添え字について処理が終了したかを判定する（ステップＳ３４３）。すべての添え字についてまだ処理が終了していなければ（ステップＳ３４３のＮＯ），アクセスパターン解析部１２０は，ステップＳ３４０に戻って次の添え字の処理に移る。すべての添え字について処理が終了していれば（ステップＳ３４３のＹＥＳ），アクセスパターン解析部１２０は，処理を終了する。 The access pattern analysis unit 120 determines whether processing has been completed for all subscripts in the array data 165 (step S343). If the processing has not been completed for all the subscripts (NO in step S343), the access pattern analysis unit 120 returns to step S340 and proceeds to the processing of the next subscript. If the processing has been completed for all the subscripts (YES in step S343), the access pattern analysis unit 120 ends the processing.

次に，本実施の形態のメモリサイズ計算式生成部１３０による処理の例を説明する。 Next, an example of processing by the memory size calculation formula generation unit 130 of this embodiment will be described.

For a blocking target loop whose valid flag in the loop data 210 has been set to “TRUE” as a result of the determination by the access pattern analysis unit 120, the memory size calculation formula generation unit 130 generates a memory size calculation formula. The memory size calculation formula calculates the memory size used by all arrays in the blocking target loop within the range optimized by blocking of that loop.

For example, in the optimization target loop after blocking shown in FIG. 11B, loop K, loop J, and loop I form the range optimized by blocking with the divided block length block. In this case, the memory size calculation formula calculates the memory size used by all arrays in the innermost loop over the range in which only loop K, loop J, and loop I of the optimization target loop after blocking shown in FIG. 11B iterate.

The memory size calculation formula generation unit 130 generates the memory size calculation formula from the array-specific memory calculation formulas generated for the individual arrays in the blocking target loop. An array-specific memory calculation formula is generated for each array and calculates the memory size used by that array within the range optimized by blocking of the blocking target loop.

図２１は，本実施の形態による配列別メモリ計算式の生成を説明する図である。 FIG. 21 is a diagram for explaining the generation of the memory calculation formula for each array according to the present embodiment.

図２１（Ａ）は，最内ループＩの内側に４次元の配列Ａ（Ｉ，Ｊ，Ｋ，Ｌ）を有する，ループＬ，ループＫ，ループＪ，ループＩからなるタイトな４重ループの例を示す。ここでは，図２１に示す４重ループに対してブロッキングによる最適化を行った場合に，ブロッキングによって最適化された範囲で，配列Ａ（Ｉ，Ｊ，Ｋ，Ｌ）が使用するメモリサイズを計算する計算式を生成する例を説明する。 FIG. 21A shows a tight quadruple loop consisting of loop L, loop K, loop J, and loop I having a four-dimensional array A (I, J, K, L) inside the innermost loop I. An example is shown. Here, when optimization by blocking is performed on the quadruple loop shown in FIG. 21, the memory size used by the array A (I, J, K, L) is calculated within the range optimized by blocking. An example of generating a calculation formula is described.

As shown in FIG. 21A, the array A (I, J, K, L) is declared with the type real, and its type size is 8 bytes. Loop L is instructed not to be blocked, so the blocking flag in its loop data 210 is set to “FALSE”. Loop J is given a fixed divided block length of 20, so the divided block length field in its loop data 210 is set to 20. No specific instruction is given for loop K and loop I, so the divided block length field in their loop data 210 is set to 0, which indicates automatic calculation.

図２１（Ｂ）は，配列別メモリ計算データ１７５の例を示す。配列別メモリ計算データ１７５は，本実施の形態において，配列別メモリ計算式を表すデータである。配列別メモリ計算データ１７５は，可変部と固定部とを有する。 FIG. 21B shows an example of memory calculation data 175 for each array. The array-specific memory calculation data 175 is data representing an array-specific memory calculation formula in the present embodiment. The array-specific memory calculation data 175 has a variable part and a fixed part.

The variable part is the part of the array-specific memory calculation formula in which the memory size used by the array changes with the value of the automatically calculated divided block length. In the array-specific memory calculation data 175 shown in FIG. 21B, n denotes the automatically calculated divided block length. If Z is the number of subscripts of the array that are control variables of loops optimized with the automatically calculated divided block length, the variable part of the array-specific memory calculation data 175 is expressed as n^Z. Here, Z is called the exponent part.

The fixed part is the part of the array-specific memory calculation formula in which the memory size used by the array does not change with the value of the automatically calculated divided block length. If the subscripts of the array do not include a control variable of a loop for which a fixed divided block length is specified, the fixed part of the array-specific memory calculation data 175 is the type size of the array. If the subscripts do include such a control variable, the fixed part is the product of that fixed divided block length and the type size of the array.

A subscript that is the control variable of a loop for which it is specified that blocking is not to be executed is not included in the array-specific memory calculation formula. Subscripts whose values do not change within the blocking target loop are unaffected by optimization by blocking, so they are ignored when the array-specific memory calculation formula is generated.

For example, in the array A (I, J, K, L) in the multiple loop shown in FIG. 21A, the two subscripts I and K are control variables of loops that are optimized with the automatically calculated divided block length. The exponent part is therefore Z = 2, and the variable part of the array-specific memory calculation formula is n². The subscript J is the control variable of a loop for which 20 is specified as a fixed divided block length. Since the type size of the array A is 8 bytes, the fixed part of the array-specific memory calculation formula is 8 (type size of the array) × 20 (fixed divided block length) = 160. The subscript L is the control variable of a loop for which it is specified that blocking is not to be executed, so it is included in neither the variable part nor the fixed part of the array-specific memory calculation formula.
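The rules for the variable part and the fixed part can be checked against this example with a small Python sketch. `LoopInfo` and `array_formula` are hypothetical names, `LoopInfo` models only the two loop data fields mentioned in the text, and the order in which the factors are multiplied is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LoopInfo:
    """Hypothetical subset of loop data 210 for one loop."""
    blocking: bool = True     # blocking flag
    block_length: int = 0     # 0 means the length is calculated automatically


def array_formula(subscripts, loops, type_size):
    """Return (fixed part, exponent part Z); the size is fixed * n**Z bytes."""
    a, z = 1, 0
    for s in subscripts:
        loop = loops.get(s)
        if loop is None or not loop.blocking:
            continue                  # invariant subscript or unblocked loop
        if loop.block_length == 0:
            z += 1                    # contributes a factor n
        else:
            a *= loop.block_length    # contributes a fixed factor
    return a * type_size, z


loops = {"I": LoopInfo(), "J": LoopInfo(block_length=20),
         "K": LoopInfo(), "L": LoopInfo(blocking=False)}
fixed, exponent = array_formula("IJKL", loops, type_size=8)
```

For the array A (I, J, K, L) this gives a fixed part of 160 and an exponent part of 2, that is, 160n² bytes.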

In the array-specific memory calculation data 175, the computed value of the fixed part is stored in the field corresponding to the computed variable part. That is, as shown in FIG. 21B, the fixed-part value 160 is stored in the field for the variable part n². In the array-specific memory calculation data 175, the initial value of the fixed part of every field is 0.

The array-specific memory calculation formula represented by the array-specific memory calculation data 175 shown in FIG. 21B is
(n⁰ × 0) + (n¹ × 0) + (n² × 160) + (n³ × 0) + (n⁴ × 0)
= 160n²
This array-specific memory calculation formula, 160n², calculates the memory size used by the array A (I, J, K, L) when optimization by blocking with the divided block length n is applied to the multiple loop shown in FIG. 21A.

図２２は，本実施の形態による配列別メモリ計算式からのメモリサイズ計算式の生成を説明する図である。 FIG. 22 is a diagram for explaining generation of a memory size calculation formula from an array-specific memory calculation formula according to the present embodiment.

For each array extracted from the blocking target loop shown in FIG. 11A, the memory size calculation formula generation unit 130 generates an array-specific memory calculation formula using the array data 165 stored in the array information storage unit 160 and the loop data 210 stored in the loop data storage unit 200.

ここでは，配列Ａ，配列Ｂ，配列Ｃ，配列Ｄのすべてについて，配列の型としてｒｅａｌ型が宣言されており，すべての配列の型サイズが８バイトであるものとする。 Here, it is assumed that the real type is declared as the array type for all of the arrays A, B, C, and D, and the type size of all the arrays is 8 bytes.

For the arrays A (I, J, K), B (J, K, I), and C (K, I, J), all subscripts are control variables of loops included in the blocking target loop, so the variable part of the array-specific memory calculation formula is n³. Since no loop in the blocking target loop is given a fixed divided block length, the fixed part consists only of the type size of the array, which is 8. The blocking flags in the loop data 210 of the loops included in the blocking target loop are all “TRUE”. Therefore, the array-specific memory calculation formulas for the arrays A (I, J, K), B (J, K, I), and C (K, I, J) are all 8n³.

For the arrays D (I, X, J) and D (I, Y, J), the subscripts X and Y, respectively, are not control variables of loops included in the blocking target loop and are invariant within the blocking target loop. The subscripts I and J are control variables of loops included in the blocking target loop, so the variable part of the array-specific memory calculation formula is n². Since no loop in the blocking target loop is given a fixed divided block length, the fixed part consists only of the type size of the array, which is 8. Therefore, the array-specific memory calculation formulas for the arrays D (I, X, J) and D (I, Y, J) are each 8n².

The array-specific memory calculation data 175 representing the array-specific memory calculation formulas generated for the arrays is as shown in FIG. 22. The array-specific memory calculation data 175a, 175b, 175c, 175d₁, and 175d₂ represent the array-specific memory calculation formulas for the memory sizes used by the array A (I, J, K), the array B (J, K, I), the array C (K, I, J), the array D (I, X, J), and the array D (I, Y, J), respectively. The memory size calculation formula generation unit 130 stores each generated piece of array-specific memory calculation data 175 in the array-specific memory calculation information storage unit 170.

次に，メモリサイズ計算式生成部１３０は，配列ごとの配列別メモリ計算式から，全配列が使用するメモリサイズを計算するメモリサイズ計算式を生成する。メモリサイズ計算式は，配列ごとに生成されたすべての配列別メモリ計算式を加算することで得られる。 Next, the memory size calculation formula generation unit 130 generates a memory size calculation formula for calculating the memory size used by all the arrays from the array-specific memory calculation formula for each array. The memory size calculation formula is obtained by adding all the memory calculation formulas for each array generated for each array.

図２２において，メモリサイズ計算データ１８５は，メモリサイズ計算式を表すデータである。メモリサイズ計算データ１８５のデータ構造は，配列別メモリ計算データ１７５のデータ構造と同様である。 In FIG. 22, memory size calculation data 185 is data representing a memory size calculation formula. The data structure of the memory size calculation data 185 is the same as the data structure of the memory calculation data 175 for each array.

As shown in FIG. 22, the memory size calculation formula generation unit 130 merges the fixed parts of the array-specific memory calculation data 175 of all arrays, stored in the array-specific memory calculation information storage unit 170, into the memory size calculation data 185. In the memory size calculation data 185, the initial value of the fixed part of every field is 0.

In this way, the memory size calculation formula generation unit 130 generates the memory size calculation data 185 shown in FIG. 22 as data representing the formula that calculates the memory size used by all arrays in the blocking target loop shown in FIG. 11. The memory size calculation formula generation unit 130 stores the generated memory size calculation data 185 in the memory size calculation information storage unit 180.

The memory size calculation formula represented by the memory size calculation data 185 shown in FIG. 22 is
(n⁰ × 0) + (n¹ × 0) + (n² × 16) + (n³ × 24)
= 16n² + 24n³
Because the type size of the arrays is expressed in bytes, the size obtained from this memory size calculation formula is also in bytes.

図２３は，本実施の形態のメモリサイズ計算式生成部によるメモリサイズ計算式生成処理フローチャートである。 FIG. 23 is a flowchart of a memory size calculation formula generation process by the memory size calculation formula generation unit of the present embodiment.

メモリサイズ計算式生成部１３０は，配列情報記憶部１６０に記憶された，ブロッキング対象ループ内から抽出された配列の配列データ１６５を，１つ選択する（ステップＳ４０）。メモリサイズ計算式生成部１３０は，選択された配列データ１６５について，配列別メモリ計算データ生成処理を行う（ステップＳ４１）。配列別メモリ計算データ生成処理の詳細については，後述する。 The memory size calculation formula generation unit 130 selects one array data 165 of the array extracted from the blocking target loop stored in the array information storage unit 160 (step S40). The memory size calculation formula generation unit 130 performs an array-specific memory calculation data generation process for the selected array data 165 (step S41). Details of the memory calculation data generation processing for each array will be described later.

メモリサイズ計算式生成部１３０は，ブロッキング対象ループ内から抽出されたすべての配列の配列データ１６５について処理が終了したかを判定する（ステップＳ４２）。すべての配列の配列データ１６５についてまだ処理が終了していなければ（ステップＳ４２のＮＯ），メモリサイズ計算式生成部１３０は，ステップＳ４０に戻って次の配列の配列データ１６５の処理に移る。 The memory size calculation formula generation unit 130 determines whether the processing has been completed for the array data 165 of all the arrays extracted from within the blocking target loop (step S42). If the processing has not been completed for all the array data 165 of the arrays (NO in step S42), the memory size calculation formula generation unit 130 returns to step S40 and proceeds to the processing of the array data 165 of the next array.

If processing has been completed for the array data 165 of all arrays (YES in step S42), the memory size calculation formula generation unit 130 acquires, from all the generated array-specific memory calculation data 175, the largest exponent part among the variable parts whose fixed part holds a value (step S43). The memory size calculation formula generation unit 130 then generates memory size calculation data 185 having variable-part fields for exponent parts from 0 up to the acquired largest exponent part (step S44).

メモリサイズ計算式生成部１３０は，配列別メモリ計算データ１７５を１つ選択する（ステップＳ４５）。メモリサイズ計算式生成部１３０は，選択された配列別メモリ計算データ１７５を，メモリサイズ計算データ１８５にマージする（ステップＳ４６）。 The memory size calculation formula generation unit 130 selects one array-specific memory calculation data 175 (step S45). The memory size calculation formula generation unit 130 merges the selected array-specific memory calculation data 175 with the memory size calculation data 185 (step S46).

The memory size calculation formula generation unit 130 determines whether all the generated array-specific memory calculation data 175 has been processed (step S47). If not (NO in step S47), the unit returns to step S45 and processes the next piece of array-specific memory calculation data 175. If all of it has been processed (YES in step S47), the unit ends the process.
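The merge in steps S43 to S47 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_memory_size_data` and the representation of each piece of array-specific memory calculation data as an `(exponent, coefficient)` pair are assumptions, and "merging" is assumed to mean adding the fixed parts that share the same exponent, which is consistent with the formula 16n² + 24n³ used in the examples.

```python
def build_memory_size_data(per_array_terms):
    """Merge per-array terms into one polynomial in n.

    per_array_terms: non-empty list of (exponent, coefficient) pairs, one per
    array (hypothetical stand-in for array-specific memory calculation data 175).
    Returns a dict mapping exponent -> summed coefficient
    (hypothetical stand-in for memory size calculation data 185).
    """
    # S43: largest exponent among the fields that hold a value
    max_exp = max(exp for exp, _ in per_array_terms)
    # S44: one field for every exponent from 0 to the maximum
    merged = {exp: 0 for exp in range(max_exp + 1)}
    # S45 to S47: merge each array's term (assumed to be an addition)
    for exp, coeff in per_array_terms:
        merged[exp] += coeff
    return merged


# Three arrays contributing 8n^3 each and two contributing 8n^2 each
example = build_memory_size_data([(3, 8), (3, 8), (3, 8), (2, 8), (2, 8)])
print(example)
```

With these inputs the merged data represents 16n² + 24n³, with empty fields for the exponents 0 and 1.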

FIG. 24 is a flowchart of the array-specific memory calculation data generation process performed by the memory size calculation formula generation unit of this embodiment.

The memory size calculation formula generation unit 130 initializes a to 1 and Z to 0 (step S410).

The memory size calculation formula generation unit 130 selects one subscript from the array data 165 (step S411).

The memory size calculation formula generation unit 130 refers to the loop data 210 of the blocking target loop, stored in the loop data storage unit 200, and determines whether the selected subscript is the control variable of a loop included in the blocking target loop (step S412). If the selected subscript is not a control variable (NO in step S412), the unit proceeds to step S417.

If the selected subscript is a control variable (YES in step S412), the memory size calculation formula generation unit 130 determines whether the blocking flag is "TRUE" in the loop data 210 of the loop that uses the selected subscript as its control variable (step S413). If the blocking flag is not "TRUE" (NO in step S413), the unit proceeds to step S417.

If the blocking flag is "TRUE" (YES in step S413), the memory size calculation formula generation unit 130 determines whether a fixed divided block length is set as the divided block length in the loop data 210 of the loop that uses the selected subscript as its control variable (step S414). If a fixed divided block length is set (YES in step S414), the unit updates a to a × the specified fixed divided block length (step S415). If no fixed divided block length is specified (NO in step S414), the unit increments Z (step S416).

The memory size calculation formula generation unit 130 determines whether all subscripts of the array data 165 have been processed (step S417). If not (NO in step S417), the unit returns to step S411 and processes the next subscript.

If all subscripts have been processed (YES in step S417), the memory size calculation formula generation unit 130 obtains the type size b of the array (step S418) and calculates c = a × b (step S419). It then sets the calculated c in the fixed part of the field of the array-specific memory calculation data 175 whose variable part is n^Z (step S420).
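Steps S410 to S420 can be sketched as below. This is an illustrative sketch under stated assumptions: the `LoopData` class, its field names, and the function `array_memory_term` are hypothetical, and a divided block length of 0 is taken to mean "calculate automatically", as the description of the blocking instruction unit states.

```python
from dataclasses import dataclass


@dataclass
class LoopData:
    """Hypothetical stand-in for one entry of loop data 210."""
    control_variable: str
    blocking_flag: bool      # True when the loop may be blocked
    block_length: int = 0    # 0 = calculate automatically, > 0 = fixed length


def array_memory_term(subscripts, type_size, loops):
    """Return (Z, c): one block of the array occupies c * n**Z bytes."""
    by_var = {lp.control_variable: lp for lp in loops}
    a, z = 1, 0                                  # S410
    for sub in subscripts:                       # S411, repeated until S417 is YES
        loop = by_var.get(sub)
        if loop is None:                         # S412 NO: not a control variable
            continue
        if not loop.blocking_flag:               # S413 NO: blocking not applied
            continue
        if loop.block_length > 0:                # S414 YES: fixed block length
            a *= loop.block_length               # S415
        else:
            z += 1                               # S416
    c = a * type_size                            # S418, S419
    return z, c                                  # S420: c goes in the n**Z field


# Loop J has a fixed divided block length of 48; K and I are automatic.
loops = [LoopData("K", True), LoopData("J", True, 48), LoopData("I", True)]
print(array_memory_term(("K", "J"), 8, loops))   # array C(K,J), 8-byte elements
```

For C(K,J) with 8-byte elements the result corresponds to the term 384n: subscript K contributes one factor of n and subscript J contributes the fixed factor 48.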

Next, an example of the processing performed by the divided block length calculation unit 140 of this embodiment is described.

The divided block length calculation unit 140 inputs a divided block length into the memory size calculation formula generated by the memory size calculation formula generation unit 130 and calculates the memory size. It starts from a small divided block length and increases it gradually, and in this way finds an appropriate divided block length for which the resulting memory size does not exceed the cache memory size.

FIG. 25 illustrates an example in which the divided block length is calculated automatically using the memory size calculation formula of this embodiment.

In FIG. 25, the memory size calculation data 185 is the memory size calculation data 185 shown in FIG. 22, which was generated from the blocking target loop shown in FIG. 11(A). That is, the memory size calculation data 185 shown in FIG. 25 represents the memory size calculation formula 16n² + 24n³.

In FIG. 25, the cache size is the size of the primary cache memory in the hierarchical memory 3 of the computer 1. Here, the cache size is assumed to be 131072 bytes.

The divided block length calculation unit 140 inputs a candidate divided block length as n in the memory size calculation data 185 and calculates the memory size. It then compares the calculated memory size with the cache size and checks whether the memory size exceeds the cache size.

Any values may be used as the candidate divided block lengths input to the memory size calculation data 185. In this embodiment, multiples of 8 are input with cache lines and alignment in mind. In FIG. 25, the result of comparing the calculated memory size with the cache size is marked ○ when the memory size is less than or equal to the cache size and × when the memory size exceeds the cache size.

First, the divided block length calculation unit 140 inputs 8 as the divided block length for n in the memory size calculation data 185. The resulting memory size is 13312 bytes. Because this is less than or equal to the cache size, the comparison result is ○.

Next, the divided block length calculation unit 140 inputs 16 as the divided block length for n in the memory size calculation data 185. The resulting memory size is 102400 bytes. Because this is less than or equal to the cache size, the comparison result is ○.

Next, the divided block length calculation unit 140 inputs 24 as the divided block length for n in the memory size calculation data 185. The resulting memory size is 340992 bytes. Because this exceeds the cache size, the comparison result is ×.

The divided block length calculation unit 140 selects the input value 16, which gave the largest memory size not exceeding the cache size, as the automatically calculated divided block length to be used in the blocking optimization of the blocking target loop shown in FIG. 11(A).
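The arithmetic of the FIG. 25 example can be checked with a few lines of Python. The names `CACHE_SIZE`, `memory_size`, and `rows` are chosen here for illustration only; the formula, the cache size, and the candidate values 8, 16, and 24 come from the description above.

```python
CACHE_SIZE = 131072  # bytes, the primary cache size assumed in the example


def memory_size(n):
    """Memory size calculation formula 16n^2 + 24n^3 from the FIG. 25 example."""
    return 16 * n**2 + 24 * n**3


# One row per candidate divided block length: (n, bytes, fits in cache?)
rows = [(n, memory_size(n), memory_size(n) <= CACHE_SIZE) for n in (8, 16, 24)]
for n, size, fits in rows:
    print(n, size, "fits" if fits else "exceeds")
```

The largest candidate that still fits is 16, which matches the divided block length selected in the text.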

FIG. 26 is a flowchart of the divided block length calculation process performed by the divided block length calculation unit of this embodiment.

The divided block length calculation unit 140 sets the divided block length to −1 (step S50) and saves this value in a save area in the memory 3 (step S51). These steps initialize the divided block length that will be input to the memory size calculation data 185. If the memory size obtained by inputting the first divided block length into the memory size calculation data 185 already exceeds the cache size, blocking with an automatically calculated block length is not performed on the blocking target loop. For this reason, the initial value of the divided block length is set to −1, which indicates that blocking optimization is not to be performed.

The divided block length calculation unit 140 sets the divided block length to the first input value (step S52). It then queries the operating system 40 to obtain the cache size (step S53).

The divided block length calculation unit 140 inputs the divided block length into the memory size calculation data 185 and calculates the memory size (step S54). It then determines whether the calculated memory size exceeds the obtained cache size (step S55).

If the memory size does not exceed the cache size (NO in step S55), the divided block length calculation unit 140 saves the divided block length in the save area in the memory 3 (step S56). This overwrites the divided block length previously saved there. The unit then increases the divided block length to be input to the memory size calculation data 185 (step S57), returns to step S54, and processes the increased divided block length.

If the memory size exceeds the cache size (YES in step S55), the divided block length calculation unit 140 takes the value held in the save area of the memory 3 at that point as the automatically calculated divided block length (step S58) and ends the process.
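The flow of steps S50 to S58 can be sketched as the loop below. This is a minimal sketch, not the patented implementation: the function name `compute_block_length`, the dict representation of the memory size calculation data (exponent mapped to coefficient), and the `first` and `step` parameters are assumptions. The cache size is passed in as an argument rather than queried from the operating system (step S53), and the guard against a formula that never grows is an addition for safety that the flowchart does not describe.

```python
def compute_block_length(coefficients, cache_size, first=8, step=8):
    """Return the largest tried n whose memory size fits, or -1 if none fits.

    coefficients: dict mapping exponent -> coefficient of n**exponent.
    """
    # Added guard (assumption): without a growing term the search would not end.
    if not any(c > 0 and e >= 1 for e, c in coefficients.items()):
        raise ValueError("formula must contain a positive term in n")

    saved = -1                                   # S50, S51: -1 means "do not block"
    n = first                                    # S52
    while True:
        size = sum(c * n**e for e, c in coefficients.items())   # S54
        if size > cache_size:                    # S55 YES
            return saved                         # S58
        saved = n                                # S56: overwrite the saved value
        n += step                                # S57


print(compute_block_length({2: 16, 3: 24}, 131072))
print(compute_block_length({2: 16, 3: 24}, 1000))
```

The first call uses the formula and cache size from the FIG. 25 example. The second call uses a deliberately small, made-up cache size to show the case in which even the first candidate does not fit, so the initial value −1 is returned.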

Next, an example of the processing performed by the blocking instruction unit 150 of this embodiment is described.

The blocking instruction unit 150 instructs that blocking optimization be executed using the divided block length determined by the divided block length calculation unit 140. More specifically, the blocking instruction unit 150 sets the automatically calculated divided block length as the divided block length in each piece of loop data 210 of the blocking target loop stored in the loop data storage unit 200. It does so only for loop data 210 whose divided block length is set to 0, the value that indicates automatic calculation.

For example, in this embodiment, the blocking instruction unit 150 sets the value 16, the divided block length determined in the example shown in FIG. 25, as the divided block length in each piece of loop data 210 of the blocking target loop shown in FIG. 12.

FIG. 27 is a flowchart of the blocking instruction process performed by the blocking instruction unit of this embodiment.

The blocking instruction unit 150 selects one piece of loop data 210, stored in the loop data storage unit 200, for a loop included in the blocking target loop (step S60).

The blocking instruction unit 150 determines whether the blocking flag in the selected loop data 210 is "TRUE" (step S61). It also determines whether a fixed divided block length is set as the divided block length in the selected loop data 210 (step S62).

If the blocking flag is "TRUE" (YES in step S61) and no fixed divided block length is set as the divided block length (NO in step S62), the blocking instruction unit 150 sets the automatically calculated divided block length as the divided block length of the selected loop data 210 (step S63).

The blocking instruction unit 150 determines whether the loop data 210 of every loop included in the blocking target loop has been processed (step S64). If not (NO in step S64), the unit returns to step S60 and processes the next piece of loop data 210. If all the loop data 210 has been processed (YES in step S64), the unit ends the process.
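Steps S60 to S64 can be sketched as a single pass over the loop data. The `LoopData` class, its field names, and the function `apply_block_length` are hypothetical names introduced for this illustration. A divided block length of 0 is taken to mean "calculate automatically", as stated above.

```python
from dataclasses import dataclass


@dataclass
class LoopData:
    """Hypothetical stand-in for one entry of loop data 210."""
    control_variable: str
    blocking_flag: bool      # True when the loop may be blocked
    block_length: int = 0    # 0 = calculate automatically, > 0 = fixed length


def apply_block_length(loops, computed):
    """Write the computed block length into every loop that is blockable and automatic."""
    for loop in loops:                                        # S60, repeated until S64 is YES
        if loop.blocking_flag and loop.block_length == 0:     # S61 YES and S62 NO
            loop.block_length = computed                      # S63
    return loops


# Loop J carries a fixed divided block length of 48, so only K and I are updated.
loops = apply_block_length(
    [LoopData("K", True), LoopData("J", True, 48), LoopData("I", True)], 88
)
print([(lp.control_variable, lp.block_length) for lp in loops])
```

Loops whose blocking flag is not "TRUE" are likewise left unchanged, which is how a loop with a blocking suppression directive keeps its original form.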

The optimization execution unit 17 performs blocking optimization on the blocking target loop using the divided block length values in the loop data 210, which is stored in the loop data storage unit 200 as part of the intermediate code.

Several specific examples of the blocking optimization of this embodiment are described below with reference to FIGS. 28 to 31.

FIG. 28 shows an example in which the blocking optimization of this embodiment is applied to a loop nest that is three or more levels deep.

The optimization target loop shown in FIG. 28 consists of loop K, loop J, and loop I. Because it is a tight loop that has executable statements only in the innermost loop I, the processing target loop analysis unit 100 uses the optimization target loop shown in FIG. 28 as the blocking target loop without modification. The array analysis unit 110 extracts three arrays. Because the access pattern of the extracted array C(K,J) is cross, the access pattern analysis unit 120 determines that blocking optimization is effective for the optimization target loop shown in FIG. 28.

All three extracted arrays are assumed to be declared as real, with an array type size of 8 bytes. All three are two-dimensional arrays whose subscripts are control variables of loops included in the blocking target loop. The memory size calculation formula generation unit 130 generates the memory size calculation formula 24n².

Assume that 131072 bytes is obtained as the cache size. The divided block length calculation unit 140 inputs multiples of 8 into the memory size calculation formula 24n² in ascending order. It selects, as the automatically calculated divided block length, the largest n for which the memory size obtained from 24n² does not exceed the cache size. Here, n = 72 gives 124416 bytes, which is the largest memory size that does not exceed the cache size.

The blocking instruction unit 150 passes the obtained value 72 to the optimization execution unit 17 as the automatically calculated divided block length. The optimization execution unit 17 performs blocking optimization on the optimization target loop shown in FIG. 28 using the automatically calculated divided block length block = 72. The result is the post-blocking loop shown in FIG. 28.

In this way, the blocking optimization technique of this embodiment makes blocking optimization possible for loop nests that are three or more levels deep.

FIG. 29 shows an example in which the blocking optimization of this embodiment is applied to a multiple loop that contains arrays of three or more dimensions.

The optimization target loop shown in FIG. 29 consists of loop K, loop J, and loop I. Because it is a tight loop that has executable statements, which operate on three-dimensional arrays, only in the innermost loop I, the processing target loop analysis unit 100 uses the optimization target loop shown in FIG. 29 as the blocking target loop without modification. The array analysis unit 110 extracts five arrays. Because the access pattern of the extracted array B(J,K,I) is cross, the access pattern analysis unit 120 determines that blocking optimization is effective for the optimization target loop shown in FIG. 29.

All five extracted arrays are assumed to be declared as real, with an array type size of 8 bytes. Three of the five are three-dimensional arrays whose subscripts are all control variables of loops included in the blocking target loop. The other two are three-dimensional arrays that have one subscript that is not a control variable of a loop included in the blocking target loop. The memory size calculation formula generation unit 130 generates the memory size calculation formula 16n² + 24n³.

Assume that 131072 bytes is obtained as the cache size. The divided block length calculation unit 140 inputs multiples of 8 into the memory size calculation formula 16n² + 24n³ in ascending order. It selects, as the automatically calculated divided block length, the largest n for which the memory size obtained from 16n² + 24n³ does not exceed the cache size. Here, n = 16 gives 102400 bytes, which is the largest memory size that does not exceed the cache size.

The blocking instruction unit 150 passes the obtained value 16 to the optimization execution unit 17 as the automatically calculated divided block length. The optimization execution unit 17 performs blocking optimization on the optimization target loop shown in FIG. 29 using the automatically calculated divided block length block = 16. The result is the post-blocking loop shown in FIG. 29.

In this way, the blocking optimization technique of this embodiment makes blocking optimization possible for multiple loops that contain arrays of three or more dimensions.

FIG. 30 shows an example in which the blocking optimization of this embodiment is applied to a multiple loop that contains an optimization instruction line suppressing blocking.

The optimization target loop shown in FIG. 30 consists of loop K, loop J, and loop I. Because it is a tight loop that has executable statements, which operate on two-dimensional arrays, only in the innermost loop I, the processing target loop analysis unit 100 uses the optimization target loop shown in FIG. 30 as the blocking target loop without modification. The array analysis unit 110 extracts three arrays. Because the access pattern of the extracted array C(K,J) is cross, the access pattern analysis unit 120 determines that blocking optimization is effective for the optimization target loop shown in FIG. 30.

All three extracted arrays are assumed to be declared as real, with an array type size of 8 bytes. The optimization target loop shown in FIG. 30 contains an optimization instruction line that suppresses blocking for loop J. One of the three arrays is a two-dimensional array whose subscripts are all control variables of loops that are included in the blocking target loop and are not subject to blocking suppression. The other two are two-dimensional arrays whose subscripts are all control variables of loops included in the blocking target loop, one of which is the control variable of the loop subject to blocking suppression. The memory size calculation formula generation unit 130 generates the memory size calculation formula 8n² + 16n.

Assume that 131072 bytes is obtained as the cache size. The divided block length calculation unit 140 inputs multiples of 8 into the memory size calculation formula 8n² + 16n in ascending order. It selects, as the automatically calculated divided block length, the largest n for which the memory size obtained from 8n² + 16n does not exceed the cache size. Here, n = 120 gives 117120 bytes, which is the largest memory size that does not exceed the cache size.

The blocking instruction unit 150 passes the obtained value 120 to the optimization execution unit 17 as the automatically calculated divided block length. The optimization execution unit 17 performs blocking optimization on the optimization target loop shown in FIG. 30 using the automatically calculated divided block length block = 120. In doing so, it does not apply blocking to loop J, for which blocking is suppressed. The result is the post-blocking loop shown in FIG. 30.

In this way, the blocking optimization technique of this embodiment makes it possible to apply blocking optimization to an optimization target loop while excluding any loop for which blocking suppression was specified in advance.

FIG. 31 shows an example in which the blocking optimization of this embodiment is applied to a multiple loop that contains an optimization instruction line specifying a fixed divided block length.

The optimization target loop shown in FIG. 31 consists of loop K, loop J, and loop I. Because it is a tight loop that has executable statements, which operate on two-dimensional arrays, only in the innermost loop I, the processing target loop analysis unit 100 uses the optimization target loop shown in FIG. 31 as the blocking target loop without modification. The array analysis unit 110 extracts three arrays. Because the access pattern of the extracted array C(K,J) is cross, the access pattern analysis unit 120 determines that blocking optimization is effective for the optimization target loop shown in FIG. 31.

All three extracted arrays are assumed to be declared as real, with an array type size of 8 bytes. The optimization target loop shown in FIG. 31 contains an optimization instruction line that specifies a fixed divided block length of 48 for loop J. One of the three arrays is a two-dimensional array whose subscripts are all control variables of loops that are included in the blocking target loop and have no fixed divided block length specified. The other two are two-dimensional arrays whose subscripts are all control variables of loops included in the blocking target loop, one of which is the control variable of the loop with the fixed divided block length. The memory size calculation formula generation unit 130 generates the memory size calculation formula 8n² + 768n.

Assume that 131072 bytes is obtained as the cache size. The divided block length calculation unit 140 inputs multiples of 8 into the memory size calculation formula 8n² + 768n in ascending order. It selects, as the automatically calculated divided block length, the largest n for which the memory size obtained from 8n² + 768n does not exceed the cache size. Here, n = 88 gives 129536 bytes, which is the largest memory size that does not exceed the cache size.

The blocking instruction unit 150 passes the obtained value 88 to the optimization execution unit 17 as the automatically calculated divided block length. The optimization execution unit 17 performs blocking optimization on the optimization target loop shown in FIG. 31 using the automatically calculated divided block length block = 88. For loop J, for which a fixed divided block length was specified, it performs blocking using the specified value 48 instead. The result is the post-blocking loop shown in FIG. 31.

In this way, the blocking optimization technique of the present embodiment makes it possible to optimize the optimization target loop by blocking, using the specified length for loops for which a fixed divided block length has been specified in advance and the automatically calculated divided block length for all other loops.

The processing by the optimization processing unit 15 of the present embodiment described above can be realized by hardware such as the CPU and memory of a computer together with a software program. The program can be recorded on a computer-readable recording medium or provided through a network.

Although the present embodiment has been described above, the present invention can naturally be modified in various ways within the scope of its gist.

DESCRIPTION OF SYMBOLS

1 Computer
2 CPU
3 Memory
4 HDD
5 Input device
6 Display device
10 Compiler
11 Source program input unit
12 Input/output control unit
13 Intermediate language generation unit
14 Intermediate language storage unit
200 Loop data storage unit
15 Optimization processing unit
16 Source analysis unit
100 Processing target loop analysis unit
110 Array analysis unit
120 Access pattern analysis unit
130 Memory size calculation formula generation unit
140 Divided block length calculation unit
150 Blocking instruction unit
160 Array information storage unit
170 Per-array memory calculation information storage unit
180 Memory size calculation information storage unit
17 Optimization execution unit
18 Code generation unit
19 Object file output unit
20 Storage unit
21 Source program
22 Object file
23 Executable file
30 Linker
40 Operating system

Claims

An optimization processing program for causing a computer to execute, in a compiler that compiles a source program written in a programming language, optimization processing by blocking on multiple loops included in the source program,
the program causing the computer to execute:
a procedure for extracting, from the multiple loops included in the source program, a loop structured so that only the innermost loop contains executable statements, as a blocking target loop;
a procedure for extracting the arrays present in the blocking target loop;
a procedure for determining that blocking is effective for the blocking target loop when the extracted arrays include an array in which the order of the control variables appearing as subscripts differs from the order of the control variables appearing from the innermost loop to the outermost loop of the blocking target loop;
a procedure for generating, for each array, a calculation formula for the memory size used by that array within the range optimized by blocking, for a blocking target loop for which blocking has been determined to be effective;
a procedure for generating, from the calculation formulas generated for each array, a calculation formula for the memory size used by all the arrays in the blocking target loop within the range optimized by blocking;
a procedure for automatically calculating, using the calculation formula for the memory size used by all the arrays, a divided block length at which accesses to the arrays within the range optimized by blocking cause no cache misses; and
a procedure for performing optimization by blocking on the blocking target loop using the automatically calculated divided block length.

The optimization processing program according to claim 1, wherein,
in the procedure for generating the calculation formula for each array, for a blocking target loop for which blocking has been determined to be effective, a calculation formula for the memory size used by each array is generated for the range optimized by blocking the blocking target loop excluding any loop for which non-execution of blocking has been instructed, and
in the procedure for performing the optimization by blocking, optimization by blocking is performed on the blocking target loop, excluding the loop for which non-execution of blocking has been instructed, using the automatically calculated divided block length.

The optimization processing program according to claim 1 or claim 2, wherein,
in the procedure for generating the calculation formula for each array, for a blocking target loop for which blocking has been determined to be effective, a calculation formula for the memory size used by each array is generated for the range optimized by blocking the blocking target loop, including any loop for which a fixed divided block length has been specified, and
in the procedure for performing the optimization by blocking, optimization by blocking is performed using the specified fixed divided block length for a loop in the blocking target loop for which a fixed divided block length has been specified, and using the automatically calculated divided block length for a loop in the blocking target loop for which no fixed divided block length has been specified.

An optimization processing device that executes, in a compiler that compiles a source program written in a programming language, optimization processing by blocking on multiple loops included in the source program, the device comprising:
a processing target loop analysis unit that extracts, from the multiple loops included in the source program, a loop structured so that only the innermost loop contains executable statements, as a blocking target loop;
an array analysis unit that extracts the arrays present in the blocking target loop;
an access pattern analysis unit that determines that blocking is effective for the blocking target loop when the extracted arrays include an array in which the order of the control variables appearing as subscripts differs from the order of the control variables appearing from the innermost loop to the outermost loop of the blocking target loop;
a memory size calculation formula generation unit that, for a blocking target loop for which blocking has been determined to be effective, generates for each array a calculation formula for the memory size used by that array within the range optimized by blocking, and generates, from the calculation formulas generated for each array, a calculation formula for the memory size used by all the arrays in the blocking target loop within the range optimized by blocking;
a divided block length calculation unit that automatically calculates, using the calculation formula for the memory size used by all the arrays, a divided block length at which accesses to the arrays within the range optimized by blocking cause no cache misses; and
an optimization execution unit that performs optimization by blocking on the blocking target loop using the automatically calculated divided block length.

An optimization processing method in which a computer executes, in a compiler that compiles a source program written in a programming language, optimization processing by blocking on multiple loops included in the source program,
wherein the computer executes:
a step of extracting, from the multiple loops included in the source program, a loop structured so that only the innermost loop contains executable statements, as a blocking target loop;
a step of extracting the arrays present in the blocking target loop;
a step of determining that blocking is effective for the blocking target loop when the extracted arrays include an array in which the order of the control variables appearing as subscripts differs from the order of the control variables appearing from the innermost loop to the outermost loop of the blocking target loop;
a step of generating, for each array, a calculation formula for the memory size used by that array within the range optimized by blocking, for a blocking target loop for which blocking has been determined to be effective;
a step of generating, from the calculation formulas generated for each array, a calculation formula for the memory size used by all the arrays in the blocking target loop within the range optimized by blocking;
a step of automatically calculating, using the calculation formula for the memory size used by all the arrays, a divided block length at which accesses to the arrays within the range optimized by blocking cause no cache misses; and
a step of performing optimization by blocking on the blocking target loop using the automatically calculated divided block length.