JP2010244205A

JP2010244205A - Compiler program and compiler device

Info

Publication number: JP2010244205A
Application number: JP2009090480A
Authority: JP
Inventors: Masatoshi Haraguchi; 正寿原口; Mikio Hondo; 幹雄本藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-04-02
Filing date: 2009-04-02
Publication date: 2010-10-28
Anticipated expiration: 2029-04-02
Also published as: JP5251689B2

Abstract

PROBLEM TO BE SOLVED: To improve the execution performance of a program by making effective use of a cache memory.

SOLUTION: An optimization unit 32 analyzes the intermediate language text that a source analysis unit 31 has converted from a source program to be executed by an information processing apparatus 40 equipped with a cache memory having a sector function. Specifically, the optimization unit 32 determines, for the data set processed in each loop, whether the data set is reusable during execution of the loop. The optimization unit 32 then determines a sector division ratio and sector numbers from the number of ways required to store the data sets whose reusability has been determined and the maximum number of ways of the system. In each loop for which the sector division ratio and sector numbers have been determined, the optimization unit 32 inserts a sector division instruction and instructions to which a sector number has been added. A file generation unit 33 generates an object file from the intermediate language text into which these instructions have been inserted.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a compiler program and a compiler apparatus.

Conventionally, a cache memory, which has a small storage capacity but can be accessed at high speed, has been used to hide the data latency that arises between a CPU (Central Processing Unit) and main memory and thereby improve program execution performance (see, for example, Patent Documents 1 to 5).

As described below, techniques are known that improve program execution performance by keeping specific data in the cache memory, either as far as possible or unconditionally.

The first technique is the weakest way method. In the weakest way method, when data is transferred to the cache memory by a memory access instruction, the instruction can indicate the data that should be evicted first within the same index (within the same way). Rather than evicting (replacing) non-reusable data such as stream data and all other data under the same LRU policy, the weakest way method evicts the non-reusable data preferentially, which makes reusable data more likely to remain in the cache. In the LRU (Least Recently Used) method, the data in the way that has gone unused for the longest time is selected and replaced with new data. A way is the unit obtained when a cache line is divided into a plurality of parts under the set associative method.
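The difference between plain LRU replacement and preferential eviction can be illustrated with a small model of a single cache index. This is only a sketch under stated assumptions: the class `CacheSet`, the `weak` flag, and the helper `reusable_line_survives` are hypothetical names introduced for illustration, and the model does not claim to reproduce how any particular hardware implements the weakest way method.

```python
from collections import OrderedDict


class CacheSet:
    """One index of a set-associative cache (illustrative model only)."""

    def __init__(self, ways):
        self.ways = ways
        # tag -> weak flag; insertion order runs from least to most recently used
        self.lines = OrderedDict()

    def access(self, tag, weak=False):
        """Access a line and return True on a hit.

        `weak` is a hypothetical hint marking data that should be evicted first.
        """
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            self.lines[tag] = weak
            return True
        if len(self.lines) >= self.ways:
            # Prefer the least recently used line that carries the weak hint.
            victim = next((t for t, w in self.lines.items() if w), None)
            if victim is None:
                victim = next(iter(self.lines))  # fall back to plain LRU
            del self.lines[victim]
        self.lines[tag] = weak
        return False


def reusable_line_survives(weak_hint):
    """Touch a reusable line, stream four other lines through a 2-way set,
    then report whether the reusable line is still cached."""
    cache_set = CacheSet(ways=2)
    cache_set.access("c")  # reusable data
    for i in range(4):
        cache_set.access(f"stream{i}", weak=weak_hint)  # non-reusable data
    return cache_set.access("c")
```

In this model, `reusable_line_survives(True)` keeps line `c` resident because every stream line evicts the previous stream line, whereas `reusable_line_survives(False)` loses it to ordinary LRU replacement.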

The second technique is the local memory method (see, for example, Patent Documents 6 and 7). In the local memory method, the cache memory area is divided into a normal cache memory area and a local memory (or scratch pad) area, and reusable data is placed in the local memory area. The local memory method can therefore prevent reusable data from being evicted from the cache memory by non-reusable data such as stream data.

The third technique is the cache line lock method or the cache way lock method. The cache line lock method locks a specific cache line, and the cache way lock method locks a specific cache way. As with the local memory method, locking the cache line or cache way into which reusable data has been written prevents the reusable data from being evicted from the cache memory by non-reusable data such as stream data.

Patent Document 1: JP 62-233864 A
Patent Document 2: JP 2008-525919 A (published Japanese translation of a PCT application)
Patent Document 3: JP 2005-122506 A
Patent Document 4: JP 2008-102733 A
Patent Document 5: JP 2002-49529 A
Patent Document 6: JP 10-187533 A
Patent Document 7: JP 4-175946 A

The local memory method described above, however, requires special store instructions for the local memory area, which is distinct from the normal cache memory area and the main memory. It also requires control to maintain data consistency between the normal cache area and scratch pad area on the one hand and the main memory on the other. Furthermore, changing the size of the local memory area during program execution incurs a large overhead.

With the cache line lock method or cache way lock method described above, the cache system may stop operating normally if the programmer forgets to release a lock (unlock) or mistakenly locks the entire cache area. These methods also require a dedicated hardware mechanism for unlocking, which increases cost.

The weakest way method described above provides only preferential eviction, so reusable data is less likely to remain in the cache memory than with the local memory method, the cache line lock method, or the cache way lock method. In addition, the weakest way method cannot be implemented together with the local memory method.

In other words, the conventional techniques described above cannot always make effective use of the cache memory, and program execution performance may suffer as a result.

The disclosed technique has been made to solve the problems of the conventional techniques described above, and its object is to provide a compiler program and a compiler apparatus that can improve program execution performance by making effective use of the cache memory.

To solve the problems described above and achieve this object, the program disclosed in the present application, in one aspect, causes a computer to execute a determination procedure, an insertion procedure, and a file generation procedure. The determination procedure analyzes a source program to be executed by an information processing apparatus equipped with a cache memory having a sector function, and thereby determines, for the data set that is the set of data arrays processed in each loop of the source program, whether the data set is reusable during execution of the loop. From the capacity required to store the data set whose reusability has been determined and the capacity of the cache memory, the determination procedure then determines a sector division ratio for the cache memory and a sector number identifying the sector in which the data set is to be stored. The insertion procedure inserts into the source program, in each loop for which the sector division ratio and the sector number have been determined, an instruction control statement based on that sector division ratio and sector number. The file generation procedure generates an object file from the source program into which the instruction control statement has been inserted.

In another aspect, the program disclosed in the present application causes a computer to execute a reception procedure, an extraction procedure, a determination procedure, an insertion procedure, and a file generation procedure. The reception procedure receives a source program in which the author has specified, within a loop, the sector division ratio to be used when data is stored in a cache memory having a sector function. The extraction procedure extracts, from the received source program, the array data used in the loop for which the author has specified the sector division ratio. The determination procedure determines the sector number to be used when the extracted array data is stored in the cache memory, based on whether the array data is reusable and on the sector division ratio. The insertion procedure inserts into the source program, in each loop for which the sector number has been determined, an instruction control statement based on the determined sector number and the sector division ratio specified by the author. The file generation procedure generates an object file from the source program into which the instruction control statement has been inserted.

According to the disclosed program, program execution performance can be improved by making effective use of the cache memory.

FIG. 1 is a diagram for explaining the weakest way method in a cache system with a sector function.
FIG. 2 is a diagram for explaining the local memory method in a cache system with a sector function.
FIG. 3 is a diagram for explaining the concept of sector division processing by the compiler apparatus according to the first embodiment.
FIG. 4 is a block diagram illustrating the configuration of the compiler apparatus according to the first embodiment.
FIG. 5 is a diagram for explaining the source program storage unit according to the first embodiment.
FIG. 6 is a diagram for explaining the architecture data storage unit.
FIG. 7 is a flowchart for explaining the overall processing of the compiler apparatus according to the first embodiment.
FIG. 8 is a flowchart for explaining the overall processing of the optimization unit shown in FIG. 7.
FIG. 9 is a flowchart for explaining the processing shown in FIG. 8 for determining the sector division ratio, the sector division ratio effective range, and the sector numbers.
FIG. 10 is a flowchart for explaining the processing shown in FIG. 9 for determining the sector division ratio effective range.
FIG. 11 is a flowchart for explaining the processing shown in FIG. 9 for determining the data sets to be handled as reusable memory access data and the required number of ways.
FIG. 12 is a flowchart for explaining the processing shown in FIG. 9 for determining the data sets to be handled as stream data and the required number of ways.
FIG. 13 is a diagram for explaining the three types of output patterns shown in FIG. 9.
FIG. 14 is a diagram for explaining the optimization result storage unit.
FIG. 15 is a diagram for explaining an example of a cache control statement used in the second embodiment.
FIG. 16 is a diagram for explaining an example of specifying sector division under the weakest way method.
FIG. 17 is a diagram for explaining an example of specifying sector division under the local memory method.
FIG. 18 is a diagram for explaining an example of specifying sector division under the weakest way method and the local memory method.
FIG. 19 is a diagram for explaining a modification of the control area specification.
FIG. 20 is a diagram for explaining an example of specifying sector division under the quasi-local memory method.
FIG. 21 is a diagram for explaining the source analysis unit according to the second embodiment.
FIG. 22 is a flowchart for explaining the sector determination processing of the optimization unit according to the second embodiment.
FIG. 23 is a diagram illustrating a computer that executes the compiler program according to the first embodiment.

Embodiments of the compiler program and the compiler apparatus disclosed in the present application are described in detail below with reference to the accompanying drawings. In the following, a compiler apparatus that executes the compiler program disclosed in the present application is described as an embodiment.

First, the main terms used in this embodiment are explained. A "cache memory system with a sector function", as used in this embodiment, is a system in which a cache memory whose cache lines are divided into a plurality of ways under the set associative method can be instructed, even during program execution, to be divided into a plurality of sectors.

Specifically, in a "cache memory system with a sector function", the ratio in which the cache memory is divided into sectors (hereinafter, the sector division ratio) can be specified when the division is performed. In addition, the sector to be used by each subsequent memory access instruction can be specified, specifically by a sector number that identifies one of the divided sectors. A "cache memory system with a sector function" therefore allows the intended use of each sector to be specified indirectly (or directly).

The "LRU method" is a method in which the data that has gone unused for the longest time (LRU: Least Recently Used) is removed from the cache memory in turn and new data is brought in. The "weakest way method" is a method in which, when data is transferred to the cache memory by a memory access instruction, the instruction indicates the data that should be evicted first within the same index (within the same way). The "local memory method" is a method in which the cache memory area is divided into a normal cache memory area and a local memory (or scratch pad) area, and reusable data is placed in the local memory area.

Furthermore, the "weakest way method in a cache memory system with a sector function" refers to a system that applies the weakest way method when a program is executed using a cache memory divided into, for example, two sectors (sector 0 and sector 1). The "weakest way method in a cache memory system with a sector function" is described below with reference to FIG. 1. FIG. 1 is a diagram for explaining the weakest way method in a cache system with a sector function.

In the "weakest way method in a cache memory system with a sector function", as shown in FIG. 1(A), the sector division ratio is specified so that the sum of the maximum numbers of ways of sector 0 and sector 1 is larger than the maximum number of ways of the system. For example, as shown in FIG. 1(B), when the maximum number of ways of a system (cache memory) in which one way is 512 KB is 10, the maximum number of ways of sector 0 is set to 10, the same as the maximum number of ways of the system, and the maximum number of ways of sector 1 is set to 5.

When, for example, the source program shown in FIG. 1(C) is executed, arrays a and b are assigned to sector 1. These arrays are stream data, that is, contiguous data whose adjacent regions are accessed sequentially. Array c, which is reusable data in the do loop, is assigned to sector 0, as shown in FIG. 1(C). Consequently, when the weakest way method is carried out on a "cache memory system with a sector function", the data of arrays a and b is stored in a separate sector, which reduces the probability that the data of array c is evicted from the cache memory.

The "local memory method in a cache memory system with a sector function" is a method that uses the sector function to realize the local memory method, which prevents a reusable array from being evicted from the cache memory by an unexpected cache conflict. A local memory is generally a high-speed memory separate from the normal cache memory, but a cache memory system with a sector function achieves the same behavior as the local memory method through cache control alone. The "local memory method in a cache memory system with a sector function" is described below with reference to FIG. 2. FIG. 2 is a diagram for explaining the local memory method in a cache system with a sector function.

In the "local memory method in a cache memory system with a sector function", when the cache memory with a sector function is used as a local memory, the sector division ratio is specified so that the sum of the maximum numbers of ways of sector 0 and sector 1 equals the maximum number of ways of the system, as shown in FIG. 2(A). For example, as shown in FIG. 2(B), when the maximum number of ways of a system (cache memory) in which one way is 512 KB is 10, the maximum number of ways of sector 0 is set to 9, and the maximum number of ways of sector 1 is set to 1, obtained by subtracting 9 from 10.
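The two ways of specifying the sector division ratio described so far differ only in how the per-sector maximum numbers of ways add up against the system maximum. The following sketch makes that rule explicit. The function name `classify_division` and the returned labels are hypothetical and are introduced only for illustration.

```python
def classify_division(system_max_ways, sector0_max_ways, sector1_max_ways):
    """Classify a two-sector division by comparing the sum of the per-sector
    maximum numbers of ways with the maximum number of ways of the system."""
    total = sector0_max_ways + sector1_max_ways
    if total > system_max_ways:
        # The sectors overlap, so the smaller sector is only preferentially evicted.
        return "weakest-way"
    if total == system_max_ways:
        # The sectors partition the ways, so one sector can act as a local memory.
        return "local-memory"
    # A sum below the system maximum is not covered by the text.
    return "unclassified"


print(classify_division(10, 10, 5))  # division of FIG. 1(B)
print(classify_division(10, 9, 1))   # division of FIG. 2(B)
```

With the figures' values, the 10:5 division is classified as the weakest way method and the 9:1 division as the local memory method.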

When, for example, the source program shown in FIG. 2(C) is executed, the reusable array c is assigned to sector 1, and the stream data arrays a and b are assigned to sector 0. In the source program shown in FIG. 1(C), the declared size of array c is dynamic (do i=1,m), whereas in the source program shown in FIG. 2(C) it is given statically as "do i=1,1000".

If the access width (size) of array c is no larger than the 512 KB size of sector 1, array c is the only data that uses sector 1, so the probability that array c is evicted from sector 1 by a cache conflict is 0%. By treating sector 1 as a local memory in this way, a "cache memory system with a sector function" can prevent the reusable array c from being evicted from the cache. Even when the number of loop iterations is not known, as in the source program shown in FIG. 2(C), the "local memory method in a cache memory system with a sector function" can be carried out as long as the declared size of array c is stated statically.
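The condition that array c fits in the sector used as local memory can be checked with simple arithmetic. The sketch below assumes a contiguous array and the 512 KB way size from the example. The 8-byte element size is an assumption made for illustration only, since the text does not state the element type, and the function names are hypothetical.

```python
WAY_BYTES = 512 * 1024  # one way is 512 KB in the example system


def required_ways(array_bytes, way_bytes=WAY_BYTES):
    """Number of ways needed to hold a contiguous array (ceiling division)."""
    return (array_bytes + way_bytes - 1) // way_bytes


def fits_in_sector(array_bytes, sector_max_ways, way_bytes=WAY_BYTES):
    """True if the array can stay resident in a sector with the given number of ways."""
    return required_ways(array_bytes, way_bytes) <= sector_max_ways


# Array c declared with 1000 elements; 8 bytes per element is an assumption.
c_bytes = 1000 * 8
print(required_ways(c_bytes), fits_in_sector(c_bytes, sector_max_ways=1))
```

Under these assumptions array c needs a single way, so the one-way sector 1 of FIG. 2(B) is sufficient. An array larger than 512 KB would need more ways than that sector provides.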

The "quasi-local memory method in a cache memory system with a sector function" is a method used when the access size of the reusable array assigned to the sector that serves as local memory is unknown. As in the "local memory method in a cache memory system with a sector function", a division ratio is specified such that the sum of the maximum numbers of ways of sector 0 and sector 1 equals the maximum number of ways of the system.

For example, in the source program shown in FIG. 1(A), neither the access size nor the declared size of array c is known. When the "quasi-local memory method in a cache memory system with a sector function" is used, sector division such as that shown in FIG. 2(C) is performed. That is, when the access size of the reusable array c is unknown and the maximum number of ways of the system is 10, as shown in FIG. 2(C), the maximum number of ways of sector 0 is set to 6 and the maximum number of ways of sector 1 is set to 4, obtained by subtracting 6 from 10. Because the access size of array c is unknown, the sector division ratio shown in FIG. 2(C) gives sector 1 more headroom in its maximum number of ways than the local memory method shown in FIG. 2(B).

The "quasi-local memory method in a cache memory system with a sector function", however, has the following two problems. First, because the access width of the reusable array is unknown, a cache conflict may occur under the LRU method within sector 1, to which the reusable array is assigned. Second, if the maximum number of ways of sector 1 is increased out of concern for cache conflicts, the maximum number of ways of sector 0 decreases, which lowers the on-cache rate of the stream data.

Because of these two problems, the "quasi-local memory method in a cache memory system with a sector function" requires the sector division ratio to be determined from, for example, PA information collected by a PA (performance analyzer) tool. The programmer must therefore resort to measures such as making this an optional rather than a default behavior, and may not be able to use a "cache memory system with a sector function" easily. In other words, because no compiler apparatus exists that automatically determines the sector division ratio and its effective range (start position and end position) and sets a sector number for each piece of memory access data, programmers cannot make effective use of a cache memory with a sector function.

The compiler apparatus according to this embodiment therefore executes the sector division processing described below with reference to FIG. 3. FIG. 3 is a diagram for explaining the concept of sector division processing by the compiler apparatus according to the first embodiment.

As shown in FIG. 3, the compiler apparatus according to this embodiment analyzes various factors (factors 1 to n), such as how data is accessed in a loop, the sizes of the data arrays used in the loop, the number of ways of the cache memory, and the size of the cache memory. The compiler apparatus then passes these factors, which are the results of the analysis, as parameters to a decision function that determines the sector division ratio, the sector division ratio effective range (start position and end position), and the sector number of each piece of data.
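To make the idea of a decision function concrete, the following sketch shows one possible shape of it. It is not the decision function of the embodiment, whose details are given in later figures. It is a hypothetical local-memory-style rule that takes data set sizes, reusability, the way size, and the system's maximum number of ways as its factors. All names (`DataSet`, `decide_sectors`) are assumptions, and the effective range is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str        # array name in the loop
    size_bytes: int  # size accessed in the loop
    reusable: bool   # True for reusable data, False for stream data


def decide_sectors(data_sets, way_bytes, system_max_ways):
    """Hypothetical decision function.

    Returns ((sector0_max_ways, sector1_max_ways), {array name: sector number}).
    Reusable data goes to sector 1 and stream data to sector 0, following the
    assignment used in the local memory example.
    """
    reusable_bytes = sum(d.size_bytes for d in data_sets if d.reusable)
    needed = (reusable_bytes + way_bytes - 1) // way_bytes  # ceiling division
    # Always leave at least one way for the stream data in sector 0.
    sector1_ways = min(needed, system_max_ways - 1)
    sector0_ways = system_max_ways - sector1_ways
    sector_of = {d.name: (1 if d.reusable else 0) for d in data_sets}
    return (sector0_ways, sector1_ways), sector_of


loop_data = [
    DataSet("a", 8 * 1024 * 1024, reusable=False),
    DataSet("b", 8 * 1024 * 1024, reusable=False),
    DataSet("c", 8000, reusable=True),
]
division, sector_of = decide_sectors(loop_data, way_bytes=512 * 1024, system_max_ways=10)
```

For this illustrative input the rule yields a 9:1 division with array c in sector 1, matching the local memory example given earlier. The array sizes are made-up values, not data from the source.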

From the output of the decision function, the compiler apparatus according to this embodiment determines the sector division ratio, the sector division ratio effective range, and the sector number of each piece of data, as shown in FIG. 3. It then performs optimization processing that inserts a sector division instruction into the source program and adds sector numbers to memory access instructions. The compiler apparatus according to this embodiment changes the decision function in accordance with new factors found in the analysis of each source program.
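The insertion step can be pictured as a rewrite of the loop's instruction sequence. The sketch below uses a deliberately simple, hypothetical intermediate form, a list of tuples, rather than the compiler's actual intermediate language text. The operation names `sector_divide` and `sector_end` are invented markers for the start and end of the effective range and are not real instructions.

```python
def insert_sector_control(loop_body, division, sector_of):
    """Wrap a loop body with hypothetical sector division markers and tag
    each memory access with the sector number decided for its array.

    loop_body: list of (operation, array name or None) tuples.
    division:  (sector0_max_ways, sector1_max_ways).
    sector_of: mapping from array name to sector number.
    """
    rewritten = [("sector_divide", division)]  # start of the effective range
    for op, array in loop_body:
        if op in ("load", "store"):
            rewritten.append((op, array, sector_of[array]))
        else:
            rewritten.append((op, array))  # non-memory operations are unchanged
    rewritten.append(("sector_end", None))  # end of the effective range
    return rewritten


body = [("load", "a"), ("load", "c"), ("add", None), ("store", "b")]
result = insert_sector_control(body, (9, 1), {"a": 0, "b": 0, "c": 1})
for instruction in result:
    print(instruction)
```

The original list is left untouched and a new sequence is returned, which keeps the sketch easy to test. A real compiler pass would operate on its own intermediate representation.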

Next, the configuration of the compiler apparatus according to this embodiment is described with reference to FIG. 4. FIG. 4 is a block diagram illustrating the configuration of the compiler apparatus according to the first embodiment.

As shown in FIG. 4, the compiler apparatus 10 according to the first embodiment includes a source program input unit 11, an object file output unit 12, a communication unit 13, an input/output control I/F unit 14, a storage unit 20, and a processing unit 30. The compiler apparatus 10 according to this embodiment is also connected to an information processing apparatus 40, as shown in FIG. 4.

The information processing apparatus 40 is a computer that is equipped with a "cache memory with a sector function" and executes the object file output from the compiler apparatus 10. This embodiment describes the case in which the compiler apparatus 10 and the information processing apparatus 40 are separate apparatuses, but the embodiment is not limited to this, and the compiler apparatus 10 may be incorporated in the information processing apparatus 40.

The source program input unit 11 receives a source program created by a programmer, and the object file output unit 12 outputs the object file generated by the processing unit 30 to the information processing apparatus 40. The communication unit 13 receives architecture data, described later, from the information processing apparatus 40.

The input/output control I/F unit 14 controls data transfer between the source program input unit 11, the object file output unit 12, and the communication unit 13 on the one hand and the storage unit 20 and the processing unit 30 on the other.

The storage unit 20 stores the source program received by the source program input unit 11 and the results of various kinds of processing by the processing unit 30, described later. As components closely related to this embodiment, the storage unit 20 includes a source program storage unit 21, a source analysis result storage unit 22, an architecture data storage unit 23, and an optimization result storage unit 24, as shown in FIG. 4.

The source program storage unit 21 stores the source program received by the source program input unit 11. For example, the source program storage unit 21 stores source programs such as those shown in FIG. 5(A), (B), and (C). FIG. 5 is a diagram for explaining the source program storage unit according to the first embodiment.

なお、図５に示す３種類のソースプログラムについては、後に詳細に説明する。 Note that the three types of source programs shown in FIG. 5 will be described in detail later.

ソース解析結果記憶部２２は、後述するソース解析部３１がソースプログラムをコンパイラ装置１０が扱うことが可能となる言語表現に変換した中間言語テキストを記憶する。 The source analysis result storage unit 22 stores the intermediate language text that the source analysis unit 31 (described later) produces by converting the source program into a language representation that the compiler apparatus 10 can handle.

アーキデータ記憶部２３は、通信部１３が受信した情報処理装置４０のハードウェア情報としてのアーキデータを記憶する。アーキデータは、情報処理装置４０に搭載されるキャッシュメモリのサイズ、キャッシュウェイ数およびキャッシュセクタ数の情報のことである。例えば、アーキデータ記憶部２３は、図６に示すように、情報処理装置４０の２次キャッシュメモリのアーキデータとして、キャッシュサイズが「５ＭＢ」であり、キャッシュウェイ数が「１０ＷＡＹ」であり、キャッシュセクタ数が「２」であるとする情報を記憶する。なお、キャッシュウェイ数は、システムの最大ウェイ数であり、キャッシュセクタ数は、セクタ分割数である。 The archi data storage unit 23 stores the archi data, that is, the hardware information about the information processing apparatus 40 received by the communication unit 13. The archi data describes the size, the number of cache ways, and the number of cache sectors of the cache memory mounted on the information processing apparatus 40. For example, as shown in FIG. 6, the archi data storage unit 23 stores, as the archi data of the secondary cache memory of the information processing apparatus 40, a cache size of "5 MB", a cache way count of "10 WAY", and a cache sector count of "2". The number of cache ways is the maximum number of ways of the system, and the number of cache sectors is the number of sector divisions.
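The archi data of FIG. 6 can be pictured as a small record. The following is a minimal sketch, not the patent's data layout: the class name `ArchiData`, its field names, and the reading of "5 MB" as 5 × 2^20 bytes are all assumptions made for illustration. It also derives the capacity of a single way, which later steps need when converting a byte count into a number of ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiData:
    """Hypothetical record for the hardware information described in the text."""
    cache_size_bytes: int  # cache size
    cache_ways: int        # maximum number of ways of the system
    cache_sectors: int     # number of sector divisions

    @property
    def bytes_per_way(self) -> int:
        # Capacity of one way, assuming the cache is split evenly across ways.
        return self.cache_size_bytes // self.cache_ways


# Values from the FIG. 6 example; "5 MB" is assumed to mean 5 * 2**20 bytes.
L2_ARCHI = ArchiData(cache_size_bytes=5 * 1024 * 1024, cache_ways=10, cache_sectors=2)
print(L2_ARCHI.bytes_per_way)
```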

最適化結果記憶部２４は、後述する最適化部３２の処理結果を記憶する。なお、最適化結果記憶部２４が記憶する内容については、後に詳述する。 The optimization result storage unit 24 stores the processing result of the optimization unit 32 described later. The contents stored in the optimization result storage unit 24 will be described in detail later.

処理部３０は、入出力部Ｉ／Ｆ部１４から転送されたソースプログラムがソースプログラム記憶部２１に格納された際に、各種処理を実行する。ここで、処理部３０は、特に本実施例に密接に関連するものとして、図４に示すように、ソース解析部３１と、最適化部３２と、ファイル生成部３３とを有する。 The processing unit 30 executes various processes when the source program transferred from the input / output unit I / F unit 14 is stored in the source program storage unit 21. Here, the processing unit 30 includes a source analysis unit 31, an optimization unit 32, and a file generation unit 33, as shown in FIG. 4, particularly as closely related to the present embodiment.

ここで、実施例１におけるコンパイラ装置１０が、図４に示すソース解析部３１と、最適化部３２と、ファイル生成部３３とを用いて実行する処理全体の大まかな流れについて、図７を用いて説明する。なお、図７は、実施例１におけるコンパイラ装置の全体処理を説明するためのフローチャートである。 The rough overall flow of the processing that the compiler apparatus 10 of the first embodiment executes using the source analysis unit 31, the optimization unit 32, and the file generation unit 33 shown in FIG. 4 is now described with reference to FIG. 7. FIG. 7 is a flowchart for explaining the overall processing of the compiler apparatus according to the first embodiment.

図７に示すように、ソース解析部３１は、ソースプログラムが入力されると（ステップＳ１０１肯定）、ソースプログラム記憶部２１が記憶するソースプログラムを中間言語ファイルに変換するソース解析を行なう（ステップＳ１０２）。 As shown in FIG. 7, when a source program is input (Yes in step S101), the source analysis unit 31 performs source analysis, which converts the source program stored in the source program storage unit 21 into an intermediate language file (step S102).

そして、最適化部３２は、中間言語ファイルとアーキデータとに基づいて、図３を用いて説明したセクタ分割処理により最適化処理を行なう（ステップＳ１０３）。 Then, based on the intermediate language file and the archi data, the optimization unit 32 performs optimization by the sector division processing described with reference to FIG. 3 (step S103).

そののち、ファイル生成部３３は、最適化部３２が最適化処理を行なった中間言語ファイルからオブジェクトファイルを生成してオブジェクトファイル出力部１２に出力し（ステップＳ１０４）、実施例１におけるコンパイラ装置１０は、処理を終了する。 After that, the file generation unit 33 generates an object file from the intermediate language file optimized by the optimization unit 32 and outputs it to the object file output unit 12 (step S104), and the compiler apparatus 10 of the first embodiment ends the processing.

ここで、図７に示した最適化部３２によるステップＳ１０３の処理について、図８〜図１２のフローチャートを用いて詳細に説明する。なお、図８は、図７に示した最適化部の全体処理を説明するためのフローチャートであり、図９は、図８に示したセクタ分割比、セクタ分割比有効範囲およびセクタ番号の決定処理を説明するためのフローチャートである。また、図１０は、図９に示したセクタ分割比有効範囲の決定処理を説明するためのフローチャートであり、図１１は、図９に示した再利用性のあるメモリアクセスデータとして扱うデータ集合および必要ウェイ数の決定処理を説明するためのフローチャートである。また、図１２は、図９に示したストリームデータとして扱うデータ集合および必要ウェイ数の決定処理を説明するためのフローチャートである。 The processing of step S103 by the optimization unit 32 shown in FIG. 7 is now described in detail with reference to the flowcharts of FIGS. 8 to 12. FIG. 8 is a flowchart of the overall processing of the optimization unit shown in FIG. 7, and FIG. 9 is a flowchart of the processing shown in FIG. 8 that determines the sector division ratio, the sector division ratio effective range, and the sector numbers. FIG. 10 is a flowchart of the processing shown in FIG. 9 that determines the sector division ratio effective range, and FIG. 11 is a flowchart of the processing shown in FIG. 9 that determines the data set treated as reusable memory access data and its required number of ways. FIG. 12 is a flowchart of the processing shown in FIG. 9 that determines the data set treated as stream data and its required number of ways.

図８に示すように、実施例１における最適化部３２は、アーキデータおよび中間言語テキストが格納されると（ステップＳ２０１肯定）、セクタ分割比、セクタ分割比有効範囲および各データのセクタ番号を決定する（ステップＳ２０２）。ここで、プログラム内のループには複雑な多重ループもあるので、最適化部３２は、プログラム内の各ループに対して、ループ制御変数に依存した配列があるかどうかと、ユーザ手続きがあるかどうかとを判別して、キャッシュ分割命令を挿入するループを最初に決定する。 As shown in FIG. 8, when the archi data and the intermediate language text have been stored (Yes in step S201), the optimization unit 32 of the first embodiment determines the sector division ratio, the sector division ratio effective range, and the sector number of each data item (step S202). Because the loops in a program can include complex nested loops, the optimization unit 32 first decides which loops receive a cache division instruction by checking, for each loop in the program, whether it contains an array that depends on the loop control variable and whether it contains a user procedure.

具体的には、図９に示すように、最適化部３２は、まず、セクタ分割比有効範囲（ループ集合ＬＳＥＴ）の決定を行なう（ステップＳ３０１）。 Specifically, as shown in FIG. 9, the optimization unit 32 first determines a sector division ratio effective range (loop set LSET) (step S301).

より具体的には、図１０に示すように、最適化部３２は、中間言語テキスト内の全ループを検索し（ステップＳ４０１）、処理対象のループＬ内にループ制御変数に依存する配列がないか、または、ユーザ手続き（関数）があるかを判定する（ステップＳ４０２）。 More specifically, as shown in FIG. 10, the optimization unit 32 searches for all loops in the intermediate language text (step S401) and determines whether the loop L being processed contains no array that depends on the loop control variable, or contains a user procedure (function) (step S402).

ループＬ内において、ループ制御変数に依存する配列があり、かつ、ユーザ手続き（関数）がない場合（ステップＳ４０２否定）、最適化部３２は、セクタ分割比有効範囲（ループ集合ＬＳＥＴ）にループＬを登録する（ステップＳ４０３）。 If the loop L contains an array that depends on the loop control variable and contains no user procedure (function) (No in step S402), the optimization unit 32 registers the loop L in the sector division ratio effective range (loop set LSET) (step S403).

一方、ループＬ内においてループ制御変数に依存する配列がない場合、または、ループＬ内においてユーザ手続き（関数）がある場合（ステップＳ４０２肯定）、最適化部３２は、ループＬを処理対象外とする。 On the other hand, if the loop L contains no array that depends on the loop control variable, or contains a user procedure (function) (Yes in step S402), the optimization unit 32 excludes the loop L from processing.

そして、最適化部３２は、ステップＳ４０１にて検索した全ループを処理したか否かを判定し（ステップＳ４０４）、未処理のループがある場合（ステップＳ４０４否定）、ステップＳ４０２に戻って、次のループに対する処理を行なう。 The optimization unit 32 then determines whether all the loops found in step S401 have been processed (step S404). If an unprocessed loop remains (No in step S404), it returns to step S402 and processes the next loop.

一方、ステップＳ４０１にて検索した全ループを処理していた場合（ステップＳ４０４肯定）、登録されたループの集合であるセクタ分割比有効範囲（ループ集合ＬＳＥＴ）を決定し（ステップＳ４０５）、図９のステップＳ３０２の処理に移行する。 On the other hand, if all the loops found in step S401 have been processed (Yes in step S404), the optimization unit 32 fixes the sector division ratio effective range (loop set LSET), which is the set of registered loops (step S405), and proceeds to step S302 of FIG. 9.
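The loop selection of FIG. 10 amounts to a filter over all loops. The sketch below illustrates it under stated assumptions: the `Loop` record and its two boolean fields are hypothetical stand-ins for whatever the intermediate language text actually exposes, and the analysis that would compute those flags is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Loop:
    """Hypothetical summary of one loop found in the intermediate language text."""
    name: str
    has_index_dependent_array: bool  # an array depends on the loop control variable
    calls_user_procedure: bool       # the loop body calls a user procedure (function)


def select_lset(loops: List[Loop]) -> List[Loop]:
    """Return LSET, the loops eligible for sector division (steps S401 to S405)."""
    lset = []
    for loop in loops:  # S401/S404: visit every loop once
        # S402: exclude when no index-dependent array exists OR a user procedure is called.
        excluded = (not loop.has_index_dependent_array) or loop.calls_user_procedure
        if not excluded:
            lset.append(loop)  # S403: register the loop in LSET
    return lset                # S405: LSET is fixed


loops = [
    Loop("L1", has_index_dependent_array=True, calls_user_procedure=False),
    Loop("L2", has_index_dependent_array=False, calls_user_procedure=False),
    Loop("L3", has_index_dependent_array=True, calls_user_procedure=True),
]
print([loop.name for loop in select_lset(loops)])
```

Only `L1` satisfies both conditions here, so it is the only loop that would receive a sector division instruction.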

図９に戻って、最適化部３２は、決定したループ集合ＬＳＥＴ内のループに対して、１つずつ処理を開始し（ステップＳ３０２）、ループ内のすべての配列サイズが静的に分かり、キャッシュサイズより小さいか否かを判定する（ステップＳ３０３）。 Returning to FIG. 9, the optimization unit 32 starts processing the loops in the determined loop set LSET one at a time (step S302) and determines whether the sizes of all arrays in the loop are statically known and smaller than the cache size (step S303).

ループ内のすべての配列サイズが静的に分かり、かつ、すべての配列サイズがキャッシュサイズより小さい場合（ステップＳ３０３肯定）、最適化部３２は、セクタ利用なしと判定して、ＬＲＵ方式を採用すると決定する（ステップＳ３１０）。 When all the array sizes in the loop are statically known and all the array sizes are smaller than the cache size (Yes at Step S303), the optimization unit 32 determines that the sector is not used and adopts the LRU method. Determine (step S310).

一方、配列サイズが静的に分からない配列がある、または、配列サイズがキャッシュサイズ以上である配列がある場合（ステップＳ３０３否定）、最適化部３２は、再利用性のあるメモリアクセスデータとして扱うデータ集合（Ｓ）および必要ウェイ数(ＳＷ)の決定を行なう（ステップＳ３０４）。 On the other hand, when there is an array whose array size is not known statically or there is an array whose array size is equal to or larger than the cache size (No at Step S303), the optimization unit 32 treats it as reusable memory access data. The data set (S) and the required number of ways (SW) are determined (step S304).

具体的には、図１１に示すように、最適化部３２は、ループ内の各データ（各データ配列）に対する処理を開始し（ステップＳ５０１）、再利用性のあるデータが「１」以上であるか否かを判定する（ステップＳ５０２）。 Specifically, as shown in FIG. 11, the optimization unit 32 starts processing each data item (each data array) in the loop (step S501) and determines whether the number of reusable data items is one or more (step S502).

ループ内の再利用性のあるデータが「０」の場合（ステップＳ５０２否定）、最適化部３２は、「Ｓ＝ＮＵＬＬ」および「ＳＷ＝０」と決定する（ステップＳ５１１）。 When the reusable data in the loop is “0” (No at Step S502), the optimization unit 32 determines “S = NULL” and “SW = 0” (Step S511).

一方、ループ内の再利用性のあるデータが「１」以上である場合（ステップＳ５０２肯定）、最適化部３２は、ループ内の再利用性のあるデータが「１」であるか否かを判定する（ステップＳ５０３）。 On the other hand, if the number of reusable data items in the loop is one or more (Yes in step S502), the optimization unit 32 determines whether the number of reusable data items in the loop is exactly one (step S503).

ループ内の再利用性のあるデータが「１」である場合（ステップＳ５０３肯定）、最適化部３２は、「Ｓ」を決定し、ＳＷを決定するためのＳＢを決定する（ステップＳ５０４）。すなわち、最適化部３２は、１つの再利用性のあるデータを「Ｓ」として決定し、再利用性のあるデータのサイズが、アクセスサイズまたは宣言サイズで静的に分かるほうを「ＳＢ、単位：バイト」と決定する。ここで、最適化部３２は、再利用性のあるデータのサイズが、アクセスサイズおよび宣言サイズの両方で静的に分かる場合、小さい値をＳＢとする。 If the number of reusable data items in the loop is one (Yes in step S503), the optimization unit 32 determines "S" and determines SB, which is used to determine SW (step S504). That is, the optimization unit 32 takes the single reusable data item as "S" and takes as "SB (unit: bytes)" whichever of its access size or declared size is statically known. If both the access size and the declared size are statically known, the optimization unit 32 uses the smaller value as SB.

一方、ループ内の再利用性のあるデータが「２」以上である場合（ステップＳ５０３否定）、最適化部３２は、再利用性のある配列が連続領域に割付可能か否かを判定する（ステップＳ５０５）。例えば、最適化部３２は、再利用性のある配列１および配列２の間に、別の再利用性の無い配列３が割り込まれるか否かを判定する。 On the other hand, if the number of reusable data items in the loop is two or more (No in step S503), the optimization unit 32 determines whether the reusable arrays can be allocated to a contiguous area (step S505). For example, the optimization unit 32 determines whether a non-reusable array 3 would be interposed between reusable arrays 1 and 2.

ここで、再利用性のある配列が連続領域に割付不可である場合（ステップＳ５０５否定）、最適化部３２は、「Ｓ＝ＮＵＬＬ」および「ＳＷ＝０」と決定する（ステップＳ５１１）。 If the reusable arrays cannot be allocated to a contiguous area (No in step S505), the optimization unit 32 sets "S = NULL" and "SW = 0" (step S511).

一方、再利用性のある配列が連続領域に割付可能である場合（ステップＳ５０５肯定）、最適化部３２は、再利用性のある２つ以上の配列の集合である「Ｓ」を決定し、ＳＷを決定するためのＳＢを決定する（ステップＳ５０６）。なお、最適化部３２は、再利用性のある配列のデータの宣言サイズを合計（ＳＵＭ）することで、ＳＢを決定する。 On the other hand, if the reusable arrays can be allocated to a contiguous area (Yes in step S505), the optimization unit 32 determines "S", the set of two or more reusable arrays, and determines SB, which is used to determine SW (step S506). The optimization unit 32 determines SB by summing (SUM) the declared sizes of the reusable arrays.

ステップＳ５０４およびステップＳ５０６により、ＳおよびＳＢが決定されると、最適化部３２は、決定したＳＢが予め設定された下限閾値から上限閾値の範囲内にあるか否かを判定する（ステップＳ５０７）。 When S and SB are determined in steps S504 and S506, the optimization unit 32 determines whether or not the determined SB is within a range from a preset lower limit threshold to an upper limit threshold (step S507). .

ここで、決定したＳＢが予め設定された下限閾値から上限閾値の範囲内にない場合（ステップＳ５０７否定）、最適化部３２は、決定した集合Ｓを以降の処理に用いないとして、「Ｓ＝ＮＵＬＬ」および「ＳＷ＝０」と決定する（ステップＳ５１１）。 Here, when the determined SB is not within the range from the preset lower limit threshold to the upper limit threshold (No at Step S507), the optimization unit 32 assumes that the determined set S is not used for the subsequent processing, and “S = “NULL” and “SW = 0” are determined (step S511).

一方、決定したＳＢが予め設定された下限閾値から上限閾値の範囲内にある場合（ステップＳ５０７肯定）、ＳＢバイトが入るウェイ数をＳＷとして決定する（ステップＳ５０８）。 On the other hand, if the determined SB is within the range from the preset lower threshold to the upper threshold (Yes in step S507), the optimization unit 32 sets SW to the number of ways needed to hold SB bytes (step S508).

さらに、最適化部３２は、再利用性のあるデータ数比率（ＡＲ）を再利用性のあるデータ数を全データ数にて除することで算出し（ステップＳ５０９）、算出したＡＲが予め設定された下限閾値から上限閾値の範囲内にあるか否かを判定する（ステップＳ５１０）。なお、ＡＲに対して設定される閾値と、上述したＳＢに対して設定される閾値とは、異なる値である。 Further, the optimization unit 32 calculates the reusable data number ratio (AR) by dividing the number of reusable data by the total number of data (step S509), and the calculated AR is set in advance. It is determined whether it is within the range of the upper limit threshold from the set lower limit threshold (step S510). Note that the threshold set for AR and the threshold set for SB described above are different values.

ここで、算出したＡＲが予め設定された下限閾値から上限閾値の範囲内にない場合（ステップＳ５１０否定）、最適化部３２は、決定した集合Ｓを以降の処理に用いないとして、「Ｓ＝ＮＵＬＬ」および「ＳＷ＝０」と決定する（ステップＳ５１１）。 Here, when the calculated AR is not within the range between the preset lower limit threshold value and the upper limit threshold value (No in step S510), the optimization unit 32 assumes that the determined set S is not used for the subsequent processing, and “S = “NULL” and “SW = 0” are determined (step S511).

一方、算出したＡＲが予め設定された下限閾値から上限閾値の範囲内にある場合（ステップＳ５１０肯定）、最適化部３２は、ステップＳ５０４またはステップＳ５０６で決定した集合ＳおよびステップＳ５０８で決定したＳＷを確定する（ステップＳ５１２）。 On the other hand, when the calculated AR is within the range from the preset lower limit threshold to the upper limit threshold (Yes at Step S510), the optimization unit 32 sets the set S determined at Step S504 or Step S506 and the SW determined at Step S508. Is determined (step S512).

そして、最適化部３２は、ステップＳ５１１およびステップＳ５１２によりＳおよびＳＷが確定したのち、ステップＳ３０５に移行する。 The optimization unit 32 proceeds to step S305 after S and SW are determined in steps S511 and S512.
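The FIG. 11 procedure (steps S501 to S512) can be sketched as a single function. This is an illustration under stated assumptions, not the patent's implementation: `ArrayInfo`, the parameter names, and the threshold ranges passed in are hypothetical; SW is computed by rounding SB up to whole ways; and the two cases the text leaves open (no statically known size for a single reusable array, or an unknown declared size among several) are treated here as "S = NULL", which is an assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ArrayInfo:
    """Hypothetical per-array facts; None means the size is not statically known."""
    name: str
    reusable: bool
    declared_bytes: Optional[int] = None
    accessed_bytes: Optional[int] = None


def choose_reusable_set(
    arrays: List[ArrayInfo],
    bytes_per_way: int,
    contiguous_ok: bool,
    sb_range: Tuple[int, int],
    ar_range: Tuple[float, float],
) -> Tuple[List[ArrayInfo], int]:
    """Return (S, SW); ([], 0) corresponds to S = NULL and SW = 0 (step S511)."""
    reusable = [a for a in arrays if a.reusable]
    if not reusable:                                    # S502: no reusable data
        return [], 0
    if len(reusable) == 1:                              # S503 -> S504
        known = [s for s in (reusable[0].accessed_bytes, reusable[0].declared_bytes)
                 if s is not None]
        if not known:                                   # assumption: not covered by the text
            return [], 0
        sb = min(known)                                 # both known: take the smaller
    else:
        if not contiguous_ok:                           # S505: cannot be laid out contiguously
            return [], 0
        if any(a.declared_bytes is None for a in reusable):  # assumption, as above
            return [], 0
        sb = sum(a.declared_bytes for a in reusable)    # S506: sum of declared sizes
    if not sb_range[0] <= sb <= sb_range[1]:            # S507
        return [], 0
    sw = math.ceil(sb / bytes_per_way)                  # S508: ways needed for SB bytes
    ar = len(reusable) / len(arrays)                    # S509: reusable-data ratio AR
    if not ar_range[0] <= ar <= ar_range[1]:            # S510
        return [], 0
    return reusable, sw                                 # S512


arrays = [
    ArrayInfo("a", reusable=False),
    ArrayInfo("b", reusable=False),
    ArrayInfo("c", reusable=True, declared_bytes=400_000),
]
# Threshold ranges below are illustrative values only.
s, sw = choose_reusable_set(arrays, bytes_per_way=524_288, contiguous_ok=True,
                            sb_range=(1, 2_000_000), ar_range=(0.1, 0.5))
print([a.name for a in s], sw)
```

With one reusable array of 400,000 bytes and a way capacity of 524,288 bytes, a single way suffices, so the sketch yields SW = 1.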

図９に戻って、最適化部３２は、ステップＳ３０４（図１１）の処理により決定された再利用性のあるメモリアクセスデータのデータ集合Ｓが空（ＮＵＬＬ）であるか否かを判定する（ステップＳ３０５）。 Returning to FIG. 9, the optimizing unit 32 determines whether or not the data set S of reusable memory access data determined by the processing of step S304 (FIG. 11) is empty (NULL). Step S305).

ここで、データ集合Ｓが空でない場合（ステップＳ３０５否定）、最適化部３２は、ローカルメモリ方式の新アルゴリズムを生成すると決定する（ステップＳ３０７）。すなわち、最適化部３２は、再利用性のあるデータを連続領域に割当てることができるので、ローカルメモリ方式を採用できると判断する。 If the data set S is not empty (No in step S305), the optimization unit 32 decides to generate a new algorithm using the local memory method (step S307). That is, because the reusable data can be allocated to a contiguous area, the optimization unit 32 judges that the local memory method can be adopted.

具体的には、最適化部３２は、セクタ分割比有効範囲であるループＬに対して、図６に示したキャッシュウェイ数「１０」から、セクタ分割比を「セクタ番号０：セクタ番号１＝１０−ＳＷ：ＳＷ」と決定する。また、最適化部３２は、データ集合Ｓに登録されたデータ配列をセクタ番号１に格納し、ループＬ内でデータ集合Ｓに含まれないデータ配列をセクタ番号０に格納すると決定する。なお、最適化部３２は、各セクタに格納されたデータをＬＲＵ方式により順次新たなデータにより置換すると決定する。 Specifically, for the loop L that forms the sector division ratio effective range, the optimization unit 32 uses the cache way count of "10" shown in FIG. 6 to set the sector division ratio to "sector number 0 : sector number 1 = (10 − SW) : SW". The optimization unit 32 also decides to store the data arrays registered in the data set S in sector number 1 and to store the data arrays in the loop L that are not in the data set S in sector number 0. Within each sector, the optimization unit 32 decides that stored data is replaced by new data in turn according to the LRU method.

一方、データ集合Ｓが空である場合（ステップＳ３０５肯定）、最適化部３２は、ストリームデータ（連続する隣接アクセスデータ）として扱うデータ集合（Ｔ）および必要ウェイ数（ＴＷ）の決定を行なう（ステップＳ３０６）。 On the other hand, when the data set S is empty (Yes at step S305), the optimization unit 32 determines the data set (T) to be handled as stream data (continuous adjacent access data) and the required number of ways (TW) ( Step S306).

具体的には、図１２に示すように、最適化部３２は、ループ内の各データ（各データ配列）に対する処理を開始し（ステップＳ６０１）、ストリームデータがあるか否かを判定する（ステップＳ６０２）。 Specifically, as shown in FIG. 12, the optimization unit 32 starts processing for each data (each data array) in the loop (step S601), and determines whether there is stream data (step S601). S602).

ループ内にストリームデータがない場合（ステップＳ６０２否定）、最適化部３２は、「Ｔ＝ＮＵＬＬ」および「ＴＷ＝０」と決定する（ステップＳ６１０）。 When there is no stream data in the loop (No at Step S602), the optimization unit 32 determines “T = NULL” and “TW = 0” (Step S610).

一方、ループ内にストリームデータがある場合（ステップＳ６０２肯定）、最適化部３２は、ループ内にストリームデータ以外のデータがあるか否かを判定する（ステップＳ６０３）。 On the other hand, when there is stream data in the loop (Yes in step S602), the optimization unit 32 determines whether there is data other than stream data in the loop (step S603).

ここで、ループ内にストリームデータしかない場合（ステップＳ６０３否定）、最適化部３２は、「Ｔ＝ＮＵＬＬ」および「ＴＷ＝０」と決定する（ステップＳ６１０）。 If there is only stream data in the loop (No at step S603), the optimization unit 32 determines that “T = NULL” and “TW = 0” (step S610).

一方、ループ内にストリームデータとストリームデータ以外のデータとが混在する場合（ステップＳ６０３肯定）、最適化部３２は、「Ｔ」を決定し、ＴＷを決定するためのＴＢを決定する（ステップＳ６０４）。すなわち、最適化部３２は、ストリームデータの集合を「Ｔ」として決定し、ストリームデータの総アクセスサイズを回転数、または宣言サイズから最大アクセス幅を算出することで、ＴＢ（単位：バイト）を計算する。 On the other hand, if the loop contains both stream data and data other than stream data (Yes in step S603), the optimization unit 32 determines "T" and determines TB, which is used to determine TW (step S604). That is, the optimization unit 32 takes the set of stream data as "T" and computes TB (unit: bytes) either as the total access size of the stream data derived from the loop iteration count or as the maximum access width derived from the declared size.

ここで、ＴＢが計算され、明らかとなった場合（ステップＳ６０５否定）、ＴＢが予め設定された下限閾値から上限閾値の範囲内にあるか否かを判定する（ステップＳ６０６）。なお、ＴＢに対して設定される閾値と、上述したＳＢに対して設定される閾値とは、異なる値であってもよいし、同一の値であってもよい。 If TB has been calculated and is known (No in step S605), the optimization unit 32 determines whether TB is within the range from a preset lower threshold to an upper threshold (step S606). The thresholds set for TB and the thresholds set for SB described above may be different values or the same values.

そして、ＴＢが予め設定された下限閾値から上限閾値の範囲内にない場合（ステップＳ６０６否定）、最適化部３２は、「Ｔ＝ＮＵＬＬ」および「ＴＷ＝０」と決定する（ステップＳ６１０）。 If TB is not within the range from the preset lower limit threshold to the upper limit threshold (No at Step S606), the optimization unit 32 determines “T = NULL” and “TW = 0” (Step S610).

一方、ＴＢが計算されず、不明である場合（ステップＳ６０５肯定）、および、計算されたＴＢが予め設定された下限閾値から上限閾値の範囲内にある場合（ステップＳ６０６肯定）、最適化部３２は、ストリームデータ数比率を算出する（ステップＳ６０７）。すなわち、最適化部３２は、ストリームデータ数を全データ数にて除することでＢＲを算出する。 On the other hand, when the TB is not calculated and is unknown (Yes at Step S605), and when the calculated TB is within the range from the preset lower limit threshold to the upper limit threshold (Yes at Step S606), the optimization unit 32 Calculates the stream data number ratio (step S607). That is, the optimization unit 32 calculates BR by dividing the number of stream data by the total number of data.

そして、最適化部３２は、算出したＢＲが予め設定された下限閾値から上限閾値の範囲内にあるか否かを判定する（ステップＳ６０８）。なお、ＢＲに対して設定される閾値と、上述したＴＢに対して設定される閾値とは、異なる値である。 Then, the optimization unit 32 determines whether or not the calculated BR is within a range from a preset lower limit threshold value to an upper limit threshold value (step S608). Note that the threshold value set for BR and the threshold value set for TB described above are different values.

ここで、ＢＲが予め設定された下限閾値から上限閾値の範囲内にない場合（ステップＳ６０８否定）、最適化部３２は、決定した集合Ｔを以降の処理に用いないとして、「Ｔ＝ＮＵＬＬ」および「ＴＷ＝０」と決定する（ステップＳ６１０）。 If BR is not within the range from the preset lower threshold to the upper threshold (No in step S608), the optimization unit 32 does not use the determined set T in subsequent processing and sets "T = NULL" and "TW = 0" (step S610).

一方、ＢＲが予め設定された下限閾値から上限閾値の範囲内にある場合（ステップＳ６０８肯定）、最適化部３２は、キャッシュメモリの総ウェイ数にＢＲおよび予め設定した閾値を乗算することで、ＴＷを算出する（ステップＳ６０９）。なお、ＴＷを算出するために用いられる閾値は、プログラマにより任意に設定される。 On the other hand, if BR is within the range from the preset lower threshold to the upper threshold (Yes in step S608), the optimization unit 32 calculates TW by multiplying the total number of ways of the cache memory by BR and a preset threshold (step S609). The threshold used to calculate TW is set arbitrarily by the programmer.

そして、最適化部３２は、ステップＳ６０４で決定したＴおよびステップＳ６０９で算出したＴＷを確定する（ステップＳ６１１）。 Then, the optimization unit 32 determines T determined in step S604 and TW calculated in step S609 (step S611).

続いて、最適化部３２は、ステップＳ６１０およびステップＳ６１１によりＴおよびＴＷが確定したのち、ステップＳ３０８に移行する。 Subsequently, the optimization unit 32 proceeds to step S308 after T and TW are determined in steps S610 and S611.
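The FIG. 12 procedure (steps S601 to S611) follows the same shape as the one for reusable data. The sketch below is a hedged illustration: `StreamArray`, the parameter names, and the threshold ranges are hypothetical; `tw_factor` stands for the programmer-set threshold multiplied in at step S609; and rounding the product to a whole number of ways is an assumption, since the text does not say how fractional results are handled. TW is computed from BR, the stream data ratio obtained in step S607.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StreamArray:
    """Hypothetical per-array fact: whether the array is accessed as stream data."""
    name: str
    is_stream: bool


def choose_stream_set(
    arrays: List[StreamArray],
    total_ways: int,
    tb: Optional[int],
    tb_range: Tuple[int, int],
    br_range: Tuple[float, float],
    tw_factor: float,
) -> Tuple[List[StreamArray], int]:
    """Return (T, TW); ([], 0) corresponds to T = NULL and TW = 0 (step S610)."""
    streams = [a for a in arrays if a.is_stream]
    if not streams:                                   # S602: no stream data
        return [], 0
    if len(streams) == len(arrays):                   # S603: only stream data
        return [], 0
    # S605/S606: the TB range check applies only when TB could be calculated.
    if tb is not None and not tb_range[0] <= tb <= tb_range[1]:
        return [], 0
    br = len(streams) / len(arrays)                   # S607: stream data ratio BR
    if not br_range[0] <= br <= br_range[1]:          # S608
        return [], 0
    # S609: total ways x BR x preset factor; rounding is an assumption.
    tw = round(total_ways * len(streams) * tw_factor / len(arrays))
    return streams, tw                                # S611


arrays = [StreamArray("a", True), StreamArray("b", False), StreamArray("c", False)]
# tw_factor=1.5 and both ranges are illustrative values, not taken from the source.
t, tw = choose_stream_set(arrays, total_ways=10, tb=None,
                          tb_range=(1, 10_000_000), br_range=(0.1, 0.5), tw_factor=1.5)
print([a.name for a in t], tw)
```

The factor 1.5 was picked only so that one stream array out of three on a 10-way cache yields five ways; any real value would come from the programmer's setting.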

図９に戻って、最適化部３２は、ステップＳ３０６（図１２）の処理により決定されたストリームデータとして扱うデータ集合Ｔが空（ＮＵＬＬ）であるか否かを判定する（ステップＳ３０８）。 Returning to FIG. 9, the optimizing unit 32 determines whether or not the data set T to be handled as stream data determined by the process of step S306 (FIG. 12) is empty (NULL) (step S308).

ここで、データ集合Ｔが空であった場合（ステップＳ３０８肯定）、最適化部３２は、セクタ利用なしと判定して、ＬＲＵ方式を採用すると決定する（ステップＳ３１０）。 Here, when the data set T is empty (Yes at Step S308), the optimization unit 32 determines that the sector is not used and determines to adopt the LRU method (Step S310).

一方、データ集合Ｔが空でない場合（ステップＳ３０８否定）、最適化部３２は、最弱ウェイ方式の新アルゴリズムを生成すると決定する（ステップＳ３０９）。 On the other hand, when the data set T is not empty (No at Step S308), the optimization unit 32 determines to generate a new algorithm of the weakest way method (Step S309).

具体的には、最適化部３２は、セクタ分割比有効範囲であるループＬに対して、図６に示したキャッシュウェイ数「１０」から、セクタ分割比を「セクタ番号０：セクタ番号１＝１０：ＴＷ」と決定する。また、最適化部３２は、データ集合Ｔに登録されたデータ配列をセクタ番号１に格納し、ループＬ内でデータ集合Ｔに含まれないデータ配列をセクタ番号０に格納すると決定する。なお、最適化部３２は、各セクタに格納されたデータをＬＲＵ方式により順次新たなデータにより置換すると決定する。 Specifically, for the loop L that forms the sector division ratio effective range, the optimization unit 32 uses the cache way count of "10" shown in FIG. 6 to set the sector division ratio to "sector number 0 : sector number 1 = 10 : TW". The optimization unit 32 also decides to store the data arrays registered in the data set T in sector number 1 and to store the data arrays in the loop L that are not in the data set T in sector number 0. Within each sector, the optimization unit 32 decides that stored data is replaced by new data in turn according to the LRU method.

そして、ステップＳ３０７、ステップＳ３０９、または、ステップＳ３１０のいずれかの決定を行なったのち、最適化部３２は、図８に示すステップＳ２０３の処理に移行する。 Then, after making any determination of step S307, step S309, or step S310, the optimization unit 32 proceeds to the process of step S203 shown in FIG.

ここで、図９において決定された３種類のパターンについて、図１３を用いて具体的に説明する。図１３は、図９に示した３種類の決定パターンを説明するための図である。 The three patterns determined in FIG. 9 are now described concretely with reference to FIG. 13. FIG. 13 is a diagram for explaining the three decision patterns shown in FIG. 9.

図５の（Ａ）に示すソースプログラムが格納された場合、最適化部３２は、図１３の（Ａ）に示すように、セクタ分割比有効範囲を「do J」と決定し、データ集合Ｓの要素を配列ｃと決定する。また、最適化部３２は、配列ｃの宣言サイズから算出されたＳＢが閾値範囲内にあり、ＡＲが閾値範囲内にあることから、ＳＢバイトを１ウェイのバイト数で除することで、例えば、「ＳＷ＝１」と決定する。そして、図６に示したキャッシュウェイ数「１０」から、最適化部３２は、図１３の（Ａ）に示すように、ローカルメモリ方式により、セクタ分割比を「セクタ番号０：セクタ番号１＝９：１」と決定する。また、最適化部３２は、データ集合Ｓに登録された配列ｃをセクタ番号１に格納し、ループＬ内でデータ集合Ｓに含まれない配列ａおよびｂをセクタ番号０に格納すると決定する。 When the source program shown in (A) of FIG. 5 is stored, the optimization unit 32 sets the sector division ratio effective range to "do J" and takes array c as the element of the data set S, as shown in (A) of FIG. 13. Because the SB calculated from the declared size of array c is within its threshold range and AR is within its threshold range, the optimization unit 32 divides SB bytes by the number of bytes in one way and determines, for example, "SW = 1". Then, from the cache way count of "10" shown in FIG. 6, the optimization unit 32 uses the local memory method to set the sector division ratio to "sector number 0 : sector number 1 = 9 : 1", as shown in (A) of FIG. 13. The optimization unit 32 also decides to store array c, which is registered in the data set S, in sector number 1 and to store arrays a and b, which are in the loop L but not in the data set S, in sector number 0.

図５の（Ｂ）に示すソースプログラムが格納された場合、最適化部３２は、図１３の（Ｂ）に示すように、セクタ分割比有効範囲を「do J」と決定し、データ集合Ｓが空であると決定する。さらに、最適化部３２は、連続する隣接アクセスデータである配列ａを要素とするデータ集合Ｔを決定する。また、最適化部３２は、ＴＢが不明である配列ａのＢＲが閾値範囲内であることから、総ウェイ数（キャッシュウェイ数）とＢＲと閾値とを乗算することで、「ＴＷ＝５」と決定する。そして、図６に示したキャッシュウェイ数「１０」から、最適化部３２は、図１３の（Ｂ）に示すように、最弱ウェイ方式により、セクタ分割比を「セクタ番号０：セクタ番号１＝１０：５」と決定する。また、最適化部３２は、データ集合Ｔに登録された配列ａをセクタ番号１に格納し、ループＬ内でデータ集合Ｔに含まれない配列ｃおよびｂをセクタ番号０に格納すると決定する。 When the source program shown in (B) of FIG. 5 is stored, the optimization unit 32 sets the sector division ratio effective range to "do J" and determines that the data set S is empty, as shown in (B) of FIG. 13. The optimization unit 32 further determines a data set T whose element is array a, which is contiguous adjacent access data. Because TB is unknown for array a and its BR is within the threshold range, the optimization unit 32 multiplies the total number of ways (the cache way count), BR, and the threshold to determine "TW = 5". Then, from the cache way count of "10" shown in FIG. 6, the optimization unit 32 uses the weakest way method to set the sector division ratio to "sector number 0 : sector number 1 = 10 : 5", as shown in (B) of FIG. 13. The optimization unit 32 also decides to store array a, which is registered in the data set T, in sector number 1 and to store arrays c and b, which are in the loop L but not in the data set T, in sector number 0.

図５の（Ｃ）に示すソースプログラムが格納された場合、最適化部３２は、ループ内のすべての配列サイズが明らかであり、キャッシュサイズより小さいことから、図１３の（Ｃ）に示すように、「セクタ分割データはなし」として、ＬＲＵ方式を採用すると決定する。 When the source program shown in (C) of FIG. 5 is stored, all array sizes in the loop are known and smaller than the cache size, so the optimization unit 32 decides, as shown in (C) of FIG. 13, that there is "no sector division data" and adopts the LRU method.
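The three outcomes of FIG. 9 can be summarized in one decision function. The sketch below is an illustration, not the patent's code: the function name, the dictionary layout, and the method labels are hypothetical, and for brevity it takes S, SW, T, and TW as already computed, whereas the flowchart only computes T and TW when S is empty.

```python
from typing import Dict, List


def decide_policy(sizes_known_and_fit: bool, s: List[str], sw: int,
                  t: List[str], tw: int, total_ways: int) -> Dict[str, object]:
    """Pick one of the three FIG. 9 outcomes for a single loop in LSET."""
    if sizes_known_and_fit:                 # S303 Yes -> S310: plain LRU, no sectors
        return {"method": "LRU"}
    if s:                                   # S305 No -> S307: local memory method
        return {"method": "local_memory",
                "ratio": (total_ways - sw, sw),   # sector 0 : sector 1
                "sector1": s}
    if t:                                   # S308 No -> S309: weakest way method
        return {"method": "weakest_way",
                "ratio": (total_ways, tw),        # sector 0 : sector 1
                "sector1": t}
    return {"method": "LRU"}                # S308 Yes -> S310


# The three patterns of FIG. 13, using the values stated in the text.
pattern_a = decide_policy(False, ["c"], 1, [], 0, total_ways=10)
pattern_b = decide_policy(False, [], 0, ["a"], 5, total_ways=10)
pattern_c = decide_policy(True, [], 0, [], 0, total_ways=10)
print(pattern_a["ratio"], pattern_b["ratio"], pattern_c["method"])
```

Note the difference between the two ratios: the local memory method partitions the ten ways (9 : 1), while the weakest way method leaves sector 0 with all ten ways and caps sector 1 at TW (10 : 5).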

図８に戻って、最適化部３２は、図９の処理ののち、決定したセクタ分割処理に応じて、セクタ分割命令の挿入およびメモリアクセス命令へのセクタ番号付加を行ない（ステップＳ２０３）、最適化処理を終了する。 Returning to FIG. 8, the optimization unit 32 inserts a sector division instruction and adds a sector number to the memory access instruction in accordance with the determined sector division process after the process of FIG. 9 (step S203). The process is terminated.

すなわち、最適化部３２は、決定したセクタ分割比制御範囲、セクタ分割比、データごとのセクタ番号にしたがって、セクタ分割命令の挿入やセクタ番号の付加を行なって、例えば、図１４に示すような処理結果を最適化結果記憶部２４に格納する。図１４は、最適化結果記憶部を説明するための図である。 That is, the optimization unit 32 inserts a sector division command and adds a sector number according to the determined sector division ratio control range, sector division ratio, and sector number for each data, for example, as shown in FIG. The processing result is stored in the optimization result storage unit 24. FIG. 14 is a diagram for explaining the optimization result storage unit.

例えば、最適化部３２は、図１４の左側に示すループＬ１の前後がセクタ分割比有効範囲である場合、ローカルメモリ方式のセクタ分割比（９：１）で分割を指示する命令を挿入する。さらに、最適化部３２は、図１４に示すように、ループＬ１の後ろで、通常のＬＲＵ方式に戻す命令を挿入する。また、最適化部３２は、図１４に示すように、セクタ分割比有効範囲内の「load, store 」命令に対して、使用するセクタ番号（sector0, sector1）を指定する。そして、最適化部３２は、図１４の右側に示すデータを、最適化結果記憶部２４に格納する。なお、セクタ番号を指定する命令は、プリフェッチ命令である「prefetch」である場合であってもよい。 For example, when the span from just before to just after the loop L1 shown on the left side of FIG. 14 is the sector division ratio effective range, the optimization unit 32 inserts an instruction that directs division at the local-memory-method sector division ratio (9:1). As shown in FIG. 14, the optimization unit 32 also inserts, after the loop L1, an instruction that returns to the normal LRU method, and specifies the sector number to use (sector0 or sector1) for each "load" and "store" instruction within the sector division ratio effective range. The optimization unit 32 then stores the data shown on the right side of FIG. 14 in the optimization result storage unit 24. The instruction that specifies a sector number may also be the prefetch instruction "prefetch".
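The rewriting step can be pictured as wrapping a loop's memory operations. In the sketch below the mnemonics `sector_split` and `sector_reset_lru` and the textual instruction format are invented for illustration; FIG. 14 is not reproduced here and the real instruction encoding is not given in the text. Only the overall shape follows the description: a division instruction before the loop, a sector number on every load and store, and a return to LRU after the loop.

```python
from typing import Iterable, List, Set, Tuple


def annotate_loop(mem_ops: Iterable[Tuple[str, str]],
                  sector1_arrays: Set[str],
                  ratio: Tuple[int, int]) -> List[str]:
    """Wrap a loop's memory operations with hypothetical sector instructions."""
    out = [f"sector_split {ratio[0]}:{ratio[1]}"]      # inserted before the loop
    for op, array in mem_ops:
        # Arrays in the selected data set go to sector 1, everything else to sector 0.
        sector = 1 if array in sector1_arrays else 0
        out.append(f"{op} {array}, sector{sector}")
    out.append("sector_reset_lru")                     # inserted after the loop
    return out


annotated = annotate_loop([("load", "a"), ("load", "c"), ("store", "b")],
                          sector1_arrays={"c"}, ratio=(9, 1))
for line in annotated:
    print(line)
```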

そして、ステップＳ２０３（図８）にて最適化部３２の処理が終了すると、ファイル生成部３３は、ステップＳ１０４（図７）にて、オブジェクトファイルを生成する。 Then, when the process of the optimization unit 32 ends in step S203 (FIG. 8), the file generation unit 33 generates an object file in step S104 (FIG. 7).

上述してきたように、実施例１によれば、最適化部３２は、セクタ機能付きのキャッシュメモリを搭載する情報処理装置４０にて実行されるソースプログラムからソース解析部３１が変換した中間言語テキストを解析する。具体的には、最適化部３２は、各ループにおいて処理されるデータ配列の集合であるデータ集合のループ処理実行時における再利用性の有無を判定する。そして、最適化部３２は、再利用性の有無を判定したデータ集合を格納するために要するウェイ数とシステムの最大ウェイ数とから、セクタ分割比と、セクタ番号とを決定する。そして、最適化部３２は、セクタ分割比およびセクタ番号が決定されたループにおいて、セクタ分割命令およびメモリアクセス命令へのセクタ番号を付加した命令文を挿入する。そして、ファイル生成部３３は、命令文が挿入された中間言語テキストからオブジェクトファイルを生成する。 As described above, according to the first embodiment, the optimization unit 32 analyzes the intermediate language text that the source analysis unit 31 produced from a source program to be executed on the information processing apparatus 40, which is equipped with a cache memory with a sector function. Specifically, for each loop, the optimization unit 32 determines whether the data set, that is, the set of data arrays processed in the loop, is reused while the loop executes. The optimization unit 32 then determines the sector division ratio and the sector numbers from the number of ways required to store the data set whose reusability has been judged and the maximum number of ways of the system. In each loop for which the sector division ratio and sector numbers have been determined, the optimization unit 32 inserts a sector division instruction and memory access instructions with sector numbers attached. The file generation unit 33 then generates an object file from the intermediate language text into which these instructions have been inserted.

したがって、コンパイラ装置１０において自動的にセクタ分割比とセクタ分割比有効範囲とセクタ番号とを決定することができ、キャッシュメモリ、特にセクタ機能付きキャッシュメモリを有効的に利用することによりプログラムの実行性能を向上することが可能となる。また、再利用性のあるデータのアクセスサイズが分からない場合であっても、自動的にセクタ分割比およびセクタ番号を決定することができ、プログラマの負担を軽減して、セクタ機能付きキャッシュメモリを有効的に利用することが可能となる。 Therefore, the compiler apparatus 10 can automatically determine the sector division ratio, the sector division ratio effective range, and the sector numbers, and program execution performance can be improved by making effective use of the cache memory, in particular a cache memory with a sector function. Moreover, even when the access size of the reusable data is unknown, the sector division ratio and sector numbers can still be determined automatically, which reduces the burden on the programmer while making effective use of the cache memory with a sector function.

また、実施例１では、最適化部３２は、再利用性有りであり連続割り付け可能として判定したデータ集合について、ローカルメモリ方式によりセクタ分割比とセクタ番号とを決定する。これにより、再利用性のあるデータが、再利用性のあるデータ以外のデータによりＬＲＵ方式によりキャッシュメモリから追い出されることを防止でき、セクタ機能付きキャッシュメモリをより有効的に利用することが可能となる。 In the first embodiment, the optimization unit 32 determines a sector division ratio and a sector number by a local memory method for a data set that is determined to be reusable and can be continuously allocated. As a result, reusable data can be prevented from being evicted from the cache memory by the LRU method by data other than reusable data, and the cache memory with sector function can be used more effectively. Become.

また、実施例１では、最適化部３２は、再利用性有りのデータ配列がなかったループにて再利用性の無いストリームデータ（連続する隣接アクセスデータ）として判定されるデータ集合が存在する場合、ストリームデータとして判定したデータ集合について、最弱ウェイ方式によりセクタ分割比とセクタ番号とを決定する。これにより、再利用性のあるデータがキャッシュラインのコンフリクトにより、予定外にキャッシュメモリから追い出されることを防ぐことができ、セクタ機能付きキャッシュメモリをさらに有効的に利用することが可能となる。 Further, in the first embodiment, the optimization unit 32 includes a data set that is determined as stream data that is not reusable (continuous adjacent access data) in a loop in which there is no data array that has reusability. For the data set determined as stream data, the sector division ratio and the sector number are determined by the weakest way method. As a result, reusable data can be prevented from being unscheduled out of the cache memory due to a conflict in the cache line, and the cache memory with a sector function can be used more effectively.

In the present embodiment, as shown in FIG. 11, the case has been described in which S is set to "NULL" when the reusable data arrays cannot be allocated to a contiguous area; however, the present embodiment is not limited to this. For example, when data arrays 1 and 2 among reusable data arrays 1, 2, and 3 can be allocated to a contiguous area, the optimization unit 32 may set data arrays 1 and 2 as S and execute the processing from step S506 onward.

The first embodiment described above dealt with the case where the sector division ratio is determined automatically by the compiler apparatus 10. The second embodiment deals with the case where the sector division ratio is specified by the programmer.

The compiler apparatus 10 in the second embodiment has the same configuration as the compiler apparatus 10 in the first embodiment described with reference to FIG. 4. However, in the second embodiment, the content of the source program received by the source program input unit 11 and the processing performed by the source analysis unit 31 and the optimization unit 32 differ from those in the first embodiment. The following description focuses on these differences.

The second embodiment uses cache control statements that allow the programmer to specify the sector division of the cache memory at the notation level of the source program rather than at the assembler level, and that the compiler apparatus 10 can interpret. An example of the cache control statements used in the second embodiment is described with reference to FIG. 15. The cache control statements described below are merely an example; any format may be used as long as the compiler apparatus 10 can decode the sector division ratio specified by the programmer.

First, among the cache control statements, "!ocl sector_cache_begin" is used as the control statement that marks the start of sector control (see (1) in FIG. 15). The notation "!ocl" introduces what is generally called an optimization control line, a control statement that gives instructions to the compiler, and is supported by many vendors. That is, in the cache control statements shown in FIG. 15, sector specification is performed by means of "!ocl".

In the cache control statements, "!ocl sector0_max(m), sector1_max(n)" specifies the sector division ratio together with the sector numbers (see (2) in FIG. 15). In (2) of FIG. 15, the maximum number of ways of sector 0 is specified as m and the maximum number of ways of sector 1 as n. As cache method determination rules, the compiler apparatus 10 in the second embodiment determines that the programmer has specified the local memory method if "m + n" equals the maximum number of ways of the system, and that the programmer has specified the weakest way method if "m + n" is larger than the maximum number of ways of the system.
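The cache method determination rule above can be illustrated with a minimal Python sketch. The names CacheMethod and judge_cache_method are hypothetical and do not appear in the specification; the third outcome (no sector division when "m + n" is smaller than the maximum number of ways) is taken from the description of step S703 of FIG. 22 later in this specification.

```python
from enum import Enum


class CacheMethod(Enum):
    """Hypothetical labels for the outcomes of the determination rule."""
    LOCAL_MEMORY = "local memory method"
    WEAKEST_WAY = "weakest way method"
    NO_SECTOR = "no sector division (plain LRU)"


def judge_cache_method(m: int, n: int, max_ways: int) -> CacheMethod:
    """Classify a sector0_max(m), sector1_max(n) specification.

    m, n     : maximum number of ways given for sector 0 and sector 1
    max_ways : maximum number of ways of the system
    """
    total = m + n
    if total == max_ways:
        # The two sectors exactly partition the ways.
        return CacheMethod.LOCAL_MEMORY
    if total > max_ways:
        # The sectors overlap in some ways.
        return CacheMethod.WEAKEST_WAY
    # m + n < max_ways: treated as "no sector specified" (step S703).
    return CacheMethod.NO_SECTOR


if __name__ == "__main__":
    # With a 10-way system: sector0_max(10), sector1_max(5) overlaps,
    # while sector0_max(9), sector1_max(1) is an exact partition.
    print(judge_cache_method(10, 5, 10).value)
    print(judge_cache_method(9, 1, 10).value)
```

Under these assumptions, the two calls correspond to the specifications of FIG. 16 and FIG. 17 respectively.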

In the cache control statements, "!ocl sector0(array1), sector1(array2, array3)" specifies, for each sector number, the names of the arrays whose data is stored in that sector (see (3) in FIG. 15). In (3) of FIG. 15, array1 is specified as the array assigned to sector 0, and array2 and array3 as the arrays assigned to sector 1.

As shown in (3) of FIG. 15, the array designation may also be left to the compiler (compiler apparatus 10). In this case, if the cache method is determined to be the weakest way method, the compiler decides to assign the stream data to sector 1 and the data in the loop other than the stream data to sector 0. If the cache method is determined to be the local memory method, the compiler decides to assign the reusable data to sector 1 and the data in the loop other than the reusable data to sector 0. Reusable data and stream data are identified in the same manner as described in the first embodiment.

In the cache control statements, "!ocl sector_cache_end" is used as the control statement that marks the end of sector control (see (4) in FIG. 15). That is, "!ocl sector_cache_begin" and "!ocl sector_cache_end" let the programmer specify what the first embodiment called the sector division ratio effective range. In the following, the range from the control start position to the control end position is called the control area, in place of the term "sector division ratio effective range" used in the first embodiment.

In the cache control statements, "!ocl loop sector_cache" may also be used as a control statement that applies to the immediately following loop, instead of specifying a control area with "!ocl sector_cache_begin" and "!ocl sector_cache_end" (see (5) in FIG. 15).

In addition to the rules described above, the cache control statements adopt, as shown in FIG. 15, the rule that "sectors must not be nested" and the rule that "control always reaches sector_cache_end from sector_cache_begin".
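As an illustration only, the two rules above could be checked with a sketch such as the following Python function. The function name find_control_areas is hypothetical, and the check is a purely textual simplification: it pairs each "!ocl sector_cache_begin" line with the next "!ocl sector_cache_end" line and does not analyze the control flow between them, which a real compiler would need to do to guarantee that the end is always reached.

```python
def find_control_areas(lines):
    """Return (begin_line, end_line) pairs (1-based) for each control area.

    Raises ValueError when a begin is nested inside an open control area,
    when an end appears without a begin, or when a begin is never closed.
    """
    areas = []
    open_at = None  # line number of the currently open sector_cache_begin
    for no, text in enumerate(lines, start=1):
        # Normalize case and whitespace before comparing.
        stmt = " ".join(text.strip().lower().split())
        if stmt == "!ocl sector_cache_begin":
            if open_at is not None:
                raise ValueError(
                    f"line {no}: nested sector_cache_begin "
                    f"(area opened at line {open_at})")
            open_at = no
        elif stmt == "!ocl sector_cache_end":
            if open_at is None:
                raise ValueError(f"line {no}: sector_cache_end without begin")
            areas.append((open_at, no))
            open_at = None
    if open_at is not None:
        raise ValueError(
            f"line {open_at}: sector_cache_begin never reaches sector_cache_end")
    return areas
```

For a listing whose first and fifth lines are the begin and end statements, the function returns a single control area spanning lines 1 to 5.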

Various examples of specifying sector division with the cache control statements shown in FIG. 15 are now described with reference to FIGS. 16 to 20. FIG. 16 is a diagram for explaining an example of specifying sector division by the weakest way method, and FIG. 17 is a diagram for explaining an example of specifying sector division by the local memory method. FIG. 18 is a diagram for explaining an example of specifying sector division by both the weakest way method and the local memory method, and FIG. 19 is a diagram for explaining a variation of the control area specification. FIG. 20 is a diagram for explaining an example of specifying sector division by the quasi-local memory method.

In the examples shown in FIG. 16 and the subsequent figures, the number of sectors is assumed to be "2" and the maximum number of ways of the system is assumed to be "10".

In the cache control statements shown in FIG. 16, it is specified that the non-reusable stream data (arrays a and b) is stored in sector 1 with 5 ways, and that the data other than the stream data, which includes reusable data (array c), is stored in sector 0 with 10 ways. Because "10 + 5 > 10", the compiler apparatus 10 determines that the weakest way method is specified. That is, the programmer's specification lowers the probability that the reusable data (array c) is evicted from the cache memory.

Next, in the cache control statements shown in FIG. 17, it is specified that the reusable data (array c) is stored in sector 1 with 1 way, and that the data other than the reusable data (arrays a and b) is stored in sector 0 with 9 ways. Because "1 + 9 = 10", the compiler apparatus 10 determines that the local memory method is specified. That is, the programmer can completely prevent the reusable data (array c) from being unexpectedly evicted from the cache memory by cache line conflicts.

Next, in the cache control statements shown in FIG. 18, the weakest way method and the local memory method are used together by specifying a sector division ratio for each loop, in order to exploit the characteristics of a cache memory system with a sector function.

That is, as shown in FIG. 18, for one loop the local memory method is specified, in which the reusable data (array c) is stored in sector 1 with 1 way and the data other than the reusable data (arrays a and b) is stored in sector 0 with 9 ways. For another loop, as also shown in FIG. 18, the weakest way method is specified, in which the stream data (array d) is stored in sector 1 with 5 ways.

Next, the cache control statements shown in FIG. 19 illustrate a variation of the control area specification described above. That is, in the example shown in FIG. 19, "!ocl loop sector_cache" is used instead of "!ocl sector_cache_begin" and "!ocl sector_cache_end" to specify the sector division ratio and the sector number for each loop. In general, whether data is reusable is often judged with respect to the DO control variable of a loop. Since a control area specification that focuses on loops is more convenient for the programmer, the present embodiment also allows the "!ocl loop sector_cache" form.

When the local memory method is specified as shown in FIGS. 17 and 18, it is not essential that the iteration count of the first innermost loop to which the local memory method applies be statically known. For example, even if the iteration count is a variable, sector division by the local memory method is possible as long as the programmer knows the maximum value the variable can take. The programmer can obtain that maximum value by, for example, running a PA (performance analyzer) tool on the actual execution module.

The cache control statements shown in FIG. 20 are an example of the quasi-local memory method: for array c, which is reusable but whose size is unknown, a ratio of "6:4" is specified, which as a sector division ratio corresponds to the local memory method. In the quasi-local memory method, however, cache line conflicts in sector 1 may not be completely prevented for array c, whose size is unknown. Moreover, too many ways may be reserved for sector 1 for array c, which reduces the number of ways available for the remaining data and may degrade program execution performance.

Therefore, when specifying the quasi-local memory method for a cache memory system with a sector function, the programmer, for example, runs the execution module once, examines the cache information with a PA (performance analyzer) tool, and then inserts the cache control statements.

The processing of the compiler apparatus 10 in the second embodiment when a source program containing cache control statements is input is described below.

The source analysis unit 31 in the second embodiment analyzes the cache control statements written in the input source program and stores the analysis result as cache control statement data in the source analysis result storage unit 22, together with the intermediate language text described above.

For example, the source analysis unit 31 analyzes the cache control statements of the source program shown in FIG. 16 and, as shown in FIG. 21, registers "10" in "sector0_max", "5" in "sector1_max", and "10" in "control start position". FIG. 21 is a diagram for explaining the source analysis unit in the second embodiment.

Further, as shown in FIG. 21, since one array is designated for sector 0, the source analysis unit 31 registers "c" in "sector0[0]". Since two arrays are designated for sector 1, the source analysis unit 31 registers "a" in "sector1[0]" and "b" in "sector1[1]". Although not shown in FIG. 21, the source analysis unit 31 also registers the analyzed value in "control end position".
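The cache control statement data described above can be pictured with the following Python sketch. It is an assumption-laden illustration, not the data layout of FIG. 21: the class CacheControlData, the function analyze, and the regular expressions are hypothetical, and the sample listing used with it is invented, so its line numbers differ from the "10" registered for FIG. 16.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# sectorN_max(k): maximum number of ways of sector N
_MAX_RE = re.compile(r"sector([01])_max\s*\(\s*(\d+)\s*\)")
# sectorN(name, ...): arrays assigned to sector N
_ARR_RE = re.compile(r"sector([01])\s*\(\s*([^)]*)\)")


@dataclass
class CacheControlData:
    """Hypothetical container mirroring the fields named in the text."""
    sector0_max: Optional[int] = None
    sector1_max: Optional[int] = None
    sector0: List[str] = field(default_factory=list)
    sector1: List[str] = field(default_factory=list)
    control_start: Optional[int] = None  # 1-based source line number
    control_end: Optional[int] = None


def analyze(lines) -> CacheControlData:
    """Collect the contents of the !ocl cache control statements."""
    data = CacheControlData()
    for no, text in enumerate(lines, start=1):
        stmt = text.strip().lower()
        if not stmt.startswith("!ocl"):
            continue  # ordinary source line
        body = stmt[len("!ocl"):].strip()
        if body == "sector_cache_begin":
            data.control_start = no
        elif body == "sector_cache_end":
            data.control_end = no
        else:
            for sec, ways in _MAX_RE.findall(body):
                setattr(data, f"sector{sec}_max", int(ways))
            for sec, names in _ARR_RE.findall(body):
                arrays = [a.strip() for a in names.split(",") if a.strip()]
                getattr(data, f"sector{sec}").extend(arrays)
    return data
```

Applied to a listing that contains "!ocl sector0_max(10), sector1_max(5)" and "!ocl sector0(c), sector1(a, b)", the sketch fills sector0_max, sector1_max, sector0, and sector1 with the same values as those registered in the example above.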

The optimization unit 32 in the second embodiment then reads the cache control statement data and inserts a cache division instruction at the control start position. Specifically, the cache division instruction is inserted as shown in FIG. 13 described in the first embodiment.

More specifically, the optimization unit 32 reads the cache control statement data, analyzes data reusability, and determines whether ways are allowed to overlap between the sectors. The optimization unit 32 then determines the sector number of each memory access for all memory access instructions (load, store, and prefetch instructions) in the control area. However, when a sector designation is already given explicitly in the cache control statement data, the optimization unit 32 follows that designation.

In more detail, the optimization unit 32 in the second embodiment executes the sector determination process according to the procedure shown in FIG. 22. FIG. 22 is a flowchart for explaining the sector determination process of the optimization unit in the second embodiment.

As shown in FIG. 22, the optimization unit 32 in the second embodiment starts processing for each piece of data (step S701) and determines whether "m + n" in the cache control statements is equal to or greater than the maximum number of ways of the system (step S702). The optimization unit 32 obtains "m + n" by, for example, adding "sector0_max" and "sector1_max" shown in FIG. 21.

If "m + n" is smaller than the maximum number of ways of the system (No at step S702), the optimization unit 32 determines that no sector is to be designated (step S703) and ends the process. That is, the optimization unit 32 determines that data is stored by the LRU method without division into sectors.

On the other hand, if "m + n" is equal to or greater than the maximum number of ways of the system (Yes at step S702), the optimization unit 32 determines whether the cache control statement data contains a sector number designation for data (step S704). That is, the optimization unit 32 determines whether the data structure of the cache control statements contains an array for which a sector is designated.

If there is a sector number designation for data (Yes at step S704), the optimization unit 32 sets a sector number for every memory access in the control area, assigning sector 0 to any data array for which no sector number is designated (step S705), and ends the process.

On the other hand, if there is no sector number designation for data (No at step S704), the optimization unit 32 determines, from the result of the reusability determination by subscript analysis of each memory access, whether all of the data is non-reusable stream data (step S706). Specifically, by analyzing the subscripts, the optimization unit 32 determines whether every memory access in the loop is stream data, that is, data that depends on the DO control variable and accesses adjacent data consecutively.

If all of the data is non-reusable stream data (Yes at step S706), the optimization unit 32 determines that data is stored by the LRU method without division into sectors, and ends the process.

On the other hand, if not all of the data is non-reusable stream data (No at step S706), the optimization unit 32 determines whether "m + n" is larger than the maximum number of ways of the system (step S707).

If "m + n" is equal to the maximum number of ways of the system (No at step S707), the optimization unit 32 assigns sector number "1" to the reusable memory accesses and sector number "0" to the others (step S708), and ends the process. That is, the optimization unit 32 determines that the sectors are divided by the local memory method.

On the other hand, if "m + n" is larger than the maximum number of ways of the system (Yes at step S707), the optimization unit 32 assigns sector number "1" to the memory accesses of non-reusable stream data and sector number "0" to the others (step S709), and ends the process. That is, the optimization unit 32 determines that the sectors are divided by the weakest way method. The optimization unit 32 determines whether a data array is reusable and whether a data array is stream data by the processing described in the first embodiment.
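The sector determination process of steps S701 to S709 can be summarized by the following Python sketch. The function decide_sectors and its argument layout are hypothetical. The classification of each array as "reusable", "stream", or "other" is taken as an input here, whereas in the specification it results from the subscript analysis of the first embodiment.

```python
def decide_sectors(m, n, max_ways, explicit, accesses):
    """Return {array name: sector number}, or None for no sector division.

    m, n     : sector0_max and sector1_max from the cache control statement
    max_ways : maximum number of ways of the system
    explicit : {array name: sector number} designated by the programmer
               (empty when the designation is left to the compiler)
    accesses : {array name: "reusable" | "stream" | "other"} for the
               memory accesses in the control area
    """
    total = m + n
    if total < max_ways:
        # S702 No -> S703: no sector is designated, plain LRU.
        return None
    if explicit:
        # S704 Yes -> S705: follow the designation, default to sector 0.
        return {name: explicit.get(name, 0) for name in accesses}
    if all(kind == "stream" for kind in accesses.values()):
        # S706 Yes: only non-reusable stream data, plain LRU.
        return None
    if total > max_ways:
        # S707 Yes -> S709: weakest way method, stream data goes to sector 1.
        return {name: 1 if kind == "stream" else 0
                for name, kind in accesses.items()}
    # S707 No -> S708: local memory method, reusable data goes to sector 1.
    return {name: 1 if kind == "reusable" else 0
            for name, kind in accesses.items()}
```

With a 10-way system and no explicit designation, calling the function with m = 10, n = 5 and arrays a and b classified as stream data and c as reusable yields sector 1 for a and b and sector 0 for c, which matches the weakest way assignment described for FIG. 16.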

In this way, the optimization unit 32 in the second embodiment inserts the cache division instruction at the control start position and stores the intermediate language text containing the inserted cache division instruction in the optimization result storage unit 24. The file generation unit 33 generates an object file from the intermediate language text containing the cache division instruction and outputs the object file to the information processing apparatus 40. As a result, the cache control statements are reflected in the execution module, and the information processing apparatus 40 performs cache control as the programmer intended.

As described above, according to the second embodiment, given a source program in which the programmer has specified, by cache control statements, the sector division ratio to be used when storing data in the cache memory with the sector function, the optimization unit 32 determines whether the array data used in the loop for which the sector division ratio is specified is reusable. The optimization unit 32 then determines the sector numbers based on the reusability of the data arrays and the sector division ratio given in the cache control statements, and inserts into the intermediate language text a sector division instruction and memory access instructions to which the sector numbers have been added.

Therefore, by using the compiler apparatus 10 capable of decoding the cache control statements, the cache memory with the sector division function can be used effectively in line with the programmer's intention, and program execution performance can be improved.

Further, in the second embodiment, the optimization unit 32 automatically determines, from the sector division ratio specified in the cache control statements, whether the local memory method or the weakest way method is intended, and automatically determines the sector numbers, which reduces the burden on the programmer.

Of the processes described in the first and second embodiments, all or part of the processes described as being performed automatically may be performed manually (for example, the programmer may enter the hardware information manually). Conversely, all or part of the processes described as being performed manually may be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and the drawings (for example, the cache control statements shown in FIG. 15) may be changed as desired unless otherwise specified.

The components of each illustrated apparatus are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each apparatus is not limited to the illustrated one, and all or part of each apparatus may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed in each apparatus may be realized by a CPU and a program analyzed and executed by the CPU, or may be realized as hardware using wired logic.

The above embodiments describe the case where the various processes are realized by hardware logic; however, the present invention is not limited to this, and a program prepared in advance may be executed by a computer. An example of a computer that executes a compiler program having the same functions as the compiler apparatus 10 of the first embodiment is therefore described below with reference to FIG. 23. FIG. 23 is a diagram illustrating a computer that executes the compiler program of the first embodiment.

As shown in FIG. 23, a computer 100 serving as an information processing apparatus includes a keyboard 101, a display 102, a CPU 103, a ROM 104, an HDD 105, and a RAM 106. The keyboard 101, the display 102, the CPU 103, the ROM 104, the HDD 105, and the RAM 106 are connected by a bus 107 or the like. The computer 100 is also connected to the information processing apparatus 40.

The ROM 104 stores in advance a compiler program that provides the same functions as the compiler apparatus 10 of the first embodiment, namely, as shown in FIG. 23, a source analysis program 104a, an optimization program 104b, and a file generation program 104c. Like the components of the compiler apparatus 10 shown in FIG. 4, these programs 104a to 104c may be integrated or distributed as appropriate.

The CPU 103 reads these programs 104a to 104c from the ROM 104 and executes them. As a result, as shown in FIG. 23, the programs 104a to 104c function as a source analysis process 103a, an optimization process 103b, and a file generation process 103c. The processes 103a to 103c correspond to the source analysis unit 31, the optimization unit 32, and the file generation unit 33 shown in FIG. 4, respectively.

As shown in FIG. 23, the HDD 105 is provided with source program data 105a, source analysis data 105b, architecture data 105c, and optimization result data 105d. The data 105a to 105d correspond to the source program storage unit 21, the source analysis result storage unit 22, the architecture data storage unit 23, and the optimization result storage unit 24 shown in FIG. 4, respectively. The CPU 103 registers source program data 106a in the source program data 105a, source analysis data 106b in the source analysis data 105b, and architecture data 106c in the architecture data 105c. The CPU 103 also registers optimization result data 106d in the optimization result data 105d. The CPU 103 then reads the registered data, stores it in the RAM 106, and executes the compile process based on the source program data 106a, the source analysis data 106b, the architecture data 106c, and the optimization result data 106d stored in the RAM 106.

The programs 104a to 104c need not be stored in the ROM 104 from the beginning. For example, each program may be stored on a "portable physical medium" inserted into the computer 100, such as a flexible disk (FD), a CD-ROM, an MO disk, a DVD disk, a magneto-optical disk, or an IC card; on a "fixed physical medium" such as an HDD provided inside or outside the computer 100; or on "another computer (or server)" connected to the computer 100 via a public line, the Internet, a LAN, a WAN, or the like, and the computer 100 may read each program from such a medium or computer and execute it.

The following supplementary notes are further disclosed with respect to the embodiments including the examples described above.

(Supplementary Note 1) A compiler program causing a computer to execute:
a determination procedure of analyzing a source program to be executed by an information processing apparatus equipped with a cache memory having a sector function, thereby determining, for a data set that is a set of data arrays processed in each loop of the source program, whether the data set is reusable during execution of the loop, and determining, from the capacity required to store the data set whose reusability has been determined and the capacity of the cache memory, a sector division ratio for the cache memory and a sector number identifying the sector in which the data set is to be stored;
an insertion procedure of inserting, into the source program, an instruction control statement based on the sector division ratio and the sector number, in the loop for which the sector division ratio and the sector number were determined by the determination procedure; and
a file generation procedure of generating an object file from the source program into which the instruction control statement was inserted by the insertion procedure.
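The insertion procedure of Supplementary Note 1 can be pictured with a minimal sketch. It assumes the determination procedure has already produced a division ratio and a per-array sector number. The directive spellings `!SECTOR_SPLIT` and `!SECTOR_ASSIGN`, the `SectorPlan` type, and the function name are hypothetical and are not taken from this document; the actual form of the instruction control statement is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SectorPlan:
    """Result of the determination procedure for one loop (hypothetical type)."""
    ratio: Tuple[int, int]      # ways given to sector 0 and sector 1
    sector_of: Dict[str, int]   # array name -> sector number


def insert_directives(lines: List[str], loop_start: int, plan: SectorPlan) -> List[str]:
    """Return a copy of `lines` with directives placed just before the loop.

    The directive text is an assumption made for illustration only.
    """
    directives = [f"!SECTOR_SPLIT {plan.ratio[0]}:{plan.ratio[1]}"]
    for name in sorted(plan.sector_of):
        directives.append(f"!SECTOR_ASSIGN {name} {plan.sector_of[name]}")
    return lines[:loop_start] + directives + lines[loop_start:]


source = ["do i = 1, n", "  a(i) = a(i) + b(i)", "end do"]
plan = SectorPlan(ratio=(8, 2), sector_of={"a": 1, "b": 0})
annotated = insert_directives(source, 0, plan)
```

The annotated source would then be handed to the file generation procedure, which this sketch does not model.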

(Supplementary Note 2) The compiler program according to Supplementary Note 1, wherein the determination procedure determines the sector division ratio and the sector number such that the sum of the capacity required to store the data set of data arrays determined to be reusable and the capacity required to store the data sets other than the data arrays determined to be reusable equals the capacity of the cache memory.

(Supplementary Note 3) The compiler program according to Supplementary Note 2, wherein, when a loop that contains no reusable data array contains a data set determined to be non-reusable stream data, the determination procedure determines the sector division ratio and the sector number such that the sum of the capacity required to store the data set determined to be non-reusable stream data and the capacity required to store the data sets other than the data arrays determined to be non-reusable stream data is larger than the capacity of the cache memory.
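The two sizing rules of Supplementary Notes 2 and 3 can be contrasted in a small sketch that measures capacity in cache ways. The function names, the clamping to at least one way per sector, and the choice of a single way for stream data are assumptions made for illustration; the notes only require that the two sector sizes sum to the cache capacity in the reusable case and exceed it in the stream case.

```python
from typing import Tuple


def ways_needed(data_bytes: int, way_bytes: int) -> int:
    """Ceiling division: number of cache ways needed to hold data_bytes."""
    return -(-data_bytes // way_bytes)


def decide_split(reusable_bytes: int, has_stream: bool,
                 way_bytes: int, max_ways: int) -> Tuple[int, int]:
    """Return (ways for other data, ways for the classified data set)."""
    if reusable_bytes > 0:
        # Note 2 case: the two sectors together cover exactly max_ways.
        # Assumption: each sector keeps at least one way.
        reusable_ways = min(max(ways_needed(reusable_bytes, way_bytes), 1),
                            max_ways - 1)
        return (max_ways - reusable_ways, reusable_ways)
    if has_stream:
        # Note 3 case: the sum is allowed to exceed max_ways.
        # Assumption: other data may use every way, stream data is
        # confined to a single way.
        return (max_ways, 1)
    # Nothing to partition.
    return (max_ways, 0)


reusable_case = decide_split(96 * 1024, False, 32 * 1024, 10)
stream_case = decide_split(0, True, 32 * 1024, 10)
```

In the reusable case the two values add up to the maximum number of ways, while in the stream case they add up to more than that maximum.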

(Supplementary Note 4) A compiler program causing a computer to execute:
an acceptance procedure of accepting a source program in which a sector division ratio to be used when storing data in a cache memory having a sector function is specified within a loop by the creator of the source program;
an extraction procedure of extracting, from the source program accepted by the acceptance procedure, the array data used in the loop for which the sector division ratio is specified by the creator in the source program;
a determination procedure of determining, based on whether the array data extracted by the extraction procedure is reusable and on the sector division ratio, a sector number to be used when storing the array data in the cache memory;
an insertion procedure of inserting, into the source program, an instruction control statement based on the determined sector number and the sector division ratio specified by the creator, in the loop for which the sector number was determined by the determination procedure; and
a file generation procedure of generating an object file from the source program into which the instruction control statement was inserted by the insertion procedure.
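As a rough illustration of the determination procedure in Supplementary Note 4, the sketch below keeps the creator-specified division ratio untouched and only chooses a sector number for each extracted array from its reusability. The numbering (sector 1 for reusable arrays, sector 0 otherwise) and all names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical numbering, chosen only for this sketch.
REUSABLE_SECTOR = 1
OTHER_SECTOR = 0


def assign_sector_numbers(reusable_by_array: Dict[str, bool],
                          user_ratio: Tuple[int, int]) -> dict:
    """Pick a sector number per array; pass the creator's ratio through as-is."""
    sector_of = {
        name: (REUSABLE_SECTOR if reusable else OTHER_SECTOR)
        for name, reusable in reusable_by_array.items()
    }
    return {"ratio": user_ratio, "sector_of": sector_of}


result = assign_sector_numbers({"a": True, "b": False}, (6, 4))
```

The point of the design is that the compiler decides only the mapping of arrays to sectors, while the ratio remains whatever the creator wrote in the loop.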

(Supplementary Note 5) The compiler program according to Supplementary Note 4, wherein the determination procedure determines whether the sector division ratio specified by the creator divides a range corresponding to the capacity of the cache memory or divides a range larger than the range corresponding to the capacity of the cache memory, and then determines the sector number.
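The check described in Supplementary Note 5 amounts to comparing the total of the specified ratio with the cache capacity. A minimal sketch follows, assuming the ratio is given as a number of ways per sector; the function name and the returned labels are hypothetical.

```python
from typing import Sequence


def classify_ratio(ratio: Sequence[int], max_ways: int) -> str:
    """Tell whether a creator-specified ratio fits within the cache capacity."""
    if any(w < 0 for w in ratio) or sum(ratio) == 0:
        raise ValueError("ratio must contain non-negative ways and not be all zero")
    if sum(ratio) <= max_ways:
        return "within_capacity"
    return "exceeds_capacity"


fits = classify_ratio((6, 4), 10)
oversized = classify_ratio((10, 2), 10)
```

The sector number would then be chosen according to which of the two cases applies.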

(Supplementary Note 6) A compiler apparatus comprising:
a determination unit that analyzes a source program to be executed by an information processing apparatus equipped with a cache memory having a sector function, thereby determining, for a data set that is a set of data arrays processed in each loop of the source program, whether the data set is reusable during execution of the loop, and that determines, from the capacity required to store the data set whose reusability has been determined and the capacity of the cache memory, a sector division ratio for the cache memory and a sector number identifying the sector in which the data set is to be stored;
an insertion unit that inserts, into the source program, an instruction control statement based on the sector division ratio and the sector number, in the loop for which the sector division ratio and the sector number were determined by the determination unit; and
a file generation unit that generates an object file from the source program into which the instruction control statement was inserted by the insertion unit.

(Supplementary Note 7) The compiler apparatus according to Supplementary Note 6, wherein the determination unit determines the sector division ratio and the sector number such that the sum of the capacity required to store the data set of data arrays determined to be reusable and the capacity required to store the data sets other than the data arrays determined to be reusable equals the capacity of the cache memory.

(Supplementary Note 8) The compiler apparatus according to Supplementary Note 7, wherein, when a loop that contains no reusable data array contains a data set determined to be non-reusable stream data, the determination unit determines the sector division ratio and the sector number such that the sum of the capacity required to store the data set determined to be non-reusable stream data and the capacity required to store the data sets other than the data arrays determined to be non-reusable stream data is larger than the capacity of the cache memory.

(Supplementary Note 9) A compiler apparatus comprising:
an acceptance unit that accepts a source program in which a sector division ratio to be used when storing data in a cache memory having a sector function is specified within a loop by the creator of the source program;
an extraction unit that extracts, from the source program accepted by the acceptance unit, the array data used in the loop for which the sector division ratio is specified by the creator in the source program;
a determination unit that determines, based on whether the array data extracted by the extraction unit is reusable and on the sector division ratio, a sector number to be used when storing the array data in the cache memory;
an insertion unit that inserts, into the source program, an instruction control statement based on the determined sector number and the sector division ratio specified by the creator, in the loop for which the sector number was determined by the determination unit; and
a file generation unit that generates an object file from the source program into which the instruction control statement was inserted by the insertion unit.

(Supplementary Note 10) The compiler apparatus according to Supplementary Note 9, wherein the determination unit determines whether the sector division ratio specified by the creator divides a range corresponding to the capacity of the cache memory or divides a range larger than the range corresponding to the capacity of the cache memory, and then determines the sector number.

DESCRIPTION OF SYMBOLS
10 Compiler apparatus
11 Source program input unit
12 Object file output unit
13 Communication unit
14 Input/output control I/F unit
20 Storage unit
21 Source program storage unit
22 Source analysis result storage unit
23 Architecture data storage unit
24 Optimization result storage unit
30 Processing unit
31 Source analysis unit
32 Optimization unit
33 File generation unit
40 Information processing apparatus

Claims

A compiler program causing a computer to execute:
a determination procedure of analyzing a source program to be executed by an information processing apparatus equipped with a cache memory having a sector function, thereby determining, for a data set that is a set of data arrays processed in each loop of the source program, whether the data set is reusable during execution of the loop, and determining, from the capacity required to store the data set whose reusability has been determined and the capacity of the cache memory, a sector division ratio for the cache memory and a sector number identifying the sector in which the data set is to be stored;
an insertion procedure of inserting, into the source program, an instruction control statement based on the sector division ratio and the sector number, in the loop for which the sector division ratio and the sector number were determined by the determination procedure; and
a file generation procedure of generating an object file from the source program into which the instruction control statement was inserted by the insertion procedure.

The compiler program according to claim 1, wherein the determination procedure determines the sector division ratio and the sector number such that the sum of the capacity required to store the data set of data arrays determined to be reusable and the capacity required to store the data sets other than the data arrays determined to be reusable equals the capacity of the cache memory.

The compiler program according to claim 2, wherein, when a loop that contains no reusable data array contains a data set determined to be non-reusable stream data, the determination procedure determines the sector division ratio and the sector number such that the sum of the capacity required to store the data set determined to be non-reusable stream data and the capacity required to store the data sets other than the data arrays determined to be non-reusable stream data is larger than the capacity of the cache memory.

A compiler program causing a computer to execute:
an acceptance procedure of accepting a source program in which a sector division ratio to be used when storing data in a cache memory having a sector function is specified within a loop by the creator of the source program;
an extraction procedure of extracting, from the source program accepted by the acceptance procedure, the array data used in the loop for which the sector division ratio is specified by the creator in the source program;
a determination procedure of determining, based on whether the array data extracted by the extraction procedure is reusable and on the sector division ratio, a sector number to be used when storing the array data in the cache memory;
an insertion procedure of inserting, into the source program, an instruction control statement based on the determined sector number and the sector division ratio specified by the creator, in the loop for which the sector number was determined by the determination procedure; and
a file generation procedure of generating an object file from the source program into which the instruction control statement was inserted by the insertion procedure.

The compiler program according to claim 4, wherein the determination procedure determines whether the sector division ratio specified by the creator divides a range corresponding to the capacity of the cache memory or divides a range larger than the range corresponding to the capacity of the cache memory, and then determines the sector number.

A compiler apparatus comprising:
a determination unit that analyzes a source program to be executed by an information processing apparatus equipped with a cache memory having a sector function, thereby determining, for a data set that is a set of data arrays processed in each loop of the source program, whether the data set is reusable during execution of the loop, and that determines, from the capacity required to store the data set whose reusability has been determined and the capacity of the cache memory, a sector division ratio for the cache memory and a sector number identifying the sector in which the data set is to be stored;
an insertion unit that inserts, into the source program, an instruction control statement based on the sector division ratio and the sector number, in the loop for which the sector division ratio and the sector number were determined by the determination unit; and
a file generation unit that generates an object file from the source program into which the instruction control statement was inserted by the insertion unit.

A compiler apparatus comprising:
an acceptance unit that accepts a source program in which a sector division ratio to be used when storing data in a cache memory having a sector function is specified within a loop by the creator of the source program;
an extraction unit that extracts, from the source program accepted by the acceptance unit, the array data used in the loop for which the sector division ratio is specified by the creator in the source program;
a determination unit that determines, based on whether the array data extracted by the extraction unit is reusable and on the sector division ratio, a sector number to be used when storing the array data in the cache memory;
an insertion unit that inserts, into the source program, an instruction control statement based on the determined sector number and the sector division ratio specified by the creator, in the loop for which the sector number was determined by the determination unit; and
a file generation unit that generates an object file from the source program into which the instruction control statement was inserted by the insertion unit.