JP2000066899A

JP2000066899A - Execution object optimizing device

Info

Publication number: JP2000066899A
Application number: JP10232434A
Authority: JP
Inventors: Kuniyasu Tajima; 邦康田島
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-08-19
Filing date: 1998-08-19
Publication date: 2000-03-03

Abstract

PROBLEM TO BE SOLVED: To provide an execution object optimizing device capable of reducing cache misses caused by cache line conflicts and improving processing performance. SOLUTION: Object files 4 and 5 are processed by a linker/locator 7 according to an instruction file 6 to generate an executable object file 8 and a map file 9 containing the placement addresses of functions. The executable object file 8 is then downloaded to a target system 10, and function call count measuring means 31 measures the number of function calls the target system 10 makes in the executable object file 8. Cache line conflicts at function call time are then examined from the call counts and the placement addresses of the functions in the map file 9, and grouping processing means 33 divides the functions into groups and updates the instruction file 6.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an execution object optimizing device that, for a microprocessor having a direct-mapped or set-associative cache, has a function of rearranging an execution object so as to reduce cache misses (cache thrashing) caused by conflicts on cache lines (cache index addresses).

[0002]

[Prior Art] FIG. 4 is a functional block diagram showing a conventional execution object generating device and its data relationships. In the conventional device shown in FIG. 4, source files are created with a text editor or the like. FIG. 4 assumes two source files, source file 1 and source file 2, containing functions 11 to 13 and functions 21 and 22, respectively. When source files 1 and 2 are processed (compiled) by the compiler 3, object files 4 and 5, which are relocatable objects containing machine code, are generated.

[0003] The instruction file 6, which is also created with a text editor or the like, contains link/relocate information such as the order in which object files 4 and 5 are combined and their addresses. Processing object files 4 and 5 with the linker/locator 7 in accordance with the instruction file 6 generates an executable object file 8 and a map file 9 containing information such as the placement address of each function. The executable object file 8 is downloaded to the target system 10, where the function calls are executed.

[0004] FIG. 5 shows the memory layout of the program stored in the executable object file 8. In this example, functions 11, 12, 13, 21, and 22 are placed in that order. The arrows between functions show an example of the call relationships. Function 11 calls functions 12 and 13, and functions 12 and 13 each call function 22. Function 13 also calls function 21. Assuming there is no branching, control transitions as function 11 → function 12 → function 22 → function 12 → function 11 → function 13 → function 21 → function 13 → function 22 → function 13 → function 11.
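As an illustration only, the transition sequence above can be reproduced from the stated call relationships. The following is a minimal Python sketch, not part of the disclosed device; the dictionary `CALLS` and the helper `control_trace` are hypothetical names, and the order of callees inside function 13 (21 before 22) is taken from the sequence given in the text.

```python
# Call relationships of FIG. 5 as described in the text. The callee order for
# function 13 (21 before 22) is inferred from the stated transition sequence.
CALLS = {11: [12, 13], 12: [22], 13: [21, 22], 21: [], 22: []}


def control_trace(func, calls=CALLS):
    """Return the functions that hold control in order, including returns."""
    trace = [func]
    for callee in calls[func]:
        trace.extend(control_trace(callee, calls))
        trace.append(func)  # control returns to the caller after the call
    return trace


print(" -> ".join(str(f) for f in control_trace(11)))
```

Each call contributes one entry for the callee and one for the return to the caller, which is why function 13 appears three times in the sequence.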

[0005] FIG. 6 shows the structure of a direct-mapped cache memory. The entry of the tag section 16 selected by the selection section 14, which is indexed by the lower address bits (here, bits 13 to 0), and the upper address bits (here, bits 31 to 14) are input to the comparator 15, which determines whether they match. If they match, the data stored in the cache memory is valid (a hit) and the data in the data section 17 can be used. If they do not match, the stored data is not valid, so the correct data is copied from main memory and the tag section 16 is updated.
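The tag comparison described for FIG. 6 can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it follows the text in treating bits 13 to 0 as the index and bits 31 to 14 as the tag, and it does not model a separate line offset or the data section. The names `split_address` and `DirectMappedCache` are hypothetical.

```python
INDEX_BITS = 14  # bits 13..0 select the entry, as in the text (16 KB range)
ADDRESS_BITS = 32


def split_address(addr):
    """Split a 32-bit address into (tag, index) as described for FIG. 6."""
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = (addr >> INDEX_BITS) & ((1 << (ADDRESS_BITS - INDEX_BITS)) - 1)
    return tag, index


class DirectMappedCache:
    """Tag store only; stands in for tag section 16 and comparator 15."""

    def __init__(self):
        self.tags = {}  # index -> tag currently stored for that entry

    def access(self, addr):
        """Return True on a hit; on a miss, refill the entry and return False."""
        tag, index = split_address(addr)
        hit = self.tags.get(index) == tag
        if not hit:
            self.tags[index] = tag  # copy from main memory and update the tag
        return hit


cache = DirectMappedCache()
print(cache.access(0x4000), cache.access(0x4000), cache.access(0x8000))
```

Addresses 0x4000 and 0x8000 share index 0 but have different tags, so the third access misses and replaces the entry.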

[0006] While a miss is being handled, the cache memory cannot be accessed and processing stalls. Arranging for accesses to hit the cache as often as possible is therefore important for improving microprocessor performance. In the map file 9 of FIG. 5, the unit of the cache memory (16 KB in this example) is shown on the right side.

[0007]

[Problems to Be Solved by the Invention] In such a cache memory, a behavior called cache thrashing can occur. A simple example is described with reference to FIG. 7. This is the case where the cache index address of the instruction in function a that calls function b (the call instruction) is the same as the cache index address of function b. At the time the call instruction is executed, the call instruction itself hits in the cache.

[0008] When the call instruction is executed, control transfers to function b, a cache miss is detected, the cache is refilled with the instructions of function b, and function b is then executed. When function b finishes, control returns to function a, but the cache entry for the return destination is occupied by function b, so function a misses in the cache. The entry is then refilled with function a. The state in which cache misses occur because control moves back and forth between addresses with the same cache index is called thrashing. In particular, when thrashing occurs inside a loop, cache misses occur frequently and performance degrades severely.
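To make the cost of thrashing inside a loop concrete, the following Python sketch counts misses for a call site and a callee that map to the same cache entry, and again after the callee is moved by one line. It is an illustrative model only: the 16-byte line size, the 4-byte call instruction, and the example addresses are assumptions that do not appear in the source, and `count_misses` and `loop_trace` are hypothetical helpers.

```python
CACHE_SIZE = 16 * 1024  # 16 KB direct-mapped cache, as in the text
LINE_SIZE = 16          # assumed line size; the source does not state one


def count_misses(addresses):
    """Count misses of a direct-mapped cache over a sequence of fetch addresses."""
    lines = {}  # line index -> tag currently held
    misses = 0
    for addr in addresses:
        index = (addr % CACHE_SIZE) // LINE_SIZE
        tag = addr // CACHE_SIZE
        if lines.get(index) != tag:
            misses += 1
            lines[index] = tag
    return misses


def loop_trace(call_addr, callee_addr, iterations):
    """Fetch the call in function a, the entry of function b, then the return point."""
    trace = []
    for _ in range(iterations):
        trace += [call_addr, callee_addr, call_addr + 4]  # assumed 4-byte call
    return trace


print(count_misses(loop_trace(0x1000, 0x5000, 100)))  # same index: thrashing
print(count_misses(loop_trace(0x1000, 0x5010, 100)))  # callee moved by one line
```

Under these assumptions the conflicting placement misses 201 times in 100 iterations, while the shifted placement misses only twice (the two cold misses).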

[0009] The cause of this problem is that, when creating an executable object from relocatable objects, the linker considers neither the call relationships between functions nor the addresses to which they map in the cache. One way to suppress thrashing is to give the cache memory a set-associative configuration with multiple banks, but the control circuitry becomes correspondingly more complex, which can limit the operating speed of the circuit. Moreover, even with a set-associative cache, thrashing similar to that described above occurs when the function call relationships become complex.

[0010] To prevent cache misses, Japanese Patent Laid-Open No. 05-265770 discloses that, when array data that can be blocked is partitioned, a partition size that evenly divides the loop iteration count and fits in the cache memory is determined and the object program is generated accordingly. Japanese Patent Laid-Open No. 07-28702 discloses improving the cache hit efficiency for a plurality of data items accessed in portions of a program that are executed repeatedly. Further, Japanese Patent Laid-Open No. 03-184126 discloses creating an object program in which the code sequences compiled from frequently executed statements are gathered into one region, narrowing the range of main memory accessed when the object program runs and thereby raising the probability that the code to be accessed is present in the instruction cache.

[0011] However, none of the technical ideas disclosed in these publications suggests rearranging functions in order to reduce the cache replacement time resulting from cache misses caused by cache line collisions in the instruction cache at function call time.

[0012] The present invention has been made to solve the above conventional problems, and its object is to provide an execution object optimizing device that can reduce cache misses caused by cache line conflicts and improve processing performance.

[0013]

[Means for Solving the Problems] To achieve the above object, the execution object optimizing device of the present invention comprises: an instruction file containing the order in which a plurality of object files are combined or linker/relocate information; an executable object file that is generated by processing predetermined ones of the object files with a linker/locator in accordance with the instruction file and that is downloaded to a target system, where its functions are called or data at addresses corresponding to data cache lines is accessed; a map file that is generated by processing the object files with the linker/locator in accordance with the instruction file and that contains function placement address information or data of addresses corresponding to data cache lines; count measuring means for measuring, from the target system, the number of function calls or the number of accesses to the address data; and grouping processing means for grouping and updating the linker/relocate information of the instruction file based on the number of function calls to the executable object file or the number of accesses to the address data measured by the count measuring means, and on the function placement address information or the address data in the map file.

[0014] According to the present invention, the object files are processed by the linker/locator to generate an executable object together with a map file containing function placement address information or linker/locator information. The executable object file is downloaded to the target system, and the count measuring means measures the number of function calls the target system makes to the executable object file, or the number of data accesses to addresses corresponding to data cache lines. Based on the measured counts and on the function placement address information in the map file, or the data of addresses corresponding to data cache lines, the grouping processing means groups the functions or address data of the instruction file and updates the file.

[0015] The present invention can therefore reduce the cache replacement time resulting from cache misses caused by cache line collisions when a function is called or when an address corresponding to a data cache line is accessed. Cache misses are reduced and processing performance is improved.

[0016]

[Embodiments of the Invention] Embodiments of the execution object optimizing device according to the present invention are described below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration and data relationships of the first embodiment of the execution object optimizing device according to the present invention. In FIG. 1, parts identical to those of the conventional example shown in FIG. 4 are given the same reference numerals and their description is not repeated. The description focuses on the features of the first embodiment that differ from FIG. 4.

[0017] As is clear from comparing FIG. 1 with FIG. 4, the first embodiment shown in FIG. 1 adds the following to the conventional configuration of FIG. 4: function call count measuring means 31, serving as the count measuring means that measures the number of function calls from the target system; a function call database 32 that collects the function call counts measured by the function call count measuring means 31; and grouping processing means 33. The grouping processing means 33 updates the contents of the instruction file 6 based on the function call counts collected in the function call database 32 and on the map file 9.

[0018] Next, the operation of the first embodiment is described. To measure function call counts, the executable object file 8 is downloaded to the target system 10, and the function call count measuring means 31 monitors the actual operation in which the target system 10 makes function calls in the executable object file 8. It counts the number of calls between each pair of functions and creates the function call database 32. The function call database 32 stores the name of the calling function, the name of the called function, and the call count.
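The three fields stored in the function call database 32 suggest a simple record layout. The Python sketch below shows one possible way to build such records from a stream of observed (caller, callee) events. How the measuring means actually observes calls is not described in the source, so the event list, the function names, and `build_call_database` are all hypothetical.

```python
from collections import Counter


def build_call_database(call_events):
    """Aggregate (caller, callee) events into records sorted by call count."""
    counts = Counter(call_events)
    return [
        {"caller": caller, "callee": callee, "count": n}
        for (caller, callee), n in counts.most_common()
    ]


# Hypothetical observations, loosely following the call graph of FIG. 5.
events = [
    ("f11", "f12"), ("f12", "f22"), ("f11", "f13"),
    ("f13", "f22"), ("f13", "f21"), ("f13", "f22"),
]
for record in build_call_database(events):
    print(record)
```

Sorting by count at this stage is a convenience, since the later grouping step processes the most frequently called pairs first.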

[0019] Next, the grouping performed by the grouping processing means 33 is described with reference to FIG. 2. FIG. 2 shows the procedure for creating groups. Grouping uses the function call database 32 and the map file 9 to examine cache line conflicts at function call time and groups the functions accordingly. For each function, the map file 9 stores the function name, the name of the source file containing it, the address at which it is placed, and its size. Functions are gathered into groups in order, starting from the most strongly coupled pairs (those with the highest function call counts).

[0020] The group size is the cache size for a direct-mapped cache and the size of one set for a set-associative cache. After the functions have been divided into groups, small groups are merged so that the free space within each group is as small as possible.

[0021] FIG. 3 shows the procedure for rearranging functions within groups so that thrashing does not occur (or is less likely to occur) between the groups created in FIG. 2. To avoid collisions between groups, functions are swapped within each group. Entries of the function call database 32 whose functions were placed together by the grouping step can no longer collide. The remaining entries are processed in descending order of call count. For pairs whose collision has been avoided, and for pairs that did not collide in the first place, each function is marked so that later processing does not reintroduce a collision.

[0022] Functions are swapped by shifting the colliding function n positions toward higher or lower addresses. FIG. 3 shows a case where the positions of functions a1 and a2 in group a have been exchanged. When a collision is avoided by shifting a function toward higher addresses, the function is marked so that later processing does not shift it toward lower addresses. When a collision is avoided by shifting toward lower addresses, it is marked so that later processing does not shift it toward higher addresses.
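The shift-and-mark rule can be sketched as follows. This Python example is a simplified illustration rather than the procedure of FIG. 3: it assumes each group starts at a cache-size-aligned address (so an offset within a group equals its cache index), it checks only one pair of functions at a time, and it does not re-check other functions displaced by the shift. The names `offsets`, `collide`, and `shift_to_avoid`, the direction labels, and the sizes are hypothetical.

```python
def offsets(order, sizes):
    """Map each function in a group to its (start, end) offset within the group."""
    result, pos = {}, 0
    for f in order:
        result[f] = (pos, pos + sizes[f])
        pos += sizes[f]
    return result


def collide(f, g, order_f, order_g, sizes):
    """True if f and g occupy overlapping offsets in their respective groups."""
    s1, e1 = offsets(order_f, sizes)[f]
    s2, e2 = offsets(order_g, sizes)[g]
    return s1 < e2 and s2 < e1


def shift_to_avoid(f, g, order_f, order_g, sizes, marks):
    """Move g by n positions within its group until it no longer collides with f.

    marks maps a function to the set of directions it may no longer move in.
    "up" means toward higher addresses, "down" toward lower addresses.
    """
    i = order_g.index(g)
    for n in range(1, len(order_g)):
        for direction, j in (("up", i + n), ("down", i - n)):
            if direction in marks.get(g, set()) or not 0 <= j < len(order_g):
                continue
            trial = order_g[:]
            trial.insert(j, trial.pop(i))
            if not collide(f, g, order_f, trial, sizes):
                opposite = "down" if direction == "up" else "up"
                marks.setdefault(g, set()).add(opposite)  # forbid undoing the shift
                return trial
    return order_g  # no admissible shift found; leave the group unchanged


sizes = {"a1": 4000, "a2": 4000, "b1": 4000, "b2": 4000}
marks = {}
group_a, group_b = ["a1", "a2"], ["b1", "b2"]
print(collide("a1", "b1", group_a, group_b, sizes))
group_b = shift_to_avoid("a1", "b1", group_a, group_b, sizes, marks)
print(group_b, marks)
```

Here b1 initially shares offsets 0 to 4000 with a1. Moving b1 one position toward higher addresses removes the overlap, and b1 is then marked so that it is not shifted back down.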

[0023] The grouping processing means 33 writes the placement order and alignment of the grouped functions into the instruction file 6. When the instruction file 6 is updated, the linker reads the instruction file 6 again and creates an execution object with the changed function placement order and alignment.

[0024] The problem of finding a function placement that avoids instruction cache conflicts is NP-complete, and finding an exact solution requires a very large amount of memory and time. The first embodiment therefore uses priorities based on function call counts to process first the items that contribute most to reducing the execution time of the whole program, balancing the time required for rearrangement against the improvement in execution speed. Because the first embodiment rearranges functions so as to reduce the cache replacement time resulting from cache misses caused by cache line collisions in the instruction cache at function call time, it reduces cache misses and improves processing performance.

[0025] Next, the second embodiment of the present invention is described. The configuration of the second embodiment is the same as that of the first embodiment, but it can target the data cache instead of the instruction cache. In this case, the count of function calls is replaced by a count of data accesses to the address corresponding to each data cache line. In addition, although in the second embodiment the executable object file is downloaded to the target system 10 to generate the function call database, the database can also be generated using a simulator. Furthermore, although a direct-mapped cache has been assumed here to simplify the description, the invention can also be applied to a set-associative cache by modifying the grouping processing means 33.

[0026]

[Effects of the Invention] As described above, according to the present invention, the count measuring means measures, from the target system, the number of function calls or the number of accesses to address data corresponding to data cache lines, and the grouping processing means groups the functions or address data of the instruction file and updates the file based on the measured counts and the placement address information in the map file. This reduces the cache replacement time resulting from cache misses caused by cache line collisions when a function is called or when an address corresponding to a data cache line is accessed, so cache misses are reduced and processing performance is improved.

[Brief description of the drawings]

FIG. 1 is a functional block diagram showing the configuration and data relationships of the first embodiment of the execution object optimizing device according to the present invention.

FIG. 2 is an explanatory diagram of how the grouping processing means in the execution object optimizing device of FIG. 1 creates groups for the link/relocate information of the instruction file.

FIG. 3 is an explanatory diagram of the procedure for rearrangement within groups of the link/relocate information of the instruction file in the execution object optimizing device of FIG. 1.

FIG. 4 is a functional block diagram showing the configuration and data relationships of a conventional execution object generating device.

FIG. 5 is an explanatory diagram showing the function placement and function call relationships within an execution object in the conventional execution object generating device.

FIG. 6 is an explanatory diagram showing the structure of a direct-mapped cache memory to which the conventional execution object generating device is applied.

FIG. 7 is an explanatory diagram of the relationship between function calls in an execution object and the mechanism by which thrashing occurs in the conventional execution object generating device.

[Explanation of symbols]

1, 2: source files; 3: compiler; 4, 5: object files; 6: instruction file; 7: linker/locator; 8: executable object file; 9: map file; 10: target system; 11 to 13, 21, 22: functions; 31: function call count measuring means; 32: function call database; 33: grouping processing means.

Claims

[Claims]

1. An execution object optimizing device comprising: an instruction file containing the order in which a plurality of object files are combined or linker/relocate information; an executable object file that is generated by processing predetermined ones of the object files with a linker/locator in accordance with the instruction file and that is downloaded to a target system, where its functions are called or data at addresses corresponding to data cache lines is accessed; a map file that is generated by processing the object files with the linker/locator in accordance with the instruction file and that contains function placement address information or data of addresses corresponding to data cache lines; count measuring means for measuring, from the target system, the number of function calls or the number of accesses to the address data; and grouping processing means for grouping and updating the linker/relocate information of the instruction file based on the number of function calls to the executable object file or the number of accesses to the address data measured by the count measuring means, and on the function placement address information or the address data in the map file.

2. The execution object optimizing device according to claim 1, wherein a function call database is created by counting the number of calls between functions through monitoring of the actual operation in which the target system makes function calls in the executable object file.

3. The execution object optimizing device according to claim 2, wherein the function call database stores the name of the calling function in the executable object file, the name of the function called on the target system, and the function call count from the count measuring means.

4. The execution object optimizing device according to claim 1, wherein the grouping processing means groups the functions by examining cache line conflicts at function call time based on the function call database and the map file.

5. The execution object optimizing device according to claim 1, wherein the grouping processing means gathers the functions of the executable object file into groups in descending order of call count.

6. The execution object optimizing device according to claim 1, wherein, when grouping the functions, the grouping processing means sets the group size to the cache size in the case of a direct-mapped cache.

7. The execution object optimizing device according to claim 1, wherein, when grouping the functions, the grouping processing means sets the group size to the size of one set in the case of a set-associative cache.

8. The execution object optimizing device according to claim 1, wherein, after grouping the functions, the grouping processing means merges small groups so that the free space within the groups is as small as possible.

9. The execution object optimizing device according to claim 1, wherein, after grouping the functions, the grouping processing means swaps functions within each group to avoid collisions between groups.

10. The execution object optimizing device according to claim 1, wherein the grouping processing means performs collision avoidance processing, in descending order of call count, on the data stored in the function call database for functions that were not grouped together.

11. The execution object optimizing device according to claim 1, wherein the grouping processing means marks the function names of those functions stored in the function call database whose collision has been avoided or that have no possibility of collision, so that the collision does not recur.

12. The execution object optimizing device according to claim 1, wherein the grouping processing means avoids a collision by shifting a colliding function among the grouped functions by a predetermined number of positions toward higher or lower addresses.

13. The execution object optimizing device according to claim 12, wherein, when the grouping processing means avoids a collision by shifting a colliding function among the grouped functions by a predetermined number of positions toward higher addresses, it marks the function so that subsequent processing does not shift it toward lower addresses.

14. The execution object optimizing device according to claim 12, wherein, when the grouping processing means avoids a collision by shifting a colliding function among the grouped functions by a predetermined number of positions toward lower addresses, it marks the function so that subsequent processing does not shift it toward higher addresses.

15. The execution object optimizing device according to claim 12, wherein the grouping processing means performs the grouping with priority assigned according to the number of function calls made in the executable object file on the target system.

16. The execution object optimizing device according to claim 1, wherein the map file stores the function name, the name of the source file containing the function, the address at which the function is placed, and the function size.

17. The execution object optimizing device according to claim 1, wherein the placement order and alignment of the functions grouped by the grouping processing means are written in the instruction file.

18. The execution object optimizing device according to claim 1, wherein, when the instruction file is updated, the linker/locator reads the instruction file again and creates an execution object with the updated function placement order and alignment.

19. The execution object optimizing device according to claim 1, wherein the function call database is generated using a simulator.