JP3309810B2

JP3309810B2 - Program link system, method and recording medium

Info

Publication number: JP3309810B2
Application number: JP22904498A
Authority: JP
Inventors: 孝宮崎
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-08-13
Filing date: 1998-08-13
Publication date: 2002-07-29
Anticipated expiration: 2018-08-13
Also published as: JP2000056983A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technique for linking the object modules of the functions included in a program so as to reduce instruction cache misses during program execution on a computer having a cache memory.

[0002]

2. Description of the Related Art

When creating a computer program, a compiler is normally used to generate object modules from source modules written in a high-level language, and a linker is then used to link the object modules into a load module in a format that the computer can execute. The order in which the object modules are linked is determined by arranging the functions contained in the object modules according to some criterion.

[0003] When creating such an executable load module, the faster the resulting load module runs, the better. A program link system that takes the computer's cache memory into account and raises the instruction cache hit rate to speed up execution of the load module is disclosed in Japanese Patent Application Laid-Open No. 10-124327.

[0004] FIG. 8 shows the configuration of this program link system. It comprises an instruction cache miss profile creating means 801, a placement searching means 802, and a relinking means 803.

[0005] In this program link system, the instruction cache miss profile creating means 801 simulates execution of a load module created with some compiler and linker, traces the execution of instructions, and records the number of instruction cache misses, the cache blocks in which the misses occurred, and the functions in which they occurred.

[0006] Next, in a cache block with a high cache miss frequency, the functions placed in the same cache block evict one another from that block at run time. The placement searching means 802 therefore searches for a placement that moves the conflicting functions to other cache blocks.

[0007] The relinking means 803 then relinks the object program according to the placement found by the placement searching means 802 and creates the load module again. The same processing is repeated on the relinked load module, and the most favorable load module is obtained by searching for the function placement that minimizes the cache miss frequency.

[0008]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, the conventional example described above has the following problems. First, in the conventional program link system, simulating execution of the program, examining where cache misses occur, and rearranging the functions must be repeated until the function placement that minimizes the cache miss frequency is found. As a result, it takes a long time to settle on the optimal function placement.

[0009] Second, in memory modules such as DRAM (Dynamic Random Access Memory), access to non-contiguous memory addresses is generally slower than access to contiguous memory addresses. In the conventional program link system, functions often end up at non-contiguous memory locations regardless of their calling order, so when a cache miss occurs, transferring instructions from the main memory to the cache memory takes a long time.

[0010] Third, on a computer that supports multitasking, in which multiple programs run concurrently, the usage of the cache memory changes constantly, so a static analysis of cache misses by simulation yields results that differ from the actual situation. Consequently, on computers that support multitasking, which is now the norm for general-purpose computers, a load module created as described above does not necessarily run faster.

[0011] In connection with the present application, techniques for reducing the frequency of instruction cache misses and speeding up program execution are disclosed in Japanese Patent Application Laid-Open Nos. 8-328870 and 9-128245. However, both techniques concern the compilation process that generates object modules from source modules. Neither concerns the linking process that generates a load module from object modules, as the present invention does.

[0012] SUMMARY OF THE INVENTION The present invention has been made to solve the problems of the conventional example described above. Its object is to provide a program link system, a method, and a recording medium on which the method is recorded, which reduce the frequency of instruction cache misses by placing the object modules of frequently executed functions contiguously, and which determine such a function placement quickly.

[0013]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, a program link system according to a first aspect of the present invention is a program link system that links a plurality of object modules to create a load module to be executed on a computer having a cache memory in which part of a program on a main memory is stored, the system comprising: program structure extracting means for analyzing the structure of the program corresponding to the load module created by the linking; ordering means for assigning, based on the analysis result of the program structure extracting means, an order in which the plurality of object modules are to be linked; and linking means for linking the plurality of object modules in the order assigned by the ordering means to create the load module, wherein the plurality of object modules correspond to the respective functions included in the program for which the load module is created, the program structure extracting means analyzes the source program of that program and creates a structure graph representing the relationship between function calls and loops containing function calls, and the ordering means refers to the structure graph and orders the plurality of object modules by first arranging the functions in calling order and then rearranging them so that the functions called in a frequently executed loop are contiguous in their calling order within that loop.

[0014] To achieve the above object, a program link system according to a second aspect of the present invention is a program link system that assigns an order to a plurality of object modules to be linked by a linker when creating a load module to be executed on a computer having a cache memory in which part of a program on a main memory is stored, the system comprising: program structure extracting means for analyzing the structure of the program corresponding to the load module created by the linking; and ordering means for assigning, based on the analysis result of the program structure extracting means, an order in which the plurality of object modules are to be linked, wherein the plurality of object modules correspond to the respective functions included in the program for which the load module is created, the program structure extracting means analyzes the source program of that program and creates a structure graph representing the relationship between function calls and loops containing function calls, and the ordering means refers to the structure graph and orders the plurality of object modules by first arranging the functions in calling order and then rearranging them so that the functions called in a frequently executed loop are contiguous in their calling order within that loop.

[0015] In the program link systems according to the first and second aspects, the ordering means orders the object modules for linking so that frequently executed modules do not compete for the same block in the cache memory, which suppresses the frequency of cache misses. As a result, the effective execution speed of the final load module on the computer is increased.

[0016] Moreover, to determine the optimal link order of the object modules, it suffices for the program structure extracting means to analyze the program structure and for the ordering means to order the object modules according to that structure. In other words, the above program link system does not need to repeat a simulation each time a load module is actually created, so the time needed to determine the link order of the object modules is short.

[0017]

[0018] In the program link systems according to the first and second aspects, the program structure extracting means may have means for adding to the program structure graph the number of times each function is executed within a loop of the program for which the load module is created. For a function inside nested loops, the product of the execution frequencies of the respective loops may be taken as the function's execution count.

[0019] In the above program link system, the cache memory preferably maps and stores the program on the main memory using direct mapping or set-associative mapping.

[0020] That is, on a computer that uses direct mapping or set-associative mapping for its cache memory, the cache block to be allocated is determined for each block of the main memory. The frequency of cache misses can therefore be reduced by ensuring that frequently executed portions of the load module do not compete for the same block of the instruction cache memory. The effect of the present invention is particularly pronounced when creating a load module for a computer that uses direct mapping, in which the cache block allocated to each main-memory block is uniquely determined.

[0021] To achieve the above object, a program link method according to a third aspect of the present invention is a program link method that links a plurality of object modules to create a load module to be executed on a computer having a cache memory in which part of a program on a main memory is stored, the method comprising: a program structure extracting step of analyzing the structure of the program corresponding to the load module created by the linking; an ordering step of assigning, based on the analysis result of the program structure extracting step, an order in which the plurality of object modules are to be linked; and a linking step of linking the plurality of object modules in the order assigned in the ordering step to create the load module, wherein the plurality of object modules correspond to the respective functions included in the program for which the load module is created, the program structure extracting step analyzes the source program of that program and creates a structure graph representing the relationship between function calls and loops containing function calls, and the ordering step refers to the structure graph and orders the plurality of object modules by first arranging the functions in calling order and then rearranging them so that the functions called in a frequently executed loop are contiguous in their calling order within that loop.

[0022] To achieve the above object, a recording medium according to a fourth aspect of the present invention is a recording medium that records a program for linking a plurality of object modules to create a load module to be executed on a computer having a cache memory in which part of a program on a main memory is stored, the recorded program causing a computer to execute: a program structure extracting step of analyzing the structure of the program corresponding to the load module created by the linking; an ordering step of assigning, based on the analysis result of the program structure extracting step, an order in which the plurality of object modules are to be linked; and a linking step of linking the plurality of object modules in the order assigned in the ordering step to create the load module, wherein the program structure extracting step analyzes the source program of the program for which the load module is created and creates a structure graph representing the relationship between function calls and loops containing function calls, and the ordering step refers to the structure graph and orders the plurality of object modules by first arranging the functions in calling order and then rearranging them so that the functions called in a frequently executed loop are contiguous in their calling order within that loop.

[0023]

EMBODIMENTS OF THE INVENTION

Embodiments of the present invention will be described below with reference to the accompanying drawings.

[0024] First, the computer having a cache memory that is to execute the load module, which in this embodiment is produced by compiling the source program and linking the compiled object modules, will be described with reference to the block diagram of FIG. 1.

[0025] As shown, this computer comprises a CPU (Central Processing Unit) 701 that performs operations, an instruction cache memory 702 that caches instructions, a data cache memory 703 that caches data, and a main memory 704.

[0026] When executing a program, the CPU 701 reads instructions one at a time from the instruction cache memory 702. If the instruction to be executed is not stored in the instruction cache memory 702 (an instruction cache miss), the block containing the instructions needed for execution is loaded from the main memory 704 into the instruction cache memory 702.

[0027] There are generally three cache mapping schemes: direct mapping, set-associative mapping, and fully associative mapping. In this computer, the instruction cache memory 702 uses either direct mapping or set-associative mapping. To briefly describe direct mapping, for which this embodiment is most effective: the position in the instruction cache memory 702 to which a block of the main memory 704 is allocated is the remainder obtained by dividing the block number of the main memory 704 address by the number of blocks in the instruction cache memory 702.
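The direct-mapping rule described above can be illustrated with a short sketch. The block size (16 bytes), the number of cache blocks (256), and the function names are illustrative assumptions; the patent does not specify them. The example shows that two addresses in different main-memory blocks can map to the same cache block, in which case code at those addresses evicts each other.

```python
def cache_block_index(address, block_size, num_cache_blocks):
    """Direct mapping: main-memory block number modulo the number of cache blocks."""
    memory_block = address // block_size
    return memory_block % num_cache_blocks


def conflicts(addr_a, addr_b, block_size, num_cache_blocks):
    """True when two addresses lie in different main-memory blocks
    that are mapped to the same cache block."""
    block_a = addr_a // block_size
    block_b = addr_b // block_size
    same_slot = block_a % num_cache_blocks == block_b % num_cache_blocks
    return block_a != block_b and same_slot


# Assumed geometry: 16-byte blocks, 256 cache blocks (a 4 KiB cache).
BLOCK_SIZE = 16
NUM_BLOCKS = 256

print(cache_block_index(0x0040, BLOCK_SIZE, NUM_BLOCKS))  # memory block 4
print(cache_block_index(0x1040, BLOCK_SIZE, NUM_BLOCKS))  # memory block 260
print(conflicts(0x0040, 0x1040, BLOCK_SIZE, NUM_BLOCKS))
print(conflicts(0x0040, 0x0050, BLOCK_SIZE, NUM_BLOCKS))
```

Under these assumed parameters, addresses 0x0040 and 0x1040 are exactly one cache size apart and share a cache block, while the adjacent blocks at 0x0040 and 0x0050 do not. This is why placing frequently executed functions contiguously tends to keep them out of each other's cache blocks.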

[0028] FIG. 2 is a block diagram showing the functional configuration of the program link system according to this embodiment, which creates, from a source program, a load module executable on the computer of FIG. 1. As shown, the program link system comprises program structure extracting means 101, function ordering means 102, linking means 103, and compiling means 104.

[0029] The program structure extracting means 101 operates under program control. It reads the source program of the program for which the load module is to be created (including the source modules corresponding to the individual functions), performs a flow analysis of the functions in the source program, and creates a program structure graph representing the relationship between function calls and the loops that contain function calls.

[0030] The function ordering means 102 operates under program control. Referring to the program structure graph created by the program structure extracting means 101, it first arranges the functions in calling order, then rearranges them so that the functions called in frequently executed loops are contiguous in calling order, and thereby assigns an order to every function in the source modules.

[0031] The linking means 103 operates under program control. It links the per-function object modules created by the compiling means 104 so that they are placed in the main memory 704 in the order assigned by the function ordering means 102, and creates the load module.

[0032] The compiling means 104 operates under program control and creates, from the source program, an object module for each source module contained in it.

[0033] Processing in the program link system of this embodiment is described below. First, the program structure extracting means 101 and the function ordering means 102 assign an order to the functions so that the object modules of frequently executed functions are placed contiguously.

[0034] FIG. 3 is a flowchart of the processing performed by the program structure extracting means 101 and the function ordering means 102 to order the functions.

[0035] When processing starts, the program structure extracting means 101 reads the source program for which the load module is to be created and performs a flow analysis of it (step S1). Next, the program structure extracting means 101 extracts the call relationships between functions and the loop structures that contain function calls, and creates the program structure graph (step S2).

[0036] At this time, the program structure extracting means 101 also examines the execution frequency of each loop and attaches the execution count of each function in the loop as information. When loops are nested, the execution frequency of an individual loop is the product of the iteration counts of the nested loops, and it represents the number of times the functions in that loop are executed when the program runs. For condition-controlled loops such as the while and do-while statements of C, the execution frequency cannot be known unless the program is actually run; in that case, a suitable count is assumed as the loop's execution frequency.
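As a minimal sketch of this rule, the execution count of a function can be computed as the product of the iteration counts of all loops that enclose its call. The function name and the fallback value `DEFAULT_TRIP_COUNT` are hypothetical; the text only says that "a suitable count" is assumed for loops whose iteration count cannot be determined statically, without giving a number.

```python
from math import prod

# Hypothetical fallback for while / do-while loops whose iteration count
# is unknown before run time. The patent does not specify a value.
DEFAULT_TRIP_COUNT = 10


def executions_in_loop(enclosing_trip_counts, default=DEFAULT_TRIP_COUNT):
    """Multiply the iteration counts of all enclosing loops, outermost first.

    A count of None marks a loop whose iteration count is unknown
    statically; the assumed default is substituted for it.
    """
    counts = [default if n is None else n for n in enclosing_trip_counts]
    return prod(counts)


print(executions_in_loop([50]))        # called inside one 50-iteration loop
print(executions_in_loop([50, 100]))   # 100-iteration loop nested in a 50-iteration loop
print(executions_in_loop([50, None]))  # inner loop count unknown, default assumed
```

A call outside any loop has an empty list of enclosing loops, for which the product is 1.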

[0037] Next, the function ordering means 102 refers to the program structure graph and arranges the functions of the program, starting from the main function, in the order in which each is first called (step S3). The function ordering means 102 then marks every loop in the program structure graph as not yet ordered (step S4).

[0038] Next, the function ordering means 102 checks whether any loop is still unordered (step S5). If so, it first checks whether each function in the loop is also called elsewhere in the program. If a function is also used elsewhere, the execution frequencies are compared, and if the frequency in the loop currently being examined is higher, the function is moved so that it is placed at its call position within the current loop (step S6).

[0039] When step S6 is finished, the function ordering means 102 returns to step S5 and checks whether any unordered loop remains. When step S5 finds no unordered loop, the processing of the flowchart ends and the ordering of the functions is final.

[0040] Meanwhile, while the program structure extracting means 101 and the function ordering means 102 perform the above processing, the compiling means 104 compiles the source module of each function in the source program and creates an object module for each function.

[0041] When both the processing by the program structure extracting means 101 and the function ordering means 102 and the processing by the compiling means 104 have finished, the linking means 103 links the per-function object modules compiled by the compiling means 104 in the function order determined by the processing of FIG. 3, and creates the load module.

[0042] The processing of this embodiment is described in detail below using a concrete example. FIG. 4 shows an example program used to illustrate the processing. The program is written in C; everything other than function calls and loops is omitted.

[0043] In the program shown in FIG. 4, the main program main() calls four functions: fa(), fb(), fc(), and fd(). The function fc() has a loop written as for(i=0; i<50; i++) that iterates 50 times, and calls three functions, fa(), fe(), and ff(), inside it. fc() also calls the function fg() outside the loop.

[0044] The function fe() has a loop written as for(j=0; j<100; j++) that iterates 100 times, and calls three functions, fa(), fh(), and fi(), inside it. This loop is nested within the loop in fc(). The functions fa(), fh(), and fi() are therefore called 50 × 100 = 5,000 times in this loop. fe() also calls the function fd() outside the loop.

[0045] The left side of FIG. 4 shows the program structure graph obtained by flow analysis of the program of FIG. 3, representing the function calls and loop structure. The result of the flow analysis resembles the output of the cflow command found in UNIX program development environments, with information on loops and loop counts added. In FIG. 4, a rectangular frame denotes a loop; here there are two loops, loop 1 and loop 2. A function enclosed by a frame is contained in that loop, and the number in angle brackets <> is the function's execution count in the loop.

[0046] Next, the method of ordering the functions is described. First, the functions are arranged in the order in which they are called. This is done by listing each function in order of appearance, from the top of the program structure graph. The result is shown in the center of FIG. 4.
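The initial ordering can be sketched as a depth-first walk over the structure graph that records each function the first time it appears. The dictionary encoding below, with loops written as `("loop", iteration_count, body)` tuples, is an assumption made for illustration; it is built from the textual description of the example program, not transcribed from the figure.

```python
# Structure graph of the example program as described in the text.
# Each function maps to its call sites in source order; a tuple
# ("loop", iteration_count, body) is a loop containing call sites.
GRAPH = {
    "main": ["fa", "fb", "fc", "fd"],
    "fc": [("loop", 50, ["fa", "fe", "ff"]), "fg"],
    "fe": [("loop", 100, ["fa", "fh", "fi"]), "fd"],
}


def first_call_order(graph, root="main"):
    """List functions in order of first appearance, walking the graph
    top-down the way a cflow-style call tree is read."""
    order = []

    def walk(items):
        for item in items:
            if isinstance(item, tuple):        # a loop: descend into its body
                walk(item[2])
            elif item not in order:            # first appearance of a function
                order.append(item)
                walk(graph.get(item, []))      # then its own call sites

    walk([root])
    return order


print(first_call_order(GRAPH))
```

For the example graph this sketch yields main, fa, fb, fc, fe, fh, fi, fd, ff, fg. The loop iteration counts are ignored at this stage; they only matter in the subsequent reordering.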

[0047] Next, the order of the functions within the loops is fixed. Loop 2 has three functions: fa(), fh(), and fi(). Each is executed 5,000 times, the highest execution frequency in the program. The function fa(), which is currently placed elsewhere, is therefore moved into this loop, and the functions are placed in the order fa(), fh(), fi(). Next, the functions in loop 1 are ordered.

[0048] In addition to fa(), fh(), and fi(), which belong to loop 2 and whose order is already fixed, loop 1 contains four functions: fa(), fe(), fd(), and ff(). Each of these is executed 50 times. The function fa() is excluded because its position was fixed in loop 2, and the functions fe(), fd(), and ff() are arranged in this order. At this point the functions fa(), fh(), and fi(), whose positions were fixed in loop 2, are placed immediately after the function fe(), which contains loop 2. No undetermined loops remain, and the arrangement of the functions is determined as shown on the right side of FIG. 4.
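
The ordering procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: `order_functions`, the `host` field, and the dictionary layout are hypothetical, and the initial call order is assumed because the figure is not reproduced here. The sketch also assumes that the functions of an outermost loop are placed right after the function that contains the loop.

```python
def order_functions(call_order, loops):
    """Reorder functions so that each loop's functions are contiguous.

    loops: dicts with 'count' (executions per function), 'functions' (in call
    order inside the loop) and 'host' (the function whose body contains the loop).
    """
    placed = set()   # functions whose position has been fixed by some loop
    pending = {}     # host function -> group to place right after the host
    # Handle the most frequently executed loop first.
    for loop in sorted(loops, key=lambda lp: lp["count"], reverse=True):
        group = []
        for fn in loop["functions"]:
            if fn in placed:
                continue                         # fixed by a more frequent loop
            placed.add(fn)
            group.append(fn)
            group.extend(pending.pop(fn, []))    # splice in the loop hosted by fn
        pending.setdefault(loop["host"], []).extend(group)
    layout = []
    for fn in call_order:                        # start from plain call order
        if fn in placed:
            continue
        layout.append(fn)
        layout.extend(pending.pop(fn, []))
    for group in pending.values():               # groups whose host was never emitted
        layout.extend(group)
    return layout


# Assumed call order and loop contents for the example program.
call_order = ["fa", "fb", "fc", "fe", "fh", "fi", "fd", "ff", "fg"]
loops = [
    {"count": 50, "functions": ["fa", "fe", "fd", "ff"], "host": "fc"},
    {"count": 5000, "functions": ["fa", "fh", "fi"], "host": "fe"},
]
layout = order_functions(call_order, loops)
print(layout)
```

With these inputs, fa(), fh(), and fi() end up directly after fe(), followed by fd() and ff(), which matches the arrangement described in the text.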

[0049] The way the function arrangement is specified to the linker (link means 103) depends on the type of linker, and several methods exist. One is to create a link directive file, which is used to give instructions to the linker. How to write a link directive file is described on pages 113 to 188 of "NEC User's Manual: CA732/CA830/CA850 V800 Series C Compiler Package, Operation, UNIX Base" (document number U11013JJ1V0UM00, 1995).

[0050] Next, consider the case where the load module created from the source program of FIG. 4 is executed on the computer of FIG. 1. Assume that the instruction cache memory 702 is divided into four blocks, block 0 to block 3, and is mapped by the direct-mapped method. Assume also that each of the functions fa() to fg() occupies exactly one block in the created load module, and that the load module is stored in a contiguous area of the main memory 704 starting at block 0.
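
Under these assumptions, the cache block used by a function follows from its position in the load module alone. The Python sketch below illustrates this; `block_map` and the nine-function layout are hypothetical, chosen only to show the modulo mapping of a direct-mapped cache with four blocks.

```python
def block_map(layout, num_blocks=4):
    """Group functions by the cache block they map to.

    layout[k] is the function stored in main-memory block k; with direct
    mapping it can only be held in cache block k % num_blocks.
    """
    blocks = {b: [] for b in range(num_blocks)}
    for k, fn in enumerate(layout):
        blocks[k % num_blocks].append(fn)
    return blocks


# Hypothetical layout: nine one-block functions loaded from block 0 upward.
layout = ["fa", "fb", "fc", "fe", "fh", "fi", "fd", "ff", "fg"]
for block, functions in block_map(layout).items():
    print(block, functions)
```

Functions listed under the same block evict one another whenever they are executed alternately.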

[0051] As shown in the center of FIG. 5, when the object modules of the functions are linked in call order, the functions fa() and fh() are both assigned to block 0 of the instruction cache memory 702. When loop 2 is executed, fh(), which must also be stored in block 0, has to be executed immediately after fa() has been transferred to block 0 of the instruction cache memory 702 and executed. A cache miss therefore always occurs when fh() is executed. A cache miss likewise occurs when execution returns to the top of loop 2 and fa() is executed. Loop 2 thus incurs 5000 × 2 = 10000 cache misses, which lowers the execution speed of the created load module.

[0052] By contrast, as shown on the right side of FIG. 5, when the object module of each function is created with the function order changed, the functions fa(), fh(), and fi() in the frequently executed loop 2 are assigned to blocks 3, 0, and 1 of the instruction cache memory 702, respectively. No cache miss therefore occurs while fa(), fh(), and fi() are executed repeatedly in loop 2. In loop 1, on the other hand, the functions fe() and fd(), for example, contend for a block of the instruction cache memory 702 and cause cache misses, but these number only 50 × 2 = 100, a far lower frequency. The created load module therefore executes faster.
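
The difference between the two arrangements can be checked with a small direct-mapped cache simulation in Python. This is a sketch, not the patent's method: `count_misses` is a hypothetical helper, and the two layouts are assumptions chosen to be consistent with the block assignments stated in the text (fa() and fh() both in block 0 for call order; fa(), fh(), fi() in blocks 3, 0, 1 after reordering). Only the body of loop 2 is traced.

```python
def count_misses(layout, trace, num_blocks=4):
    """Count instruction-cache misses for a call trace on a direct-mapped cache.

    Every function is assumed to occupy exactly one block, and layout[k] is the
    function stored in main-memory block k.
    """
    block_of = {fn: k % num_blocks for k, fn in enumerate(layout)}
    cache = [None] * num_blocks          # function currently held by each block
    misses = 0
    for fn in trace:
        block = block_of[fn]
        if cache[block] != fn:           # miss: load fn, evicting the old content
            cache[block] = fn
            misses += 1
    return misses


# Assumed layouts, consistent with the block assignments described in the text.
call_order = ["fa", "fb", "fc", "fe", "fh", "fi", "fd", "ff", "fg"]
reordered = ["fb", "fc", "fe", "fa", "fh", "fi", "fd", "ff", "fg"]

# The body of loop 2 is executed 50 * 100 = 5000 times.
loop2_trace = ["fa", "fh", "fi"] * 5000

print(count_misses(call_order, loop2_trace))  # 10001
print(count_misses(reordered, loop2_trace))   # 3
```

The simulation also counts the initial loads into an empty cache, which the text does not count: the call-order layout gives the 10000 conflict misses plus one initial load of fi(), and the reordered layout gives only the three initial loads.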

[0053] As described above, the program link system of this embodiment creates a load module in which the functions in the frequently executed part of a program are placed at consecutive addresses. When the load module is executed on the computer, frequently executed functions therefore do not contend for blocks of the instruction cache memory 702. Cache misses become less frequent over the program as a whole, and the program executes faster.

[0054] Furthermore, the functions in a frequently executed loop are ordered in their call order, and the load module is created with them placed at consecutive addresses. When a cache miss occurs, the code to be executed next can therefore be transferred to the instruction cache memory 702 from consecutive addresses of the main memory 704, so transferring instructions to the instruction cache memory 702 also takes less time.

[0055] Moreover, the program structure graph is created by a single flow analysis of the program by the program structure extracting means 101. The function ordering means 102 then arranges the functions in call order and rearranges them by examining the program structure graph, starting from the most frequently executed loop. The same processing need not be repeated many times, so the time required to determine the function arrangement is short.

[0056] The present invention is not limited to the embodiment described above, and various modifications and applications are possible. Modifications of the above embodiment to which the present invention is applicable are described below.

[0057] In the above embodiment, the computer that executes the load module created by the link means 103 has an instruction cache memory 702 and a data cache memory 703 as separate units. However, the present invention may also be applied to the creation of a load module to be executed on a computer whose cache memory stores instructions and data without distinguishing between them.

[0058] In the above embodiment, the function ordering means 102 orders the functions in the program, and the compiling means 104 compiles them to create an object module for each function. Alternatively, when the compiler parses the source program, it may assign an order to the object modules corresponding to the functions on the basis of the parsing result and embed information on that order in each object module. The linker then links the object modules according to the order information contained in each of them to create the load module.
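
As a rough illustration of this variation, the Python sketch below lets each object module carry its own order information and has the linker sort on it. The `ObjectModule` class, its `link_order` field, and the `link` function are hypothetical; the patent does not specify how the order information is encoded.

```python
from dataclasses import dataclass


@dataclass
class ObjectModule:
    """Hypothetical stand-in for an object module that carries its link order."""
    function: str
    link_order: int      # position assigned by the compiler during parsing
    code: bytes = b""


def link(modules):
    """Concatenate module code in the order recorded in each module."""
    ordered = sorted(modules, key=lambda m: m.link_order)
    image = b"".join(m.code for m in ordered)
    return [m.function for m in ordered], image


# The modules may reach the linker in any order.
modules = [
    ObjectModule("fh", 1, b"H"),
    ObjectModule("fi", 2, b"I"),
    ObjectModule("fa", 0, b"A"),
]
names, image = link(modules)
print(names, image)
```

The design point is that the linker needs no separate ordering input, because the order travels with the modules themselves.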

[0059] In the above embodiment, the load module is created from the source program by the program structure extracting means 101, the function ordering means 102, the link means 103, and the compiling means 104, all of which operate under program control. The program that implements the program link system of the present invention may, however, be stored on a magnetic disk, a semiconductor memory, or another recording medium and distributed.

[0060] That is, as shown in FIG. 6, the program recorded on the recording medium 1002 may be installed in the external storage device 1003 (a hard disk or the like) of the computer 1001, and the computer 1001 may execute the installed program to implement the program structure extracting means 101, the function ordering means 102, the link means 103, and the compiling means 104, which operate under program control. The computer 1001 may be the computer shown in FIG. 1, that is, the computer that executes the created load module. Alternatively, it may be a different computer that creates the load module from the source program by cross-compiling and cross-linking.

[0061]

[Effects of the Invention] As described above, according to the present invention, frequently executed modules can be kept from contending for the same block of the cache memory, so cache misses occur less often. The program therefore executes faster.

[0062] In addition, the same processing need not be repeated to determine the optimum link order of the object modules, so the processing time required is also short.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example configuration of a computer that uses a cache memory.

FIG. 2 is a block diagram showing the functional configuration of a program link system according to an embodiment of the present invention.

FIG. 3 is a flowchart showing the processing executed by the program structure extracting means and the function ordering means of FIG. 2 in the embodiment of the present invention.

FIG. 4 is an example program used to explain the processing in the embodiment of the present invention concretely.

FIG. 5 is a diagram showing, for the program shown in FIG. 4, the relationship between its structure and the ordering of its functions.

FIG. 6 is a block diagram showing the configuration of a computer system applied to a modification of the embodiment of the present invention.

FIG. 7 is a block diagram showing the functional configuration of a program link system according to the prior art.

[Explanation of symbols]

101 program structure extracting means
102 function ordering means
103 link means
104 compiling means
701 CPU
702 instruction cache memory
703 data cache memory
704 main memory

Continuation of the front page

(56) References
JP-A-4-165537 (JP, A)
JP-A-6-202875 (JP, A)
A. H. Hashemi et al., Efficient Procedure Mapping Using Cache Line Coloring, ACM SIGPLAN Notices, 1997, Vol. 32, No. 5, pp. 171-182
K. Pettis, Profile Guided Code Positioning, ACM SIGPLAN Notices, 1990, Vol. 25, No. 6, pp. 16-27
S. McFarling, Program Optimization for Instruction Caches, ACM SIGPLAN Notices, 1989, Vol. 24, Special Issue, pp. 183-191

(58) Fields searched (Int. Cl.⁷, DB name)
G06F 9/45
G06F 12/08

Claims

(57) [Claims]

1. A program link system for linking a plurality of object modules to create a load module to be executed by a computer having a cache memory in which a part of a program on a main memory is stored, the system comprising: program structure extracting means for analyzing the structure of a program corresponding to the load module to be created by the link; ordering means for assigning an order for linking the plurality of object modules based on an analysis result by the program structure extracting means; and link means for linking the plurality of object modules according to the order assigned by the ordering means to create the load module, wherein the plurality of object modules correspond to the respective functions included in the program for which the load module is to be created, the program structure extracting means analyzes the source program of the program for which the load module is to be created and creates a structure graph representing the relationship between function calls and the loops containing the function calls, and the ordering means refers to the structure graph, first arranges the functions in call order, and then arranges the functions called in a loop with a high execution frequency in the order in which they are called within that loop, thereby ordering the plurality of object modules.

2. A program link system for ordering a plurality of object modules to be linked by a linker when creating a load module to be executed by a computer having a cache memory in which a part of a program on a main memory is stored, the system comprising: program structure extracting means for analyzing the structure of a program corresponding to the load module to be created by the link; and ordering means for assigning an order for linking the plurality of object modules based on an analysis result by the program structure extracting means, wherein the plurality of object modules correspond to the respective functions included in the program for which the load module is to be created, the program structure extracting means analyzes the source program of the program for which the load module is to be created and creates a structure graph representing the relationship between function calls and the loops containing the function calls, and the ordering means refers to the structure graph, first arranges the functions in call order, and then arranges the functions called in a loop with a high execution frequency in the order in which they are called within that loop, thereby ordering the plurality of object modules.

3. The program link system according to claim 1, wherein the program structure extracting means includes means for adding, to the program structure graph, the number of executions of each function within a loop in the program for which the load module is to be created, the number of executions of each function being the product of the execution frequencies of the respective loops.

4. The program link system according to any one of claims 1 to 3, wherein the cache memory stores the program on the main memory by mapping it with a direct-mapped method or a set-associative method.

5. A program linking method for linking a plurality of object modules to create a load module to be executed by a computer having a cache memory in which a part of a program on a main memory is stored, the method comprising: a program structure extracting step of analyzing the structure of a program corresponding to the load module to be created by the link; an ordering step of assigning an order for linking the plurality of object modules based on a result of the analysis in the program structure extracting step; and a link step of linking the plurality of object modules according to the order assigned in the ordering step to create the load module, wherein the plurality of object modules correspond to the respective functions included in the program for which the load module is to be created, the program structure extracting step analyzes the source program of the program for which the load module is to be created and creates a structure graph representing the relationship between function calls and the loops containing the function calls, and the ordering step refers to the structure graph, first arranges the functions in call order, and then arranges the functions called in a loop with a high execution frequency in the order in which they are called within that loop, thereby ordering the plurality of object modules.

6. A computer-readable recording medium recording a program for linking a plurality of object modules to create a load module to be executed on a computer having a cache memory in which a part of a program on a main memory is stored, the program causing a computer to execute: a program structure extracting step of analyzing the structure of a program corresponding to the load module to be created by the link; an ordering step of assigning an order for linking the plurality of object modules based on an analysis result in the program structure extracting step; and a link step of linking the plurality of object modules according to the order assigned in the ordering step to create the load module, wherein the program structure extracting step analyzes the source program of the program for which the load module is to be created and creates a structure graph representing the relationship between function calls and the loops containing the function calls, and the ordering step refers to the structure graph, first arranges the functions in call order, and then arranges the functions called in a loop with a high execution frequency in the order in which they are called within that loop, thereby ordering the plurality of object modules.