JP3918274B2

JP3918274B2 - COMPILING DEVICE, COMPILING METHOD, AND RECORDING MEDIUM CONTAINING COMPILER PROGRAM

Info

Publication number: JP3918274B2
Application number: JP01416898A
Authority: JP
Inventors: 政人森島
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1998-01-27
Filing date: 1998-01-27
Publication date: 2007-05-23
Anticipated expiration: 2018-01-27
Also published as: JPH11212802A

Description

【０００１】
【発明の属する技術分野】
本発明は原始プログラムを中間テキストに変換し、変換した中間テキストに対して最適化を行い、機械語を生成するコンパイル装置において、データキャッシュを有効に活用したプログラムを生成して実行時性能を向上するようにしたコンパイル装置に関する。
【０００２】
【従来の技術】
コンピュータの性能は従来より種々の技術的改良により飛躍的に向上してきている。特にその中心であるＣＰＵについても命令実行速度の向上が著しいＲＩＳＣ（ＲｅｄｕｃｅｄＩｎｓｔｒｕｃｔｉｏｎＳｅｔＣｏｍｐｕｔｅｒ）型アーキテクチャが多く採用されている。一方主記憶装置（メモリ）はＣＰＵに比較してあまり早くなっていない。このような状態のとき、データをメモリからロードしようとすると、メモリから長い時間（ＣＰＵに比べて相対的に）をかけて演算のためのレジスタに格納されるので、その完了を待って短い演算処理時間の後結果が得られることになる。すなわち、高速化されたＣＰＵもメモリにあるデータの処理の大部分をメモリからレジスタへのロード時間待ちのために全体としての性能向上はわずかなものとなってしまうことになる。
【０００３】
このようなＲＩＳＣ型アーキテクチャにおいては、演算に使用されるデータは必ずレジスタに格納されている必要があり、従来の汎用コンピュータのように、演算命令のオペランドにメモリを指定するアドレッシング方法は行われない。また、ＣＰＵ時間とメモリのアクセス時間の比較をしてみると、マシンサイクルをτとするとほとんどの演算命令が１τから数τの程度であるのに比較してメモリの動作時間は数十τ必要である。すなわち、実行時間の大部分はメモリアクセスの待ち時間（レイテンシとも言う）に費やされてしまうことになる。
【０００４】
この遅延時間を減らすことがＲＩＳＣ型アーキテクチャにおける性能向上の要件ということになる。そこで、このために考案されたのがデータキャッシュ機能であり、メモリを階層化してアクセスの早いキャッシュメモリを備えるもので、現在のＣＰＵのほとんどに実装されている。データキャッシュはその名のとおりメモリの内容を一時的にため込んでおいて、繰り返し使用できればアクセスのおそいメモリに対するアクセス回数を減らすことができる。しかし、データキャッシュは大変速く動作する変わりに実装される容量は小さい。
【０００５】
ＣＰＵから近い順に一次キャッシュ、二次キャッシュ、メモリで階層的に構成される場合、ＣＰＵからロード要求があった時の遅延時間の程度を先に述べたマシンサイクル時間τにより例示すると、一次キャッシュにデータが存在している場合は０τ、二次キャッシュにあれば一次キャッシュへのコピーを伴い１０τ、メモリに要求データがあれば一次／二次キャッシュへのコピーを伴い１００τの遅延時間が伴うことになる。このように、ロード命令が実行されるとき目的のデータが上記したＣＰＵに近い階層に存在する場合ほど待ち時間が少なく処理が高速に行われるようになる。
【０００６】
上記のように従来のコンピュータにおいても、メモリを階層化してアクセスの早いキャッシュメモリを備え、繰り返しアクセスされるメモリ上のデータ群についてのアクセスの高速化が実現されていた。また、単に一度アクセスされたメモリ上のデータ群についてのアクセス高速化を期待するだけでなく積極的に実行されるプログラムの命令を解析して後に実行されるはずのメモリアクセスを伴う命令について前以って必要部分を通常のメモリから高速のキャッシュメモリにデータをコピーしておく工夫も開示されている。例えば、特開平８−１６１２２６にはキャッシュメモリを使用するシステムにおけるデータ先読み制御方法として、ロード命令のアクセス対象となるデータの配置パターンに応じたタイプによってロード命令の種類を分けて、この種類をオペレーション・コード等により識別できるようにして、ロード命令が先に発行したロード命令と関係づけられるとき、先行するロード命令の処理時に、ロード命令の種類に応じて後続するロード命令のロードアドレスを予測し、そのデータをキャッシュメモリに先読みする技術が示されている。
【０００７】
従来はこの様にプログラムの実行時にハード的に先読みの効果のある条件を検出してデータキャッシュの活用をすすめていたが、最近ソフト的に命令によりデータキャッシュにメモリの内容を先読みする機能すなわちデータプリフェッチ命令を持ったコンピュータが開発されるようになってきた。このデータプリフェッチ命令を利用したデータキャッシュの活用が期待されている。
【０００８】
【発明が解決しようとする課題】
上記したようにメモリ上のデータをロードするときの待ちによる遅延時間は繰り返しアクセスされるデータ群についてはデータキャッシュにより格段に削減されるが、データキャッシュに存在しないときには効果が無い。上記公知例に示すような条件のとき実行時に先読みで期待される効果を検出して事前にロードされるデータをデータキャッシュに先読みするものもあるが上記ロード命令の条件に合わない場合にはデータキャッシュの高速アクセスを有効に活用できていない。
【０００９】
従来はこの様にプログラムの実行時にハード的に先読みの効果のある条件を検出してデータキャッシュの活用をすすめていたが、実行時に検出できるようなデータキャッシュを有効とするメモリ先読みの条件は限られたものとなり、プログラム作成段階でのデータキャッシュの活用手続きの組み込みが課題でありその解決策が求められていた。
【００１０】
本発明はこのような点にかんがみて、ソースプログラム作成段階あるいはコンパイル段階でプログラムの構造の特徴を解析してロード命令に先立ってデータキャッシュに必要なメモリの内容をコピーしておくようにする手段を提供することを目的とする。
【００１１】
【課題を解決するための手段】
上記の課題は下記の如くに構成されたコンパイル装置によって解決される。
図１は、本発明の構成図である。
【００１２】
図において、１は原始プログラムを中間テキストに変換し、変換した中間テキストに対して最適化を行い、機械語を生成するコンパイル装置において、あらかじめ設定された１以上の原始プログラム命令種類に対応して変換された中間テキストを検索する中間テキスト検索手段であり、２は検索した中間テキストに基づいてデータキャッシュへコピーするメモリのアドレスを算出し、検索した中間テキストより前の実行順序位置に上記算出したアドレスを指定したデータキャッシュプリフェッチ命令を挿入するプリフェッチ命令挿入手段である。（請求項１）
３は上記検索した中間テキストの実行順序位置から、あらかじめ原始プログラム命令種類毎に設定された遡及すべきプログラム実行時間に基づいてプログラムの制御の流れをさかのぼり、上記プリフェッチ命令を挿入する実行順序位置を決定するプリフェッチ命令挿入位置最適化手段である（請求項２）。
【００１３】
また、上記コンパイル装置は複数の単純な命令を複数のスロットに格納して構成するＶＬＩＷ命令を持つＶＬＩＷ型アーキテクチャのコンピュータで実行するプログラムを生成するコンパイル装置であって、プリフェッチ命令挿入手段２は空きスロットにデータキャッシュプリフェッチ命令を生成するものであるとき、４は空きスロットがないときに、データキャッシュプリフェッチ命令の挿入を抑止するか、新たに独立したＶＬＩＷ命令を生成するかを設定するプリフェッチ命令挿入モード設定手段である（請求項７）。
【００１４】
【発明の実施の形態】
図２は本発明の実施の形態の構成図である。
２２はコンパイル装置であり、ソースプログラム２１で示される原始プログラムをソースプログラム解析部２３で中間テキストに変換し、変換した中間テキストに対して最適化実施部２４で最適化を行い、スケジューリング＆コード生成部２５でオブジェクトプログラム２６で示す機械語を生成するものである。本発明の特徴とするデータキャッシュプリフェッチ命令の生成と挿入の制御は上記最適化実施部２４で行うようになっている。コンパイル装置２２はコンピュータ上で動作するものである。
【００１５】
すなわち、本実施の形態においては、パーソナルコンピュータ、ワークステーション等の汎用的な目的で使用される計算機上で実行するコンピュータプログラムにより実現する形態を示す。
【００１６】
本発明のコンパイル装置は、処理装置、主記憶装置、補助記憶装置、入出力装置などから構成される計算機上で、コンピュータプログラムを実行して実現される。また、コンピュータプログラムは、フロッピーディスクやＣＤ−ＲＯＭ等の可搬型媒体やネットワーク接続された他の計算機の主記憶装置や補助記憶装置等に格納されて提供される。本発明の記録媒体は、上記可搬型媒体、主記憶装置、補助記憶装置に該当する。
【００１７】
提供されたコンピュータプログラムは、可搬型媒体から直接計算機の主記憶装置にロードされ、または、可搬型媒体から一旦補助記憶装置にコピーまたはインストール後に、主記憶装置にロードされて実行する。また、ネットワーク接続された他の装置に格納されて提供された場合も、他の装置からネットワークを経由して受信後に、補助記憶装置にコピー、主記憶装置にロードされ実行するものである。
【００１８】
以下にデータプリフェッチ命令の挿入の仕組みと動作について図とフローチャートを参照しながら説明する。
図３はデータプリフェッチ命令の説明図である。図３（ａ）にはメモリ上のデータを処理対象とする関数として「ｓｔｒｃｐｙ」（ｓｔｒｉｎｇｃｏｐｙ、すなわち、文字列をコピーする関数）を例にとり、アドレス「ａｄ２」で示されるメモリ上の位置に文字列「ａ，ｂ，ｃ，ｄ，ｅ」３１があるときＲＥＧ３２を経由してアドレス「ａｄ１」のメモリ３３にコピーするものであることを示している。
【００１９】
図３（ｂ）は上記関数「ｓｔｒｃｐｙ」がコンパイル装置２２のソースプログラム解析部２３で生成された「ＬＯＡＤ命令▲１▼、▲２▼」および関数「ｓｔｒｃｐｙの呼び出し命令：ＣＡＬＬ▲４▼」を示している。すなわち「ＣＡＬＬ」の引数の二つのアドレス、ａｄ１、ａｄ２はそれぞれＲＥＧ１、ＲＥＧ２に格納して渡し、「ｓｔｒｃｐｙ」ルーチンを呼び出すように中間テキストを生成していることを示している。ここで中間テキストはオブジェクトプログラムとしてマシン命令を作り出す中間の命令でおおむねマシン命令に対応するものである。ただし、コンパイル装置の後の処理工程で必要とする情報を合わせて持っている。例えば、最適化のときに必要な情報として、どの中間テキストから制御が渡されるのかを示す「経路情報」などがある。
【００２０】
ここで、▲３▼により示される「ＰＲＥＦＥＴＣＨＲＥＧ２」は「ｓｔｒｃｐｙの呼び出し命令：ＣＡＬＬ▲４▼」の引数としてコピーする元の文字列のアドレスを含むメモリ上のデータの塊を対応するデータキャッシュにコピーすることを示すものである。これは、「ｓｔｒｃｐｙ」でメモリからレジスタに文字データをロードする必要性が分かっているのでロード命令の実行前にアクセスの速いデータキャッシュにコピーする命令を挿入するものである。
【００２１】
すなわち、図３（ｃ）に示すようにＲＥＧ３を用いて「ＬＯＡＤ命令、ＳＴＯＲＥ命令」によりＲＥＧ２で示される文字データをＲＥＧ１で示されるメモリにコピーする。このＬＯＡＤ命令でアクセスするメモリの内容が含まれる部分がＬＯＡＤ命令の実行前にデータキャッシュにコピー完了していることが望ましいが、もしメモリからのコピーが途中の状態であってもＬＯＡＤ命令が発行されたとき全く新たにメモリからデータキャッシュにコピーする動作を開始することに比べれば時間短縮の効果は確実に得られる。つまり、ＬＯＡＤ命令が発行されたときに指定のアドレスのメモリの内容がデータキャッシュに存在すればデータキャッシュからロードされ、もし無ければＬＯＡＤ命令で指定されたアドレスのメモリの内容をまずデータキャッシュに移し完了を待ってレジスタにロードされるようになっている。したがって、事前に発行されたＰＲＥＦＥＴＣＨ命令によりメモリからデータキャッシュにデータの転送が始まっていればその完了を待ってデータキャッシュからレジスタにロードされるので時間短縮の効果が出るのである。
【００２２】
図４には中間テキストの説明図（その１）を示す。ここには、ユーザ関数定義文の例をソースプログラムのイメージで示す。ｆｏｏ（）によりユーザ定義の関数を表し、「・・・（１）・・・」により先頭部分の文を示す。続いてｉｆ文で条件が成り立つ［ｔｈｅｎ」のとき「・・・（２）・・・」の文を実行し、否「ｅｌｓｅ」の場合「・・・（３）・・・」の文を実行し、（２）または（３）の後に「・・・（４）・・・」を実行して終了する関数の定義を行うものである。
【００２３】
図５の中間テキストの説明図（その２）は上記図４のソースプログラムの構造をブロック図として示している。ここで、先頭の５１で入口／退避、最後の５６で出口／復元のように示したものは、定義する関数の中で使用するレジスタの内、グローバルレジスタと呼ばれるレジスタについて、その内容を関数の実行後も保証するために入口でメモリに退避し、出口で退避していた内容をレジスタに復元することを示している。つまり、グローバルレジスタの内容は各関数が呼ばれたときにはある固有の意味を持ったものであり関数の出口で呼び出し元に戻るときにはその内容を元に戻す約束ごとである。これらの「退避」あるいは「復元」も関数として呼び出すようにしている。
【００２４】
図６は中間テキストの説明図（その３）である。上記した図４および図５に対応して生成された中間テキストをアセンブラ命令レベルの構造で示したものである。詳しく図示はしていないが複数のアセンブラ命令をソース命令文の単位にくくった「節」が存在する。「ｔｈｅｎ節」、「ｅｌｓｅ節」などがこれである。また、入口、出口の「退避」、「復元」も同様に一塊の処理であることが処理の種類と制御の流れなどについての情報とともに記録されている。
【００２５】
中間テキスト（命令）を区別して説明するために「＃」により番号を示しているが、そこに示す上向きの矢印は中間テキスト（命令）の制御が渡される元の中間テキスト（命令）を指し示す経路情報である。これはコンパイル装置のソースプログラム解析部が後の工程で必要な情報として記録するものの一つであり、同様に順方向の制御の流れも記録されている。
【００２６】
具体的に対応を見てみると、関数の実行に先立つレジスタの退避（＃１、＃２）、ｉｆ文の分岐（＃３）、ｔｈｅｎ節（＃４、＃５、および＃６で後処理への分岐）、ｅｌｓｅ節（＃７、＃８）、後処理（＃９）、関数の出口での復元（＃１０、＃１１）のように構成されていることがわかる。また、上向き矢印は実行時の制御の流れを溯る方向を示している。＃３には＃４および＃７からの矢印が指し示しているがｉｆ文の条件の判定により二通りに別れることに対応している。同様に＃９では実行時の制御の流れが＃６あるいは＃８から渡されることを意味している。
【００２７】
図７はプリフェッチ対象処理テーブル説明図である。図２に示した最適化実施部２４が図４乃至６に示した中間テキストをソースプログラム解析部２３から受け取り、データプリフェッチ命令を事前に挿入することが効果的であるものを見つけるために、そのキーとなる中間テキストに記録されている処理名情報を一覧としたものでありコンピュータに記憶されている。図７に見るようにタイプがシステム関数であるものとしては、ｓｔｒｃｐｙ（文字列コピー）、ｓｔｒｎｃｐｙ（文字数指定の文字列コピー）、ｓｔｒｌｅｎ（文字列長さ）、ｓｔｒｃｍｐ（文字列比較）、ｍｅｍｃｐｙ（メモリ内容コピー）、ｍｅｍｃｍｐ（メモリ内容比較）等がある。これらは、何れもメモリ上のデータをレジスタにロードした後処理するものであり、処理の前のメモリからのロード待ち時間をいかに短くするかにより実行時間が大きく左右されるシステム関数である。対象とするシステム関数はあらかじめこのテーブルに登録しておくようにしている。
【００２８】
同様にタイプがユーザ定義関数のｆｏｏｘ、ｆｏｏｙ、ｆｏｏｚもメモリ上のデータをレジスタにロードした後処理するユーザ定義関数であり、効果の期待できるものだけをこのテーブルに登録しておく。
【００２９】
タイプがレジスタ復元のものはプログラム開発の部品として復元内容別にいくつかの種類が登録されるもので前記したようにグローバルレジスタとして共通に内容を受け渡すために利用するレジスタを局部的に別の用途に使用するとき自分で退避しておいた内容を復元して処理を呼び出し元に返すものである。すなわち必ずメモリからレジスタへのロードが実行されるのであらかじめアクセスされるアドレスのメモリをプリフェッチしてデータキャッシュにコピーしておくことが望ましいものである。
【００３０】
タイプがユーザ指定プリフェッチとしたものはソースプログラムに記述して、陽にプリフェッチを指示するハード命令を発行するものである。
プリフェッチ対象処理テーブルの各処理名に対応して引数の並びとｌｏａｄ命令の対象引数、遡及命令数が示されている。このｌｏａｄ命令の対象引数に示される引数の位置を引数の並びから探すことにより各処理でロードされるメモリのアドレス、すなわちプリフェッチすべきメモリのアドレスを知ることができる。さらに、遡及命令数は各処理ごとに実行時にその処理の先頭からどの程度処理が進んだ位置にメモリからレジスタへのロード命令が実行されるのかに対応して逆にどれだけ先行してプリフェッチ命令を実行したら良いのかを示すものである。ここに示す数値はこのように溯った位置に命令を挿入するために溯る命令数に見合った数値を記録するようにしている。ただし、各命令毎に若干の命令実行時間、すなわちマシンサイクルを単位とした時間が異なるので溯る命令数は各関数で調整して設定するようにしている。
【００３１】
次に、図８、図９にＶＬＩＷ命令の空きスロットとプリフェッチ命令配置の説明図（その１）および（その２）を示す。これはＶＬＩＷ型アーキテクチャのコンピュータに本発明を実施したときの形態を説明するためのものである。
【００３２】
上記ＶＬＩＷ型アーキテクチャを備えたコンピュータの特徴は、複数の単純な命令を複数のスロットに格納して構成するＶＬＩＷ命令を持つものである。そして当然コンパイル装置もＶＬＩＷ型アーキテクチャのコンピュータで実行するプログラムを生成するコンパイル装置が使用される。
【００３３】
まず図８（ａ）ＶＬＩＷ命令とスロットに示すようにＶＬＩＷ命令は複数のスロット、ここではスロットａ乃至スロットｄの４個のスロットを備えた命令であり、各スロットに格納された命令は同時に実行されるようになる。
【００３４】
図８（ｂ）命令配置の並列化によって一つのＶＬＩＷ命令に一緒に格納可能な命令について説明する。まず通常のコンピュータであれば順番に実行される命令列を▲１▼に示している。これはアドレスＡとアドレスＢのデータをレジスタＲＥＧ１とＲＥＧ２にロードしてＡＤＤ命令で加算してレジスタＲＥＧ３に結果を作り出し、ＳＴＯＲＥ命令でＲＥＧ３の結果をアドレスＣに格納するための命令群である。続いて▲２▼には並列に実行可能なものは横に並べ、順序を必要とする命令は縦方向に並べたものを示す。すなわち、ＲＥＧ１、ＲＥＧ２へのロードは別々のレジスタであり順序性はないので同時に実行可能であり、加算およびその結果を格納する命令はそれぞれ順番に前の命令の完了を待ってはじめて実行できるので順序が必要であることを示す。図８（ｃ）には上記並列に実行可能なＬＯＡＤ命令を一つのＶＬＩＷ命令に収めた例を示している。ここで、スロットｂおよびスロットｄに格納されている「ＮＯＰ」はＮｏＯｐｅｒａｔｉｏｎを意味する命令であり、並列に実行可能な命令が無く、やむを得ず空きスロットとして残すときに埋める命令コードである。
【００３５】
次に、図９（ｄ）プリフェッチ命令の配置について説明する。図８（ｂ）に示した命令列においてプリフェッチ命令を挿入する位置は▲１▼に示すようにＬＯＡＤ命令の後、ＡＤＤ命令の前となる。これをＶＬＩＷ命令として挿入したものを▲２▼に示す。すなわちＰＲＥＦＥＴＣＨ命令だけのためにＶＬＩＷ命令を一つ挿入しているが上記説明したＮＯＰ命令が多くを占めることになる。そこで、本発明ではＶＬＩＷ命令においては既にＮＯＰ命令つまり空きがあるときにはそこにＰＲＥＦＥＴＣＨ命令を配置して格納するようにする。このように配置したものを図９（ｄ）▲３▼に示す。このようにすることにより命令数の削減、命令実行待ち時間の削減の効果が得られる。逆にループの中などではＶＬＩＷ命令を単純に一つ挿入して追加するときのコストが余分にかかることにもなり、空きスロットがあるときだけプリフェッチ命令を挿入することも可能とする。これは図１のプリフェッチ命令挿入モード設定手段４により実現している。
【００３６】
図１０にはプリフェッチ命令挿入フローチャートを示す。ここには図２のソースプログラム解析部２３の出力である図６に示すような中間テキストを入力としてプリフェッチ命令をその前に挿入する効果のある処理を見つけてプリフェッチすべきメモリ上のデータのアドレスを指定したプリフェッチ命令を生成して最適な位置に挿入する手順を説明する。
【００３７】
ステップＳ１０１ではコンパイルするときにコンパイルモードとして利用者からプリフェッチオプションが指定されていることを確認する。ステップＳ１０２で順番に中間テキストを一つずつ取出す。ステップＳ１０３でまずそれが関数呼び出しの中間テキストかを調べる中間テキストにおいては関数は図３（ｂ）に示すように「ＣＡＬＬ」命令またはここには例示していないが、前記した図７におけるタイプがユーザ指定プリフェッチのとき指定する拡張命令である「ｐｒａｇｍ」命令の引数として関数名を表すのでこれを取出して調べることになる。
【００３８】
ステップＳ１０４では取出した関数名が図７のプリフェッチ対象処理テーブルに登録されている処理名と比較していずれかと一致しているかを調べる。テーブルに登録されていない場合はプリフェッチ命令を挿入しても効果が無いものであるとして何も行わずステップＳ１０７に移り次の中間テキストの処理に移る。
【００３９】
テーブルにあった場合にはステップＳ１０５において見つかった関数の引数並びとｌｏａｄ命令対象引数からプリフェッチ命令に指定するアドレスを調べてＰＲＥＦＥＴＣＨテキストを生成する。生成したＰＲＥＦＥＴＣＨテキストの配置についてはステップＳ１０６において後に別途フローチャートで説明する最適化を行い、挿入位置を決定して上記生成したＰＲＥＦＥＴＣＨテキストを挿入する。あとはステップＳ１０７で終わりを確認して終了する。
【００４０】
最後に図１１と図１２でプリフェッチ命令の配置の最適化を説明する。図１１および図１２にはプリフェッチ命令挿入位置最適化フローチャート（その１）および（その２）を示す。フローチャート（その２）はフローチャート（その１）から呼び出されるもので挿入位置検出プログラムの動作を示す。
【００４１】
プリフェッチ命令挿入位置最適化フローチャート（その１）は上記説明したようにプリフェッチ命令を挿入するべき関数を検出し、挿入すべきＰＲＥＦＥＴＣＨテキストを生成したときにそれを配置するべく呼び出されて起動されるプログラムである。したがってステップＳ１１１では中間テキストの位置、溯って挿入する位置を数えるための初期値０を与えたカウンタおよびその関数に対応する図７のテーブルに示した遡及命令数を引数として設定して、ステップＳ１１２において図１２に示すＰＲＥＦＥＴＣＨテキスト挿入位置検出プログラムを呼び出す。挿入位置は図６で分かるように↑で示される複数の経路情報を溯ってたどるので１以上のプリフェッチ命令の配置が起こり得るのでＰＲＥＦＥＴＣＨテキスト挿入位置検出プログラムからはそれらの１以上の位置情報を返される。
【００４２】
ステップＳ１１３では生成されたＰＲＥＦＥＴＣＨテキストを上記記憶されて返された１以上の挿入位置に配置する。
図１２のフローチャートの説明に入る。これは上記プリフェッチ命令を配置するとき呼び出されるもので、ＰＲＥＦＥＴＣＨテキスト挿入位置検出プログラムの処理手順を示すものである。このプログラムは図６の↑で示す経路情報をたどり引数で示される遡及すべき命令数だけ溯った位置を最適配置位置として記憶するものであり途中で溯る経路情報が複数あるテキストも存在するので溯りながら最も近い複数の経路情報のあった中間テキストの位置に戻って別の経路を溯るようにするため、後で説明するようにステップＳ１２６においては自分自身を呼び出す再起呼び出しの形式をとっている。
【００４３】
ステップＳ１２１では引数として与えられたテキスト位置、カウンタアドレスを受け取る。ステップＳ１２２で、まずテキスト位置を一つ溯りカウンタに１を加える。ステップＳ１２３ではカウンタが遡及命令数に到達していないか、すなわちもっと溯ってプリフェッチ命令を配置すべきかを調べる。到達したときにはステップＳ１２７においてその時点のテキスト位置をＰＲＥＦＥＴＣＨテキストの挿入位置として蓄積して記憶して呼び出し元に返る。
【００４４】
ステップＳ１２４では現在の中間テキストに未処理の経路情報があるかをチェックする。ない場合このプログラムを終了して呼び出し元に帰る。
未処理経路情報がある場合にはステップＳ１２５において未処理経路情報を取出し、ステップＳ１２６において取出した経路について溯るように、現在のテキスト位置と、カウンタを引数として自分自身すなわちＰＲＥＦＥＴＣＨテキスト挿入位置検出プログラムを呼び出す。これにより枝別れした経路を溯ってステップＳ１２３の条件を満足する位置が見つかる度にステップＳ１２７でＰＲＥＦＥＴＣＨテキストの挿入位置として蓄積して記憶する。一つの経路を溯り目的の位置を見つける度に記憶されたＰＲＥＦＥＴＣＨテキスト挿入位置は図１１の最初の呼び出しに対応して戻ったときにステップＳ１１３において取出されプリフェッチ命令がそれに基づいて配置されることになる。ステップＳ１２６での戻りはステップＳ１２４として次の別れた経路を探索するようにしている。
【００４５】
なお、フローチャートには図示していないが中間テキストの経路情報と並んで記録された付加情報によりそれ以上溯れないテキスト、例えば関数定義の先頭とか中間テキストが関数呼び出しである場合にはカウンタが遡及命令数に達したとして扱い、ステップＳ１２３においてステップＳ１２７に制御を移すように処理する。
【００４６】
このようにして中間テキストについてプリフェッチ命令を挿入すべき関数の呼び出し毎に有効なプリフェッチ命令を生成して実行時の経路の可能性を溯ってたどり最適な位置に配置して挿入することができる。
【００４７】
なお、ステップの短いユーザ定義関数などで溯る命令数が図７のテーブルにある遡及命令数を満足する位置とならず最適化の配置が完全なものではなくても、前記したようにロード命令の完了がデータキャッシュへのメモリからのコピー完了待ち時間がより少なくなるという効果が期待できる。極端にいえばプリフェッチ命令を挿入すべき関数の直前に配置したとしても効果は得られるものである。
【００４８】
また、原始プログラム作成段階でプリフェッチ命令をプログラム作成者が原始プログラムの適当な位置に記述して組み込み実行時性能を向上させる効果を得ることは通常の手段であり、上記実施の形態における図７のプリフェッチ対象処理テーブルにおけるユーザ指定プリフェッチによって組み込む方法もある。
【００４９】
【発明の効果】
以上の説明から明らかなように本発明によれば、コンパイル段階でプログラムの構造の特徴を解析してロード命令に先立ってデータキャッシュに必要なメモリの内容をコピーしておくようにすることが可能となり、データキャッシュのヒット率を向上し、あるいはメモリからデータキャッシュへのコピー待ち時間を削減することができることにより、コンピュータの命令実行性能を向上するという著しい工業的効果がある。
【図面の簡単な説明】
【図１】本発明の構成図
【図２】本発明の実施の形態の構成図
【図３】データプリフェッチ命令の説明図
【図４】中間テキストの説明図（その１）
【図５】中間テキストの説明図（その２）
【図６】中間テキストの説明図（その３）
【図７】プリフェッチ対象処理テーブル説明図
【図８】ＶＬＩＷ命令の空きスロットとプリフェッチ命令配置の説明図（その１）
【図９】ＶＬＩＷ命令の空きスロットとプリフェッチ命令配置の説明図（その２）
【図１０】プリフェッチ命令挿入フローチャート
【図１１】プリフェッチ命令挿入位置最適化フローチャート（その１）
【図１２】プリフェッチ命令挿入位置最適化フローチャート（その２）
【符号の説明】
１中間テキスト検索手段
２プリフェッチ命令挿入手段
３プリフェッチ命令挿入位置最適化手段
４プリフェッチ命令挿入モード設定手段
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a compiling device that converts a source program into intermediate text, optimizes the converted intermediate text, and generates machine language, and in particular to a compiling device that improves runtime performance by generating programs that make effective use of the data cache.
[0002]
[Prior art]
The performance of computers has improved dramatically through a variety of technical advances. For the CPU at their core in particular, RISC (Reduced Instruction Set Computer) architectures, whose instruction execution speed has improved markedly, are widely adopted. Main memory, on the other hand, has not become much faster relative to the CPU. Under these conditions, a load from memory takes a long time (relative to the CPU) to place the data in a register for the operation, and the result is obtained only after waiting for that load to complete and then spending a short time on the operation itself. In other words, even with a faster CPU, most of the time spent processing data in memory goes to waiting for loads from memory into registers, so the overall performance gain is slight.
[0003]
In such a RISC architecture, data used in an operation must always be held in a register; unlike conventional general-purpose computers, there is no addressing mode that names memory as the operand of an arithmetic instruction. Comparing CPU time with memory access time, and taking the machine cycle as τ, most arithmetic instructions take from 1τ to a few τ, whereas a memory operation requires several tens of τ. Most of the execution time is therefore spent waiting for memory accesses (also called latency).
[0004]
Reducing this delay is the key requirement for improving performance in a RISC architecture. The data cache was devised for this purpose: memory is arranged hierarchically and a fast-access cache memory is provided, and it is implemented in most current CPUs. As its name suggests, the data cache temporarily holds memory contents; if they can be reused, the number of accesses to the slow memory is reduced. However, although the data cache operates very fast, its implemented capacity is small.
[0005]
Where the primary cache, the secondary cache, and the memory are arranged hierarchically in order of proximity to the CPU, the delay on a load request from the CPU can be illustrated in terms of the machine cycle time τ described above: 0τ if the data is in the primary cache; 10τ, including the copy into the primary cache, if it is in the secondary cache; and 100τ, including the copy into the primary and secondary caches, if the requested data is only in memory. Thus, when a load instruction executes, the closer to the CPU the target data resides in this hierarchy, the shorter the wait and the faster the processing.
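As a rough illustration of the figures above, the following sketch computes the average load delay in machine cycles for a given mix of loads served at each level. The 0τ, 10τ, and 100τ delays are taken from the text; the hit fractions and the function name are hypothetical values chosen only for illustration.

```python
# Delay per load, in machine cycles (tau), taken from the example in the text.
DELAY_TAU = {"primary": 0, "secondary": 10, "memory": 100}


def average_load_delay(fractions):
    """Return the mean delay in tau for loads served at each level.

    `fractions` maps a level name to the fraction of loads served there;
    the fractions are assumed values and must sum to 1.
    """
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(DELAY_TAU[level] * share for level, share in fractions.items())


# Hypothetical mix: 90% primary hits, 8% secondary hits, 2% served from memory.
mix = {"primary": 0.90, "secondary": 0.08, "memory": 0.02}
print(average_load_delay(mix))
```

Even with only a small share of loads going to memory, the 100τ term dominates the average, which is why moving data closer to the CPU before the load matters.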
[0006]
As described above, conventional computers have also provided fast-access cache memory by arranging memory hierarchically, which speeds up access to groups of data in memory that are accessed repeatedly. Beyond simply expecting faster access to data that has already been accessed once, techniques have also been disclosed that actively analyze the instructions of the executing program and, for instructions involving memory access that are due to execute later, copy the needed portion from ordinary memory into the fast cache memory in advance. For example, Japanese Patent Application Laid-Open No. 8-161226 discloses a data prefetch control method for a system using a cache memory. Load instructions are divided into kinds according to the layout pattern of the data they access, and the kind is made identifiable through the operation code or the like. When a load instruction is related to a previously issued load instruction, the load address of the subsequent load instruction is predicted, according to the kind of load instruction, while the preceding load instruction is processed, and that data is prefetched into the cache memory.
[0007]
Conventionally, data caches have been exploited in this way by having hardware detect, during program execution, conditions under which prefetching is effective. Recently, however, computers have been developed that have a data prefetch instruction, that is, a function by which software prefetches memory contents into the data cache with an instruction. Use of the data cache through this data prefetch instruction is expected.
[0008]
[Problems to be solved by the invention]
As described above, the delay caused by waiting when loading data from memory is greatly reduced by the data cache for groups of data that are accessed repeatedly, but the cache has no effect when the data is not present in it. Some techniques, under conditions such as those in the known example above, detect at execution time the benefit expected from prefetching and prefetch into the data cache the data that is about to be loaded. When a load instruction does not meet those conditions, however, the fast access of the data cache is not used effectively.
[0009]
Conventionally, data caches have been exploited in this way by having hardware detect, during program execution, conditions under which prefetching is effective. However, the memory prefetch conditions that make the data cache effective and that can be detected at execution time are limited. Building data cache utilization into the program at the program creation stage was therefore an open problem, and a solution was sought.
[0010]
In view of these points, an object of the present invention is to provide means for analyzing the structural characteristics of a program at the source program creation stage or the compilation stage and copying the required memory contents into the data cache ahead of the load instruction.
[0011]
[Means for Solving the Problems]
The above problem is solved by a compiling device configured as follows.
FIG. 1 is a block diagram of the present invention.
[0012]
In the figure, 1 is intermediate text search means which, in a compiling device that converts a source program into intermediate text, optimizes the converted intermediate text, and generates machine language, searches for intermediate text converted from one or more preset kinds of source program instruction. 2 is prefetch instruction insertion means which calculates, from the intermediate text found, the address of the memory to be copied into the data cache, and inserts a data cache prefetch instruction specifying the calculated address at a position earlier in execution order than the intermediate text found (claim 1).
3 is prefetch instruction insertion position optimizing means which, starting from the execution-order position of the intermediate text found, traces the program's control flow backward according to the program execution time to go back, which is preset for each kind of source program instruction, and thereby determines the execution-order position at which to insert the prefetch instruction (claim 2).
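The backward trace performed by means 3 can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the claimed procedure: the amount to go back is expressed as a number of instructions rather than an execution time, and the function name, the graph representation, and the instruction ids are all hypothetical.

```python
def find_insert_positions(predecessors, start, lookback):
    """Walk backward from `start` and collect prefetch insertion points.

    `predecessors` maps each instruction id to the ids that can pass
    control to it. A path stops after `lookback` steps, or earlier when
    no predecessor exists (for example at the function entry).
    """
    positions = []

    def walk(node, steps):
        preds = predecessors.get(node, [])
        if steps == lookback or not preds:
            if node not in positions:
                positions.append(node)
            return
        for pred in preds:
            walk(pred, steps + 1)

    walk(start, 0)
    return positions


# Hypothetical control flow: 1 -> 2 -> (3 or 4) -> 5, with the target call at 5.
PREDECESSORS = {1: [], 2: [1], 3: [2], 4: [2], 5: [3, 4]}
print(find_insert_positions(PREDECESSORS, 5, 1))
print(find_insert_positions(PREDECESSORS, 5, 2))
```

Because control can reach an instruction along more than one path, going back a fixed distance may yield more than one insertion position, one per path, as the first call in the example shows.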
[0013]
The compiling device may also be a compiling device that generates programs to be executed on a computer with a VLIW architecture, whose VLIW instructions are formed by placing a plurality of simple instructions in a plurality of slots. When the prefetch instruction insertion means 2 generates the data cache prefetch instruction in an empty slot, 4 is prefetch instruction insertion mode setting means which sets whether, when there is no empty slot, insertion of the data cache prefetch instruction is suppressed or a new, independent VLIW instruction is generated (claim 7).
[0014]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 2 is a configuration diagram of the embodiment of the present invention.
Reference numeral 22 denotes a compiling device. The source program analysis unit 23 converts the source program, shown as source program 21, into intermediate text; the optimization execution unit 24 optimizes the converted intermediate text; and the scheduling and code generation unit 25 generates machine language, shown as object program 26. Generation and insertion of data cache prefetch instructions, the characteristic feature of the present invention, is controlled by the optimization execution unit 24. The compiling device 22 runs on a computer.
[0015]
That is, in the present embodiment, a form realized by a computer program executed on a computer used for general purposes such as a personal computer or a workstation is shown.
[0016]
The compiling device of the present invention is realized by executing a computer program on a computer including a processing device, a main storage device, an auxiliary storage device, an input / output device, and the like. The computer program is stored and provided in a portable medium such as a floppy disk or CD-ROM, or in a main storage device or auxiliary storage device of another computer connected to the network. The recording medium of the present invention corresponds to the portable medium, the main storage device, and the auxiliary storage device.
[0017]
The provided computer program is loaded directly from the portable medium into the main storage device of the computer, or is first copied or installed from the portable medium to the auxiliary storage device and then loaded into the main storage device, and is executed. When the program is stored and provided on another device connected to the network, it is received from that device via the network, copied to the auxiliary storage device, loaded into the main storage device, and executed.
[0018]
The mechanism and operation for inserting a data prefetch instruction will be described below with reference to the drawings and flowcharts.
FIG. 3 is an explanatory diagram of a data prefetch instruction. FIG. 3A takes “strcpy” (string copy, that is, a function that copies a character string) as an example of a function that operates on data in memory, and shows that when the character string “a, b, c, d, e” 31 is at the memory location given by address “ad2”, it is copied via REG 32 to the memory 33 at address “ad1”.
[0019]
FIG. 3B shows the “LOAD instructions (1), (2)” and the call instruction for the function “strcpy”, “CALL (4)”, that the source program analysis unit 23 of the compiling device 22 generates for the function “strcpy”. That is, intermediate text is generated so that the two addresses that are the arguments of the “CALL”, ad1 and ad2, are placed in REG1 and REG2 respectively and passed, and the “strcpy” routine is called. Here, intermediate text is an intermediate form of instruction from which the machine instructions of the object program are produced, and corresponds roughly to machine instructions. It additionally carries information needed by later processing stages of the compiling device; one example, needed during optimization, is “path information” indicating from which intermediate text control is passed.
[0020]
Here, “PREFETCH REG2”, indicated by (3), copies into the corresponding data cache the block of data in memory that contains the address of the source character string passed as an argument of the “strcpy” call instruction “CALL (4)”. Because it is known that “strcpy” must load character data from memory into a register, an instruction that copies the data into the fast-access data cache is inserted ahead of the execution of the load instruction.
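The insertion just described can be sketched as a small pass over a list of intermediate texts. This is a simplified illustration only: the `Text` record, the `LOAD_TARGET_ARG` mapping, and the fixed assignment of arguments to REG1 and REG2 are hypothetical, and the prefetch is placed directly before the call rather than at an optimized position.

```python
from dataclasses import dataclass


@dataclass
class Text:
    """One intermediate-text instruction (hypothetical, simplified)."""
    op: str
    args: tuple


# Hypothetical excerpt of the table: function name -> index of the
# argument whose memory is loaded (strcpy reads from its second argument).
LOAD_TARGET_ARG = {"strcpy": 1}

# Argument registers in calling order, as in the example: REG1, REG2.
ARG_REGS = ("REG1", "REG2")


def insert_prefetch(texts):
    """Return a new list with PREFETCH placed just before matching CALLs."""
    out = []
    for text in texts:
        if text.op == "CALL" and text.args[0] in LOAD_TARGET_ARG:
            reg = ARG_REGS[LOAD_TARGET_ARG[text.args[0]]]
            out.append(Text("PREFETCH", (reg,)))
        out.append(text)
    return out


program = [
    Text("LOAD", ("REG1", "ad1")),   # (1) destination address
    Text("LOAD", ("REG2", "ad2")),   # (2) source address
    Text("CALL", ("strcpy",)),       # (4) call
]
for text in insert_prefetch(program):
    print(text.op, *text.args)
```

The printed sequence places PREFETCH REG2 between the second LOAD and the CALL, matching the order (1), (2), (3), (4) of the example.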
[0021]
That is, as shown in FIG. 3C, the character data pointed to by REG2 is copied to the memory pointed to by REG1 by LOAD and STORE instructions using REG3. Ideally, the portion of memory containing the contents accessed by this LOAD instruction has been fully copied into the data cache before the LOAD instruction executes. Even if the copy from memory is still in progress when the LOAD instruction is issued, however, time is certainly saved compared with starting a completely new copy from memory into the data cache at that point. More precisely, when a LOAD instruction is issued, if the memory contents at the specified address are in the data cache, they are loaded from the data cache; if not, the memory contents at the address specified by the LOAD instruction are first moved into the data cache, and the load into the register takes place once that completes. Therefore, if a previously issued PREFETCH instruction has already started the transfer from memory into the data cache, the load waits only for that transfer to complete and then proceeds from the data cache to the register, which shortens the time.
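The time saving described here can be put in numbers with a deliberately simple model, given below as a sketch. It assumes that a transfer from memory takes a fixed number of cycles, that a LOAD to data already in the cache waits 0 cycles, and that nothing else competes for the memory; the function name and the lead times are hypothetical.

```python
def load_wait(memory_delay, lead):
    """Cycles a LOAD still waits when PREFETCH was issued `lead` cycles earlier.

    Simplified model: the transfer takes `memory_delay` cycles, and a
    LOAD to data already in the cache waits 0 cycles.
    """
    return max(0, memory_delay - lead)


MEMORY_DELAY = 100  # tau, the memory figure used earlier in the text
for lead in (0, 40, 100, 150):
    print(lead, load_wait(MEMORY_DELAY, lead))
```

A prefetch issued only 40 cycles ahead still removes 40 cycles of waiting, which corresponds to the case in which the copy is still in progress when the LOAD is issued.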
[0022]
FIG. 4 shows explanatory diagram (part 1) of the intermediate text. It gives an example of a user function definition in source program form. foo() denotes a user-defined function, and “... (1) ...” denotes the statements at its beginning. Next comes an if statement: when the condition holds (“then”), the statements “... (2) ...” are executed; otherwise (“else”), the statements “... (3) ...” are executed. After (2) or (3), “... (4) ...” is executed and the function ends.
[0023]
Explanatory diagram (part 2) of the intermediate text in FIG. 5 shows the structure of the source program of FIG. 4 as a block diagram. The entry/save at 51 at the top and the exit/restore at 56 at the end concern those registers, among the registers used inside the function being defined, that are called global registers: to guarantee their contents even after the function has run, the contents are saved to memory at the entry, and the saved contents are restored to the registers at the exit. In other words, the contents of a global register have a particular meaning at the time each function is called, and by convention are put back when returning to the caller at the function's exit. These “save” and “restore” operations are also invoked as functions.
[0024]
FIG. 6 is explanatory diagram (part 3) of the intermediate text. It shows the intermediate text generated for FIGS. 4 and 5 at the level of assembler instructions. Although not illustrated in detail, there are “clauses” that group several assembler instructions into the unit of a source statement, such as the “then clause” and the “else clause”. Likewise, the “save” at the entry and the “restore” at the exit are each recorded as a single unit of processing, together with information on the kind of processing and the flow of control.
[0025]
To distinguish the intermediate texts (instructions) in the explanation, each is numbered with “#”. The upward arrows shown there are path information pointing to the intermediate text (instruction) from which control is passed to each intermediate text (instruction). This is one of the pieces of information that the source program analysis unit of the compiling device records for later stages; the forward flow of control is recorded in the same way.
[0026]
Looking at the correspondence concretely, the structure is: register save before the function body executes (#1, #2); the branch of the if statement (#3); the then clause (#4, #5, and #6, which branches to the post-processing); the else clause (#7, #8); the post-processing (#9); and the restore at the function exit (#10, #11). The upward arrows indicate the direction that goes back against the runtime flow of control. Arrows from both #4 and #7 point to #3, which corresponds to the two-way split made by evaluating the condition of the if statement. Similarly, #9 receives control at run time from either #6 or #8.
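The path information for this example can be represented as a map from each intermediate text to the texts that pass control to it. The sketch below derives that map from forward edges. The branch edges follow the description above; the straight-line edges, the dictionary layout, and the function name are assumptions made for illustration.

```python
# Forward control flow between intermediate texts #1..#11 of the example.
# The branch edges (#3 -> #4/#7, #6/#8 -> #9) follow the text; the
# straight-line edges are assumed to be simple fall-through.
SUCCESSORS = {
    1: [2], 2: [3], 3: [4, 7],
    4: [5], 5: [6], 6: [9],
    7: [8], 8: [9],
    9: [10], 10: [11], 11: [],
}


def build_path_info(successors):
    """Invert forward edges into backward 'path information'."""
    predecessors = {node: [] for node in successors}
    for node, targets in successors.items():
        for target in targets:
            predecessors[target].append(node)
    return predecessors


PATH_INFO = build_path_info(SUCCESSORS)
print(PATH_INFO[9])                # texts that pass control to #9
print(PATH_INFO[4], PATH_INFO[7])  # both are reached from the if branch #3
```

Text #9 has two entries in its path information, so any backward trace that starts after the if statement has two routes to follow.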
[0027]
FIG. 7 is an explanatory diagram of the prefetch target processing table. The optimization execution unit 24 shown in FIG. 2 receives intermediate text such as that shown in FIGS. 4 to 6 from the source program analysis unit 23. To find the intermediate text for which inserting a data prefetch instruction in advance is effective, it uses this table, which lists the processing names recorded in the intermediate text that serve as keys and which is stored in the computer. As FIG. 7 shows, entries whose type is system function include strcpy (character string copy), strncpy (character string copy with a specified number of characters), strlen (character string length), strcmp (character string comparison), memcpy (memory content copy), and memcmp (memory content comparison). All of these load data in memory into registers and then process it, so their execution time depends heavily on how short the wait for loads from memory can be made. The system functions to be targeted are registered in this table in advance.
[0028]
Similarly, foox, fooy, and fooz, whose type is user-defined function, are user-defined functions that load data in memory into registers and then process it; only functions for which a benefit can be expected are registered in this table.
[0029]
Entries whose type is register restore are registered in several kinds, one per set of restored contents, as building blocks for program development. As described above, when a register that is used in common as a global register to pass contents along is used locally for another purpose, the function restores the contents it saved itself and returns control to the caller. Because a load from memory into a register is therefore always executed, it is desirable to prefetch the memory at the address that will be accessed and copy it into the data cache in advance.
[0030]
Entries whose type is user-specified prefetch are written in the source program and explicitly issue a hardware instruction that requests a prefetch.
For each processing name, the prefetch target processing table gives the argument list, the argument targeted by the load instruction, and the number of retroactive instructions. Locating in the argument list the argument named as the load target gives the address of the memory loaded by that processing, that is, the address of the memory to be prefetched. The number of retroactive instructions indicates how far ahead the prefetch instruction should be executed, reflecting how far into the processing, counted from its start, the load from memory into a register is executed at run time. The value recorded corresponds to the number of instructions to go back so that the prefetch instruction is inserted at that earlier position. Because execution time, measured in machine cycles, differs somewhat from instruction to instruction, the number of instructions to go back is adjusted and set for each function.
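A table of this shape, and the lookup of the address to prefetch, might be sketched as follows. The field names, the `prefetch_operand` helper, and in particular the numbers of retroactive instructions are made up for illustration; the text does not give the actual values recorded in FIG. 7.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    kind: str          # e.g. "system function", "user-defined function"
    args: tuple        # argument list, in call order
    load_arg: str      # argument whose memory the processing loads
    lookback: int      # number of retroactive instructions (made-up values)


PREFETCH_TABLE = {
    "strcpy": Entry("system function", ("dst", "src"), "src", 20),
    "strlen": Entry("system function", ("s",), "s", 10),
    "memcpy": Entry("system function", ("dst", "src", "n"), "src", 20),
}


def prefetch_operand(name, actual_args):
    """Return (address operand, lookback) for a call, or None if unregistered."""
    entry = PREFETCH_TABLE.get(name)
    if entry is None:
        return None
    position = entry.args.index(entry.load_arg)
    return actual_args[position], entry.lookback


print(prefetch_operand("strcpy", ("REG1", "REG2")))
print(prefetch_operand("printf", ("REG1",)))
```

For strcpy the load target is the second argument, so the operand of the prefetch is whatever holds that argument at the call, REG2 in the earlier example; a function that is not registered yields no prefetch.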
[0031]
Next, FIGS. 8 and 9 show explanatory diagrams (part 1) and (part 2) of empty slots in VLIW instructions and the placement of prefetch instructions. They explain the form the present invention takes when implemented on a computer with a VLIW architecture.
[0032]
A computer with the VLIW architecture is characterized by VLIW instructions formed by placing a plurality of simple instructions in a plurality of slots. Naturally, the compiling device used is one that generates programs to be executed on a VLIW-architecture computer.
[0033]
First, as shown in FIG. 8 (a), VLIW instruction and slots, a VLIW instruction has a plurality of slots, here four (slot a to slot d), and the instructions stored in the slots are executed simultaneously.
[0034]
FIG. 8B illustrates how instructions that can be stored together in one VLIW instruction are found by parallelizing the instruction arrangement. (1) shows the sequence of instructions executed in order on a normal computer: an instruction group that loads the data at address A and address B into registers REG1 and REG2, adds them with an ADD instruction to produce the result in register REG3, and stores REG3 to address C with a STORE instruction. In (2), instructions that can be executed in parallel are arranged horizontally, and instructions that must be ordered are arranged vertically. That is, the loads to REG1 and REG2 target separate registers, have no ordering constraint, and can be executed simultaneously, whereas the addition and the store of its result must each wait for the preceding instruction to complete. FIG. 8C shows an example in which the LOAD instructions that can be executed in parallel are stored in one VLIW instruction. Here, "NOP" stored in slot b and slot d means No Operation; it is the instruction code used to fill a slot that must be left empty because no instruction is available to execute in parallel.
[0035]
Next, the arrangement of prefetch instructions is described. In the instruction sequence shown in FIG. 8B, the prefetch instruction is inserted after the LOAD instructions and before the ADD instruction, as shown in (1). The result of inserting it as a VLIW instruction is shown in (2). That is, one VLIW instruction is inserted solely for the PREFETCH instruction, and most of its slots are occupied by the NOP instructions described above. Therefore, in the present invention, when an existing VLIW instruction already contains a NOP instruction, that is, when it has an empty slot, the PREFETCH instruction is placed in that slot. The resulting arrangement is also shown in the figure. This reduces both the number of instructions and the instruction execution waiting time. Conversely, because inserting an additional VLIW instruction carries an extra cost, for example inside a loop, it is also possible to insert a prefetch instruction only when an empty slot exists. This choice is realized by the prefetch instruction insertion mode setting means 4.
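The empty-slot placement and the insertion mode can be sketched as follows. This is a simplified model under stated assumptions: a VLIW instruction is a list of four slot strings, only the VLIW instruction immediately before the insertion point is tried for an empty slot, and the function name and the `only_if_empty_slot` flag are hypothetical stand-ins for the insertion mode setting.

```python
NOP = "NOP"

def place_prefetch(bundles, index, prefetch, only_if_empty_slot=False):
    """Place `prefetch` so that it executes before the VLIW instruction at `index`.

    Assumption: only the VLIW instruction just before `index` is checked for
    an empty (NOP) slot. The input list is not modified; a copy is returned.
    """
    result = [list(b) for b in bundles]
    if index > 0 and NOP in result[index - 1]:
        # Reuse the first empty slot instead of adding a VLIW instruction.
        slot = result[index - 1].index(NOP)
        result[index - 1][slot] = prefetch
        return result
    if only_if_empty_slot:
        # Mode: never pay for an extra VLIW instruction (e.g. inside a loop).
        return result
    width = len(result[0]) if result else 4
    result.insert(index, [prefetch] + [NOP] * (width - 1))
    return result

program = [
    ["LOAD A,REG1", NOP, "LOAD B,REG2", NOP],
    ["ADD REG1,REG2,REG3", NOP, NOP, NOP],
    ["STORE REG3,C", NOP, NOP, NOP],
]
packed = place_prefetch(program, 1, "PREFETCH D")
```

In this sketch the PREFETCH lands in slot b of the LOAD instruction, so the program stays three VLIW instructions long. The address `D` is a placeholder.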
[0036]
FIG. 10 shows the prefetch instruction insertion flowchart. Its input is the intermediate text shown in FIG. 6, which is the output of the source program analysis unit 23 of FIG. 2. The following describes the procedure for finding processes before which inserting a prefetch instruction is effective, generating a prefetch instruction that designates the address of the data in memory to be prefetched, and inserting it at an optimum position.
[0037]
In step S101, it is checked whether the user designated the prefetch option as a compilation mode. In step S102, the intermediate texts are extracted one by one in order. In step S103, it is checked whether the extracted intermediate text is a function call. The function name is expressed as the argument of a "CALL" instruction as shown in FIG. 3B or, although not illustrated here, as the argument of the "pragm" instruction, an extension instruction specified when the type in the prefetch target processing table of FIG. 7 is user-specified prefetch; this name is extracted and examined.
[0038]
In step S104, the extracted function name is compared with the processing names registered in the prefetch target processing table of FIG. 7. If it is not registered in the table, inserting a prefetch instruction is assumed to have no effect, so nothing is done, and the process proceeds to step S107 and then to the next intermediate text.
[0039]
If it is in the table, in step S105 the address to be designated in the prefetch instruction is determined from the argument list of the function that was found and its load-instruction target argument, and the PREFETCH text is generated. In step S106, the placement of the generated PREFETCH text is optimized, as described later: the insertion position is determined and the PREFETCH text is inserted. In step S107, it is checked whether all intermediate text has been processed, and if so the process ends.
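Steps S101 to S107 can be sketched as a single pass over the intermediate text. This is a minimal illustration under several assumptions: each intermediate text is modeled as an `(op, name, args)` tuple, the table maps a processing name to `(argument list, load target argument, retroactive count)`, and step S106 is reduced to placing the PREFETCH text immediately before the call rather than at the optimized position. The function name `copy_block` is hypothetical.

```python
def insert_prefetches(texts, table, prefetch_option=True):
    """Return a new intermediate-text list with PREFETCH texts inserted."""
    if not prefetch_option:                      # S101: option not designated
        return list(texts)
    out = []
    for op, name, args in texts:                 # S102: take texts in order
        if op in ("CALL", "pragm"):              # S103: is it a function call?
            entry = table.get(name)              # S104: look up the name
            if entry is not None:
                arg_names, load_arg, _retro = entry
                # S105: the address is the actual argument at the position
                # of the load-instruction target argument.
                address = args[arg_names.index(load_arg)]
                # S106 (simplified): insert directly before the call.
                out.append(("PREFETCH", address, ()))
        out.append((op, name, args))
    return out                                   # S107: all texts processed

table = {"copy_block": (("dst", "src"), "src", 6)}
texts = [
    ("LOAD", "A", ()),
    ("CALL", "copy_block", ("X", "Y")),
    ("CALL", "other", ("Z",)),
]
result = insert_prefetches(texts, table)
```

Only the call to `copy_block` receives a PREFETCH text, because `other` is not registered in the table.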
[0040]
Finally, optimization of the prefetch instruction arrangement is described with reference to FIGS. 11 and 12, which show the prefetch instruction insertion position optimization flowchart (part 1) and (part 2). The flowchart (part 2) is called from the flowchart (part 1) and shows the operation of the insertion position detection program.
[0041]
The prefetch instruction insertion position optimization flowchart (part 1) is the program that is called and started when, as described above, a function before which a prefetch instruction should be inserted has been detected and the PREFETCH text to be inserted has been generated and is to be placed. In step S111, a counter initialized to 0 for counting positions in the intermediate text, the text position of the insertion target, and the number of retroactive instructions given for that function in the table of FIG. 7 are set as arguments, and in step S112 the PREFETCH text insertion position detection program shown in FIG. 12 is called. As shown in FIG. 6, the insertion position is found by following the path information indicated by ↑, of which there may be several, so one or more prefetch instructions may be placed. The PREFETCH text insertion position detection program therefore returns one or more pieces of position information.
[0042]
In step S113, the generated PREFETCH text is placed at each of the one or more insertion positions that were stored and returned.
The flowchart of FIG. 12 is described next. It is called when the prefetch instruction is to be placed and shows the processing procedure of the PREFETCH text insertion position detection program. This program traces back along the path information indicated by ↑ in FIG. 6 and stores, as the optimum placement position, the position reached after going back by the number of retroactive instructions given as an argument. Some intermediate texts along the way have multiple pieces of path information. To be able to return to the nearest such text and follow another path, the program takes a recursive form, calling itself in step S126 as described later.
[0043]
In step S121, the text position and the counter given as arguments are received. In step S122, the text position is advanced by one and 1 is added to the counter. In step S123, it is checked whether the counter has reached the number of retroactive instructions, that is, whether the position where the prefetch instruction should be placed has been reached. If it has, in step S127 the current text position is added to the stored insertion positions of the PREFETCH text, and control returns to the caller.
[0044]
In step S124, it is checked whether the current intermediate text has unprocessed path information. If not, the program exits and returns to the caller.
If there is unprocessed path information, one piece is extracted in step S125, and in step S126 the program calls itself, the PREFETCH text insertion position detection program, with the current text position and the counter as arguments, so as to follow the extracted path. Thus, each time a position satisfying the condition of step S123 is found along a branched path, it is added in step S127 to the stored insertion positions of the PREFETCH text. The insertion positions stored in this way, one each time a target position is found along a path, are retrieved in step S113 after control returns from the first call in FIG. 11, and the prefetch instructions are placed according to them. On return from the call in step S126, the next branching path is searched for in step S124.
[0045]
Although not shown in the flowchart, when the additional information recorded with the path information of the intermediate text shows that the trace cannot go back any further, for example at the head of a function definition or when the intermediate text is a function call, the counter is treated as having reached the number of retroactive instructions, and step S123 transfers control to step S127.
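The recursive trace of FIG. 12, including the stop rule just described, can be sketched as follows. This is an illustration under assumptions, not the flowchart itself: path information is modeled as a dictionary from a text position to the positions that can execute immediately before it, `barriers` is a hypothetical set of positions (function heads or function calls) past which the trace cannot go, and the step numbers in the comments are an approximate mapping.

```python
def find_insert_positions(preds, start, retro_count, barriers=frozenset()):
    """Trace back from `start` and return the PREFETCH insertion positions.

    preds:       position -> list of predecessor positions (path information)
    retro_count: number of retroactive instructions from the table
    barriers:    positions at which the trace must stop early
    """
    found = []

    def trace(pos, counter):
        # Stop rule not shown in the flowchart: no further path, or a barrier.
        stuck = not preds.get(pos) or (counter > 0 and pos in barriers)
        if counter >= retro_count or stuck:       # S123
            if pos not in found:
                found.append(pos)                 # S127: store the position
            return
        for prev in preds[pos]:                   # S124/S125: each path
            trace(prev, counter + 1)              # S126 with S122: one step back

    trace(start, 0)
    return found

# Diamond-shaped control flow: 0 -> 1 -> {2, 3} -> 4 -> 5 (the target call).
preds = {5: [4], 4: [2, 3], 2: [1], 3: [1], 1: [0], 0: []}
positions = find_insert_positions(preds, 5, 2)
```

With a retroactive count of 2, the trace ends on both branches of the diamond, so two insertion positions are returned, one per path. Because the counter grows on every step and the trace stops when it reaches the retroactive count, the recursion also terminates when the path information contains loops.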
[0046]
In this way, for each call of a function before which a prefetch instruction should be inserted, an effective prefetch instruction can be generated for the intermediate text and inserted at an optimum position by tracing the paths that may be taken at execution time.
[0047]
Even when, as in a user-defined function with few steps, the number of instructions that can be traced back falls short of the number of retroactive instructions in the table of FIG. 7 and the placement is not fully optimized, the waiting time for completion of the copy from memory to the data cache can still be expected to shrink, as described above. In the extreme case, some effect is obtained even if the prefetch instruction is placed immediately before the target function.
[0048]
In addition, the usual way to improve execution performance is for the program author to write a prefetch instruction at an appropriate position in the source program when the program is created. This can also be incorporated through the user-specified prefetch type in the prefetch target processing table of FIG. 7.
[0049]
[Effect of the invention]
As is clear from the above description, according to the present invention, the structural characteristics of the program can be analyzed at the compilation stage and the necessary memory contents copied into the data cache ahead of the load instruction. This yields a significant industrial effect: the instruction execution performance of the computer improves because the data cache hit rate rises or the waiting time for copying from memory to the data cache falls.
[Brief description of the drawings]
FIG. 1 is a block diagram of the present invention.
FIG. 2 is a configuration diagram of an embodiment of the present invention.
FIG. 3 is an explanatory diagram of a data prefetch instruction.
FIG. 4 is an explanatory diagram of intermediate text (part 1).
FIG. 5 is an explanatory diagram of intermediate text (part 2).
FIG. 6 is an explanatory diagram of intermediate text (part 3).
FIG. 7 is an explanatory diagram of a prefetch target processing table.
FIG. 8 is an explanatory diagram of empty slots of VLIW instructions and prefetch instruction arrangement (part 1).
FIG. 9 is an explanatory diagram of empty slots of VLIW instructions and prefetch instruction arrangement (part 2).
FIG. 10 is a flowchart for inserting a prefetch instruction.
FIG. 11 is a flowchart of optimizing the prefetch instruction insertion position (part 1).
FIG. 12 is a flowchart of optimizing the prefetch instruction insertion position (part 2).
[Explanation of symbols]
1 Intermediate text search means
2 Prefetch instruction insertion means
3 Prefetch instruction insertion position optimization means
4 Prefetch instruction insertion mode setting means

Claims

A compiling device that optimizes intermediate text, obtained by converting a source program and stored, by inserting a prefetch instruction from memory into a data cache at an execution order position preceding a corresponding load processing instruction from memory to a register, the compiling device comprising:
a prefetch target processing table that stores, for one or more source program instructions set in advance, in association with the processing name recorded in the intermediate text when the source program instruction is converted, a number of retroactive instructions that determines, by counting back along the execution path information stored in the intermediate text, the execution order position at which to insert a prefetch instruction to be executed prior to the corresponding processing;
intermediate text search means for searching, for each processing instruction of the intermediate text converted from the source program and stored, whether its processing name matches any of the processing names stored in the prefetch target processing table, and for temporarily storing the number of retroactive instructions stored in the prefetch target processing table for the matching processing name;
prefetch instruction insertion position optimization means for determining the execution order position at which to insert the prefetch instruction by tracing the control flow of the program backward, along the execution path information stored in the intermediate text, from the execution order position of the processing instruction of the intermediate text whose processing name matched in the search, based on the temporarily stored number of retroactive instructions; and
prefetch instruction insertion means for inserting, at the determined execution order position, a data cache prefetch instruction designating the address of the memory to be loaded by the processing instruction of the intermediate text whose processing name matched in the search.

The compiling device according to claim 1, wherein the source program instruction preset in the prefetch target processing table is a system function that specifies, as an argument, the address of data to be loaded from memory to a register.

The compiling device according to claim 1, wherein the source program instruction preset in the prefetch target processing table is a user-defined function that specifies, as an argument, the address of data to be loaded from memory to a register.

The compiling device according to claim 1, wherein the source program instruction preset in the prefetch target processing table is a register restoration function, in a program describing a procedure and a user-defined function, that specifies as an argument the address of data to be loaded from memory.

A compiling method for optimizing intermediate text, obtained by converting a source program and stored, by inserting a prefetch instruction from memory into a data cache at an execution order position preceding a corresponding load processing instruction from memory to a register, the compiling method comprising:
a step in which storage means stores in a prefetch target processing table, for one or more source program instructions set in advance, in association with the processing name recorded in the intermediate text when the source program instruction is converted, a number of retroactive instructions that determines, by counting back along the execution path information stored in the intermediate text, the execution order position at which to insert a prefetch instruction to be executed prior to the corresponding processing;
an intermediate text search step in which search means searches, for each processing instruction of the intermediate text converted from the source program and stored, whether its processing name matches any of the processing names stored in the prefetch target processing table, and storage means temporarily stores the number of retroactive instructions stored in the prefetch target processing table for the matching processing name;
a prefetch instruction insertion position optimizing step in which determining means determines the execution order position at which to insert the prefetch instruction by tracing the control flow of the program backward, along the execution path information stored in the intermediate text, from the execution order position of the processing instruction of the intermediate text whose processing name matched in the search, based on the temporarily stored number of retroactive instructions; and
a prefetch instruction insertion step in which insertion means inserts, at the determined execution order position, a data cache prefetch instruction designating the address of the memory to be loaded by the processing instruction of the intermediate text whose processing name matched in the search.

A computer-readable recording medium on which is recorded a compiler program that optimizes intermediate text, obtained by converting a source program and stored, by inserting a prefetch instruction from memory into a data cache at an execution order position preceding a corresponding load processing instruction from memory to a register, the program causing a computer to function as:
means for storing in a prefetch target processing table, for one or more source program instructions set in advance, in association with the processing name recorded in the intermediate text when the source program instruction is converted, a number of retroactive instructions that determines, by counting back along the execution path information stored in the intermediate text, the execution order position at which to insert a prefetch instruction to be executed prior to the corresponding processing;
intermediate text search means for searching, for each processing instruction of the intermediate text converted from the source program and stored, whether its processing name matches any of the processing names stored in the prefetch target processing table, and for temporarily storing the number of retroactive instructions stored in the prefetch target processing table for the matching processing name;
prefetch instruction insertion position optimization means for determining the execution order position at which to insert the prefetch instruction by tracing the control flow of the program backward, along the execution path information stored in the intermediate text, from the execution order position of the processing instruction of the intermediate text whose processing name matched in the search, based on the temporarily stored number of retroactive instructions; and
prefetch instruction insertion means for inserting, at the determined execution order position, a data cache prefetch instruction designating the address of the memory to be loaded by the processing instruction of the intermediate text whose processing name matched in the search.