JP3693503B2

JP3693503B2 - Processor with instruction cache write mechanism

Info

Publication number: JP3693503B2
Application number: JP20026698A
Authority: JP
Inventors: 博泰西山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1998-07-15
Filing date: 1998-07-15
Publication date: 2005-09-07
Anticipated expiration: 2018-07-15
Also published as: JP2000029787A

Description

【０００１】
【発明の属する技術分野】
本発明は、プロセッサに関し、特に、実行時に動的に機械語命令プログラムを生成して実行を行う仮想マシンインタプリタ等を実行するのに適したマイクロプロセッサに関する。
【０００２】
【従来の技術】
従来のマイクロプロセッサでは、主記憶（メインメモリ）とＣＰＵとの間に、高速な中間記憶装置として、キャッシュを設けることが多い。このキャッシュは、その構成上、データキャッシュと命令キャッシュを分離した非統合型キャッシュと、データキャッシュと命令キャッシュを統合した統合型キャッシュとに分類することができる。前者の非統合型キャッシュは、後者の統合型キャッシュと比較して、データキャッシュと命令キャッシュのそれぞれに、独立に高いデータ供給バンド幅を与えることができるので、一般に高い性能を得ることができる。このため、ハイエンドのマイクロプロセッサでは、非統合型のキャッシュが利用されることが多い。
【０００３】
一方、このようなマイクロプロセッサで実行されるプログラムを記述するプログラミング言語として、Ｊａｖａ（Ｊａｖａは、米国 Sun Microsystems, Inc.の商標）やＳｍａｌｌｔａｌｋといったものが知られている。これらのプログラミング言語で記述されたプログラムは、バイトコードと呼ばれる仮想的な計算機の命令に一旦変換され、バイトコードは、通常、仮想マシンインタプリタと呼ばれるソフトウェアによって実行される。
【０００４】
しかし、このような仮想マシンインタプリタによる実行は、仮想マシン命令の解釈実行などのオーバヘッドが大きい。従って、ＪａｖａやＳｍａｌｌｔａｌｋで記述されたプログラムは、ＣやＦＯＲＴＲＡＮといった従来のコンパイラ型言語で記述された場合に比べて、一般に、その実行速度は低速になる。これに対して、例えば、L.P. Deutsch と A.M. Chiffmanによる ”Efficient Implementation of the Smalltalk-80 System”（In Proceedings of the 11th Annual ACM Symposium on Principles of Programming Languages pp.297-302, 1984）に示されているように、実行対象の仮想マシン命令列に対して、それと等価な処理を行う機械語命令を実行時に生成し、生成した機械語命令を直接実行することでインタプリタ実行の高速化を行う手法が用いられる。このような手法は、機械語命令を実行時に動的に生成することから動的コンパイル、または実行を行おうとした時点でコンパイルを行うことからＪＩＴ（ジャストインタイム：Just-In-Time）コンパイルと呼ばれ、Ｊａｖａ仮想マシンの高速化などに利用されている。
【０００５】
【発明が解決しようとする課題】
このような実行時に命令を動的に生成するＪＩＴコンパイル技術は、仮想マシンインタプリタなどの高速化には有効であるが、非統合型のキャッシュ機構を採用したマイクロプロセッサで利用することを考えた場合、実行時に生成された命令を、実行時に主記憶へ書き込むことが必要になることから、次のような問題が生じることになる。
【０００６】
動的に生成された命令の主記憶への書き込みは、プロセッサから見た場合、通常のデータの書き込みと変わらないので、非統合型キャッシュを採用したプロセッサでは、データキャッシュを介して行われることになる。このため、生成された命令をデータキャッシュを介してメモリへ書き込んだ直後に、書き込んだ当該命令を実行しようとしても、命令キャッシュとデータキャッシュの整合性が保たれていないので、そのまま実行することができない。すなわち、命令キャッシュの更新が行われていないために、そのままでは予期しない命令を実行してしまう可能性がある。そこで、ほとんどのマイクロプロセッサでは、指定したデータキャッシュ上のデータを主記憶に書き戻したり、命令キャッシュ上のキャッシュブロックを無効化するための命令を備えている。そして、主記憶へ書き込んだ命令を実行する前に、このような命令を使って、書き込んだアドレスに対応する命令キャッシュ上のデータを破棄（無効化）するとともに、ライトバック（write-back）型のキャッシュであれば、変更が主記憶に反映されるように、書き込んだアドレスに対応するデータキャッシュ上のデータを主記憶に書き戻す。
【０００７】
つまり、動的な命令の生成・実行処理は、以下のような流れで行われる。
【０００８】
１．書き込み対象アドレス（以下、アドレスＡという）に対応するデータキャッシュのキャッシュブロックへの、動的に生成された機械語命令の書き込み
２. アドレスＡに対応する命令キャッシュのキャッシュブロックの破棄
３. アドレスＡに対応するデータキャッシュのキャッシュブロックの主記憶への書き戻し（ライトバック型のキャッシュの場合）
４. 主記憶のアドレスＡから命令キャッシュの対応するキャッシュブロックへの読み出し
５．命令キャッシュからの命令の読み出し及び実行
以上述べたように、非統合型キャッシュを採用したマイクロプロセッサにおいて、実行時に動的に命令を生成し、当該生成された命令を実行しようとすると、動的に生成された命令は、データキャッシュに書き込まれた後、主記憶へ書き出され、その後、命令キャッシュへ読み出す必要があるため、主記憶への書き込みや読み出しを伴うことになり、このことは、実行速度の低下要因となりうる。
【０００９】
特に、プロセッサ内での命令レベル並列度や動作周波数が向上し、主記憶参照に要する時間で、多数の命令を処理可能なハイエンドマイクロプロセッサでは、主記憶参照に要するオーバヘッドにより動的な命令生成処理を行うプログラムの動作性能が低下してしまうといった問題が深刻化するおそれがある。
【００１０】
なお、ＣやＦＯＲＴＲＡＮといった従来の静的なコンパイラ型言語では、機械語命令への変換がプログラム実行前にあらかじめ行われ、実行時に動的に命令を生成することがないため、分離型のキャッシュ構成であっても、大きな問題とはならなかった。
本発明の目的は、仮想マシンインタプリタにおけるＪＩＴコンパイルのような、実行時に動的に機械語命令の生成を行う処理において、動的な命令生成時の主記憶参照によって生じる性能低下を低く押えることができるプロセッサを提供することにある。
【００１１】
【課題を解決するための手段】
本発明に係るプロセッサは、命令セットの中に、所定のメモリ領域毎に、当該領域が、データキャッシュを介して書き込みを行う領域か、命令キャッシュを介して書き込みを行う領域かを指定することができる命令を有することを特徴とする。この場合、前記命令によって命令キャッシュを介して書き込みを行う領域として指定されたメモリ領域に対して、書き込みが指示された場合は、命令キャッシュを介して書き込みを行う。
【００１２】
また、本発明に係る第２のプロセッサは、データキャッシュを介して書き込みを行うモードか、命令キャッシュを介して書き込みを行うモードであるかを表すフラグを備え、当該フラグが命令キャッシュを介して書き込みを行うモードであることを示している場合は、命令キャッシュを介して書き込みを行うことを特徴とする。この場合、命令セットの中に、前記フラグを操作する命令を有することが好ましい。
【００１３】
また、本発明に係る第３のプロセッサは、命令セットの中に、命令キャッシュを介してメモリへの書き込みを行うことを指示する命令を有することを特徴とする。
【００１４】
なお、上記第１から第３のプロセッサにおいて、命令キャッシュを介した書き込みが指示された場合、命令キャッシュとデータキャッシュの両方に書き込みを行うようにしてもよい。このようにすれば、データキャッシュに命令キャッシュの更新データのコピーが存在することになるので、命令キャッシュからメモリへの直接的な書き戻し経路を設けることなく、命令キャッシュにおいてキャッシュ溢れが生じた場合の更新データの主記憶への書き戻しが可能になる。
【００１５】
更に、本発明に係る動的コンパイルの方法は、前述の第１から第３のプロセッサを使った動的コンパイルの方法であって、動的に生成した命令をメモリへ書き込む場合、命令キャッシュを介して書き込むようにすることを特徴とする。
【００１６】
本発明によるプロセッサでは、命令キャッシュを介してメモリへの書き込みを行うことができるので、動的コンパイルによって、動的に生成した命令をメモリへ書き込む場合、命令キャッシュを介して書き込むことができ、主記憶参照の必要性が減り、仮想マシンインタプリタにおけるＪＩＴコンパイルのような、実行時に動的に機械語命令の生成を行う処理において、動的な命令生成時の主記憶参照によって生じる性能低下を低く押えることができる
【００１７】
【発明の実施の形態】
以下、本発明の実施の形態について、図面を参照しつつ、詳細に説明する。
【００１８】
図９は、本発明を実施する計算機システムの例を示す図である。仮想マシンインタプリタにおけるＪＩＴコンパイラなど実行時に命令生成処理を行うソフトウェアは、ディスク装置２０４から主記憶２０３に読み出され、プロセッサ２０１によって実行される。このソフトウェアにより生成される機械語命令は、高速な中間記憶装置であるキャッシュ２０２を介して主記憶２０３とプロセッサ２０１の間で読み書きされる。なお、ここでは、簡単のためメモリ階層をキャッシュと主記憶の２階層としているが、本発明は、複数のキャッシュ階層を持つマイクロプロセッサに対して適用することも可能である。
【００１９】
図１は、本発明による非統合型キャッシュを採用したマイクロプロセッサの動作概要を説明する図である。本発明と対比するため、まず、非統合型キャッシュを採用した従来のマイクロプロセッサの動作について同図を使って説明する。非統合型キャッシュを採用した従来のマイクロプロセッサでは、ＣＰＵ１０１が実行する命令は、主記憶１０４から命令キャッシュ１０２へ読み出され、ＣＰＵ１０１へと受け渡される。このようにしてフェッチされた命令は、メモリユニット１０７を介して、命令デコーダ１０８に渡される。更に、当該命令は、命令デコーダ１０８でデコードされて、当該デコード結果に基づいて、ＡＬＵ１０５やレジスタ１０６が制御されて、命令の実行が行われる。頻繁に実行される命令は、高速な中間記憶装置である命令キャッシュ１０２に保存されることにより、低速な主記憶１０４を参照する必要がなくなるため、高速な実行を行うことが可能になる。
【００２０】
図１において、破線は、従来のマイクロプロセッサにおいて、動的命令生成処理での動的に生成された命令の流れを表す。また、実線は本発明における動的に生成された命令の流れを表している。なお、ここでの説明では、簡単のため、まず、キャッシュブロックの競合や容量不足による溢れが生じない場合について説明し、キャッシュブロックの溢れが発生する場合については後述する。
【００２１】
同図の破線で示すように、従来の動的命令生成処理においては、ＡＬＵ１０５によって実行時に動的に生成された命令は、レジスタ１０６に格納された後、メモリユニット１０７を介して、データキャッシュ１０３へ書き込まれる。データキャッシュ１０３がライトバック方式を採用している場合、この状態では、実行すべき命令がデータキャッシュ１０３上にのみ存在するので、当該命令が書き込まれたキャッシュブロックを主記憶１０４へ書き戻す。また、命令キャッシュ１０２上に当該書き込みを行ったアドレスに対応するキャッシュブロックが存在する可能性があるので、存在する場合には命令キャッシュ１０２上の当該ブロックを無効化する。これによって、動的に生成された命令の書き込みを行ったアドレスに対応するキャッシュブロックは、命令キャッシュ１０２上に存在しなくなるので、生成された命令を実行しようとした時点で、主記憶１０４から命令キャッシュ１０２上に命令が読み出され、メモリユニット１０７を介して命令デコーダ１０８へと受け渡され、デコード結果に従って、ＣＰＵ１０１での命令実行が行われる。
【００２２】
これに対して、本発明における動的命令生成処理での生成命令の流れでは、ＡＬＵ１０５が動的に生成した命令は、レジスタ１０６に格納された後、メモリユニット１０７を介して、命令キャッシュ１０２に直接書き込まれる。そして、生成された命令を実行する場合には、命令キャッシュ１０２上に生成された命令が存在するため、メモリユニット１０７は、命令キャッシュ１０２から実行すべき命令を読み出し、読み出した命令を命令デコーダ１０８へ渡す。
【００２３】
以上説明したように、従来のマイクロプロセッサでは、キャッシュ容量の余裕の有無に関わらず、データキャッシュ１０３に書き込んだ命令を一度高速なキャッシュメモリから低速な主記憶１０４へ書き込んで、再度命令キャッシュ１０２を介してＣＰＵ１０１へ読み出さねばならなかった。これに対して、本発明によるマイクロプロセッサでは、主記憶１０４を介することなく直接命令キャッシュ１０２上に書き込んだ命令を実行することができるので、動的に生成された命令の実行を高速化することができる。
【００２４】
以上の説明では、キャッシュブロックの競合や容量不足によるキャッシュブロックの追い出しを考慮していなかった。実際には、キャッシュ容量は限られており、キャッシュブロックの競合は生じるため、追い出し時の処理を考慮する必要がある。そこで、以下ではキャッシュ追い出し時の処理を考慮した命令書き込み処理について説明する。
【００２５】
キャッシュへの書き込み処理において、競合等によりキャッシュブロックを追い出す場合の処理は、キャッシュがライトスルー（write-through）方式を採用しているか、ライトバック方式を採用しているかによってその処理が異なる。
【００２６】
ライトスルー方式の場合には、キャッシュへの書き込みと同時に主記憶への書き込みも行われているため、追い出し対象のキャッシュブロックを単に無効化するだけで良い。すなわち、本発明による命令キャッシュをライトスルー方式のキャッシュとして構成した場合は、命令キャッシュと主記憶の両方に、書き込み対象の命令が書き込まれるので、主記憶上に最新のデータが保存されていることになり、キャッシュブロックの追い出し時にはキャッシュブロックの内容を破棄すれば良い。
【００２７】
一方、ライトバック方式の場合には、最新のデータはキャッシュ上のみに存在するので、更新データを含むキャッシュブロックの追い出し時には主記憶へのデータの書き戻し処理を行う必要がある。従来のプロセッサでは、ＣＰＵからキャッシュへのデータの書き込みはデータキャッシュに対してのみ行われていたため、キャッシュから主記憶へのデータの書き戻し処理はデータキャッシュだけでしか行われていなかった。本発明によるプロセッサでは、命令キャッシュに対する書き込みも行うので、ライトバック方式を採用した場合、命令キャッシュからの主記憶へのデータの書き戻し処理を考慮する必要がある。このような命令キャッシュからの主記憶へのデータの書き戻し処理には、命令キャッシュから直接主記憶への書き込みを可能とする方式と、従来通り主記憶への書き込みはデータキャッシュのみから行うようにする方式とが考えられる。以下では、各方式についてその実現例を示す。
【００２８】
図２は、命令キャッシュから主記憶へキャッシュブロックを書き込み可能なようにアーキテクチャを拡張した場合の、動的に生成された命令の流れを表している。図２において、ＡＬＵ１０５によって動的に生成され、レジスタ１０６に格納された命令は、メモリユニット１０７を介して命令キャッシュ１０２に書き込まれる。ここで、書き込み対象となるキャッシュブロック３０１に書き込みアドレスとは異なるアドレスに対応する命令が既に保持されている場合、そのキャッシュブロックを利用可能とする必要があるが、当該キャッシュブロック３０１の更新状態によってその処理が異なる。キャッシュブロック３０１が書き込みが行われていない状態（クリーン：clean）であれば、主記憶１０４上の内容と命令キャッシュ１０２上の内容は一致しているので、主記憶１０４への書き戻しは行わず、命令キャッシュ１０２上のキャッシュブロック３０１のデータを破棄する。一方、キャッシュブロック３０１が書き込みが行われた状態（ダーティ：dirty）であれば、命令キャッシュ１０２上のみに最新の命令が存在するので、まず、キャッシュブロック３０１を対応する主記憶上の領域３０２へ書き戻す。
【００２９】
命令キャッシュ１０２上の元のデータの破棄または書き戻しが終了すると、動的に生成された命令の書き込み対象アドレスに対応した主記憶１０４上のデータブロック３０３をキャッシュブロック３０１へ読み込んで、メモリユニット１０７から受け渡された書き込み対象命令の書き込みを行う。これらの動作は、通常のライトバック型データキャッシュの動作と同様である。
【００３０】
図３は、図２に示した命令キャッシュから主記憶へのキャッシュブロックの書き戻し機構を追加した場合の主記憶への書き込み処理の流れを示す図である。同図に示すように、処理を開始すると（Ｓ４０１）、まず、書き込みが命令キャッシュを介した書き込みか否かを判別する（Ｓ４０２）。あるデータ書き込み命令がデータキャッシュを介した書き込みか、命令キャッシュを介した書き込みかを指定・判断する方法については、後述する。
【００３１】
判別の結果、当該書き込みがデータキャッシュを介したものであれば（Ｓ４０２：ＮＯ）、従来と同じデータキャッシュを介した書き込み処理を行い（Ｓ４０３）、処理を終了する（Ｓ４０８）。一方、命令キャッシュを介した書き込みであれば（Ｓ４０２：ＹＥＳ）、書き込み対象アドレスに対応するキャッシュブロックが命令キャッシュ上に存在するか否かを調べる（Ｓ４０４）。その結果、書き込み対象アドレスに対応するキャッシュブロックが命令キャッシュ上に存在しない場合は（Ｓ４０４：ＮＯ）、続けて、データキャッシュに書き込み対象アドレスに対応するキャッシュブロックが存在するか否かを調べ、そのようなキャッシュブロックが存在し、データキャッシュの当該キャッシュブロックがダーティ状態であれば当該ブロックを主記憶に書き出し（Ｓ４０５）、書き込み対象のアドレスに対応したブロックを命令キャッシュに読み出す（Ｓ４０６）。これにより、命令キャッシュ上に書き込み対象アドレスのキャッシュブロックが確保されるので、命令キャッシュの当該ブロックに動的に生成された命令を書き込み（Ｓ４０７）、処理を終える（Ｓ４０８）。一方、書き込み対象アドレスに対応するキャッシュブロックが命令キャッシュ上に存在する場合は（Ｓ４０４：ＹＥＳ）、命令キャッシュ上に書き込み対象アドレスのキャッシュブロックが既に確保されているので、そのまま、命令キャッシュに命令を書き込み（Ｓ４０７）、処理を終える（Ｓ４０８）。
【００３２】
なお、命令書き込み対象アドレスに対応するキャッシュブロックがデータキャッシュ上に存在する場合は、命令キャッシュとの整合性を保つため、該当するデータキャッシュのデータブロックを破棄（無効化）するか、又は、命令キャッシュとともに、データキャッシュにも命令を書き込むようにすればよい。
【００３３】
図４は、命令キャッシュ１０２から主記憶１０４へ直接書き込みを行う機構を有しない場合の、動的に生成された命令の流れを表している。この場合、命令キャッシュから主記憶への直接的な書き戻し経路を設ける必要がないので、図２に示したものに比べて、従来のマイクロプロセッサに対するハードウェア的な変更量が少なくてすむ。
【００３４】
図４に示すように、ＡＬＵ１０５によって動的に生成され、レジスタ１０６に格納された命令は、メモリユニット１０７を介して命令キャッシュ１０２と、データキャッシュ１０３に書き込まれる。すなわち、データキャッシュ１０３上にも当該命令のコピーを持つこととする。
【００３５】
このようにデータキャッシュ１０３上にも当該命令（に対応するキャッシュブロック）のコピーを持つのは、本実施形態では、命令キャッシュ１０２から主記憶１０４へのキャッシュブロックの書き戻しパスが存在しないため、命令キャッシュ１０２に対して動的に生成された命令の書き込みを行う際に、キャッシュ溢れが生じ、主記憶への書き戻しが必要になった場合、当該書き戻しをできるようにするためである。
【００３６】
命令キャッシュ１０２への書き込みを行う際、キャッシュ溢れが生じ、主記憶への書き戻しが必要になった場合は、書き戻し対象となった命令キャッシュ上のブロック５０１を破棄するとともに、対応するデータキャッシュ１０３上のブロック５０２の更新状況を調べて、ダーティであれば、対応する主記憶１０４上の領域５０４に書き戻す。次に、書き込み対象アドレスに対応するブロック５０３を主記憶１０４から読み出し、命令キャッシュ１０２とデータキャッシュ１０３の両方に読み込む。そして、書き込み対象の命令を命令キャッシュ１０２上のブロック５０１と、データキャッシュ１０３上のブロック５０２へ書き込む。
【００３７】
なお、命令キャッシュ１０２上のデータに対応して、データキャッシュ１０３上に確保されているキャッシュブロックは、データキャッシュへのデータの書き込みによる競合によって主記憶へ書き戻されることがあるが、その場合には、命令キャッシュ１０２上のキャッシュブロックの内容と主記憶１０４上のデータは一致しているので、特別な操作を行う必要はない。
【００３８】
図５は、図４に示した命令キャッシュから主記憶へのキャッシュブロックの書き戻し機構を追加しない場合の主記憶への書き込み処理の流れを示す図である。同図に示すように、主記憶への書き込みを行う場合（Ｓ６０１）、まず、命令キャッシュを介した書き込みか否かのチェックを行う（Ｓ６０２）。書き込みが命令キャッシュを介した書き込みか否かの指定・判定方法については後述する。
【００３９】
チェックの結果、書き込みが、命令キャッシュを介した書き込みでなければ（Ｓ６０２：ＮＯ）、通常のデータキャッシュを介した書き込み処理を行い（Ｓ６０３）、書き込み処理を終了する（Ｓ６１０）。
【００４０】
一方、命令キャッシュを介した書き込みであれば（Ｓ６０２：ＹＥＳ）、まず、データキャッシュに書き込み対象アドレスに対応するブロックが存在するか否かを調べる（Ｓ６０４）。その結果、対応するブロックがデータキャッシュに存在しなければ（Ｓ６０４：ＮＯ）、データキャッシュに当該ブロックを読み出す（Ｓ６０５）。更に、命令キャッシュに書き込み対象アドレスに対応するブロックが存在するか否かを調べる（Ｓ６０６）。
【００４１】
その結果、命令キャッシュに当該ブロックが存在しなければ（Ｓ６０６：ＮＯ）、続けて、データキャッシュの当該ブロックがダーティであるか否かを調べて、ダーティであれば主記憶に当該ブロックを書き戻す（Ｓ６０７）。次に、書き込み対象アドレスに対応するブロックを主記憶から命令キャッシュへ読み出す（Ｓ６０８）。これにより、主記憶、データキャッシュ、命令キャッシュの内容が一致するので、データキャッシュと命令キャッシュに書き込み対象の命令を書き込み（Ｓ６０９）、処理を終了する（Ｓ６１０）。
【００４２】
一方、命令キャッシュに当該ブロックが存在すれば（Ｓ６０６：ＹＥＳ）、データキャッシュと命令キャッシュの両方にブロックが存在することになるので、データキャッシュと命令キャッシュに書き込み対象の命令を書き込み（Ｓ６０９）、処理を終了する（Ｓ６１０）。
【００４３】
なお、上述した処理フローでは、データキャッシュに書き込み対象アドレスに対応するブロックが存在しなければ、主記憶からデータキャッシュへの読み出しを行っているが（Ｓ６０５）、命令キャッシュとデータキャッシュの間でデータの転送が可能であれば、命令キャッシュに対応するデータが存在する場合は、主記憶との間の転送は省略することができる。同様に、書き込み対象アドレスに対応するブロックを主記憶から命令キャッシュへ読み出しているが（Ｓ６０８）、命令キャッシュとデータキャッシュの間でデータの転送が可能で、データキャッシュに対応するデータが存在する場合は、主記憶との間の転送は省略し、命令キャッシュからコピーするように最適化することができる。
【００４４】
次に、あるデータ書き込みがデータキャッシュを介した書き込みか、命令キャッシュを介した書き込みかを指定・判別する方法について説明する。このような方法として、例えば、以下の（ａ）〜（ｃ）に示すようなものが考えられる。
【００４５】
（ａ）ページなど所定のメモリ領域毎に、命令キャッシュ及びデータキャッシュのいずれを介して書き込みを行う領域かを指定できるようにし、書き込み対象のアドレスによって判別する方法。
【００４６】
（ｂ）ＣＰＵに内部フラグを設け、フラグの設定によって、命令キャッシュ及びデータキャッシュのいずれを介して書き込みを行うモードであるかを指定・判別する方法。
【００４７】
（ｃ）データ書き込み命令として、命令キャッシュを介したものと、データキャッシュを介したものとを分けて用意し、これらの命令を使い分けることで、指定・判別する方法。
【００４８】
まず、上述した（ａ）の方法に対応するものとして、マイクロプロセッサの命令セットのなかに、メモリ領域毎に、データキャッシュを介して書き込みが行われる領域か、命令キャッシュを介して書き込みが行われる領域かを指定するための命令（例えば、「set-dwrite」及び「set-iwrite」）を設けた場合について説明する。プロセッサは、メモリに対する書き込みを実行する際、書き込み対象アドレスがいずれの領域に該当するのか調べて、その結果に応じて、命令キャッシュまたはデータキャッシュを介した書き込みを行う。
【００４９】
命令「set-dwrite」は、そのオペランドによって指定されるアドレス領域（例えば、オペランドで指定されるアドレスから１ページ分）は、データキャッシュを介して書き込みが行われるべき領域であることを指定する命令であり、当該命令実行後は、指定されたアドレス領域に対する書き込みはデータキャッシュを介して行われるようになる。同様に、命令「set-iwrite」は、オペランドによって指定されたアドレス領域は、命令キャッシュを介して書き込みが行われるべき領域であることを指定する命令であり、当該命令実行後は、指定されたアドレス領域に対する書き込みは命令キャッシュを介して行われる。
【００５０】
図６は、命令「set-dwrite」及び「set-iwrite」を設けた場合のハードウェア構成例を示す図である。これらの命令は、命令フェッチ・デコードユニット１００１によりメモリからの読み出され、デコードされて、実行制御が行われる。命令「set-dwrite」又は命令「set-iwrite」が実行されると、命令キャッシュを介した書き込みかデータキャッシュを介した書き込みかを示すビットとともに、オペランドで指定されたアドレスが、方向テーブル１００２に記憶される。
【００５１】
そして、メモリへの書き込みを行う命令、例えば、ストア命令が実行され、データのストアを行う場合には、まず、ストア命令で指定されたアドレッシング・モードに従って、レジスタ１００３の値とストア命令中で指定された即値（immediate）のいずれかをマルチプレクサ１００４によって選択し、加算器１００５で、レジスタ１００３の値と演算して、データを格納する実効アドレスを生成する。そして、生成されたアドレスが、方向テーブル１００２に存在するか否かを比較器１００６により確認し、同時にそのアドレスに対して指定された書き込み対象キャッシュが命令キャッシュであるか否かを比較器１００７で調べる。その結果、方向テーブル１００２にストア・アドレスが存在し、かつ書き込み対象キャッシュが命令キャッシュであると指定されていた場合は、デマルチプレクサ１００８及びデマルチプレクサ１００９によって、書き込みアドレス、書き込みデータの供給先として、命令キャッシュ１０１１を選択し、書き込みを行う。一方、方向テーブル１００２にストア・アドレスが存在しないか、又は、書き込み対象キャッシュが命令キャッシュであると指定されていない場合は、デマルチプレクサ１００８及びデマルチプレクサ１００９によって、書き込みアドレス、書き込みデータの供給先として、データキャッシュ１０１０を選択し、書き込みを行う。なお、比較器１００６が比較対象とするビット数は、指定される領域の大きさによって、定まる。
【００５２】
次に、上述した（ｂ）の方法に対応するものとして、マイクロプロセッサの内部に、メモリに対する書き込みを、データキャッシュを介して行うモードであるか、命令キャッシュを介して行うモードであるかを表すフラグを設けるとともに、命令セットのなかにフラグ設定命令（例えば、「set-iwrite-flag」及び「reset-iwrite-flag」）を設けた場合について説明する。命令「set-iwrite-flag」及び「reset-iwrite-flag」は、ＣＰＵに設けられた内部フラグ（iwrite-flag）を制御（セット／リセット）するための命令であり、ＣＰＵは、このフラグの設定値に従って、命令キャッシュを介してデータの書き込みを行うか、データキャッシュを介してデータの書き込みを行うかを決定する。
【００５３】
ここで、命令「set-iwrite-flag」は、命令キャッシュを介して書き込みが行われるように、内部フラグをセットする命令であり、当該命令実行後の書き込み命令はすべて命令キャッシュを介して行われる。これに対して、命令「reset-iwrite-flag」は、データキャッシュを介して書き込みが行われるように、内部フラグをリセットする命令であり、当該命令実行後の書き込みはすべてデータキャッシュを介して行われる。
【００５４】
図７は、内部フラグ並びに命令「set-iwrite-flag」及び「reset-iwrite-flag」を設けた場合のハードウェア構成例を示す図である。命令「set-iwrite-flag」及び「reset-iwrite-flag」は、命令フェッチ・デコードユニット１１０１により、メモリから読み出され、デコードされ、実行制御が行われる。命令「set-dwrite-flag」又は「set-iwrite-flag」が実行されると、現在の書き込みモードを表すフラグが内部フラグ（iwrite-flag）１１０２に記憶される。
【００５５】
そして、ストア命令等によりデータの書き込みを行う場合には、図６の場合と同様にして実効アドレスを、レジスタ１１０３、マルチプレクサ１１０４、加算器１１０５によって生成する。そして、デマルチプレクサ１１０６、１１０７では、フラグ１１０２の値に従ってアドレスおよびデータの供給先を選択し、データキャッシュ１１０９または命令キャッシュ１１０８へ書き込みを行う。
【００５６】
最後に、上述した（ｃ）の方法に対応するものとして、命令コードによって命令キャッシュを介してデータの書き込みを行うか、データキャッシュを介してデータの書き込みを行うかを指定・決定する場合について説明する。すなわち、マイクロプロセッサの命令セットのなかに、データ書き込み命令として、通常の、データキャッシュを介してデータの書き込みを行う命令（例えば、「write」）の他に、命令キャッシュを介してデータの書き込みを行う命令（例えば、「iwrite」）を用意し、ソフトウェア等は、これら２種の命令を適宜選択して使用することにより、主記憶への書き込みの際に介するキャッシュの選択を行う。
【００５７】
命令「iwrite」が実行されると、オペランドで指定された実効アドレスに対して、オペランドで指定したレジスタ等に格納されたデータ（命令）を、命令キャッシュを介して書き込む。
【００５８】
図８は、命令「iwrite」を設けた場合のハードウェア構成例を示す図である。命令「iwrite」は、命令フェッチ・デコードユニット１２０１により主記憶や命令キャッシュからの読み出され、デコードされ、実行制御が行われる。命令キャッシュを対象としたストア命令「iwrite」により、データのストアを行う場合には、まず、前述したのと同様にして、レジスタ１２０３、マルチプレクサ１２０４、加算器１２０５でアドレスを生成する。デマルチプレクサ１２０６、１２０７では、命令フェッチ・デコードユニット１２０１からの、当該命令が命令キャッシュを対象としたストア命令であることを表す出力信号１２０２によりデータの書き込み先として、命令キャッシュ１２０８を選択し、書き込みを行う。
【００５９】
同様にデータキャッシュを対象としたストア命令「write」によりデータのストアを行う場合には、デマルチプレクサ１２０６、１２０７では命令フェッチ・デコードユニット１２０１からの出力信号１２０２によりアドレスおよびデータの供給先として、データキャッシュを選択し、データキャッシュ１２０９への書き込みを行う。
【００６０】
以上説明したような、命令キャッシュへの書き込みをソフトウェア的に制御できる機構をマイクロプロセッサに設けることにより、主記憶とキャッシュ間（又は、上位キャッシュと下位キャッシュ間）の不要なデータの転送を減少させ、動的な命令生成を行うプログラムの性能を向上させることが可能となる。
【００６１】
最後に、上述したような本発明によるマイクロプロセッサ上で実行される動的コンパイラについて説明する。本発明によるマイクロプロセッサでは、命令キャッシュへの書き込みをソフトウェア的に制御できるので、動的コンパイラは、プログラム実行時に動的に生成した命令を主記憶に格納する際、上述したいずれかの方法を使って、命令キャッシュを介して書き込むようにする。これ以外の動作は、従来の動的コンパイラの動作と同様でよい。このようにすれば、動的に生成した命令を主記憶に格納する場合の主記憶参照の必要性が減り、プログラムの高速実行が可能になる。
【００６２】
【発明の効果】
以上、詳細に説明したように、本発明によれば、実行時に動的に機械語命令の生成を行うプログラムにおいて、生成した機械語命令を直接命令キャッシュへ書き込むことが可能となる。これにより、実行時に動的に機械語命令の生成を行うプログラムの実行の高速化が図れる。
【図面の簡単な説明】
【図１】本発明による非統合型キャッシュのマイクロプロセッサの動作概要を説明する図である。
【図２】命令キャッシュから主記憶への書き込みパスを設けたマイクロプロセッサにおける、動的に生成された命令の流れを示す図である。
【図３】命令キャッシュから主記憶への書き込みパスを設けたマイクロプロセッサにおける、書き込み処理の流れを示す図である。
【図４】命令キャッシュから主記憶への書き込みパスを有しないマイクロプロセッサにおける、動的に生成された命令の流れを示す図である。
【図５】命令キャッシュから主記憶への書き込みパスを有しないマイクロプロセッサにおける、書き込み処理の流れを示す図である。
【図６】所定のメモリ領域毎に、書き込み対象キャッシュを指定できる命令を設けたマイクロプロセッサのハードウェア構成例を示す図である。
【図７】命令キャッシュを介して書き込むモードか、データキャッシュを介して書き込むモードかを指定する内部フラグを設けたマイクロプロセッサのハードウェア構成例を示す図である。
【図８】命令キャッシュを介して書き込みを行う命令を設けたマイクロプロセッサのハードウェア構成例を示す図である。
【図９】本発明が適用される計算機システムの例を示す図である。
【符号の説明】
１０１ＣＰＵ
１０２命令キャッシュ
１０３データキャッシュ
１０４主記憶
１０５ＡＬＵ
１０６レジスタ
１０７メモリユニット
１０８命令デコーダ
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a processor, and more particularly to a microprocessor suitable for executing a virtual machine interpreter or the like that dynamically generates and executes a machine language instruction program at the time of execution.
[0002]
[Prior art]
In a conventional microprocessor, a cache is often provided as a high-speed intermediate storage device between the main memory and the CPU. By configuration, caches can be classified into non-integrated (split) caches, in which the data cache and the instruction cache are separate, and integrated (unified) caches, in which they are combined. Compared with an integrated cache, a non-integrated cache can give the data cache and the instruction cache each an independently high data supply bandwidth, so it generally achieves higher performance. For this reason, non-integrated caches are often used in high-end microprocessors.
[0003]
Meanwhile, Java (Java is a trademark of Sun Microsystems, Inc., USA) and Smalltalk are known as programming languages for writing programs executed on such microprocessors. A program written in one of these languages is first converted into instructions for a virtual computer, called bytecode, and the bytecode is usually executed by software called a virtual machine interpreter.
[0004]
However, execution by such a virtual machine interpreter incurs a large overhead, for example in interpreting and executing virtual machine instructions. A program written in Java or Smalltalk therefore generally runs more slowly than one written in a conventional compiled language such as C or FORTRAN. To address this, as shown for example in "Efficient Implementation of the Smalltalk-80 System" by L.P. Deutsch and A.M. Schiffman (In Proceedings of the 11th Annual ACM Symposium on Principles of Programming Languages, pp. 297-302, 1984), a technique is used that speeds up interpreter execution by generating, at run time, machine language instructions that perform processing equivalent to the virtual machine instruction sequence to be executed, and then executing the generated machine language instructions directly. This technique is called dynamic compilation, because machine language instructions are generated dynamically at run time, or JIT (Just-In-Time) compilation, because compilation takes place at the moment execution is attempted. It is used, among other things, to speed up Java virtual machines.
[0005]
[Problems to be solved by the invention]
JIT compilation, which dynamically generates instructions at run time, is effective for speeding up virtual machine interpreters and the like. However, when it is used on a microprocessor that employs a non-integrated cache mechanism, the instructions generated at run time must be written to main memory at run time, which gives rise to the following problem.
[0006]
From the processor's point of view, writing dynamically generated instructions to main memory is no different from an ordinary data write, so in a processor employing a non-integrated cache the write goes through the data cache. Consequently, if the processor tries to execute an instruction immediately after writing it to memory via the data cache, the instruction cannot be executed as is, because consistency between the instruction cache and the data cache is not maintained. That is, because the instruction cache has not been updated, an unexpected instruction may be executed. Most microprocessors therefore provide instructions for writing specified data in the data cache back to main memory and for invalidating a cache block in the instruction cache. Before executing an instruction written to main memory, software uses these instructions to discard (invalidate) the instruction cache data corresponding to the written address and, in the case of a write-back cache, to write the data cache data corresponding to the written address back to main memory so that the change is reflected there.
[0007]
That is, dynamic instruction generation / execution processing is performed in the following flow.
[0008]
1. Write the dynamically generated machine language instructions to the data cache block corresponding to the write target address (hereinafter, address A).
2. Discard the instruction cache block corresponding to address A.
3. Write the data cache block corresponding to address A back to main memory (in the case of a write-back cache).
4. Read from address A in main memory into the corresponding instruction cache block.
5. Read the instructions from the instruction cache and execute them.
As described above, when a microprocessor employing a non-integrated cache dynamically generates an instruction at run time and then tries to execute it, the generated instruction must be written to the data cache, written out to main memory, and then read into the instruction cache. This involves both a write to and a read from main memory, which can reduce execution speed.
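The five-step conventional flow can be sketched as a small simulation. This is a minimal illustrative model, not the patent's hardware: the class `SplitCacheCPU`, its method names, and the dict-based caches (one instruction per block, write-back data cache) are all assumptions made for the example.

```python
# Hypothetical model of a conventional split-cache processor.
# Caches and main memory are plain dicts keyed by address (one word per block).
class SplitCacheCPU:
    def __init__(self):
        self.main_memory = {}
        self.dcache = {}           # address -> (value, dirty)
        self.icache = {}           # address -> value
        self.memory_accesses = 0   # counts main-memory reads and writes

    def store(self, addr, value):
        # Step 1: an ordinary store lands in the (write-back) data cache.
        self.dcache[addr] = (value, True)

    def invalidate_icache(self, addr):
        # Step 2: discard any stale copy held in the instruction cache.
        self.icache.pop(addr, None)

    def write_back_dcache(self, addr):
        # Step 3: flush the dirty data-cache block to main memory.
        value, dirty = self.dcache[addr]
        if dirty:
            self.main_memory[addr] = value
            self.memory_accesses += 1
            self.dcache[addr] = (value, False)

    def fetch(self, addr):
        # Steps 4 and 5: on an instruction-cache miss, read main memory,
        # then supply the instruction from the instruction cache.
        if addr not in self.icache:
            self.icache[addr] = self.main_memory[addr]
            self.memory_accesses += 1
        return self.icache[addr]


cpu = SplitCacheCPU()
A = 0x1000
cpu.main_memory[A] = "old"
cpu.icache[A] = "old"

cpu.store(A, "new")           # step 1
stale = cpu.fetch(A)          # skipping steps 2-3 would run the stale instruction
cpu.invalidate_icache(A)      # step 2
cpu.write_back_dcache(A)      # step 3
fetched = cpu.fetch(A)        # steps 4-5
```

In this model the generated instruction reaches the instruction cache only after one main-memory write and one main-memory read, which is the cost the text describes.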
[0009]
In particular, in high-end microprocessors, where improved instruction-level parallelism and operating frequency allow many instructions to be processed in the time a single main memory reference takes, the overhead of main memory references may make this degradation in the performance of programs that generate instructions dynamically more serious.
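As a rough illustration of this overhead, the number of instruction slots lost per main memory reference can be estimated as follows. The figures (issue width, clock frequency, memory latency) are assumed values chosen only for the arithmetic; the description itself gives no numbers.

```latex
% Assumed values: issue width w = 4 instructions/cycle,
% clock frequency f = 500 MHz, main-memory latency t_mem = 100 ns.
\[
  N_{\text{lost}} \approx w \cdot f \cdot t_{\text{mem}}
  = 4 \times (500 \times 10^{6}\,\mathrm{s^{-1}}) \times (100 \times 10^{-9}\,\mathrm{s})
  = 200 \ \text{instruction slots per reference}
\]
```

Under these assumptions, the write-back followed by the instruction fetch in the conventional flow would cost on the order of 400 instruction slots for each freshly generated block.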
[0010]
In conventional static compiled languages such as C and FORTRAN, conversion into machine language instructions is performed before program execution and no instructions are generated dynamically at run time, so this was not a major problem even with a separated cache configuration.
An object of the present invention is to provide a processor that can keep down the performance degradation caused by main memory references during dynamic instruction generation, in processing that dynamically generates machine language instructions at run time, such as JIT compilation in a virtual machine interpreter.
[0011]
[Means for Solving the Problems]
A processor according to the present invention is characterized in that its instruction set includes an instruction that can specify, for each predetermined memory area, whether the area is written via the data cache or via the instruction cache. In this case, when a write is directed to a memory area that the instruction has designated as an area to be written via the instruction cache, the write is performed via the instruction cache.
[0012]
A second processor according to the present invention includes a flag indicating whether the processor is in a mode in which writes are performed via the data cache or a mode in which writes are performed via the instruction cache, and is characterized in that writes are performed via the instruction cache when the flag indicates the latter mode. In this case, the instruction set preferably includes an instruction for manipulating the flag.
[0013]
The third processor according to the present invention is characterized in that the instruction set includes an instruction for instructing writing to the memory via the instruction cache.
[0014]
In the first to third processors above, when a write via the instruction cache is directed, the write may be performed to both the instruction cache and the data cache. A copy of the instruction cache's updated data then exists in the data cache, so when a cache overflow occurs in the instruction cache, the updated data can be written back to main memory without providing a direct write-back path from the instruction cache to memory.
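The dual-write variant can be sketched as follows. This is a minimal model under stated assumptions: the class `DualWriteCaches` and its method names are hypothetical, each block holds one instruction, and eviction is triggered explicitly rather than by a replacement policy.

```python
# Hypothetical model: a write "via the instruction cache" updates both caches,
# so the data cache can perform the write-back that the instruction cache cannot.
class DualWriteCaches:
    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.icache = {}   # address -> value (no write-back path to memory)
        self.dcache = {}   # address -> (value, dirty)

    def iwrite(self, addr, value):
        # Write the generated instruction to both caches.
        self.icache[addr] = value
        self.dcache[addr] = (value, True)

    def evict_from_icache(self, addr):
        # The instruction-cache copy is simply dropped; the dirty copy in the
        # data cache is what gets written back to main memory.
        self.icache.pop(addr, None)
        value, dirty = self.dcache[addr]
        if dirty:
            self.main_memory[addr] = value
            self.dcache[addr] = (value, False)


memory = {}
caches = DualWriteCaches(memory)
caches.iwrite(8, "nop")
caches.evict_from_icache(8)
```

After the eviction, main memory holds the generated instruction even though the instruction cache never wrote to memory itself.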
[0015]
Further, a dynamic compilation method according to the present invention is a dynamic compilation method using the first to third processors described above, characterized in that dynamically generated instructions are written to memory via the instruction cache.
[0016]
Because the processor according to the present invention can write to memory via the instruction cache, instructions generated by dynamic compilation can be written to memory via the instruction cache. This reduces the need for main memory references, so in processing that dynamically generates machine language instructions at run time, such as JIT compilation in a virtual machine interpreter, the performance degradation caused by main memory references during dynamic instruction generation can be kept low.
[0017]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0018]
FIG. 9 is a diagram showing an example of a computer system that implements the present invention. Software that performs instruction generation processing at the time of execution, such as a JIT compiler in a virtual machine interpreter, is read from the disk device 204 to the main memory 203 and executed by the processor 201. Machine language instructions generated by this software are read and written between the main memory 203 and the processor 201 via the cache 202 which is a high-speed intermediate storage device. Here, for the sake of simplicity, the memory hierarchy is assumed to be two levels of the cache and the main memory, but the present invention can also be applied to a microprocessor having a plurality of cache hierarchies.
[0019]
FIG. 1 is a diagram outlining the operation of a microprocessor according to the present invention that employs a non-integrated cache. For comparison with the present invention, the operation of a conventional microprocessor employing a non-integrated cache is first described using the same figure. In a conventional microprocessor employing a non-integrated cache, an instruction to be executed by the CPU 101 is read from the main memory 104 into the instruction cache 102 and passed to the CPU 101. The fetched instruction is passed to the instruction decoder 108 via the memory unit 107. The instruction decoder 108 then decodes it, and the ALU 105 and the register 106 are controlled according to the decoding result to execute the instruction. Frequently executed instructions are kept in the instruction cache 102, a high-speed intermediate storage device, so the low-speed main memory 104 need not be referenced and execution is fast.
[0020]
In FIG. 1, a broken line represents a flow of instructions dynamically generated in a dynamic instruction generation process in a conventional microprocessor. A solid line represents the flow of dynamically generated instructions in the present invention. In the description here, for the sake of simplicity, first, a description will be given of a case where no overflow occurs due to cache block contention or insufficient capacity, and a case where cache block overflow occurs will be described later.
[0021]
As the broken line in the figure shows, in conventional dynamic instruction generation an instruction generated dynamically at run time by the ALU 105 is stored in the register 106 and then written to the data cache 103 via the memory unit 107. When the data cache 103 uses the write-back method, the instruction to be executed exists at this point only in the data cache 103, so the cache block containing the instruction is written back to the main memory 104. In addition, a cache block corresponding to the written address may exist in the instruction cache 102; if it does, that block in the instruction cache 102 is invalidated. The cache block corresponding to the address of the dynamically generated instruction then no longer exists in the instruction cache 102, so when execution of the generated instruction is attempted, the instruction is read from the main memory 104 into the instruction cache 102 and passed via the memory unit 107 to the instruction decoder 108, and the CPU 101 executes it according to the decoding result.
[0022]
In contrast, in the flow of generated instructions in dynamic instruction generation according to the present invention, an instruction dynamically generated by the ALU 105 is stored in the register 106 and then written directly to the instruction cache 102 via the memory unit 107. When the generated instruction is to be executed, it is already present in the instruction cache 102, so the memory unit 107 reads the instruction from the instruction cache 102 and passes it to the instruction decoder 108.
[0023]
As described above, in the conventional microprocessor, regardless of whether there is spare cache capacity, an instruction written to the data cache 103 had to be written once from the high-speed cache memory to the low-speed main memory 104 and then read back to the CPU 101 via the instruction cache 102. In contrast, the microprocessor according to the present invention can execute an instruction written directly to the instruction cache 102 without going through the main memory 104, so execution of dynamically generated instructions is faster.
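The direct write path can be illustrated with a minimal sketch. The class `DirectIWriteCPU` and the method name `iwrite` are hypothetical names for this example, the caches are plain dicts with one instruction per block, and eviction is ignored, matching the simplifying assumption made in the text at this point.

```python
# Hypothetical model of a processor that can store into its instruction cache.
class DirectIWriteCPU:
    def __init__(self):
        self.main_memory = {}
        self.icache = {}           # address -> instruction
        self.memory_accesses = 0   # counts main-memory reads and writes

    def iwrite(self, addr, instruction):
        # The store is routed into the instruction cache; with no eviction,
        # this generates no main-memory traffic.
        self.icache[addr] = instruction

    def fetch(self, addr):
        # An instruction-cache hit is served without touching main memory.
        if addr not in self.icache:
            self.icache[addr] = self.main_memory[addr]
            self.memory_accesses += 1
        return self.icache[addr]


direct_cpu = DirectIWriteCPU()
direct_cpu.iwrite(0x1000, "add r1, r2")
executed = direct_cpu.fetch(0x1000)
```

In this model the generated instruction is fetched with zero main-memory accesses, whereas a flow that flushes the data cache and refills the instruction cache would need two.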
[0024]
The above description does not consider cache block eviction due to cache block contention or capacity shortage. Actually, the cache capacity is limited, and cache block contention occurs. Therefore, it is necessary to consider processing at the time of eviction. Therefore, in the following, an instruction write process taking into account the process at the time of cache eviction will be described.
[0025]
In cache write processing, the handling required when a cache block is evicted because of contention or the like depends on whether the cache uses the write-through method or the write-back method.
[0026]
In the case of the write-through method, since writing to the main memory is performed simultaneously with writing to the cache, it is only necessary to invalidate the cache block to be evicted. That is, when the instruction cache according to the present invention is configured as a write-through cache, the instruction to be written is written in both the instruction cache and the main memory, so that the latest data is stored in the main memory. Therefore, the contents of the cache block may be discarded when the cache block is evicted.
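A write-through instruction cache of this kind can be sketched as below. The class `WriteThroughICache`, the direct-mapped indexing by `addr % num_blocks`, and the one-instruction-per-block layout are assumptions made for illustration only.

```python
# Hypothetical direct-mapped, write-through instruction cache.
class WriteThroughICache:
    def __init__(self, num_blocks, main_memory):
        self.num_blocks = num_blocks
        self.main_memory = main_memory
        self.blocks = {}   # block index -> (address, value)

    def write(self, addr, value):
        index = addr % self.num_blocks
        # Any previous occupant of this block is simply overwritten (discarded):
        # main memory already holds its latest contents, so no write-back is needed.
        self.blocks[index] = (addr, value)
        # Write through: main memory is updated together with the cache.
        self.main_memory[addr] = value


wt_memory = {}
wt_cache = WriteThroughICache(2, wt_memory)
wt_cache.write(0, "i0")
wt_cache.write(2, "i2")   # maps to the same block as address 0 and evicts it
```

The evicted instruction at address 0 is still available from main memory, which is why eviction reduces to invalidation under write-through.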
[0027]
In the write-back method, on the other hand, the latest data exists only in the cache, so when a cache block containing updated data is evicted, the data must be written back to main memory. In conventional processors, the CPU wrote data only to the data cache, so write-back from cache to main memory was performed only from the data cache. Because the processor according to the present invention also writes to the instruction cache, write-back of data from the instruction cache to main memory must be considered when the write-back method is adopted. Two approaches are conceivable: one allows the instruction cache to write directly to main memory, and the other, as before, performs writes to main memory only from the data cache. An implementation example of each approach is given below.
[0028]
FIG. 2 shows the flow of dynamically generated instructions when the architecture is extended so that cache blocks can be written from the instruction cache to main memory. In FIG. 2, an instruction dynamically generated by the ALU 105 and stored in the register 106 is written to the instruction cache 102 via the memory unit 107. If the cache block 301 targeted by the write already holds instructions for an address different from the write address, the cache block must be made available, and the handling depends on the update state of the cache block 301. If the cache block 301 has not been written to (clean), the contents of the main memory 104 and of the instruction cache 102 match, so no write-back to the main memory 104 is performed and the data in the cache block 301 of the instruction cache 102 is discarded. If the cache block 301 has been written to (dirty), the latest instructions exist only in the instruction cache 102, so the cache block 301 is first written back to the corresponding area 302 in main memory.
[0029]
After the original data on the instruction cache 102 has been discarded or written back, the data block 303 on the main memory 104 corresponding to the write target address of the dynamically generated instruction is read into the cache block 301, and the write target instruction passed from the memory unit 107 is written into it. These operations are the same as those of a normal write-back data cache.
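The replacement sequence described for FIG. 2 (discard a clean block, write back a dirty block, then fill and write) can be illustrated with a minimal sketch. It assumes a direct-mapped organization, and the names `WriteBackICache`, `BLOCK_SIZE`, and `NUM_SETS` are hypothetical; the specification does not prescribe the cache organization or any of these values.

```python
BLOCK_SIZE = 4  # words per cache block (illustrative value, not from the text)
NUM_SETS = 2    # number of direct-mapped block slots (illustrative value)


class WriteBackICache:
    """Toy direct-mapped, write-back instruction cache over a dict-based main memory."""

    def __init__(self, main_memory):
        self.mem = main_memory  # maps block number -> list of BLOCK_SIZE words
        # Each slot holds (block_number, words, dirty) or None when empty.
        self.slots = [None] * NUM_SETS

    def write(self, address, word):
        block_no, offset = divmod(address, BLOCK_SIZE)
        index = block_no % NUM_SETS
        slot = self.slots[index]
        if slot is None or slot[0] != block_no:
            if slot is not None and slot[2]:
                # Dirty victim: the newest copy exists only in the cache,
                # so it is written back to its main memory area first.
                self.mem[slot[0]] = list(slot[1])
            # A clean victim is simply discarded. Then the target block is
            # read from main memory (zero-filled here if never written).
            words = list(self.mem.get(block_no, [0] * BLOCK_SIZE))
            slot = (block_no, words, False)
        words = slot[1]
        words[offset] = word
        self.slots[index] = (block_no, words, True)  # block is now dirty
```

In this sketch, writing to address 0 and then to address 8 maps both blocks to the same slot, so the second write forces the dirty first block back to `mem` before the new block is filled.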
[0030]
FIG. 3 is a diagram showing the flow of a write process to the main memory when the cache block write-back mechanism from the instruction cache to the main memory shown in FIG. 2 is added. As shown in the figure, when the process is started (S401), first, it is determined whether or not the write is a write through the instruction cache (S402). A method for designating / determining whether a certain data write instruction is a write through the data cache or a write through the instruction cache will be described later.
[0031]
As a result of the determination, if the write is via the data cache (S402: NO), the same write process via the data cache as in the conventional processor is performed (S403), and the process ends (S408). On the other hand, if the write is via the instruction cache (S402: YES), it is checked whether a cache block corresponding to the write target address exists in the instruction cache (S404). If no such cache block exists in the instruction cache (S404: NO), it is checked whether a cache block corresponding to the write target address exists in the data cache. If such a cache block exists and is dirty, the block is written back to the main memory (S405), and then the block corresponding to the write target address is read into the instruction cache (S406). Since the cache block for the write target address is now secured in the instruction cache, the dynamically generated instruction is written into that block of the instruction cache (S407), and the process ends (S408). On the other hand, if the cache block corresponding to the write target address exists in the instruction cache (S404: YES), the cache block for the write target address is already secured in the instruction cache, so the instruction is written into it (S407), and the process ends (S408).
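The decision sequence S402 to S407 can be traced with a short sketch. It is only an illustration under simplifying assumptions: both caches are modelled as unbounded dictionaries (so victim eviction is left out), and the function name `store`, the `via_icache` argument, and `BLOCK_SIZE` are hypothetical.

```python
BLOCK_SIZE = 4  # words per block (illustrative value)


def store(address, word, via_icache, icache, dcache, memory):
    """Sketch of the FIG. 3 write flow; returns the list of steps taken.

    icache and dcache map block number -> {"words": [...], "dirty": bool};
    memory maps block number -> list of words.
    """
    block_no, offset = divmod(address, BLOCK_SIZE)
    steps = ["S402"]
    if not via_icache:
        # S403: conventional write through the data cache.
        if block_no not in dcache:
            dcache[block_no] = {
                "words": list(memory.get(block_no, [0] * BLOCK_SIZE)),
                "dirty": False,
            }
        dcache[block_no]["words"][offset] = word
        dcache[block_no]["dirty"] = True
        return steps + ["S403"]
    steps.append("S404")
    if block_no not in icache:
        dline = dcache.get(block_no)
        if dline is not None and dline["dirty"]:
            # S405: flush the dirty data-cache copy so main memory is current.
            memory[block_no] = list(dline["words"])
            dline["dirty"] = False
            steps.append("S405")
        # S406: read the block from main memory into the instruction cache.
        icache[block_no] = {
            "words": list(memory.get(block_no, [0] * BLOCK_SIZE)),
            "dirty": False,
        }
        steps.append("S406")
    # S407: write the generated instruction into the instruction cache block.
    icache[block_no]["words"][offset] = word
    icache[block_no]["dirty"] = True
    steps.append("S407")
    return steps
```

Note that after S407 the sketch leaves any data-cache copy of the block untouched, so that copy no longer reflects the newly written instruction.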
[0032]
If a cache block corresponding to the instruction write target address exists in the data cache, then, in order to maintain consistency with the instruction cache, either the corresponding block of the data cache may be discarded (invalidated), or the instruction may be written to the data cache as well as to the instruction cache.
[0033]
FIG. 4 shows the flow of dynamically generated instructions when there is no mechanism for writing directly from the instruction cache 102 to the main memory 104. In this case, since it is not necessary to provide a direct write-back path from the instruction cache to the main memory, the amount of hardware change with respect to the conventional microprocessor can be reduced as compared with that shown in FIG.
[0034]
As shown in FIG. 4, instructions dynamically generated by the ALU 105 and stored in the register 106 are written into the instruction cache 102 and the data cache 103 via the memory unit 107. That is, the data cache 103 also has a copy of the instruction.
[0035]
In this embodiment, the data cache 103 holds a copy of the instruction (the corresponding cache block) because there is no cache block write-back path from the instruction cache 102 to the main memory 104. Holding the copy ensures that, when a dynamically generated instruction is written to the instruction cache 102 and the cache overflows so that a write-back to the main memory becomes necessary, the write-back can be performed from the data cache.
[0036]
When a write to the instruction cache 102 overflows the cache and a write-back to the main memory becomes necessary, the block 501 on the instruction cache that would be written back is discarded, and the update state of the corresponding block 502 on the data cache 103 is checked; if it is dirty, it is written back to the corresponding area 504 on the main memory 104. Next, the block 503 corresponding to the write target address is read from the main memory 104 into both the instruction cache 102 and the data cache 103. Then, the instruction to be written is written to the block 501 on the instruction cache 102 and the block 502 on the data cache 103.
[0037]
The cache block secured on the data cache 103 corresponding to the data on the instruction cache 102 may be written back to the main memory as a result of contention caused by other data writes to the data cache. Even in that case, the contents of the cache block on the instruction cache 102 match the data on the main memory 104, so no special operation is required.
[0038]
FIG. 5 is a diagram showing the flow of a write process to the main memory when, as shown in FIG. 4, no cache block write-back mechanism from the instruction cache to the main memory is added. As shown in the figure, when a write to the main memory is started (S601), it is first checked whether the write is via the instruction cache (S602). A method for specifying / determining whether the write is performed via the instruction cache will be described later.
[0039]
As a result of the check, if the write is not a write through the instruction cache (S602: NO), a write process through a normal data cache is performed (S603), and the write process is terminated (S610).
[0040]
On the other hand, if it is a write through the instruction cache (S602: YES), it is first checked whether or not there is a block corresponding to the write target address in the data cache (S604). As a result, if the corresponding block does not exist in the data cache (S604: NO), the block is read out to the data cache (S605). Further, it is checked whether there is a block corresponding to the write target address in the instruction cache (S606).
[0041]
As a result, if the block does not exist in the instruction cache (S606: NO), it is checked whether the block in the data cache is dirty, and if it is dirty, the block is written back to the main memory (S607). Next, the block corresponding to the write target address is read from the main memory to the instruction cache (S608). As a result, the contents of the main memory, the data cache, and the instruction cache match, so the instruction to be written is written to the data cache and the instruction cache (S609), and the process ends (S610).
[0042]
On the other hand, if the block exists in the instruction cache (S606: YES), the block exists in both the data cache and the instruction cache, so the instruction to be written is written to the data cache and the instruction cache (S609), and the process ends (S610).
[0043]
In the processing flow described above, when no block corresponding to the write target address exists in the data cache, the data is read from the main memory into the data cache (S605). However, if data can be transferred between the instruction cache and the data cache and the corresponding data exists in the instruction cache, the block can instead be copied from the instruction cache, omitting the transfer from the main memory. Similarly, although the block corresponding to the write target address is read from the main memory into the instruction cache (S608), if data can be transferred between the two caches and the corresponding data exists in the data cache, the flow can be optimized to copy the block from the data cache, omitting the transfers to and from the main memory.
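The basic FIG. 5 flow (steps S602 to S609, without the cache-to-cache copy optimization) can be sketched as follows. This is a simplified illustration, not the claimed hardware: caches are unbounded dictionaries, and `store_dual`, `_read_block`, and `BLOCK_SIZE` are hypothetical names.

```python
BLOCK_SIZE = 4  # words per block (illustrative value)


def _read_block(memory, block_no):
    """Return a copy of a main memory block (zero-filled if never written)."""
    return list(memory.get(block_no, [0] * BLOCK_SIZE))


def store_dual(address, word, via_icache, icache, dcache, memory):
    """Sketch of the FIG. 5 write flow; returns the list of steps taken.

    icache maps block number -> list of words (it never writes back);
    dcache maps block number -> {"words": [...], "dirty": bool}.
    """
    block_no, offset = divmod(address, BLOCK_SIZE)
    steps = ["S602"]
    if not via_icache:
        # S603: normal write through the data cache only.
        if block_no not in dcache:
            dcache[block_no] = {"words": _read_block(memory, block_no), "dirty": False}
        dcache[block_no]["words"][offset] = word
        dcache[block_no]["dirty"] = True
        return steps + ["S603"]
    steps.append("S604")
    if block_no not in dcache:
        # S605: bring the block into the data cache.
        dcache[block_no] = {"words": _read_block(memory, block_no), "dirty": False}
        steps.append("S605")
    steps.append("S606")
    if block_no not in icache:
        if dcache[block_no]["dirty"]:
            # S607: make main memory current before filling the instruction cache.
            memory[block_no] = list(dcache[block_no]["words"])
            dcache[block_no]["dirty"] = False
            steps.append("S607")
        # S608: read the block from main memory into the instruction cache.
        icache[block_no] = _read_block(memory, block_no)
        steps.append("S608")
    # S609: write the instruction to both caches; only the data cache copy
    # is marked dirty because it alone can be written back.
    icache[block_no][offset] = word
    dcache[block_no]["words"][offset] = word
    dcache[block_no]["dirty"] = True
    steps.append("S609")
    return steps
```

Because every instruction-cache write is mirrored in the data cache, an instruction cache block can always be discarded without losing data in this sketch.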
[0044]
Next, a method for specifying / determining whether a certain data write is a write through the data cache or a write through the instruction cache will be described. As such a method, for example, the following methods (a) to (c) can be considered.
[0045]
(a) A method in which, for each predetermined memory area such as a page, it is determined whether writes to the area are performed via the instruction cache or via the data cache, and the cache is discriminated based on the write target address.
[0046]
(b) A method in which an internal flag is provided in the CPU, and the flag setting specifies / determines whether the CPU is in a mode that writes via the instruction cache or a mode that writes via the data cache.
[0047]
(c) A method in which a data write instruction via the instruction cache and a data write instruction via the data cache are prepared separately, and the cache is specified / determined by which of these instructions is used.
[0048]
First, as a method corresponding to method (a) above, a case will be described in which the instruction set of the microprocessor provides instructions (for example, “set-dwrite” and “set-iwrite”) that designate, for each memory area, whether writes are performed via the data cache or via the instruction cache. When executing a write to memory, the processor checks which area the write target address belongs to and, according to the result, performs the write via the instruction cache or the data cache.
[0049]
The instruction “set-dwrite” specifies that the address area designated by the operand (for example, one page starting from the address specified by the operand) is an area to be written via the data cache. After the instruction is executed, writes to the designated address area are performed via the data cache. Similarly, the instruction “set-iwrite” specifies that the address area designated by the operand is an area to be written via the instruction cache, and after the instruction is executed, writes to the designated address area are performed via the instruction cache.
[0050]
FIG. 6 is a diagram illustrating a hardware configuration example when instructions “set-dwrite” and “set-iwrite” are provided. These instructions are read from the memory by the instruction fetch / decode unit 1001, decoded, and subjected to execution control. When the instruction “set-dwrite” or the instruction “set-iwrite” is executed, the address specified by the operand is stored in the direction table 1002 together with a bit indicating whether writing is performed via the instruction cache or the data cache.
[0051]
When an instruction that writes to memory, for example a store instruction, is executed and data is stored, first, according to the addressing mode specified by the store instruction, the multiplexer 1004 selects either a value of the register 1003 or the immediate value specified in the store instruction, and the adder 1005 adds the selected value to the value of the register 1003 to generate the effective address at which the data is to be stored. Then, the comparator 1006 checks whether the generated address exists in the direction table 1002, and at the same time the comparator 1007 checks whether the write target cache designated for that address is the instruction cache. If the store address exists in the direction table 1002 and the write target cache is designated as the instruction cache, the demultiplexer 1008 and the demultiplexer 1009 select the instruction cache 1011 as the supply destination of the write address and write data, and the write is performed. On the other hand, if the store address does not exist in the direction table 1002, or if the write target cache is not designated as the instruction cache, the demultiplexer 1008 and the demultiplexer 1009 select the data cache 1010 as the supply destination of the write address and write data, and the write is performed. Note that the number of bits compared by the comparator 1006 is determined by the size of the designated area.
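The per-area selection of method (a) can be modelled in software as a table lookup keyed by the upper bits of the effective address. The following is a minimal sketch under stated assumptions: the class `DirectionTable`, its method names, and the 4096-byte `PAGE_SIZE` are hypothetical; the text only gives "one page" as an example area size.

```python
PAGE_SIZE = 4096  # assumed area size; the text only says "for example, one page"


class DirectionTable:
    """Toy model of the direction table: one 'via instruction cache' bit per area."""

    def __init__(self):
        self._entries = {}  # page number -> True if writes go via the instruction cache

    def set_iwrite(self, address):
        """Model of "set-iwrite": writes to this area go via the instruction cache."""
        self._entries[address // PAGE_SIZE] = True

    def set_dwrite(self, address):
        """Model of "set-dwrite": writes to this area go via the data cache."""
        self._entries[address // PAGE_SIZE] = False

    def target_cache(self, base, displacement):
        """Pick the cache for a store to base + displacement."""
        effective_address = base + displacement  # role of the adder
        # Only the page-number bits take part in the comparison, which mirrors
        # the note that the compared bit count depends on the area size.
        via_icache = self._entries.get(effective_address // PAGE_SIZE, False)
        return "icache" if via_icache else "dcache"
```

An address with no table entry falls back to the data cache, matching the default behaviour described for a store address that is absent from the table.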
[0052]
Next, as a method corresponding to method (b) above, a case will be described in which the microprocessor is provided with a flag indicating whether writes to memory are performed via the data cache or via the instruction cache, and the instruction set provides flag setting instructions (for example, “set-iwrite-flag” and “reset-iwrite-flag”). The instructions “set-iwrite-flag” and “reset-iwrite-flag” control (set / reset) an internal flag (iwrite-flag) provided in the CPU. According to the value of the flag, it is determined whether data is written via the instruction cache or via the data cache.
[0053]
Here, the instruction “set-iwrite-flag” sets the internal flag so that writes are performed via the instruction cache; all writes after execution of the instruction are performed via the instruction cache. On the other hand, the instruction “reset-iwrite-flag” resets the internal flag so that writes are performed via the data cache; all writes after execution of the instruction are performed via the data cache.
[0054]
FIG. 7 is a diagram illustrating a hardware configuration example in the case where an internal flag and instructions “set-iwrite-flag” and “reset-iwrite-flag” are provided. The instructions “set-iwrite-flag” and “reset-iwrite-flag” are read from the memory by the instruction fetch / decode unit 1101, decoded, and subjected to execution control. When the instruction “set-iwrite-flag” or “reset-iwrite-flag” is executed, a value indicating the current write mode is stored in the internal flag (iwrite-flag) 1102.
[0055]
When data is written by a store instruction or the like, an effective address is generated by the register 1103, the multiplexer 1104, and the adder 1105 as in the case of FIG. Then, the demultiplexers 1106 and 1107 select an address and data supply destination according to the value of the flag 1102, and write to the data cache 1109 or the instruction cache 1108.
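The mode-flag selection of method (b) reduces to a single branch on the flag at store time. The sketch below is illustrative only: `FlagModeStoreUnit` is a hypothetical name, the caches are plain address-to-word dictionaries, and the initial (reset) state of the flag is an assumption not stated in the text.

```python
class FlagModeStoreUnit:
    """Toy model of a store path steered by one mode flag."""

    def __init__(self):
        self.iwrite_flag = False  # assumed reset state: stores go via the data cache
        self.icache = {}          # address -> word (no block structure modelled)
        self.dcache = {}

    def set_iwrite_flag(self):
        """Model of "set-iwrite-flag"."""
        self.iwrite_flag = True

    def reset_iwrite_flag(self):
        """Model of "reset-iwrite-flag"."""
        self.iwrite_flag = False

    def store(self, base, displacement, word):
        address = base + displacement  # effective address generation
        # The flag alone selects the destination for every store.
        target = self.icache if self.iwrite_flag else self.dcache
        target[address] = word
        return address
```

A dynamic compiler using this scheme would set the flag before emitting generated code and reset it afterwards, since every store issued while the flag is set goes to the instruction cache.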
[0056]
Finally, as a method corresponding to method (c) above, a case will be described in which the choice between writing data via the instruction cache and writing data via the data cache is designated and determined by the instruction code. That is, the microprocessor instruction set provides, as data write instructions, in addition to a normal instruction that writes data via the data cache (for example, “write”), an instruction that writes data via the instruction cache (for example, “iwrite”), and software selects the cache used when writing to the main memory by choosing between these two kinds of instructions as appropriate.
[0057]
When the instruction “iwrite” is executed, the data (instruction) stored in the register or the like specified by the operand is written to the effective address specified by the operand via the instruction cache.
[0058]
FIG. 8 is a diagram illustrating a hardware configuration example when the instruction “iwrite” is provided. The instruction “iwrite” is read from the main memory or the instruction cache by the instruction fetch / decode unit 1201, decoded, and subjected to execution control. When data is stored by the store instruction “iwrite” for the instruction cache, an address is first generated by the register 1203, the multiplexer 1204, and the adder 1205 in the same manner as described above. The demultiplexers 1206 and 1207 select the instruction cache 1208 as the data write destination in accordance with the output signal 1202 from the instruction fetch / decode unit 1201, which indicates that the instruction is a store instruction for the instruction cache, and the write is performed.
[0059]
Similarly, when data is stored by the store instruction “write” for the data cache, the demultiplexers 1206 and 1207 select the data cache 1209 as the address and data supply destination in accordance with the output signal 1202 from the instruction fetch / decode unit 1201, and the write to the data cache 1209 is performed.
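With method (c), the destination cache depends only on the opcode of each store. A minimal sketch of that dispatch follows; the function `run_stores` and the tuple encoding of instructions are hypothetical, and the caches are modelled as plain address-to-word dictionaries.

```python
def run_stores(program):
    """Execute (opcode, address, word) tuples; the opcode alone picks the cache."""
    icache, dcache = {}, {}
    for opcode, address, word in program:
        if opcode == "iwrite":
            icache[address] = word  # store routed to the instruction cache
        elif opcode == "write":
            dcache[address] = word  # ordinary store routed to the data cache
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return icache, dcache
```

Unlike the flag-based scheme, no mode state is carried between instructions here, so generated code and ordinary data can be written in any interleaving.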
[0060]
By providing the microprocessor with a mechanism that can control the writing to the instruction cache as described above, the transfer of unnecessary data between the main memory and the cache (or between the upper cache and the lower cache) is reduced. Therefore, it is possible to improve the performance of a program that performs dynamic instruction generation.
[0061]
Finally, a dynamic compiler executed on the microprocessor according to the present invention as described above will be described. In the microprocessor according to the present invention, writing to the instruction cache can be controlled by software, so when the dynamic compiler stores dynamically generated instructions in the main memory during program execution, it uses any of the above-described methods to write them via the instruction cache. Other operations may be the same as those of a conventional dynamic compiler. In this way, the need to access the main memory when storing dynamically generated instructions is reduced, and the program can be executed at high speed.
[0062]
[Effect of the invention]
As described above in detail, according to the present invention, in a program that dynamically generates machine language instructions at the time of execution, the generated machine language instructions can be directly written to the instruction cache. As a result, the execution speed of a program that dynamically generates machine language instructions at the time of execution can be increased.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining an outline of operation of a microprocessor of a non-integrated cache according to the present invention.
FIG. 2 is a diagram showing a flow of dynamically generated instructions in a microprocessor provided with a write path from an instruction cache to main memory.
FIG. 3 is a diagram showing a flow of write processing in a microprocessor provided with a write path from an instruction cache to main memory.
FIG. 4 is a diagram illustrating a flow of dynamically generated instructions in a microprocessor that does not have a write path from the instruction cache to the main memory.
FIG. 5 is a diagram showing the flow of write processing in a microprocessor that does not have a write path from the instruction cache to the main memory.
FIG. 6 is a diagram illustrating a hardware configuration example of a microprocessor provided with an instruction that can specify a write target cache for each predetermined memory area.
FIG. 7 is a diagram illustrating a hardware configuration example of a microprocessor provided with an internal flag that designates a mode for writing via an instruction cache or a mode for writing via a data cache.
FIG. 8 is a diagram illustrating a hardware configuration example of a microprocessor provided with an instruction to perform writing via an instruction cache.
FIG. 9 is a diagram showing an example of a computer system to which the present invention is applied.
[Explanation of symbols]
101 CPU
102 Instruction cache
103 Data cache
104 Main memory
105 ALU
106 Register
107 Memory unit
108 Instruction decoder

Claims

A processor comprising a flag indicating whether the processor is in a mode for writing via the data cache or a mode for writing via the instruction cache,
wherein the processor performs writing via the instruction cache when the flag indicates the mode for writing via the instruction cache.

The processor of claim 1, having, in its instruction set, an instruction for operating the flag.

A processor comprising an instruction in the instruction set for instructing writing to a memory through an instruction cache.

The processor according to any one of claims 1 to 3, wherein, when writing via the instruction cache is instructed, writing is performed to both the instruction cache and the data cache.

A method of dynamic compilation using a processor according to any one of claims 1 to 4,
wherein, when dynamically generated instructions are written to a memory, the instructions are written via the instruction cache.