JPH08504044A

JPH08504044A - Microcode cache system and method

Info

Publication number: JPH08504044A
Application number: JP6513176A
Authority: JP
Inventors: Eric Demers (ディーマーズ・エリック); Derek Lentz (デレクレンツ)
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1992-11-23
Filing date: 1993-11-10
Publication date: 1996-04-30
Also published as: WO1994012929A1; JP2006228258A; JP2006302313A

Abstract

(57) [Abstract] A microcode cache system stores microcode used by a digital system that processes program instructions to perform functions. The system includes a read-only memory (ROM) configured to store a first group of microcode and a random access memory (RAM) cache configured to temporarily store blocks of a second group of microcode. The second group of microcode is mapped directly into the RAM cache from a memory device separate from the RAM cache, so that its blocks can be swapped into and out of the digital system. The microcode cache system is integrated with the digital system, but the separate memory device is not.

Description

DETAILED DESCRIPTION OF THE INVENTION

A Microcode Cache System and Method

Background of the Invention

1. Field of the Invention

This invention relates to digital processors, and more particularly to random access memory (RAM) for storing microcode instructions.

2. Related Art

Microcode, or microprogramming, was originally proposed as a way to make the logic that controls the functions of a central processing unit (CPU) more regular.
It allows a simple datapath to execute complex program instructions by breaking the functions needed to execute a complex instruction into a series of simple operations that are performed sequentially. Microcode should not be confused with the program instructions that make up a CPU's instruction set. A microcode address is formed from some or all of the contents of the instruction register, together with state values internal to the microcode engine (or microsequencer). The microsequencer executes microcode to control the datapath. The advantage of microcode is that the control logic is implemented in a very regular structure, usually read-only memory (ROM). To change or extend the instruction set, bits in the ROM must be changed or a new ROM must be added. It has also been proposed to replace all or part of the ROM with writable RAM for greater flexibility. However, the amount of microcode control that can practically be held in RAM remains limited, because RAM is expensive to integrate on a chip and stores fewer bits per unit of chip area than on-chip ROM. (Currently, ROM stores 3 to 4 times as many bits per unit of chip area as RAM; that is, for the same storage capacity, RAM is 3 to 4 times as large as ROM.) Some microcoded machines allow branches and subroutines in microprograms, but this makes the microsequencing logic very complex. Designers of RISC processors have adopted some techniques used in microprogramming, but at the level of the user program; these techniques include delayed branches and software-managed pipeline interlocks. Microcode exposes pipeline and hardware details to the programmer, which can be both an advantage and a disadvantage: microcode is more efficient, but it is also harder to program. A major advantage is that efficiently written microcode can perform multiple operations in a single cycle.
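A minimal sketch of how a microcode address might be formed from instruction-register bits plus internal sequencer state. The 32-bit instruction register, the 6-bit opcode field, and the 3-bit step counter are hypothetical widths chosen for illustration; the text only says that the address combines some or all of the instruction register with state internal to the microsequencer.

```python
IR_BITS = 32      # hypothetical instruction-register width
OPCODE_BITS = 6   # hypothetical: opcode field in the top bits of the instruction register
STEP_BITS = 3     # hypothetical: width of the microsequencer's internal step counter


def microcode_address(instruction_register: int, step: int) -> int:
    """Combine the opcode field of the instruction register with sequencer state."""
    if not 0 <= step < (1 << STEP_BITS):
        raise ValueError("step counter out of range")
    opcode = (instruction_register >> (IR_BITS - OPCODE_BITS)) & ((1 << OPCODE_BITS) - 1)
    # Each opcode owns a run of 2**STEP_BITS consecutive microwords; the step
    # counter walks through that run one microword per cycle.
    return (opcode << STEP_BITS) | step


# Opcode 3 sits in the top six bits of 0x0C000000; its third microword (step 2):
print(microcode_address(0x0C000000, 2))  # prints 26
```

Concatenating fields like this keeps the address logic trivial, at the cost of reserving a fixed number of microwords per opcode.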
Because ROM access times are very short, frequently used operations run fast when they are microcoded in ROM. Early RISC proponents pointed out that instruction cache memories also have short access times. ROM-based microcode contains only the static set of operations chosen by the original designer, whereas a conventional instruction cache can hold a dynamic set of frequently used operations, selected automatically by the hardware to suit the current task. However, conventional instruction caches have never been used to store microcode. (For the many cache memory organizations used in commercial computer systems, see Stephen B. Furber, "VLSI RISC Architecture and Organization" (Marcel Dekker, Inc., 1989), and John L. Hennessy et al., "Computer Architecture: A Quantitative Approach" (Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990).) The move to RISC has been seen as a reaction against microcode, but this view may be misleading: the problem was the complex instruction set, not the microcode implementation. It is true that small machines need microcode to support complex instruction sets, but simple instruction sets can equally well be microcoded. Although early RISC designs avoided microcode, some later commercial RISC designs have reintroduced it. The RISC tendency to execute most instructions in a single cycle appears to discourage unusual microcode organizations; for single-cycle instructions, a microcode ROM is in fact merely an ordinary decoding structure. In his book, Furber states that a digital processor with a complex instruction set, such as DEC's VAX-11/780, usually uses microcode to execute complex instructions in several sequential steps on a relatively simple datapath. The advantage microcode has over a subroutine of simple instructions is that the microcode is held in a memory whose access time is very short compared with main memory.
However, this advantage is lost if the CPU has a cache memory for program instructions: microcode performs well for a preselected set of simple instruction sequences, while an instruction cache delivers comparable performance for a dynamically self-adjusting set of instruction sequences. In earlier VAX designs, microcode was stored in off-chip RAM and mapped directly onto the chip. However, extra pins were needed to connect to the off-chip RAM to access the microcode, and the programmer had to write code to fetch the required microcode. What is desired is greater flexibility in CISC or RISC systems that exploit the speed of both microcode and cache technology.

SUMMARY OF THE INVENTION

The present invention is directed to a cache system and method having both an on-chip ROM and an on-chip RAM cache for storing microcode. The invention is flexible enough to locate microcode in either the ROM or the RAM cache, using tag information and information based on a target address that is decoded at the same time as the corresponding instruction. The invention allows important microcode to be stored in ROM and additional microcode to be cached in or retrieved from RAM as needed. It further allows off-chip diagnostic microcode to be cached in the microcode RAM cache, for example to test the chip and peripheral devices. The invention accesses groups of microcode through the chip's standard I/O channels, and the size of these groups is limited only by the amount of available off-chip memory or by the microcode address space. The microcode RAM cache is transparent to the programmer, who does not have to write any code to fetch the required microcode. The invention is therefore faster and more efficient than conventional designs. Because the microcode can be arranged to avoid cache thrashing, a direct-mapped cache can be used.
Other cache mapping techniques are also contemplated. The invention also shortens clock cycles. With current technology, rising clock rates have shortened the time available for each memory access, so storing microcode in off-chip RAM is no longer useful. A 50 MHz clock corresponds to a 20 ns clock cycle, and leaving the chip to fetch microcode and bring it back on chip consumes too many clock cycles to be efficient. One solution would be to use very fast off-chip RAM, but such devices are not cost-effective and require many dedicated pins on the package. The invention allows slower but inexpensive SRAM or DRAM to be used as the on-chip cache while still saving valuable time in the form of clock cycles. Another advantage of the invention is that new microcode can be added to the system and run at the same speed as built-in ROM microcode; the only overhead is loading the new microcode when it is needed. Yet another advantage is that bugs can be fixed using RAM microcode, or a combination of ROM and RAM microcode such as a ROM subroutine that branches to RAM. Other patching techniques, from RAM to ROM and vice versa, are also contemplated. These and other features and advantages of the invention will become apparent from the following more detailed description of the preferred embodiments, as illustrated in the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention may be better understood with reference to the accompanying drawings. FIG. 1 is a block diagram of a conventional microcode storage system. FIG. 2 is a high-level block diagram of the microcode cache system of the invention. FIG. 3 is a representative diagram showing microcode stored in blocks of main memory. In the figures, identical or functionally similar elements are indicated by the same reference numbers, and the leftmost digit of a reference number identifies the drawing in which that number first appears.

DETAILED DESCRIPTION OF THE INVENTION

Introduction

The present invention was developed for a single-chip (semiconductor) graphics processor with a relatively small instruction set but very complex microcode. The problem that arose was how to speed up the processing of program instructions in the datapath. Solving this problem was very important to the success of the processor, because the number of functions required to execute each graphics program instruction is enormous: a single graphics program instruction can require hundreds to perhaps millions of floating-point calculations. In a graphics processor, therefore, a heavy burden falls on the microcode that performs the functions needed to execute program instructions. By contrast, executing a typical CISC or RISC instruction requires only the addition of two numbers. DEC's VAX-11/780, announced in 1978, had a pre-RISC architecture that used complex instructions and microcode to control the datapath. Part of its microcode was fixed and part was writable. The writable portion was used to implement some instructions, to patch others, and to allow instructions to be constructed for diagnostics. A simplified block diagram of the VAX-11/780 is shown in FIG. 1. CPU 102 includes processor 104, fixed microcode block 106, and writable block 108. FIG. 1 also shows an instruction path 110, data read and write paths 112 and 114, respectively, a data cache 116, a write buffer 118, a virtual-to-physical address translator, i.e., a translation lookaside buffer (TLB) 120, an external system bus 122, a memory subsystem (main memory) 124, and an input/output subsystem 126. These elements function conventionally. Note that data cache 116 is used only to cache data for processor 104.

System Block Diagram

Although the invention is described with respect to a graphics processor chip, it will be apparent to those skilled in the art that the invention is applicable to any system that uses microcode. For example, a digital signal processor that performs fast Fourier transforms can use microcode to perform the many addition and multiplication operations required. The invention is therefore not limited to graphics processors. The terms processor, CPU, and digital processor are often used synonymously in the art. The term processor is used hereafter, but it is understood that other similar terms may be substituted without changing the meaning of the disclosure. Likewise, the terms chip, integrated circuit, semiconductor device, and microelectronic device are often used synonymously in this field, and the invention applies to all of them as generally understood in the art. A preferred embodiment of the integrated microcode storage system of the invention is shown at 200 in FIG. 2. Integrated microcode storage system 200 includes a microsequencer 201, also called a microcode engine, which executes microcode instructions that are passed to datapath decode logic to control the datapath (the datapath decode logic and the datapath are not shown). Microsequencer 201 executes a predetermined microcode routine according to the function specified by a program instruction. The microcode is executed line by line: each line of microcode is decoded and controls the functional blocks (registers, multiplexers, and the like) that make up the datapath. In FIG. 2, a standard input/output (I/O) interface 202 is connected to external system bus 122. The internal address bus is shown at 206. A random access memory (RAM) block 208, a read-only memory (ROM) block 210, a cache tag memory block 212, a control logic block 216, and a direct memory access (DMA) control block 214 are the main elements of system 200.
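To illustrate how a single line of microcode can drive several datapath elements in the same cycle, here is a sketch of a horizontal microword decoder. The 8-bit layout and all field names are hypothetical; the text only says that each line of microcode is decoded to control registers, multiplexers, and similar functional blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatapathControls:
    dest_register: int        # register that latches a result this cycle
    mux_select: int           # input chosen by the operand multiplexer
    adder_enable: bool        # adder active this cycle
    multiplier_enable: bool   # multiplier active this cycle


def decode_microword(word: int) -> DatapathControls:
    """Decode a hypothetical 8-bit microword: [7] mul, [6] add, [5:4] mux, [3:0] dest."""
    return DatapathControls(
        dest_register=word & 0xF,
        mux_select=(word >> 4) & 0x3,
        adder_enable=bool((word >> 6) & 1),
        multiplier_enable=bool((word >> 7) & 1),
    )


# One microword enables both the adder and the multiplier in the same cycle.
print(decode_microword(0b1101_0101))
```

Each bit field drives one control point directly, which is why a well-packed microword can perform several operations per cycle.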
System 200 also interfaces to a main memory that is separate from system 200 (i.e., off-chip). DMA 214 is used to fetch microcode from main memory or the like (not shown) by translating microinstruction addresses into memory addresses. Alternatively, the function of DMA 214 could be incorporated into microsequencer 201. Control logic 216 includes cache miss logic for handling cache misses that may occur when RAM cache 208 is accessed. Together with cache tag memory block 212, the cache miss logic implements standard caching techniques for updating the microcode stored in RAM cache 208. As will become apparent to those skilled in the art, system 200 can employ many conventional caching techniques, such as those described by Hennessy et al. The microsequencer contains a next-microcode-address generator (not shown), which selects the memory (RAM 208 or ROM 210) to use based on the next address, that is, on whether the next address lies in the RAM address region or in the ROM address region. Next-address generation is based on the current address, on data from the datapath, and/or on the datapath decode logic (normally the output of RAM 208 or ROM 210). Memory 208 or 210 then outputs the microcode data for that cycle to multiplexer (MUX) 226 over bus 222 or 224, respectively. Control logic block 216 generates the MUX select signal and sends it to MUX 226 over line 228, and the microcode data is output to the datapath decode logic over bus 230. This microcode data is decoded to control the datapath and to tell the next-address generator which address source to use for the next address: the datapath, the microcode data from ROM/RAM, or simply the next sequential location. If the next address is a valid RAM address and RAM 208 holds the correct information at that address, the access is a cache hit. Otherwise, the cache miss logic in control logic block 216 freezes the datapath and the microsequencer.
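The next-address selection and ROM/RAM steering described above can be sketched as follows. The 14-bit address space whose most significant bit selects the RAM region is an illustrative assumption, as are all names; a real sequencer would implement this in combinational logic rather than software.

```python
from enum import Enum

ADDRESS_BITS = 14                      # assumed microcode address width
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
RAM_REGION = 1 << (ADDRESS_BITS - 1)   # assumed: MSB set means the RAM-cache region


class NextSource(Enum):
    SEQUENTIAL = "sequential"  # next consecutive location
    MICROCODE = "microcode"    # target field of the current microword
    DATAPATH = "datapath"      # value computed by the datapath


def next_address(current: int, source: NextSource,
                 microcode_target: int = 0, datapath_value: int = 0) -> int:
    """Pick the next microcode address from one of the three sources."""
    if source is NextSource.SEQUENTIAL:
        address = current + 1
    elif source is NextSource.MICROCODE:
        address = microcode_target
    else:
        address = datapath_value
    return address & ADDRESS_MASK


def in_ram_region(address: int) -> bool:
    """True if the address is served by the RAM cache, False if by the ROM."""
    return bool(address & RAM_REGION)


def select_microword(address: int, ram_word: int, rom_word: int) -> int:
    """Model the output multiplexer: forward the word from the memory that owns the address."""
    return ram_word if in_ram_region(address) else rom_word
```

Because region membership is a single address bit, the memory select can be derived as soon as the next address is known.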
The cache miss logic then requests the missing information from DMA 214 (via bidirectional bus 122). DMA 214 fetches this information and stores it in RAM 208. To do so, DMA 214 communicates with I/O interface 202 over unidirectional bus 232, and I/O interface 202 accesses main memory over external system bus 122. The requested microcode data is returned to I/O interface 202 over external system bus 122. The cache miss logic that initiated the request signals RAM cache 208 over unidirectional bus 234 that microcode data is coming, and I/O interface 202 sends the new microcode data to RAM cache 208 over unidirectional bus 236. Tag memory 212 is then updated with the new information, and the system resumes from the point at which the cache miss occurred. Alternatively, when a cache miss occurs, the sequencer simply sends the microcode block address with the request. The microcode block address can be converted into any other form and the block retrieved from main system memory; virtual addressing schemes (as commonly used in processors) or any other type of scheme can be used with the microcode. For example, DMA 214 receives a block address, converts it into a word address, and adds a pointer to it. The system's main memory can thus be organized as 32-bit words, with the microcode placed so that it starts at any word-aligned location in that space. The only requirement on the space used to store the RAM microcode is that it be contiguous. Cache blocks are preferably aligned so that burst-mode access can be used on whatever interconnect bus links the microcode cache to the external microcode memory. In systems that employ virtual memory, it may be advantageous to place the RAM microcode in the virtual memory space; this implementation will be apparent to those skilled in the art. Standard TLB circuitry is then needed to translate the physical RAM microcode address into a virtual address within the system.
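A sketch of the DMA address translation just described: the block address is scaled to a word offset and added to a base pointer, so relocating the microcode image only means changing the pointer. The 8-microword block size follows the example RAM-cache sizing given later in this description; the assumption that each microword occupies two 32-bit words of main memory is hypothetical.

```python
from typing import List

MICROWORDS_PER_BLOCK = 8   # block size used in the example RAM-cache sizing
WORDS_PER_MICROWORD = 2    # hypothetical: a 64-bit microword stored as two 32-bit words
BYTES_PER_WORD = 4         # main memory organized as 32-bit words


def block_to_byte_address(block_address: int, base_pointer: int) -> int:
    """Translate a microcode block address into a main-memory byte address."""
    if base_pointer % BYTES_PER_WORD:
        raise ValueError("microcode image must start on a 32-bit word boundary")
    word_offset = block_address * MICROWORDS_PER_BLOCK * WORDS_PER_MICROWORD
    return base_pointer + word_offset * BYTES_PER_WORD


def burst_addresses(block_address: int, base_pointer: int) -> List[int]:
    """Consecutive word addresses fetched for one block (suits a burst transfer)."""
    start = block_to_byte_address(block_address, base_pointer)
    count = MICROWORDS_PER_BLOCK * WORDS_PER_MICROWORD
    return [start + i * BYTES_PER_WORD for i in range(count)]


# Same block, two different placements of the microcode image.
print(hex(block_to_byte_address(3, 0x10000)))  # 0x100c0
print(hex(block_to_byte_address(3, 0x20000)))  # 0x200c0
```

Since a block always occupies consecutive words, the contiguity requirement in the text is what makes a single burst per refill possible.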
It would also be possible to build hardware that understands the system's page-table format. Alternatively, the host processor could be interrupted to service the request. The system could also use a special format for the microcode page table, with the hardware given its own (perhaps simpler) page-table format. The sizes of RAM cache 208 and ROM 210 are implementation-specific, but for purposes of discussion the RAM is assumed to consist of 32 blocks of 8 microwords each. Microcode RAM cache 208 is implemented as a direct-mapped cache, which allows a large amount of microcode to be stored in main memory. A practical amount of off-chip addressable microcode is 128 Kbytes. The address locations of microcode RAM cache 208 are mapped directly onto a section of main memory (off-chip memory or the like) that is considerably larger than the RAM cache itself. Therefore, when a cache miss occurs, blocks of microcode stored in main memory can be swapped into and out of the microcode RAM cache. It will be apparent to those skilled in the art that other conventional cache mapping schemes can be adapted for use with the invention. Control logic 216 is configured to detect addresses on address bus 206 that correspond to microcode requests directed to RAM cache 208. Over bidirectional bus 218 or the like, control logic 216 receives from cache tag memory block 212 information on the validity of the RAM location corresponding to the desired address detected on address bus 206. If the RAM cache 208 entry for that address is invalid, control logic 216 stops all datapath elements under microcode control and requests the missing microcode from DMA 214 through I/O interface 202. Control logic 216 then passes the block address corresponding to the desired microcode address to DMA 214 over bidirectional bus 220 or the like.
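Under the example sizing of 32 blocks of 8 microwords, a microcode address splits into a 3-bit word offset, a 5-bit block index, and a tag held in cache tag memory 212. The sketch below models only the tag check, and its names are hypothetical. It also shows why code placement matters in a direct-mapped cache: two routines whose addresses share an index evict each other.

```python
from typing import Tuple

BLOCKS = 32            # RAM cache: 32 blocks ...
WORDS_PER_BLOCK = 8    # ... of 8 microwords each
OFFSET_BITS = 3        # log2(WORDS_PER_BLOCK)
INDEX_BITS = 5         # log2(BLOCKS)


def split_address(address: int) -> Tuple[int, int, int]:
    """Return (tag, index, offset) for a direct-mapped lookup."""
    offset = address & (WORDS_PER_BLOCK - 1)
    index = (address >> OFFSET_BITS) & (BLOCKS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class TagMemory:
    """One valid bit and one tag per cache block."""

    def __init__(self) -> None:
        self.valid = [False] * BLOCKS
        self.tags = [0] * BLOCKS

    def is_hit(self, address: int) -> bool:
        tag, index, _ = split_address(address)
        return self.valid[index] and self.tags[index] == tag

    def record_fill(self, address: int) -> None:
        tag, index, _ = split_address(address)
        self.valid[index] = True
        self.tags[index] = tag


tags = TagMemory()
tags.record_fill(0x100)     # routine A: tag 1, index 0
print(tags.is_hit(0x107))   # True: same block as 0x100
print(tags.is_hit(0x200))   # False: routine B maps to index 0 with tag 2
```

Placing frequently alternating routines at addresses with different index bits avoids this kind of thrashing, which is the arrangement the text relies on to justify a direct-mapped cache.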
DMA 214 is configured to request the required microcode from the main memory of the system. The microcode is stored in main memory as blocks 302, as shown in FIG. DMA 214 stores the first (starting) main memory locations 304-310, each of which corresponds to a main memory block containing microcode. DMA 214 uses a pointer to offset the base block address in order to locate the requested microcode within the block of microcode returned from main memory. Because DMA 214 applies a base pointer to the address of the requested block, the microcode can be stored at almost any location in system memory; all that is needed is to change the base pointer to reflect the microcode base address. Alternatively, the microsequencer could be modified to generate the main memory address, in which case the DMA would be unnecessary. When control logic 216 passes an address to DMA 214, DMA 214 combines the base pointer value with the block address and fetches the block from main memory via I/O interface 202 in a well-known fashion.

The datapath and decoding are frozen until the requested microcode has been received in the cache. In one preferred embodiment the datapath is fully pipelined, so the best way to stop execution is to gate, or "freeze," the main clock buffer for the logic circuitry (not shown). Freezing pauses the clock for all circuits except those used to load the microcode RAM. As described above, the device's sequence control element then requests the missing code from external memory and restarts the clock once that code has been retrieved. In this process, I/O interface 202 receives the microcode from main memory and passes it to RAM cache 208 via data bus 207.

In the preferred embodiment of the invention, critical or heavily used microcode is stored in ROM 210, and less critical or less frequently used microcode is cached in on-chip RAM 208. Good system performance is obtained by optimizing the storage location of each microcode routine.
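The freeze-and-refill sequence can be sketched as below. This is an illustrative model, not the patented circuit: `read_block` is a hypothetical stand-in for DMA 214 and I/O interface 202, the clock freeze is reduced to a flag plus an event log, and for simplicity one main-memory word is assumed to hold one microword.

```python
from __future__ import annotations

from typing import Callable

WORDS_PER_BLOCK = 8  # example geometry from the text
NUM_BLOCKS = 32


class MicrocodeFetchUnit:
    """Toy model of RAM-side microcode fetch with clock-freeze miss handling.

    Assumes (for simplicity) one main-memory word per microword.
    """

    def __init__(self, base_pointer: int,
                 read_block: Callable[[int], list[int]]) -> None:
        self.base_pointer = base_pointer  # hypothetical DMA base pointer
        self.read_block = read_block      # stands in for DMA + I/O interface
        self.tags: dict[int, int] = {}    # tag memory: cache index -> tag
        self.lines: dict[int, list[int]] = {}
        self.clock_frozen = False
        self.events: list[str] = []

    def fetch(self, addr: int) -> int:
        block_addr, offset = divmod(addr, WORDS_PER_BLOCK)
        index, tag = block_addr % NUM_BLOCKS, block_addr // NUM_BLOCKS
        if self.tags.get(index) != tag:
            # Miss: freeze everything except the path that loads the RAM.
            self.clock_frozen = True
            self.events.append("freeze")
            # DMA: combine the base pointer with the block address.
            mem_addr = self.base_pointer + block_addr * WORDS_PER_BLOCK
            self.lines[index] = self.read_block(mem_addr)
            self.tags[index] = tag  # update tag memory
            self.events.append(f"dma {mem_addr:#x}")
            # Microcode is in place: restart the clock and continue.
            self.clock_frozen = False
            self.events.append("resume")
        return self.lines[index][offset]
```

For example, with `memory = list(range(0x1000))` and `read_block=lambda a: memory[a:a + 8]`, the first fetch from a block logs a freeze, one DMA transfer, and a resume; later fetches from the same block hit without stalling.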
Control logic 216 also verifies microcode requests directed to ROM 210; the microcode is then read from ROM 210, placed on data bus 207, and used by the microsequencer to control the datapath.

Chip production is also sped up, because microcode that takes time to write and debug need not be stored in ROM 210. Later, improved microcode can simply be stored in main memory and cached in microcode RAM cache 208 as needed. Moreover, special test or diagnostic microcode, which is used infrequently, need not be stored in ROM 210, where it would take up valuable space; it can instead be cached as needed. Such test or diagnostic microcode can also be written at any time and is not limited by the size of ROM 210. The microcode RAM cache 208 of the present invention also provides flexibility in the program instruction format. Program instructions can easily be modified (e.g., retargeted) so that the system looks for the corresponding microcode in either ROM 210 or RAM cache 208. If the microcode originally stored in ROM has a problem or is outdated, new microcode can be added and the program instructions changed so that the system finds the required microcode in RAM cache 208 instead of ROM 210.

ROM to RAM Branching and Vice Versa

In the preferred embodiment of the present invention, ROM and RAM are mapped to different locations in a contiguous address space, so they can be used in exactly the same way; the only difference between the two is the addresses to which they are mapped. If the entire address space is represented by n-bit addresses, part of the address space can be mapped to RAM and another part to ROM. For example, with a 14-bit address space, the ROM is assigned a 13-bit address space, and the RAM likewise.
The most significant bit of the address therefore serves as the ROM/RAM select bit (i.e., RAM is selected if the bit is set). Implementing branches between ROM and RAM in either direction requires adding predecode logic to the system so that, most of the time, only one memory is consuming power. The predecode logic monitors current ROM/RAM usage (i.e., it monitors the ROM/RAM select bit). It must also monitor potential future addresses (e.g., branch addresses). When the predecode logic detects that a switch between ROM and RAM may occur, it enables the other memory and subsequently disables the one previously in use. For example, suppose the RAM is currently in use. If the next microcode line executes a branch to ROM, the predecode logic enables the ROM before that line is actually executed. This is done by looking ahead at branch commands in the microcode. Of course, to turn a unit off, the predecode logic must determine that the unit will not be used in the near future, and the unit must not currently be in use.

Programmers can use subroutines to save program space and development time. Because branching between RAM and ROM is trivial, subroutines can be placed in RAM as well as in ROM. Most commonly, RAM routines call ROM subroutines, saving both time and space. The space savings come from the fact that the subroutine in question does not have to be loaded into RAM. The time savings come from the fact that executing the code from ROM never causes a RAM cache miss: unlike non-resident microcode, which must be fetched, ROM microcode is always available without loss of time. When an instruction in the special format is sent to the processor, it contains the address of the microcode that executes the instruction. This allows the instruction to be patched (the patched microcode then residing in RAM) and simplifies the decoding logic.
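The select-bit decode and the predecode lookahead can be sketched together. This assumes the 14-bit example above (13 bits per memory, MSB set selects RAM). The `step` interface, which is handed the branch target of the upcoming microcode line, is a hypothetical simplification of the hardware lookahead, and the `enabled` flags stand in for whatever power gating the hardware actually uses.

```python
from __future__ import annotations

ADDR_BITS = 14                     # example: 14-bit microcode address space
SELECT_BIT = 1 << (ADDR_BITS - 1)  # MSB selects the memory (set -> RAM)


def decode(addr: int) -> tuple[str, int]:
    """Return (memory, local 13-bit address) for a 14-bit microcode address."""
    memory = "RAM" if addr & SELECT_BIT else "ROM"
    return memory, addr & (SELECT_BIT - 1)


class PredecodePowerControl:
    """Toy predecode logic: keep enabled only the memory in use or about to be used."""

    def __init__(self) -> None:
        self.enabled = {"ROM": True, "RAM": False}

    def step(self, current_addr: int, branch_target: int | None) -> None:
        """Examine the current address and the next line's branch target, if any."""
        current, _ = decode(current_addr)
        self.enabled[current] = True
        upcoming = current
        if branch_target is not None:
            upcoming, _ = decode(branch_target)
            # Enable the target memory before the branch actually executes.
            self.enabled[upcoming] = True
        # A unit is turned off only if it is neither in use nor about to be used.
        for mem in ("ROM", "RAM"):
            if mem not in (current, upcoming):
                self.enabled[mem] = False
```

During the cycle before a RAM-to-ROM branch both memories are briefly enabled; once execution continues in ROM with no pending branch back, the RAM is disabled again.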
Patching is done by changing the microcode starting address in the instruction. When issuing instructions, the processor software uses a table that corresponds to the microcode; when the microcode changes, a new instruction look-up table is created. The instruction format contains two fields relevant to this description. The "instruction number" field is not strictly necessary, but it is convenient for adding fixed, hardware-coded instructions such as register or DMA controller instructions. The "microcode address" field holds the starting address of the instruction in the microcode address space. The software that drives the hardware reads and uses this table at boot and restart. The instruction number field can correspond to any starting address and is commonly used to identify the instruction to execute; the table supplies the starting address for each instruction. This technique is similar to dynamic linking in software. In fact, multiple sets of microcode can be implemented; for example, a test microcode set can be used for boot diagnostics and then replaced with a functional microcode set.

Special Test Code

Special code (e.g., for testing) is easily executed, because the microcode address of the instruction is supplied with the instruction and the desired test code is fetched from outside the processor. This is done by making the processor jump to a predetermined RAM address and then sending the test code to the processor as it is requested. The code can be as long as needed (e.g., as long as a 13-bit address allows, or longer using special retrieval). According to the invention, the processor can also have a special test mode, controlled through added externally accessible registers. When this test mode is enabled, normal execution of the processor stops and the processor enters test mode. In one embodiment of this test mode, three registers are used. The first is the address (or address/control) register.
Because ROM and RAM are mapped into the same address space, any address can be loaded into the address register and the corresponding ROM or RAM word accessed. The functions of the six example fields of the address/control register are listed in Table 1. It will be apparent to those skilled in the art that other bits can be used for other test functions. The second register holds the data that the user wants to write to RAM during the cycle (the RAM address being held in the address register). The third register holds the data at the current address of the ROM, RAM, or tag memory; if the RAM was previously written, this register will contain the data written, after some delay. When test mode is disabled, the address loaded in the address register is the address at which the sequencer restarts microcode execution.

While many embodiments of the invention have been described above, they are given by way of example and not as limitations. Therefore, the breadth and scope of the present invention are not limited by any of the exemplary embodiments described above, but are defined only in accordance with the following claims and their equivalents.
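The three test-mode registers described above can be modeled in software. This sketch assumes the 14-bit example address space (MSB set selects RAM). The `write` argument is a hypothetical stand-in for a control field whose actual layout is given in Table 1 and is not reproduced here; reading tag memory is omitted, and the delay before written data appears in the read register is ignored. All names are hypothetical.

```python
from __future__ import annotations

ADDR_BITS = 14
SELECT_BIT = 1 << (ADDR_BITS - 1)  # MSB set selects RAM (14-bit example)


class MicrocodeTestPort:
    """Toy model of the three externally accessible test-mode registers."""

    def __init__(self, rom: list[int]) -> None:
        if len(rom) != SELECT_BIT:
            raise ValueError("ROM must fill its 13-bit half of the address space")
        self.rom = rom                  # contents fixed at manufacture
        self.ram = [0] * SELECT_BIT     # 2**13 RAM words (example size)
        self.address = 0                # register 1: address
        self.write_data = 0             # register 2: data to write to RAM
        self.read_data = 0              # register 3: data at current address
        self.test_mode = False

    def enter_test_mode(self) -> None:
        self.test_mode = True           # normal execution stops

    def cycle(self, write: bool = False) -> None:
        """One test cycle: optionally write RAM, then latch the addressed word."""
        if not self.test_mode:
            raise RuntimeError("test mode is not enabled")
        local = self.address & (SELECT_BIT - 1)
        is_ram = bool(self.address & SELECT_BIT)
        if write:
            if not is_ram:
                raise ValueError("ROM cannot be written")
            self.ram[local] = self.write_data
        self.read_data = self.ram[local] if is_ram else self.rom[local]

    def exit_test_mode(self) -> int:
        """Leave test mode; the sequencer restarts at the address register."""
        self.test_mode = False
        return self.address
```

Because ROM and RAM share one address space, the same address register reaches either memory; only a write to a ROM address is rejected.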

[Written Amendment] Patent Act Article 184-8
[Submission date] January 11, 1995
[Amended content] Claims
1. A cache system for storing microcode used by a digital system, the digital system processing program instructions that control the digital system, each program instruction comprising a plurality of instruction fields, the cache system comprising a read-only memory (ROM) (210) configured to store a first group of the microcode, and a random access memory (RAM) cache (208) configured to temporarily store subsections of a second group of the microcode, the second group of the microcode being mapped into the RAM cache (208) from a memory element separate from the RAM cache, and the RAM cache being configured to swap the subsections into and out of the microcode cache system, the cache system being characterized by means for predecoding the microcode, the predecoding means enabling whichever of the ROM (210) and the RAM (208) is not in use when it detects that that memory will be used in the future.
2. The system according to claim 1, further comprising means (216, 212) for receiving a request for microcode and determining whether the requested microcode is located in the ROM, the RAM cache, or the separate memory element.
3. The system according to claim 1 or 2, wherein the microcode cache system is on a single chip, and the separate memory element is not on the chip.
4. The system according to claim 2, wherein the digital system has a microcode address range, and the means (216) for determining whether the microcode is in the ROM (210) or the RAM (208) uses a first subsection of the microcode address range for the ROM and a second subsection for the RAM.
5. The system according to claim 2, wherein the means further comprises control logic (216) for determining whether the requested microcode is in the ROM or the RAM cache, and a tag memory (212) indicating whether the desired microcode is actually resident and valid in the RAM cache at the time it is requested.
6. The system according to at least one of claims 1 to 5, further comprising a direct memory access controller (214) that retrieves the desired microcode from the separate memory element and stores it in the RAM cache (208).
7. The system according to at least one of claims 1 to 6, wherein the second group of the microcode is directly mapped from the separate memory element into the RAM cache (208).
8. The system according to at least one of claims 1 to 7, wherein the second group of the microcode is substantially larger than the first group of the microcode.
9. The system according to at least one of claims 1 to 8, wherein the digital system is a graphics processor.
10. The system according to at least one of claims 1 to 9, wherein subsets of the first and second groups of microcode are used to perform diagnostics.
11. A method of storing microcode used by a digital system in a ROM memory unit (210) and a RAM memory unit (208), the ROM and the RAM having m and k addressable locations, respectively, and being integrated in the digital system, the digital system having an n-bit address space, the method comprising the steps of: mapping all but one bit of the addressable ROM locations into the address space; and mapping all but one bit of the addressable RAM locations into the address space; wherein the remaining bit of the m bits and the k bits is used as a select bit for accessing either the ROM or the RAM.
12. The method according to claim 11, further comprising the step of checking the value of the select bit to monitor the current usage of the ROM and the RAM.
13. The method according to claim 11 or 12, further comprising the steps of: predecoding branch addresses of the microcode to detect future switching between the ROM and the RAM; and, if a switch is detected, enabling the unused memory unit and subsequently disabling the other.

Claims

[Claims]
1. A cache system for storing microcode used by a digital system, the digital system processing program instructions that control the digital system, each program instruction comprising a plurality of instruction fields, the cache system comprising: a read-only memory (ROM) configured to store a first group of the microcode; and a random access memory (RAM) cache configured to temporarily store subsections of a second group of the microcode, wherein the second group of the microcode is mapped into the RAM cache from a memory element separate from the RAM cache, and the RAM cache is configured to swap the subsections into and out of the microcode cache system.
2. The system of claim 1, further comprising means for receiving a request for the microcode and determining whether the requested microcode is in the ROM, the RAM cache, or otherwise in the separate memory element.
3. The system of claim 1, wherein the microcode cache system is on a single chip, but the separate memory element is not on the chip.
4. The system of claim 2, wherein the digital system has a microcode address range, and the means for determining whether the microcode is in the ROM or the RAM uses a first subsection of the microcode address range for the ROM and a second subsection for the RAM.
5. The system of claim 2, wherein the means comprises: control logic for determining whether the requested microcode is in the ROM or the RAM cache; and a tag memory indicating whether the desired microcode is actually resident and valid in the RAM cache at the time it is requested.
6. The system of claim 5, further comprising a direct memory access controller for retrieving the desired microcode from the separate memory element and storing it in the RAM cache.
7. The system of claim 1, wherein the second group of the microcode is mapped directly from the separate memory element into the RAM cache.
8. The system of claim 1, wherein the second group of the microcode is substantially larger than the first group of the microcode.
9. The system of claim 1, wherein the digital system is a graphics processor.
10. The system of claim 1, wherein subsets of the first and second groups of microcode are used to perform diagnostics.
11. A method of storing microcode used by a digital system in a ROM memory unit and a RAM memory unit, the ROM and the RAM having m and k addressable locations, respectively, and being integrated in the digital system, the digital system having an n-bit address space, the method comprising: mapping all but one bit of the addressable ROM locations into the address space; and mapping all but one bit of the addressable RAM locations into the address space; wherein the remaining bit of the m bits and the k bits is used as a select bit for accessing either the ROM or the RAM.
12. The method of claim 11, further comprising checking the value of the select bit to monitor the current usage of the ROM and the RAM.
13. The method of claim 11, further comprising: predecoding branch addresses of the microcode to detect future switching between the ROM and the RAM; and, if a switch is detected, enabling the unused memory unit and subsequently disabling the other.