JP4068828B2

JP4068828B2 - Integrated separation-type switching cache memory and processor system having the cache memory

Info

Publication number: JP4068828B2
Application number: JP2001327733A
Authority: JP
Inventors: 文男荒川
Original assignee: Renesas Technology Corp
Current assignee: Renesas Technology Corp
Priority date: 2001-10-25
Filing date: 2001-10-25
Publication date: 2008-03-26
Anticipated expiration: 2021-10-25
Also published as: JP2003131943A

Description

【０００１】
【発明の属する技術分野】
本発明はキャッシュメモリを有するプロセッサシステムに係わり、特に複数のコマンドを処理可能なプロセッサと該プロセッサからのアクセス要求に応じて動作しうるキャッシュメモリを有するプロセッサシステムに関する。更に、統合型および分離型の双方のキャッシュメモリシステムを同一アーキテクチャで実現することを可能にし、また、統合型キャッシュメモリシステムを分離型並に高速化することを可能にするものである。
【０００２】
【従来の技術】
図１にキャッシュメモリアーキテクチャの変遷の概略を示す。キャッシュメモリ導入以前は（１）のようにプロセッサＣＰＵとメインメモリＭＭは直接命令およびデータをやり取りしていた。その後、メインメモリＭＭの容量増大およびプロセッサＣＰＵの高速化によって、メインメモリＭＭの速度がプロセッサシステムの性能を律速するようになった。
【０００３】
そこで、（２）のようにメインメモリＭＭに比べて小容量かつ高速なキャッシュメモリＵＣをプロセッサＣＰＵとメインメモリＭＭの間に配置して性能を向上させた。初期のキャッシュメモリＵＣは命令とデータの双方を扱う統合型であった。
【０００４】
その後、（３）のようにプロセスの微細化によってプロセッサＣＰＵとキャッシュメモリＵＣを同一チップ上に集積することが可能になった。これによってプロセッサＣＰＵとキャッシュメモリＵＣを結ぶ信号線数を大幅に増加させることが可能となり、（４）のように、キャッシュメモリＵＣを命令キャッシュＩＣとデータキャッシュＤＣに分離して同時アクセスを可能にした、ハーバードアーキテクチャが登場した。そして、高性能なキャッシュアーキテクチャはハーバードアーキテクチャであるということが常識となった。
【０００５】
その後、スーパスカラまたはＶＬＩＷ（ＶｅｒｙＬｏｎｇＩｎｓｔｒｕｃｔｉｏｎＷｏｒｄ）アーキテクチャが登場し、同時に複数のデータアクセスを行うことが可能となった。このため、（５）のようにデータキャッシュＤＣを複数ポート化したプロセッサが登場した。この複数ポート化はバンクインターリーブ方式によって異なるバンクへのアクセスのみ同時実行するのが一般的である。
【０００６】
また、統合型キャッシュメモリアーキテクチャがハーバードアーキテクチャより低コストであることから、同一のプロセッサファミリにおいて、ローエンド版は統合型、ハイエンド版はハーバードアーキテクチャという場合がある。例えば、「ＭｉｃｒｏｐｒｏｃｅｓｓｏｒＲｅｐｏｒｔＶｏｌ．９，ｎｏ．３，３／６／９５，ｐ．１２」記載のＳＨ−３と「ＭｉｃｒｏｐｒｏｃｅｓｓｏｒＲｅｐｏｒｔＶｏｌ．１０，ｎｏ．１４，１０／２８／９６，ｐｐ．３２−３５」記載のＳＨ−４は同じＳｕｐｅｒＨシリーズのプロセッサであるが、前者は統合型、後者はハーバードアーキテクチャである。
【０００７】
近年、プロセッサに依存しないプログラミング言語としてＪＡＶＡが急速に普及しつつある。ＪＡＶＡは命令の書き換えを行う言語である。初回に実行した複雑な命令を、一度実行することによって確定した情報をもとに高速実行する命令に書き換える。更に、ＪＡＶＡで書かれたプログラムを高速実行するために、実行頻度の高いルーチンを検出してプロセッサ固有の機械語のプログラムに書き換えて高速実行する方式もＪＩＴ（Ｊｕｓｔ−ｉｎ−ｔｉｍｅ）コンパイル方式として一般化している。
【０００８】
キャッシュメモリによる性能向上はメモリアクセスの空間的時間的局所性を前提としている。したがって、該局所性がない場合は有効に働かない。例えば、「ＭｉｃｒｏｐｒｏｃｅｓｓｏｒＲｅｐｏｒｔＶｏｌ．１３，ｎｏ．１２，９／１３／９９，ｐｐ１，６−１０」記載のネットワークプロセッサＩＸＰ１２００はデータキャッシュを内蔵せず、外付けのＳＲＡＭやＳＤＲＡＭに直接アクセスする。また、「ＭｉｃｒｏｐｒｏｃｅｓｓｏｒＲｅｐｏｒｔＶｏｌ．１３，ｎｏ．５，４／１９／９９，ｐｐ１，６−１１」記載のエモーションエンジンＥＥのベクトル浮動小数点ユニットＶＰＵはキャッシュの代わりに専用のＲＡＭを持っている。そして、ソフトウェアによって制御されるダイレクトメモリアクセスユニットが該ＲＡＭへのデータアクセスを行う。
【０００９】
【発明が解決しようとする課題】
上記のようなキャッシュアーキテクチャの歴史的変遷の結果、性能を重視する場合はハーバードアーキテクチャ、コストを重視する場合は統合型キャッシュメモリアーキテクチャとすることが常識となっている。しかし、プロセス微細化に伴う集積度の向上により、統合型アーキテクチャとハーバードアーキテクチャのコストの差がチップ全体のコストに比べて小さくなってきており、２種類のキャッシュメモリアーキテクチャを製品別に作り分けるメリットがなくなってきている。
【００１０】
また、上記ＪＡＶＡのように命令の書き換えを行う言語が普及してくると、ハーバードアーキテクチャが必ずしも良いとは限らない。ハーバードアーキテクチャにおいては命令の書き換えをハードウェアで検出しないのが一般的である。このため、命令を書き換えた場合はソフトウェア責任で書き換え前の命令が実行されないことを保証しなければならない。命令書き換え時には、書き換えられる命令はデータとして扱われるため、書き換えた命令はデータキャッシュＤＣに格納される。この時、書き換え前の命令が命令キャッシュＩＣに存在しても更新されない。
【００１１】
ソフトウェアは、命令キャッシュＩＣ上の書き換え前命令をクリアし、データキャッシュＤＣ上の書き換え後命令をメインメモリＭＭに書き戻してから、書き換え後の命令を実行する。すると、ハードウェアは書き換え後の命令をメインメモリＭＭからフェッチして実行する。尚、命令の書き換えをハードウェアで検出したとしても、上記ソフトウェア処理をハード化するだけなので効率的な処理は困難である。
【００１２】
一方、統合型キャッシュメモリアーキテクチャでは命令書き換えによってキャッシュメモリＵＣ上の命令が書き換えられる。したがって、命令書き換え後にキャッシュメモリＵＣから命令フェッチすれば書き換え後命令をフェッチすることが出来る。このためには、通常のパイプライン方式のプロセッサでは、命令書き換え後にパイプライン上に存在する実行中の命令をキャンセルするだけでよい。したがって、命令書き換えをサポートするには統合型キャッシュメモリアーキテクチャの方が適している。
【００１３】
プロセス微細化に伴う集積度の向上により、小規模システムではメインメモリＭＭをオンチップ化することが可能となってきている。また、上記エモーションエンジンＥＥのように命令またはデータをオンチップメモリに載せ、ダイレクトメモリアクセス等により、ソフトウェアであらかじめオンチップメモリに命令やデータを転送して、実際に使用する際に確実に高速アクセスすることも可能である。こうすることにより、使用する命令やデータが予測可能であれば、メモリアクセスに空間的時間的局所性がなくても高速化が可能である。このような状況では、キャッシュメモリが不要であったり、命令キャッシュとデータキャッシュのうち一方のみが必要であったりする。
【００１４】
更に、プロセッサシステムが、メインフレーム、ワークステーション、ＰＣ等に限定されていた時代から、携帯電話、デジタル家電、自動車といった多種多様な製品に搭載される時代となり、用途に応じてキャッシュメモリの最適な構成も多様化している。したがって、同一プロセッサで多様なキャッシュメモリ構成を取れることも重要になってきている。
【００１５】
本発明が解決しようとする第１の課題は、従来、ハーバードアーキテクチャでのみ達成可能であった命令フェッチとデータアクセスの同時実行を統合型キャッシュメモリアーキテクチャで達成することである。これによって、高性能と命令書き換えの容易性とを同時に達成することが可能となる。また、命令とデータの一方を重点的にキャッシングしたいアプリケーションの場合でも、ハーバードアーキテクチャのように一方のキャッシュが無駄になることなく、全容量を活用することが出来る。
【００１６】
本発明が解決しようとする第２の課題は、同一のプロセッサで統合型キャッシュメモリアーキテクチャとハーバードアーキテクチャの双方を実現することである。そして更に、同一プロセッサで多様なキャッシュメモリ構成を実現することである。
【００１７】
【課題を解決するための手段】
上記第１の課題は、統合型キャッシュメモリを複数ポート化することによって解決される。これによって命令フェッチとデータアクセス要求を同時に処理することが可能となり、ハーバードアーキテクチャと同等の性能が達成される。但し、純粋な複数ポート化はハードウェア量を増大させ、同一面積で実現できるキャッシュメモリ容量が小さくなってしまう。そこで、キャッシュメモリをアドレスの一部によって指定される複数バンクによって構成し、各バンクを１ポートキャッシュとし、命令フェッチとデータアクセス要求が異なるバンクに対するものであれば同時処理、同一バンクに対する場合は逐次処理することにより、完全な複数ポートキャッシュメモリよりも、ハードウェア量を削減し、キャッシュメモリ容量を維持することが出来る。プロセス微細化に伴ってキャッシュメモリの大容量化が可能となっているが、大容量化にはメモリマットの分割が必要であり、分割されたメモリマットをバンクに割当てれば、バンク分割に伴うコストの増大は回避できる。
【００１８】
上記第２の課題は、上記ポートまたはバンクの指定にアドレスの一部だけでなく命令フェッチとデータアクセス要求の識別信号も用いることによって解決される。統合型キャッシュメモリアーキテクチャとして使用する場合はアドレスの一部を、ハーバードアーキテクチャとして使用する場合は識別信号を使用する。このようにポートまたはバンク指定に使用する信号を切替えることにより、２つのキャッシュメモリアーキテクチャを同一のプロセッサで実現する。更に、信号切替の仕方によってハーバードアーキテクチャの命令キャッシュとデータキャッシュの容量配分を変えることも可能である。また、複数ポート化を複数のウェイに対して異なるアドレスでアクセスできるようにすることによっても、同様に上記第１および第２の課題の解決が可能である。
【００１９】
さらに、上記課題を解決する為に、本発明は複数のコマンドを独立に処理可能なプロセッサと、該プロセッサからのアクセス要求に応じて動作するキャッシュメモリを有するプロセッサシステムにおいて、前記キャッシュメモリが複数個のポートを有し、該複数個のポートを介して前記プロセッサから送信される命令フェッチを含む複数の制御コマンド及び複数のアドレス信号を同時に処理しうることを特徴とするプロセッサシステムを提供することにある。
【００２０】
さらに、本発明は命令フェッチとデータアクセスを独立に処理可能なプロセッサと、該プロセッサからのアクセス要求に呼応して動作するキャッシュメモリを有するシステムにおいて、前記キャッシュメモリを複数のセレクタ及び複数のアドレスの一部によって指定される複数バンクによって構成し、各バンクを１ポートキャッシュとし、前記命令フェッチ要求と前記データアクセス要求が異なるバンクに対するものであれば同時処理、同一バンクに対する場合は逐次処理することを特徴とするプロセッサシステムを提供することにある。
【００２１】
さらに、本発明は複数のバンクを備え、該複数のバンクを制御するコントローラを有し、該コントローラは前記複数のバンクの各々に命令或いはデータの書き込み又は読み出しを行う為の制御信号を生成し、前記コントローラの制御により該制御信号を前記複数のバンクに供給し、前記複数のバンク内の異なるバンクに対し前記命令或いはデータの書き込み或いは読み出し動作を同時に行い、同一のバンクに対し、前記命令或いはデータの書き込み或いは読み出し動作を逐次に行うことを特徴とするキャッシュメモリを提供することにある。
【００２２】
【発明の実施の形態】
以下、本発明の各実施例を図を用いて説明する。各実施例の図における同一符号は同一物または相当物を示す。図２は本発明を適用したプロセッサシステムの例である。プロセッサＬＳＩおよびメインメモリＭＭから成る。プロセッサＬＳＩは中央処理装置ＣＰＵ、キャッシュメモリＣＭ、外部メモリインタフェイスＥＭＩ、および周辺モジュールＰＭから成り、内部バスＩＢで接続されている。中央処理装置ＣＰＵは命令フェッチユニットＩＦＵ、実行ユニットＥＸＵ、およびバスインタフェイスユニットＢＩＵから成る。又、当該プロセッサ及びキャッシュメモリＣＭは同一のＬＳＩチップ上に集積されている。
【００２３】
中央処理装置ＣＰＵの基本動作は以下の通りである。まず、命令フェッチユニットＩＦＵがキャッシュメモリＣＭに命令アドレスＡＩと共に命令フェッチ要求ＲＥＱＩを出す。キャッシュメモリＣＭは要求ＲＥＱＩに応じて読出した命令ＲＩを命令フェッチユニットＩＦＵに返す。命令フェッチユニットＩＦＵは命令ＲＩを実行ユニットＥＸＵに供給する。実行ユニットＥＸＵは命令ＲＩをデコードし実行する。デコードした命令がメモリ読出し命令の場合は、データアドレスＡＤと共にデータアクセス要求ＲＥＱＤを出す。キャッシュメモリＣＭは要求ＲＥＱＤに応じて読出したデータＲＤを実行ユニットＥＸＵに返す。また、デコードした命令がメモリ書込み命令の場合は、データアドレスＡＤおよび書込みデータＷＤと共にデータアクセス要求ＲＥＱＤを出す。キャッシュメモリＣＭは要求ＲＥＱＤに応じてデータＷＤを書込む。
【００２４】
命令フェッチ要求ＲＥＱＩまたはデータアクセス要求ＲＥＱＤがキャッシュミスした場合は、バスインタフェイスユニットＢＩＵが該要求に関連する命令アドレスＡＩ、データアドレスＡＤおよび書込みデータＷＤ等を受け取り、内部バスＩＢを経由して外部メモリインタフェイスＥＭＩに外部メモリフェッチ要求を出す。外部メモリインタフェイスＥＭＩは要求に応じてメインメモリＭＭにアドレスＡを出力して外部メモリフェッチ要求を出し、メインメモリＭＭはこれに呼応してデータＤを返す。外部メモリインタフェイスＥＭＩは内部バスＩＢを経由してデータＤをバスインタフェイスユニットＢＩＵに返す。バスインタフェイスユニットＢＩＵは、外部アドレスＡＸ、書込みデータＷＸと共に外部アクセス要求ＲＥＱＸを出し、キャッシュメモリＣＭは外部アクセス要求ＲＥＱＸ（外部からのアクセス要求である）を処理するポートを有し、要求ＲＥＱＸに応じて書込みデータＷＸを書込む。
【００２５】
中央処理装置ＣＰＵにおいてパイプライン動作が行われると、命令フェッチユニットＩＦＵは実行ユニットＥＸＵの命令処理と同時に後続命令のフェッチを行う。更に、実行ユニットＥＸＵのデータアクセスがノンブロッキングであると、キャッシュメモリＣＭのデータアクセスミスによる外部メモリアクセスと同時に後続命令によるデータアクセスが行われる。このため、キャッシュメモリＣＭには、命令フェッチ要求とデータアクセス要求、或いは命令フェッチ要求ＲＥＱＩ、データアクセス要求ＲＥＱＤ、および外部アクセス要求ＲＥＱＸの内の何れか１組から成る複数の制御コマンド、及び命令アドレス信号ＡＩとデータアドレス信号ＡＤ、或いは信号ＡＩ、信号ＡＤと外部アドレス信号ＡＸの内の何れか１組から成る複数のアドレス信号を同時に処理する能力が必要である。
【００２６】
図３は本発明を適用したキャッシュメモリＣＭの第１の実施例である。キャッシュメモリＣＭはキャッシュ制御レジスタＣＣＲ、バンク信号生成部ＢＫＧ（或いは信号生成部）、ＣＭを制御するキャッシュ制御部ＣＣ、およびキャッシュ本体から成る。ＢＫＧは複数のアドレス信号に基づき、ＣＣに与える複数の制御信号（ＢＫＩ，ＢＫＤ，ＢＫＸ）を生成する。
【００２７】
キャッシュメモリ本体は、複数のアドレス信号（ＡＩ，ＡＤ，及びＡＸ）の各々における該複数のアドレスの一部で有る特定のビットによって指定される４つのバンクＢＫ０〜ＢＫ３に分割されており異なるバンクへの同時アクセスが可能である。
【００２８】
キャッシュメモリ内の複数バンクの各々を指定する際に、上記特定のビットの代わりに命令フェッチ要求又はデータアクセス要求、及び上記複数の制御信号の入力に基づき、キャッシュメモリ内に有するキャッシュ制御部にて生成される複数のアドレス選択制御信号及び書き込みデータ選択制御信号の制御により複数のセレクタの各々を介してバンクを指定することにより、命令データ分離型キャッシュメモリとして動作する。
【００２９】
バンクＢＫ０〜ＢＫ３はそれぞれアクセスアドレスＡ０〜Ａ３、書込み時は更に書込みデータＷ０〜Ｗ３を受取って、読出しまたは書込み動作を行い、読出し時には読出しデータＲ０〜Ｒ３を出力する。バンクＢＫ０〜ＢＫ３の各々は１ポートキャッシュと見なされる。アクセスアドレスＡ０〜Ａ３はそれぞれアドレスマルチプレクサ（又はセレクタ）ＡＭ０〜ＡＭ３において、アドレス選択制御信号ＣＡ０〜ＣＡ３によって、アドレスＡＩ、ＡＤ、またはＡＸから選択される。書込みデータＷ０〜Ｗ３はそれぞれ書込みデータマルチプレクサ（又はセレクタ）ＷＭ０〜ＷＭ３において、書込みデータ選択制御信号ＣＷ０〜ＣＷ３によって、書込みデータＷＤまたはＷＸから選択される。読出しデータＲＩ、ＲＤ、およびＲＸはそれぞれ読出しデータマルチプレクサ（又はセレクタ）ＲＭＩ、ＲＭＤ、およびＲＭＸにおいて、読出しデータ選択制御信号ＣＲＩ、ＣＲＤ、およびＣＲＸによって、読出しデータＲ０〜Ｒ３から選択される。尚、図中の各マルチプレクサの入力信号に振られた番号は、その入力を選択する場合にアサートされる選択制御信号のビット番号である。
【００３０】
図４は第１の実施例のキャッシュ制御部ＣＣの詳細である。バンク信号生成部ＢＫＧからの命令バンクＢＫＩ、データバンクＢＫＤ、および外部バンクＢＫＸと、命令フェッチ要求ＲＥＱＩ、データアクセス要求ＲＥＱＤ、および外部アクセス要求ＲＥＱＸとから、キャッシュ本体の各マルチプレクサの制御信号を生成する。
【００３１】
詳細に説明すると、キャッシュ制御部は命令フェッチ要求、データアクセス要求及び複数の制御信号（ＢＫＩ，ＢＫＤ，ＢＫＸ）の入力に対し、制御信号に基づいて指定されたバンクに既にデータアクセス要求が割り当てられた場合は、更なる割り当てを実行しないように遅延信号を生成し、制御信号に基づいて指定されたバンクに未だデータアクセス要求が割り当てられていない場合は、複数のアドレス選択制御信号又は書き込みデータ選択制御信号を生成する。
【００３２】
図３の読出しデータ選択制御信号ＣＲＩ、ＣＲＤ、およびＣＲＸは４入力マルチプレクサを制御する４ビットの信号である。それぞれ２ビットの命令バンクＢＫＩ、データバンクＢＫＤ、および外部バンクＢＫＸを単純にデコードすれば生成できるので図示していない。
【００３３】
アドレス選択制御信号ＣＡ０〜ＣＡ３および書込みデータ選択制御信号ＣＷ０〜ＣＷ３は命令フェッチ、データアクセス、および外部アクセスの優先度を決定しなければ生成できない。最も単純な優先度決定方式はプログラム本来の逐次実行順序を守ることである。外部アクセスは先行するキャッシュアクセスのミスによって生じるので最も逐次実行順序が早い。また、命令フェッチは後続命令の準備であり最も逐次実行順序が遅い。したがって、優先度は第１が外部アクセス、第２がデータアクセス、第３が命令フェッチである。
【００３４】
しかしながら、高度に最適化されたプログラムでは命令やデータをプリフェッチ命令等によって事前にメインメモリＭＭからキャッシュメモリＣＭにキャッシングし、実際に使用する際にキャッシュミスが起こらないようにする。このようなプログラムでは外部アクセスの優先度を下げた方が性能は向上する。ジャストオンタイムでキャッシングするようにプログラムを最適化することは困難なので、少し余裕時間を見てキャッシングした場合、本来必要な時刻より早くキャッシングするので、これを待たせて内部動作をストールさせない方が良いからである。そこで、本実施例では優先度を第１がデータアクセス、第２が命令フェッチ、第３が外部アクセスとする。
【００３５】
図３に示すアクセスアドレスＡ０〜Ａ３はアドレスＡＩ、ＡＤ、またはＡＸから選択されるので、アドレスマルチプレクサＡＭ０〜ＡＭ３は３入力であり、制御信号ＣＡ０〜ＣＡ３は３ビットである。そこで、制御信号のビット番号は命令アドレスに２、データアドレスに１、外部アドレスに０を割当てる。
【００３６】
まず、図４に示す通り、最優先のデータアクセス要求ＲＥＱＤがアサートされたら、データバンクＢＫＤで指定されるバンクを割当て、アドレス選択制御信号ＣＡ０〜ＣＡ３のうち割当てたバンクの制御信号のビット１をアサートする。即ち、２ビットのデータバンクＢＫＤをデータバンクデコーダＢＤＤによってデコードした４ビット信号のそれぞれとデータアクセス要求ＲＥＱＤのＡＮＤ論理を取る。
【００３７】
次に、命令フェッチ要求ＲＥＱＩがアサートされたら、命令バンクＢＫＩで指定されるバンクを割当てる。この時、該当バンクに既にデータアクセスが割当てられていた場合は、命令フェッチ遅延信号ＤＬＩをアサートし、命令フェッチの割当ては行わない。割当てを行った場合はアドレス選択制御信号ＣＡ０〜ＣＡ３のうち割当てたバンクの制御信号のビット２をアサートする。即ち、２ビットの命令バンクＢＫＩを命令バンクデコーダＢＤＩによってデコードした４ビット信号のそれぞれと命令アクセス要求ＲＥＱＩのＡＮＤ論理を取り、更に、アドレス選択制御信号ＣＡ０〜ＣＡ３のビット１の反転信号とＡＮＤ論理を取る。アドレス選択制御信号ＣＡ０〜ＣＡ３のビット０は該当バンクにデータアクセスも命令アクセスも行わない場合にアサートする。即ち、アドレス選択制御信号ＣＡ０〜ＣＡ３のビット１の反転信号と、ビット２の元になっている命令アクセス要求ＲＥＱＩと当該ＢＤＩからの信号とのＡＮＤ論理を取った信号の反転信号とのＡＮＤ論理を取った信号である。
【００３８】
また、２ビットの外部バンクＢＫＸを外部バンクデコーダＢＤＸによってデコードした４ビット信号のそれぞれと外部アクセス要求ＲＥＱＸのＡＮＤ論理を取った信号がアサートされて、該当バンクへの外部アクセス要求が出たにもかかわらず、該当バンクアドレスとして外部アドレスを選択する信号、即ち、アドレス選択制御信号ＣＡ０〜ＣＡ３のビット０がアサートされない場合、必要なバンクが選択できなかったので外部アクセス遅延信号ＤＬＸをアサートする。
【００３９】
図３に示す書込みデータＷ０〜Ｗ３は書込みデータＷＤまたはＷＸから選択されるので、書込みデータマルチプレクサＷＭ０〜ＷＭ３は２入力であり、制御信号ＣＷ０〜ＣＷ３は２ビットである。そこで、制御信号のビット番号はデータアドレスに１、外部アドレスに０を割当てる。まず、書込みデータ選択制御信号ＣＷ０〜ＣＷ３のビット１は、アドレス選択制御信号ＣＡ０〜ＣＡ３のビット１と同一論理である。データアクセスがない場合は、書込みデータとして図３に示す外部書込みデータＷＸを選択するので、図４に示す通り、書込みデータ選択制御信号ＣＷ０〜ＣＷ３のビット０はビット１の反転である。
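The priority logic of the cache control unit CC described above (data access first, instruction fetch second, external access third) can be sketched in a few lines. This is only an illustrative model, not the gate-level circuit of FIG. 4: the function name `cache_control`, the `Control` record, and the encoding of the one-hot select signals as small integers are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List

NUM_BANKS = 4  # BK0..BK3


@dataclass
class Control:
    ca: List[int]  # per bank, 3-bit one-hot: bit2 = AI, bit1 = AD, bit0 = AX
    cw: List[int]  # per bank, 2-bit one-hot: bit1 = WD, bit0 = WX
    dli: bool      # instruction fetch delay signal DLI
    dlx: bool      # external access delay signal DLX


def cache_control(bki, bkd, bkx, reqi, reqd, reqx):
    """Model of the select/delay signals derived from BKI, BKD, BKX and the requests."""
    ca, cw = [], []
    dlx = False
    for b in range(NUM_BANKS):
        data_here = reqd and bkd == b          # decoded BKD AND REQD
        inst_here = reqi and bki == b          # decoded BKI AND REQI
        bit1 = data_here                       # data access has top priority
        bit2 = inst_here and not bit1          # fetch only if the bank is free
        bit0 = (not bit1) and (not inst_here)  # otherwise the external address
        ca.append((int(bit2) << 2) | (int(bit1) << 1) | int(bit0))
        cw.append(0b10 if bit1 else 0b01)      # CW bit1 = CA bit1, bit0 = NOT bit1
        if reqx and bkx == b and not bit0:
            dlx = True                         # external access lost its bank
    dli = bool(reqi and reqd and bki == bkd)   # fetch collided with a data access
    return Control(ca, cw, dli, dlx)
```

Calling `cache_control(0, 0, 0, True, True, True)` models three requests that all target bank 0: only the data access is granted, and both delay signals are raised.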
【００４０】
図５はバンク信号生成部ＢＫＧの第１の例である。キャッシュメモリＣＭの容量を１２８ＫＢとし、４ウェイセットアソシアティブ方式とすると、１ウェイ当りの容量は３２ＫＢであり、インデクスは１５ビットである。バンクインタリーブ方式ではバンク指定にインデクスの一部を使用する。本実施例ではバンク数が４なので、バンク指定に２ビット使用する。どのビットをバンク指定に用いたときにバンク競合によるストール頻度が最小になるかはプログラムに依存する。逆に、バンク指定ビットをプログラマに公開することによって競合を抑えたプログラムを作成することも可能である。図５ではアドレスのビット１４〜０をインデクスとし、インデクスの上位２ビットをバンク指定ビットとする。したがって、ビット１４〜１３がバンク指定ビットである。
【００４１】
バンク信号生成部ＢＫＧは、キャッシュ制御レジスタＣＣＲのバンク制御フィールドＢＣによって制御される。図５ではバンク制御フィールドＢＣは１ビットで、バンクマルチプレクサＢＭＩ、ＢＭＤ、およびＢＭＸを制御して２ビットのバンク信号ＢＫＩ、ＢＫＤ、およびＢＫＸの上位ビットを選択する。本実施例では下位ビットは常にアドレスＡＩ、ＡＤ、およびＡＸのビット１３である。図５においてバンクマルチプレクサＢＭＩ、ＢＭＤ、およびＢＭＸの入力信号に振られた番号は、該入力信号を選択する時のバンク制御フィールドＢＣの値である。即ち、バンク制御フィールドＢＣが１であれば、バンク信号ＢＫＩ、ＢＫＤ、およびＢＫＸの上位ビットとして、それぞれアドレスＡＩ、ＡＤ、およびＡＸのビット１４を選択する。一方、バンク制御フィールドＢＣが０であれば、それぞれ値０、値１、および外部データアクセス信号ＤＡを選択する。外部データアクセス信号ＤＡは外部アクセスがデータ系である時にアサートされる。
【００４２】
この結果、バンク制御フィールドＢＣが１であれば、バンク信号ＢＫＩ、ＢＫＤ、およびＢＫＸはそれぞれアドレスＡＩ、ＡＤ、およびＡＸのビット１４〜１３となる。したがって、キャッシュメモリＣＭは統合型の４バンクインタリーブキャッシュとなる。
【００４３】
バンク制御フィールドＢＣが０であれば、バンク信号ＢＫＩはアドレスＡＩのビット１３の値に応じて０または１となり、バンク信号ＢＫＤはアドレスＡＤのビット１３の値に応じて２または３となり、バンク信号ＢＫＸはアドレスＡＸのビット１３の値に応じて、外部データアクセス信号ＤＡがネゲートされれば０または１、アサートされれば２または３となる。
【００４４】
したがって、命令フェッチおよび命令系外部アクセスに対しては、バンク信号ＢＫＩおよびＢＫＸが常にバンク０または１を指定し、データアクセスおよびデータ系外部アクセスに対してはバンク信号ＢＫＤおよびＢＫＸが常にバンク２または３を指定する。この結果、バンク０および１が２バンクインタリーブ命令キャッシュ、バンク２および３が２バンクインタリーブデータキャッシュとして動作する。そして、アクセスするバンクが異なれば同時アクセス可能なので、ハーバードアーキテクチャとなる。尚、この時アドレスのビット１４は常にタグとして使用する。バンク制御フィールドＢＣが１であればビット１４をタグとすることは冗長であるが誤動作はせず、バンク制御フィールドＢＣが０であればビット１４はタグとして必要である。１ビットの冗長性を取り除こうとするとかえって論理が複雑になり速度が低下する。
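As a rough illustration of the bank signal generator BKG of FIG. 5, the following sketch computes the 2-bit bank signal from an address, the request type, and the 1-bit bank control field BC. The function name `bank_signal` and the string tags `"I"`, `"D"`, `"X"` for the request type are assumptions of this sketch; the bit positions (bit 13 as the lower bit, bit 14 or a constant as the upper bit) follow the description above.

```python
def bank_signal(addr, kind, bc, da=False):
    """Return the 2-bit bank signal (BKI, BKD or BKX) for one request.

    addr: 32-bit address (AI, AD or AX)
    kind: "I" = instruction fetch, "D" = data access, "X" = external access
    bc:   bank control field BC (1 = integrated mode, 0 = Harvard mode)
    da:   external data access signal DA (only used when kind == "X")
    """
    low = (addr >> 13) & 1               # lower bit is always address bit 13
    if bc == 1:
        high = (addr >> 14) & 1          # integrated: upper bit is address bit 14
    else:
        # Harvard: upper bit is 0 for instructions, 1 for data, DA for external
        high = {"I": 0, "D": 1, "X": int(da)}[kind]
    return (high << 1) | low
```

With BC = 0, instruction fetches can only produce bank 0 or 1 and data accesses only bank 2 or 3, which is what makes the four banks behave as a two-bank instruction cache plus a two-bank data cache.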
【００４５】
図６はバンク生成部ＢＫＧの第２の例である。通常、システムによってプログラムを置くアドレス空間とデータを置くアドレス空間はあらかじめ決まっていることが多いので、これら２つの空間を識別するアドレスのビットがあれば、これをバンク指定ビットとすることにより、命令とデータのバンク競合を避けることができる。この結果、統合型の４バンクインタリーブキャッシュでありながら、ハーバードアーキテクチャと同等の性能を得ることが出来る。
【００４６】
図６の例ではバンク制御フィールドＢＣを２ビットとし、図５のバンク制御フィールドＢＣが０および１の場合に加えて、２および３の場合を追加している。そして、バンク制御フィールドＢＣが２の場合にはアドレスのビット２０を、３の場合にはビット２４を選択する。
【００４７】
この結果、プログラムサイズが１ＭＢ程度の比較的小さいシステムではビット２０を、１６ＭＢ程度のやや大きなシステムではビット２４をバンク指定ビットとすることにより、命令とデータのバンク競合を避けることができる。この場合も図５の場合と同様に、ビット２４、２０、１４は常にタグとしても使用する。
【００４８】
図７〜９は本実施例の第１の動作例である。図７はバンク生成部ＢＫＧの動作例である。図中太い信号線はアサート、細い信号線はネゲートとなっている。本動作例ではキャッシュコントロールレジスタＣＣＲのバンク制御フィールドＢＣが１で統合型キャッシュモードであるとする。そして、命令アドレスＡＩ、データアドレスＡＤ、および外部アドレスＡＸを１６進数で００００１２３０、００１０２４６８、００１０４８Ｃ０とし、命令フェッチ要求ＲＥＱＩ、データアクセス要求ＲＥＱＤ、および外部アクセス要求ＲＥＱＸが全てアサートされたとする。
【００４９】
尚、図７においてはビット１４〜１３の値が明確と成るように１６進数を２進数に展開してある。バンク制御フィールドＢＣが１なので、命令バンクマルチプレクサＢＭＩ、データバンクマルチプレクサＢＭＤ、および外部バンクマルチプレクサＢＭＸはそれぞれ命令アドレスＡＩ、データアドレスＡＤ、および外部アドレスＡＸのビット１４を選択し、それぞれ０、０、および１を出力する。該出力と常にバンク指定信号として使用されるビット１３とを結合した信号がバンク信号であるから、命令バンクＢＫＩ、データバンクＢＫＤ、および外部バンクＢＫＸはそれぞれ０、１、および２となる。
【００５０】
図８はキャッシュ制御部ＣＣの動作例である。図中太い信号線はアサート、細い信号線はネゲートとなっている。命令バンクＢＫＩ、データバンクＢＫＤ、および外部バンクＢＫＸはそれぞれ命令バンクデコーダＢＤＩ、データバンクデコーダＢＤＤ、および外部バンクデコーダＢＤＸによってデコードされ、該デコーダ出力のそれぞれビット０、１、および２がアサートされる。
【００５１】
命令フェッチ要求ＲＥＱＩ、データアクセス要求ＲＥＱＤ、および外部アクセス要求ＲＥＱＸは全てアサートされているので、これらの信号とのＡＮＤ論理後もアサート状態を保つ。そして優先度判定論理のＡＮＤゲートにより、アドレス選択制御信号はＣＡ０のビット２、ＣＡ１のビット１、ＣＡ２のビット０、ＣＡ３のビット０がアサートされ、書込みデータ選択信号はＣＷ０のビット０、ＣＷ１のビット１、ＣＷ２のビット０、ＣＷ３のビット０がアサートされる。
【００５２】
また、バンク競合判定論理により命令フェッチ遅延ＤＬＩおよび外部アクセス遅延ＤＬＸはネゲートされる。また、図示していないが、読出しデータ選択信号ＣＲＩ、ＣＲＤ、およびＣＲＸは命令バンクＢＫＩ、データバンクＢＫＤ、および外部バンクＢＫＸの単純デコードであるから、それぞれビット０、１、および２がアサートされる。
【００５３】
図９はキャッシュメモリＣＭの動作例である。キャッシュ制御部ＣＣからの制御信号ＣＴＬとして、アドレス選択制御信号はＣＡ０のビット２、ＣＡ１のビット１、ＣＡ２のビット０、ＣＡ３のビット０、書込みデータ選択信号はＣＷ１のビット１、他ＣＷ０，２，３のビット０、読出しデータ選択信号はＣＲＩのビット０、ＣＲＤのビット１、およびＣＲＸのビット２がアサートされている。
【００５４】
この結果、アドレスＡ０〜Ａ３にはそれぞれ命令アドレスＡＩ、データアドレスＡＤ、および外部アドレスＡＸが選択される。また、書込みデータＷ０〜Ｗ３としては、Ｗ１には書込みデータＷＤが、他には書込みデータＷＸが選択される。そして、読出しデータＲＩ、ＲＤ、およびＲＸにはそれぞれ読出しデータＲ０、Ｒ１、およびＲ２が選択される。
【００５５】
以上のようにキャッシュコントロールレジスタＣＣＲのバンク制御フィールドＢＣを１として統合型キャッシュモードとし、バンク指定ビットとして使用されるビット１４〜１３が異なるアドレスでアクセスすれば、命令フェッチ要求、データアクセス要求、および外部アクセス要求を同時に異なるバンクで処理することが出来る。すなわち、同時処理が実行される。
【００５６】
図１０〜１２は本実施例の第２の動作例である。本動作例でもキャッシュコントロールレジスタＣＣＲのバンク制御フィールドＢＣが１で統合型キャッシュモードであるとする。そして、命令アドレスＡＩ、データアドレスＡＤ、および外部アドレスＡＸを１６進数で００００１２３０、００１０１３５７、００１００２４０とし、命令フェッチ要求ＲＥＱＩ、データアクセス要求ＲＥＱＤ、および外部アクセス要求ＲＥＱＸが全てアサートされたとする。
【００５７】
すると、図１０のように、命令バンクマルチプレクサＢＭＩ、データバンクマルチプレクサＢＭＤ、および外部バンクマルチプレクサＢＭＸは全て０を出力する。更にビット１３と結合すると、命令バンクＢＫＩ、データバンクＢＫＤ、および外部バンクＢＫＸは全て０となる。
【００５８】
図１１はキャッシュ制御部ＣＣの動作例である。命令バンクデコーダＢＤＩ、データバンクデコーダＢＤＤ、および外部バンクデコーダＢＤＸの出力は全てビット０がアサートされ、命令フェッチ要求ＲＥＱＩ、データアクセス要求ＲＥＱＤ、および外部アクセス要求ＲＥＱＸとのＡＮＤ論理後もアサート状態を保つ。そして優先度判定論理のＡＮＤゲートにより、アドレス選択制御信号としてはＣＡ０のビット１、他ＣＡ１〜３のビット０がアサートされ、書込みデータ選択信号としてはＣＷ０のビット１、他ＣＷ１〜３のビット０がアサートされる。
【００５９】
また、バンク競合判定論理により命令フェッチ遅延ＤＬＩおよび外部アクセス遅延ＤＬＸがアサートされる。即ち、命令フェッチおよび外部アクセスは待たされる。また、図示していないが、読出しデータ選択信号ＣＲＩ、ＣＲＤ、およびＣＲＸは命令バンクＢＫＩ、データバンクＢＫＤ、および外部バンクＢＫＸの単純デコードであるから、全て０がアサートされる。
【００６０】
図１２はキャッシュメモリＣＭの動作例である。キャッシュ制御部ＣＣからの制御信号ＣＴＬとして、アドレス選択制御信号はＣＡ０のビット１、他ＣＡ１〜３のビット０、書込みデータ選択信号はＣＷ０のビット１、他ＣＷ１〜３のビット０、読出しデータ選択信号は全てビット０がアサートされている。
【００６１】
この結果、アドレスＡ０にはデータアドレスＡＤ、他のアドレスＡ１からＡ３には外部アドレスＡＸが選択される。また、書込みデータＷ０には書込みデータＷＤが、他の書込みデータＷ１〜Ｗ３には書込みデータＷＸが選択される。そして、読出しデータＲＩ、ＲＤ、およびＲＸには全て読出しデータＲ０が選択される。
【００６２】
以上のようにキャッシュコントロールレジスタＣＣＲのバンク制御フィールドＢＣを１として統合型キャッシュモードとし、バンク指定ビットとして使用されるビット１４〜１３が同一のアドレスでアクセスすると、バンク競合によりデータアクセスのみが実行され、命令フェッチおよび外部アクセスは待たされる。即ち、逐次処理が実行される。
【００６３】
図１３〜１５は第３の動作例である。第２の動作例と同一のアドレスによるアクセス要求を用い、キャッシュコントロールレジスタＣＣＲのバンク制御フィールドＢＣを０としてハーバードアーキテクチャモードとする。また、外部アクセスはデータ系のアクセスとする。即ち外部データアクセス信号ＤＡをアサートする。すると、図１３のようにバンク制御フィールドＢＣが０なので、命令バンクマルチプレクサＢＭＩ、データバンクマルチプレクサＢＭＤ、および外部バンクマルチプレクサＢＭＸはそれぞれ値０、値１、および外部データアクセス信号ＤＡの値１を選択して出力する。該出力と常にバンク指定信号として使用されるビット１３とを結合した信号がバンク信号であるから、命令バンクＢＫＩ、データバンクＢＫＤ、および外部バンクＢＫＸはそれぞれ０、２、および２となる。
【００６４】
図１４はキャッシュ制御部ＣＣの動作例である。命令バンクデコーダＢＤＩ、データバンクデコーダＢＤＤ、および外部バンクデコーダＢＤＸの出力はそれぞれビット０、２、および２がアサートされ、命令フェッチ要求ＲＥＱＩ、データアクセス要求ＲＥＱＤ、および外部アクセス要求ＲＥＱＸとのＡＮＤ論理後もアサート状態を保つ。そして優先度判定論理のＡＮＤゲートにより、アドレス選択制御信号はＣＡ０のビット２、ＣＡ１のビット０、ＣＡ２のビット１、ＣＡ３のビット０がアサートされ、書込みデータ選択信号はＣＷ２のビット１、他ＣＷ０，１，３のビット０がアサートされる。また、バンク競合判定論理により外部アクセス遅延ＤＬＸがアサートされる。即ち、外部アクセスは待たされる。また、図示していないが、読出しデータ選択信号ＣＲＩ、ＣＲＤ、およびＣＲＸは命令バンクＢＫＩ、データバンクＢＫＤ、および外部バンクＢＫＸの単純デコードであるから、それぞれビット０、２、および２がアサートされる。
【００６５】
図１５はキャッシュメモリＣＭの動作例である。キャッシュ制御部ＣＣからの制御信号ＣＴＬとして、アドレス選択制御信号はＣＡ０のビット２、ＣＡ１のビット０、ＣＡ２のビット１、ＣＡ３のビット０が、書込みデータ選択信号はＣＷ２のビット１、他ＣＷ０，１，３のビット０、読出しデータ選択信号はＣＲＩのビット０、ＣＲＤのビット２、およびＣＲＸのビット２がアサートされている。
【００６６】
この結果、アドレスＡ０〜Ａ３にはそれぞれ命令アドレスＡＩ、外部アドレスＡＸ、データアドレスＡＤ、および外部アドレスＡＸが選択される。また、書込みデータＷ０〜Ｗ３に関しては、Ｗ２には書込みデータＷＤが、他Ｗ０，１，３には書込みデータＷＸが選択される。そして、読出しデータＲＩ、ＲＤ、およびＲＸにはそれぞれ読出しデータＲ０、Ｒ２、およびＲ２が選択される。
【００６７】
以上のようにキャッシュコントロールレジスタＣＣＲのバンク制御フィールドＢＣを０としてハーバードアーキテクチャモードとすると、第２の動作例と同一アドレスでアクセスしても、命令フェッチとデータアクセスのバンク競合を回避できる。一方、データアクセスとデータ系外部アクセスのバンク競合は回避できないが、これは通常のハーバードアーキテクチャでも回避できない。
【００６８】
図１６はバンク信号生成部の第４の動作例である。第２の動作例と同一のアドレスによるアクセス要求を処理しているが、図６のバンク生成部ＢＫＧを用い、キャッシュコントロールレジスタＣＣＲのバンク制御フィールドＢＣを２とする。また、プログラムとデータのアドレス空間を区別するアドレスビットはビット２０とする。
【００６９】
すると、図１６のようにバンク制御フィールドＢＣが２なので、命令バンクマルチプレクサＢＭＩ、データバンクマルチプレクサＢＭＤ、および外部バンクマルチプレクサＢＭＸはそれぞれ０、１、および１を出力する。該出力と常にバンク指定信号として使用されるビット１３とを結合した信号がバンク信号であるから、命令バンクＢＫＩ、データバンクＢＫＤ、および外部バンクＢＫＸはそれぞれ０、２、および２となる。即ち、第３の動作例と同一のバンク信号を出力する。この結果、キャッシュ制御部ＣＣおよびキャッシュメモリＣＭも同様に動作し、統合型キャッシュでありながら、ハーバードアーキテクチャと同様、命令フェッチとデータアクセスの競合を回避できる。
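The fourth operation example can be checked with a small model of the FIG. 6 bank signal generator, in which the 2-bit field BC chooses the source of the upper bank bit. The names `BC_ADDRESS_BIT` and `bank_signal_fig6` are hypothetical; the mapping of BC values to address bits 14, 20 and 24 is taken from the description.

```python
# BC value -> address bit used as the upper bank-select bit (BC = 0 uses constants)
BC_ADDRESS_BIT = {1: 14, 2: 20, 3: 24}


def bank_signal_fig6(addr, kind, bc, da=False):
    """2-bit bank signal for the second BKG example (2-bit BC field)."""
    low = (addr >> 13) & 1                      # lower bit: always address bit 13
    if bc == 0:
        # Harvard mode: request type decides the upper bit
        high = {"I": 0, "D": 1, "X": int(da)}[kind]
    else:
        high = (addr >> BC_ADDRESS_BIT[bc]) & 1  # integrated mode: an address bit
    return (high << 1) | low


# Addresses of the second and fourth operation examples
AI, AD, AX = 0x00001230, 0x00101357, 0x00100240
banks = [bank_signal_fig6(a, k, 2) for a, k in ((AI, "I"), (AD, "D"), (AX, "X"))]
```

Here `banks` evaluates to `[0, 2, 2]`, the same bank signals as in the third operation example, although the cache remains in integrated mode.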
【００７０】
前述した各図の動作例をまとめると命令フェッチ要求、データアクセス要求及び外部アクセス要求並びに複数の制御信号（ＢＫＩ，ＢＫＤ，ＢＫＸ）の入力によりキャッシュ制御部にて生成される複数のアドレス選択制御信号及び書き込みデータ選択制御信号の制御により、複数のアドレス信号（ＡＩ，ＡＤ，ＡＸ）及び書き込みデータ（ＷＤ，ＷＸ）から複数のセレクタの各々を介して、複数のバンク内の異なるバンクに対しては同時に、同一バンクに対しては逐次的に、複数のアクセスアドレスを与える。
【００７１】
さらに上記複数のバンクの各々において、複数のアクセスアドレスへの書き込みデータの書き込みまたは各々のアクセスアドレスからのデータ読出しを、異なるバンクに対しては同時に、同一バンクに対しては逐次的に任意に行う。
【００７２】
特に逐次処理について換言して表現すれば、キャッシュ制御部ＣＣ（コントローラ）の制御により制御信号（ＣＡ０〜３、ＣＷ０〜３）を複数のバンクの各々に供給し、同一のバンクに対し、命令或いはデータの書き込み或いは読み出し動作を逐次に行う。
【００７３】
図１７は本発明の第２の実施例のキャッシュメモリＣＭである。バンク毎に優先的に選択するアクセス要求を、キャッシュ制御レジスタＣＣＲのバンク選択フィールドＢＳ０〜ＢＳ３によってあらかじめ指定しておき、同時に２つ以上アクセス要求がきた場合はバンク毎に優先度の高いアクセス要求を受け付ける。図３に示す第１の実施例との違いはキャッシュ制御レジスタＣＣＲからキャッシュ制御部ＣＣへバンク選択フィールドＢＳ０〜ＢＳ３を出力し、これと命令フェッチ要求ＲＥＱＩおよびデータアクセス要求ＲＥＱＤとの入力からキャッシュ制御信号ＣＴＬを生成している点である。
【００７４】
図１８は第２の実施例のキャッシュ制御部ＣＣである。本方式の長所は、アクセス要求に比べて確定の遅いアドレス情報をバンク選択に使用しないため、高速にキャッシュアクセスを開始できることである。バンクの使用効率よりキャッシュのアクセス速度を重視する場合に適している。アドレスから生成されるバンク信号ＢＫＩ、ＢＫＤ、およびＢＫＸを使用せずにアドレス選択制御信号ＣＡ０〜ＣＡ３および書込みデータ選択制御信号ＣＷ０〜ＣＷ３を生成する。
【００７５】
アドレス選択制御信号ＣＡ０はバンク選択フィールドＢＳ０と命令フェッチ要求ＲＥＱＩおよびデータアクセス要求ＲＥＱＤとから以下のように生成する。バンク選択フィールドＢＳ０は命令フェッチ優先時に０、データアクセス優先時に１となる。アドレス選択制御信号ＣＡ０のビット２、１、および０はそれぞれ命令アドレスＡＩ、データアドレスＡＤ、および外部アドレスＡＸに対応する。
【００７６】
まず、ビット２は命令フェッチ要求ＲＥＱＩがアサートされた場合に、データアクセス要求ＲＥＱＤがネゲートされているか、又はバンク選択フィールドＢＳ０が０で命令フェッチ優先である場合にアサートされる。同様にビット１はデータアクセス要求ＲＥＱＤがアサートされた場合に、命令フェッチ要求ＲＥＱＩがネゲートされているか、又はバンク選択フィールドＢＳ０が１でデータアクセス優先である場合にアサートされる。ビット０は命令フェッチ要求ＲＥＱＩおよびデータアクセス要求ＲＥＱＤのどちらもネゲートされている場合にアサートする。アドレス選択制御信号ＣＡ１〜ＣＡ３も同様にバンク選択フィールドＢＳ１〜ＢＳ３と命令フェッチ要求ＲＥＱＩおよびデータアクセス要求ＲＥＱＤとから生成される。書込みデータ選択制御信号ＣＷ０〜ＣＷ３は、図４に示す第１の実施例同様、ビット１はアドレス選択制御信号ＣＡ０〜ＣＡ３のビット１と同一論理であり、ビット０はビット１の反転である。
【００７７】
尚、本実施例においても読出しデータ選択信号ＣＲＩ、ＣＲＤ、およびＣＲＸはそれぞれバンク信号ＢＫＩ、ＢＫＤ、およびＢＫＸのデコードによって得られる。アクセス要求ＲＥＱＩ、ＲＥＱＤ、およびＲＥＱＸとバンク選択フィールドＢＳ０〜３のみでバンク選択を行うので、各アクセスに必要なバンクが確保されたかをチェックし、確保できなかった場合はアクセス遅延信号ＤＬＩ、ＤＬＤ、およびＤＬＸをアサートして、アクセスを待たせる。このチェックはキャッシュアクセスと並列に行えばよい。
【００７８】
具体的には、第１の実施例と同様にバンク信号ＢＫＩまたはＢＫＤとアクセス要求ＲＥＱＩまたはＲＥＱＤとからアクセスすべきバンクを決定し、そのバンクにおいてアドレス選択制御信号ＣＡ０〜ＣＡ３によって、命令アドレスＡＩまたはデータアドレスＡＤが選択されていなければ、それぞれアクセス遅延信号ＤＬＩまたはＤＬＤをアサートする。本実施例では外部アクセスは全てのバンクで常に優先度が低いのでアクセス要求ＲＥＱＩまたはＲＥＱＤが出ているとバンクを確保できない。このため、バンクに依らずアクセス要求ＲＥＱＸアサート時にＲＥＱＩまたはＲＥＱＤがアサートされていればアクセス遅延信号ＤＬＸをアサートする。
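The second embodiment's selection rule, where each bank picks its address from the request signals and its bank select field BS alone, and the bank signals are used only afterwards to decide whether a request must wait, can be sketched as follows. The function name and the tuple it returns are assumptions of this sketch, and the way the delay check is expressed is a simplified reading of the description rather than the circuit of FIG. 18.

```python
def second_embodiment_control(bs, reqi, reqd, reqx, bki, bkd):
    """bs: list of BS0..BS3 (0 = instruction fetch priority, 1 = data access priority).

    Returns (ca, cw, dli, dld, dlx); ca uses bit2 = AI, bit1 = AD, bit0 = AX.
    """
    ca = []
    for sel in bs:
        bit2 = reqi and (not reqd or sel == 0)  # fetch wins if alone or preferred
        bit1 = reqd and (not reqi or sel == 1)  # data wins if alone or preferred
        bit0 = (not reqi) and (not reqd)        # external only when bank is idle
        ca.append((int(bit2) << 2) | (int(bit1) << 1) | int(bit0))
    cw = [0b10 if c & 0b010 else 0b01 for c in ca]
    # Checked in parallel with the access: did each request get the bank it needs?
    dli = bool(reqi and not (ca[bki] & 0b100))
    dld = bool(reqd and not (ca[bkd] & 0b010))
    # External access has the lowest priority in every bank
    dlx = bool(reqx and (reqi or reqd))
    return ca, cw, dli, dld, dlx
```

Because `ca` does not depend on `bki` or `bkd`, the selects are available before the addresses settle, which is the speed advantage the description attributes to this scheme.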
【００７９】
第２の実施例では外部アクセスは常に優先度を最低にしているが、外部アクセスを含めて自由に優先度を可変にすることは可能である。第１および第２の実施例は４バンク構成の場合であるが、様々なバンク数の場合に本発明を拡張することは本発明の属する分野の通常の技術者であれば可能である。
【００８０】
図１９は本発明の第３の実施例のキャッシュメモリＣＭである。複数ウェイ構成のキャッシュメモリを全て命令データ共用にするか、ウェイ毎に命令用、データ用のいずれかに指定するかする。バンク数が２の累乗でないと実現困難であるのに対し、ウェイ数は任意の数が可能なので設計の自由度が増す。また、バンクインタリーブ化する必要もない。例えば４ウェイセットアソシアティブの統合型キャッシュメモリを、全てのウェイを命令データ共用とすればそのまま統合型に、２ウェイを命令用、残りの２ウェイをデータ用とすればハーバードアーキテクチャになる。複数個のウェイの各々に対して命令またはデータの内何れか１つのみをキャッシングすることにより、命令データ分離型キャッシュメモリとして動作させることが出来る。但し、バンクインタリーブ化しないと外部アクセスと命令またはデータアクセスを同時に実行することは出来ない。また、ウェイ毎に異なるアドレスを指定する必要があるため、１つのメモリマットに複数のウェイを実装することは出来ない。
【００８１】
さて、第２と第３の実施例のキャッシュメモリＣＭの違いは、図１７と図１９に示されるＣＭを構成するブロックの違いである。まず、キャッシュメモリ本体はバンクＢＫ０〜ＢＫ３の代わりにウェイＷＹ０〜ＷＹ３に分割されて、それぞれのウェイがアドレスマルチプレクサＡＭ０〜ＡＭ３（又はセレクタ）によって選択される固有のアドレスでアクセスされる。そして、バンクはないのでバンク生成部ＢＫＧはない。また、外部アクセスと命令またはデータアクセスを同時に実行することは出来ないので、外部アクセス専用のポートは不要である。
【００８２】
図２２はキャッシュメモリ本体にウェイを使用した場合のプロセッサ構成例を示した図である。外部アクセスは命令系とデータ系に分けて、通常のハーバードアーキテクチャのように、事前に命令アクセスおよびデータアクセスにマージしておく。プロセッサＣＰＵから命令フェッチ要求ＲＥＱＩ，データアクセス要求ＲＥＱＤから成る複数の制御コマンド及びＡＩ，ＡＤから成る複数のアドレス信号がキャッシュメモリ本体に送信される。
【００８３】
前述したバンクＢＫ０〜ＢＫ３を採用したキャッシュメモリＣＭを備えたプロセッサシステム（図２）とバンクの代わりにウェイを採用したＣＭを備えたプロセッサシステム（図２２）の相違点としては、実行ユニットＥＸＵからＣＭに送信される書き込みデータＷＤ及びＢＩＵからＣＭに送信される書き込みデータの内、何れか１データがセレクタにて選択されＣＭに書き込まれ、並行してＢＩＵからの命令系書き込みデータＷＩがＣＭに書き込まれることにある。
【００８４】
同図２２のプロセッサシステムを採用した結果、図１７に示すキャッシュメモリ構成例と比較し、図１９のキャッシュメモリ構成においてアドレスマルチプレクサＡＭ０〜ＡＭ３は命令アドレスＡＩとデータアドレスＡＤの２入力に、書込みデータマルチプレクサＷＭ０〜ＷＭ３の外部書込みデータＷＸ（図１７）は、命令系は命令系書込みデータＷＩに、データ系はデータ系書込みデータＷＤにマージされて不要になる。命令系外部アクセスＡＸ（図１７）をマージした結果、マージ前にはなかった命令系書込みデータＷＩ（図１９）が存在する。
【００８５】
更に、バンクインタリーブ方式では単純なバンク信号のデコード結果であった読出しデータ選択制御信号ＣＲＩおよびＣＲＤは、図１９のキャッシュメモリＣＭに具備されるウェイ選択制御部ＷＳＣによって生成される。ウェイ選択制御部ＷＳＣの詳細を図２０に示す。キャッシュ制御部ＣＣ（図２１）からの読出しデータ選択制御信号ＣＲ０〜ＣＲ３と各ウェイからのヒット信号ＨＴ０〜ＨＴ３とのＡＮＤ論理によって上記信号ＣＲＩ及びＣＲＤは生成される。図１９に示すように読出しデータＲＩ、ＲＤは読出しデータ選択制御信号ＣＲＩ及びＣＲＤによる制御の下、それぞれ読出しデータマルチプレクサＲＭＩ及びＲＭＤを介して、各ウェイＷＹ０〜ＷＹ３から読み出される読出しデータＲ０〜Ｒ３から選択される信号である。
【００８６】
図２１は図１９に示す第３の実施例のキャッシュメモリＣＭ内のキャッシュ制御部ＣＣを示している。キャッシュ制御レジスタＣＣＲには統合ビットＵ（キャッシュメモリが統合型か分離型かを区別するビット）およびウェイ選択フィールドＷＳ０〜ＷＳ３がある。統合ビットＵは全てのウェイを命令データ共用にすることを示す。ウェイ選択フィールドＷＳ０〜ＷＳ３は、統合ビットＵがアサートされている時には命令データアクセス競合においてウェイ毎にどちらのアクセスを優先するかを示し、統合ビットＵがネゲートされている時にはウェイが命令用かデータ用かを示す。ウェイ選択フィールドＷＳ０〜ＷＳ３は命令選択時に０、データ選択時に１とする。この時、キャッシュ制御部ＣＣから出力されるアドレス選択制御ＣＡ０〜ＣＡ３、書込みデータ選択制御ＣＷ０〜ＣＷ３、および読出しデータ選択制御ＣＲ０〜ＣＲ３は全て同一論理で生成できる。尚、図中１、０はビット１、０を表し、それぞれ命令系およびデータ系の選択制御信号である。
【００８７】
例えば、図１９に示す通り、アドレス選択制御信号ＣＡ０のビット０はウェイ０のデータアドレスＡＤの選択制御信号である。データアドレスＡＤを選択する条件は、図２１にてデータアクセス要求ＲＥＱＤアサート時に、命令アクセス要求ＲＥＱＩがネゲートされているか、ウェイ選択フィールドＷＳ０が１の場合である。この時、ウェイ０は、統合ビットＵの値によってデータアクセスが優先されているか、データ用であるかのいずれかである。いずれの場合もデータアクセスを行う。アドレス選択制御ＣＡ０のビット１はビット０の反転信号である。このため、本実施例では命令アクセス要求ＲＥＱＩおよびデータアクセス要求ＲＥＱＤの双方がネゲートされた場合等、アドレスとしてどちらを選択しても良い場合は、命令アドレスＡＩが選択される。他の制御信号も同様に生成される。
【００８８】
統合ビットＵは通常のキャッシュアクセスでは不要であるが、キャッシュエントリのリプレース時に必要となる。統合ビットＵがアサートされている場合は、リプレースエントリの候補は全ウェイである。この結果、命令とデータの混在した統合型キャッシュメモリとなる。統合ビットＵがネゲートされている場合は、リプレースエントリの候補となるウェイは、命令リプレース時はウェイ選択フィールドＷＳ０〜ＷＳ３が０であるウェイ、データリプレース時はウェイ選択フィールドＷＳ０〜ＷＳ３が１であるウェイのみである。この結果、ウェイ毎に命令またはデータのみが書込まれるため、ハーバードアーキテクチャとなる。
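The way-based scheme of the third embodiment can be summarized by two small functions: one for the per-way address select, and one for the set of ways eligible for replacement. Both function names are hypothetical, and the integer encoding of the 2-bit select (bit 1 = instruction address AI, bit 0 = data address AD) is an assumption that mirrors the bit numbering in the description.

```python
def way_address_select(ws, reqi, reqd):
    """Per-way 2-bit select: 0b01 picks the data address AD, 0b10 picks AI.

    ws: list of WS0..WS3 (0 = instruction, 1 = data).
    """
    selects = []
    for w in ws:
        pick_data = reqd and (not reqi or w == 1)
        # Bit 1 is the inverse of bit 0, so AI is chosen whenever AD is not
        selects.append(0b01 if pick_data else 0b10)
    return selects


def replace_candidates(u, ws, is_data):
    """Ways that may be replaced on a miss.

    u: integration bit U (True = all ways shared by instructions and data).
    is_data: True for a data miss, False for an instruction miss.
    """
    if u:
        return list(range(len(ws)))
    wanted = 1 if is_data else 0
    return [i for i, w in enumerate(ws) if w == wanted]
```

Note that `way_address_select` does not look at U at all; as the description says, U matters only when a cache entry is replaced.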
【００８９】
前述したキャッシュメモリＣＭにウェイを使用した際の動作例をまとめると、命令フェッチ要求ＲＥＱＩとデータアクセス要求ＲＥＱＤの入力によりキャッシュ制御部ＣＣにて生成される複数のアドレス選択制御信号及び書き込みデータ選択制御信号により、複数のアドレス信号（ＡＩ，ＡＤ）及び書き込みデータ（ＷＩ，ＷＤ）から複数のセレクタの各々を介して、複数個のウェイ内の異なるウェイに対しては同時に、同一ウェイに対しては逐次的に、複数のアクセスアドレス又は書き込みデータを与える。
【００９０】
さらに複数個のウェイの各々において、アクセスアドレスへの書き込みデータの書き込みまたはアクセスアドレスからのデータ読出しを異なるウェイに対しては同時に、同一ウェイに対しては逐次的に任意に行う。
【００９１】
【発明の効果】
本発明によって、従来、ハーバードアーキテクチャでのみ達成可能であった命令フェッチとデータアクセスの同時実行を統合型キャッシュメモリアーキテクチャで達成することが可能となる。これによって、命令書き換えの容易性と高性能とを同時に達成することができる。
【００９２】
また、アプリケーションが命令とデータの一方を重点的にキャッシングしたい場合でも、ハーバードアーキテクチャのように一方のキャッシュが無駄になることなく、全容量を活用することが出来る。
【００９３】
また、同一のプロセッサで統合型キャッシュメモリアーキテクチャとハーバードアーキテクチャの双方を実現することが可能となる。更に、同一プロセッサで多様なキャッシュメモリ構成を実現することも可能となる。
【図面の簡単な説明】
【図１】キャッシュメモリアーキテクチャの変遷を示す図である。
【図２】本発明を適用したプロセッサシステムの例を示す図である。
【図３】本発明を適用したキャッシュメモリの第１の実施例を示す図である。
【図４】本発明の第１の実施例のキャッシュ制御部を示す図である。
【図５】バンク信号生成部の第１の例を示す図である。
【図６】バンク信号生成部の第２の例を示す図である。
【図７】バンク信号生成部の第１の動作例を示す図である。
【図８】キャッシュ制御部の第１の動作例を示す図である。
【図９】キャッシュメモリの第１の動作例を示す図である。
【図１０】バンク信号生成部の第２の動作例を示す図である。
【図１１】キャッシュ制御部の第２の動作例を示す図である。
【図１２】キャッシュメモリの第２の動作例を示す図である。
【図１３】バンク信号生成部の第３の動作例を示す図である。
【図１４】キャッシュ制御部の第３の動作例を示す図である。
【図１５】キャッシュメモリの第３の動作例を示す図である。
【図１６】バンク信号生成部の第４の動作例を示す図である。
【図１７】本発明を適用したキャッシュメモリの第２の実施例を示す図である。
【図１８】第２の実施例のキャッシュ制御部を示す図である。
【図１９】本発明を適用したキャッシュメモリの第３の実施例を示す図である。
【図２０】第３の実施例のウェイ選択制御部を示す図である。
【図２１】第３の実施例のキャッシュ制御部を示す図である。
【図２２】本発明を適用したキャッシュメモリにおいて、バンクの代わりにウェイを使用した場合のプロセッサシステムの例を示す図である。
【符号の説明】
ＣＰＵ：中央処理装置、ＩＦＵ：命令フェッチユニット、ＥＸＵ：実行ユニット、ＢＩＵ：バスインタフェイスユニット、ＡＩ：命令アドレス、ＡＤ：データアドレス、ＡＸ：外部アドレス、ＲＥＱＩ：命令フェッチ要求、ＲＥＱＤ：データアクセス要求、ＲＥＱＸ：外部アクセス要求、ＲＩ、ＲＤ、ＲＸ：読出しデータ、ＷＤ、ＷＸ：書込みデータ。
[0001]
[Technical Field of the Invention]
The present invention relates to a processor system having a cache memory, and more particularly to a processor system having a processor capable of processing a plurality of commands and a cache memory that operates in response to access requests from the processor. The invention further makes it possible to realize both integrated and separate cache memory systems with the same architecture, and to make an integrated cache memory system as fast as a separate one.
[0002]
[Prior art]
FIG. 1 outlines the evolution of cache memory architecture. Before cache memory was introduced, the processor CPU and the main memory MM exchanged instructions and data directly, as in (1). Later, as the capacity of the main memory MM grew and the processor CPU became faster, the speed of the main memory MM came to limit the performance of the processor system.
[0003]
Therefore, as in (2), a cache memory UC that is smaller and faster than the main memory MM was placed between the processor CPU and the main memory MM to improve performance. Early cache memories UC were of the integrated type, handling both instructions and data.
[0004]
Thereafter, as in (3), process miniaturization made it possible to integrate the processor CPU and the cache memory UC on the same chip. This allowed the number of signal lines connecting the processor CPU and the cache memory UC to be greatly increased, and, as in (4), the Harvard architecture appeared, in which the cache memory UC is separated into an instruction cache IC and a data cache DC so that both can be accessed simultaneously. It then became accepted wisdom that a high-performance cache architecture is a Harvard architecture.
[0005]
Subsequently, superscalar and VLIW (Very Long Instruction Word) architectures appeared, making it possible to perform a plurality of data accesses simultaneously. For this reason, processors with a multi-ported data cache DC appeared, as in (5). This multi-porting generally uses bank interleaving, so that only accesses to different banks are executed simultaneously.
[0006]
In addition, because the integrated cache memory architecture costs less than the Harvard architecture, the low-end version of a processor family is sometimes of the integrated type while the high-end version uses the Harvard architecture. For example, the SH-3 described in "Microprocessor Report Vol. 9, no. 3, 3/6/95, p. 12" and the SH-4 described in "Microprocessor Report Vol. 10, no. 14, 10/28/96, pp. 32-35" are both SuperH series processors, but the former is of the integrated type and the latter uses the Harvard architecture.
[0007]
In recent years, JAVA has been spreading rapidly as a processor-independent programming language. JAVA is a language that rewrites instructions: a complex instruction executed for the first time is rewritten into an instruction that executes quickly, based on information that was fixed by executing it once. Furthermore, to run programs written in JAVA at high speed, a scheme that detects frequently executed routines and rewrites them into the processor's native machine code has also become common as JIT (just-in-time) compilation.
[0008]
Performance improvement through cache memory presupposes spatial and temporal locality of memory accesses, so a cache is not effective where that locality is absent. For example, the network processor IXP1200 described in "Microprocessor Report Vol. 13, no. 12, 9/13/99, pp. 1, 6-10" has no built-in data cache and accesses external SRAM or SDRAM directly. Likewise, the vector floating-point unit VPU of the Emotion Engine EE described in "Microprocessor Report Vol. 13, no. 5, 4/19/99, pp. 1, 6-11" has a dedicated RAM instead of a cache, and a direct memory access unit controlled by software performs data accesses to that RAM.
[0009]
[Problems to be solved by the invention]
As a result of the historical evolution of cache architecture described above, it has become accepted wisdom to use a Harvard architecture when performance matters most and an integrated cache memory architecture when cost matters most. However, as integration density rises with process miniaturization, the cost difference between the integrated architecture and the Harvard architecture has become small relative to the cost of the whole chip, and the advantage of building the two kinds of cache memory architecture for different products is disappearing.
[0010]
Further, as languages that rewrite instructions, such as JAVA, become widespread, the Harvard architecture is not necessarily the better choice. In a Harvard architecture, instruction rewriting is generally not detected by hardware. When an instruction is rewritten, software is therefore responsible for guaranteeing that the pre-rewrite instruction is not executed. During rewriting, the instruction being rewritten is handled as data, so the rewritten instruction is stored in the data cache DC. At this time, even if the pre-rewrite instruction exists in the instruction cache IC, it is not updated.
[0011]
The software clears the pre-rewrite instruction from the instruction cache IC, writes the rewritten instruction in the data cache DC back to the main memory MM, and then executes the rewritten instruction. The hardware then fetches the rewritten instruction from the main memory MM and executes it. Even if instruction rewriting were detected by hardware, this would merely implement the above software procedure in hardware, so efficient processing would be difficult.
[0012]
In the integrated cache memory architecture, on the other hand, rewriting an instruction rewrites the instruction in the cache memory UC. Therefore, an instruction fetch from the cache memory UC after the rewrite obtains the rewritten instruction. To achieve this in an ordinary pipelined processor, it is sufficient to cancel the in-flight instructions in the pipeline after the rewrite. The integrated cache memory architecture is therefore better suited to supporting instruction rewriting.
[0013]
Owing to the higher integration density that comes with process miniaturization, small-scale systems can now place the main memory MM on chip. In addition, as in the Emotion Engine EE mentioned above, instructions or data can be kept in on-chip memory: software transfers them to the on-chip memory in advance, by direct memory access or similar means, so that they are reliably accessed at high speed when actually used. In this way, as long as the instructions and data to be used are predictable, access can be made fast even when memory accesses have no spatial or temporal locality. In such situations a cache memory may be unnecessary, or only one of the instruction cache and the data cache may be needed.
[0014]
Furthermore, processor systems were once confined to mainframes, workstations, PCs, and the like, but are now built into a wide variety of products such as mobile phones, digital home appliances, and automobiles, and the optimum cache memory configuration has diversified with the application. It is therefore becoming important that a single processor can take on a variety of cache memory configurations.
[0015]
A first problem to be solved by the present invention is to achieve, with an integrated cache memory architecture, the simultaneous execution of instruction fetch and data access that was conventionally achievable only with a Harvard architecture. This makes it possible to achieve high performance and ease of instruction rewriting at the same time. In addition, even for an application that wants to cache mainly instructions or mainly data, the entire capacity can be used, without one of the caches going to waste as in a Harvard architecture.
[0016]
A second problem to be solved by the present invention is to realize both an integrated cache memory architecture and a Harvard architecture with the same processor. Furthermore, it is to realize various cache memory configurations with the same processor.
[0017]
[Means for Solving the Problems]
The first problem is solved by giving the integrated cache memory multiple ports. This allows an instruction fetch and a data access request to be processed simultaneously, achieving performance equal to that of a Harvard architecture. However, pure multi-porting increases the amount of hardware and reduces the cache memory capacity that fits in a given area. Therefore, the cache memory is built from a plurality of banks selected by part of the address, each bank being a one-port cache; an instruction fetch and a data access request are processed simultaneously if they target different banks and sequentially if they target the same bank. This reduces the amount of hardware and preserves cache memory capacity compared with a fully multi-ported cache memory. Process miniaturization permits larger cache memories, but larger capacity requires dividing the memory into mats; if the divided memory mats are assigned to banks, the cost increase that bank division would otherwise bring can be avoided.
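The rule of simultaneous processing for different banks and sequential processing for the same bank can be illustrated with a small scheduling model. This is a sketch under stated assumptions, not the patented circuit: the function `schedule`, the helper `bank_of`, the use of address bits 14-13 as the bank number, and the convention that earlier list entries have higher priority are all choices made for illustration.

```python
def bank_of(addr):
    """Assumed bank selection: address bits 14-13 give a bank number 0..3."""
    return (addr >> 13) & 0b11


def schedule(requests):
    """Group requests into cycles; each bank serves one request per cycle.

    requests: list of (name, address) pairs in priority order.
    Returns a list of cycles, each a list of the request names served.
    """
    pending = list(requests)
    cycles = []
    while pending:
        busy = set()
        served, waiting = [], []
        for name, addr in pending:
            bank = bank_of(addr)
            if bank in busy:
                waiting.append((name, addr))   # same bank: wait for a later cycle
            else:
                busy.add(bank)
                served.append(name)            # different bank: serve now
        cycles.append(served)
        pending = waiting
    return cycles
```

For instance, `schedule([("data", 0x00102468), ("fetch", 0x00001230)])` serves both requests in one cycle because they fall in different banks, whereas two requests that map to the same bank take two cycles.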
[0018]
The second problem is solved by using, for port or bank designation, not only a part of the address but also an identification signal that distinguishes instruction fetch from data access requests. When the cache is used as an integrated cache memory architecture, a part of the address is used, and when it is used as a Harvard architecture, the identification signal is used. By switching the signals used for port or bank designation in this way, two cache memory architectures are realized with the same processor. Furthermore, the capacity distribution between the instruction cache and the data cache of the Harvard architecture can be changed depending on how the signals are switched. The first and second problems can also be solved in the same way by making it possible to access a plurality of ways with different addresses.
[0019]
Further, in order to solve the above problems, the present invention provides a processor system having a processor capable of independently processing a plurality of commands and a cache memory that operates in response to access requests from the processor, wherein the cache memory has a plurality of ports and can simultaneously process a plurality of control commands, including instruction fetches, and a plurality of address signals transmitted from the processor via the plurality of ports.
[0020]
Furthermore, the present invention provides a processor system having a processor capable of independently processing instruction fetch and data access and a cache memory that operates in response to access requests from the processor, wherein the cache memory includes a plurality of selectors and is composed of a plurality of banks specified by a part of a plurality of addresses, each bank is a one-port cache, and an instruction fetch request and a data access request are processed simultaneously if they are for different banks and sequentially if they are for the same bank.
[0021]
Further, the present invention provides a cache memory that includes a plurality of banks and a controller that controls the plurality of banks, wherein the controller generates control signals for writing or reading a command or data to or from each of the plurality of banks, the control signals are supplied to the plurality of banks under the control of the controller, and the write or read operations on commands or data are performed simultaneously for different banks among the plurality of banks and sequentially for the same bank.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings. The same reference numerals in the drawings of the respective embodiments indicate the same or equivalent elements. FIG. 2 shows an example of a processor system to which the present invention is applied. It consists of a processor LSI and a main memory MM. The processor LSI includes a central processing unit CPU, a cache memory CM, an external memory interface EMI, and a peripheral module PM, which are connected by an internal bus IB. The central processing unit CPU includes an instruction fetch unit IFU, an execution unit EXU, and a bus interface unit BIU. The processor and the cache memory CM are integrated on the same LSI chip.
[0023]
The basic operation of the central processing unit CPU is as follows. First, the instruction fetch unit IFU issues an instruction fetch request REQI together with an instruction address AI to the cache memory CM. The cache memory CM returns the read instruction RI to the instruction fetch unit IFU in response to the request REQI. The instruction fetch unit IFU supplies the instruction RI to the execution unit EXU. The execution unit EXU decodes and executes the instruction RI. If the decoded instruction is a memory read instruction, a data access request REQD is issued together with the data address AD. The cache memory CM returns the read data RD to the execution unit EXU in response to the request REQD. If the decoded instruction is a memory write instruction, a data access request REQD is issued together with the data address AD and the write data WD. The cache memory CM writes the data WD in response to the request REQD.
[0024]
When the instruction fetch request REQI or the data access request REQD results in a cache miss, the bus interface unit BIU receives the instruction address AI, data address AD, write data WD, and the like related to the request, and issues an external memory fetch request to the external memory interface EMI via the internal bus IB. In response, the external memory interface EMI outputs an address A to the main memory MM to issue an external memory fetch request, and the main memory MM returns data D. The external memory interface EMI returns the data D to the bus interface unit BIU via the internal bus IB. The bus interface unit BIU issues an external access request REQX together with the external address AX and write data WX. The cache memory CM has a port for processing the external access request REQX (an access request from outside) and writes the write data WX accordingly.
[0025]
When the central processing unit CPU operates as a pipeline, the instruction fetch unit IFU fetches subsequent instructions while the execution unit EXU processes an instruction. Further, if the data access of the execution unit EXU is non-blocking, data access by a subsequent instruction is performed simultaneously with the external memory access caused by a data access miss in the cache memory CM. The cache memory CM therefore needs the ability to simultaneously process a plurality of control commands, consisting either of the instruction fetch request REQI and the data access request REQD or of any of REQI, REQD, and the external access request REQX, together with a plurality of address signals, consisting either of the instruction address signal AI and the data address signal AD or of any of AI, AD, and the external address signal AX.
[0026]
FIG. 3 shows a first embodiment of a cache memory CM to which the present invention is applied. The cache memory CM includes a cache control register CCR, a bank signal generation unit BKG (or signal generation unit), a cache control unit CC that controls the CM, and a cache body. BKG generates a plurality of control signals (BKI, BKD, BKX) to be given to CC based on a plurality of address signals.
[0027]
The cache memory main body is divided into four banks BK0 to BK3, each designated by specific bits that are a part of the address in each of the plurality of address signals (AI, AD, and AX), and different banks can be accessed simultaneously.
[0028]
When each of the plurality of banks is designated not by the specific bits but via each of the plurality of selectors, under the control of the address selection control signals and write data selection control signals that the cache control unit in the cache memory generates from the instruction fetch request or data access request and the plurality of control signals, the cache memory operates as an instruction/data separated cache memory.
[0029]
The banks BK0 to BK3 respectively receive access addresses A0 to A3, and further write data W0 to W3 at the time of writing, and perform a read or write operation, and output read data R0 to R3 at the time of reading. Each of the banks BK0 to BK3 is regarded as a one-port cache. Access addresses A0 to A3 are selected from addresses AI, AD, or AX by address selection control signals CA0 to CA3 in address multiplexers (or selectors) AM0 to AM3, respectively. Write data W0 to W3 are selected from write data WD or WX by write data selection control signals CW0 to CW3 in write data multiplexers (or selectors) WM0 to WM3, respectively. Read data RI, RD, and RX are selected from read data R0 to R3 by read data selection control signals CRI, CRD, and CRX in read data multiplexers (or selectors) RMI, RMD, and RMX, respectively. Note that the number assigned to the input signal of each multiplexer in the figure is the bit number of the selection control signal that is asserted when that input is selected.
[0030]
FIG. 4 shows details of the cache control unit CC of the first embodiment. The control signals for the multiplexers of the cache body are generated from the instruction bank BKI, data bank BKD, and external bank BKX supplied by the bank signal generation unit BKG, together with the instruction fetch request REQI, data access request REQD, and external access request REQX.
[0031]
More specifically, in response to the inputs of an instruction fetch request, a data access request, and the plurality of control signals (BKI, BKD, BKX), the cache control unit holds back the instruction fetch if a data access request has already been assigned to the bank designated by the control signal, and generates the plurality of address selection control signals or write data selection control signals if no data access request has yet been assigned to that bank.
[0032]
The read data selection control signals CRI, CRD, and CRX in FIG. 3 are 4-bit signals that each control a 4-input multiplexer. They can be generated by simply decoding the 2-bit instruction bank BKI, data bank BKD, and external bank BKX, respectively, and are therefore not shown.
[0033]
Address selection control signals CA0 to CA3 and write data selection control signals CW0 to CW3 cannot be generated unless the priority among instruction fetch, data access, and external access is determined. The simplest priority determination method is to maintain the original sequential execution order of the program. Since external access is caused by a cache miss of an earlier access, it comes earliest in the sequential execution order. Instruction fetch prepares subsequent instructions, so it comes latest in the sequential execution order. With this method, the priority is therefore first for external access, second for data access, and third for instruction fetch.
[0034]
However, in a highly optimized program, instructions and data are brought from the main memory MM into the cache memory CM in advance by prefetch instructions or the like, so that no cache miss occurs when they are actually used. In such a program, lowering the priority of external access improves performance. Because it is difficult to optimize a program so that caching completes exactly when needed, caching is scheduled with some margin and therefore happens earlier than necessary, so it is better not to stall internal operations in order to wait for it. Therefore, in this embodiment, the priority is first for data access, second for instruction fetch, and third for external access.
[0035]
Since the access addresses A0 to A3 shown in FIG. 3 are selected from the addresses AI, AD, or AX, the address multiplexers AM0 to AM3 have 3 inputs, and the control signals CA0 to CA3 have 3 bits. Therefore, as the bit number of the control signal, 2 is assigned to the instruction address, 1 is assigned to the data address, and 0 is assigned to the external address.
[0036]
First, as shown in FIG. 4, when the highest-priority data access request REQD is asserted, the bank designated by the data bank BKD is allocated, and bit 1 of the control signal of the allocated bank among the address selection control signals CA0 to CA3 is asserted. In other words, each bit of the 4-bit signal obtained by decoding the 2-bit data bank BKD with the data bank decoder BDD is ANDed with the data access request REQD.
[0037]
Next, when the instruction fetch request REQI is asserted, the bank designated by the instruction bank BKI is allocated. If a data access has already been assigned to that bank, the instruction fetch delay signal DLI is asserted and the instruction fetch is not assigned. When the allocation is made, bit 2 of the control signal of the allocated bank among the address selection control signals CA0 to CA3 is asserted. That is, each bit of the 4-bit signal obtained by decoding the 2-bit instruction bank BKI with the instruction bank decoder BDI is ANDed with the instruction fetch request REQI, and the result is further ANDed with the inverted bit 1 of the corresponding address selection control signal CA0 to CA3. Bit 0 of the address selection control signals CA0 to CA3 is asserted when neither a data access nor an instruction fetch uses the corresponding bank. That is, it is the AND of the inverted bit 1 of the address selection control signal and the inverse of the signal, obtained by ANDing the instruction fetch request REQI with the output of the decoder BDI, from which bit 2 is derived.
[0038]
Further, if the signal obtained by ANDing each bit of the 4-bit signal decoded from the 2-bit external bank BKX by the external bank decoder BDX with the external access request REQX is asserted, that is, an external access request to the corresponding bank has been issued, but the signal that selects the external address as the address of that bank, that is, bit 0 of the address selection control signals CA0 to CA3, is not asserted, the required bank cannot be secured and the external access delay signal DLX is asserted.
[0039]
Since the write data W0 to W3 shown in FIG. 3 are selected from the write data WD or WX, the write data multiplexers WM0 to WM3 have two inputs and the control signals CW0 to CW3 have two bits. As the bit numbers of the control signal, 1 is assigned to the write data WD of the data access and 0 to the external write data WX. Bit 1 of the write data selection control signals CW0 to CW3 has the same logic as bit 1 of the address selection control signals CA0 to CA3. When there is no data access, the external write data WX shown in FIG. 3 is selected as the write data. Therefore, bit 0 of the write data selection control signals CW0 to CW3 is the inversion of bit 1, as shown in FIG. 4.
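The priority and conflict logic described for FIG. 4 can be summarized in a short behavioral sketch. It is written from the text alone, so the gate-level structure of the actual figure may differ. The function name `cache_control` and the modeling of signals as Python booleans and integers are assumptions made for illustration, and the exact condition used for DLI is inferred from the description.

```python
def cache_control(bki, bkd, bkx, reqi, reqd, reqx, num_banks=4):
    """Return (CA, CW, DLI, DLX) for one cycle of the first embodiment.

    CA[n] is one-hot over 3 bits: bit 2 = AI, bit 1 = AD, bit 0 = AX.
    CW[n] is one-hot over 2 bits: bit 1 = WD, bit 0 = WX.
    """
    ca, cw = [], []
    dli = dlx = False
    for bank in range(num_banks):
        data_hit = bool(reqd) and bkd == bank  # decoded BKD AND REQD
        inst_hit = bool(reqi) and bki == bank  # decoded BKI AND REQI
        ext_hit = bool(reqx) and bkx == bank   # decoded BKX AND REQX

        bit1 = data_hit                        # data access: first priority
        bit2 = inst_hit and not bit1           # instruction fetch: second
        bit0 = not bit1 and not inst_hit       # otherwise the external address

        dli = dli or (inst_hit and bit1)       # fetch lost the bank to a data access
        dlx = dlx or (ext_hit and not bit0)    # external access could not get the bank

        ca.append(int(bit2) << 2 | int(bit1) << 1 | int(bit0))
        cw.append(int(bit1) << 1 | int(not bit1))
    return ca, cw, dli, dlx


# Three requests to three different banks: no delay signal is asserted.
print(cache_control(bki=0, bkd=1, bkx=2, reqi=True, reqd=True, reqx=True))
# All three requests to bank 0: only the data access proceeds.
print(cache_control(bki=0, bkd=0, bkx=0, reqi=True, reqd=True, reqx=True))
```

Note that a bank with no request at all still selects the external address and external write data, as stated in the description.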
[0040]
FIG. 5 shows a first example of the bank signal generator BKG. If the capacity of the cache memory CM is 128 KB and the 4-way set associative method is used, the capacity per way is 32 KB and the index is 15 bits. In the bank interleave method, a part of the index is used for bank designation. In this embodiment, since the number of banks is 4, 2 bits are used for bank designation. Which bits are best used for bank designation, so that stalls due to bank contention are minimized, depends on the program. Conversely, it is also possible to write a program that suppresses contention if the bank designation bits are disclosed to the programmer. In FIG. 5, bits 14 to 0 of the address are used as the index, and the upper 2 bits of the index are used as bank designation bits. Therefore, bits 14 to 13 are the bank designation bits.
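The arithmetic behind these figures can be checked with a small sketch. The helper name `index_and_bank_bits` is hypothetical, and the sketch follows the text in treating all 15 low address bits of a 32 KB way as the index and in assuming power-of-two sizes.

```python
def index_and_bank_bits(cache_bytes, ways, banks):
    """Return (bytes per way, index width, (highest, lowest) bank bit)."""
    way_bytes = cache_bytes // ways
    for value in (way_bytes, banks):
        if value <= 0 or value & (value - 1):
            raise ValueError("sizes are assumed to be powers of two")
    index_bits = way_bytes.bit_length() - 1  # log2 of the way capacity
    bank_bits = banks.bit_length() - 1       # log2 of the bank count
    # The bank designation bits are the upper bits of the index.
    return way_bytes, index_bits, (index_bits - 1, index_bits - bank_bits)


# 128 KB, 4 ways, 4 banks: 32 KB per way, 15-bit index, bank bits 14..13.
print(index_and_bank_bits(128 * 1024, ways=4, banks=4))
```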
[0041]
The bank signal generator BKG is controlled by the bank control field BC of the cache control register CCR. In FIG. 5, the bank control field BC is 1 bit, and the bank multiplexers BMI, BMD, and BMX are controlled to select the upper bits of the 2-bit bank signals BKI, BKD, and BKX. In this embodiment, the lower bits are always bit 13 of addresses AI, AD, and AX. In FIG. 5, the numbers assigned to the input signals of the bank multiplexers BMI, BMD, and BMX are the values of the bank control field BC when the input signals are selected. That is, if the bank control field BC is 1, the bits 14 of the addresses AI, AD, and AX are selected as the upper bits of the bank signals BKI, BKD, and BKX, respectively. On the other hand, if the bank control field BC is 0, the value 0, the value 1, and the external data access signal DA are selected. The external data access signal DA is asserted when the external access is a data system.
[0042]
As a result, if the bank control field BC is 1, the bank signals BKI, BKD, and BKX are bits 14 to 13 of the addresses AI, AD, and AX, respectively. Therefore, the cache memory CM is an integrated 4-bank interleaved cache.
[0043]
If the bank control field BC is 0, the bank signal BKI is 0 or 1 depending on the value of the bit 13 of the address AI, and the bank signal BKD is 2 or 3 depending on the value of the bit 13 of the address AD. BKX is 0 or 1 when the external data access signal DA is negated, or 2 or 3 when asserted, depending on the value of bit 13 of the address AX.
[0044]
Therefore, for instruction fetch and instruction-system external access, the bank signals BKI and BKX always specify bank 0 or 1, and for data access and data-system external access, the bank signals BKD and BKX always specify bank 2 or 3. As a result, banks 0 and 1 operate as a 2-bank interleaved instruction cache, and banks 2 and 3 operate as a 2-bank interleaved data cache. Since different banks can be accessed simultaneously, the cache operates as a Harvard architecture. In this scheme, bit 14 of the address is always used as a tag. If the bank control field BC is 1, using bit 14 as a tag is redundant, but no malfunction occurs. If the bank control field BC is 0, bit 14 is needed as a tag. Removing this one bit of redundancy would complicate the logic and reduce the speed.
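The mode switching of FIG. 5 can be modeled in a few lines. This is a behavioral sketch based on the description only; the function names `bit` and `bank_signals` and the use of Python integers for addresses are assumptions for illustration.

```python
def bit(value, n):
    """Return bit n of value as 0 or 1."""
    return (value >> n) & 1


def bank_signals(bc, ai, ad, ax, da):
    """Return (BKI, BKD, BKX) as 2-bit bank numbers.

    bc: 1-bit bank control field BC (1 = integrated, 0 = Harvard mode).
    da: external data access signal DA (True for data-system external access).
    """
    if bc == 1:
        upper = (bit(ai, 14), bit(ad, 14), bit(ax, 14))
    else:
        upper = (0, 1, int(bool(da)))
    # The lower bank bit is always address bit 13.
    return tuple((u << 1) | bit(a, 13) for u, a in zip(upper, (ai, ad, ax)))


# Harvard mode: instruction fetches stay in banks 0-1 and data accesses in
# banks 2-3, whatever the values of address bits 14..13.
for hi in range(4):
    address = hi << 13
    bki, bkd, bkx = bank_signals(0, address, address, address, da=True)
    assert bki in (0, 1) and bkd in (2, 3) and bkx in (2, 3)
```

With BC set to 1 the same function returns bits 14 to 13 of each address, which is the integrated 4-bank interleaved behavior.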
[0045]
FIG. 6 shows a second example of the bank signal generation unit BKG. The address space in which programs are placed and the address space in which data is placed are often determined in advance by the system. Therefore, if there is an address bit that distinguishes these two spaces, designating that bit as a bank designation bit avoids bank conflicts between instructions and data. As a result, it is possible to obtain the same performance as the Harvard architecture while remaining an integrated four-bank interleaved cache.
[0046]
In the example of FIG. 6, the bank control field BC is 2 bits. In addition to the cases of FIG. 5 in which the bank control field BC is 0 or 1, bit 20 of the address is selected when the bank control field BC is 2, and bit 24 is selected when the bank control field BC is 3.
[0047]
As a result, bank conflict between instruction and data can be avoided by using bit 20 as a bank designation bit in a relatively small system with a program size of about 1 MB and bit 24 in a slightly larger system of about 16 MB. Also in this case, as in the case of FIG. 5, the bits 24, 20, and 14 are always used as tags.
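The extended selection of FIG. 6 can be sketched as follows. The addresses in the usage lines are hypothetical: a program placed below 1 MB and data placed in the next 1 MB, so that bit 20 separates the two spaces. The function names are likewise assumptions for illustration.

```python
def bit(value, n):
    """Return bit n of value as 0 or 1."""
    return (value >> n) & 1


def upper_bank_bit(bc, address, fixed):
    """Upper bank bit for one requester; `fixed` is the constant used when BC = 0."""
    if bc == 0:
        return fixed
    return bit(address, {1: 14, 2: 20, 3: 24}[bc])


def bank_signals(bc, ai, ad, ax, da):
    """Return (BKI, BKD, BKX) for the 2-bit bank control field BC."""
    uppers = (
        upper_bank_bit(bc, ai, 0),
        upper_bank_bit(bc, ad, 1),
        upper_bank_bit(bc, ax, int(bool(da))),
    )
    # The lower bank bit is always address bit 13.
    return tuple((u << 1) | bit(a, 13) for u, a in zip(uppers, (ai, ad, ax)))


code, data = 0x00001230, 0x00101230    # hypothetical: same bits 14..13, different bit 20
print(bank_signals(1, code, data, data, da=True))  # bits 14..13 only: all in bank 0
print(bank_signals(2, code, data, data, da=True))  # bit 20 separates code from data
```

With BC = 1 the two accesses collide in bank 0, whereas with BC = 2 the instruction fetch goes to bank 0 and the data access to bank 2.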
[0048]
FIGS. 7 to 9 show a first operation example of this embodiment. FIG. 7 shows an example of the operation of the bank signal generation unit BKG. In the figure, thick signal lines are asserted, and thin signal lines are negated. In this operation example, it is assumed that the bank control field BC of the cache control register CCR is 1 and the integrated cache mode is set. It is assumed that the instruction address AI, the data address AD, and the external address AX are hexadecimal 00001230, 00102468, and 001048C0, respectively, and that the instruction fetch request REQI, data access request REQD, and external access request REQX are all asserted.
[0049]
In FIG. 7, the hexadecimal numbers are expanded to binary so that the values of bits 14 to 13 are clear. Since the bank control field BC is 1, the instruction bank multiplexer BMI, data bank multiplexer BMD, and external bank multiplexer BMX select bit 14 of the instruction address AI, data address AD, and external address AX, respectively, and output 0, 0, and 1. Since the bank signal is obtained by combining this output with bit 13, which is always used as a bank designation signal, the instruction bank BKI, the data bank BKD, and the external bank BKX are 0, 1, and 2, respectively.
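The binary expansion can be reproduced with a few lines of code. This is only a numerical check of the values stated above; the helper name `bank_bits` is hypothetical.

```python
def bank_bits(address):
    """Return (bit 14, bit 13) of an address."""
    return (address >> 14) & 1, (address >> 13) & 1


for name, address in (("AI", 0x00001230), ("AD", 0x00102468), ("AX", 0x001048C0)):
    b14, b13 = bank_bits(address)
    low16 = f"{address & 0xFFFF:016b}"  # low 16 bits, bit 15 first
    print(f"{name} = {address:08X}  low 16 bits = {low16}  "
          f"bit 14 = {b14}, bit 13 = {b13}, bank = {2 * b14 + b13}")
```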
[0050]
FIG. 8 shows an operation example of the cache control unit CC. In the figure, thick signal lines are asserted, and thin signal lines are negated. Instruction bank BKI, data bank BKD, and external bank BKX are decoded by instruction bank decoder BDI, data bank decoder BDD, and external bank decoder BDX, respectively, and bits 0, 1, and 2 of the decoder output are asserted, respectively.
[0051]
Since the instruction fetch request REQI, the data access request REQD, and the external access request REQX are all asserted, the decoder outputs remain asserted after being ANDed with these signals. The AND gates of the priority determination logic then assert bit 2 of CA0, bit 1 of CA1, bit 0 of CA2, and bit 0 of CA3 as the address selection control signals, and bit 0 of CW0, bit 1 of CW1, bit 0 of CW2, and bit 0 of CW3 as the write data selection signals.
[0052]
Further, the instruction fetch delay DLI and the external access delay DLX are negated by the bank conflict determination logic. Although not shown, the read data selection signals CRI, CRD, and CRX are simple decodes of the instruction bank BKI, data bank BKD, and external bank BKX, so their bits 0, 1, and 2 are asserted, respectively.
[0053]
FIG. 9 shows an operation example of the cache memory CM. As the control signal CTL from the cache control unit CC, the following are asserted: for the address selection control signals, bit 2 of CA0, bit 1 of CA1, bit 0 of CA2, and bit 0 of CA3; for the write data selection signals, bit 1 of CW1 and bit 0 of the other signals CW0, CW2, and CW3; and for the read data selection signals, bit 0 of CRI, bit 1 of CRD, and bit 2 of CRX.
[0054]
As a result, the instruction address AI, the data address AD, the external address AX, and the external address AX are selected as the addresses A0 to A3, respectively. As the write data W0 to W3, the write data WD is selected for W1, and the write data WX is selected for the others. The read data R0, R1, and R2 are selected as the read data RI, RD, and RX, respectively.
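The multiplexer selection in this operation example can be traced with a small sketch of one-hot selection. The helper names `one_hot_select` and `route` are hypothetical, and signals are represented by their names as strings purely for illustration.

```python
def one_hot_select(control, inputs):
    """Return inputs[n], where n is the single asserted bit of control."""
    if control <= 0 or control & (control - 1):
        raise ValueError("control signal must have exactly one bit asserted")
    return inputs[control.bit_length() - 1]


def route(ca, cw, ai, ad, ax, wd, wx):
    """Apply the address multiplexers AM0-AM3 and write data multiplexers WM0-WM3."""
    addresses = [one_hot_select(c, (ax, ad, ai)) for c in ca]  # bit 0=AX, 1=AD, 2=AI
    write_data = [one_hot_select(c, (wx, wd)) for c in cw]     # bit 0=WX, 1=WD
    return addresses, write_data


# Control signals of the first operation example.
ca = [0b100, 0b010, 0b001, 0b001]
cw = [0b01, 0b10, 0b01, 0b01]
print(route(ca, cw, "AI", "AD", "AX", "WD", "WX"))

# Read data multiplexers RMI, RMD, RMX: bit n selects read data Rn.
reads = ("R0", "R1", "R2", "R3")
print([one_hot_select(c, reads) for c in (0b0001, 0b0010, 0b0100)])
```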
[0055]
As described above, if the bank control field BC of the cache control register CCR is set to 1 to select the integrated cache mode, and the accesses use addresses whose bits 14 to 13, the bank designation bits, differ from one another, the instruction fetch request, data access request, and external access request can be processed simultaneously in different banks. That is, simultaneous processing is executed.
[0056]
FIGS. 10 to 12 show a second operation example of this embodiment. In this operation example as well, it is assumed that the bank control field BC of the cache control register CCR is 1 and the integrated cache mode is set. It is assumed that the instruction address AI, the data address AD, and the external address AX are hexadecimal 00001230, 00101357, and 00100240, respectively, and that the instruction fetch request REQI, the data access request REQD, and the external access request REQX are all asserted.
[0057]
Then, as shown in FIG. 10, the instruction bank multiplexer BMI, the data bank multiplexer BMD, and the external bank multiplexer BMX all output 0. When further combined with bit 13, instruction bank BKI, data bank BKD, and external bank BKX are all zero.
[0058]
FIG. 11 shows an operation example of the cache control unit CC. Bit 0 is asserted in the outputs of the instruction bank decoder BDI, data bank decoder BDD, and external bank decoder BDX, and it remains asserted after being ANDed with the instruction fetch request REQI, data access request REQD, and external access request REQX. The AND gates of the priority determination logic then assert bit 1 of CA0 and bit 0 of the other signals CA1 to CA3 as the address selection control signals, and bit 1 of CW0 and bit 0 of the other signals CW1 to CW3 as the write data selection signals.
[0059]
Further, the instruction fetch delay DLI and the external access delay DLX are asserted by the bank conflict determination logic. That is, the instruction fetch and the external access are made to wait. Although not shown, the read data selection signals CRI, CRD, and CRX are simple decodes of the instruction bank BKI, the data bank BKD, and the external bank BKX, so bit 0 is asserted in all of them.
[0060]
FIG. 12 shows an operation example of the cache memory CM. As the control signal CTL from the cache control unit CC, the following are asserted: for the address selection control signals, bit 1 of CA0 and bit 0 of the other signals CA1 to CA3; for the write data selection signals, bit 1 of CW0 and bit 0 of the other signals CW1 to CW3; and for the read data selection signals, bit 0 of all of them.
[0061]
As a result, the data address AD is selected as the address A0, and the external address AX is selected as the other addresses A1 to A3. The write data WD is selected as the write data W0, and the write data WX is selected as the other write data W1 to W3. Then, the read data R0 is selected for all the read data RI, RD, and RX.
[0062]
As described above, when the bank control field BC of the cache control register CCR is set to 1 to select the integrated cache mode, and the accesses use addresses whose bits 14 to 13, the bank designation bits, are the same, only the data access is executed because of the bank conflict, and the instruction fetch and the external access are made to wait. That is, sequential processing is executed.
[0063]
FIGS. 13 to 15 show a third operation example. The access requests use the same addresses as in the second operation example, and the bank control field BC of the cache control register CCR is set to 0 to select the Harvard architecture mode. The external access is a data-system access; that is, the external data access signal DA is asserted. Then, since the bank control field BC is 0 as shown in FIG. 13, the instruction bank multiplexer BMI, the data bank multiplexer BMD, and the external bank multiplexer BMX select and output the value 0, the value 1, and the value 1 of the external data access signal DA, respectively. Since the bank signal is obtained by combining this output with bit 13, which is always used as a bank designation signal, the instruction bank BKI, the data bank BKD, and the external bank BKX are 0, 2, and 2, respectively.
[0064]
FIG. 14 shows an operation example of the cache control unit CC. Bits 0, 2, and 2 are asserted in the outputs of the instruction bank decoder BDI, data bank decoder BDD, and external bank decoder BDX, respectively, and they remain asserted after being ANDed with the instruction fetch request REQI, data access request REQD, and external access request REQX. The AND gates of the priority determination logic then assert bit 2 of CA0, bit 0 of CA1, bit 1 of CA2, and bit 0 of CA3 as the address selection control signals, and bit 1 of CW2 and bit 0 of the other signals CW0, CW1, and CW3 as the write data selection signals. The external access delay DLX is asserted by the bank conflict determination logic. That is, the external access is made to wait. Although not shown, the read data selection signals CRI, CRD, and CRX are simple decodes of the instruction bank BKI, data bank BKD, and external bank BKX, so their bits 0, 2, and 2 are asserted, respectively.
[0065]
FIG. 15 shows an operation example of the cache memory CM. As the control signal CTL from the cache control unit CC, the following are asserted: for the address selection control signals, bit 2 of CA0, bit 0 of CA1, bit 1 of CA2, and bit 0 of CA3; for the write data selection signals, bit 1 of CW2 and bit 0 of the other signals CW0, CW1, and CW3; and for the read data selection signals, bit 0 of CRI, bit 2 of CRD, and bit 2 of CRX.
[0066]
As a result, the instruction address AI, the external address AX, the data address AD, and the external address AX are selected as the addresses A0 to A3, respectively. As for the write data W0 to W3, the write data WD is selected for W2, and the write data WX is selected for the other W0, 1, and 3. Then, read data R0, R2, and R2 are selected as read data RI, RD, and RX, respectively.
[0067]
As described above, when the bank control field BC of the cache control register CCR is set to 0 and the Harvard architecture mode is set, bank contention between instruction fetch and data access can be avoided even when accessed with the same address as in the second operation example. On the other hand, bank conflicts between data access and data system external access cannot be avoided, but this cannot be avoided even with a normal Harvard architecture.
[0068]
FIG. 16 shows a fourth operation example of the bank signal generation unit. Although an access request with the same address as that in the second operation example is processed, the bank generator BKG in FIG. 6 is used and the bank control field BC of the cache control register CCR is set to 2. The address bit for distinguishing the address space between the program and data is bit 20.
[0069]
Then, since the bank control field BC is 2 as shown in FIG. 16, the instruction bank multiplexer BMI, the data bank multiplexer BMD, and the external bank multiplexer BMX output 0, 1, and 1, respectively. Since a signal obtained by combining the output and the bit 13 that is always used as the bank designation signal is a bank signal, the instruction bank BKI, the data bank BKD, and the external bank BKX are 0, 2, and 2, respectively. That is, the same bank signal as in the third operation example is output. As a result, the cache control unit CC and the cache memory CM operate in the same manner, and it is possible to avoid contention between instruction fetch and data access as in the Harvard architecture, although it is an integrated cache.
[0070]
To summarize the operation examples in the above figures, the address selection control signals and write data selection control signals, which the cache control unit generates from the instruction fetch request, the data access request, the external access request, and the plurality of control signals (BKI, BKD, BKX), control the plurality of selectors so that the plurality of address signals (AI, AD, AX) and write data (WD, WX) are given simultaneously to different banks among the plurality of banks, and a plurality of access addresses directed to the same bank are given sequentially.
[0071]
Further, in each of the plurality of banks, writing of write data to the access addresses or reading of data from the access addresses is performed simultaneously for different banks and sequentially for the same bank.
[0072]
In particular, for sequential processing, the control signals (CA0-3, CW0-3) are supplied to each of the plurality of banks under the control of the cache control unit CC (controller), and the write or read operations on instructions or data are performed sequentially.
[0073]
FIG. 17 shows a cache memory CM according to the second embodiment of the present invention. The access request to be selected preferentially for each bank is designated in advance by the bank selection fields BS0 to BS3 of the cache control register CCR, and if two or more access requests are received at the same time, each bank accepts the access request with the higher priority. The difference from the first embodiment shown in FIG. 3 is that the bank selection fields BS0 to BS3 are output from the cache control register CCR to the cache control unit CC, and the cache control signal CTL is generated from the instruction fetch request REQI and the data access request REQD.
[0074]
FIG. 18 shows the cache control unit CC of the second embodiment. The advantage of this method is that cache access can be started at a high speed because address information that is slower than the access request is not used for bank selection. This is suitable when the cache access speed is more important than the bank usage efficiency. Address selection control signals CA0 to CA3 and write data selection control signals CW0 to CW3 are generated without using bank signals BKI, BKD, and BKX generated from addresses.
[0075]
Address selection control signal CA0 is generated from bank selection field BS0, instruction fetch request REQI and data access request REQD as follows. The bank selection field BS0 is 0 when instruction fetch is prioritized and 1 when data access is prioritized. Bits 2, 1, and 0 of address selection control signal CA0 correspond to instruction address AI, data address AD, and external address AX, respectively.
[0076]
First, bit 2 is asserted when the instruction fetch request REQI is asserted and either the data access request REQD is negated or the bank selection field BS0 is 0, giving instruction fetch priority. Similarly, bit 1 is asserted when the data access request REQD is asserted and either the instruction fetch request REQI is negated or the bank selection field BS0 is 1, giving data access priority. Bit 0 is asserted when both the instruction fetch request REQI and the data access request REQD are negated. The address selection control signals CA1 to CA3 are generated in the same way from the bank selection fields BS1 to BS3, the instruction fetch request REQI, and the data access request REQD. In the write data selection control signals CW0 to CW3, bit 1 has the same logic as bit 1 of the address selection control signals CA0 to CA3, and bit 0 is the inversion of bit 1, as in the first embodiment shown in FIG. 4.
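The selection logic of the second embodiment for one bank can be written out as a sketch. The function names `address_select` and `write_select` are hypothetical, and the grouping of the conditions follows the reading of the description in which each bit requires its own request to be asserted.

```python
def address_select(bs, reqi, reqd):
    """Address selection control signal CAn for one bank (3-bit one-hot).

    bs: bank selection field BSn (0 = instruction fetch priority,
        1 = data access priority).
    """
    reqi, reqd = bool(reqi), bool(reqd)
    bit2 = reqi and (not reqd or bs == 0)  # select instruction address AI
    bit1 = reqd and (not reqi or bs == 1)  # select data address AD
    bit0 = not reqi and not reqd           # select external address AX
    return int(bit2) << 2 | int(bit1) << 1 | int(bit0)


def write_select(ca):
    """Write data selection control signal CWn derived from CAn."""
    bit1 = (ca >> 1) & 1           # same logic as bit 1 of CAn
    return bit1 << 1 | (bit1 ^ 1)  # bit 0 is the inversion of bit 1


# Exactly one bit of CAn is asserted for every input combination.
for bs in (0, 1):
    for reqi in (False, True):
        for reqd in (False, True):
            ca = address_select(bs, reqi, reqd)
            assert ca in (0b100, 0b010, 0b001)
            print(bs, reqi, reqd, format(ca, "03b"), format(write_select(ca), "02b"))
```

Because no bank signal derived from the address enters this function, the selection can start as soon as the requests are known.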
[0077]
In this embodiment, the read data selection signals CRI, CRD, and CRX are obtained by decoding the bank signals BKI, BKD, and BKX, respectively. Because bank selection is performed using only the access requests REQI, REQD, and REQX and the bank selection fields BS0 to BS3, it is checked whether the bank necessary for each access has been secured. If the bank has not been secured, the access delay signal DLI, DLD, or DLX is asserted to make the access wait. This check may be performed in parallel with the cache access.
[0078]
Specifically, as in the first embodiment, the bank to be accessed is determined from the bank signal BKI or BKD and the access request REQI or REQD. If the instruction address AI or the data address AD is not selected by the address selection control signals CA0 to CA3 in that bank, the access delay signal DLI or DLD, respectively, is asserted. In this embodiment, since external access always has the lowest priority in all banks, a bank cannot be secured for it whenever an access request REQI or REQD is issued. Therefore, regardless of the bank, the access delay signal DLX is asserted if REQI or REQD is asserted while the access request REQX is asserted.
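The check described above can be modelled with the following Python sketch. It rests on several assumptions that are not in the text: the bank signals BKI and BKD are represented as plain bank indices, requests are booleans, and the helper names `address_select` and `delay_signals` are hypothetical.

```python
def address_select(bs, reqi, reqd):
    """3-bit CAn model: bit 2 selects AI, bit 1 selects AD, bit 0 selects AX."""
    bit2 = bool(reqi) and (not reqd or bs == 0)
    bit1 = bool(reqd) and (not reqi or bs == 1)
    bit0 = not reqi and not reqd
    return (int(bit2) << 2) | (int(bit1) << 1) | int(bit0)


def delay_signals(bs_fields, bki, bkd, reqi, reqd, reqx):
    """Return (DLI, DLD, DLX) for one cycle.

    bs_fields : bank selection fields BS0..BS3
    bki, bkd  : index of the bank targeted by the instruction / data address
                (assumed encoding of the bank signals BKI and BKD)
    """
    ca = [address_select(bs, reqi, reqd) for bs in bs_fields]
    # An access must wait if its address was not selected in the bank it needs.
    dli = bool(reqi) and ((ca[bki] >> 2) & 1) == 0
    dld = bool(reqd) and ((ca[bkd] >> 1) & 1) == 0
    # External access has the lowest priority in every bank.
    dlx = bool(reqx) and (bool(reqi) or bool(reqd))
    return dli, dld, dlx


if __name__ == "__main__":
    bs = [0, 0, 1, 1]  # banks 0 and 1 favour instructions, banks 2 and 3 favour data
    print(delay_signals(bs, 0, 2, True, True, True))
    print(delay_signals(bs, 3, 0, True, True, False))
```

With banks 0 and 1 set to instruction priority and banks 2 and 3 to data priority, the second call models an instruction fetch to bank 3 colliding with a data access to bank 0. Both accesses are delayed even though they target different banks, which illustrates the trade-off of bank usage efficiency for access speed.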
[0079]
In the second embodiment, external access always has the lowest priority, but the priorities, including that of external access, can also be made freely changeable. Although the first and second embodiments have a four-bank configuration, a person of ordinary skill in the field to which the present invention belongs can extend the invention to other numbers of banks.
[0080]
FIG. 19 shows a cache memory CM according to the third embodiment of the present invention. In a cache memory having a multi-way configuration, it is designated whether all ways are shared by instructions and data or whether each way is used for instructions or for data. Whereas a bank configuration is difficult to realize unless the number of banks is a power of 2, the number of ways can be any number, so the degree of design freedom increases. In addition, there is no need for bank interleaving. For example, a 4-way set-associative cache memory is an integrated type as it is if all ways are shared by instructions and data, and a Harvard architecture is obtained if two ways are used for instructions and the remaining two ways are used for data. By caching only one of instructions or data in each of the plurality of ways, the cache memory can operate as an instruction data separation type cache memory. However, without bank interleaving, external access and instruction or data access cannot be executed at the same time. In addition, since a different address must be specified for each way, a plurality of ways cannot be mounted on one memory mat.
[0081]
The difference between the cache memories CM of the second and third embodiments is the difference between the blocks constituting the CM shown in FIGS. First, the cache memory main body is divided into ways WY0 to WY3 instead of the banks BK0 to BK3, and each way is accessed with a unique address selected by the address multiplexers AM0 to AM3 (or selectors). Since there is no bank, there is no bank generation unit BKG. Also, since external access and instruction or data access cannot be executed at the same time, a dedicated port for external access is unnecessary.
[0082]
FIG. 22 is a diagram showing a processor configuration example when a way is used in the cache memory main body. External access is divided into an instruction system and a data system, and merged into instruction access and data access in advance as in a normal Harvard architecture. A plurality of control commands including an instruction fetch request REQI and a data access request REQD and a plurality of address signals including AI and AD are transmitted from the processor CPU to the cache memory body.
[0083]
The difference between the processor system having the cache memory CM adopting the banks BK0 to BK3 (FIG. 2) and the processor system having the CM adopting ways instead of banks (FIG. 22) is that, in the latter, a selector selects either the write data WD transmitted from the execution unit EXU to the CM or the write data transmitted from the BIU to the CM and the selected data is written to the CM, and that the instruction-system write data WI from the BIU is also written to the CM.
[0084]
As a result of adopting the processor system shown in FIG. 22, the cache memory configuration shown in FIG. 19 differs from the configuration example shown in FIG. 17 as follows. The address multiplexers AM0 to AM3 have two inputs, the instruction address AI and the data address AD. The external write data WX (FIG. 17) of the write data multiplexers WM0 to WM3 becomes unnecessary, because the instruction system is merged into the instruction-system write data WI and the data system is merged into the data-system write data WD. As a result of merging the instruction-system part of the external access AX (FIG. 17), there is instruction-system write data WI (FIG. 19), which did not exist before the merge.
[0085]
Furthermore, the read data selection control signals CRI and CRD, which are simple decoding results of the bank signals in the bank interleaving method, are generated by the way selection control unit WSC provided in the cache memory CM of FIG. Details of the way selection control unit WSC are shown in FIG. The signals CRI and CRD are generated by AND logic of the read data selection control signals CR0 to CR3 from the cache control unit CC (FIG. 21) and the hit signals HT0 to HT3 from each way. As shown in FIG. 19, the read data RI and RD are the signals selected from the read data R0 to R3, read from the ways WY0 to WY3, by the read data multiplexers RMI and RMD, respectively, under the control of the read data selection control signals CRI and CRD.
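The AND logic of the way selection control unit and the read data multiplexers can be illustrated with a short Python sketch. It assumes, as an interpretation rather than a statement from the figures, that bit 1 of each CRn is the instruction-system selection and bit 0 the data-system selection, and that CRI and CRD are per-way select vectors. The names `way_selection_control` and `read_mux` are hypothetical.

```python
def way_selection_control(cr, ht):
    """Model of the way selection control unit WSC.

    cr : read data selection control signals CR0..CR3 as 2-bit integers
         (assumed: bit 1 = instruction system, bit 0 = data system)
    ht : hit signals HT0..HT3 as booleans
    Returns (CRI, CRD) as per-way select lists.
    """
    cri = [bool((c >> 1) & 1) and bool(h) for c, h in zip(cr, ht)]
    crd = [bool(c & 1) and bool(h) for c, h in zip(cr, ht)]
    return cri, crd


def read_mux(select, data):
    """Model of a read data multiplexer (RMI or RMD): return the selected way's data."""
    for s, d in zip(select, data):
        if s:
            return d
    return None  # no way selected, e.g. a miss


if __name__ == "__main__":
    cr = [0b10, 0b10, 0b01, 0b01]    # ways 0 and 1 serve instructions, 2 and 3 serve data
    ht = [False, True, True, False]  # way 1 and way 2 hit
    r = ["R0", "R1", "R2", "R3"]
    cri, crd = way_selection_control(cr, ht)
    print(read_mux(cri, r), read_mux(crd, r))
```

In this model a way contributes read data to RI or RD only when it both hits and is currently assigned to that side.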
[0086]
FIG. 21 shows the cache control unit CC in the cache memory CM of the third embodiment shown in FIG. The cache control register CCR has an integrated bit U (a bit for distinguishing whether the cache memory is an integrated type or a separated type) and way selection fields WS0 to WS3. When asserted, the integrated bit U indicates that all ways are shared by instructions and data. When the integrated bit U is asserted, the way selection fields WS0 to WS3 indicate which access is prioritized in each way when instruction access and data access conflict. When the integrated bit U is negated, they indicate whether each way is used for instructions or for data. The way selection fields WS0 to WS3 are set to 0 to select instructions and to 1 to select data. With this encoding, the address selection controls CA0 to CA3, the write data selection controls CW0 to CW3, and the read data selection controls CR0 to CR3 output from the cache control unit CC can all be generated with the same logic. In the figure, 1 and 0 represent bits 1 and 0, which are the instruction-system and data-system selection control signals, respectively.
[0087]
For example, as shown in FIG. 19, bit 0 of the address selection control signal CA0 is the selection control signal for the data address AD of way 0. In FIG., the condition for selecting the data address AD is that the data access request REQD is asserted and either the instruction access request REQI is negated or the way selection field WS0 is 1. When WS0 is 1, way 0 either gives priority to data access or is dedicated to data, depending on the value of the integrated bit U. In either case, the data access is performed. Bit 1 of the address selection control CA0 is the inverted signal of bit 0. Consequently, in this embodiment, when both the instruction access request REQI and the data access request REQD are negated and either address could be selected, the instruction address AI is selected. The other control signals are generated in the same way.
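The selection condition for one way can be written out as a small Python sketch. The function name `way_select` is hypothetical, and requests are modelled as booleans; the logic follows the text for CA0 and is assumed to apply unchanged to the other ways and to the CW and CR signals, which the text says share the same logic.

```python
def way_select(ws, reqi, reqd):
    """Model of the 2-bit selection control for one way.

    ws   : way selection field (0 = instruction, 1 = data)
    reqi : instruction access request REQI
    reqd : data access request REQD
    Returns a 2-bit value: bit 1 selects the instruction side (AI),
    bit 0 selects the data side (AD).
    """
    bit0 = bool(reqd) and (not reqi or ws == 1)
    bit1 = not bit0  # bit 1 is the inversion of bit 0
    return (int(bit1) << 1) | int(bit0)


if __name__ == "__main__":
    for ws in (0, 1):
        for reqi in (False, True):
            for reqd in (False, True):
                print(ws, reqi, reqd, format(way_select(ws, reqi, reqd), "02b"))
```

Because bit 1 is simply the inversion of bit 0, the instruction side is selected by default whenever no data access claims the way, including the idle case in which neither request is asserted.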
[0088]
The integrated bit U is not required for normal cache access, but it is required for replacement of a cache entry. When the integrated bit U is asserted, all ways are candidates for the replacement entry. As a result, an integrated cache memory in which instructions and data are mixed is obtained. When the integrated bit U is negated, the candidates for the replacement entry are only the ways whose way selection field WS0 to WS3 is 0 at the time of instruction replacement, and only the ways whose way selection field WS0 to WS3 is 1 at the time of data replacement. This results in a Harvard architecture, because only instructions or only data are written to each way.
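The choice of replacement candidates can be sketched in Python as follows. The function name `replacement_candidates` and the list representation of the way selection fields are assumptions made for illustration; the policy that picks one victim among the candidates is outside the scope of this sketch.

```python
def replacement_candidates(u, ws_fields, is_instruction):
    """Return the indices of the ways that may receive a replaced entry.

    u              : integrated bit U (True = all ways shared by instructions and data)
    ws_fields      : way selection fields WS0..WS3 (0 = instruction, 1 = data)
    is_instruction : True for an instruction replacement, False for a data replacement
    """
    if u:
        # Integrated operation: every way is a candidate.
        return list(range(len(ws_fields)))
    wanted = 0 if is_instruction else 1
    # Separated operation: only ways designated for this kind of access.
    return [way for way, ws in enumerate(ws_fields) if ws == wanted]


if __name__ == "__main__":
    ws = [0, 0, 1, 1]
    print(replacement_candidates(True, ws, True))
    print(replacement_candidates(False, ws, True))
    print(replacement_candidates(False, ws, False))
```

With the fields set to 0, 0, 1, 1 and U negated, instructions can only be placed in ways 0 and 1 and data only in ways 2 and 3, which corresponds to the two-way instruction, two-way data split given as an example for this embodiment.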
[0089]
The operation when ways are used in the cache memory CM described above is summarized as follows. Under the control of the plurality of address selection control signals and write data selection control signals generated by the cache control unit CC upon input of the instruction fetch request REQI and the data access request REQD, a plurality of access addresses or write data are selected from the plurality of address signals (AI, AD) and write data (WI, WD) through each of a plurality of selectors and are given simultaneously to different ways among the plurality of ways and sequentially to the same way.
[0090]
Further, in each of the plurality of ways, writing of the write data to the access address or reading of the data from the access address is optionally performed simultaneously for different ways and sequentially for the same way.
[0091]
[Effect of the invention]
According to the present invention, the simultaneous execution of instruction fetch and data access, which could previously be achieved only with the Harvard architecture, can be achieved with the integrated cache memory architecture. This makes it possible to achieve both ease of instruction rewriting and high performance.
[0092]
In addition, even when an application calls for caching predominantly instructions or predominantly data, the entire capacity can be utilized without wasting one of the caches, as happens in the Harvard architecture.
[0093]
In addition, both the integrated cache memory architecture and the Harvard architecture can be realized with the same processor. Furthermore, various cache memory configurations can be realized by the same processor.
[Brief description of the drawings]
FIG. 1 is a diagram showing the transition of a cache memory architecture.
FIG. 2 is a diagram illustrating an example of a processor system to which the present invention is applied.
FIG. 3 is a diagram showing a first embodiment of a cache memory to which the present invention is applied.
FIG. 4 is a diagram illustrating a cache control unit according to the first embodiment of this invention.
FIG. 5 is a diagram illustrating a first example of a bank signal generation unit.
FIG. 6 is a diagram illustrating a second example of the bank signal generation unit.
FIG. 7 is a diagram illustrating a first operation example of a bank signal generation unit.
FIG. 8 is a diagram illustrating a first operation example of a cache control unit.
FIG. 9 is a diagram illustrating a first operation example of a cache memory.
FIG. 10 is a diagram illustrating a second operation example of the bank signal generation unit.
FIG. 11 is a diagram illustrating a second operation example of the cache control unit.
FIG. 12 is a diagram illustrating a second operation example of the cache memory.
FIG. 13 is a diagram illustrating a third operation example of the bank signal generation unit.
FIG. 14 is a diagram illustrating a third operation example of the cache control unit.
FIG. 15 is a diagram illustrating a third operation example of the cache memory.
FIG. 16 is a diagram illustrating a fourth operation example of the bank signal generation unit.
FIG. 17 is a diagram showing a second embodiment of a cache memory to which the present invention is applied.
FIG. 18 illustrates a cache control unit according to the second embodiment.
FIG. 19 is a diagram showing a third embodiment of a cache memory to which the present invention is applied.
FIG. 20 is a diagram illustrating a way selection control unit according to a third embodiment.
FIG. 21 is a diagram illustrating a cache control unit according to a third embodiment.
FIG. 22 is a diagram showing an example of a processor system when a way is used instead of a bank in a cache memory to which the present invention is applied.
[Explanation of symbols]
CPU: Central processing unit, IFU: Instruction fetch unit, EXU: Execution unit, BIU: Bus interface unit, AI: Instruction address, AD: Data address, AX: External address, REQI: Instruction fetch request, REQD: Data access request, REQX: External access request, RI, RD, RX: Read data, WD, WX: Write data.

Claims

In a processor system having a processor capable of independently processing a plurality of commands and a cache memory that operates in response to an access request from the processor,
The cache memory has a plurality of banks and a plurality of ports, and is capable of simultaneously processing a plurality of control commands and a plurality of address signals including an instruction fetch transmitted from the processor via the plurality of ports, and
The cache memory includes a cache control unit that controls address and data writing of the cache memory, a signal generation unit that generates a control signal that controls the cache control unit, and a bank control field that controls the signal generation unit. A cache control register,
The signal generation unit corresponds to a value of a specific bit provided in an address and outputs the control signal according to the bank control field;
The cache control unit is configured to output an address selection control signal or write data selection control signal for controlling the cache memory address and data writing, respectively in response to said control signal and said access request,
the processor system being characterized in that whether the cache memory functions as an integrated cache memory or a separate cache memory is selected according to the address selection control signal or the write data selection control signal.

In a processor system having a processor capable of independently processing instruction fetch and data access and a cache memory operating in response to an access request from the processor,
The cache memory is composed of a plurality of selectors and a plurality of banks specified by a part of a plurality of addresses, each bank is a one-port cache, and the instruction fetch request and the data access request are processed simultaneously if they are for different banks and sequentially if they are for the same bank,
The cache memory includes a cache control unit that controls address and data writing of the cache memory, a signal generation unit that generates a control signal that controls the cache control unit, and a bank control field that controls the signal generation unit. A cache control register,
The signal generation unit corresponds to a value of a specific bit provided in an address and outputs the control signal according to the bank control field;
The cache control unit is configured to output an address selection control signal or write data selection control signal for controlling the cache memory address and data writing, respectively in response to said control signal and said access request,
the processor system being characterized in that whether the cache memory functions as an integrated cache memory or a separate cache memory is selected according to the address selection control signal or the write data selection control signal.

The processor system according to claim 1, wherein
The plurality of control commands include any one of an instruction fetch request and a data access request, or the instruction fetch request, the data access request, and an external access request, and the plurality of address signals include an instruction address signal and A processor system comprising a data address signal or any one of the instruction address signal, the data address signal, and an external address signal.

The processor system according to claim 3, wherein
In response to the instruction fetch request, the data access request, and the input of the control signal, the cache control unit performs further allocation when the data access request is already assigned to the bank designated based on the control signal. A delay signal is generated so as not to be executed, and when the data access request is not yet assigned to the bank designated based on the control signal, the address selection control signal or the write data selection control signal is generated. A featured processor system.

The processor system according to claim 3, wherein
Further, the address signal and the data access request and the external access request, and the control of the address selection control signal and the write data selection control signal generated by the cache control unit upon input of the control signal, A processor system characterized in that a plurality of access addresses are given to different banks in a plurality of banks simultaneously and sequentially to the same bank through each of a plurality of selectors from write data.

The processor system according to claim 5, wherein
Further, in each of the plurality of banks, the writing of the write data to the plurality of access addresses or the data reading from the access address is performed simultaneously for the different banks and sequentially for the same bank. A processor system that is optionally performed.

The processor system according to claim 2, wherein
A part of the plurality of addresses is a specific bit in each of the plurality of addresses, and instruction data is obtained by using the specific bit when designating each of the plurality of banks in the cache memory. A plurality of cache control units that operate as an integrated cache memory and are generated by a cache control unit included in the cache memory based on an input of the control signal generated using a predetermined value instead of a part of the specific bit A processor system which operates as an instruction data separation type cache memory by designating the bank via each of the plurality of selectors under the control of an address selection control signal and a write data selection control signal.

The processor system according to any one of claims 3 to 6,
The cache memory further includes a port for processing the external access request that is an external access request, and simultaneously processes at least two requests among the instruction fetch request, the data access request, and the external access request. A processor system characterized by the above.

In a processor system having a processor capable of independently processing a plurality of commands and a cache memory that operates in response to an access request from the processor,
The cache memory has a plurality of ports, and is capable of simultaneously processing a plurality of control commands and a plurality of address signals including an instruction fetch transmitted from the processor via the plurality of ports,
The plurality of control commands include an instruction fetch request and a data access request, the plurality of address signals include an instruction address signal and a data address signal, the cache memory further includes a cache control unit for controlling the cache memory, the cache A cache control register having an integrated bit and a way selection field for controlling the control unit, a plurality of ways, a plurality of selectors, and a way selection control unit,
The integrated bit is a bit for distinguishing whether the cache memory is integrated or separated,
The cache memory is selected to function as an integrated cache memory or a separate cache memory based on the integrated bit,
The way selection field is a signal indicating a priority for data access and instruction access in each of the plurality of ways according to the value of the integrated bit, or each of the plurality of ways is used for data access and instruction access. A processor system, wherein the signal is switched to a designated signal.

The processor system according to claim 9, wherein
Further, under the control of an address selection control signal and a write data selection control signal generated by the cache control unit upon input of the instruction fetch request and the data access request, each of the plurality of selectors selects from the plurality of address signals and write data, so that a plurality of access addresses or the write data are given simultaneously to different ways among the plurality of ways and sequentially to the same way.

The processor system according to claim 10, wherein
Further, in each of the plurality of ways, writing of write data to the access address or reading of data from the access address is arbitrarily performed simultaneously for the different ways and sequentially for the same way. A processor system characterized by the above.

The processor system according to claim 9, wherein
A processor system which operates as an instruction data separation type cache memory by caching only one of an instruction and data for each of the plurality of ways.

The processor system according to any one of claims 1 to 12,
A processor system, wherein the processor and the cache memory are integrated on the same chip.

A cache memory comprising a plurality of banks and a controller for controlling the plurality of banks, wherein the controller generates a control signal for writing or reading an instruction or data to or from each of the plurality of banks, and the instruction or data is written to or read from different banks among the plurality of banks simultaneously,
The cache memory includes a cache control register including a bank control field that controls the controller when generating the control signal;
The controller outputs the control signal in correspondence with the value of a specific bit provided in an address and in accordance with the bank control field, and further outputs, in response to the control signal and an access request, an address selection control signal or a write data selection control signal for controlling the address and data writing of the cache memory, respectively,
The cache memory is selected to function as an integrated cache memory or a separate cache memory in accordance with the address selection control signal or the write data selection control signal.

The cache memory according to claim 14, wherein
Further, the address selection control signal or the write data selection control signal is supplied to each of the plurality of banks under the control of the controller, and write or read operations of the instruction or data are performed sequentially on the same bank.

A plurality of ways, and a controller for controlling the plurality of ways, the controller generating a control signal for writing or reading a command or data in each of the plurality of ways; The control signal is supplied to the plurality of ways, the command or data is written to or read from different ways in the plurality of ways at the same time, and the control signal is controlled by the controller. A cache memory that sequentially supplies the instructions or data to the same way and performs write or read operations on the same way,
The cache memory includes a cache control register including an integration bit and a way selection field for controlling the controller when generating the control signal;
The integrated bit is a bit for distinguishing whether the cache memory is integrated or separated,
The cache memory is selected to function as an integrated cache memory or a separate cache memory based on the integrated bit,
The way selection field is a signal indicating a priority for data access and instruction access in each of the plurality of ways according to the value of the integrated bit, or each of the plurality of ways is used for data access and instruction access. A cache memory characterized by being switched to a designated signal.