JPH07210463A

JPH07210463A - Cache memory system and data processor

Info

Publication number: JPH07210463A
Application number: JP6021970A
Authority: JP
Inventors: Shigeru Nakahara; 茂中原
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-01-21
Filing date: 1994-01-21
Publication date: 1995-08-11
Also published as: KR950033846A

Abstract

PURPOSE: To improve system performance by omitting unnecessary block transfers from a higher-level memory upon a primary cache miss during execution of store instructions. CONSTITUTION: A block transfer inhibition determination unit 14 detects consecutive store instructions that together access all of the data in the same entry. It passes the detection result to a memory control unit 12. On the primary cache miss, the memory control unit is then prohibited from block-transferring the entry from the higher-level second cache memory 2 to a lower-level memory such as the data cache memory 5.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a cache memory system and to a data processor used therein. More particularly, it relates to a technique that is effective when applied to, for example, a data processing system comprising a data processor with a built-in lower-level cache memory and a higher-level cache memory externally attached to that data processor.

[0002]

[Prior Art] One known cache memory system has a data processor with a built-in first cache memory that is fast but small. In addition, it has a second cache memory that is slower to access but has a relatively large storage capacity. The first and second cache memories each hold a plurality of entries, each entry consisting of, for example, a plurality of pieces of cache data.

[0003] The first cache memory serves as the lower-level cache memory and the second cache memory as the higher-level cache memory. Consequently, the first cache memory must hold a subset of the entries held by the second cache memory.

[0004] Consider a write access that misses in the first cache memory but hits in the second cache memory. Two handling methods exist: the write-allocate method, which rewrites both cache memories, and the non-write-allocate method, which rewrites only the higher-level cache memory.

[0005] In the write-allocate method, the first cache memory must hold a subset of the entries held by the second cache memory. The entry containing the cache data that hit in the second cache memory is therefore first block-transferred from the second cache memory to the first cache memory. The corresponding data in both cache memories is then rewritten.

Because the entry containing the missed cache data has been brought into the first cache memory, a higher cache hit rate can be expected on subsequent memory accesses. This rests on the empirical rule that data references within a given time window exhibit locality: data near recently used data is highly likely to be used soon afterward. However, instruction execution is delayed by the time spent on the block transfer of the entry. The second cache memory is connected to the data processor over an external bus, and that bus's throughput limits system performance. Frequent accesses that do not follow the empirical rule therefore degrade system performance.

[0006] In the non-write-allocate method, on the other hand, nothing is written to the first cache memory, so no block transfer is needed. In exchange, no improvement in the cache hit rate can be expected even when accesses that follow the empirical rule occur frequently. The two methods thus each have advantages and disadvantages. Which is better overall depends on the application being executed and the system in which it is implemented.
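The two store-miss policies described above can be contrasted with a toy model. This is a minimal Python sketch, not taken from the patent. It assumes four pieces of cache data per entry (as in the later embodiment), word addressing, an SCM that always hits, and an FCM that never evicts. The class and method names are hypothetical.

```python
WORDS_PER_ENTRY = 4  # cache data per entry (assumed, as in the embodiment)


class TwoLevelCache:
    """Toy FCM (lower level) / SCM (upper level) pair keyed by entry number.

    Assumes every access hits in the SCM and the FCM never evicts.
    """

    def __init__(self, scm, write_allocate):
        self.scm = {e: list(words) for e, words in scm.items()}
        self.fcm = {}
        self.write_allocate = write_allocate
        self.block_transfers = 0

    def store(self, addr, value):
        entry, offset = divmod(addr, WORDS_PER_ENTRY)
        if entry not in self.fcm and self.write_allocate:
            # Write-allocate: block-transfer the whole entry from SCM to FCM first.
            self.fcm[entry] = list(self.scm[entry])
            self.block_transfers += 1
        if entry in self.fcm:
            # The FCM is updated only when it holds the entry.
            self.fcm[entry][offset] = value
        # The SCM is always updated (the write buffer is omitted in this sketch).
        self.scm[entry][offset] = value

    def fcm_hit(self, addr):
        return addr // WORDS_PER_ENTRY in self.fcm
```

With `write_allocate=True` a store miss costs one block transfer but makes later accesses to the same entry FCM hits. With `write_allocate=False` there is no transfer, and the FCM stays cold for that entry.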

[0007] One publication describing cache memories is "Ultra-High-Speed MOS Devices" (超高速ＭＯＳデバイス), Baifukan Co., Ltd., November 5, 1987, pages 285 to 283.

[0008]

[Problems to Be Solved by the Invention] On further examining the write-allocate method, the inventor found that it can inherently perform superfluous operations.

In some cases the initial block transfer is worthwhile. If stores to a single cache entry do not cover all of its cache data, or if a load instruction appears between the store instructions, accesses to cache data in that entry hit from the second access onward. The penalty of the initial block transfer is then offset by those later cache hits.

In other cases it is wasted. If store instructions are executed consecutively to all cache data in the same entry, with no intervening memory-reference instruction such as a load, the initial block transfer turns out to be superfluous. Every piece of cache data in the newly transferred entry is overwritten in turn by the subsequent store instructions without ever being referenced.

[0009] An object of the present invention is to improve overall system performance by eliminating superfluous block transfers such as those in the write-allocate method described above.

[0010] The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

[0011]

[Means for Solving the Problems] Representative aspects of the invention disclosed in this application are briefly summarized as follows.

[0012] (1) Briefly, a means is provided for detecting, for example, consecutive stores to all the cache data in one entry of the cache memory. The detection result is reflected either in the instruction sequence or directly in the memory control unit. The start of block transfers is then controlled on the basis of that result.

[0013] More specifically, a cache memory system comprises the following:

- a higher-level memory;
- a lower-level memory that holds data from the higher-level memory as cache data, organized as a plurality of entries each containing a plurality of pieces of cache data;
- a memory control unit that, on a cache miss in the lower-level memory during execution of an instruction involving a write, can transfer the entry containing the missed cache data from the higher-level memory to the lower-level memory;
- a determination unit that decides, from the code information of the instruction involving the write, whether that transfer is performed. For an operation that consecutively rewrites all the cache data in a single entry, it makes the memory control unit prohibit the transfer; otherwise it permits the transfer.

The higher-level memory may be the main memory or a higher-level cache memory. When higher- and lower-level cache memories are used, the lower-level cache memory holds a subset of the entries held by the higher-level cache memory.

[0014] When the determination unit is implemented purely in hardware, it can consist of three parts:

- a detection circuit. It receives, in parallel, the code information of as many prefetched instructions as there are pieces of cache data in one entry. It detects the state in which those instructions direct an operation that consecutively rewrites all the cache data in a single entry.
- a delay circuit. Once the detection result is obtained, it holds the detected state for the execution period of all the instructions supplied in parallel to the detection circuit.
- storage means that pass the output of the delay circuit to the memory control unit.

When the instructions to be judged are store instructions, the determination unit can instead consist of:

- a store detection unit, which detects that the code information of each instruction received in parallel contains the operation code of a store instruction;
- an address detection unit, which detects that the addressing information in those instructions, taken together, points to every piece of cache data in the same entry without duplication;
- a logic gate, which detects that both units are in the detected state and supplies this to the delay circuit.
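The store detection unit, address detection unit, logic gate and delay circuit described above can be modelled behaviourally in software. This is a minimal Python sketch, not the patent's circuit. The opcode value `STORE_OPCODE`, the `Insn` type and the function names are hypothetical. Displacements are assumed to be in cache-data (word) units, and the base register value is assumed to be entry-aligned, which cannot be checked from code information alone.

```python
from dataclasses import dataclass

WORDS_PER_ENTRY = 4      # pieces of cache data per entry (as in the embodiment)
STORE_OPCODE = 0b101011  # hypothetical 6-bit store opcode, for illustration only


@dataclass(frozen=True)
class Insn:
    opcode: int
    base: int  # base register number
    disp: int  # displacement, assumed to be in cache-data (word) units


def store_detect(window):
    # Store detection unit: every queued instruction is a store.
    return all(i.opcode == STORE_OPCODE for i in window)


def address_detect(window):
    # Address detection unit: same base register, same entry (upper displacement
    # bits), and the low bits cover every word of the entry exactly once.
    # Assumes the base register value is entry-aligned.
    if len({i.base for i in window}) != 1:
        return False
    if len({i.disp // WORDS_PER_ENTRY for i in window}) != 1:
        return False
    offsets = sorted(i.disp % WORDS_PER_ENTRY for i in window)
    return offsets == list(range(WORDS_PER_ENTRY))


def inhibit_detect(window):
    # Logic gate: both detectors must assert on a full window.
    return (len(window) == WORDS_PER_ENTRY
            and store_detect(window) and address_detect(window))


class DelayCircuit:
    """Holds the inhibit signal while the detected instructions execute."""

    def __init__(self):
        self.remaining = 0

    def load(self, detected):
        if detected and self.remaining == 0:
            self.remaining = WORDS_PER_ENTRY

    def step(self):
        """Called once per executed instruction; returns its inhibit flag."""
        if self.remaining:
            self.remaining -= 1
            return True
        return False
```

Because the four queue stages are inspected in parallel, the stores may appear in any order, as long as they cover the entry without duplicates.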

[0015] The determination unit's hardware can be simplified by moving the detection to compile time. Each instruction in a sequence that consecutively rewrites all the information in the same entry is given data-transfer-prohibit indication information at the compilation stage. The determination unit then need only make the memory control unit prohibit the data transfer on the basis of that indication information.
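The compile-time alternative described above amounts to a pass that tags qualifying store runs. This is a minimal Python sketch, not the patent's compiler. The tuple IR `(mnemonic, base_register, displacement)`, the `"st"`/`"ld"` mnemonics and the function names are hypothetical. As in the hardware case, it assumes word-unit displacements and an entry-aligned base register.

```python
from typing import List, Tuple

WORDS_PER_ENTRY = 4
# Hypothetical IR: (mnemonic, base_register, displacement in words).
Insn = Tuple[str, int, int]


def covers_entry(window: List[Insn]) -> bool:
    """True if the window is all stores, same base, and hits each word of one entry once."""
    if any(op != "st" for op, _, _ in window):
        return False
    if len({b for _, b, _ in window}) != 1:
        return False
    if len({d // WORDS_PER_ENTRY for _, _, d in window}) != 1:
        return False
    return sorted(d % WORDS_PER_ENTRY for _, _, d in window) == list(range(WORDS_PER_ENTRY))


def mark_no_allocate(program: List[Insn]) -> List[bool]:
    """Return a per-instruction data-transfer-prohibit flag."""
    flags = [False] * len(program)
    i = 0
    while i + WORDS_PER_ENTRY <= len(program):
        if covers_entry(program[i:i + WORDS_PER_ENTRY]):
            for j in range(i, i + WORDS_PER_ENTRY):
                flags[j] = True
            i += WORDS_PER_ENTRY  # skip past the marked run
        else:
            i += 1
    return flags
```

At run time the determination unit would only read these flags, so no comparison logic on the instruction queue is needed.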

[0016] (2) A data processor suited to the above cache memory system can be built on a single chip with the following parts:

- an interface circuit to which a dedicated higher-level cache memory can be externally connected;
- a lower-level cache memory that holds data from the higher-level cache memory as cache data, organized as a plurality of entries each containing a plurality of pieces of cache data;
- a memory control unit. When an instruction involving a write misses in the lower-level cache memory but hits in the higher-level cache memory, it can transfer the higher-level entry containing the missed cache data to the lower-level cache memory.
- a determination unit that decides, from the code information of the instruction involving the write, whether that transfer is performed. For an operation that consecutively rewrites all the cache data in a single entry, it makes the memory control unit prohibit the transfer; otherwise it permits the transfer.
- a central processing unit that executes instructions in a pipelined manner.

[0017]

[Operation] Under means (1), a determination unit selectively prohibits the data transfer of an entry, that is, the block transfer of its cache data. It does so either by referring to information attached to the instructions by software or purely through a hardware configuration. The unit detects an operation such as consecutive stores to all the cache data in one entry of the cache memory, and for that operation prohibits the block transfer of the entry. This eliminates wasteful block transfers of entries whose cache data would be overwritten in turn by later store instructions without ever being referenced, and so improves system performance.

[0018] Under means (2), the data processor can have a higher-level cache memory attached over an external bus. That memory is slower than the fast, small built-in lower-level cache memory but has a relatively large storage capacity. Because it sits outside the data processor on a bus, its throughput limits system performance. Wasteful block transfers from it to the built-in lower-level cache memory can, however, be reduced, which improves system performance.

[0019]

[Embodiment] FIG. 1 is a block diagram of a cache memory system according to one embodiment of the present invention. In the figure, reference numeral 1 denotes a data processor of, though not limited to, the RISC (reduced instruction set computer) type. A second cache memory 2 is coupled to the data processor 1, as is a system bus 3 serving as an external bus. Although not shown, a main memory, an auxiliary storage device, input/output circuits and the like are connected to the system bus 3.

[0020] The data processor 1 has an interface circuit 8. Externally, the interface circuit 8 is coupled to the second cache memory 2, the system bus 3 and so on. Internally, it is coupled through dedicated or shared signal lines to the built-in circuit modules described below. These modules are an instruction cache memory 4, a data cache memory 5, an address translation buffer (TLB) 6, a tag cache memory 7, a write buffer 9, an instruction control unit 10, an arithmetic unit 11, a memory control unit 12, an instruction buffer register 13, and a block transfer inhibition determination unit 14.

[0021] The instruction buffer register 13 forms an instruction prefetch queue and the like. Instructions leaving the instruction buffer register 13 are decoded by the instruction control unit 10, which generates the various internal control signals. The arithmetic unit 11 contains arithmetic units, a register file and so on, and executes instructions by performing address calculations, data operations and the like. During execution, the memory control unit 12 controls access to the external main memory (not shown), the second cache memory 2 and so on. The memory control unit 12 also handles the entry transfers described later. It further controls the writing and reading of cache data in the internal instruction cache memory 4, data cache memory 5 and tag cache memory 7.

[0022] The instruction cache memory 4, data cache memory 5, address translation buffer 6 and tag cache memory 7 together form a lower-level cache memory relative to the second cache memory 2. This is also called the first cache memory. Its configuration is described below with reference to FIG. 2.

[0023] The instruction cache memory 4 stores instructions and the data cache memory 5 stores data. The data and instructions held in the cache memories are collectively called cache data. The instruction cache memory 4 and the data cache memory 5 each hold a plurality of entries, each entry consisting of a plurality of pieces of cache data.

The data cache memory 5 is described here as representative. The tag cache memory 7 holds, as the tag of each entry, physical address information in one-to-one correspondence with the entries of the data cache memory 5. The data cache memory 5 and tag cache memory 7 are accessed using the logical address output by the arithmetic unit 11. The address translation buffer 6 outputs the physical address corresponding to that logical address.

Consider an instruction involving a memory access, such as a store or load. The arithmetic unit 11 performs the address calculation and outputs a logical address, which is supplied to the data cache memory 5, the tag cache memory 7 and the address translation buffer 6. A comparator 15 then compares the physical address from the address translation buffer 6 with the tag read from the tag cache memory 7. A match is a cache hit. For a load instruction, the cache data read from the data cache memory 5 is used to execute it; for a store instruction, that cache data is rewritten.

The tag information also includes a valid bit, indicating whether the cache data is valid, and a share bit, indicating whether the cached information is shared in a multiprocessor system. These bits are also consulted in the hit/miss determination.
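The lookup path described above (index by logical address, translate through the TLB, compare the physical tag, check the valid bit) can be sketched as follows. This is a minimal Python sketch, not the patent's circuit. It assumes a direct-mapped cache and word addressing. `NUM_ENTRIES`, `PAGE_WORDS`, the dict-based TLB and all class names are hypothetical. With 64 entries of 4 words the index bits lie inside the assumed 1024-word page offset, so indexing by logical address is safe. TLB-miss handling is omitted, so an unmapped page raises `KeyError`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

WORDS_PER_ENTRY = 4  # cache data per entry (embodiment)
NUM_ENTRIES = 64     # hypothetical number of direct-mapped entries
PAGE_WORDS = 1024    # hypothetical page size in words


@dataclass
class Line:
    valid: bool = False
    shared: bool = False  # share bit; carried but not used by this sketch
    tag: int = 0          # physical entry number
    data: List[int] = field(default_factory=lambda: [0] * WORDS_PER_ENTRY)


class FirstCache:
    def __init__(self, tlb: Dict[int, int]):
        self.tlb = tlb  # virtual page -> physical page (address translation buffer)
        self.lines = [Line() for _ in range(NUM_ENTRIES)]

    def _translate(self, vaddr: int) -> int:
        vpage, off = divmod(vaddr, PAGE_WORDS)
        return self.tlb[vpage] * PAGE_WORDS + off

    def lookup(self, vaddr: int) -> Optional[Line]:
        # Index the data/tag arrays with the logical address...
        line = self.lines[(vaddr // WORDS_PER_ENTRY) % NUM_ENTRIES]
        # ...and compare the translated physical address with the tag (comparator 15).
        ptag = self._translate(vaddr) // WORDS_PER_ENTRY
        return line if line.valid and line.tag == ptag else None

    def load(self, vaddr: int) -> Optional[int]:
        line = self.lookup(vaddr)
        return None if line is None else line.data[vaddr % WORDS_PER_ENTRY]

    def store(self, vaddr: int, value: int) -> bool:
        line = self.lookup(vaddr)
        if line is None:
            return False  # miss: left to the memory control unit
        line.data[vaddr % WORDS_PER_ENTRY] = value
        return True
```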

[0024] The second cache memory 2 consists of, though not limited to, data cache, instruction cache and tag cache memories. Access to the second cache memory 2 is controlled by the memory control unit 12.

[0025] The relationship between the second cache memory 2 and the first cache memory 4 to 7 is as follows. The first cache memory 4 to 7 is the lower-level cache memory and the second cache memory 2 is the higher-level cache memory. To satisfy this relationship, the first cache memory 4 to 7 must hold a subset of the entries held by the second cache memory 2.

Consider a load instruction from memory to a register that misses in the first cache memory but hits in the higher-level second cache memory. The data to be loaded is read from the second cache memory. The subset relationship holds whether or not the entry containing the missed cache data is also transferred from the second cache memory 2 to the first cache memory 4 to 7. Given the empirical rule that data references within a given time window exhibit locality, it is advantageous to perform that transfer unconditionally in this case.
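The load-miss handling described above can be sketched while checking the subset (inclusion) requirement after each fill. This is a minimal Python sketch, not the patent's logic. The caches are plain dicts keyed by entry number, the SCM is assumed to hit (a missing entry raises `KeyError`), and the function names are hypothetical.

```python
from typing import Dict, List

WORDS_PER_ENTRY = 4
Cache = Dict[int, List[int]]  # entry number -> cache data words


def inclusion_holds(fcm: Cache, scm: Cache) -> bool:
    """Every entry in the lower-level FCM must also be present in the SCM."""
    return set(fcm) <= set(scm)


def load(fcm: Cache, scm: Cache, addr: int) -> int:
    entry, offset = divmod(addr, WORDS_PER_ENTRY)
    if entry in fcm:
        return fcm[entry][offset]  # FCM hit
    # FCM miss, SCM hit (assumed): read the word from the SCM...
    value = scm[entry][offset]
    # ...and transfer the whole entry unconditionally, betting on locality.
    fcm[entry] = list(scm[entry])
    assert inclusion_holds(fcm, scm)
    return value
```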

[0026] Now consider a store instruction from a register to memory that misses in the first cache memory but hits in the higher-level second cache memory. The data to be stored is written into the second cache memory. Whether the first cache memory 4 to 7 is also rewritten determines which method applies: the write-allocate method or the non-write-allocate method.

[0027] The write-allocate method rewrites both cache memories in anticipation of first-cache hits on subsequent data references. Because the first cache memory 4 to 7 must hold a subset of the entries held by the second cache memory 2, the entry containing the cache data that hit in the second cache memory 2 is block-transferred to the first cache memory 4 to 7. The corresponding data in both cache memories is then rewritten.

[0028] FIG. 3 shows the write-allocate procedure. Below, the first cache memory 4 to 7 is also called the first cache memory FCM, and the second cache memory 2 the second cache memory SCM. Suppose a store of data X' misses in the FCM:

1. The missed entry is written from the SCM into the FCM by block transfer. The transferred entry contains the old data X.
2. The data X' is written into the FCM and into the write buffer 9 (hereinafter the write buffer WTB).
3. The WTB transfers X' to the SCM when the bus connecting the SCM and the data processor 1 becomes free.

During these steps, the pipeline of the data processor 1 may be locked for several cycles. For example, in the pipelined execution sequence of FIG. 5, a cache miss on a store instruction's memory access extends memory access stage A over several stages. Execution of that instruction is effectively suspended for that period.
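The write buffer behaviour described above (X' is queued and drained to the SCM only when the bus is free) can be sketched as a FIFO. This is a minimal Python sketch, not the patent's circuit. The buffer depth, the per-cycle interface and the `WriteBuffer` name are hypothetical, and the SCM is modelled as a dict from word address to value.

```python
from collections import deque


class WriteBuffer:
    """Sketch of write buffer 9 (WTB): stores are queued and drained to the SCM
    one per cycle, only in cycles where the processor-SCM bus is free."""

    def __init__(self, depth=4):
        self.depth = depth  # hypothetical capacity
        self.queue = deque()

    def push(self, addr, value):
        if len(self.queue) >= self.depth:
            return False  # buffer full: the pipeline would have to stall
        self.queue.append((addr, value))
        return True

    def cycle(self, bus_free, scm):
        # Drain at most one buffered store per cycle when the bus is idle.
        if bus_free and self.queue:
            addr, value = self.queue.popleft()
            scm[addr] = value
```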

[0029] FIG. 4 shows the non-write-allocate procedure. Suppose a store of data X' misses in the first cache memory FCM:

1. No block transfer is started.
2. The share bit S of the second cache memory SCM, which marks shared data in a multiprocessor configuration, is checked.
3. The new data X' is written only into the write buffer WTB.

The WTB transfers X' to the SCM when the bus connecting the SCM and the data processor 1 becomes free. During these steps, the pipeline of the data processor 1 may be locked for several cycles.

[0030] The pipeline-lock penalty of the non-write-allocate method is only about 1/2 to 2/3 of that of the write-allocate method, because no block transfer of the missed entry takes place. In exchange, the non-write-allocate method cannot raise the cache hit rate of the first cache memory FCM, even if references to data with locality occur frequently afterward.

[0031] The block transfer described above is controlled by the memory control unit 12. When a store to memory misses in the first cache memory but hits in the second cache memory 2, the memory control unit 12 transfers the entry of the second cache memory 2 containing the missed cache data to the first cache memory.

[0032] This embodiment partially modifies the write-allocate procedure. When the block transfer would be effectively wasted, it is skipped, so that the advantage of the non-write-allocate method is obtained instead. In other words, the block transfer is skipped when it is clear that no reference exploiting locality will follow. Otherwise it is performed, in the expectation of a higher cache hit rate from such references.

[0033] FIG. 6 shows one case in which it is clear that no reference exploiting locality will follow and one case in which it is not. As the figure shows, each entry holds four pieces of cache data.

Instruction sequence (a) of FIG. 6 covers cases where stores to an entry do not cover all of its cache data, or where a load instruction appears between store instructions. Accesses to the same entry then hit from the second access onward, so the extra penalty of the block transfer (BT) is offset.

Instruction sequence (b) of FIG. 6 is the case where store instructions are executed consecutively to all the cache data in one entry, with no intervening memory-reference instruction such as a load. Here the initial block transfer (BT) is superfluous. The data newly transferred from the second cache memory to the first cache memory (here, data B, C and D) is never referenced; the subsequent store instructions simply overwrite it.
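The distinction between sequences (a) and (b) can be made mechanical by tracking, per transferred entry, whether any block-transferred word is read before being overwritten. This is a minimal Python sketch, not the patent's method. It assumes write-allocate, an initially empty FCM, SCM hits, no evictions and word addressing. The op encoding `("st" | "ld", address)` and the function name are hypothetical.

```python
WORDS_PER_ENTRY = 4


def classify_transfers(ops):
    """ops: sequence of ("st" | "ld", word_address), starting with an empty FCM.
    Returns (store_block_transfers, wasted_block_transfers) under write-allocate."""
    fresh = {}  # entry -> offsets still holding data brought in by the transfer
    used = {}   # entry -> True once any transferred word has been read
    transfers = 0
    for kind, addr in ops:
        entry, offset = divmod(addr, WORDS_PER_ENTRY)
        if entry not in fresh:
            # FCM miss: the entry is block-transferred from the SCM (BT).
            if kind == "st":
                transfers += 1
            fresh[entry] = set(range(WORDS_PER_ENTRY))
            used[entry] = False
        if kind == "st":
            fresh[entry].discard(offset)  # transferred word overwritten
        elif offset in fresh[entry]:
            used[entry] = True            # transferred word actually referenced
    # A transfer is wasted if every transferred word was overwritten unread.
    wasted = sum(1 for e in fresh if not fresh[e] and not used[e])
    return transfers, wasted
```

Four consecutive stores covering one entry, as in sequence (b), give one transfer that is wasted. Inserting a load of an untouched word, or leaving part of the entry unwritten, as in sequence (a), makes the transfer useful (or at least not provably wasted).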

[0034] Next, FIGS. 7 to 9 are used to explain a concrete example. It shows how the memory control unit 12 is prevented from block-transferring an entry when consecutive store instructions rewrite all of that entry's cache data, as described above.

[0035] As shown in FIG. 7, the instruction buffer register 13 has a four-stage serial instruction queue 16A to 16D. Instructions enter the first stage 16A from the instruction cache memory 4, and the output of the last stage 16D is supplied to an instruction register 23. The block transfer inhibition determination unit 14 decides from the instructions' code information whether the block transfer is performed. For an operation that consecutively rewrites all the cache data in a single entry, it makes the memory control unit 12 prohibit the transfer; otherwise it permits the transfer.

[0036] In this embodiment, as shown in FIG. 9, one entry of the cache memory contains four pieces of cache data. For example, the lower 2 bits of the logical address used to access cache data indicate which piece of cache data within an entry is addressed. The higher-order bits form the bit string that designates the entry.
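The address split described above can be sketched as follows. The sketch assumes that addresses count in cache-data units, so the lowest two bits select the word directly. On a byte-addressed machine, the byte-offset bits would be dropped first. The function name is hypothetical.

```python
from typing import Tuple

WORD_INDEX_BITS = 2  # four pieces of cache data per entry -> 2 index bits


def split_address(addr: int) -> Tuple[int, int]:
    """Split an address into (entry bits, word index within the entry)."""
    word = addr & ((1 << WORD_INDEX_BITS) - 1)  # lower 2 bits: which word
    entry = addr >> WORD_INDEX_BITS             # upper bits: which entry
    return entry, word


for a in range(0x28, 0x2C):
    print(hex(a), split_address(a))
```

All four addresses 0x28 to 0x2B share entry bits 10 and differ only in the word index, 0 to 3.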

[0037] As shown in FIG. 7, the block transfer inhibition determination unit 14 receives, in parallel, the information held in each of the four serial instruction queues 16A to 16D. This information is supplied to the store detection unit 30, the base register match detection unit 31, and the displacement detection unit 32. By way of example and without limitation, the store instruction format consists of a 6-bit operation code field, a 5-bit base register field, and an 11-bit displacement field, as shown in FIG. 8. This format uses base-register-relative addressing for the destination address (the memory write address). The value of the displacement field is added to the value of the base register designated by the base register field, and the sum is the destination address.
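As a sketch of this format, the code below decodes the three fields and forms the destination address. The bit positions are hypothetical, because the source gives only the field widths. Here the fields are packed into the low 22 bits with the operation code highest, and the displacement is treated as unsigned.

```python
from typing import NamedTuple, Sequence


class StoreFields(NamedTuple):
    opcode: int  # 6-bit operation code field
    base: int    # 5-bit base register field
    disp: int    # 11-bit displacement field


def decode_store(word: int) -> StoreFields:
    # Hypothetical packing into the low 22 bits: opcode | base | displacement.
    return StoreFields(opcode=(word >> 16) & 0x3F,
                       base=(word >> 11) & 0x1F,
                       disp=word & 0x7FF)


def destination_address(f: StoreFields, regs: Sequence[int]) -> int:
    # Base-register-relative addressing: contents of the designated base
    # register plus the displacement (unsigned in this sketch).
    return regs[f.base] + f.disp


regs = [0] * 32          # a 5-bit field can name up to 32 base registers
regs[3] = 0x1000
word = (0b101010 << 16) | (3 << 11) | 0x012
f = decode_store(word)
print(f)                                  # StoreFields(opcode=42, base=3, disp=18)
print(hex(destination_address(f, regs)))  # 0x1012
```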

[0038] The store detection unit 30 outputs a high-level signal when it detects that the operation codes of the instructions held in the four serial instruction queues 16A to 16D are store instruction codes. The base register match detection unit 31 outputs a high-level signal when it detects that the base register fields of the instructions held in the four instruction queues 16A to 16D all have the same value. The displacement detection unit 32 outputs a high level when it detects two conditions in the instructions held in the four instruction queues 16A to 16D: the upper 9 bits of their displacement fields are identical, and the lower 2 bits take the values 00, 01, 10, and 11 in sequence. These functions are shown conceptually in FIG. 8. The outputs of the detection units 30 to 32 are supplied to the AND gate 33. The output of the AND gate 33 therefore goes high when the four serial instruction prefetch queues 16A to 16D hold a run of consecutive store instructions that together rewrite all the cache data of the same entry.
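The three detection units and the AND gate 33 can be modeled as boolean functions over the four queued instructions. This is a minimal sketch. The `Instr` record and the "st"/"ld" mnemonics are hypothetical, and the queue contents are assumed to be listed in program order. Like the source, the sketch checks only instruction fields. It therefore appears to assume an entry-aligned base register value, because with an unaligned base, base + displacement for the four stores would straddle two entries.

```python
from typing import NamedTuple, Sequence


class Instr(NamedTuple):
    op: str    # hypothetical mnemonic: "st" for store, anything else otherwise
    base: int  # 5-bit base register field
    disp: int  # 11-bit displacement field


def store_detect_30(q: Sequence[Instr]) -> bool:
    return all(i.op == "st" for i in q)


def base_match_31(q: Sequence[Instr]) -> bool:
    return len({i.base for i in q}) == 1


def displacement_detect_32(q: Sequence[Instr]) -> bool:
    same_upper9 = len({i.disp >> 2 for i in q}) == 1  # upper 9 of 11 bits
    lower2_in_order = [i.disp & 0b11 for i in q] == [0, 1, 2, 3]
    return same_upper9 and lower2_in_order


def and_gate_33(q: Sequence[Instr]) -> bool:
    # q: contents of the four queues 16A-16D, listed in program order.
    return (len(q) == 4 and store_detect_30(q) and base_match_31(q)
            and displacement_detect_32(q))


full = [Instr("st", 5, 0x20 + j) for j in range(4)]
print(and_gate_33(full))                                          # True
print(and_gate_33(full[:2] + [Instr("ld", 5, 0x22)] + full[3:]))  # False
```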

[0039] The output of the AND gate 33 is passed to the shift circuit 35, which consists of three latch circuits 35A, 35B, and 35C connected in series. The outputs of the latch circuits 35A, 35B, and 35C are supplied, together with the output of the AND gate 33, to the four-input OR gate 34. The shift circuit 35 shifts in synchronization with each transfer of the output of the last-stage instruction queue 16D to the instruction register 23. Once the output of the AND gate 33 goes high, the output of the OR gate 34 therefore stays high until the store instructions then held in the four instruction queues 16A to 16D have all been fetched, one by one, into the instruction register 23. The output of the OR gate 34 serves as a control bit that determines whether the block transfer is allowed. This control bit is hereinafter called the block transfer control bit BTC. The block transfer control bit BTC is supplied to a predetermined flag bit 24 of the instruction register 23 and is passed to the instruction control unit 10 together with the instruction.
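The timing behavior of the shift circuit 35 and the OR gate 34 can be simulated one instruction transfer at a time. This is a sketch, not the patent's logic design. It assumes that the queues are always full while instructions remain and that each step moves one instruction from 16D into the instruction register 23. The detection units are condensed into one hypothetical function.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str    # hypothetical mnemonic: "st" or "ld"
    base: int  # base register field
    disp: int  # 11-bit displacement field


def and_gate_33(q: List[Instr]) -> bool:
    # Condensed detection units 30-32 over queues 16A-16D (program order).
    return (len(q) == 4
            and all(i.op == "st" for i in q)
            and len({i.base for i in q}) == 1
            and len({i.disp >> 2 for i in q}) == 1
            and [i.disp & 0b11 for i in q] == [0, 1, 2, 3])


def issue_with_btc(program: List[Instr]) -> List[Tuple[Instr, bool]]:
    """Issue instructions one at a time from queue 16D into instruction
    register 23, tagging each with the BTC value placed in flag bit 24."""
    latches = [False, False, False]        # 35A, 35B, 35C of shift circuit 35
    issued = []
    for pos in range(len(program)):
        window = program[pos:pos + 4]      # 16D..16A hold the next four
        and_out = and_gate_33(window)
        btc = and_out or any(latches)      # 4-input OR gate 34
        issued.append((program[pos], btc))
        latches = [and_out] + latches[:2]  # shift on each 16D -> 23 transfer
    return issued


prog = [Instr("st", 5, 0x20 + j) for j in range(4)] + [Instr("ld", 5, 0x20)] * 2
print([btc for _, btc in issue_with_btc(prog)])
# [True, True, True, True, False, False]
```

The AND gate is high only for the first of the four stores. The three latches carry that pulse across the next three transfers, so BTC covers exactly the four stores.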

[0040] The block transfer control bit BTC given to the instruction control unit 10 is passed, for example as shown in FIG. 1, to the block transfer start circuit 40 of the memory control unit 12. The logic block 41 in this circuit forms a primitive block transfer detection signal 42. This signal requests a block transfer when a store misses in the first cache memory FCM but hits in the second cache memory 2. The transfer moves the entry of the second cache memory 2 that contains the missed cache data into the first cache memory FCM. Without limitation, the block transfer detection signal 42 is high when a block transfer is called for. The AND gate 43 receives the detection signal 42 at one input and the inverted block transfer control bit BTC at the other. While the block transfer control bit BTC is high, the block transfer start signal 44 output from the AND gate 43 therefore stays at the disable level even if the block transfer detection signal 42 goes high, and the memory control unit 12 does not perform the block transfer. When the block transfer control bit BTC is low, the block transfer detection signal 42 passes through unchanged as the block transfer start signal 44, and the memory control unit 12 can perform the block transfer.
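The gating in the block transfer start circuit 40 reduces to two boolean expressions. The sketch below uses the patent's reference numerals only in its hypothetical function names, and it models signal levels as Python booleans (high = True).

```python
def logic_block_41(is_store: bool, fcm_hit: bool, scm_hit: bool) -> bool:
    # Primitive block transfer detection signal 42 (active high): a store
    # misses in the first cache memory FCM but hits in the second cache SCM.
    return is_store and not fcm_hit and scm_hit


def and_gate_43(detect_42: bool, btc: bool) -> bool:
    # Block transfer start signal 44 = signal 42 AND (NOT BTC).
    return detect_42 and not btc


for btc in (False, True):
    sig42 = logic_block_41(is_store=True, fcm_hit=False, scm_hit=True)
    print(f"BTC={int(btc)} -> start signal 44 = {int(and_gate_43(sig42, btc))}")
# BTC=0 -> start signal 44 = 1
# BTC=1 -> start signal 44 = 0
```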

[0041] FIG. 10 compares the processing steps of the conventional write allocate method, the non-write allocate method, and the method of the above embodiment. Part (A) of the figure shows the execution of four consecutive store instructions that write cache data of the same entry. Part (B) shows the execution of one store instruction followed by three load instructions that operate on cache data of the same entry.

[0042] In FIG. 10(A), the conventional write allocate method and the method of this embodiment leave the first cache memory and the second cache memory in the same state. In the conventional write allocate method, however, the initial block transfer is essentially wasted. This embodiment omits it and therefore needs fewer processing steps (pipeline stages). This embodiment and the non-write allocate method appear to take the same number of steps. In the non-write allocate method, however, the stored data is not held as cache data in the first cache memory, so any subsequent load instruction or other reference to that entry results in a cache miss. The method of this embodiment is therefore superior to the other two methods both in the number of processing steps and in the cache hit rate of subsequent data references.

[0043] In FIG. 10(B), the method of this embodiment behaves essentially the same as the conventional write allocate method, so the cache hit rate can be expected to improve in line with the locality of data references. The non-write allocate method offers no such benefit, and its number of processing steps is considerably larger.
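The qualitative comparison of FIG. 10 can be reproduced with a toy replay that counts block transfers and first-cache misses for one entry. Cycle counts are not modeled. The policy names and the assumption that loads always allocate on a miss belong to this sketch and do not come from the source. In the "embodiment" policy, an inhibited store miss installs the entry without fetching it. The remaining words are then filled by the stores that the BTC bit guarantees will follow.

```python
from typing import Dict, List, Tuple


def replay(policy: str, trace: List[Tuple[str, int]],
           btc: List[bool]) -> Dict[str, object]:
    """Replay accesses to ONE entry that is absent from the first cache
    memory FCM but present in the second cache memory SCM.

    policy: "write_allocate", "no_write_allocate" or "embodiment".
    btc[i]: block transfer control bit for access i (used by "embodiment").
    Loads always allocate on a miss (an assumption of this sketch).
    """
    in_fcm = False
    transfers = misses = 0
    for (op, _word), inhibit in zip(trace, btc):
        if in_fcm:
            continue                 # FCM hit: no upper-memory traffic
        misses += 1
        if op == "st" and policy == "no_write_allocate":
            continue                 # store goes upstream only; FCM unchanged
        if not (op == "st" and policy == "embodiment" and inhibit):
            transfers += 1           # block transfer of the whole entry
        in_fcm = True                # entry now resident in FCM
    return {"transfers": transfers, "misses": misses, "in_fcm": in_fcm}


four_stores = [("st", w) for w in range(4)]                            # FIG. 10(A)
one_store_three_loads = [("st", 0)] + [("ld", w) for w in (1, 2, 3)]  # FIG. 10(B)

for p in ("write_allocate", "no_write_allocate", "embodiment"):
    print(p, replay(p, four_stores, [True] * 4))
```

For case (A), the embodiment ends in the same state as write allocate with one fewer block transfer. Non-write allocate ends with the entry absent from the first cache. For case (B), where BTC stays low, the embodiment matches write allocate.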

[0044] The overall system performance gain from this embodiment depends on how often the four-consecutive-store pattern described above occurs. Even if the pattern accounts for only a few percent of execution, it reliably improves system performance. In particular, a RISC processor implements complex operations as combinations of relatively simple instructions, including store instructions, so the four-consecutive-store pattern is expected to occur more often. In the above embodiment one entry contains four pieces of cache data, but the more cache data an entry contains, the greater the benefit over the conventional write allocate method can be expected to be. A wider data bus (more bus bits) is also expected to increase the frequency of such consecutive store instructions, so a large effect can be expected in that case as well.

[0045] In the above embodiment, an interrupt may occur while four consecutive store instructions are executing, so that interrupt processing starts before all four stores have completed. Processing is then suspended while the contents of the corresponding entry in the first cache memory FCM and the second cache memory SCM are inconsistent. This inconsistency should therefore be corrected, at least immediately after returning from the interrupt processing. For example, a special flag bit (not shown) can be provided for each of the four instruction queues 16A to 16D, and the four flags are set when the output of the AND gate 33 goes high. On an interrupt, these four flag bits and the block transfer control bit BTC are saved as well. After the return, additional processing refers to the flag bits and holds the block transfer control bit BTC high as long as a flag bit is set.
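The flag-bit remedy can be sketched as a queue whose entries carry their own flag, so that the BTC value survives an interrupt with the saved state. Everything here is hypothetical: the class, its methods, and the choice to keep the queue contents across the interrupt. A real design might refetch the interrupted instructions instead. The sketch shows only that saving and restoring the flags keeps BTC high for the stores that have not yet completed.

```python
from typing import List, Tuple


class FlaggedQueue:
    """Hypothetical model of the [0045] remedy: each queued instruction
    carries an extra flag bit. entries[0] is the instruction in the last
    stage, 16D, i.e. the next one to issue."""

    def __init__(self, instrs: List[str]):
        self.entries = [[name, False] for name in instrs]  # [instruction, flag]

    def and_gate_33_high(self) -> None:
        for e in self.entries[:4]:
            e[1] = True              # set the four flags together

    def issue(self) -> Tuple[str, bool]:
        name, flag = self.entries.pop(0)
        return name, flag            # BTC is held high while the flag is set

    def save(self) -> List[list]:
        return [list(e) for e in self.entries]  # saved on interrupt entry

    @classmethod
    def restore(cls, saved: List[list]) -> "FlaggedQueue":
        q = cls([])
        q.entries = [list(e) for e in saved]
        return q


q = FlaggedQueue(["st0", "st1", "st2", "st3", "ld4"])
q.and_gate_33_high()
print(q.issue(), q.issue())      # two of the four stores complete
saved = q.save()                 # interrupt: flags are saved
q = FlaggedQueue.restore(saved)  # return from the interrupt handler
print([q.issue() for _ in range(3)])
```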

[0046] In the above embodiment, the determination unit is implemented purely in hardware as the block transfer inhibition determination unit 14. To simplify that hardware, data transfer prohibition instruction information can instead be attached at compile time to each instruction of a sequence that consecutively rewrites all the information belonging to the same entry. The instruction control unit 10 then decodes this information together with the rest of the instruction code and passes the result to the memory control unit 12. Alternatively, the data transfer prohibition instruction information is extracted from the instruction and supplied directly to the memory control unit 12. Either way, the block transfer can be selectively prohibited, with the same effect.
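A compile-time version of the check might look like the pass below. The sketch reuses the field pattern from the hardware detector and marks each instruction of a qualifying run with a hint bit. The `no_bt` field stands in for the data transfer prohibition instruction information and is hypothetical, as are the mnemonics. A compiler that knows actual addresses could apply a stronger test.

```python
from typing import List, NamedTuple


class Instr(NamedTuple):
    op: str              # hypothetical mnemonic: "st" or "ld"
    base: int            # base register field
    disp: int            # 11-bit displacement field
    no_bt: bool = False  # hypothetical data transfer prohibition bit


def _fills_entry(w: List[Instr]) -> bool:
    return (all(x.op == "st" for x in w)
            and len({x.base for x in w}) == 1
            and len({x.disp >> 2 for x in w}) == 1
            and [x.disp & 0b11 for x in w] == [0, 1, 2, 3])


def mark_full_entry_stores(code: List[Instr]) -> List[Instr]:
    """Set no_bt on every instruction of a run of four consecutive stores
    that rewrites all four words of one entry."""
    out = list(code)
    i = 0
    while i + 4 <= len(out):
        if _fills_entry(out[i:i + 4]):
            for k in range(i, i + 4):
                out[k] = out[k]._replace(no_bt=True)
            i += 4                   # run consumed; continue after it
        else:
            i += 1
    return out


code = [Instr("ld", 2, 0)] + [Instr("st", 5, 0x40 + j) for j in range(4)]
print([x.no_bt for x in mark_full_entry_stores(code)])
```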

[0047] A read access and a write access may be performed in parallel in the pipelined instruction execution stages, with the read missing in the first cache memory FCM but hitting in the second cache memory SCM. In that case, the memory control unit 12 gives the read operation of the second cache memory SCM priority over the write access. In parallel, it performs the write to the write buffer 9 instead of to the second cache memory SCM.
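The arbitration can be sketched as a one-port model of the second cache memory. This is a hypothetical illustration, not the memory control unit's actual design. A read that contends with a write gets the port, the write is parked in the write buffer 9, and buffered writes drain on a later cycle with no read. Forwarding from the buffer to a read of the same address is omitted for brevity.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class SecondCachePort:
    """Hypothetical single-port SCM model with write buffer 9."""

    def __init__(self) -> None:
        self.write_buffer: Deque[Tuple[int, int]] = deque()
        self.log: List[tuple] = []

    def cycle(self, read_addr: Optional[int] = None,
              write: Optional[Tuple[int, int]] = None) -> None:
        if write is not None:
            self.write_buffer.append(write)       # write goes to buffer 9
        if read_addr is not None:
            self.log.append(("read", read_addr))  # read has priority on SCM
        elif self.write_buffer:
            addr, data = self.write_buffer.popleft()
            self.log.append(("write", addr, data))  # drain when SCM is free


port = SecondCachePort()
port.cycle(read_addr=0x40, write=(0x80, 7))  # read and write contend
port.cycle()                                 # idle cycle drains the buffer
print(port.log)                              # [('read', 64), ('write', 128, 7)]
```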

[0048] In the above embodiment, the second cache memory 2 serves as the upper memory for the first cache memory FCM. The upper memory may instead be a main memory (not shown) on the system bus 3. In that case, the memory control unit 12 performs the above block transfer from the main memory to the first cache memory, and the logic block 41 generates the block transfer detection signal 42 from the result of the first cache memory's cache miss determination during a store to memory.

[0049] The above embodiment provides the following operational effects.
(1) When a store instruction causes a primary cache miss in the first cache memory FCM, the block transfer from upper-level memory, such as the second cache memory 2 or the main memory, can be omitted. This applies only when all the data in the same entry are accessed. The saving is a penalty (the number of pipeline stages or pipeline cycles lost to a pipeline lock) of several cycles when the upper-level memory is the second cache memory 2 built from external SRAM, and several tens of cycles when it is the main memory. System performance is improved accordingly.
(2) The data processor 1 can have an upper cache memory SCM connected externally over a bus. The SCM has a lower access speed than the fast, small built-in cache memory FCM, but a relatively large storage capacity. Because the upper cache memory SCM is bus-connected outside the data processor 1, its throughput governs system performance, and this embodiment reduces wasteful block transfers from the upper cache memory SCM to the built-in cache memory FCM. Using the data processor 1 described in this embodiment therefore improves the performance of the system it controls.
(3) When read and write accesses by the memory control unit 12 to the cache memory SCM conflict, the write buffer 9 is used to give the cache data read priority. This minimizes delays in data processing and instruction execution and further improves system performance.

[0050] The invention made by the present inventor has been described specifically on the basis of an embodiment. The present invention is not limited to that embodiment, however, and various modifications can be made without departing from its gist. For example, the configuration of the lower cache memory, such as the first cache memory, is not limited to the above embodiment and can be changed as appropriate. It may include only one of the instruction cache memory and the data cache memory, or it may store instructions and data together. The block transfer control bit BTC may be conveyed to the memory control unit 12 directly, or a signal obtained from the decoding result of the instruction control unit 10 may be conveyed to the memory control unit 12 instead.

【００５１】[0051]

[Effects of the Invention] The effects obtained by representative aspects of the inventions disclosed in this application are briefly described below.

[0052] (1) When an instruction involving a write causes a primary cache miss, the block transfer from upper-level memory can be omitted. This applies only when all data in the same entry are accessed. Some entries, even if newly transferred, would have their cache data overwritten one after another by subsequent instructions without ever being referenced. Wasteful block transfers of such entries are eliminated, which improves system performance.
(2) The data processor can have an upper cache memory connected externally over a bus. This cache is slower than the fast, small built-in lower cache memory but has a relatively large storage capacity. Because the upper cache memory is bus-connected outside the data processor, its throughput governs system performance. Reducing wasteful block transfers from the upper cache memory to the built-in lower cache memory therefore improves system performance.

[Brief description of drawings]

FIG. 1 is a block diagram of a cache memory system according to an embodiment of the present invention.

FIG. 2 is a block diagram showing an example of the overall configuration of the first cache memory.

FIG. 3 is an explanatory diagram of the processing procedure of the write allocate method.

FIG. 4 is an explanatory diagram of the processing procedure of the non-write allocate method.

FIG. 5 is an explanatory diagram of a pipeline lock.

FIG. 6 is an explanatory diagram showing an example of a case where it is clear that no localized data reference will occur, and a case where it is not.

FIG. 7 is a block diagram showing an example of the instruction buffer register and the block transfer control unit.

FIG. 8 is an explanatory diagram conceptually showing the functions of the block transfer inhibition determination unit.

FIG. 9 is an explanatory diagram showing an example of the relationship between one entry and the plurality of pieces of cache data it contains.

FIG. 10 is an explanatory diagram comparing the number of processing steps of the conventional write allocate method, the non-write allocate method, and the method of this embodiment. The comparison is shown separately for four consecutive store instructions and for one store instruction followed by three load instructions.

[Explanation of symbols]

1 data processor, 2 second cache memory, 3 system bus, 4 instruction cache memory, 5 data cache memory, 6 address translation buffer, 7 tag cache memory, 8 interface circuit, 9 write buffer, 10 instruction control unit, 12 memory control unit, 13 instruction buffer register, 14 block transfer inhibition determination unit, 15 comparator, FCM first cache memory, SCM second cache memory, 16A to 16D instruction queues, 23 instruction register, 24 flag bit, BTC block transfer control bit, 30 store detection unit, 31 base register match detection unit, 32 displacement detection unit, 33 AND gate, 34 OR gate, 35 shift circuit, 40 block transfer start circuit, 41 logic block, 42 block transfer detection signal, 43 AND gate, 44 block transfer start signal

Claims

[Claims]

1. A cache memory system comprising: an upper memory; a lower memory that holds data held by the upper memory as cache data, in a plurality of entries, each single entry containing a plurality of pieces of cache data; a memory control unit that, in response to a cache miss in the lower memory during execution of an instruction involving a write, can transfer an entry containing the missed cache data from the upper memory to the lower memory; and a determination unit that determines, based on code information of the instruction involving the write, whether to perform the data transfer, the determination unit causing the memory control unit to prohibit the data transfer for an operation in which all cache data contained in a single entry are rewritten consecutively, and otherwise allowing the memory control unit to perform the data transfer.

2. A cache memory system comprising: an upper cache memory that holds a plurality of entries, each single entry containing a plurality of pieces of cache data; a lower cache memory that holds a subset of the entries held by the upper cache memory; a memory control unit that, when an instruction involving a write causes a cache miss in the lower cache memory and a cache hit in the upper cache memory, can transfer the entry of the upper cache memory containing the missed cache data to the lower cache memory; and a determination unit that determines, based on code information of the instruction involving the write, whether to perform the data transfer, the determination unit causing the memory control unit to prohibit the data transfer for an operation in which all cache data contained in a single entry are rewritten consecutively, and otherwise allowing the memory control unit to perform the data transfer.

3. The cache memory system according to claim 1, wherein the determination unit comprises: a detection circuit that receives in parallel the code information of prefetched instructions, equal in number to the pieces of cache data forming a single entry, and thereby detects a state instructing an operation in which all cache data contained in the single entry are rewritten consecutively; a delay circuit that, once the detection state is obtained, maintains the detection state throughout the execution period of all the instructions supplied in parallel to the detection circuit; and a storage unit that passes the output of the delay circuit to the memory control unit.

4. The cache memory system according to claim 3, wherein the detection circuit comprises: a store detection unit that detects that the code information of each of the instructions received in parallel contains the operation code of a store instruction; an address detection unit that detects that the code information of the plurality of instructions received in parallel contains addressing information designating all the cache data contained in the same entry without duplication; and a logic gate that supplies the detection to the delay circuit when the detection state is obtained in both the store detection unit and the address detection unit.

5. The cache memory system according to claim 1, wherein each instruction of an instruction sequence instructing an operation of consecutively rewriting all information to be contained in the same entry has data transfer prohibition instruction information, and the determination unit causes the memory control unit to prohibit the data transfer based on the data transfer prohibition instruction information.

6. A data processor formed on a single chip, comprising: an interface circuit to which an upper cache memory can be exclusively connected externally; a lower cache memory that holds data held by the upper cache memory as cache data, in a plurality of entries, each single entry containing a plurality of pieces of cache data; a memory control unit that, when an instruction involving a write causes a cache miss in the lower cache memory and a cache hit in the upper cache memory, can transfer the entry of the upper cache memory containing the missed cache data to the lower cache memory; a determination unit that determines, based on code information of the instruction involving the write, whether to perform the data transfer, the determination unit causing the memory control unit to prohibit the data transfer for an operation in which all cache data contained in a single entry are rewritten consecutively, and otherwise allowing the memory control unit to perform the data transfer; and a central processing unit that executes instructions in a pipelined manner.