JPH02197940A

JPH02197940A - Hierarchical memory system including separate cache memories for storing data and instructions

Info

Publication number: JPH02197940A
Application number: JP1040435A
Authority: JP
Inventors: Francis Paul Carrubba; カルバ、プランシス・ポール; John Cocke; コーク、ジヨン; Norman H Kreitzer; クライツアー、ノーマン・エイチ; George Radin; ラデイン、ジヨージ
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1989-02-22
Filing date: 1989-02-22
Publication date: 1990-08-06
Also published as: JPH0348540B2

Abstract

PURPOSE: To shorten CPU waiting time by writing back to main storage an instruction that has been read into the data cache and modified there, and by invalidating the line in the instruction cache that corresponds to the modified instruction. CONSTITUTION: If, when a data cache miss occurs, the affected line is valid and marked as modified, that line must be written back to storage before a new line can be loaded into the same position in the data cache. For this purpose a write-back latch/multiplexer 52 is enabled. A byte selection mechanism 46 and byte write gates 50 control the storing of data into the data cache. On a CPU load operation accompanied by a directory miss, the byte selection mechanism 49 gates the data so that only double words of main storage data pass through a byte input multiplexer 48, and all byte write gates 50 are activated. The waiting time of the CPU is thereby minimized.

Description

Detailed Description of the Invention

A. Technical Field

The present invention relates to a hierarchical storage organization using a unique cache architecture that has separate caches for instruction storage and data storage. This hierarchical storage organization is particularly suited to high-speed electronic computing systems, in which it is especially desirable to minimize the time the CPU spends waiting on memory accesses.

B. Background Art

Modern high-speed electronic data processing systems often consist of a processing unit, or CPU, and a hierarchical storage system. The latter includes a relatively large, slow memory whose cycle time is much longer than that of the processing unit, and a relatively much smaller, faster memory, usually called a cache, whose cycle time is comparable to that of the processing unit. Cache memory systems of this kind, which reduce the effective memory access time at reasonable cost, are well known in the art. When the CPU requires information, the information is read from main storage, delivered to the processing unit, and written into the cache. If the processing unit later needs the same information, it is read directly from the cache, avoiding the time delay that would otherwise be incurred in reading main storage.

If the cache is full, however, the required information must be obtained from main storage, and a location in the cache must be identified to hold the new information. Before the old location can be used to store new data, it must be determined whether the data currently in the cache has been modified by the program. If it has, the data must be written back to main storage (if necessary) so that main storage properly reflects the current state of the data. Most current cache architectures require such a write-back. It would clearly be advantageous to eliminate the write-back when the modified data will never again be needed by the program, or when the data has never been modified.

Another feature common to many existing cache architectures is that they are essentially transparent to system software. That is, system software, including compilers, operating systems and the like, performs memory fetch and store operations as if the cache did not exist. In such systems the cache hardware is essentially interposed between the CPU and main storage. Although the presence of the cache greatly improves the effective memory access time in such systems, much of the benefit that could be obtained from this high-speed storage is lost because of the architecture and conventions used.

The paper "The 801 Minicomputer" by George Radin, ACM SIGPLAN Notices, Vol. 17, No. 4, April 1982, pp. 39-47, is an overview of an experimental minicomputer that incorporates a hierarchical storage organization, including separate cache storage for instructions and data, embodying the concepts of the present invention.

U.S. Patent No. 4,142,234, IBM Technical Disclosure Bulletin, Vol. 18, No. 12, May 1976, and U.S. Patent No. 4,056,844 disclose hierarchical storage organizations that generally include cache storage. Their caches, however, are neither divided into separate data and instruction portions nor provided with special control fields, accessible to the program, for controlling the operation of the memory system.

U.S. Patent Nos. 4,161,024 and 4,195,342 describe EDP systems that include a direct interface from the CPU to the cache, as is generally known in the art.

U.S. Patent No. 4,070,706 discloses a cache storage system in which the cache is divided into a data section and an address section (not a directory), but the cache is not divided into a data section and an instruction section.

U.S. Patent No. 4,245,304 describes a split-cache system in which operation is divided into two half-cycles, so that an instruction access from the cache or a data write to the cache can take place within the same half-cycle. That patent neither discloses nor suggests special control bits in the cache directory for controlling cache/main storage operations.

U.S. Patent No. 4,075,686 describes a cache storage system that selectively bypasses the cache during memory access operations according to the coding of particular instruction bits, thereby shortening the execution time of certain kinds of operations.

U.S. Patent No. 4,142,234 discloses a cache system in which certain interrogations of the cache directory are eliminated in order to reduce the size of the stack.

U.S. Patent No. 3,618,041 and IBM Technical Disclosure Bulletin, Vol. 22, No. 11, April 1980, p. 5183, broadly disclose the basic concept of dividing the cache subsystem into separate instruction and data portions and of providing complex operating system support for overlapped operation of the two separate caches.

U.S. Patent No. 4,197,580 broadly discloses a single cache subsystem that uses a special valid bit and "tags" for controlling certain read and write operations. That bit is used to avoid unnecessary cache cycles rather than unnecessary main storage cycles, whereas the present invention is intended to avoid unnecessary main storage cycles.

C. Objects and Summary of the Invention

It is an object of the present invention to provide an improved hierarchical storage organization having a large-capacity main storage and a small-capacity cache storage divided into two separate portions, dedicated to storing data and instructions respectively.

According to the invention, such a cache storage organization has, for each line in the cache directory, at least one special control bit that affects the operation of the cache.

According to the invention, such a cache storage has at least one special control bit that is used in the cache line replacement process.

According to the invention, in such a cache storage system a control bit indicates whether a line in the data cache has been modified since it was first stored there.

According to the invention, a control bit is provided that indicates whether a given cache line is valid.

According to the invention, a cache subsystem is provided in which temporary data storage space, for scratch-pad storage and the like, is made available without requiring any reference to main storage.

According to the invention, a cache subsystem is provided in which all such control is exercised by system software, so that no control unit is needed to notify the instruction cache when data stored in the data cache is modified.

The objects of the present invention are achieved by a hierarchical storage system for use with a high-speed data processing unit, the system generally comprising a large-capacity, relatively slow main storage and a much smaller, faster cache storage.

The cache storage consists of two separate portions, dedicated to storing data and instructions respectively. Each portion has a cache directory with a storage location for each line held in that cache. Each such location, in both directories, has means for storing the high-order bits of the main storage address of the data held in the corresponding cache line, and means for storing special control bits, settable by the system, that control the operation of the cache storage whenever a memory access to that line is requested. Both cache directories also include, in each directory entry, a control bit that indicates which of the several cache subset lines is to be replaced when a cache miss occurs.

Both directories further include, in each storage location, control bits that indicate that the cache line designated by that directory location is invalid. In addition, each storage location of the cache directory for the data cache includes bits indicating that the cache line designated by the address stored in that location has been modified by a previous CPU operation, and that the corresponding line in main storage must therefore be updated by a "write-back" operation before that line in the cache can be replaced.
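The directory fields described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the 32-byte line size is taken from the later description of FIG. 2, while `NUM_SETS`, the field names, and the helper functions are hypothetical and do not come from the source.

```python
from dataclasses import dataclass

LINE_BYTES = 32   # line size given later in the description
NUM_SETS = 64     # hypothetical: the source does not state the number of sets


@dataclass
class DirectoryEntry:
    """One directory location (field names are illustrative assumptions)."""
    tag: int = 0            # high-order bits of the main storage address
    valid: bool = False     # line holds usable contents
    modified: bool = False  # data cache only: line changed since it was loaded


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, set index, byte offset within the line)."""
    offset = addr % LINE_BYTES
    set_index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, set_index, offset


def needs_write_back(entry: DirectoryEntry) -> bool:
    """A line must be written back before replacement only if valid and modified."""
    return entry.valid and entry.modified
```

For example, `split_address(0x12345)` separates the address into the tag kept in the directory, the set that is searched, and the byte position inside the 32-byte line.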

The cache control system is also provided with means for setting the various control bits in the cache directory, and with means for loading particular cache lines from main storage and storing them back to main storage independently of memory access operations by the CPU.

D. Best Mode for Carrying Out the Invention

(1) General Description of the Storage Hierarchy

The performance of a system (CPU) that operates at very high speed depends heavily on the performance of its storage subsystem. With current technology it is possible to build a cache with an access time of 60 nanoseconds and a backing storage with an access time about one-fifth as fast, that is, about 300 nanoseconds. Improving the performance of the storage subsystem yields very good results in terms of overall system performance.

The present invention aims to improve the performance of the cache subsystem in two areas of operation. The first is to improve the cache hit ratio, that is, to increase the percentage of storage references that are satisfied in the cache and therefore require neither an access to main storage nor the delay that goes with it. The second is to reduce the time needed to access a line from backing storage when a cache reference misses.
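The interplay of the two goals can be illustrated with a simple weighted-average model. The sketch below is an assumption-laden illustration, not part of the disclosure: it uses the 60 ns and 300 ns figures quoted above as defaults, treats every miss as costing one full backing storage access, and ignores any overlap between the CPU and the line transfer. The function name is hypothetical.

```python
def effective_access_ns(hit_ratio: float,
                        hit_ns: float = 60.0,
                        miss_ns: float = 300.0) -> float:
    """Average access time for a given hit ratio (simple weighted average)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ns + (1.0 - hit_ratio) * miss_ns


# Raising the hit ratio or lowering the miss cost both reduce the average.
for ratio in (0.80, 0.90, 0.95):
    print(f"hit ratio {ratio:.2f}: {effective_access_ns(ratio):.1f} ns")
```

Under this model, raising the hit ratio attacks the first goal and lowering `miss_ns` attacks the second; the hit ratios in the loop are illustrative values only.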
The objective is to improve the cache hit ratio, ie, to increase the percentage of storage references that are found in the cache and therefore do not require access to main storage and the associated delay. The second is to improve the time to access a line from backing storage when a cache reference fails.

Considering the first goal, it turns out that many of the accesses made to backing storage are not actually required for correct execution of the program.

These accesses to backing storage take place because the hardware cannot infer the intent of the software. In general, the unnecessary references fall into two classes.

The first arises when a program wants a new block of storage. This can happen when a procedure is called and needs temporary (i.e., AUTOMATIC) storage, when a first-level interrupt handler needs a register save area, when an access method needs a buffer, or when a program issues a GETMAIN request.

What all these cases have in common is that the program has no interest in the old contents of the storage.

The program simply wants some storage. Most current storage subsystems, however, fetch the old line from backing storage into the cache when the first reference to such storage occurs. They do so because the first reference is to at most one word of the line (the unit of access between the CPU and the cache being a word), and the subsystem has no way of knowing that subsequent requests will not need the other words of the line until after they have been updated.

The second class arises when a program no longer needs a block of storage, even though its contents may have been modified. This can happen when temporary storage is released on return from a procedure, when a buffer is released, or, in general, when a program issues a FREEMAIN.

No currently available storage subsystem known in the art has a mechanism for determining that the write-back of such modified lines is unnecessary.

It is therefore advantageous for the CPU to provide basic hardware mechanisms, in the form of instructions, that software can use to pass such information to the hardware, that is, to the cache control mechanism. More specifically, two instructions, executed by the cache control hardware, are defined that make this kind of software control of cache operation possible. They are defined as follows.

(1) Set Data Cache Line
(2) Invalidate Data Cache Line

These instructions are issued by the control program and are generated by the compiler for application programs. They ensure that such unnecessary backing storage accesses do not occur.

In fact, because the temporary storage required by procedures is managed on a stack, and even supervisor calls are invoked on demand, backing storage is never accessed for the data of a dispatched process unless that data is persistent (i.e., STATIC) or the stack depth becomes large relative to the cache size. Backing storage thus begins to play a role similar to that played by secondary storage (i.e., file space and paging areas). With such a strategy, CPU activity at interrupt time is no greater than that performed at the high end by "priority level interrupt" systems. Such priority level interrupt systems must likewise store their internal registers in high-speed memory (i.e., register space).

The difference in the CPU organization described here is that it does not dedicate high-speed memory to this purpose, and so saves cost. A CPU with this architecture may encounter cache misses at "re-dispatch" time, but these are not on the response-critical part of the path.

As in most systems, the CPU fetches instructions from the cache ahead of their execution. The CPU has a "prefetch buffer" (a six-word buffer is assumed in the system disclosed here) and attempts to keep it full. Keeping this buffer full may cause a cache miss, which initiates an instruction fetch from backing storage.

However, the instruction being fetched may be preceded by a branch instruction that is already in the buffer but has not yet been executed. In the architecture of the present invention, the CPU prefetch mechanism is expected to scan the op codes and inhibit such unnecessary backing storage fetches. In fact, while scanning the op codes for this purpose, the mechanism can also recognize and delete NO-OPs (no-operation instructions), so that their execution time becomes zero.

The second way of improving storage subsystem performance according to the teachings of the present invention is to make accesses to backing storage faster.

When a word, whether data or instruction, that causes a cache miss is requested from the storage subsystem, backing storage is accessed for the required line, beginning with the requested word. That word is sent directly to the CPU, bypassing the cache, and the line is stored into the cache while the CPU continues executing instructions. Thus, for example, a load instruction that causes a cache miss takes only 340 nanoseconds to complete (less whatever portion other instructions can overlap with the load).

Because the instruction prefetch mechanism is inherently asynchronous with respect to the data fetch mechanism, it has proved advantageous to split the cache into two separate portions, one for instructions and one for data.

As a result, each cache can access backing storage independently and in an overlapped fashion.

This feature can effectively double the access rate when the particular instruction stream being executed permits. Other advantages also become possible with this overall cache subsystem architecture.

This cache subsystem is structured and defined so that modification of instructions in the instruction cache is not permitted. Consequently, fetching an instruction line never requires a write-back, and an instruction cache miss never incurs the delay that a write-back would entail.

Each cache portion (instruction and data) is designed as a two-way set-associative cache. Some of the advantages of a four-way set-associative organization are thereby obtained without the associated cost. A special control bit or field, hereinafter called the LRU (least recently used) bit, allows the two-way set-associative cache to use a conventional replacement procedure to determine which of the two lines in the addressed region of the cache is to be replaced after a miss.

Cache characteristics such as overall size, depth, line size and other physical parameters can be chosen for each cache to suit its particular purpose, that is, instructions or data. Likewise, separate replacement algorithms can be used to exploit differences in access characteristics or access patterns between instructions and data. In the prototype of this system, similar replacement algorithms were used for both caches with satisfactory results, but tailoring the algorithm to the particular cache should yield some improvement.

As noted in the Background Art section, split caches have been known in the art for some years, but implementing them with conventional architectures has raised a serious problem. Because an instruction can legitimately be modified in the data cache and then branched to, every modification must be reported to the instruction cache, which in turn must guarantee that the modified line is invalidated. In practice, however, instructions are rarely modified in today's more sophisticated systems. It was therefore judged far more efficient to perform this function, instruction modification, in software (when needed) than to provide a hardware mechanism that repeats the check every time data is modified.

As stated earlier, this system assumes that when a modification is required, software is provided to bring the instruction into the data cache and treat it as data. This is clearly easy to do. The data cache, however, does not notify the instruction cache of the modified line.

Such changes would therefore not be reflected on the next branch to the modified instruction. To provide the necessary control, the split-cache subsystem architecture disclosed here provides an instruction called "Invalidate Instruction Cache Line".

Software must issue this instruction to purge the old instructions. The architecture also provides an instruction called "Store Data Cache Line", which ensures that the modified instructions are reflected in backing storage. In the most common case, in which the program change results from a load from disk, only the first of these instructions need be issued.
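The software sequence implied above can be sketched with a toy model of the two caches and backing storage. The class, its method names, and the use of whole-line string values are hypothetical simplifications chosen for illustration; only the two instruction names come from the text.

```python
class SplitCacheModel:
    """Toy model: instruction cache, data cache and backing storage, keyed by line."""

    def __init__(self, backing: dict[int, str]) -> None:
        self.backing = dict(backing)     # line address -> contents
        self.icache: dict[int, str] = {}
        self.dcache: dict[int, str] = {}

    def fetch_instruction(self, line: int) -> str:
        if line not in self.icache:      # instruction cache miss: load from backing
            self.icache[line] = self.backing[line]
        return self.icache[line]

    def modify_as_data(self, line: int, value: str) -> None:
        self.dcache[line] = value        # change is visible only in the data cache

    def store_data_cache_line(self, line: int) -> None:
        """Store Data Cache Line: reflect the modified line in backing storage."""
        if line in self.dcache:
            self.backing[line] = self.dcache[line]

    def invalidate_instruction_cache_line(self, line: int) -> None:
        """Invalidate Instruction Cache Line: purge any stale copy."""
        self.icache.pop(line, None)


m = SplitCacheModel({0x200: "old code"})
before = m.fetch_instruction(0x200)       # instruction cache now holds "old code"
m.modify_as_data(0x200, "new code")
stale = m.fetch_instruction(0x200)        # hardware gives no notification
m.store_data_cache_line(0x200)
m.invalidate_instruction_cache_line(0x200)
fresh = m.fetch_instruction(0x200)        # refetched from backing storage
```

In this model `stale` still holds the old contents, and only after both instructions have been issued does `fresh` pick up the modified line, which is why the responsibility rests with software.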

A further advantage of this cache subsystem architecture is that the separately and independently operating caches naturally support separate virtual storage for instructions and for data.

This makes possible software strategies in which, for example, a single reentrant copy of an APL interpreter can be executed on behalf of many different user areas, through repeated memory operations and with minimal bus time.

This architecture does not allow pages to be shared between virtual storages. The restriction brings a major simplification from both the hardware and the software point of view.

First, it allows the cache to operate in virtual mode. In other words, the cache can be accessed using virtual addresses rather than real addresses. Clearly, when a line is found in the cache (more than 90% of the time), no time is lost and no degradation is incurred by executing the relocation algorithm.
When found, there is no time loss or quality loss from running the relocation algorithm.

Conventional relocation systems dedicate a considerable amount of high-speed storage to look-aside tables (DLATs) over the page tables. That storage is generally of the same technology as the cache and is therefore very expensive. In this architecture the page tables can normally be accessed through the cache, and because they are expected to be used very frequently, the probability of a cache hit is in general very high. It is therefore possible to approximate DLAT performance at no additional cost. The usual investment in a DLAT can accordingly be redirected to a larger cache, or eliminated altogether, without significantly affecting system performance.

Before beginning a detailed description of the hardware embodiment, the special hardware instructions provided, and the various operations that can be performed within this cache subsystem, the following is a brief description of the drawings and how they relate to one another.

FIG. 1 is an overall block diagram of the hierarchical storage organization, clearly showing the relationship among the CPU, the instruction cache, the data cache and main storage. A direct memory adapter (DMA) is also shown connected directly to main storage. As mentioned earlier, in most cache-based systems input/output passes through the cache subsystem on its way to main storage, with an accompanying reduction in system throughput. In the present system input/output is prohibited from passing through the cache, and in practice most of it passes directly through the DMA. Indeed, as described later, measures are taken to prevent input/output operations from interrupting cache-initiated stores and reads. Cache-initiated stores and reads occur when a cache miss involving a "store-through" type operation takes place.

As will be pointed out again later with reference to the drawings, there is a data flow line from the data cache to main storage, but none from the instruction cache to main storage. It should be kept in mind that, because modification of instructions in the instruction cache is not permitted, there is no need to "store through" the instruction portion of the cache.

FIG. 2 (FIGS. 2.1 and 2.2) is an expansion of FIG. 1 showing in more detail the data paths among main storage, the caches and the CPU. The multiplexers (MUX) in the figure are conventional logic circuits that perform the gating functions for the various data transfers provided in this hardware embodiment.

Note that one parity bit per byte of data is carried through input/output, main storage, and both caches, but parity bits are neither transferred to the CPU nor received from it.

FIG. 2 illustrates the word bypass mechanism, which provides a direct data path from main storage to the CPU for the 4-byte target data word. The basic unit of data transfer between the cache and main storage is a 32-byte line. The line is received from main storage as a series of four double words of 8 bytes each, and the storage architecture guarantees that the first double word received always contains the target word.
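The ordering rule for a line fetch can be illustrated with a short Python sketch. The text guarantees only that the first double word received contains the target word; the wrap-around order assumed here for the remaining three double words, and the name `fetch_order`, are illustrative assumptions rather than part of the disclosure.

```python
LINE_BYTES = 32        # line size stated in the text
DOUBLEWORD_BYTES = 8   # four double words make up one line


def fetch_order(byte_offset: int) -> list[int]:
    """Return double-word indices (0..3) in their assumed arrival order.

    The first index is the double word holding the target byte offset.
    The wrap-around order of the other three is an assumption.
    """
    if not 0 <= byte_offset < LINE_BYTES:
        raise ValueError("offset must lie inside the 32-byte line")
    first = byte_offset // DOUBLEWORD_BYTES
    count = LINE_BYTES // DOUBLEWORD_BYTES
    return [(first + i) % count for i in range(count)]


print(fetch_order(20))  # target byte 20 lies in double word 2
```

Under this assumption, a miss on byte offset 20 delivers double word 2 first, so the target word can be forwarded to the CPU before the rest of the line arrives.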

The target word is bypassed to the CPU at the same time that it is stored in the cache. Note that a write-back data path from the cache to main storage, carrying a line as a series of eight 4-byte words through the write-back multiplexer, is provided only for the data cache.

Referring to the instruction cache and the data cache in FIG. 2, two subsets, A and B, are shown in each. As stated earlier, these two subsets are required by the two-way set-associative addressing architecture.

The data flow diagram of FIG. 2 will now be described briefly. In the figure, data generally flows from top to bottom. Both the instruction cache 10 and the data cache 12 can receive data from main storage 14.

However, as stated earlier, only the data cache can receive data from the CPU.

This is because the architecture does not permit instructions to be modified in the instruction cache, so the output of the instruction cache goes only to the CPU. Likewise, because instructions in that cache cannot be changed directly by the CPU, the instruction cache need only be connected to main storage 14, for loading, through fetch latch 20 and input multiplexer 22. Thus, to store instructions in the instruction cache 10, a line of instruction data passes through fetch latch 20 and is ultimately transferred to the selected section (line) of the instruction cache. Similarly, to transfer an instruction from the instruction cache to the CPU, a data word is read from the selected section of the instruction cache and sent through output multiplexer 16 and bypass multiplexer 18.

Directories 11 and 13, associated with the instruction cache 10 and the data cache 12 respectively (and shown in detail in FIG. 3 (FIGS. 3.1 and 3.2)), are shown as (functionally) connected to the caches. It should be understood that these directories do not lie in the cache data flow path, but are physically closely associated with the operation of the caches. As explained in more detail later, on a given cache access the line address in the D field of the address accesses, in parallel, the same line in both cache subsets together with the associated directory entry. This access also makes the data from both cache subsets A and B available to the system.

The directories are typically built as separate, very fast storage devices in a suitable high-speed circuit family such as emitter-coupled logic. In the embodiment disclosed here, the directory access time was about 12 nanoseconds, compared with a cache access time of 30 nanoseconds. This allows the logical decisions that depend on the directory entry to be made by the time the two data subsets become available from the cache.

As stated earlier, the data cache can be loaded either from main storage 14 or from CPU 30.

When a data line is to be stored from main storage 14, the data flows through fetch latch 20, which is shared by both caches, and then through the byte input multiplexer 48 for the data cache into the data cache 12 itself. When data is to be transferred from CPU 30, the data path runs, as shown in the figure, through byte input multiplexer 48 and then into the data cache 12. To transfer data from the data cache 12 to the CPU, the data passes through output multiplexer 32 and bypass multiplexer 34 into a data register of the CPU.

When a data cache miss occurs and the line to be replaced is valid and marked as modified, that data line in the data cache must be written back to main storage 14 before a new data line can be loaded into that particular line of the data cache. Write-back latch/multiplexer 52 is activated for this purpose.

Byte selector 49 and byte write gates 50 control the storing of data into the data cache.

On a CPU load operation with a directory miss, byte selector 49 gates only the double words of main storage data through byte input multiplexer 48, and all byte write gates 50 are activated.

On a CPU store operation with a directory hit, byte selector 49 gates only the CPU data word through byte input multiplexer 48, and the write gates 50 are used to control the number of bytes stored into the data cache subset. The CPU architecture allows 1, 2, or 3 bytes of data to be stored.

On a CPU store operation with a directory miss, byte selector 49 merges the 1, 2, or 3 bytes of CPU data into the first double word of the missed line arriving from main storage and passes the result through byte input multiplexer 48. All four byte write gates 50 are activated. For the remaining three double words of the missed line, byte selector 49 allows only the double words from main storage to pass through byte input multiplexer 48.
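The merge performed on a store with a directory miss can be sketched in Python as a byte overlay. This is only a behavioral illustration of what the byte selector accomplishes; the function name `merge_store` and the explicit `offset` parameter are assumptions introduced for the example, not elements of the disclosed hardware.

```python
def merge_store(doubleword: bytes, cpu_bytes: bytes, offset: int) -> bytes:
    """Overlay CPU store data on the first double word of the missed line.

    doubleword: the 8 bytes arriving from main storage
    cpu_bytes:  the 1, 2, or 3 bytes supplied by the CPU
    offset:     byte position of the store inside the double word (assumed)
    """
    if len(doubleword) != 8:
        raise ValueError("a double word is 8 bytes")
    if not 1 <= len(cpu_bytes) <= 3:
        raise ValueError("the text describes stores of 1, 2, or 3 bytes")
    if offset < 0 or offset + len(cpu_bytes) > 8:
        raise ValueError("store must fit inside the double word")
    merged = bytearray(doubleword)
    merged[offset:offset + len(cpu_bytes)] = cpu_bytes
    return bytes(merged)


print(merge_store(bytes(8), b"\xaa\xbb", 3).hex())
```

The remaining three double words of the line would pass through unchanged, which in this sketch corresponds to simply not calling `merge_store` on them.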

Both caches are equipped with a bypass mechanism. When a fetch request to either cache results in a cache miss, the data can be sent to the cache and to the CPU simultaneously as soon as it becomes available from main storage, through word select multiplexer 38 or 40 for the instruction cache and the data cache respectively. The CPU therefore need not stall while waiting for the data (or instruction) to be stored completely in the cache first.

As described in more detail later, only the target word of an instruction or data line is bypassed directly to the CPU.

It can thus be seen that the overall architecture of the present split-cache subsystem is basically conventional in nature. Data paths are provided for loading both the instruction cache and the data cache from main storage, and an additional path loads the data cache from the CPU. Likewise, both caches must be able to transfer instructions and data, respectively, to the CPU, and the data cache must also be able to write data back to main storage. Finally, both caches are equipped with a bypass mechanism whereby the addressed word of a line is sent immediately to the CPU while the line is simultaneously stored in the cache, in order to minimize CPU delay. The actual hardware organization of the present split-cache subsystem is therefore quite straightforward. What gives the cache subsystem disclosed here its improved functionality is the way the caches are actually used and the unique organization of the cache directories and the special control bits provided in them.

Returning briefly to FIG. 2, it will be noticed that each cache is divided into two subsections, A and B, by the separate line selection mechanisms shown. This will become clearer in the description that follows. The cache subsystem is two-way set associative. When a given line in the cache directory is addressed, one word is actually addressed in each of two data lines belonging to two different pages (A and B) in the cache. As explained later, the line finally selected is determined by comparing the target page address Pt with the two page references PA and PB held in the cache directory. The addressed word from the selected line is gated to the CPU by output multiplexer 16 or 32.

Referring again to FIG. 2, it will be noticed that the multi-bit cables connecting the units of the cache subsystem are shown as containing either 32 or 36 bits. In this embodiment, the difference is that a 36-bit cable carries 32 data bits plus 4 parity bits. In general, the 4 parity bits are removed when data (or instructions) are transferred from the cache subsystem to the CPU. It will also be noticed that the cable between main storage 14 and fetch latch 20 contains 72 bits. As will be readily appreciated, main storage is organized to read out and transfer double words, so this cable simply carries two 36-bit words.
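The relationship between the 32-bit and 36-bit paths can be illustrated with a small Python sketch that attaches one parity bit to each byte of a 4-byte word and strips it again on the way to the CPU. The text does not state whether odd or even parity is used; odd parity is assumed here, and the helper names are hypothetical.

```python
def parity_bit(byte: int, odd: bool = True) -> int:
    """Return the parity bit for one byte (odd parity is an assumption)."""
    ones = bin(byte & 0xFF).count("1")
    # Odd parity: data bits plus parity bit contain an odd number of ones.
    return (ones + 1) % 2 if odd else ones % 2


def add_parity(word: bytes) -> list[tuple[int, int]]:
    """Tag each byte of a word with its parity bit (9 bits per byte)."""
    return [(b, parity_bit(b)) for b in word]


def strip_parity(tagged: list[tuple[int, int]]) -> bytes:
    """Drop the parity bits, as on the path from the cache to the CPU."""
    return bytes(b for b, _ in tagged)


tagged = add_parity(b"\x00\x01\x03\xff")
print(len(tagged) * 9)       # bits carried for one word with parity
print(strip_parity(tagged))  # the 32 data bits seen by the CPU
```

A double word on the 72-bit cable between main storage and the fetch latch would be two such tagged words side by side.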

This completes the general description of the split-cache subsystem hardware as a whole. The general organization and operation of this hardware are believed to be straightforward and well known in the art.

Referring to FIG. 3 (FIGS. 3.1 and 3.2), a detailed functional block diagram is shown of the directory for the 16K-byte data cache, together with its associated logic and control circuitry. In the embodiment disclosed here, a 24-bit CPU address is assumed to be held in register 60.

Of the full 24-bit address, the leftmost 11 bits (Pt) contain the page address of the storage reference. The 8-bit D field contains the line address of the particular storage reference within the specified page. It is this address that is actually used to address one of the 256 lines in the cache directory (and hence in the cache itself).

Finally, the rightmost 5 bits, the W field, give the word or byte offset within the 32-byte line.

This segment of the address is what is actually used to address the intended byte designated by the full address.
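The division of the 24-bit address into its three fields can be written out as a short Python sketch. The field widths (11, 8, and 5 bits) come from the text; the function name `split_address` is introduced only for illustration.

```python
P_BITS, D_BITS, W_BITS = 11, 8, 5   # page, line, and byte-in-line fields


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a 24-bit address into (page Pt, line D, byte offset W)."""
    if not 0 <= addr < 1 << (P_BITS + D_BITS + W_BITS):
        raise ValueError("address must fit in 24 bits")
    w = addr & ((1 << W_BITS) - 1)
    d = (addr >> W_BITS) & ((1 << D_BITS) - 1)
    p = addr >> (W_BITS + D_BITS)
    return p, d, w


p, d, w = split_address(0xABCDEF)
print(p, d, w)
```

Here `d` selects one of the 256 directory entries, `w` locates the byte inside the 32-byte line, and `p` is the value compared against the stored page addresses.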

As the figure clearly shows, the cache directory contains 256 entries (0 to 255), and each entry contains seven information fields in all: PA, PB, VA, VB, MA, MB, and LRU. As explained in more detail later, PA, VA, and MA refer to the line in the cache belonging to subset A, while PB, VB, and MB relate to subset B. The LRU bit indicates which of the two subset lines was accessed most recently, and so controls replacement of a particular line (in subset A or B) in the cache.

In operation, when a cache access is made, the line address given by the D field of the CPU address causes one of the 256 entries in the directory to be accessed. It must then be determined whether page address PA or PB matches the target page address Pt in the CPU address. This comparison is performed in the two compare circuits 62 and 64. If either page address PA or PB matches Pt, the "Hit=A" line or the "Hit=B" line is activated.

The appropriate valid bit V or modified bit M is then interrogated to determine whether the access can continue.

The details of this operation are described later. If neither page address PA nor PB matches Pt, NAND circuit 66 activates the "miss" line, and directory update logic 68 signals the system that a miss has occurred and that a new data line must be brought into the cache system.

Which of the two subset lines is replaced is determined by the LRU bit. The 7-bit line labeled "write strobe" allows new data to be entered into selected fields or bit positions of a particular entry in the cache directory, as explained in detail later. Which bits are changed, and when a new page address is inserted into the PA or PB field, is of course determined by the particular instruction decoded by CPU instruction decoder 70.

The operation and organization of the cache directory and its associated controls are believed to be straightforward and, given the functional description and block diagrams disclosed here, readily implemented by those skilled in the computer art.

FIG. 4 contains a series of tables illustrating the addressing and structure of the cache subsystem. The figure also shows the effect of cache subsystem size on various parameters such as the addressing fields.

In brief, assuming three different cache sizes of 4K, 8K, and 16K, it will be noticed that a directory entry contains two page identifiers, Pa and Pb, and five special control bits: Va, Vb, Ma, Mb, and LRU. The specific way in which these special control bits are used is explained in detail later.

The addressing of the directory in the cache is also clearly shown at the top of the figure. The 24-bit CPU target address contains three fields: P (page), D (line), and W (byte). As the figure shows, the cache itself is addressed using the D and W fields, whereas in a two-way set-associative cache of this kind the directory is addressed using the D field alone. As those skilled in the art will understand, the directory is accessed and it is then determined whether the P field of the target address matches either directory entry Pa or Pb. This figure is described in more detail later.

FIGS. 5 through 11 each summarize, in tabular form, the operations that occur within the cache subsystem hardware as a result of a particular hardware operation. The term "hardware procedure" here means a list of what happens in the system as a result of an operation of the cache subsystem hardware.

FIGS. 12 through 18 are all flowcharts and, as the label on each indicates, correspond closely to the various cache subsystem operations set out in FIGS. 5 through 11. In other words, there is a flowchart for each hardware procedure listed; for example, the data cache fetch hardware procedure of FIG. 6 is shown in much greater detail in FIG. 14. The flowcharts thus set out explicitly the detailed test and branch operations, and the specific operations that occur in the various listed blocks as the different branches are followed. These operations are described in more detail later, but the flowcharts are believed to be largely self-explanatory as to the operation of the cache subsystem.

The events that occur in such a cache subsystem, and the function and purpose of all the hardware components specifically shown in FIGS. 2 and 3, are believed to be well known in the art. Those skilled in the art should have no difficulty building the cache subsystem of the present invention from the overall cache subsystem organization shown in FIGS. 2 and 3 and the detailed flowcharts.

(b) Detailed description of the operation of the storage hierarchy
The following description applies to one version of a minicomputer for which the hierarchical storage of the present invention is particularly useful.

That machine includes up to 16 megabytes of real main storage and uses 24-bit addressing into the hierarchical storage system. The 24-bit main CPU architecture is not critical to the invention, except that it must supply the appropriate storage instructions detailed here.

The embodiment of the storage hierarchy disclosed here consists of a cache subsystem operating at CPU speed and an FET main storage of up to 16 megabytes operating at 1/5 of CPU speed.

The CPU communicates directly with the cache subsystem, which in turn communicates with main storage (see FIG. 1). Input/output data can be sent to main storage through the direct memory adapter (DMA), but cannot communicate directly with the cache subsystem.

The unit of data transfer between the CPU and the cache subsystem is a 4-byte word. The unit of transfer between main storage and the cache subsystem is a 32-byte line. A line is transferred from main storage as four 8-byte double words, and to main storage as eight 4-byte words (see FIG. 1). Input/output data transfers to and from main storage take place in 4-byte words under control of the DMA adapter.
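The two line-transfer granularities can be checked with a brief Python sketch that cuts a 32-byte line into 8-byte units for the inbound direction and 4-byte units for the write-back direction. The helper `split_line` is hypothetical and models only the arithmetic, not the timing of the transfers.

```python
LINE_BYTES = 32  # line size stated in the text


def split_line(line: bytes, unit: int) -> list[bytes]:
    """Cut one cache line into transfer units of `unit` bytes each."""
    if len(line) != LINE_BYTES:
        raise ValueError("a line is 32 bytes")
    if unit <= 0 or LINE_BYTES % unit:
        raise ValueError("unit must divide the line size")
    return [line[i:i + unit] for i in range(0, LINE_BYTES, unit)]


line = bytes(range(LINE_BYTES))
inbound = split_line(line, 8)    # main storage to cache: double words
outbound = split_line(line, 4)   # cache to main storage: words
print(len(inbound), len(outbound))
```

Either sequence reassembles to the same 32 bytes, so the asymmetry affects only the number of transfers per line in each direction.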

Note that one parity bit per data byte is carried throughout the storage hierarchy. Parity bits are not transferred to or from the CPU.

(c)
The hierarchical storage subsystem disclosed here is based on a system architecture designed to minimize the CPU idle time caused by references to the storage hierarchy. Because the storage architecture is designed for a CPU that can use a new instruction every cycle, a separate instruction cache matched to CPU speed allows instruction fetches to proceed independently of data fetches in the storage hierarchy. The architecture also prohibits direct input/output communication with the cache subsystem, and so eliminates the possibility of the CPU being locked out by input/output interference. Similarly, to avoid the performance loss caused by excessive references to main storage, all stores are directed to the data cache and are not automatically "stored through" to main storage.

As a result of this architectural approach, changes made to the contents of main storage by input/output operations are not immediately known to the CPU, and changes made by the CPU to the contents of the data cache may not be immediately known to input/output or to the instruction cache.

However, the host system architecture provides a limited set of cache management instructions that allow a program to control the relationship between the contents of main storage and those of the cache subsystem.

These management instructions operate only on 32-byte cache lines and allow the system to avoid unnecessary references from the cache to the slower main storage. For example, when a temporary storage area is no longer needed, use of the invalidate data cache line instruction prevents an unnecessary write-back to main storage, even if the line in the cache has been modified by earlier CPU stores.

(d) Cache subsystem
The cache subsystem consists of a 16K-byte instruction cache and a 16K-byte data cache. Each cache is organized as two-way set associative, and so consists of an 8K-byte subset A and an 8K-byte subset B. A cache can hold at most 512 lines of 32 bytes each: 256 lines in subset A and 256 lines in subset B.

(e) Directory
Each cache has an associated directory.

The directory is held in a very high-speed bipolar random access store whose access time is about 1/5 of the full cache cycle time.

Each entry in the directory describes the presence and status of two possible cache lines, one in each associated subset. The directory must therefore be large enough to hold as many entries as the maximum number of lines that can physically exist in one cache subset. Since each cache subset can hold up to 256 lines, the directory must contain 256 entries. This structure is clearly shown in FIG. 2.1.

(e-1) Address fields
For this embodiment, the host system is assumed to use 24-bit addresses. Conceptually, an address can be subdivided into three fields: the page address, the address of the line within the page, and the address of the byte within the line. These subfields are referred to here as P, D, and W respectively. This organization is shown in detail in FIG. 3. The table in FIG. 3 also shows the range of cache sizes (4K to 16K) and the effect of size on various cache and directory parameters.

Each directory entry has two address fields, which hold the page addresses (PA and PB) of the lines stored in the two cache subsets, and a control bit field that directs the hardware algorithm for the cache instruction being processed. The directory is addressed by the line address subfield of the target address. As the cache size decreases, the line address subfield shrinks and the page address subfield grows (see FIG. 3). In effect, a larger number of smaller pages (with fewer lines per page) is formed.
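The dependence of the field widths on cache size can be worked through with a short Python sketch. It assumes that the 32-byte line, the two-way organization, and the 24-bit address stay fixed while the cache size varies, and derives the widths arithmetically; the values for 8K and 4K are therefore computed under that assumption rather than quoted from the figure, and the function name `geometry` is illustrative.

```python
def geometry(cache_bytes: int, line_bytes: int = 32, ways: int = 2,
             addr_bits: int = 24) -> dict[str, int]:
    """Derive lines per subset and the P/D/W field widths for a cache size."""
    lines_per_subset = cache_bytes // (ways * line_bytes)
    d_bits = lines_per_subset.bit_length() - 1   # log2 of a power of two
    w_bits = line_bytes.bit_length() - 1
    p_bits = addr_bits - d_bits - w_bits
    return {"lines_per_subset": lines_per_subset,
            "P": p_bits, "D": d_bits, "W": w_bits}


for size_k in (16, 8, 4):
    print(size_k, geometry(size_k * 1024))
```

For the 16K case this reproduces the 11-bit page field and 8-bit line field described for the embodiment; halving the cache removes one bit from D and adds one bit to P.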

(e-2) Control bit fields
The instruction cache directory contains three control bits per entry, and the data cache directory contains five. Each entry in both cache directories contains two valid bits (VA and VB), one for each subset, and one LRU bit. In addition, the data cache directory contains two modified bits (MA and MB), one for each subset (see FIG. 3).

The valid bits are used to control the relationship between the contents of the cache and those of main storage. A valid bit is set to "1" when the line in the cache is replaced by the version currently resident in main storage. The valid bit for a line can be turned off by a cache management instruction from the processor. When a program references an invalidated (V=0) line, the invalid line is replaced by its current version in main storage.

The LRU bit determines which subset receives a replacement line from main storage. The state of the LRU bit is controlled by the cache hardware procedures and cannot be managed by the processor under program control. The LRU replacement procedure rests on the premise that, when a line in the cache must be replaced by a new line from main storage, an effective strategy is to replace the least recently used line in the associative set. Because the cache is exactly two-way set associative, this decision can be made with a single control bit, and the line that was least recently used becomes the most recently used.
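Because there are only two candidates per directory entry, the replacement bookkeeping reduces to remembering which subset was not touched last. The Python sketch below illustrates this; representing the bit as the letter of the least recently used subset is an encoding chosen for readability, and both function names are hypothetical.

```python
def after_access(accessed: str) -> str:
    """Return the least recently used subset after touching `accessed`."""
    if accessed not in ("A", "B"):
        raise ValueError("subset must be 'A' or 'B'")
    return "B" if accessed == "A" else "A"


def replay(accesses: str, lru: str = "A") -> str:
    """Apply a sequence of subset accesses and return the final LRU marker."""
    for subset in accesses:
        lru = after_access(subset)
    return lru


lru = replay("ABA")        # last access was to A, so B is now the victim
victim = lru
lru = after_access(victim)  # the refilled line becomes most recently used
print(victim, lru)
```

The last two lines show the behavior described in the text: the line chosen for replacement immediately becomes the most recently used one.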

A modified bit in the data cache directory is set to "1" when a processor store instruction occurs. This tells the cache control hardware that the version of the line in the cache has been updated and that the line must be written back to main storage if it is to be replaced. If the line has been invalidated (V=0), however, the write-back is inhibited. It is noted again that stores are not possible in the instruction cache.
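The interaction of the valid and modified bits on replacement can be summarized in a small Python state model. Clearing the modified bit when a line is loaded from main storage is an assumption of this sketch (the text states only when the bit is set), and the class and function names are introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class LineState:
    """Valid (V) and modified (M) bits for one data cache line."""
    valid: bool = False
    modified: bool = False


def load_from_main(state: LineState) -> None:
    state.valid = True
    state.modified = False  # assumed: a freshly loaded line is unmodified


def cpu_store(state: LineState) -> None:
    state.modified = True   # set on a processor store, per the text


def invalidate(state: LineState) -> None:
    state.valid = False     # effect of the cache management instruction


def must_write_back(state: LineState) -> bool:
    """A line is written back on replacement only if valid and modified."""
    return state.valid and state.modified


s = LineState()
load_from_main(s)
cpu_store(s)
print(must_write_back(s))   # modified and still valid
invalidate(s)
print(must_write_back(s))   # invalidation inhibits the write-back
```

This mirrors the temporary-storage case described earlier: invalidating a modified line that is no longer needed avoids a write-back to main storage.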

(f) Prototype implementation

The storage hierarchy described above was implemented as a prototype. Main storage was designed using 1.0 megabyte of FET storage with a cycle time of 300 nanoseconds. The two caches were each designed using 16K bytes of bipolar storage with a cycle time of 60 nanoseconds, matching the cycle time of the CPU. Each cache contains 256 lines in each of its two associative sets, for a maximum of 512 lines. In addition, the maximum size of each cache can be reduced manually to 8K or 4K bytes, which reduces the total contents to 256 or 128 lines respectively (see FIG. 3).
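The line counts quoted for the prototype follow from simple arithmetic. The sketch below assumes the 32-byte line size and the two-way (A/B) organization stated elsewhere in this description; the function name `cache_geometry` is hypothetical.

```python
LINE_BYTES = 32   # line size given later in the text
SUBSETS = 2       # two-way set associative (subsets A and B)


def cache_geometry(cache_bytes: int) -> tuple:
    """Return (total lines, lines per subset) for a given cache size."""
    total_lines = cache_bytes // LINE_BYTES
    return total_lines, total_lines // SUBSETS


for kbytes in (16, 8, 4):
    total, per_subset = cache_geometry(kbytes * 1024)
    print(f"{kbytes}K bytes: {total} lines, {per_subset} per subset")
```

For 16K bytes this gives 512 lines, 256 per subset, which agrees with the figures in the text.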

(f-1) Physical package

For both the instruction cache and the data cache, the cache array used bipolar transistor storage technology and was packaged on four cards. Each card contained 2K x 18 bits, and there were four cards per cache. The embodiment described here is for illustration only. The general structure of such a cache should be readily apparent to those skilled in the art from the architectural definition, control functions, and instruction formats described herein.

(g) Cache organization

The cache interface to the CPU is 32 bits (one word) wide, and the interface to main storage is 72 bits wide for fetches (a double word including parity) or 36 bits wide for stores. For the initial fetch it is desirable to be able to access the A and B subsets simultaneously at the target address. Provided that the word at the target address can be read from both associative subsets simultaneously, a variety of cache storage organizations are possible.

This is done because the subset in which the target resides is not known until the directory access supplies that information later in the cache cycle. For this reason, and to save time, the directory and the cache array are accessed simultaneously.

If the directory access shows that the target is in neither the A nor the B subset (a miss), the data from the cache access is ignored and main storage is accessed for the line containing the target data. On a data cache miss, it may also be necessary to write back the line currently in the cache before it is replaced by the new line from main storage.

If the target page address matches the directory entry for subset A or B (a hit), the correct subset is known immediately and the target data from the cache can be gated directly from the hit subset to the CPU. This strategy minimizes the total time required to deliver data to the CPU.

A main storage access caused by a cache miss produces a 32-byte line, which is multiplexed into the cache as four consecutive double words.

The storage system architecture specifies that, on a cache miss, the double word containing the target word is returned first by the storage controller. The remaining three double words are returned from sequentially adjacent addresses, generated by incrementing the target address one double word at a time, until all four double words of the line have been returned.

Because this first double word always contains the target word, a store miss in the data cache is handled by merging the CPU data into the first double word from main storage.

(h) Instruction cache fetch

A flowchart of the instruction cache fetch hardware operation sequence is shown in FIG. 13 and summarized in tabular form in FIG. 5. A block diagram of the data flow for both caches is shown in FIG. 2.1.

An instruction cache fetch request initiates accesses to both the cache array and the directory. The directory access is overlapped with the cache access. Both caches are organized so that the target word of a fetch is accessed simultaneously from both the A and the B subset.

This is done without knowing which subset holds the target data, and even when the target is in neither subset.

(h-1) Hit

If the target address matches the directory entry of the A or B subset, the directory access results in a hit. The correct subset is known immediately and, if the line is valid, the target data can be gated from the hit subset to the CPU. No additional time is needed to access the cache array. The LRU bit for that directory entry is then switched to the opposite subset. Of course, the LRU bit may already point to the opposite subset as a result of an earlier operation.

(h-2) Miss or invalid hit

If the target address matches the directory entry of neither the A nor the B subset, the directory access results in a miss. The data accessed from the cache array is ignored, and the fetch request (together with the target address) is forwarded to main storage, which returns the 32-byte line as four 8-byte double words. The first double word returned always contains the particular 4-byte word designated by the target address (as described above). To increase speed, this word is bypassed to the CPU at the same time as the double word is being stored in the cache. This data path is shown clearly in FIG. 2.1. The next three double words are returned sequentially from main storage and are stored in the cache as they arrive.

The first double word returned by main storage, the one containing the target, may actually be the fourth double word of the line. In that case the remaining three double words of the line are still returned sequentially, but starting from the beginning of the line. Thus, regardless of which double word arrives first, the other three double words are received in sequence.
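The return order can be illustrated with a short sketch. It assumes, as one reading of the passage above, that the sequence wraps around to the start of the 32-byte line after the last double word; the function name `return_order` and the use of byte addresses are illustrative choices, not taken from the patent.

```python
DOUBLE_WORD_BYTES = 8
LINE_BYTES = 32


def return_order(target_address: int) -> list:
    """Byte addresses of the four double words, in the order they are returned.

    Assumption: the double word holding the target comes first, and the
    sequence wraps to the beginning of the line after the last double word.
    """
    line_base = target_address - target_address % LINE_BYTES
    first = (target_address % LINE_BYTES) // DOUBLE_WORD_BYTES
    return [line_base + ((first + i) % 4) * DOUBLE_WORD_BYTES for i in range(4)]
```

For a target in the fourth double word, for example `return_order(0x118)`, the list starts at `0x118` and then continues with `0x100`, `0x108`, and `0x110`.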

It should also be noted that a line fetched from main storage contains, as its data, a sequence of instructions for the CPU. The CPU is assumed to contain a prefetch stack four levels deep.

This prefetch stack tries to keep itself full by continually sending fetch requests to the instruction cache, so that a new instruction is supplied every machine cycle as required by the overall CPU architecture. On an instruction cache miss, two or more of these prefetch levels may be empty. When the new line arrives from main storage, the cache continues to bypass data to the CPU until the end of the cache line is reached or until the CPU prefetch mechanism becomes full and stops the bypass action. The number of words bypassed to the CPU can therefore be anything from a minimum of one word to a maximum of all eight words in the line.

(h-3) Directory update

After a miss, the directory is updated with the new target address (pt), the valid bit for the subset in which the new line was stored is set to "1", and the LRU bit is switched to the opposite subset.

If the target address matches a directory entry that is invalid (V=0), the result is an invalid hit, and the hardware action is the same as for a miss except in two respects. First, the new line goes into the matching subset regardless of the state of the LRU bit; second, the address in the directory is not updated. This is because the address produced a valid match, but the data is invalid and must be replaced. Following the load of the new line, the directory LRU bit is switched to the opposite subset, and the valid bit for the stored subset is set on (to "1"). Note again that the only way a valid bit (VA or VB) can become zero is through the CPU, by means of the "Invalidate Instruction Cache Line" instruction.
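The three directory outcomes of an instruction fetch (hit, miss, invalid hit) can be summarized in a small model. This is a sketch under stated assumptions, not the hardware procedure of FIG. 13: the dictionary layout, the function name `instruction_fetch`, and the convention that `lru` names the subset to be replaced next are all hypothetical.

```python
def instruction_fetch(entry: dict, page: int) -> tuple:
    """Apply one instruction fetch to a directory entry; return (outcome, subset).

    entry is a hypothetical structure:
    {"A": {"page": int, "valid": bool}, "B": {...}, "lru": "A" or "B"}
    """
    matches = [s for s in ("A", "B") if entry[s]["page"] == page]
    valid = [s for s in matches if entry[s]["valid"]]
    if valid:
        outcome, subset = "hit", valid[0]
    elif matches:
        # Invalid hit: reload into the matching subset, whatever LRU says;
        # the address already in the directory is left alone.
        outcome, subset = "invalid hit", matches[0]
    else:
        # Miss: the LRU bit names the subset that receives the new line.
        outcome, subset = "miss", entry["lru"]
        entry[subset]["page"] = page
    if outcome != "hit":
        entry[subset]["valid"] = True  # line has been loaded from main storage
    entry["lru"] = "B" if subset == "A" else "A"  # point LRU at the other subset
    return outcome, subset
```

In every case the LRU indication ends up pointing at the subset that was not just used, which is the behavior the text describes as switching the LRU bit to the opposite subset.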

FIG. 5 summarizes the instruction cache fetch hardware sequence described above.

(i) Data cache fetch

The data cache fetch hardware operation sequence is shown in FIG. 14 and summarized in FIG. 6. The overall data flow diagram appears in FIG. 2.1.

(i-1) Hit

The data cache fetch procedure for a valid hit is identical to the instruction cache fetch procedure described above and need not be repeated.

(i-2) Miss or invalid hit

The data cache fetch operation sequence for a miss or an invalid hit is similar to the instruction cache fetch sequence described above, with two exceptions.

First, on a miss, a valid line that has been modified by CPU stores must be written back to main storage before it is replaced. Second, unlike the instruction cache, which can bypass up to eight words to the CPU prefetch stack on a miss, the data cache bypasses only one word to the CPU on a miss.

When a fetch miss occurs, the modified and valid bits of the line to be replaced are checked. Which subset, A or B, is to be replaced is determined by the LRU bit. If the line is invalid or unmodified, no writeback occurs and a fetch request for the target line is sent to main storage. The first double word of the line returned by main storage contains the target word; while this first double word is being stored in the cache, the cache hardware strips out the target word and bypasses it directly to the CPU. Following the storing of the new line, the directory is updated as described for the instruction cache fetch.

(i-3) Writeback

If the line to be replaced is valid and modified, it must be written back to main storage before a fetch request is issued for the replacement line.

The page address of the line to be written back is contained in the directory entry, and that address is supplied to main storage for the writeback. The cache takes four cycles to read out the four double words of the writeback line, and multiplexer 52 splits them into a sequence of eight words and transfers them to the storage input register (SIR) of main storage (see FIG. 2.1).

In the normal case, I/O could gain access to main storage between the writeback and the issuing of the fetch request for the new line, and so stall the data cache for the duration of the I/O operation. To prevent this, the data cache can issue a specially high-priority fetch request to main storage, specifically for a data cache miss that involves a writeback. The effect of this high-priority request is to make main storage start an automatic fetch of the new line while the old (writeback) line is being loaded into the SIR. This gives the data cache dedicated back-to-back main storage cycles and allows the main storage fetch of the new line to take place before the old line is stored. The old line is held temporarily in the SIR.

To implement this strategy, the data cache is accessed for the first double word of the line to be written back, and the data cache control sends out the writeback address, the first writeback word, and the high-priority request. When main storage acknowledges receipt of the request, the data cache immediately begins fetching the remaining three double words of the old line and changes the address presented to main storage from the old (writeback) address to the new (target) address. The writeback line is transferred to the main storage SIR one word at a time. When main storage returns the data of the new line, the data cache begins storing the sequence of four double words, and while the new line is being stored the target data is bypassed to the CPU. The directory information is then updated.

As noted earlier, FIG. 6 summarizes the data cache fetch hardware operation sequence, and FIG. 14 shows the step-by-step details of the operation in flowchart form.

(j) Data cache store

A flowchart of the data cache store hardware operation sequence is shown in FIG. 15 and summarized in FIG. 7. For the data flow, refer again to FIG. 2.1.

A brief comparison of the flowcharts of FIGS. 14 and 15 shows that the data cache fetch and store algorithms are very similar. The main difference is simply the direction of data flow between the CPU and the cache. The following discussion of the store process assumes familiarity with the fetch process and concentrates on the differences between the two.

Unlike a fetch request, a store request to the data cache does not automatically initiate an access to the cache array. Both subsets of the cache array can be read simultaneously for a fetch operation, but only one subset can be written for a store operation. A store therefore cannot begin until the result of the directory access shows whether the target is in the cache and, if so, in which subset. Because the cache array access cannot be overlapped with the directory access as it is for a fetch, every store operation requires an extended cache cycle.

To accommodate the required sequential accesses to the directory and the cache array, the store cycle must be extended by 50%.

Storage is organized as a sequence of 32-bit words.

Each word can be subdivided into two 16-bit halfwords or four 8-bit characters. Processor store instructions operate on entities of one, two, or three characters. These three kinds of processor store are distinguished by the cache hardware as three different store commands: store 8 (1 byte), store 16 (2 bytes), and store 24 (3 bytes). The data cache store flowchart of FIG. 14 applies to all three store commands.

The CPU supplies a 32-bit data word with each store command, with the bytes to be written pre-aligned within the word. The two low-order bits of the store target address, together with the particular type of store command, give the data cache control hardware enough information to determine which of the four byte write gates (50 in FIG. 2.1) (W0-W3) to activate.
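The gate selection can be sketched as follows. The text says only that the two low-order address bits and the store type are sufficient; the specific mapping used here (gate Wi controls byte i of the word, and a store of n bytes activates n consecutive gates starting at the byte offset) is an assumption made for illustration, as are the names `STORE_BYTES` and `byte_write_gates`.

```python
STORE_BYTES = {"store8": 1, "store16": 2, "store24": 3}  # bytes written per command


def byte_write_gates(target_address: int, command: str) -> list:
    """Return the on/off state of gates W0..W3 for one store command.

    Assumed mapping: the low two address bits give the first byte written,
    and the command gives how many consecutive bytes follow.
    """
    start = target_address & 0b11
    count = STORE_BYTES[command]
    if start + count > 4:
        # Assumption: a single store never crosses a word boundary.
        raise ValueError("store would cross a word boundary")
    return [start <= i < start + count for i in range(4)]
```

Under this mapping, a store 16 to an address whose low bits are `01` turns on W1 and W2 only.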

(j-1) Hit

If the store target address is present in the directory, the subset to be written is identified and the cache cycle is extended to accommodate the store. The 32-bit store data word plus generated parity (4 bits) is applied to all array cards simultaneously. The intersection of the subset selection with the byte write gates determines which bytes are written. Following the store, the directory is updated by turning on the modified bit for the hit line and switching the LRU bit to the opposite subset.

(j-2) Miss or invalid hit

If the store target line is invalid or not in the cache, the cache cycle is not extended and the line must be retrieved from main storage. If the line to be replaced is valid and modified, a writeback is required, and the writeback mechanism operates exactly as described for the data cache fetch.

The data cache has an input byte multiplexer 48 (see FIG. 2.1) that can insert bytes anywhere in an 8-byte double word. On a miss, this multiplexer is set up to merge the data from the CPU into the fetch data returned by main storage.

Because the target of the store is contained in the first double word of line data returned by main storage, the merging of CPU data takes place only when the first of the four double words is stored in the cache. The last three double words, and all bytes of the first double word not selected by the CPU, are switched by the input multiplexer to the main storage fetch data path. All byte write gates are turned on during the four double-word stores, and one subset is selected. The bytes changed by this store are therefore controlled solely by the input multiplexer and the subset selection.
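The merge performed on a store miss can be modeled as a byte-wise selection between two sources. This is a minimal sketch: the function name `merge_store`, the `word_offset` parameter (0 or 4, selecting which word of the double word holds the target), and the representation of the selection as a list of four booleans are assumptions, not details from the patent.

```python
def merge_store(double_word: bytes, cpu_word: bytes, word_offset: int, selected) -> bytes:
    """Merge selected CPU bytes into the first double word from main storage.

    double_word : 8 bytes returned by main storage
    cpu_word    : 4 bytes supplied by the CPU, already aligned within the word
    word_offset : 0 or 4, position of the target word in the double word (assumed)
    selected    : four booleans, True where the CPU byte replaces the fetched byte
    """
    merged = bytearray(double_word)
    for i, take_cpu_byte in enumerate(selected):
        if take_cpu_byte:
            merged[word_offset + i] = cpu_word[i]
    return bytes(merged)
```

Every byte not selected keeps the value fetched from main storage, and the remaining three double words of the line would pass through unchanged.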

Following a store miss (or invalid hit), the directory is updated in the same way as for an instruction cache fetch or a data cache fetch, except that the modified bit is turned on when the replacement line receives the store.

FIG. 7 summarizes the data cache store hardware procedure.

(k) Cache line invalidate

A flowchart of the hardware invalidate operation sequence, which applies to both caches, appears in FIG. 12 and is summarized in FIG. 8.

The instruction cache control hardware (shown in FIGS. 2.1 and 2.2) is designed to respond to one cache management instruction from the processor. This instruction is called Invalidate Instruction Cache Line (INICL), and its purpose is to set the directory valid bit to zero for the line identified by the target address. If an instruction has been changed in main storage, the line must be invalidated so that the cache will access main storage for the updated information. The data cache responds in the same way to the Invalidate Data Cache Line (INDCL) cache management instruction. Because only the directory is involved in this management instruction, the cache array is not accessed.

Note that the target address identifies only a single line, which may be in either cache subset A or B but not in both. Only one of the two valid bits of the directory entry for the target address is therefore affected, that is, either VA or VB.

The invalidate instruction allows a program to replace a line in the cache with the most recent version from main storage. The LRU bit is forced by the hardware to point to the subset containing the invalidated line, so that the replacement line goes into that cache location.

The page address subfield of the target may not be present in the directory for either cache subset A or B. This is defined as a cache miss and must be handled by the hardware procedure. For either cache line invalidate instruction (data or instruction), a miss removes any need to change a valid bit, and the hardware simply does nothing.
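The invalidate behavior described above is small enough to model directly. The sketch below uses a hypothetical dictionary layout and function name (`invalidate_line`), and takes `lru` to name the subset that will receive the next replacement line.

```python
def invalidate_line(entry: dict, page: int) -> bool:
    """Model of INICL/INDCL on one directory entry; return True on a hit.

    entry is a hypothetical structure:
    {"A": {"page": int, "valid": bool}, "B": {...}, "lru": "A" or "B"}
    """
    for subset in ("A", "B"):
        if entry[subset]["page"] == page:
            entry[subset]["valid"] = False  # only VA or VB is affected
            entry["lru"] = subset           # replacement line will land here
            return True
    return False                            # miss: the hardware does nothing
```

Only the directory is touched; nothing in this model reads or writes line data, which mirrors the statement that the cache array is not accessed.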

FIG. 8 summarizes the INICL and INDCL hardware operation sequence.

(l) Load data cache line

A flowchart of the Load Data Cache Line (LDCL) cache management instruction can be seen in FIG. 16 and is summarized in FIG. 9. The purpose of this instruction is simply to load a line from main storage into the cache if it is not already in the cache.

(l-1) Hit

If the line is already in the cache and valid, no load occurs and the directory LRU bit is switched to the opposite subset.

(l-2) Miss or invalid hit

On a miss or an invalid hit, this instruction behaves much as described above for the data cache fetch. A fetch request for the absent line is forwarded to main storage. If there is a miss and the line to be replaced is valid and modified, a writeback occurs first. The line returned from main storage is loaded into the cache, but no data is bypassed to the CPU. Following the load of the new line, the valid bit for the stored subset is set "on" (that is, to "1"), and the LRU bit is switched to the opposite subset.

If the new line was loaded because of a miss, the directory is also updated with the page address of the new line.

FIG. 9 summarizes the Load Data Cache Line hardware procedure.

(m) Set data cache line

The Set Data Cache Line (SETDCL) cache management instruction can be seen in FIG. 17 and is summarized in FIG. 10. The purpose of this instruction is to establish a directory entry for a line if one is not already present, without loading the line from main storage into the cache. It can be used to avoid an unnecessary fetch from main storage when the line is about to be changed by subsequent stores.

(m-1) Hit

If the line is already in the cache, its directory entry is updated by setting the valid bit "on" and the modified bit "off".

(m-2) Miss

On a miss, this instruction forces a writeback if the line currently present is valid and modified.

Following a miss (with or without a writeback), the directory is updated with the new target page address, the valid bit is set on, the modified bit is set off, and the LRU bit is switched to the opposite subset.

No data transfer occurs unless a writeback is required.
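The directory-only nature of Set Data Cache Line can be seen in a short model. This is a sketch under assumptions: the dictionary layout and the name `set_data_cache_line` are hypothetical, any tag match is treated as a hit, and the LRU indication is left unchanged on a hit because the text does not say otherwise.

```python
def set_data_cache_line(entry: dict, page: int) -> bool:
    """Model of SETDCL on one directory entry; return True if a writeback is forced.

    entry is a hypothetical structure:
    {"A": {"page": int, "valid": bool, "modified": bool}, "B": {...}, "lru": "A" or "B"}
    """
    for subset in ("A", "B"):
        if entry[subset]["page"] == page:
            # Hit: only the directory bits change; LRU is left as it was (assumption).
            entry[subset]["valid"] = True
            entry[subset]["modified"] = False
            return False
    victim = entry["lru"]
    writeback = entry[victim]["valid"] and entry[victim]["modified"]
    # Miss: claim the LRU subset for the new page without fetching any data.
    entry[victim].update(page=page, valid=True, modified=False)
    entry["lru"] = "B" if victim == "A" else "A"
    return writeback
```

The only data movement the model ever reports is the writeback of a valid, modified victim line; nothing is fetched from main storage.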

FIG. 10 summarizes the Set Data Cache Line hardware operation sequence.

(n) Store data cache line

A flowchart of the Store Data Cache Line (STDCL) cache management instruction can be seen in detail in FIG. 18 and is summarized in FIG. 11.

The purpose of this instruction is to cause a line that is valid and modified to be written back, so that main storage reflects the latest version of the line. Clearly, this storing of a data cache line is not the result of a normal "fetch" or "store" operation.

(n-1) Hit

If the line is in the cache and is valid and modified, a writeback occurs and the directory is updated by turning off the modified bit. The line remains in the cache for future use.

(n-2) Miss or invalid hit

If the line is not in the cache, or is in the cache but is not valid or not modified, the hardware performs no writeback and exits the procedure.
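Store Data Cache Line reduces to a conditional writeback, which the following sketch models. The dictionary layout, the function name `store_data_cache_line`, and the `write_back` callback standing in for the transfer to main storage are all hypothetical.

```python
def store_data_cache_line(entry: dict, page: int, write_back) -> bool:
    """Model of STDCL on one directory entry; return True if a writeback occurred.

    entry is a hypothetical structure:
    {"A": {"page": int, "valid": bool, "modified": bool}, "B": {...}}
    write_back is called with the subset name ("A" or "B") to copy the line out.
    """
    for subset in ("A", "B"):
        line = entry[subset]
        if line["page"] == page and line["valid"] and line["modified"]:
            write_back(subset)        # main storage now holds the current version
            line["modified"] = False  # the line stays valid and stays in the cache
            return True
    return False                      # miss, invalid, or unmodified: nothing to do
```

After a successful call the line is still valid in the cache, so later fetches continue to hit on it.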

Note that this instruction may not complete in time for the line being written back to main storage to be available to the next I/O instruction or instruction cache miss. A special synchronizing version of this instruction can be used for that purpose and is described below.

FIG. 11 summarizes the Store Data Cache Line hardware operation sequence.

(o) Store and synchronize data cache line

This instruction (STSDCL) operates in the cache exactly like the Store Data Cache Line (STDCL) instruction described above. However, the processor waits until this instruction has completed before proceeding with the instruction stream. Consequently, any I/O activity or instruction cache activity that occurs after this instruction has executed will find the latest version of the line available in main storage.

E. Conclusion

In summary, the cache subsystem architecture disclosed herein is unique in that it can minimize the CPU idle time caused by references to storage. What is novel is the distinctive use of a number of special cache management instructions, together with several special control bits in the cache directory, to minimize such references to main storage.

These cache management instructions and directory control bits allow software to control the relationship between the contents of the cache subsystem and the contents of main storage.

This capability improves the effective cache hit ratio and so allows the system to avoid unnecessary references to the slower main storage.

In addition, this cache subsystem architecture is novel in that it can improve access time when a reference to main storage is unavoidable. This capability results from the distinctive use of separate caches for instructions and data, which can access main storage independently and in an overlapped manner, and from the use of a cache bypass mechanism to speed the flow of target information from main storage to the CPU.

Three features combine to improve system performance by allowing hardware and software functions to interact harmoniously in a novel way: instruction fetches can proceed through the storage hierarchy independently of data fetches; automatic "store-through" from the CPU to main storage is eliminated; and software can use special cache management instructions that are handled by the cache hardware.

[Brief Description of the Drawings]

FIG. 1 is a high-level organizational diagram of the present hierarchical storage organization.
FIGS. 2.1 and 2.2 are functional block diagrams of the storage hierarchy shown in FIG. 1, showing details of the data flow and the functional hardware configuration.
FIGS. 3.1 and 3.2 are functional block diagrams of the details of data cache addressing, showing the basic elements of the cache directory and its associated control elements.
FIG. 4 is a diagram of cache addressing, directory entries, and cache parameters for three different cache sizes (4K, 8K, and 16K).
FIG. 5 summarizes the operations performed in the cache subsystem hardware during an instruction cache fetch operation.
FIG. 6 summarizes the operations performed in the cache subsystem hardware during a data cache fetch operation.
FIG. 7 summarizes the operations performed in the cache subsystem hardware during a data cache store operation.
FIG. 8 summarizes the operations performed in the cache subsystem hardware during an instruction or data cache line invalidate operation.
FIG. 9 summarizes the operations performed in the cache subsystem hardware during a data cache line load operation.
FIG. 10 summarizes the operations performed in the cache subsystem hardware during a data cache line set operation.
FIG. 11 summarizes the operations performed in the cache subsystem hardware during a data cache line store operation.
FIG. 12 is a detailed flowchart of the operations performed in the cache subsystem hardware during execution of the "invalidate cache line" instruction.
FIG. 13 is a detailed flowchart of the operations performed in the cache subsystem hardware during execution of the "instruction cache fetch" instruction.
FIG. 14 is a detailed flowchart of the operations performed in the cache subsystem hardware during execution of the "data cache fetch" instruction.
FIG. 15 is a detailed flowchart of the operations performed in the cache subsystem hardware during execution of the "data cache store" instruction.
FIG. 16 is a detailed flowchart of the operations performed in the cache subsystem hardware during execution of the "data cache line load" instruction.
FIG. 17 is a detailed flowchart of the operations performed in the cache subsystem hardware during execution of the "data cache line set" instruction.
FIG. 18 is a detailed flowchart of the operations performed in the cache subsystem hardware during execution of the "data cache line store" instruction.

Applicant: International Business Machines Corporation
Sub-agent, Patent Attorney: Oka [illegible]

Claims

[Claims]

(1) A data processing system including a main memory and a store-in type cache storage device, comprising: an instruction cache portion of the cache storage device; a data cache portion of the cache storage device; a cache directory that stores, for each line of storage locations in the cache, the high-order bits of the main memory address of the data stored there and control bits indicating the state of said line; means for writing back to the main memory an instruction that has been read into said data cache and modified; and means for invalidating the line in the instruction cache corresponding to the modified instruction, wherein the instruction cache has a write path only from the main memory.