JPH0659978A

JPH0659978A - Information processor

Info

Publication number: JPH0659978A
Application number: JP4213842A
Authority: JP
Inventors: Hitoshi Takagi (高木 均)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1992-08-11
Filing date: 1992-08-11
Publication date: 1994-03-04

Abstract

PURPOSE: To prevent the pipeline from running empty while an instruction is being loaded from the main storage device into the instruction cache. CONSTITUTION: The start address of an exception handling routine is stored in a result register 116, and in the following cycle it is transferred to an instruction pointer 102 to activate an instruction cache 104. A finite state logic 111 proceeds to its next step whether or not the address in the pointer 102 is present in the cache 104. Even when a cache miss occurs, the data transfer between the cache 104 and the main storage device overlaps with the subsequent steps of the logic 111, so that a required instruction is not kept waiting by a miss in the cache 104.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an information processing apparatus, and more particularly to the control of an instruction cache memory in a processor whose cache memory is split into separate instruction and data caches.

[0002]

[Prior Art] To supply instructions continuously to a pipelined processor, a cache memory dedicated to instructions is employed. This instruction-only cache memory is referred to here simply as the instruction cache. The instruction cache eliminates the conflict between instruction fetches and data (operand) fetches that arises when instructions are executed in an overlapped, pipelined manner.

[0003]

[Problem to Be Solved by the Invention] However, the instruction cache is small because it is fast, and unless the instructions of the entire program running on the processor are held in the instruction cache, an instruction may not be available at the time it is needed (that is, a cache miss may occur). When this happens, the pipeline runs empty until the desired instruction has been loaded from main memory into the instruction cache, and performance degrades.

[0004]

[Means for Solving the Problem] The information processing apparatus of the present invention (a) fetches, separately from the normal instruction fetch operation, the data at a predetermined address or at an address specified by a particular instruction from the main storage device, and (b) has the instruction cache accept a subsequent request even while it is requesting data from the main storage device, provided that the address of the subsequent request is not the same as the address currently being requested from the main storage device. In this way an instruction is loaded from the main storage device into the instruction cache before the time at which it is actually needed, overlapped with other operations inside the processor, with the aim of reducing as far as possible the pipeline bubbles that occur in conventional apparatus.
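Rule (b) can be illustrated with a small behavioral model. The following is only a sketch under stated assumptions: the class name, method names, return strings, and the single outstanding request are all hypothetical and are not taken from the patent, which describes hardware rather than software.

```python
class InstructionCache:
    """Toy model (hypothetical): one outstanding main-memory request,
    while hits to other addresses are still served."""

    def __init__(self, miss_latency=10):
        self.lines = set()          # addresses currently held in the cache
        self.pending_addr = None    # address being requested from main memory
        self.cycles_left = 0        # cycles until the pending data arrives
        self.miss_latency = miss_latency

    def tick(self):
        """Advance one cycle; register the pending line when it arrives."""
        if self.pending_addr is not None:
            self.cycles_left -= 1
            if self.cycles_left == 0:
                self.lines.add(self.pending_addr)
                self.pending_addr = None

    def request(self, addr):
        """Return 'hit', 'miss-started', or 'wait'."""
        if addr in self.lines:
            return "hit"            # accepted even while a miss is outstanding
        if self.pending_addr is None:
            self.pending_addr = addr
            self.cycles_left = self.miss_latency
            return "miss-started"
        # Same address as the pending request, or a second miss: must wait.
        return "wait"


ic = InstructionCache(miss_latency=3)
ic.lines.add(0x100)
print(ic.request(0x200))  # starts a main-memory request
print(ic.request(0x100))  # still served while the request is outstanding
```

The point of the sketch is the middle case: a request that hits is answered while an earlier miss is still in flight, which is what lets the preload overlap with normal fetching.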

[0005]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. The first embodiment covers the case where the instruction execution unit in the pipeline is implemented as a sequential circuit: when the sequential circuit runs for many cycles and instruction fetching stops, it guarantees that the instruction fetched after those cycles is present in the instruction cache. The second embodiment issues, ahead of the actual instruction fetch (which is expected in most cases to be caused by a branch instruction), an instruction that guarantees that the instruction at a specified address is present in the instruction cache. The compiler can insert this instruction into a program provided that the number of cycles needed to fetch an instruction from main memory is known.
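How a compiler might place such a preload can be sketched as follows. This is an illustration only, not the patent's method: it assumes straight-line code, one instruction per cycle, and a hypothetical program representation of (opcode, operand) tuples in which "BR" marks a branch and "TPL" marks the preload.

```python
def insert_preloads(program, miss_latency):
    """Insert ("TPL", target) about miss_latency instructions before each branch.

    program: list of (opcode, operand) tuples; branches are ("BR", target).
    Hypothetical representation; assumes straight-line code and one
    instruction per cycle, so distance in instructions equals distance in cycles.
    """
    out = []
    for op, arg in program:
        if op == "BR":
            # Slot in the output that lies miss_latency instructions ahead
            # of the branch (or the program start if the code is too short).
            pos = max(0, len(out) - miss_latency)
            out.insert(pos, ("TPL", arg))
        out.append((op, arg))
    return out


program = [("ADD", None)] * 5 + [("BR", 0x400)]
for instr in insert_preloads(program, miss_latency=3):
    print(instr)
```

A real compiler would also have to respect basic-block boundaries and the availability of the branch target address at the insertion point; the sketch ignores both.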

[0006] First, the role of the instruction and data caches in the information processing apparatus is described briefly with reference to FIG. 3. The information processing apparatus consists of a central processing unit 1 (PROC) and a main storage device 2 (MM). There are also input/output means for communicating with external storage and for interfacing with human users, but they are omitted because they are not essential to instruction execution.

[0007] The basic operation of the central processing unit 1 is to fetch one instruction from the main storage device 2, apply the operation it specifies to data (operands) in the main storage device 2 or to the internal state of the central processing unit 1, and then either change the state of the main storage device 2 or reflect the result in the internal state of the central processing unit 1. This involves at most three memory operations: (a) instruction fetch, (b) operand fetch, and (c) writing of the result. If these operations went directly to the main storage device, whose cycle time is several times to several tens of times longer than that of the instruction execution unit (IP) 3, performance would suffer. A cache memory avoids this loss by providing a small but fast memory with the same cycle time as the instruction execution unit 3.
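The benefit of such a cache is commonly summarized by the standard average-access-time relation, hit time plus miss rate times miss penalty. The sketch below applies it with made-up figures; the function name and all numbers are assumptions chosen for illustration and do not come from the patent.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average cycles per memory operation for a single-level cache."""
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical figures: 1-cycle cache, main memory 20 times slower.
for miss_rate in (0.0, 0.05, 0.5):
    print(miss_rate, average_access_cycles(1, miss_rate, 20))
```

The relation also shows why misses matter: with a large miss penalty, even a modest miss rate dominates the average.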

[0008] The instruction cache (IC) 4 serves memory operation (a), and the data cache (DC) 5 serves memory operations (b) and (c).

[0009] FIG. 1 is a block diagram showing the central processing unit of the first embodiment of the present invention. The overall operation is described first.

[0010] An instruction is fetched from the instruction cache (I-CACHE) 104 at the address held in the instruction pointer (IP) 102. Normally, once the fetch of an instruction completes, the instruction pointer 102 is updated by the +1 adder (+1) 103 so that it points to the next instruction. When a branch instruction is executed, the result from the arithmetic unit described later is supplied on signal line 151, and that value is selected and stored in the instruction pointer 102 instead.

[0011] If the instruction at the address specified by the instruction pointer 102 is not found in the instruction cache 104, the address is stored in the instruction memory address register (IMAR) 106, and the data at that address is requested from the main storage device (not shown here) over signal line 168. During this time the update of the instruction pointer 102 is held.

[0012] After some number of cycles the requested data arrives from the main storage device over signal line 167. The instruction cache 104 registers the data in itself, stores it in the instruction register (IR) 105, and resumes the held update of the instruction pointer 102.
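The fetch behavior on a hit and on a miss can be condensed into a cycle count. The following is a minimal sketch, assuming one cycle per fetch and a fixed main-memory latency; the function name and parameters are hypothetical.

```python
def fetch_cycles(addresses, cached, miss_latency):
    """Count cycles needed to fetch a sequence of instruction addresses.

    Assumptions (not from the patent): each fetch takes one cycle, and a
    miss adds a fixed miss_latency while the instruction pointer is held.
    """
    cached = set(cached)
    cycles = 0
    for addr in addresses:
        if addr not in cached:
            cycles += miss_latency   # pointer update held while main memory responds
            cached.add(addr)         # data is registered in the cache on arrival
        cycles += 1                  # the fetch itself; the pointer then advances
    return cycles


print(fetch_cycles([0, 1, 2, 1], cached={0, 1}, miss_latency=10))
```

Every miss in this blocking scheme costs its full latency, which is the cost the two embodiments try to hide.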

[0013] The instruction held in the instruction register 105 is decoded by the instruction decoder (DECODE) 107 into a form that the hardware can control easily, and is stored in the decoded instruction register (DIR) 108.

[0014] Part of the decoded information, that is, of the contents of the decoded instruction register 108, is used to select operands: for example, the address of an arithmetic register or the address of data in memory.

[0015] From this information two operands are finally selected and stored in the operand registers (OR1) 112 and (OR2) 113. The data cache 110 is used to fetch memory data. Its behavior when the requested data is not found is largely the same as that of the instruction cache 104 and is not described (its interface with the main storage device is also omitted). When an arithmetic register is specified as an operand, the data is read from the register file (REGS) 109.

[0016] The rest of the decoded information becomes the initial state of the finite state logic (F.S.M.) 111, which governs the operation of the instruction. The finite state logic 111 receives the initial state from the decoded instruction register 108 over signal line 157, generates the next state according to predetermined logic, and stores it in the state register (STATE) 114. The output of the state register 114 specifies the operation of the arithmetic logic unit (ALU) 115, directs the fetch of the next operand, and so on, and also carries the information that determines the next state of the finite state logic 111 itself. In FIG. 1, signal line 165 corresponds to the former and signal line 166 to the latter.

[0017] The finite state logic 111 may be implemented as a sequential circuit built from logic circuits, or as a microprogrammed controller.
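A table-driven controller gives a feel for how such finite state logic sequences an instruction. The sketch below is hypothetical throughout: the state names, control names, and table contents are invented for illustration, and only the idea that each state yields control signals plus a next state follows the description.

```python
# Hypothetical microprogram: state -> (control for ALU/operand fetch, next state).
# A next state of None means the instruction has finished.
TABLE = {
    "ADD_MEM_0": ("load_operand", "ADD_MEM_1"),
    "ADD_MEM_1": ("add", "ADD_MEM_2"),
    "ADD_MEM_2": ("store_result", None),
    "ADD_REG_0": ("add", None),
}


def run_instruction(initial_state):
    """Step from the decoder-supplied initial state until the instruction ends."""
    state = initial_state            # plays the role of the state register
    trace = []
    while state is not None:
        control, next_state = TABLE[state]
        trace.append(control)        # control output for this cycle
        state = next_state           # next-state information fed back
    return trace


print(run_instruction("ADD_MEM_0"))
```

An instruction with a long chain of states keeps the execution unit busy for many cycles, and it is during such chains that no new instructions are fetched.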

[0018] The data held in operand registers 112 and 113 is operated on by the arithmetic logic unit 115 as specified by the state register 114, and the result is placed in the result register (RDR) 116. In the next cycle the contents of the result register 116 are stored in either the register file 109 or the data cache 110, as specified by the state register 114 at that time. Storing the result completes the operation of the finite state logic 111, and execution of the next instruction begins.

[0019] Next, one example in which the first embodiment brings out the effect of the present invention is described.

[0020] When an exceptional event occurs in the middle of some processing, the handling of that event is often implemented in software so that it remains flexible. Because such handling is exceptional, however, its instructions are frequently absent from the instruction cache when they are needed (a cache miss), and a considerable delay often arises at the moment the software handler is about to start.

[0021] Between the occurrence of the exceptional event and the eventual jump into the software routine, the finite state logic 111 described above runs through steps such as determining the cause, saving state, and initialization, as illustrated in FIG. 4. Steps 302, 303, and 304 of the finite state logic 111 take 10 to 30 cycles, during which no instructions are fetched.

[0022] To guarantee that the exception handling routine is present in the instruction cache when it is needed, its address is requested from the instruction cache early in the sequence of steps of the finite state logic 111. The subsequent steps of the finite state logic 111 then overlap with the loading of the instruction cache, so that when the instruction is actually needed there is no wait caused by an instruction cache miss. FIG. 5 shows this improved flow of the finite state logic. Steps 308, 310, and 311 in FIG. 5 correspond to steps 302, 303, and 304 of FIG. 4, respectively, and step 309 is the step that directs the instruction cache to load the instruction. In terms of FIG. 1, at step 309 the start address of the exception handling routine is placed in the result register 116; in the next cycle it is transferred to the instruction pointer 102 and the instruction cache 104 is activated.

[0023] The finite state logic 111 proceeds to its next step whether or not the address in the instruction pointer 102 is present in the instruction cache 104. Consequently, even if a miss occurs in the instruction cache 104, the exchange between the instruction cache 104 and the main storage device described earlier overlaps with the subsequent steps of the finite state logic 111.

[0024] If the cache miss handling in the instruction cache 104 has finished before instruction fetch for the exception handling routine is started, the routine can start without delay.
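The condition for a delay-free start can be written as a simple overlap calculation. This is a sketch under assumptions: the function name and parameters are hypothetical, the handler's first instruction is assumed to miss, and the 10-cycle miss latency used in the example is an illustrative value rather than a figure given for this embodiment.

```python
def exception_entry_delay(fsm_cycles_after_request, miss_latency, prefetch_early):
    """Stall cycles seen when the first handler instruction is fetched.

    fsm_cycles_after_request: cycles the finite state logic still runs after
    the early request to the instruction cache (hypothetical parameter).
    """
    if not prefetch_early:
        return miss_latency          # miss is discovered only at handler entry
    # The miss is serviced while the remaining steps execute.
    return max(0, miss_latency - fsm_cycles_after_request)


print(exception_entry_delay(20, 10, prefetch_early=False))
print(exception_entry_delay(20, 10, prefetch_early=True))
print(exception_entry_delay(6, 10, prefetch_early=True))
```

With the 10 to 30 cycles of steps quoted for the flow of FIG. 4, the remaining work after an early request would cover a miss of comparable length entirely; a shorter remainder would hide only part of it.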

[0025] FIG. 2 is a block diagram showing the central processing unit of the second embodiment. This central processing unit executes each instruction in a four-stage pipeline. Unlike FIG. 1, the execution unit contains no finite state logic, and every instruction completes execution in exactly one cycle. The pipeline therefore stalls only when a miss occurs in the instruction cache (I-CACHE) 205. The pipeline stages are defined as Fetch (F), Arithmetic (A), Memory (M), and Write (W). Arithmetic is always performed between registers, and the result is written back to a register.

[0026] The only memory operations are loads into a register and stores of register data to memory, and no arithmetic is performed as part of them.

[0027] In general operation, the instruction to be fetched is always pointed to by the instruction pointer (IP) 201, which is incremented every cycle by the +1 adder (+1) 202 except on branch instructions. The instruction cache 205 may be the same as 104 in FIG. 1, except that, as explained later, the request address is supplied not only by the instruction pointer 201 but also by the preload address register (PAR) 203. The fetched instruction is stored in the A-stage instruction register (AIR) 207, whose output is supplied to the instruction decoder (DECODE) 209 and used to generate the A-stage control signals. For example, signal line 261 specifies the two source operand register addresses, the general-purpose register file (GRS) 210 is read, and the two values are supplied to the arithmetic logic unit (ALU) 211 over signal lines 265 and 266. At the same time, signal line 264 specifies the operation to be performed by the arithmetic logic unit 211. The output 251 of the arithmetic logic unit 211 is stored in the M-stage address register (MAR) 213. If the instruction is a branch instruction, the output of the arithmetic logic unit 211 is stored in the instruction pointer 201. If the instruction is a store instruction, the output of the general-purpose register file 210 is stored in the M-stage data register (MDR) 214 over signal line 266.

[0028] Meanwhile, the M-stage instruction register (MIR) 212 holds the instruction in a form suited to the M-stage control signals.

[0029] In the M stage, the data cache (D-CACHE) 215 receives its address from the M-stage address register 213, its memory operation from part of the M-stage instruction register 212 (signal line 268), and, for a store instruction, its data from the M-stage data register 214.

[0030] The data cache operates in the same way as 110 in FIG. 1, and its description is omitted here.

[0031] If the instruction does not involve a memory operation, the contents of the M-stage address register 213 pass through the bypass on signal line 269 and are stored in the W-stage data register (WDR) 217. Otherwise, the output 271 of the data cache 215 is loaded into the W-stage data register 217.

[0032] The W-stage instruction register (WIR) 216 contains the destination address (signal line 260) used to write the result (the contents of the W-stage data register 217) back to the general-purpose register file 210. Writing the result completes the execution of one instruction.

[0033] Next, consider the case in FIG. 2 where a branch instruction is executed and the branch target misses in the cache. FIG. 6 shows the timing in that case.

[0034] A branch instruction stores the branch target address in the instruction pointer 201 and invalidates the instructions that follow it. Instructions from the new address are then supplied to the pipeline. The branch target often lies in a different memory region from the address before the branch, so the fetch of the branch target instruction often misses in the cache. In that case the delay caused by the cache miss is exposed, as shown in FIG. 6.

[0035] The present embodiment is described next. First, a new target preload (TPL) instruction is introduced. The TPL instruction has the same format as a branch instruction and performs the same branch target address calculation. The result is stored not in the instruction pointer 201 but in the dedicated preload address register (PAR) 203, which is then used to access the instruction cache 205. The TPL instruction completes whether or not the instruction at the address held in the preload address register is present in the instruction cache 205. While the instruction cache 205 is being accessed through the preload address register 203, the update (+1) of the instruction pointer 201 is inhibited, so the bubble introduced into the pipeline is only one cycle.
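The sharing of the instruction cache between the instruction pointer and the preload address register can be modeled in a few lines. This is a hypothetical sketch: the class, its methods, and the access log are invented for illustration, and only the priority of the preload access and the held pointer update follow the description.

```python
class FetchUnit:
    """Toy front end (hypothetical): pointer-driven fetches plus a preload port."""

    def __init__(self, ip=0):
        self.ip = ip               # instruction pointer
        self.par = None            # preload address register, None when idle
        self.accesses = []         # (source, address) log of cache accesses

    def tpl(self, target):
        """Execute a TPL: latch the computed target; the pointer is untouched."""
        self.par = target

    def cycle(self):
        """One cache access per cycle; a pending preload goes first and holds IP."""
        if self.par is not None:
            self.accesses.append(("PAR", self.par))
            self.par = None        # TPL is done whether this access hit or missed
        else:
            self.accesses.append(("IP", self.ip))
            self.ip += 1


fu = FetchUnit(ip=100)
fu.cycle()
fu.tpl(0x400)
fu.cycle()
fu.cycle()
print(fu.accesses)
```

In the log, the preload access occupies exactly one fetch slot between two consecutive pointer-driven fetches, which corresponds to the one-cycle bubble.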

[0036] If the access to the instruction cache 205 through the preload address register 203 misses, the address held in the preload address register 203 is passed through the multiplexer 204 and stored in the instruction memory address register (IMAR) 206, and a request is issued to the main storage device (not shown) over signal line 258. It takes several cycles (10 cycles in this example) for the data to return from main memory, but because the TPL instruction completes regardless of cache hit or miss and the following instructions continue to execute, this latency is hidden. If a following instruction requests the same address as the one held in the instruction memory address register 206, or itself causes a cache miss, instruction fetch is suspended until the current request to the main storage device completes.

[0037] FIG. 7 shows the effect of the TPL instruction of this embodiment. The sequence of FIG. 6 takes about 2.1 cycles per instruction, whereas the sequence of FIG. 7 takes 1.3.
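The two cycles-per-instruction figures can be turned into a relative improvement with simple arithmetic. The variable names below are hypothetical; the two input values are the ones quoted for the sequences of FIG. 6 and FIG. 7, and the result applies only to those example sequences.

```python
cpi_without_tpl = 2.1   # cycles per instruction, sequence of FIG. 6
cpi_with_tpl = 1.3      # cycles per instruction, sequence of FIG. 7

# Same instruction count in both cases, so the CPI ratio is the speedup.
speedup = cpi_without_tpl / cpi_with_tpl
cycles_saved_fraction = 1 - cpi_with_tpl / cpi_without_tpl

print(f"speedup: {speedup:.2f}x, cycles saved: {cycles_saved_fraction:.0%}")
```

For these sequences the preload therefore makes execution roughly 1.6 times faster, which is a property of the chosen example rather than a general guarantee.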

[0038]

[Effects of the Invention] As described above, the present invention guarantees, overlapped with arithmetic operations, that a required instruction is present in the instruction cache. This prevents the pipeline bubbles caused by cache miss latency at the time the instruction is needed, and improves performance.

[Brief description of drawings]

FIG. 1 is a block diagram showing the first embodiment of the present invention.

FIG. 2 is a block diagram showing the second embodiment of the present invention.

FIG. 3 is a diagram explaining the position of the cache memories in the information processing apparatus.

FIG. 4 is a diagram showing the operation flow of the F.S.M., used to explain the first embodiment.

FIG. 5 is a diagram showing the improved flow of the F.S.M. in the first embodiment.

FIG. 6 is a diagram showing instruction execution (a branch instruction whose target misses in the cache), used to explain the second embodiment.

FIG. 7 is a diagram showing instruction execution after the second embodiment is applied (the effect of the target preload instruction).

[Explanation of symbols]

1 Central processing unit (PROC)
2 Main storage device (MM)
3 Instruction execution unit (IP)
4 Instruction cache memory (IC)
5 Data cache memory (DC)
102 Instruction pointer (IP)
103 +1 adder (+1)
104 Instruction cache (I-CACHE)
105 Instruction register (IR)
106 Instruction memory address register (IMAR)
107 Instruction decoder (DECODE)
108 Decoded instruction register (DIR)
109 Register file (REGS)
110 Data cache (D-CACHE)
111 Finite state logic (F.S.M.)
112 Operand register (OR1)
113 Operand register (OR2)
114 State register (STATE)
115 Arithmetic logic unit (ALU)
116 Result register (RDR)
201 Instruction pointer (IP)
202 +1 adder (+1)
203 Preload address register (PAR)
204 Multiplexer (MUX)
205 Instruction cache (I-CACHE)
206 Instruction memory address register (IMAR)
207 A-stage instruction register (AIR)
208 Comparator
209 Instruction decoder (DECODE)
210 General-purpose register file (GRS)
211 Arithmetic logic unit (ALU)
212 M-stage instruction register (MIR)
213 M-stage address register (MAR)
214 M-stage data register (MDR)
215 Data cache (D-CACHE)
216 W-stage instruction register (WIR)
217 W-stage data register (WDR)

Claims

[Claims]

1. An information processing apparatus comprising a relatively low-speed, large-capacity main storage device and a cache memory, which is a relatively high-speed, small-capacity storage means whose contents are replaced as appropriate so that the contents of the main storage device can be accessed at high speed, the cache memory being split into two separate caches, an instruction cache that stores instructions and a data cache that stores the operand data specified by instructions, characterized in that the instruction cache fetches data at a predetermined address from the main storage device separately from instruction fetching.

2. The information processing apparatus according to claim 1, characterized by having an instruction for fetching data at a predetermined address from the main storage device, the instruction cache performing the fetch when the instruction is decoded, and the instruction cache accepting a subsequent request even while it is requesting data from the main storage device, unless the address of the subsequent request is the same as the address currently being requested from the main storage device.