JPH02308332A

JPH02308332A - Information processor

Info

Publication number: JPH02308332A
Application number: JP12980589A
Authority: JP
Inventors: Hitoshi Ishida; 仁志石田; Minoru Shiga; 稔志賀; Seisuke Kazama; 風間　成介
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1989-05-23
Filing date: 1989-05-23
Publication date: 1990-12-21

Abstract

PURPOSE: To access memory quickly by exploiting the temporal and spatial locality of memory references, by providing a data unit with a stock means and a data cache with a transfer means. CONSTITUTION: A data unit 101 is provided with a stock means 101a that stocks plural pieces of data, and a data cache 103 is provided with a transfer means 103a that simultaneously transfers the data at memory address N requested by the data unit 101 and the data at memory address N+1 to the data unit 101. Consequently, data with a high reference probability in the data cache 103 is transferred to the data unit 101 in advance, the address of requested data is compared with the addresses of the data held in the data unit 101, and no memory access is needed when they match.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to an information processing device that performs parallel processing of information.

[Conventional technology]

Conventionally, an information processing device of this type has been configured as shown in FIG. 2. FIG. 2 is based on the configuration described in Carl Dobbs, Paul Reed and Tommy Ng, "Supercomputing on Chip", VLSI SYSTEMS DESIGN, Vol. IX, No. 5, May 1988, pp. 24-33. In the figure, 201 is an integer unit that performs integer operations, logical operations, and the like; 202 is a floating point unit that performs floating point operations and the like; 203 is a data unit that reads or writes data between a data cache and a register file; 204 is the register file, which stores information needed for the above operations; 205 is a scoreboard that detects and avoids register conflicts; 206 is an instruction fetch unit that fetches and decodes instructions and transfers data to each of the above units; 207 is an internal bus; 208 is an address bus that carries instruction addresses between the data unit 203 and a data cache 210; and 209 is a data bus that carries data between the data unit 203 and the data cache 210.

Next, the operation will be described. The instruction fetch unit 206 is pipelined into three stages: instruction fetch, decode, and transfer. After completing a fetch in one clock, it passes the instruction to the decode stage. The decode stage partially decodes the instruction and issues a register request to the scoreboard 205 so that the operands needed for the operation can be prefetched from the register file 204 to the functional unit corresponding to the instruction. Each register in the register file 204 has a scoreboard bit, which is set while that register is stalled and cleared when the data operation completes. On receiving a register request from the instruction fetch unit 206, the scoreboard 205 checks the scoreboard bits and sends an availability signal to the instruction fetch unit 206 only if they are clear. On receiving the availability signal from the scoreboard 205, the instruction fetch unit 206 transfers the instruction to the corresponding functional unit and, at the same time, fetches an instruction. Each functional unit also has several pipeline stages and executes instructions using the prefetched operands.
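
The scoreboard check described above can be illustrated with a minimal sketch. It is not taken from the cited article or from this application: the class name `Scoreboard`, its method names, and the register count of 32 are assumptions chosen for illustration, and the model only captures the rule that an instruction is released when every requested register's bit is clear.

```python
class Scoreboard:
    """Hypothetical model: one scoreboard bit per register (not the actual hardware)."""

    def __init__(self, num_registers):
        # False = bit clear (register usable), True = bit set (register stalled)
        self.bits = [False] * num_registers

    def mark_pending(self, reg):
        """Set the bit while a data operation on this register is outstanding."""
        self.bits[reg] = True

    def mark_complete(self, reg):
        """Clear the bit once the data operation has completed."""
        self.bits[reg] = False

    def available(self, regs):
        """Answer a register request: True only if every requested bit is clear."""
        return not any(self.bits[r] for r in regs)


sb = Scoreboard(num_registers=32)   # 32 registers is an assumption
sb.mark_pending(5)                  # e.g. a load into r5 is still in flight
blocked = sb.available([4, 5])      # r5's bit is set, so no availability signal
sb.mark_complete(5)
ready = sb.available([4, 5])        # both bits clear, so the instruction may be transferred
```

In this model the instruction fetch unit would call `available` during the decode stage and hold the instruction until it returns `True`.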

The data unit 203 is also one of the functional units and is pipelined into three stages. Stage 1 calculates the memory address for the accepted instruction and passes the result to the next stage. Stage 2 drives the address bus 208 based on the address it receives. If the instruction is a store instruction, this stage fetches the data to be stored from the register file 204 and places the address on the address bus 208 and the data on the data bus 209. Stage 3 monitors the response of the data cache 210.

For a load instruction, the data unit reads the data bus 209 in this stage (stage 3) and takes in the data.
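
As a rough illustration of the conventional data unit, the sketch below models the three stages as one function and counts data cache accesses. The names `DataCache` and `run_memory_instruction`, the base-plus-offset address calculation, and the example addresses and values are all assumptions; the source does not specify how the address is computed.

```python
from dataclasses import dataclass


@dataclass
class DataCache:
    """Hypothetical stand-in for data cache 210; counts every access."""
    memory: dict
    accesses: int = 0

    def read(self, address):
        self.accesses += 1
        return self.memory[address]

    def write(self, address, value):
        self.accesses += 1
        self.memory[address] = value


def run_memory_instruction(op, base, offset, reg, registers, cache):
    """Model of data unit 203 handling one load or store."""
    # Stage 1: address calculation (base + offset is an assumed addressing mode).
    address = base + offset
    if op == "store":
        # Stage 2: address on the address bus, register data on the data bus.
        cache.write(address, registers[reg])
    elif op == "load":
        # Stage 2 drives the address bus; stage 3 reads the data bus.
        registers[reg] = cache.read(address)
    else:
        raise ValueError(f"unknown operation: {op}")
    return address


cache = DataCache(memory={100: 7, 101: 9})
registers = [0, 0, 0, 0]
run_memory_instruction("load", 100, 0, 1, registers, cache)
run_memory_instruction("load", 100, 1, 2, registers, cache)
```

After the two loads `cache.accesses` is 2: in this model every load costs one data cache access, even when the addresses are adjacent.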

[Problem to be solved by the invention]

Because the conventional information processing device is configured as described above, the data unit must perform a memory access for every individual load instruction. However much the arithmetic performance is improved, processing performance does not improve as expected when there are many data exchanges with the outside, which take a long time.

The present invention was made to solve the above problem, and its object is to obtain an information processing device that can perform high-speed memory access by exploiting the temporal and spatial locality of memory references.

[Means to solve the problem]

The information processing device according to the present invention is characterized in that a data unit 101 is provided with a stock means 101a that stocks a plurality of pieces of data, and a data cache 103 is provided with a transfer means 103a that, where N is the memory address of the data requested by the data unit 101, simultaneously transfers the requested data at address N and the data at address (N+1) to the data unit 101.

[Effect]

The stock means 101a of the data unit 101 stocks a plurality of pieces of data. The transfer means 103a of the data cache 103 simultaneously transfers the data at address N requested by the data unit 101 and the data at address (N+1) to the data unit 101.

[Embodiments of the invention]

FIG. 1 is a block diagram showing the configuration of an information processing device according to an embodiment of the present invention. Components corresponding to those shown in FIG. 2 are given the same reference numerals, and their description is omitted.

In FIG. 1, 101 is a data unit having a stock means 101a that stocks a plurality of pieces of data, and 103 is a data cache having a transfer means 103a that, where N is the memory address of the data requested by the data unit 101, simultaneously transfers the requested data at address N and the data at address (N+1) to the data unit 101. In addition, 102 is a data bus that carries data between the data cache 103 and the data unit 101.

Next, the operation will be described. Like the conventional device, the data unit 101 is divided into three pipeline stages. When it receives a memory access instruction, it calculates the memory address in the first stage. Let N be the memory address calculated in the first stage. If the accepted instruction is a store instruction, it is passed to the second stage once the same calculation as in the conventional device is finished. If the accepted instruction is a load instruction, the calculated memory address N is compared with the addresses of the data stocked in the stock means 101a in the data unit 101. If the comparison finds a match, the data corresponding to the matching address is written to the register file 204 without performing a memory access. If there is no match, the instruction is passed to the second stage so that a memory access can be performed. The second stage drives the address bus 208 and the data bus 102, notifies the data cache 103 of the memory address N of the data needed to execute the instruction, and passes the instruction to the third stage. On receiving the memory address N from the data unit 101, the data cache 103 uses the transfer means 103a to transfer the requested data at address N and the data at address (N+1) to the data unit 101 simultaneously over the two data buses 102. The third stage writes whichever of the two pieces of data sent from the data cache 103 is needed to execute the instruction to the register file 204 and, at the same time, replaces the data in the data unit 101 with the two pieces of data. The data stocked here is referenced at the next load instruction.
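
The load path of this embodiment can be sketched as follows. This is an illustrative model under stated assumptions, not the disclosed circuit: the class names `PairCache` and `DataUnit`, the dictionary used as the stock means, and the example addresses and values are hypothetical, and the model assumes address N+1 always exists because the source does not describe the boundary case. Store instructions are left out, since the source passes them to the second stage without describing any interaction with the stock means.

```python
class PairCache:
    """Hypothetical data cache 103 whose transfer means returns N and N+1 together."""

    def __init__(self, memory):
        self.memory = memory
        self.accesses = 0

    def read_pair(self, n):
        self.accesses += 1
        # Both pieces of data are sent at once, one per data bus 102.
        # Assumes address n + 1 is valid (boundary handling is not in the source).
        return [(n, self.memory[n]), (n + 1, self.memory[n + 1])]


class DataUnit:
    """Hypothetical data unit 101 with a two-entry stock means 101a."""

    def __init__(self, cache, registers):
        self.cache = cache
        self.registers = registers
        self.stock = {}  # stock means: address -> data

    def load(self, address, reg):
        """Execute one load; return True if it was served from the stock means."""
        # Stage 1: compare the calculated address with the stocked addresses.
        if address in self.stock:
            self.registers[reg] = self.stock[address]  # no memory access
            return True
        # Stage 2: notify the data cache of address N.
        pair = self.cache.read_pair(address)
        # Stage 3: write the needed data to the register file and
        # replace the stocked data with the two pieces just received.
        self.stock = dict(pair)
        self.registers[reg] = self.stock[address]
        return False


memory = {100: 10, 101: 11, 102: 12, 103: 13}
registers = [0, 0, 0, 0]
unit = DataUnit(PairCache(memory), registers)
hits = [unit.load(addr, reg) for reg, addr in enumerate(range(100, 104))]
```

For these four consecutive loads the model performs two data cache accesses instead of four: the loads of addresses 101 and 103 are served from the stock means.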

In this way, data with a high reference probability in the data cache 103 is transferred to the data unit 101 in advance, and the data unit 101 compares the address of the requested data with the addresses of the data it holds. If they match, no memory access is needed; a memory access is performed only when they do not match. The time required for memory access is therefore shortened overall.
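
The size of the saving can be estimated with a short calculation. The symbols below are assumptions introduced for illustration and do not appear in the source: $k$ is the number of loads to consecutive addresses $N, N+1, \dots, N+k-1$ with no intervening stores, $A$ is the number of data cache accesses, $h$ is the fraction of loads served from the stock means, and $t_{\text{hit}}$ and $t_{\text{mem}}$ are the times for a load served from the stock means and from the data cache, respectively.

```latex
\[
A_{\text{conventional}}(k) = k,
\qquad
A_{\text{proposed}}(k) = \left\lceil \frac{k}{2} \right\rceil
\]
\[
\bar{t}_{\text{load}} = h \, t_{\text{hit}} + (1 - h) \, t_{\text{mem}},
\qquad
h = 1 - \frac{A_{\text{proposed}}(k)}{k}
\]
```

Under these assumptions every second load matches a stocked address, so for even $k$ the fraction $h$ is $1/2$ and the number of memory accesses is halved. For access patterns without this locality $h$ approaches 0 and the average load time approaches $t_{\text{mem}}$.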

[Effect of the invention]

As described above, according to the present invention, the data unit is provided with a stock means that stocks a plurality of pieces of data, and the data cache is provided with a transfer means that simultaneously transfers the data at address N requested by the data unit and the data at address (N+1) to the data unit. As a result, data with a high reference probability in the data cache, for example, can be transferred to the data unit in advance, and the data requested in the data unit can be compared with the data held in the data unit. When the two match, no memory access is needed, so the total number of memory accesses decreases. The time required for memory access is therefore shortened, and the data processing speed improves.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of the main part of an information processing device according to an embodiment of the present invention, and FIG. 2 is a block diagram showing the configuration of the main part of a conventional information processing device.

101 ... data unit, 101a ... stock means, 103 ... data cache, 103a ... transfer means, 201 ... integer unit, 202 ... floating point unit, 204 ... register file, 206 ... instruction fetch unit.

Claims

[Claims]

An information processing device comprising an integer unit that performs integer operations and logical operations, a floating point unit that performs floating point operations, a register file that stores information necessary for the above operations, a data unit that reads and writes data between a data cache that stores data and the register file, and an instruction fetch unit that performs instruction fetch, decoding, and data transfer to each of the above components in pipelined stages, characterized in that the data unit is provided with a stock means for stocking a plurality of pieces of data, and the data cache is provided with a transfer means that, where N is the memory address of the data requested by the data unit, simultaneously transfers the requested data at address N and the data at address (N+1) to the data unit.