JP3085267B2

JP3085267B2 - Memory access acceleration device

Info

Publication number: JP3085267B2
Application number: JP09326692A
Authority: JP
Inventors: 格岡野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-11-27
Filing date: 1997-11-27
Publication date: 2000-09-04
Anticipated expiration: 2017-11-27
Also published as: JPH11161489A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a method and apparatus for accelerating memory access, and in particular to a method and apparatus that improve system processing performance by narrowing the access-speed gap between main memory and cache memory.

[0002]

[Prior art] In recent years, as LSI integration density has risen and demand for higher speed has grown, the use of cache memory in information processing devices has become commonplace. The principle of a cache memory is to exploit the locality of access addresses: a copy of part of main memory is kept in a small, fast buffer memory, which makes main-memory access appear faster.

[0003] On the other hand, as CPU machine clocks have become faster, the access latency of main memory has come to look relatively large. As long as accesses hit in the cache there is no problem, but once a cache miss occurs the delay in obtaining the data is large, and this has been a barrier to performance improvement.

[0004] Furthermore, in the field of High Performance Computing (hereinafter, HPC), the data referenced are scattered, and programs handle large volumes of data spanning the whole of main memory. In such applications, data held in the cache are replaced by other data before the data read from main memory can be reused, so a cache memory, which presupposes locality of access, yields little benefit.

[0005] As described above, conventional cache memories all operate passively, starting only when a request is received; they can reduce the frequency of cache misses but cannot eliminate them. If the next request can be predicted, however, a request can be issued to main memory in advance, hiding the performance loss caused by cache misses. Moreover, once cache misses become less noticeable, the cache capacity, which has conventionally occupied a large area, can be reduced, which also helps raise the machine clock and shrink the LSI area.

[0006] Improvements and refinements based on this idea have been made repeatedly. For example, JP-A-6-51982 describes a technique that uses the processing unit, which is stalled during a cache miss, to predict the next access address from the cache-miss address.

[0007] The access-address prediction of that invention has the advantage that it can adapt flexibly to various systems by changing the handler code executed on a cache miss. It also has a drawback: unless the handler code finishes executing before the data return from main memory, processing time actually becomes longer.

[0008] In addition, in recent cache memory systems, techniques that adopt a non-blocking cache to reduce the processing unit's idle time caused by cache misses have become widespread, so the processing-unit idle time on which the technique described above depends cannot always be secured.

[0009] JP-A-4-52741 describes a technique for loading the data at an anticipated address into the cache memory in advance by means of an instruction. With that invention, the program can direct the loading of data at the addresses it will need, so a well-written program can hide cache misses and avoid useless cache loads. On the other hand, the program must be analyzed and explicit instructions to cache-load the anticipated addresses must be added, so tuning the program is laborious.

[0010] JP-A-3-292548 describes a technique that, on a cache miss, checks whether the consecutive cache block is registered in the cache and, if it is not, loads it into the cache in advance. That invention has the advantage of a very simple structure that is easy to introduce into existing systems. However, there is no guarantee that the block following the missed block will be used, and if the cache block that was evicted is subsequently accessed, performance may actually degrade.

[0011]

[Problems to be solved by the invention] The conventional memory access schemes described above share the approach of reading anticipated data from main memory in advance, but their implementations present many obstacles to adoption: they require programs to be rewritten, they have significant side effects, and so on. In addition, they require some modification to the processor as a whole, including the cache memory.

[0012] In the HPC field in recent years, it has become more common to build systems around existing microprocessors than to develop new systems starting from the processor. This is because microprocessor performance is no longer inferior to that of dedicated processors and because their cost-performance is excellent, but it also means that tuning the hardware for HPC has become difficult.

[0013] With the above points in mind, the object of the present invention is to provide a memory access acceleration method and apparatus that requires no program rewriting, has few side effects, and can also be applied to existing microprocessors.

[0014]

[0015]

[0016]

[Means for solving the problems] The memory access acceleration device of the present invention comprises: a first buffer that holds an address history of past memory accesses; means for calculating, at each memory access, the distance between its address and some or all of the address history held in the first buffer; a second buffer that holds, for an address stored in the first buffer, its distances from one or more earlier addresses together with match counts; means for storing, when the distance between the address of a new memory access and one of the past memory-access addresses stored in the first buffer matches at least one of the distances held in the second buffer for past memory accesses, the match count corresponding to that combination of accesses; and means for issuing successive accesses at that distance when the match count reaches a predetermined number.

[0017] Further, the memory access acceleration device of the present invention includes means for stopping the successive accesses when the data fetched by them are not used.

[0018] Further, the memory access acceleration device of the present invention is configured so that the data fetched by the successive accesses are stored in a temporary storage buffer rather than directly in the cache memory.

[0019] Further, the memory access acceleration device of the present invention includes means for ignoring any exception that occurs during the successive accesses.

[0020] In other words, the present invention provides means for reading predicted data into a buffer in advance when data in main memory are accessed at equal intervals. Vector-like processing that accesses main-memory data at equal intervals appears frequently in scientific computation. However, equally spaced data accesses, to say nothing of operand accesses in general, are not issued back to back, so the invention extracts with high accuracy, from among various kinds of noise, the regularity of addresses that are issued repeatedly at the same interval.

[0021]

[Embodiments of the invention] The present invention is described below with reference to the drawings.

[0022] FIG. 1 is a block diagram showing the principle of the present invention. In the figure, the memory access acceleration device examines the regularity of access addresses over the past four memory accesses. Registers D9, E9, F9, and G9 correspond to the first buffer of claim 3 and store past memory access addresses. In this example, registers D9, E9, F9, and G9 are strobed at every memory access, so register D9 always holds the address of the immediately preceding access and register E9 the address of the access before that.

[0023] When a new memory access occurs, subtractors D, E, F, and G calculate the difference (distance) between its address and the contents of registers D9, E9, F9, and G9, that is, the past memory access addresses. The results are stored in registers D0, D1, D2, and D3, which correspond to part of the second buffer of claim 3, and are also compared with the distances calculated at past memory accesses.
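The history registers and subtractors described above can be sketched in software as follows. This is only an illustration under stated assumptions: the names `record_access` and `HISTORY_DEPTH` are hypothetical, and a `deque` stands in for the strobed registers D9, E9, F9, and G9.

```python
from collections import deque

HISTORY_DEPTH = 4  # one slot each for registers D9, E9, F9, G9


def record_access(history, addr):
    """Return the distances from addr to each stored address, then shift addr in."""
    # One subtraction per history register (the role of subtractors D, E, F, G).
    distances = [addr - past for past in history]
    # Strobe: the new address takes the D9 position and the oldest one drops out.
    history.appendleft(addr)
    return distances


history = deque(maxlen=HISTORY_DEPTH)
for a in (0, 8, 200, 100):
    record_access(history, a)
# Distances from a new access at address 0 to the stored 100, 200, 8, 0.
print(record_access(history, 0))
```

The newest address sits at index 0, so the list of distances is ordered from the most recent past access to the oldest.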

[0024] For example, the distance between register D9 and the current memory access address is compared with the distances stored in registers D0, D1, D2, and D3, which were calculated at the memory access before last. If any of them matches, then the current and previous accesses, and the previous access and the one before it, are equally spaced in memory, and it can be predicted that the next access is also likely to be at the same spacing.

[0025] When a distance match is detected, the count in the match-count buffer is incremented by one and stored in the corresponding buffer. Arithmetic units 0, 1, 2, and 3 shown in FIG. 2 implement this function.

[0026] FIG. 3 is an explanatory diagram showing the operation of arithmetic unit 0. In the figure, the comparison results of comparators C0, C1, C2, and C3 are input to arithmetic unit 0. When the memory access address distances do not match, the comparator outputs are '0'. As a result, arithmetic unit 0 outputs the value '1', which is set in register D4. This means that, as far as the buffer records show, no access has previously been made at the same address distance.

[0027] When the memory access address distances match, the comparator output becomes '1'. For example, when comparator C0 detects a match, the match count held in register D4 is incremented by one and the result is set in register D4. The original value of register D4 is copied to register E4.
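The counting rule of the arithmetic unit can be illustrated with a small function. The text confirms only two cases: no comparator fires, giving the value 1, and comparator C0 fires, giving the stored count plus one. The pairing of every comparator output with its own stored count, the use of the largest count when several comparators fire, and the name `next_match_count` are assumptions made for this sketch.

```python
def next_match_count(comparator_outputs, stored_counts):
    """Return the count to latch: 1 if no comparator fired, else a stored count + 1."""
    # Collect the stored counts whose comparator reported a distance match.
    matched = [count for hit, count in zip(comparator_outputs, stored_counts) if hit]
    if not matched:
        # No earlier access at this distance is on record.
        return 1
    # Assumption: when several comparators fire, continue the longest run.
    return max(matched) + 1


print(next_match_count([0, 0, 0, 0], [2, 1, 1, 1]))
print(next_match_count([1, 0, 0, 0], [2, 1, 1, 1]))
```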

[0028] When this match count exceeds a predetermined number, a circuit not shown in the figures issues successive memory access requests.

[0029] At the same time as the latest memory access address distances are stored, the contents of registers D0, D1, D2, and D3 are copied to registers E0, E1, E2, and E3. Simultaneously, the values of registers E0, E1, E2, and E3 are copied to registers F0, F1, F2, and F3. The same applies to registers F0, F1, F2, F3, G0, G1, G2, and G3.

[0030] Through the above operation, registers D0, D1, D2, and D3 always hold the memory access address distances corresponding to register D9. The same applies to registers E0, E1, E2, E3, F0, F1, F2, F3, G0, G1, G2, and G3.

[0031] FIG. 4(a) is an example of a simple program that accumulates the contents of a two-dimensional array into a one-dimensional array. The program consists of a doubly nested DO loop; the inner loop accumulates B(J, I) for J from 1 to 10 and assigns the result to A(I).

[0032] Translated into assembly language, this program becomes FIG. 4(b). The innermost loop runs from LABEL2 to END, and within it load instructions from main memory to registers appear four times, at expressions (1) through (4). For expressions (1), (2), and (4), the main-memory address does not change within the innermost loop, whereas the address of expression (3) changes on every iteration.

[0033] For a two-dimensional array, addresses in main memory are generally laid out as in FIG. 5. That is, the address following B(1,1) is B(1,2), and if one array element is 8 bytes, B(2,1) is located 10 x 8 bytes from B(1,1). In the program of FIG. 4, array B(I,J) is accessed in the order of the black positions in FIG. 5, so each pass around the loop accesses an address 80 bytes further on.
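The 80-byte spacing follows directly from the layout just described. A short sketch of the address arithmetic, assuming 1-based indices and a row of 10 elements as in the text; the function name `element_address` is hypothetical, and the base address 200 is the start address the description assumes for array B.

```python
ELEMENT_BYTES = 8   # one array element is 8 bytes
ROW_LENGTH = 10     # ten elements between B(1,1) and B(2,1)


def element_address(base, first, second):
    """Byte address of B(first, second); the second index varies fastest."""
    return base + ((first - 1) * ROW_LENGTH + (second - 1)) * ELEMENT_BYTES


# Stepping the first index by one moves 80 bytes; stepping the second moves 8.
for i in (1, 2, 3):
    print(element_address(200, i, 1))
```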

[0034] FIG. 6 shows how the register values of FIGS. 1 and 2 change on each pass around the innermost loop. For the main-memory addresses at which the variables are stored, assume that I is at address 0, J is at address 8, A(I) starts at address 100, and B(J, I) starts at address 200.

[0035] On the first pass of the loop (see FIG. 6(a)), no history is recorded in the buffer, so no special action is taken. On the second pass (see FIG. 6(b)), the history of the previous pass remains, so the behavior differs slightly from the first pass. Following the address of B(1,1) (address 200), the address of A(1) (address 100) and the address of I (address 0) were accessed in steps of 100, so it is recorded that there was a third consecutive access at distance -100. Thus, even when the accesses are not truly regular, a record remains in the buffer as equally spaced accesses whenever their address spacing happens to be equal.

[0036] On the third pass of the loop (see FIG. 6(c)), B(J, I), which is accessed at equal intervals on every pass, is recorded in the registers as an equally spaced access. Regular accesses are made at intervals of 80: address 200 on the first pass, address 280 on the second, and address 360 on the third. On the fourth pass (see FIG. 6(d)), the count of detected matches shows 3, and it is predicted that the access will be repeated at the same interval next time.
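The walk-through above can be reproduced with a small software model. This is a sketch of one possible reading of the scheme, not the circuit of FIGS. 1 and 2: each history entry keeps its address, its distances to the entries before it, and one count per distance. The class name `StrideDetector`, the threshold of 3, the rule of skipping zero distances, and the load order I, J, B, A within a pass are all assumptions of this sketch.

```python
from collections import deque


class StrideDetector:
    """Software model of the address history, distance and match-count buffers."""

    def __init__(self, depth=4, threshold=3):
        self.threshold = threshold
        # Newest entry first; each entry is (address, distances, counts).
        self.history = deque(maxlen=depth)

    def access(self, addr):
        """Record one access and return a predicted next address, or None."""
        distances, counts, prediction = [], [], None
        for old_addr, old_distances, old_counts in self.history:
            d = addr - old_addr
            # Counts of earlier accesses that reached old_addr at the same distance.
            runs = [c for od, c in zip(old_distances, old_counts) if od == d]
            count = max(runs) + 1 if runs else 1
            distances.append(d)
            counts.append(count)
            # Assumption: a zero distance (same address again) is not prefetched.
            if d != 0 and count >= self.threshold and prediction is None:
                prediction = addr + d
        self.history.appendleft((addr, distances, counts))
        return prediction


def loop_trace(passes):
    """Addresses loaded per pass: I at 0, J at 8, B at 200 + 80n, A at 100."""
    trace = []
    for n in range(passes):
        trace += [0, 8, 200 + 80 * n, 100]
    return trace


detector = StrideDetector()
predictions = [detector.access(a) for a in loop_trace(4)]
print(predictions)
```

Under these assumptions the only prediction in four passes is issued at the access to address 440 on the fourth pass, where the count for distance 80 reaches 3 and the model predicts address 520.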

[0037] As far as the addresses output by the processor are observed, the equally spaced requests occur only once in every four requests, so a simple circuit has difficulty discerning the regularity of the addresses of B(I, J). With the memory access acceleration scheme of the present invention, even in such a case the regularity of the addresses to be accessed next can be determined from the regularity of the access addresses.

[0038] FIG. 7 is a block diagram showing an example of an information processing device that employs the memory access acceleration scheme of the present invention. In the figure, processor 1 incorporates the control function for cache memory 2, so it is assumed that external circuitry cannot rewrite the cache contents. For this reason, data read ahead from main memory 4 on the basis of access regularity are stored in a temporary storage device called buffer memory 3.

[0039] When a miss occurs in cache memory 2, buffer memory 3 is searched for the corresponding data; if the data are present in the buffer memory, they are transferred from buffer memory 3 to the cache memory without reading main memory 4. Reading data from main memory generally takes a considerably long time, so this scheme of reading data into the buffer memory in advance yields a speedup.
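The miss path described here amounts to a lookup in the buffer memory before falling back to main memory. A minimal sketch, in which plain dictionaries stand in for buffer memory 3 and main memory 4 and the function name `read_on_cache_miss` is hypothetical:

```python
def read_on_cache_miss(addr, buffer_memory, main_memory):
    """Return (data, source) for a cache miss at addr."""
    if addr in buffer_memory:
        # Read-ahead data are available: supply them without a main-memory read.
        return buffer_memory[addr], "buffer"
    # Otherwise the slow path to main memory is taken.
    return main_memory[addr], "main"


main_memory = {200: "B1", 280: "B2", 360: "B3"}
buffer_memory = {360: "B3"}  # filled earlier by a read-ahead request
print(read_on_cache_miss(360, buffer_memory, main_memory))
print(read_on_cache_miss(280, buffer_memory, main_memory))
```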

[0040] The bus between processor 1 and cache memory 2 is monitored by address monitoring circuit 5, which corresponds to the circuit of FIG. 1. When regularity is detected in the monitored addresses, request generation circuit 6 generates read-ahead requests to main memory 4.

[0041] If data read ahead are not used for a certain period, the prediction of access regularity is judged to have failed and request stop circuit 7 suspends request generation. While data read from main memory on the basis of the prediction are actually being used by the processor, request generation continues.
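The stop condition can be modelled as an idle counter. The description does not say how the "certain period" is measured, so counting monitoring steps, the default limit of 8, and the name `RequestThrottle` are assumptions of this sketch. Restarting after a stop is left to a fresh detection of regularity and is not modelled.

```python
class RequestThrottle:
    """Stops read-ahead requests once fetched data go unused for too long."""

    def __init__(self, idle_limit=8):
        self.idle_limit = idle_limit  # assumed length of the "certain period"
        self.idle = 0
        self.enabled = True

    def tick(self, prefetched_data_used):
        """Advance one step and return whether request generation continues."""
        if prefetched_data_used:
            # The prediction is paying off, so the idle period starts over.
            self.idle = 0
        else:
            self.idle += 1
            if self.idle >= self.idle_limit:
                # Prediction judged wrong: suspend request generation.
                self.enabled = False
        return self.enabled


throttle = RequestThrottle(idle_limit=3)
print([throttle.tick(used) for used in (False, True, False, False, False)])
```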

[0042] A request issued to main memory on the basis of predicted access regularity may also raise an exception. For a storage access exception that accompanies a request from the processor, the processor must be interrupted so that the exception can be handled, but a request issued to main memory on the basis of predicted access regularity must not raise an exception. The circuit that suppresses storage access exceptions according to the request source in this way is memory exception invalidation circuit 8.
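The rule applied by the exception invalidation circuit depends only on who issued the request. A minimal sketch, with the hypothetical names `Source` and `must_raise`:

```python
from enum import Enum


class Source(Enum):
    PROCESSOR = "processor"  # demand request issued by the processor
    PREFETCH = "prefetch"    # request generated from predicted regularity


def must_raise(source, access_faulted):
    """True if a storage access exception has to be delivered to the processor."""
    # Faults on predicted requests are dropped; only demand requests interrupt.
    return access_faulted and source is Source.PROCESSOR


print(must_raise(Source.PROCESSOR, True))
print(must_raise(Source.PREFETCH, True))
```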

[0043]

[Effects of the invention] According to the present invention, memory access addresses can be predicted with high accuracy, so even a program that handles data too large to fit in cache memory can draw out the full performance of the MPU without concern for a falling cache hit rate.

[0044] That is, programs that handle large-scale data often access the data in an array successively according to some rule (distance). The access interval of such successive accesses could easily be made known by rewriting the software, but when porting a commercially distributed application that runs on an off-the-shelf MPU, applying machine-specific tuning is not easy. Achieving higher performance than other vendors' machines that use the same MPU, without any software tuning, can be a powerful means of differentiation. The present invention can be expected to be highly effective in such systems built on commercial MPUs, where differentiation is difficult.

[Brief description of the drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention.

FIG. 2 is a block diagram showing one embodiment of the present invention (continued).

FIG. 3 is an explanatory diagram showing the logic of an arithmetic unit.

FIG. 4 is an explanatory diagram showing an example of an application program.

FIG. 5 is an explanatory diagram showing the processing of a two-dimensional array.

FIG. 6 is an explanatory diagram showing an operation example of the present invention.

FIG. 7 is a block diagram showing an example of an information processing device to which the present invention is applied.

[Explanation of symbols]

1 Processor
2 Cache memory
3 Buffer memory
4 Main memory
5 Address monitoring circuit
6 Request generation circuit
7 Request stop circuit
8 Memory exception invalidation circuit
C0 to C9, CA to CF Comparators
D0 to D9, E0 to E9, F0 to F9, G0 to G9 Registers

Continuation of front page

(56) References: JP-A-3-63852 (JP, A); JP-A-6-314241 (JP, A); JP-A-53-134335 (JP, A); JP-A-4-369061 (JP, A); JP-A-6-51982 (JP, A); JP-A-8-161226 (JP, A); JP-A-7-64862 (JP, A); JP-A-6-342403 (JP, A); JP-A-5-181748 (JP, A); JP-A-8-212054 (JP, A); JP-A-6-28180 (JP, A); JP-A-3-102443 (JP, A); JP-A-2-18645 (JP, A); JP-T-7-506921 (JP, A)

(58) Fields searched (Int. Cl.7, DB name): G06F 12/08 - 12/12; G06F 9/38; G06F 17/16; G06T 1/60

Claims

(57) [Claims]

1. A memory access acceleration device comprising: a first buffer that holds an address history of past memory accesses; means for calculating, at each memory access, the distance between its address and some or all of the address history held in the first buffer; a second buffer that holds, for an address stored in the first buffer, its distances from one or more earlier addresses together with match counts; means for storing, when the distance between the address of a new memory access and one of the past memory-access addresses stored in the first buffer matches at least one of the distances held in the second buffer for past memory accesses, the match count corresponding to that combination of accesses; and means for issuing successive accesses at that distance when the match count reaches a predetermined number.

2. The memory access acceleration device according to claim 1, further comprising means for stopping the successive accesses when the data fetched by them are not used.

3. The memory access acceleration device according to claim 1, wherein the data fetched by the successive accesses are stored in a temporary storage buffer and are not stored directly in a cache memory.

4. The memory access acceleration device according to claim 1, further comprising means for ignoring an exception even if the exception occurs during the successive accesses.