JPH049346B2

Info

Publication number: JPH049346B2
Application number: JP60155131A
Authority: JP
Priority date: 1985-07-16
Filing date: 1985-07-16
Publication date: 1992-02-19
Also published as: JPS6217851A

Description

[Detailed description of the invention]

[Field of Application of the Invention]

The present invention relates to a memory management unit for an information processing system, and in particular to a memory management unit suited to systems that require high integration and high-speed memory access.

[Background of the Invention]

Conventional highly integrated information processing systems using CMOS or similar technology have commonly had the configuration shown in FIG. 2. In that figure, the arithmetic processing unit 1 decodes and executes programs and is implemented on a single chip using CMOS logic. To support a virtual storage space, the address translation unit 2 dynamically maps the logical addresses issued by the arithmetic processing unit 1 to physical addresses and sends them to the system bus 3. The main memory 4 stores programs and data, and the I/O adapter 5 supports communication between the system bus 3 and the I/O bus 6. The file control processor 7 controls, for example, the disk drive 8 and performs direct memory access (hereinafter DMA) to the main memory 4.

In this system configuration, memory accesses from the arithmetic processing unit 1 for instruction fetches and operand fetches are translated from logical to physical addresses by the address translation unit 2 and then access the main memory 4 directly.
In the past, there was no large gap between the processing speed of the arithmetic processing unit and the access speed of the dynamic RAM constituting the main memory, so this configuration could draw out the full performance of the arithmetic processing unit. However, as CMOS process technology has advanced, device integration has increased, and arithmetic processing units are becoming faster through greater functionality and shorter device delay times. Dynamic RAM used for main memory, by contrast, has been developed with the emphasis on density while its access speed has stayed roughly constant, so the gap between the processing speed of the arithmetic unit and the access speed of the main memory is widening. To draw out the full performance of a high-speed arithmetic processing unit, a cache device that acts as a buffer between the two and holds part of the main memory in a small, fast memory has come to be required. Furthermore, in the field of artificial intelligence, which has recently attracted attention, there is demand for hardware that executes at high speed programming languages capable of handling predicate logic, such as Prolog. Because these languages are based on stack operations, they access memory more frequently than conventional languages. Here too, realizing a cache memory that effectively speeds up memory access is becoming an important issue.

The configuration shown in FIG. 3 is a common way of implementing a conventional cache memory. The address translation buffer 84 translates the logical address 89 into the physical address 90, after which the directory 85 and the cache data section 86 are accessed. The comparator 88 compares the directory output with the physical address 90, and if they match, the selector 87 selects the corresponding data. Applying this configuration to a highly integrated information processing system based on VLSI technology raises the following problems.

(1) To improve the hit rate, the cache memory uses a set-associative organization, usually with about 2 to 4 sets. The directory and the cache data section therefore consist of 2 to 4 planes that must be accessed in parallel. The word length (number of words) of the memory used for the cache data section is then, for example, 8K words when a cache memory with a capacity of 64 KB and a data width of 4 B is implemented as 2-set associative. With current high-integration technology, however, a memory with a single larger address space, for example 16K words (addresses 0 to 15999 in decimal), achieves a higher degree of integration than one of 8K words. Such long memories cannot be fully exploited, which increases the amount of hardware.

(2) On a read request, the directory and the cache data section can be accessed in parallel. On a write request, however, the write to the cache data section must be triggered by the result of comparing the directory output with the physical address, so the two cannot proceed in parallel and the access is slower.

FIG. 4 shows a cache configuration adopted for high-speed access in mainframes and high-performance minicomputers. In this configuration, while the address translation buffer 92 performs address translation, the directory 93 and the cache data section 94 are accessed with the offset 97 of the logical address 99, which does not depend on address translation. The physical address 98 obtained by translation is compared with the directory output by the comparator 95, and on a hit the selector 96 selects the corresponding data. Access is fast in this configuration, but because the directory and the cache data section are accessed with the offset alone, the number of sets becomes quite large. For example, with a 4 KB page size, a 64 KB cache requires 16 sets. When applied to a VLSI system, the word length of one plane of the cache data section is therefore even shorter than in the previous implementation, and the planes are accessed in parallel. This configuration is even less suited to making effective use of memories that are long in the word direction, and the amount of hardware grows.

As an actual known example of memory management using VLSI technology, the "Preliminary Product Specification" for a CPU made by Zilog proposes a device in which a small-capacity cache memory is provided on the chip of the arithmetic processing unit. As shown in FIG. 5, this is a device 11 in which the arithmetic processing unit 1, the address translation unit 2, the directory 9, and the cache data section 10 are mounted on one chip. This implementation, however, has the following problems.

(1) The amount of hardware that can be placed on the chip is limited, which limits the cache capacity. The cache hit rate is therefore low, and the performance of the high-speed arithmetic units expected in the future cannot be fully drawn out.

(2) For memory accesses by other bus masters, such as DMA transfers from the file control processor 7, overhead is needed to keep the contents of the main memory 4 and the cache memory 10 consistent.

As another known example, the "Advance Information" for the "Basic Memory Access Controller" from Signetics proposes a memory management unit in which the address translation unit and the cache directory are integrated on one chip. As shown in FIG. 6, this is a device 12 in which the address translation unit 2 and the tag comparison unit 9 of the cache memory are integrated on one chip. The problems in this case are as follows.

(1) To speed up access, the cache is searched with logical addresses. When multiple virtual address spaces are supported, a feature of UNIX, which is becoming the mainstream operating system, the cache must be cleared at every task switch, which lowers the hit rate.

(2) As in the preceding known example, overhead is needed to keep the main memory and the cache memory consistent for DMA transfers.

[Object of the Invention]

A first object of the present invention is to provide a memory management unit that improves the throughput of cache access.

A second object of the present invention is to provide a memory management unit that realizes a large-capacity cache memory with a small amount of hardware.

A third object of the present invention is to provide a memory management unit that makes it easy to guarantee consistency between the cache memory in the arithmetic processing unit and the main memory.

[Summary of the Invention]

The first and second objects are achieved by providing the memory management unit with: means for executing address translation and cache tag comparison in parallel by searching the directory section with the offset portion of the logical address, which does not depend on translation; means for generating the address of the cache data section by combining information that encodes the cache tag comparison result with the offset portion of the logical address; means for holding the address of the cache data section; means for reading the cache data section using that address; and means for pipelining the cache tag comparison and the reading of the cache data section.

In this memory management unit, the cache directory is searched with the offset portion of the logical address, which does not depend on address translation, so address translation and cache tag comparison can be executed in parallel. Because the cache data section is read with an address formed by combining the encoded cache tag comparison result with the offset portion of the logical address, a set-associative cache memory can be realized without dividing the cache data section into multiple sets and reading them in parallel. In addition, the means for holding the address of the cache data section allows the stage that executes address translation and cache tag comparison in parallel and the stage that accesses the cache data section to be pipelined.

The third object is achieved by providing the memory management unit with: a receive latch register that holds a memory access address from an input/output device such as a DMA device; a first selector that selects between the offset portion of the receive latch register and the offset portion of the logical address from the arithmetic processing unit; a second selector that selects between the physical page number of the receive latch register and the physical page number generated by the address translation buffer; means for searching the cache directory with the selected access address of the input/output device; means for generating the address of the cache data section by combining information that encodes the cache tag comparison result with the offset portion of the access address of the input/output device; and means for reading the data of the cache data section using that address.

In a memory management unit that achieves this third object, the means for selecting between the access address from the arithmetic processing unit and the access address from the input/output device, together with the means for reading the cache data section by combining the encoded cache tag comparison result with the offset portion of the input/output device's access address, make it possible to read the cache memory directly from the input/output device. All accesses from input/output devices can therefore go through the cache memory, which makes it easy to guarantee consistency between the cache memory and the main memory.

[Embodiments of the Invention]

Embodiments of the present invention are described below with reference to the drawings. FIG. 7 shows an example configuration of a system using the unit of the present invention.
The arithmetic processing unit 1 is integrated on one chip using CMOS logic. On a memory access it sends the memory management unit 13 a request signal, a 32-bit logical address, and a 3-bit address space signal, that is, a signal that distinguishes the main memory physical address space from the I/O bus physical address space. The main memory 4 has a 27-bit address space with a capacity of 128 MB and is built from dynamic RAM.
The I/O adapter 5 contains an address translation table and, in response to DMA transfer requests from the file control processor 7 and similar devices, translates logical addresses on the I/O bus 6 into main memory physical addresses. This allows bus masters on the I/O bus 6 to use the main memory dynamically.

The subject of the present invention is the memory management unit 13, whose internal configuration is shown in FIG. 8. It is connected to the arithmetic unit 1 by the address line 17 and the data line 18, to the main memory 4 by the address line 21 and the data line 22, and to the I/O adapter 5 by the address line 19 and the data line 20. The address generation chip 14 is the characteristic feature of the present invention, and an embodiment of it is described later with reference to FIG. 1. It contains an address translation unit, the cache directory, and a controller. Functionally, it receives a logical address from the arithmetic unit or a physical address from a DMA device and generates a cache address and a physical address in one cycle. It is implemented on a single chip using CMOS microfabrication technology. The cache data section 15 is the data storage section of the cache memory; it has a capacity of 64 KB and is implemented with high-speed static RAM. The interface section 16 is implemented with a gate array or the like, contains write buffers to the main memory and the I/O adapter, among other things, and controls the flow of data. The memory management unit 13 operates as follows.

(1) Processing of a read request from the arithmetic unit

The processing flow is shown in FIG. 9. The read request signal, the logical address 17, and the address space signal from the arithmetic unit 1 are received by the address generation chip 14, where the address translation unit first translates the logical address into a physical address. The cache tags are then compared using the resulting physical address and the address space signal. If the target data is in the cache, the cache address 23 is generated, the data is read from the cache data section 15, and it is transferred to the arithmetic unit 1 over the data line 18.
If the target data is not in the cache, the requested space is determined from the address space signal. For the main memory space the physical address is sent to the address line 21, and for the I/O space it is sent to the address line 19, requesting a data transfer to the memory management unit 13. Data from the memory space is written to the cache data section 15 in 32-byte units over the data lines 22 and 24, while data from the I/O space is written to the buffer in the interface section 16. The requested data is then transferred to the arithmetic unit over the data line 18.

(2) Processing of a write request from the arithmetic unit

The processing flow is shown in FIG. 10. The write request signal, the logical address 17, and the address space signal from the arithmetic unit 1 are received by the address generation chip 14, and the write data is received by the interface section 16 over the data line 18. The address generation chip 14 first translates the logical address into a physical address. The target space is then identified from the address space signal, and the physical address and the write data are written to the write buffer of the corresponding space. The cache tags are compared at the same time, and if the target data is in the cache, it is written to the cache data section 15 over the data line 24. If it is not in the cache, no operation is performed on the cache data section 15.

(3) Processing of an access from a DMA device

When a DMA transfer request arrives from the file control processor 7 or a similar device, the I/O adapter 5 first translates the logical address in the I/O bus 6 space into a physical address in the main memory space. The result is transferred to the address generation chip 14 over the address line 19, and the cache memory is searched.

The key to performing these memory access operations at high speed with a small amount of hardware is the address generation chip 14, an embodiment of which is shown in FIG. 1. In that figure, the inputs are the physical address 19 from the I/O space and the logical address 17 from the arithmetic processing unit, and the outputs are the physical address 21 to the main memory or the I/O space and the cache address 23. The latch registers 25, 26, 27, 28, 100, and 101 are transparent (through) latches used as pipeline registers.
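
The read and write flows described under (1) and (2) above can be sketched as a small behavioural model. This is only an illustration under stated assumptions, not the circuit of the embodiment: the class `MmuModel`, its dictionaries, the `MAIN`/`IO` encodings, the 4 KB page size used in `translate`, and the `drain` method are all hypothetical, and the tag comparison is reduced to a dictionary lookup on the physical line address.

```python
LINE_BYTES = 32          # refill unit stated in the text
PAGE_BYTES = 4096        # assumed page size (12-bit offset)
MAIN, IO = 0, 1          # hypothetical encodings of the address space signal


class MmuModel:
    """Toy behavioural model of the read and write flows (FIGS. 9 and 10)."""

    def __init__(self, page_table, main_memory, io_space):
        self.page_table = page_table      # logical page -> physical page
        self.main_memory = main_memory    # physical address -> byte value
        self.io_space = io_space          # physical address -> byte value
        self.cache = {}                   # physical line number -> 32 byte values
        self.write_buffer = []            # (space, physical address, value)

    def translate(self, logical):
        page, offset = divmod(logical, PAGE_BYTES)
        return self.page_table[page] * PAGE_BYTES + offset

    def read(self, logical, space=MAIN):
        phys = self.translate(logical)
        if space == IO:
            # I/O data passes through the interface buffer and is not cached
            return self.io_space.get(phys, 0)
        line, byte = divmod(phys, LINE_BYTES)
        if line not in self.cache:
            # Miss: fetch a whole 32-byte line from main memory into the cache
            base = line * LINE_BYTES
            self.cache[line] = [self.main_memory.get(base + i, 0)
                                for i in range(LINE_BYTES)]
        return self.cache[line][byte]

    def write(self, logical, value, space=MAIN):
        phys = self.translate(logical)
        # The address and data always go to the write buffer of the target space
        self.write_buffer.append((space, phys, value))
        if space == MAIN:
            line, byte = divmod(phys, LINE_BYTES)
            if line in self.cache:
                self.cache[line][byte] = value   # hit: update the cached copy
            # Miss: no operation on the cache data section

    def drain(self):
        """Flush buffered writes to main memory or the I/O space."""
        for space, phys, value in self.write_buffer:
            (self.main_memory if space == MAIN else self.io_space)[phys] = value
        self.write_buffer.clear()
```

In this sketch a write miss leaves the cache untouched and only the buffered write reaches memory, which mirrors the "no operation" case of flow (2).
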
The logical address tag holding unit 31, the physical address tag holding unit 32, the comparator 33, and the selector 34 constitute an address translation buffer (TLB) that speeds up address translation. As shown in Table 1, it is two-way set associative and holds 128 entries of logical address and physical address pairs per set. The replacement algorithm is LRU: in each column, the page that was not accessed most recently is replaced.
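
The two-way organization and per-column replacement rule described above can be sketched as follows. This is a minimal model, assuming 4 KB pages and assuming that the column is selected by the low 7 bits of the logical page number; the text does not state how columns are indexed, and the class `Tlb` and its method names are hypothetical.

```python
PAGE_BITS = 12      # assumed 4 KB pages
COLUMNS = 128       # entries per set, as stated in the text


class Tlb:
    """Two-way set-associative TLB with one 'last used' marker per column."""

    def __init__(self):
        # ways[w][column] holds (logical tag, physical page) or None
        self.ways = [[None] * COLUMNS for _ in range(2)]
        self.last_used = [0] * COLUMNS   # way accessed most recently per column

    @staticmethod
    def _split(logical):
        page = logical >> PAGE_BITS
        return page % COLUMNS, page // COLUMNS   # (column, tag)

    def lookup(self, logical):
        """Return the physical address on a hit, or None on a TLB miss."""
        column, tag = self._split(logical)
        for w in range(2):
            entry = self.ways[w][column]
            if entry is not None and entry[0] == tag:
                self.last_used[column] = w
                offset = logical & ((1 << PAGE_BITS) - 1)
                return (entry[1] << PAGE_BITS) | offset
        return None

    def insert(self, logical, physical_page):
        """Replace the way that was NOT accessed most recently in this column."""
        column, tag = self._split(logical)
        victim = 1 - self.last_used[column]
        self.ways[victim][column] = (tag, physical_page)
        self.last_used[column] = victim
```

With only two ways, a single marker per column is enough to identify the least recently used entry, which is why the rule in the text can be stated as "replace the one that was not just accessed".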

[Table]

Selectors 29 and 30 select between a request from the arithmetic unit 1 and a request from a DMA device. The directory 35 holds, as its contents, the upper 15 bits of the physical address corresponding to each piece of data in the cache, and is accessed with the 12-bit offset portion, which does not depend on address translation. The cache data capacity per set is therefore limited to 4 KB, so to reach the target cache capacity of 64 KB, 16 sets are accessed in parallel. The tag comparison unit 36 compares the upper 15 bits of the physical address obtained by translating the access address with the corresponding tag of each set (the upper 15 bits of the physical address of the data held in the cache) and checks for a match. The encoder 37 encodes the information obtained from the tag comparison unit 36 to generate the upper 4 bits of the cache address. The cache memory, corresponding to the directory 35, has the characteristics shown in Table 2.
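
The selection, tag comparison, and encoding steps described above can be sketched as follows. This is a hedged illustration rather than the circuit of FIG. 1. It assumes 32-byte lines (taken from the 32-byte refill unit mentioned earlier), so that each of the 16 sets, which would now usually be called ways, holds 128 tags; the names `Directory`, `fill`, and `select_request` are hypothetical.

```python
OFFSET_BITS = 12                 # page offset, independent of translation
TAG_BITS = 15                    # upper bits of the 27-bit physical address
WAYS = 16                        # the "sets" of the text: 16 x 4 KB = 64 KB
LINE_BITS = 5                    # assumed 32-byte lines
LINES_PER_WAY = 1 << (OFFSET_BITS - LINE_BITS)   # 128 tags per way


class Directory:
    """Tag store searched with the offset while translation runs in parallel."""

    def __init__(self):
        self.tags = [[None] * LINES_PER_WAY for _ in range(WAYS)]

    def fill(self, way, offset, physical_page):
        self.tags[way][offset >> LINE_BITS] = physical_page

    def lookup(self, offset, physical_page):
        """Return the 16-bit cache address on a hit, or None on a miss."""
        index = offset >> LINE_BITS
        # One comparison per way; hardware would evaluate these in parallel
        match = [self.tags[w][index] == physical_page for w in range(WAYS)]
        if True not in match:
            return None
        way = match.index(True)                 # encoder: match lines -> 4 bits
        return (way << OFFSET_BITS) | offset    # upper 4 bits, then the offset


def select_request(dma_active, cpu_offset, tlb_page, dma_address):
    """Model of the two selectors: choose the CPU or the DMA request."""
    if dma_active:
        offset = dma_address & ((1 << OFFSET_BITS) - 1)
        return offset, dma_address >> OFFSET_BITS
    return cpu_offset, tlb_page
```

Because the hit information is folded into the upper address bits, the data section in this sketch is addressed as one 64 KB memory with a 16-bit address instead of 16 separately read planes, which is the hardware saving the text aims at.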

[Effect of the invention]

According to the first invention, the following effects (a) and (b) are obtained, and according to the second invention, the following effect (c) is obtained.

(a) Address translation and cache tag comparison can be executed in parallel, and the stage that executes them in parallel and the stage that accesses the cache data section can be pipelined. This greatly improves the throughput of cache access.

(b) Because the cache data section is read with an address formed by combining the encoded cache tag comparison result with the offset portion of the logical address, a set-associative cache memory with a high hit rate can be realized without dividing the cache data section into multiple sets and reading them in parallel. Even when the cache memory is large, the cache data section no longer has to be read in parallel, so a large-capacity cache memory can be realized with a small amount of hardware.

(c) The means for selecting between the access address from the arithmetic processing unit and the access address from the input/output device allows all accesses from input/output devices to go through the cache memory, which makes it easy to guarantee consistency between the cache memory and the main memory for accesses from input/output devices.
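
The throughput claim in effect (a) can be illustrated with a minimal two-stage schedule. This is a sketch under simplifying assumptions that the text does not state: every access hits, and each stage takes exactly one cycle. The function names are hypothetical.

```python
def pipelined_schedule(accesses):
    """Return (cycle, stage-1 item, stage-2 item) rows for a two-stage pipeline.

    Stage 1 stands for address translation plus tag comparison, stage 2 for
    the cache data section access. The variable `latch` plays the role of the
    register that holds the cache address between the two stages.
    """
    latch = None
    rows = []
    for cycle, item in enumerate(list(accesses) + [None]):  # extra cycle drains
        rows.append((cycle, item, latch))
        latch = item
    return rows


def unpipelined_cycles(accesses):
    """Cycles needed if each access finishes both stages before the next starts."""
    return 2 * len(list(accesses))
```

For three back-to-back accesses, this model needs four cycles when pipelined and six when not, so the gain approaches a factor of two as the access stream gets longer.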

[Brief explanation of drawings]

FIG. 1 is a diagram showing an embodiment of the address generation chip that characterizes the present invention. FIG. 2 is a configuration diagram of a conventional information processing system. FIG. 3 is a configuration diagram of a general cache memory. FIG. 4 is a configuration diagram of a cache memory capable of high-speed access, as used in mainframes and the like. FIG. 5 is a system configuration diagram using a CPU made by Zilog. FIG. 6 is a system configuration diagram using a memory management unit made by Signetics. FIG. 7 is a system configuration diagram using the unit of the present invention. FIG. 8 is an internal configuration diagram of a memory management unit using the address generation chip of FIG. 1. FIGS. 9 and 10 are flowcharts of the memory read and the memory write performed by the memory management unit of FIG. 8. FIG. 11 is an explanatory diagram of the logical address. FIG. 12 is an explanatory diagram of the pipeline configuration. FIGS. 13 and 14 are a state transition diagram of the first cycle controller and its configuration diagram. FIG. 15 is a state transition diagram of the second cycle controller. FIG. 16 is a time chart of the pipelined operation of memory accesses.

13: memory management unit; 14: address generation chip; 15: cache data section; 25, 26, 27, 28, 100, 101: latch registers; 29, 30: selectors; 31: logical address tag holding unit; 32: physical address tag holding unit; 33, 36: comparators; 34: selector; 35: directory; 37: encoder.

Claims

[Claims]

1. A memory management unit comprising: an address translation buffer that receives a logical address from an arithmetic processing unit, translates the logical page number forming part of the logical address into a physical page number, and generates a physical address; a directory section of a cache memory that is searched using the generated physical address; and a data section of the cache memory that holds part of the data of a main memory and is read using the generated physical address, the memory management unit being characterized by further comprising: means for executing address translation and cache tag comparison in parallel by searching the directory section with an offset portion of the logical address that does not depend on translation; means for generating an address of the cache data section by combining information that encodes the cache tag comparison result with the offset portion of the logical address; means for holding the address of the cache data section; means for reading the cache data section using the address of the cache data section; and means for pipelining the cache tag comparison and the reading of the cache data section.

2. A memory management unit comprising: an address translation buffer that receives a logical address from an arithmetic processing unit, translates the logical page number forming part of the logical address into a physical page number, and generates a physical address; a directory section of a cache memory that is searched using the generated physical address; and a data section of the cache memory that holds part of the data of a main memory and is read using the generated physical address, the memory management unit being characterized by further comprising: a receive latch register that holds a memory access address from an input/output device such as a DMA device; a first selector that selects between an offset portion of the receive latch register and the offset portion of the logical address from the arithmetic processing unit; a second selector that selects between a physical page number of the receive latch register and the physical page number generated by the address translation buffer; means for searching the cache directory with the selected access address of the input/output device; means for generating an address of the cache data section by combining information that encodes the cache tag comparison result with the offset portion of the access address of the input/output device; and means for reading the data of the cache data section using the address of the cache data section.