JPH08272681A

JPH08272681A - Microprocessor with cache reconfigurable at the instruction level

Info

Publication number: JPH08272681A
Application number: JP8017299A
Authority: JP
Inventors: Vasant Argade Pramod (ヴァサントアーゲイドプラモッド)
Original assignee: AT&T Corp
Current assignee: AT&T Corp
Priority date: 1995-02-03
Filing date: 1996-02-02
Publication date: 1996-10-18
Also published as: TW297111B; KR960032182A

Abstract

PROBLEM TO BE SOLVED: To provide a microprocessor having a cache that is reconfigurable at the instruction level. SOLUTION: The microprocessor contains a multiply-accumulate unit (MAU) 305 for performing high-speed signal processing operations. When a multiply-accumulate (MAC) instruction is executed, first and second cache ways 301 and 302 directly supply first and second operands (x and y) to the MAU 305. Multiplexers 310 and 311 are included that select data from either cache way 301 or 302 when a normal instruction is executed. A translation lookaside buffer is included whose page table entries contain an additional "reconfiguration" bit and a "way" bit that control how data are written to the cache. In this way, the microprocessor can use a conventional set associative cache to access multiple operands simultaneously.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a microprocessor having means for reconfiguring a set associative cache to provide increased bandwidth for operations.

[0002]

[Prior Art] Many conventional microprocessors have a multiported register file. This lets them supply two operands held in registers to the execution unit (EU) on every cycle. The register file sits on the same integrated circuit as the arithmetic logic unit (ALU) and is a very fast means of providing the desired data. For example, referring to FIG. 1, a typical prior-art microprocessor 100 includes an instruction register 101. The instruction register 101 supplies a first address (address 0) to a first register file 102 and a second address (address 1) to a second register file 103. Register files 102 and 103 illustratively have 32 entries of 32 bits each. The first register file 102 supplies the first operand to a first operand register 104, and the second register file 103 supplies the second operand to a second operand register 105. Registers 104 and 105 supply the first and second operands to the ALU 106. The ALU 106 may illustratively perform various arithmetic operations, including multiply-accumulate (MAC). The result is stored in a result register 107 and may be written back to the register file via line 108. In an alternative embodiment, a single dual-ported register file (not shown) is used in place of the two files 102, 103. In that case, its two read ports allow simultaneous access to any two entries in the register file.

[0003]

[Problem to Be Solved by the Invention] In many cases, however, two operands must be supplied from memory rather than from on-chip registers. One example is the multiply-accumulate instruction, which is one of the fundamental operations of signal processing. When available, the two memory operands are typically held in an on-chip data cache or, alternatively, in a cache external to the microprocessor chip. In either case, supplying two operands to the EU every cycle implies dual-porting the data cache. A typical instruction is MAC x, y, a0. Here MAC is shorthand for multiply-accumulate, and the instruction specifies the operation a0 = a0 + (x * y). Typically x and y belong to particular arrays in memory: x might be in a coefficient array and y in a data array.
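The following minimal Python sketch makes the operand demand concrete. It is not taken from the patent. It shows what a sequence of MAC x, y, a0 instructions computes, for example one output sample of an FIR filter. The function name `mac_loop` and the list-based arrays are illustrative assumptions. The point is that every step consumes one coefficient operand and one data operand from memory.

```python
def mac_loop(coeffs, data, a0=0):
    """Apply MAC x, y, a0 (a0 = a0 + x * y) over paired coefficient/data arrays.

    Each iteration consumes two memory operands: one coefficient x and one
    data sample y. Sustaining one MAC per cycle therefore needs two operand
    reads per cycle from wherever the arrays live.
    """
    if len(coeffs) != len(data):
        raise ValueError("coefficient and data arrays must have equal length")
    for x, y in zip(coeffs, data):
        a0 = a0 + (x * y)  # one MAC x, y, a0 instruction
    return a0


# One output sample of a 4-tap FIR filter: 1*5 + 2*6 + 3*7 + 4*8
print(mac_loop([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
```

With register-only operands this loop is easy to feed. With both arrays in memory, the cache must deliver x and y together on each step.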

[0004] FIG. 2 shows an example of a microprocessor 200 having two banks of on-chip memory. An instruction register 201 supplies first and second addresses (address 0, address 1) to bank 0 (202) and bank 1 (203). Each bank may illustratively be 1 kilobyte in size. Data are written to the banks via write line 213. The first operand is read from bank 0 (202), selected by multiplexer 204, and latched into operand register 1 (205). Similarly, the second operand is read from bank 1 (203), selected by multiplexer 206, and latched into operand register 2 (207). Alternatively, multiplexers 204 and 206 may select the operands from an external memory bus 212. The operands are then supplied from the operand registers to the ALU/MAC unit 208. There they are multiplied together, and the product is added to the previous result, which is accessed from the accumulator file via path 214. The result is supplied to a result register 209 and stored in the accumulator file 210. This technique provides a multiply-accumulate function within a conventional microprocessor architecture, but the approach has drawbacks. For example, the on-chip memory is configured as RAM rather than as a cache, so only selected applications can use it. Every data address in that memory must be fixed when the application program is developed. As a result, conventional microprocessor applications cannot use this memory flexibly. It is also difficult to run applications from different vendors.

[0005]

[Means for Solving the Problem] The inventor has devised a data processor and a data processing system having an n-way set associative cache. In this cache, a first operand (x) is located in a first portion of the cache and a second operand (y) is located in a second portion. When a given type of instruction is executed, such as a multiply-accumulate instruction, the outputs of the first and second cache portions (x, y) are supplied to a functional unit such as a multiply-accumulate unit. Multiplexers are connected to the outputs of the first and second cache portions. Therefore, when the cache is accessed as a conventional set associative cache to execute other types of instructions, an operand can be fetched from either portion. To control writing to the cache, the translation lookaside buffer may contain page table entries with a reconfiguration field. Alternatively, other control methods may be used.

[0006]

[Embodiments of the Invention] This detailed description concerns a microprocessor that uses a reconfigurable set associative cache. When an instruction that requires high data bandwidth is executed, the cache supplies operands to an arithmetic processor in one of two configurations. In the first configuration it supplies one operand. In the second configuration it supplies multiple operands (x, y) simultaneously. As used herein, "simultaneously" means within the same machine cycle, which may comprise one or more clock cycles. An example of such an instruction is the multiply-accumulate instruction. In this way, fast multiply-accumulate operations can be implemented, for example, in a general-purpose microprocessor. The cache is typically an n-way set associative cache, and the present technique also allows it to be used as multiple direct-mapped caches. The cache can be reconfigured from an n-way set associative cache to direct-mapped caches, and back, on a per-instruction basis. As used herein, the cache portions are also referred to as "cache way 0," "cache way 1," or, more generally, "cache way n," where n is a positive integer.

[0007] FIG. 3 shows an exemplary embodiment of the invention for a 2-way set associative cache comprising cache portions 301 and 302. The data outputs of cache portions 301 and 302 are supplied via data lines 303 and 304, respectively, to a multiply-accumulate unit (MAU) 305. In addition to the x and y data inputs, MAU 305 receives an accumulator input via line 308 from accumulator file 312. MAU 305 includes a multiplier 306 and an accumulator 307. As far as the present invention is concerned, these may be of various designs, including those known in the art. In operation, when a multiply-accumulate instruction is executed, MAU 305 is instructed to perform the multiply-accumulate function on two operands. Operand x is accessed from cache portion 301 via multiplexer 310, and operand y is accessed from cache portion 302 via multiplexer 311. Some other types of instruction do not require simultaneous operands from the cache. When such an instruction is to be executed, multiplexer 311 selects its output from cache portion 301, cache portion 302, or the external memory bus 312 to provide the desired data.
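The two read paths of FIG. 3 can be modeled behaviorally in Python. This is a sketch under simplifying assumptions, not the patent's circuit. Each way holds single words rather than cache lines, the tag/index split uses a hypothetical `num_sets` parameter, and the names `TwoWayCache`, `mac_step`, and `normal_load` are invented for illustration. In MAC mode, x comes from way 0 and y from way 1 in the same step. A normal load lets a multiplexer forward whichever way hits.

```python
from dataclasses import dataclass, field


@dataclass
class TwoWayCache:
    """Toy model of cache ways 301 (way 0) and 302 (way 1).

    Each way stores single words as set index -> (tag, value); real hardware
    stores whole lines. num_sets is a hypothetical parameter.
    """
    num_sets: int
    ways: list = field(default_factory=lambda: [{}, {}])

    def split(self, addr):
        return addr // self.num_sets, addr % self.num_sets  # (tag, index)

    def fill(self, way, addr, value):
        tag, index = self.split(addr)
        self.ways[way][index] = (tag, value)

    def probe(self, way, addr):
        """Return the word if `way` hits on addr (cf. LHIT/RHIT), else None."""
        tag, index = self.split(addr)
        entry = self.ways[way].get(index)
        return entry[1] if entry is not None and entry[0] == tag else None


def mac_step(cache, x_addr, y_addr, acc):
    """MAC mode: x from way 0 and y from way 1 in the same step."""
    x = cache.probe(0, x_addr)  # read path from cache way 0 (301)
    y = cache.probe(1, y_addr)  # read path from cache way 1 (302)
    if x is None or y is None:
        raise LookupError("operand miss; hardware would stall and refill")
    return acc + x * y


def normal_load(cache, addr):
    """Normal mode: a multiplexer forwards whichever way hits."""
    for way in (0, 1):
        value = cache.probe(way, addr)
        if value is not None:
            return value
    return None  # miss: data would come from the external memory bus


cache = TwoWayCache(num_sets=8)
cache.fill(0, 3, 2)   # x = 2 at address 3, set 3, way 0
cache.fill(1, 11, 5)  # y = 5 at address 11, also set 3, way 1
print(mac_step(cache, 3, 11, acc=10))  # 10 + 2 * 5 = 20
print(normal_load(cache, 11))          # 5
```

Addresses 3 and 11 map to the same set. Because each is held in a different way, both can be read in one step without a dual-ported array.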

[0008] Note that the exemplary embodiment is for a 2-way set associative cache. However, the invention may be implemented for any n-way set associative cache, where n is any positive integer. In the following discussion, n is illustratively even (for example, n = 2), but alternatively n may be odd. In general, this can be implemented using multiplexers having n inputs, one from each cache portion. When n is greater than 2, the particular arrangement used to access the two operands among the n ways is determined by the specific implementation, and any such arrangement may be used with the invention. Furthermore, when the cache is configured as a conventional n-way set associative cache, its replacement algorithm may be implemented using any technique, as far as the invention is concerned.
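For n greater than 2, the text leaves the pairing of ways to the implementation. The Python sketch below shows two pieces under stated assumptions. The first is an n-input multiplexer driven by per-way hit signals, which generalizes LHIT/RHIT. The second is one possible arrangement, not prescribed by the patent, in which x operands live in even ways and y operands in odd ways. This arrangement mirrors the even/odd way bit of paragraph [0010]. Both function names are hypothetical.

```python
def select_hit(way_outputs, hits):
    """n-input multiplexer: forward the output of the first way whose hit signal is set.

    In a correctly maintained cache at most one way hits for a given address.
    Returns None on a miss, when data would come from external memory instead.
    """
    for output, hit in zip(way_outputs, hits):
        if hit:
            return output
    return None


def operand_ways(n):
    """One possible arrangement (an assumption): x in even ways, y in odd ways."""
    if n < 2:
        raise ValueError("need at least two ways to supply two operands at once")
    return list(range(0, n, 2)), list(range(1, n, 2))


print(operand_ways(4))  # ([0, 2], [1, 3])
```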

[0009] As is known in the art, memory-management page tables are used to translate virtual addresses into physical addresses and also to control cache operation. Page table entries are held in a translation lookaside buffer (TLB), which translates virtual memory addresses into physical memory addresses. The TLB also holds control information about memory pages, including whether a given page is cacheable. Referring to FIG. 4, an exemplary page table entry contains the physical address "tag" in field 41 (bits 12 through 31). The tag holds the most significant bits of the address and is used to determine whether the desired address is located in the cache. If it is, a cache "hit" is indicated by LHIT 320 or RHIT 321 in FIG. 3. The "index" portion of the address (not shown) holds the lower-order bits. It is used, in a manner well known in the art, to direct pointers 322 and 323 to the desired location in cache portions 301 and 302, respectively. Field 42 may contain, for example, unused bits. Field 45 typically contains "permission" bits that control whether the data in the memory page are, for example, writable, valid, cacheable, or user-accessible. As far as the invention is concerned, these fields may be in any order. Referring to FIG. 5, TLB 500 contains exemplary page table entries, each pairing a virtual tag 501 with a physical tag 502 and control bits 503. In this manner, virtual addresses are translated into physical addresses according to principles known in the art.
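A small Python sketch makes the field layout concrete. It assumes 32-bit entries and 4 KiB pages, so the 12 bits below the tag form the page offset. The text says only that the tag occupies bits 12 through 31 and that the fields may be in any order. The permission, reconfiguration, and way fields are therefore placed at hypothetical bit positions. The dict-based TLB and the function names `decode_pte` and `translate` are likewise illustrative.

```python
# Field positions for a 32-bit page table entry (FIG. 4). Only the tag
# position, bits 12-31 (field 41), comes from the text; every other bit
# position below is a hypothetical choice for illustration.
TAG_SHIFT = 12
TAG_MASK = 0xFFFFF            # 20 tag bits
PERM_MASK = 0xF               # field 45, permission bits (assumed bits 0-3)
RECONF_BIT = 1 << 4           # field 44, reconfiguration bit (assumed bit 4)
WAY_BIT = 1 << 5              # field 43, way bit (assumed bit 5)
# Bits 6-11 stand in for field 42 (unused).
OFFSET_MASK = (1 << TAG_SHIFT) - 1


def decode_pte(pte):
    """Split a page table entry into the fields described in FIG. 4."""
    return {
        "tag": (pte >> TAG_SHIFT) & TAG_MASK,
        "reconfigure": bool(pte & RECONF_BIT),
        "way": 1 if pte & WAY_BIT else 0,
        "permissions": pte & PERM_MASK,
    }


def translate(tlb, vaddr):
    """Map a virtual address to a physical one through a dict-based TLB.

    tlb maps virtual tags (501) to page table entries holding the physical
    tag (502) and control bits (503). A missing key models a TLB miss.
    """
    pte = tlb[vaddr >> TAG_SHIFT]
    return (decode_pte(pte)["tag"] << TAG_SHIFT) | (vaddr & OFFSET_MASK)


pte = (0x12345 << TAG_SHIFT) | WAY_BIT | RECONF_BIT | 0b0110
tlb = {0x00400: pte}
print(hex(translate(tlb, 0x00400ABC)))  # 0x12345abc
```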

[0010] To implement the technique of the invention described above, one or more additional control bits may be included in the memory-management page table. For example, field 43 may contain an even or odd "way" bit that indicates how data are to be written into the cache, as described further below. Field 44 may contain a "reconfiguration" bit. When the reconfiguration bit is 0, the cache is treated as a conventional 2-way set associative cache. That is, data are written into cache way 301 and cache way 302 using the chosen cache-entry replacement scheme. When the reconfiguration bit is 1, the 2-way set associative cache is treated as two direct-mapped caches. In that case, if the way bit in field 43 is 0, the data are written to the even-way cache portion. If the way bit in field 43 is 1, the data are instead written to the odd-way cache portion. This places the data in the appropriate cache portion so that the MAU can use them as the x and y operands for multiply-accumulate or other special types of instructions. Where an operating system (OS) is present, a user program could instruct the OS, via a special function call, to set the "reconfiguration bit" and "way bit". In this way, a data processing system comprising both the data processor and the operating system can make effective use of the technique of the invention.
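The write-placement rule reduces to a small decision, sketched below in Python. The per-set LRU policy is an assumption that stands in for the unspecified "chosen replacement scheme". The names `TwoWayLru` and `choose_write_way` are hypothetical. Way 0 corresponds to the even way (cache way 301) and way 1 to the odd way (cache way 302).

```python
class TwoWayLru:
    """Per-set LRU state for a 2-way cache (one possible replacement scheme)."""

    def __init__(self):
        self.mru = {}  # set index -> most recently used way

    def touch(self, index, way):
        self.mru[index] = way

    def victim(self, index):
        # Evict the way that was not used most recently; default to way 0.
        return 1 - self.mru.get(index, 1)


def choose_write_way(reconfigure, way_bit, lru, index):
    """Return the way (0 = even, cache way 301; 1 = odd, cache way 302) to fill."""
    if reconfigure:
        # Reconfiguration bit = 1: two direct-mapped caches; the way bit picks one.
        way = way_bit
    else:
        # Reconfiguration bit = 0: ordinary 2-way set associative fill.
        way = lru.victim(index)
    lru.touch(index, way)
    return way
```

With the reconfiguration bit clear, the way bit is ignored and successive fills of one set alternate between the ways. With the bit set, coefficient pages marked way 0 and data pages marked way 1 always land in separate ways.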

[0011] By convention, the left operand (x in the example above) is fetched from way 0 and the right operand (y in the example above) is fetched from an odd way. However, other conventions are possible. Other techniques for controlling the writing of data into the cache portions may also be used with the invention. For example, an instruction that loads the cache may explicitly specify the portion of the cache into which the data are to be written. To accomplish this, one or more way bits (313) may be included in the instruction register of FIG. 3. In that case, a memory management unit and TLB may not be needed. Also, the x and y data need not be split between even-way and odd-way caches. They may be placed in the cache in any convenient manner. In short, as will be apparent to those skilled in the art, multiple operands can be fetched from the cache simultaneously for the various operations performed by the functional units.
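The instruction-encoded alternative can be sketched the same way. This Python illustration assumes a hypothetical `CacheLoad` instruction format and 8 sets per way. Each load names its target way directly, so no TLB lookup is involved. The sketch also shows why placement matters. Coefficients and data whose addresses share a set index would evict each other in a single way. When the way bits split them between way 0 and way 1, they sit side by side instead.

```python
from typing import NamedTuple, Optional

NUM_SETS = 8  # hypothetical number of sets per way


class CacheLoad(NamedTuple):
    """Hypothetical load instruction carrying explicit way bits (313)."""
    addr: int
    value: int
    way: Optional[int] = None  # None: fall back to a default placement


def execute_loads(loads, default_way=0):
    """Fill two direct-mapped ways, honoring each instruction's way bits."""
    ways = [{}, {}]  # per way: set index -> (tag, value)
    for ld in loads:
        way = ld.way if ld.way is not None else default_way
        ways[way][ld.addr % NUM_SETS] = (ld.addr // NUM_SETS, ld.value)
    return ways


coeffs, data = [1, 2, 3, 4], [5, 6, 7, 8]
loads = [CacheLoad(i, c, way=0) for i, c in enumerate(coeffs)]
loads += [CacheLoad(NUM_SETS + i, d, way=1) for i, d in enumerate(data)]
ways = execute_loads(loads)
# x and y for set i now sit in different ways, so both can be read at once.
print(sum(ways[0][i][1] * ways[1][i][1] for i in range(4)))  # 70
```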

[0012] The data processor of the invention is typically of the type conventionally called a "microprocessor." However, processors for other purposes and of other types are also contemplated and are included here. For example, a digital signal processor with enhanced functionality for instructions other than MAC may also make effective use of the present technique.

[0013]

[Effects of the Invention] The invention provides a microprocessor that uses a set associative cache, which can be reconfigured on a per-instruction basis between a set associative cache and direct-mapped caches. This allows more flexible use of memory in microprocessor applications. For example, fast multiply-accumulate (MAC) operations, which are typical of signal processing, can be implemented in a general-purpose microprocessor.

[Brief description of drawings]

FIG. 1 shows a prior-art microprocessor having two register files that store operands.

FIG. 2 shows a prior-art microprocessor having on-chip random access memory (RAM) with multiple banks that store operands.

FIG. 3 illustrates an exemplary embodiment of a microprocessor according to the present invention.

FIG. 4 illustrates an exemplary page table entry according to the present invention.

FIG. 5 illustrates an exemplary translation lookaside buffer that may be used in practicing the invention.

[Description of Reference Numerals]
100 Prior-art microprocessor
101 Instruction register
102 First register file
103 Second register file
104 First operand register
105 Second operand register
106 Arithmetic logic unit (ALU)
107 Result register
108 Line
200 Prior-art microprocessor
201 Instruction register
202 (RAM) bank 0
203 (RAM) bank 1
204 Multiplexer
205 Operand register 1
206 Multiplexer
207 Operand register 2
208 ALU/MAC (arithmetic logic / multiply-accumulate) unit
209 Result register
210 Accumulator file
211 Operand access control logic
212 External memory bus
213 Write line
301 Cache way 0
302 Cache way 1
303, 304 Data lines
305 Multiply-accumulate unit (MAU)
306 Multiplier
307 Accumulator
308 Line
310, 311 Multiplexers
312 External memory bus
313 Control bit (way bit)
314 Instruction register
316 Multiplexer
317, 318 Translation lookaside buffers
319 Multiplexer
320 LHIT
321 RHIT
322, 323 Pointers
324 Access control
325 First signal path
327 Second signal path
41 Physical tag
42 Unused (field)
43 Way bit
44 Reconfiguration bit
45 Permission bits
500 Translation lookaside buffer
501 Virtual tag
502 Physical tag
503 Control bits

Claims

1. A data processor comprising: an instruction register (314); an n-way set associative cache, where n is an integer of 2 or more, having a first cache part (301) and a second cache part (302); a functional unit (305) that performs an operation on first and second operands (x, y) when an instruction is executed; a first signal path (325) from the first cache part that supplies the first operand (x) to the functional unit; a second signal path (327) from the second cache part that, when a special form of instruction is executed, supplies the second operand (y) to the functional unit simultaneously with the first operand; and multiplexers (310, 311) for selecting data from either the first or the second cache part.

2. The data processor of claim 1, further comprising a translation lookaside buffer (500) having page table entries (FIG. 4) that contain a reconfiguration field (44) controlling how data are written to the cache.

3. The data processor of claim 2, wherein the page table entry further includes a way field (43) that causes a first set of data to be written to the cache direct-mapped to the even ways and a second set of data to be written direct-mapped to the odd ways.

4. The data processor of claim 1, wherein the instruction register includes control bits (313) that control the writing of data to the cache parts.

5. The data processor of claim 1, wherein the special form of instruction comprises a multiply-accumulate operation instruction.

6. The data processor of claim 1, wherein the functional unit is a multiply-accumulate unit.