TW297111B


Info

Publication number: TW297111B
Application number: TW085100981A
Authority: TW
Original assignee: AT&T Corp
Priority date: 1995-02-03
Filing date: 1996-01-26
Publication date: 1997-02-01
Also published as: KR960032182A; JPH08272681A

Description

Background of the invention

Field of the invention

The present invention relates to a microprocessor having a set-associative cache that can be reconfigured for higher-bandwidth operation.

Description of the prior art

Many conventional microprocessors have a multi-port register file, so that two operands stored in the registers can be supplied to the execution unit (EU) in each cycle. These registers reside on the same integrated circuit as the arithmetic logic unit (ALU) and are a very fast means of supplying the required data. For example, referring to Fig. 1, a typical prior-art microprocessor (100) includes an instruction register (101) that supplies a first address (addr0) to a first register file (102) and a second address (addr1) to a second register file (103). The illustrated register files (102) and (103) each have 32 entries of 32 bits. The first register file (102) supplies a first operand to a first operand register (104), and the second register file (103) supplies a second operand to a second operand register (105). Registers (104) and (105) supply the first and second operands to the arithmetic logic unit (ALU) (106), which can perform various arithmetic operations, including the multiply-accumulate (MAC) operation. The result is stored in a result register (107) and can be written back to the register files via line (108).
In an alternative embodiment, a single dual-port register file (not shown) is used in place of the two register files (102) and (103). In that case, two read ports can access two entries of the register file simultaneously.

However, there are many cases in which two operands must be supplied that reside in memory rather than in the on-chip registers. One example is the multiply-accumulate instruction, a basic primitive of signal processing. The two memory operands usually reside in an on-chip data cache (in the case of a cache hit), or they may reside in a cache external to the microprocessor chip. In either case, because two operands must be supplied to the EU in every cycle, the data cache must be dual-ported. A typical instruction is:

MAC x, y, a0

where MAC is the mnemonic for the "multiply-accumulate" instruction, and the specified operation is:

a0 = a0 + (x * y)

Typically x and y belong to specific arrays in memory; for example, x may be in a coefficient array and y in a data array.

Referring to Fig. 2, a microprocessor (200) having two on-chip memory banks is shown. An instruction register (201) supplies first and second addresses (addr0, addr1) to memory bank 0 (202) and memory bank 1 (203) of the cache memory, each bank illustratively being 1 kilobyte in size. Data is written to the cache memory via write line (213). The first operand is read from bank 0 (202), and multiplexer (204) selects the read output of bank 0 (202).
The first operand is then latched into operand register 1 (205). Similarly, the second operand is read from bank 1 (203), and multiplexer (206) selects the read output of bank 1 (203). The second operand is then latched into operand register 2 (207). The multiplexers (204) and (206) can also select these operands from the external memory bus (212). The operands are then supplied from the operand registers to the ALU/MAC unit (208), where they are multiplied, and the product is added to the previous result obtained from the accumulator storage via path (214). The result is then supplied to the result register (209) and stored in the accumulator storage (210).

Although this technique provides a multiply/accumulate function within a conventional microprocessor architecture, it has some drawbacks. For example, because the on-chip memory is configured as RAM rather than as a cache, only selected applications can use it. All data addresses in the memory must be determined when the application is developed, so applications on a conventional microprocessor cannot use this memory flexibly. In addition, it is difficult to run applications from different vendors.
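The multiply-accumulate operation described above, a0 = a0 + (x * y) applied across a coefficient array and a data array, can be sketched in software as follows. This is only an illustration of the arithmetic; the function names `mac` and `dot_product` and the sample values are assumptions made for this example and do not appear in the patent.

```python
def mac(a0, x, y):
    """One multiply-accumulate step: a0 = a0 + (x * y)."""
    return a0 + x * y


def dot_product(coeffs, data):
    """Accumulate x * y over a coefficient array and a data array."""
    if len(coeffs) != len(data):
        raise ValueError("arrays must have equal length")
    a0 = 0
    for x, y in zip(coeffs, data):
        # Each iteration needs two memory operands, one from each array.
        a0 = mac(a0, x, y)
    return a0


coeffs = [1, 2, 3]   # hypothetical coefficient array (x operands)
data = [4, 5, 6]     # hypothetical data array (y operands)
result = dot_product(coeffs, data)  # 1*4 + 2*5 + 3*6 = 32
```

Each loop iteration consumes one x and one y, which is why hardware that can deliver both operands in the same cycle sustains one MAC per cycle.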
Summary of the invention

The inventors have devised a data processor and data processing system having an n-way set-associative cache, in which a first operand (x) resides in a first portion of the cache and a second operand (y) resides in a second portion of the cache. When an instruction of a certain type, such as a multiply-accumulate instruction, is executed, the outputs (x, y) of the first and second cache portions are supplied to a functional unit such as a multiply-accumulate unit. A multiplexer is connected to the outputs of the first and second cache portions, so that when other types of instructions are executed and the cache is accessed as a conventional set-associative cache, operands can be retrieved from the other portions. To control writes to the cache, a translation lookaside buffer may contain page table entries with a reconfiguration field; other control methods may also be used.

Brief description of the drawings

Fig. 1 shows a prior-art microprocessor having two register files for storing operands.
Fig. 2 shows a prior-art microprocessor having on-chip random access memory comprising multiple memory banks for storing operands.
Fig. 3 shows an embodiment of a microprocessor according to the invention.
Fig. 4 shows an illustrative page table entry according to the invention.
Fig. 5 shows an illustrative translation lookaside buffer that can be used to implement the invention.
Detailed description

This detailed description relates to a microprocessor that uses a reconfigurable set-associative cache. In a first configuration the cache supplies one operand; in a second configuration, when an instruction requiring higher data bandwidth is executed, the microprocessor can supply two or more operands (x, y) to an arithmetic processor simultaneously. As used here, "simultaneously" means within the same machine cycle, which may comprise one or more clock cycles. One example of such an instruction is the multiply-accumulate instruction. In this way, fast multiply-accumulate operations can be performed in, for example, a general-purpose microprocessor. The cache is normally an n-way set-associative cache, and the present technique can also use it as a direct-mapped cache. Reconfiguration from n-way set-associative to direct-mapped, and back, can be done on an instruction-by-instruction basis. As used here, the cache portions are also called "cache way 0", "cache way 1", or more generally "cache way n", where n is a positive integer.

Referring to Fig. 3, a two-way set-associative cache embodiment of the invention is shown, comprising cache portions (301) and (302). The data outputs of cache portions (301) and (302) are supplied via data lines (303) and (304), respectively, to a multiply-accumulate unit (MAU) (305). In addition to the x and y data inputs, the MAU (305) receives an accumulator input from accumulator storage (312) via line (308). The MAU (305) includes a multiplier (306) and an accumulator (307), which, insofar as the invention is concerned, may be of various designs, including those well known in the art.
In operation, when a multiply-accumulate instruction is executed, the MAU (305) is directed to perform the multiply-accumulate function on operand x, obtained from cache portion (301) via multiplexer (310), and operand y, obtained from cache portion (302) via multiplexer (311). When another type of instruction is being executed that does not need multiple operands from the cache at the same time, multiplexer (311) instead selects the output of cache portion (301), cache portion (302), or the external memory bus (312) to supply the required data.

Note that the illustrated embodiment is a two-way set-associative cache. The invention can, however, be implemented with any n-way set-associative cache, where n is any positive integer. In the following description n is shown as an even number (for example, n = 2), but n may also be odd. In general this is accomplished with a multiplexer having n inputs, one from each cache portion. When n is greater than 2, the distribution of the n ways used to access the two operands is determined by the particular implementation, and any particular implementation can be used with the invention. In addition, when the cache is configured as a conventional n-way set-associative cache, the cache replacement algorithm may be implemented by any technique insofar as the invention is concerned.

As is well known in the art, memory management page tables are used to translate virtual addresses into physical addresses and to control cache operation. Portions of these page tables are buffered in a translation lookaside buffer (TLB), which translates virtual memory addresses into physical memory addresses.
The TLB also supplies control information for the memory pages and indicates whether a given page is buffered in the TLB. Referring to Fig. 4, an illustrative page table entry contains a physical address "tag" in field (41) (bits 12 to 31). The tag represents a number of the most significant bits of the address and is used to determine whether the required address is in the cache; a cache "hit" is indicated by LHIT (320) or RHIT (321) in Fig. 3. The "index" portion of the address (not shown) represents a number of the least significant bits and is used, by techniques well known in the art, to direct pointers (322) and (323) to the required location in a particular cache portion (301 and 302, respectively). Field (42) may contain, for example, unused bits, while field (45) typically contains "permission" bits that control whether the data in the memory page is, for example, writable, valid, cached, and/or user-accessible. Insofar as the invention is concerned, these fields may be arranged in any order. Referring to Fig. 5, a TLB holds illustrative page table entries as a physical tag (502), a control tag (503), and a virtual tag (501). In this way virtual addresses are translated into physical addresses according to principles well known in the art.

To implement the technique of the invention described above, one or more additional control bits may be included in the memory management page table.
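The tag comparison and hit signalling described above can be sketched as follows. This is a minimal model under stated assumptions, not the patented circuit: it assumes a 32-bit address whose tag is bits 12 to 31 (as in field (41)) and whose index is simply the low 12 bits, with one data word per entry. The names `split_address`, `Way`, and `lookup` are hypothetical.

```python
TAG_SHIFT = 12                      # tag = address bits 12..31, as in field (41)
INDEX_MASK = (1 << TAG_SHIFT) - 1   # assumption: index = low 12 bits


def split_address(addr):
    """Return (tag, index) for a 32-bit physical address."""
    addr &= 0xFFFFFFFF
    return addr >> TAG_SHIFT, addr & INDEX_MASK


class Way:
    """One cache portion, stored sparsely as index -> (tag, data)."""

    def __init__(self):
        self.lines = {}

    def fill(self, addr, data):
        tag, index = split_address(addr)
        self.lines[index] = (tag, data)

    def probe(self, addr):
        """Return (hit, data); hit is True only if the stored tag matches."""
        tag, index = split_address(addr)
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:
            return True, entry[1]
        return False, None


def lookup(left, right, addr):
    """Probe both ways at once; return (lhit, rhit, data)."""
    lhit, ldata = left.probe(addr)
    rhit, rdata = right.probe(addr)
    return lhit, rhit, (ldata if lhit else rdata)


left, right = Way(), Way()
left.fill(0x00012345, 7)    # tag 0x12, index 0x345
right.fill(0x00055345, 9)   # tag 0x55, same index 0x345
```

Two addresses that share an index but differ in tag can coexist, one per way; the tag match in each way plays the role of the LHIT and RHIT signals.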
For example, field (43) may contain an even/odd "way" bit indicating how data is written into the cache, as detailed below. Field (44) may contain a "reconfiguration" bit. When the reconfiguration bit is "0", the cache is treated as a conventional two-way set-associative cache; that is, data is written to ways (301) and (302) of the cache using whatever cache entry replacement scheme has been chosen. When the reconfiguration bit is "1", on the other hand, the two-way set-associative cache is treated as a direct-mapped cache. In that case, a way bit of "0" in field (43) directs the data to be written to the even-way cache portion, and a way bit of "1" directs the data to be written to the odd-way cache portion. In this way the data is placed in the appropriate cache portion to serve as the x and y operands when the MAU executes a multiply-accumulate instruction or another special type of instruction. Where an operating system (OS) is installed, a user application can request this through a special function call. A data processing system comprising a data processor and an operating system can thus make effective use of the technique. By convention, the left operand (x in the example above) is fetched from way 0 and the right operand (y in the example above) from the odd way; other conventions may also be used.

Other control techniques for writing data to the cache portions can also be used with the invention. For example, an instruction that loads information into the cache can explicitly specify which portion of the cache the data is to be written to. For this purpose, one or more "way" bits (313) may be included in the instruction register shown in Fig. 3. In that case a memory management unit and TLB may not be needed. Moreover, the x and y data need not be divided between even and odd cache ways, but may be distributed among the cache portions in any convenient manner. Finally, those skilled in the art will appreciate that the operations performed by the functional unit may fetch more than two operands from the cache simultaneously.

Although the data processor of the invention is generally of the type traditionally called a "microprocessor", data processors of other names and types may also be used and fall within the scope of the invention. For example, a digital signal processor with enhanced capability for non-MAC instructions can make effective use of the technique of the invention.
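The write-placement rule governed by the reconfiguration bit (field 44) and the way bit (field 43) can be sketched as follows. This is an illustrative model only: the names `PageEntry`, `choose_way`, and `TwoWayCache` are hypothetical, tags are omitted for brevity, and the replacement algorithm is reduced to a `victim_way` argument supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class PageEntry:
    reconfigure: int  # field (44): 0 = set-associative, 1 = direct-mapped
    way: int          # field (43): 0 = even way, 1 = odd way


def choose_way(entry, victim_way):
    """Pick the way (0 or 1) that a write goes to.

    victim_way is whatever the replacement algorithm selected; it is
    used only in the conventional set-associative configuration.
    """
    if entry.reconfigure == 0:
        return victim_way
    return entry.way


class TwoWayCache:
    def __init__(self):
        self.ways = [{}, {}]  # per way: index -> data (tags omitted)

    def write(self, entry, index, data, victim_way=0):
        self.ways[choose_way(entry, victim_way)][index] = data

    def read_pair(self, index_x, index_y):
        """Fetch x from way 0 and y from way 1 in the same step."""
        return self.ways[0][index_x], self.ways[1][index_y]


coeff_page = PageEntry(reconfigure=1, way=0)  # x operands go to the even way
data_page = PageEntry(reconfigure=1, way=1)   # y operands go to the odd way
cache = TwoWayCache()
for i, (x, y) in enumerate(zip([1, 2, 3], [4, 5, 6])):
    cache.write(coeff_page, i, x)
    cache.write(data_page, i, y)

acc = 0
for i in range(3):
    x, y = cache.read_pair(i, i)  # both operands available together
    acc += x * y                  # multiply-accumulate step
```

Because the page attributes steer the coefficient page and the data page into different ways, each multiply-accumulate step finds its two operands in separate cache portions, following the convention in the text that x comes from way 0 and y from the odd way.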

Claims

1. A data processor, comprising: an instruction register (314); an n-way set-associative cache, where n is at least 2, the cache comprising a first cache portion (301) and a second cache portion (302); and a functional unit (305) that operates on first and second operands (x, y) when executing an instruction; characterized in that the data processor further comprises a first signal path (325) from the first cache portion for supplying the first operand (x) to the functional unit; a second signal path (327) from the second cache portion for supplying the second operand (y) to the functional unit simultaneously with the first operand when an instruction of a special type is executed; and a multiplexer (310, 311) for selecting information from one of the first and second cache portions when an instruction of another type is executed.

2. The data processor of claim 1, further comprising a translation lookaside buffer (500) having a number of page table entries (Fig. 4), the page table entries including a reconfiguration field (44) for controlling how data is written to the cache.

3. The data processor of claim 2, wherein the page table entries further include a way field (43) for directing a first set of data to be written to an even-way direct-mapped cache and a second set of data to be written to an odd-way direct-mapped cache.

4. The data processor of claim 1, wherein the instruction register contains at least one control bit (313) for controlling the writing of data to the cache portions.

5. The data processor of claim 1, wherein the special type of instruction includes a multiply-accumulate instruction.

6. The data processor of claim 1, wherein the functional unit is a multiply-accumulate unit.