TW297111B - Google Patents


Info

Publication number
TW297111B
Authority
TW
Taiwan
Prior art keywords
cache memory
data processor
data
instruction
patent application
Prior art date
Application number
TW085100981A
Other languages
Chinese (zh)
Original Assignee
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AT&T Corp
Application granted
Publication of TW297111B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824 Operand accessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 Caches characterised by their organisation or structure
    • G06F12/0895 Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array

Landscapes

  • Engineering & Computer Science
  • Software Systems
  • Theoretical Computer Science
  • Physics & Mathematics
  • General Physics & Mathematics
  • General Engineering & Computer Science
  • Computational Mathematics
  • Mathematical Analysis
  • Mathematical Optimization
  • Pure & Applied Mathematics
  • Memory System Of A Hierarchy Structure
  • Executing Machine-Instructions

Description

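Background of the Invention:

Field of the Invention:

The present invention relates to a microprocessor having means for reconfiguring a set-associative cache for higher-bandwidth operation.

Description of the Prior Art:

Many conventional microprocessors have a multi-ported register file, so that two operands held in registers can be supplied to the execution unit (EU) in each cycle. These registers are located on the same integrated circuit as the arithmetic logic unit (ALU) and are a very fast means of providing the required data. For example, referring to Fig. 1, a typical prior-art microprocessor (100) includes an instruction register (101) that supplies a first address (addr0) to a first register file (102) and a second address (addr1) to a second register file (103). The illustrative register files (102) and (103) each have 32 entries of 32 bits. The first register file (102) supplies a first operand to a first operand register (104), and the second register file (103) supplies a second operand to a second operand register (105). Registers (104) and (105) supply the first and second operands to the ALU (106), which can perform various arithmetic operations, including the multiply-accumulate (MAC) operation. The result is stored in a result register (107) and can be written back to the register files over line (108). In an alternative embodiment, a single dual-ported register file (not shown) replaces the two register files (102) and (103). In that case, the two read ports can access two entries of the register file at the same time.

However, there are many cases in which two operands must be supplied that reside in memory rather than in the on-chip registers. One example is the multiply-accumulate instruction, a basic primitive of signal processing. The two memory operands are usually held in an on-chip data cache (in the case of a cache hit), but may also be held in a cache external to the microprocessor chip. In either case, because two operands must be supplied to the EU in every cycle, the data cache must be dual-ported.

A typical instruction is:

    MAC x, y, a0

Here MAC is the mnemonic for "multiply-accumulate", and the operation specified is:

    a0 = a0 + (x * y)

Typically x and y belong to particular arrays in memory; for example, x may be located in a coefficient array and y in a data array.

Referring to Fig. 2, a microprocessor (200) having two banks of on-chip memory is shown. An instruction register (201) supplies first and second addresses (addr0, addr1) to memory bank 0 (202) and memory bank 1 (203), each of which is illustratively 1 kilobyte in size. Data is written to the memory over write line (213). The first operand is read from bank 0 (202), whose read output is selected by multiplexer (204), and is then latched into operand register 1 (205). Similarly, the second operand is read from bank 1 (203), whose read output is selected by multiplexer (206), and is then latched into operand register 2 (207). Multiplexers (204) and (206) can also select the operands from an external memory bus (212). The operands are then supplied from the operand registers to the ALU/MAC unit (208), which multiplies them and adds the product to the previous result obtained from the accumulator storage over path (214). The result is provided to a result register (209) and stored in the accumulator storage (210).

Although this technique provides a multiply/accumulate function on a conventional microprocessor architecture, it has several drawbacks. Because the on-chip memory is configured as RAM rather than as a cache, only selected applications can use it. All data addresses in the memory must be decided when the application is developed, so applications on a conventional microprocessor cannot use this memory flexibly. In addition, it is difficult to run applications from different vendors.

Summary of the Invention:

The inventors have devised a data processor and data processing system having an n-way set-associative cache, in which a first operand (x) is located in a first portion of the cache and a second operand (y) is located in a second portion of the cache. When an instruction of a certain type, such as a multiply-accumulate instruction, is executed, the outputs (x, y) of the first and second cache portions are provided to a functional unit such as a multiply-accumulate unit. A multiplexer is connected to the outputs of the first and second cache portions. Consequently, when other types of instructions are executed and the cache is accessed in the conventional set-associative manner, operands can be fetched from either portion. To control writes to the cache, a translation lookaside buffer may hold page table entries that include a reconfiguration field; other control methods may also be used.

Brief Description of the Drawings:

Fig. 1 shows a prior-art microprocessor having two register files for storing operands.

Fig. 2 shows a prior-art microprocessor having on-chip random access memory that comprises multiple memory banks for storing operands.

Fig. 3 shows an embodiment of a microprocessor according to the invention.

Fig. 4 shows an illustrative page table entry according to the invention.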
Fig. 5 shows an illustrative translation lookaside buffer that can be used to implement the invention.

Detailed Description:

This detailed description relates to a microprocessor that uses a reconfigurable set-associative cache. In a first configuration, the cache supplies one operand; in a second configuration, when an instruction requiring higher data bandwidth is executed, the microprocessor can supply two or more operands (x, y) to an arithmetic processor simultaneously. As used here, "simultaneously" means within the same machine cycle, which may comprise one or more clock cycles. One example of such an instruction is the multiply-accumulate instruction. In this way, fast multiply-accumulate operations can be performed on, for example, a general-purpose microprocessor. The cache is normally an n-way set-associative cache, and the present technique can also use it as a direct-mapped cache. Reconfiguration from n-way set-associative to direct-mapped, and back again, can be carried out on an instruction-by-instruction basis. As used here, the cache portions are also referred to as "cache way 0", "cache way 1", or more generally "cache way n", where n is a positive integer.

Referring to Fig. 3, a two-way set-associative cache embodiment of the invention is shown, comprising cache portions (301) and (302). The data outputs of cache portions (301) and (302) are provided over data lines (303) and (304), respectively, to a multiply-accumulate unit (MAU) (305). In addition to the x and y data inputs, the MAU (305) receives an accumulator input over line (308) from the accumulator storage (312). The MAU (305) includes a multiplier (306) and an accumulator (307), which, as far as the invention is concerned, may be of any design, including those well known in the art. In operation, when a multiply-accumulate instruction is executed, the MAU (305) is directed to perform the multiply-accumulate function on operand x, obtained from cache portion (301) through multiplexer (310), and operand y, obtained from cache portion (302) through multiplexer (311). When another type of instruction is being executed, one that does not need multiple operands from the cache at the same time, multiplexer (311) instead selects the output of cache portion (301), cache portion (302), or the external memory bus (312) to provide the required data.

Note that the illustrated embodiment is a two-way set-associative cache. The invention can, however, be implemented for any n-way set-associative cache, where n is any positive integer. In the following description n is shown as an even number (for example, n = 2), but n may also be odd. In general, this is accomplished with a multiplexer having n inputs, one from each cache portion. When n is greater than 2, the distribution of the n ways used to access the two operands is determined by the particular implementation, and any particular implementation may be used with the invention. In addition, when the cache is configured as a conventional n-way set-associative cache, the cache replacement algorithm may, as far as the invention is concerned, be implemented by any technique.
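The two read modes of Fig. 3 can be sketched in software. The following is a minimal illustrative model, not the patented circuit: the class and method names (`Way`, `TwoWayCache`, `read`, `read_pair`), the one-word line size, and the modulo index/tag split are all assumptions made for the example. It shows a conventional associative read, where whichever way hits supplies the single operand, next to a MAC-style read, where way 0 supplies x and way 1 supplies y in the same step.

```python
class Way:
    """One cache portion (301 or 302 in Fig. 3), modelled as a direct-mapped array."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.tags = [None] * num_sets   # stored tag per set; None means invalid
        self.data = [0] * num_sets      # one word per line, to keep the sketch small

    def _split(self, addr):
        # Assumed split: low-order part is the index, high-order part is the tag.
        return addr % self.num_sets, addr // self.num_sets

    def lookup(self, addr):
        index, tag = self._split(addr)
        hit = self.tags[index] == tag
        return hit, (self.data[index] if hit else None)

    def fill(self, addr, value):
        index, tag = self._split(addr)
        self.tags[index] = tag
        self.data[index] = value


class TwoWayCache:
    def __init__(self, num_sets=4):
        self.ways = [Way(num_sets), Way(num_sets)]

    def read(self, addr):
        """Conventional set-associative read: select the way that hits."""
        for way in self.ways:
            hit, value = way.lookup(addr)
            if hit:
                return value
        raise LookupError(f"miss at address {addr}")  # would go to external memory

    def read_pair(self, addr_x, addr_y):
        """MAC-style read: way 0 supplies x and way 1 supplies y together."""
        hit_x, x = self.ways[0].lookup(addr_x)
        hit_y, y = self.ways[1].lookup(addr_y)
        if not (hit_x and hit_y):
            raise LookupError("operand miss")
        return x, y


cache = TwoWayCache()
coeffs = [1, 2, 3, 4]        # x array, placed in way 0
samples = [10, 20, 30, 40]   # y array, placed in way 1
for i in range(4):
    cache.ways[0].fill(i, coeffs[i])
    cache.ways[1].fill(100 + i, samples[i])

acc = 0
for i in range(4):
    x, y = cache.read_pair(i, 100 + i)
    acc = acc + x * y        # a0 = a0 + (x * y)
```

In this model both arrays stay resident because each lives in its own way, so every loop iteration gets both operands from a single `read_pair` call, while `cache.read(101)` still finds the same data through the ordinary associative path.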

As is well known in the art, memory management page tables are used to translate virtual addresses into physical addresses and to control cache operation. Portions of these page tables are cached in a translation lookaside buffer (TLB), which translates virtual memory addresses into physical memory addresses. The TLB also provides control information for memory pages and indicates whether a given page is cached in the TLB. Referring to Fig. 4, an illustrative page table entry contains a physical address "tag" in field (41) (bits 12 to 31). The tag represents the most significant bits of the address and is used to determine whether the desired address is in the cache; a cache "hit" is indicated by LHIT (320) or RHIT (321) in Fig. 3. The "index" portion of the address (not shown) represents the least significant bits and is used, by techniques well known in the art, to direct pointers (322) and (323) to the desired location in a particular cache portion (301 and 302, respectively). Field (42) may contain, for example, unused bits, and field (45) typically contains "permission" bits that control whether the data in the memory page is, for example, writable, valid, cacheable, and/or user-accessible. As far as the invention is concerned, these fields may be arranged in any order. Referring to Fig. 5, a TLB holds illustrative page table entries as a physical tag (502), a control tag (503), and a virtual tag (501). In this way, virtual addresses are translated into physical addresses according to principles well known in the art.

To implement the technique described above, one or more additional control bits may be included in the memory management page table. For example, field (43) may contain an even/odd "way" bit that indicates how data is to be written into the cache, as detailed below. Field (44) may contain a "reconfiguration" bit. When the reconfiguration bit is "0", the cache is treated as a conventional two-way set-associative cache; that is, data is written into ways (301) and (302) using whatever cache entry replacement scheme has been chosen. When the reconfiguration bit is "1", the two-way set-associative cache is treated as a direct-mapped cache. In that case, if the way bit in field (43) is "0", data is directed to the even-way cache portion; if the way bit is "1", data is directed to the odd-way cache portion. In this way, data is placed in the appropriate cache portion to serve as the x and y operands when the MAU executes a multiply-accumulate instruction or another special type of instruction. When an operating system (OS) is installed, a user application can request this through a special function call. A data processing system comprising a data processor and an operating system can thus make effective use of the technique.

By convention, the left operand (x in the example above) is fetched from way 0 and the right operand (y in the example above) is fetched from the odd way. Other conventions may also be used. In addition, other control techniques for writing data into the cache portions may be used with the invention. For example, an instruction that loads information into the cache may explicitly specify the cache portion into which the data should be written. For this purpose, one or more "way" bits (313) may be included in the instruction register shown in Fig. 3. In that case, a memory management unit and TLB may not be needed. Furthermore, the x and y data need not be separated into even and odd ways; the data may be distributed among the cache portions in any convenient manner. Finally, those skilled in the art will appreciate that some operations performed by a functional unit may fetch more than two operands from the cache at the same time.

Although the data processor of the invention is generally of the type conventionally called a "microprocessor", data processors of other names and types may also be used and fall within the scope of the invention. For example, a digital signal processor with enhanced capability for non-MAC instructions can make effective use of the technique.
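The page-table control described for Figs. 4 and 5 can be sketched as follows. This is a hedged illustration only: the text places the tag in bits 12 to 31 of the entry, but the bit positions chosen here for the way bit (field 43) and the reconfiguration bit (field 44), the 4 KB page size, the use of the tag as a physical page number, and the names `decode_pte`, `choose_write_way`, and `translate` are all assumptions. The TLB is modelled as a plain dictionary keyed by virtual page number.

```python
TAG_SHIFT = 12            # field (41): bits 12..31 hold the physical tag (from the text)
TAG_MASK = 0xFFFFF        # 20 tag bits
WAY_BIT = 1 << 5          # field (43): position assumed for illustration
RECONFIG_BIT = 1 << 6     # field (44): position assumed for illustration
PAGE_BITS = 12            # assumed 4 KB pages


def decode_pte(pte):
    """Split a page table entry into the fields used by this sketch."""
    return {
        "tag": (pte >> TAG_SHIFT) & TAG_MASK,
        "way": 1 if pte & WAY_BIT else 0,
        "reconfig": bool(pte & RECONFIG_BIT),
    }


def choose_write_way(pte, replacement_victim):
    """Return the way (0 = even, 1 = odd) that a fill for this page goes to."""
    fields = decode_pte(pte)
    if fields["reconfig"]:
        return fields["way"]        # direct-mapped mode: the way bit decides
    return replacement_victim       # conventional mode: replacement policy decides


# Hypothetical entries: a coefficient page pinned to way 0, a data page
# pinned to way 1, and an ordinary page left to the replacement policy.
coeff_pte = (0x12345 << TAG_SHIFT) | RECONFIG_BIT
data_pte = (0x12346 << TAG_SHIFT) | RECONFIG_BIT | WAY_BIT
normal_pte = 0x00042 << TAG_SHIFT

tlb = {0x00400: coeff_pte, 0x00401: data_pte, 0x00402: normal_pte}


def translate(vaddr):
    """Look up the virtual page in the TLB; a KeyError models a TLB miss."""
    fields = decode_pte(tlb[vaddr >> PAGE_BITS])
    paddr = (fields["tag"] << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))
    return paddr, fields
```

With these entries, fills for the coefficient page always land in way 0 and fills for the data page in way 1, regardless of what the replacement policy would have picked, which is what lets a later multiply-accumulate find x and y in separate cache portions.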

Claims (1)

1. A data processor comprising: an instruction register (314); an n-way set-associative cache, where n is at least 2, the cache comprising a first cache portion (301) and a second cache portion (302); and a functional unit (305) that operates on first and second operands (x, y) when executing an instruction;

the data processor being characterized in that it further comprises a first signal path (325) from the first cache portion for supplying the first operand (x) to the functional unit, and a second signal path (327) from the second cache portion for supplying the second operand (y) to the functional unit simultaneously with the first operand when a special type of instruction is executed;

and in that it further comprises a multiplexer (310, 311) for selecting data from one of the first and second cache portions when another type of instruction is executed.

2. The data processor of claim 1, further comprising a translation lookaside buffer (500) having page table entries (Fig. 4), the page table entries including a reconfiguration field (44) for controlling the manner in which data is written into the cache.

3. The data processor of claim 2, wherein the page table entries further include a way field (43) for providing that a first set of data is written into an even-way direct-mapped cache and a second set of data is written into an odd-way direct-mapped cache.

4. The data processor of claim 1, wherein the instruction register includes at least one control bit (313) for controlling the writing of data into the cache portions.

5. The data processor of claim 1, wherein the special type of instruction includes a multiply-accumulate instruction.

6. The data processor of claim 1, wherein the functional unit is a multiply-accumulate unit.
TW085100981A 1995-02-03 1996-01-26 TW297111B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US38303795A 1995-02-03 1995-02-03

Publications (1)

Publication Number Publication Date
TW297111B (en) 1997-02-01

Family

ID=23511440

Family Applications (1)

Application Number Title Priority Date Filing Date
TW085100981A TW297111B (en) 1995-02-03 1996-01-26

Country Status (3)

Country Link
JP (1) JPH08272681A (en)
KR (1) KR960032182A (en)
TW (1) TW297111B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6665775B1 (en) * 2000-09-22 2003-12-16 Intel Corporation Cache dynamically configured for simultaneous accesses by multiple computing engines

Also Published As

Publication number Publication date
KR960032182A (en) 1996-09-17
JPH08272681A (en) 1996-10-18

Similar Documents

Publication Publication Date Title
EP0734553B1 (en) Split level cache
KR100260864B1 (en) System & method for reducing power consumption in an electronic circuit
US5781924A (en) Computer caching methods and apparatus
US5412787A (en) Two-level TLB having the second level TLB implemented in cache tag RAMs
US5155832A (en) Method to increase performance in a multi-level cache system by the use of forced cache misses
US5249286A (en) Selectively locking memory locations within a microprocessor's on-chip cache
US6138208A (en) Multiple level cache memory with overlapped L1 and L2 memory access
JP3666689B2 (en) Virtual address translation method
JP2000122916A5 (en)
KR100261639B1 (en) System and Method for Reducing Power Consumption in an Electronic Circuit
JP5341163B2 (en) Instruction cache with a fixed number of variable-length instructions
US6223255B1 (en) Microprocessor with an instruction level reconfigurable n-way cache
JP2000231549A (en) Microprocessor
US8041930B2 (en) Data processing apparatus and method for controlling thread access of register sets when selectively operating in secure and non-secure domains
US5155828A (en) Computing system with a cache memory and an additional look-aside cache memory
Kohn et al. A 1,000,000 transistor microprocessor
US6029241A (en) Processor architecture scheme having multiple bank address override sources for supplying address values and method therefor
JPS62164148A (en) Data processing system
EP0459233A2 (en) Selectively locking memory locations within a microprocessor's on-chip cache
GB2200481A (en) Maintaining coherence between a microprocessor's integrated cache and external memory
JP3618868B2 (en) Method and system for efficient memory management in a data processing system utilizing a dual mode conversion index buffer
JPS5868286A (en) Cash memory and operation thereof
TW297111B (en)
JPH0519176B2 (en)
US5687350A (en) Protocol and system for performing line-fill address during copy-back operation