TWI519955B - Prefetcher, method of prefetch data and computer program product - Google Patents

Prefetcher, method of prefetch data and computer program product

Info

Publication number
TWI519955B
Authority
TW
Taiwan
Prior art keywords
period
register
memory block
cache line
bit
Prior art date
Application number
TW103128257A
Other languages
Chinese (zh)
Other versions
TW201447581A (en
Inventor
羅德尼E 虎克
約翰 麥可 吉爾
Original Assignee
威盛電子股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/033,765 (US8762649B2)
Priority claimed from US13/033,848 (US8719510B2)
Priority claimed from US13/033,809 (US8645631B2)
Application filed by 威盛電子股份有限公司
Publication of TW201447581A
Application granted
Publication of TWI519955B

Description

Prefetch unit, data prefetching method and computer program product

The present invention relates generally to cache memories in microprocessors, and more particularly to prefetching data into the cache memory of a microprocessor.

In recent computer systems, when a cache miss occurs, the time the microprocessor needs to access system memory can be one or two orders of magnitude greater than the time it needs to access the cache. To improve the cache hit rate, microprocessors therefore incorporate prefetching techniques that examine recent data access patterns and attempt to predict which data the program will access next. The benefits of prefetching are well known.

However, the applicants have observed that the access patterns of some programs are not detected by the prefetch units of conventional microprocessors. For example, Figure 1 shows the access pattern seen by a second-level (L2) cache when the executing program performs a sequence of stores through memory; the figure plots the memory address of each access over time. As Figure 1 shows, although the general trend is for the memory address to increase over time (that is, upward), in many cases the address of a given access is lower than that of the preceding access rather than following the upward trend, so the accesses differ from what a conventional prefetch unit would predict.

Although the general trend over a relatively large number of samples is in one direction, there are two reasons a conventional prefetch unit may be confused when it sees only a small sample. First, the program accesses memory according to its own structure, whether because of the nature of its algorithm or because of poor programming. Second, the pipelines and queues of an out-of-order microprocessor core, operating normally, often perform memory accesses in an order different from the program order in which they were generated.

Therefore, what is needed is a data prefetch unit that can prefetch effectively for programs whose memory accesses show no clear trend within small time windows but show a clear trend when examined over a larger number of samples.

The present invention provides a prefetch unit, disposed in a microprocessor, that includes a plurality of period match counters and control logic. The period match counters correspond respectively to a plurality of different pattern periods. The control logic is configured to: update the period match counters in response to accesses by the microprocessor to a memory block; determine a clear pattern period based on the counts of the period match counters; and prefetch, according to a pattern having the clear pattern period determined from the period match counters, those cache lines of the memory block that have not yet been prefetched.

In another embodiment, the prefetch unit further includes a bitmask register. The bitmask register has one bit for each cache line in the memory block, and the control logic updates the bitmask register in response to accesses to the memory block to indicate which cache lines of the block have been accessed. The control logic determines the pattern having the clear pattern period from the bitmask register.

In yet another embodiment, the prefetch unit further includes a middle pointer register. The middle pointer register holds a value, computed by the control logic in response to accesses to the memory block, that points within the bitmask register to a middle cache line among the cache lines of the block that have been accessed. Each of the different pattern periods is a number of bits to the left or right of the middle pointer. To update the period match counters, the control logic does the following for each counter: when the N bits to the left of the middle pointer match the N bits to the right of the middle pointer, it increments the counter, where N is the pattern period associated with that counter.
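The counter update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which bit positions count as "left" and "right" or how the two groups are aligned, so the sketch assumes "left" means lower line indices and compares the two N-bit groups position by position. The names `bit`, `update_period_counters` and the 64-line block width are hypothetical; the periods 3, 4 and 5 are the ones named for one embodiment later in the description.

```python
LINES_PER_BLOCK = 64   # assumed: 4 KB block of 64-byte lines
PERIODS = (3, 4, 5)    # periods named for one embodiment in the description


def bit(mask: int, index: int) -> int:
    """Return bit `index` of `mask`; positions outside the block read as 0."""
    if 0 <= index < LINES_PER_BLOCK:
        return (mask >> index) & 1
    return 0


def update_period_counters(mask: int, mid: int, counters: dict) -> None:
    """Increment the counter of every period N whose N bits on each side match.

    Assumption: the left group is bits mid-N .. mid-1 and the right group is
    bits mid .. mid+N-1, compared in the same order.
    """
    for n in PERIODS:
        left = [bit(mask, mid - n + k) for k in range(n)]
        right = [bit(mask, mid + k) for k in range(n)]
        if left == right:
            counters[n] += 1


# Example: every third line accessed (lines 0, 3, 6, 9, 12), middle pointer 6.
mask = sum(1 << line for line in (0, 3, 6, 9, 12))
counters = {n: 0 for n in PERIODS}
update_period_counters(mask, 6, counters)
print(counters)  # {3: 1, 4: 0, 5: 0}
```

In this example only the period-3 counter advances, because only the 3-bit groups on the two sides of the middle pointer are identical.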

In yet another embodiment, the prefetch unit further includes a min pointer register and a max pointer register. The min pointer register points to the lowest cache line accessed in the memory block, and the max pointer register points to the highest. The control logic computes the value of the middle pointer register as the average of the values of the min pointer register and the max pointer register. The pattern having the clear pattern period that the control logic determines from the bitmask register consists of N bits to the left or right of the middle pointer in the bitmask register, where N is the clear pattern period. To determine the clear pattern period from the period match counters, the control logic determines whether the count of one of the period match counters exceeds the counts of the others by more than a predetermined value.
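As a rough sketch of the two calculations in this embodiment, the following Python computes the middle pointer as an average and picks a clear period when one counter leads all the others. The function names, the rounding-down of the average and the default margin of 2 are assumptions made for illustration; the text only says "an average" and "a predetermined value".

```python
from typing import Dict, Optional


def middle_pointer(min_ptr: int, max_ptr: int) -> int:
    """Average of the min and max pointers (rounding down is an assumption)."""
    return (min_ptr + max_ptr) // 2


def clear_period(counters: Dict[int, int], margin: int = 2) -> Optional[int]:
    """Return the period whose count exceeds every other count by more than
    `margin`, or None when no period stands out.

    `margin` stands in for the predetermined value; 2 is a hypothetical choice.
    """
    for period, count in counters.items():
        others = [c for p, c in counters.items() if p != period]
        if all(count - other > margin for other in others):
            return period
    return None


print(middle_pointer(5, 12))                 # 8
print(clear_period({3: 5, 4: 1, 5: 0}))      # 3
print(clear_period({3: 2, 4: 1, 5: 0}))      # None
```

Requiring a margin rather than a simple maximum keeps the sketch from declaring a period "clear" when two candidates are nearly tied.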

The different pattern periods include at least three different pattern periods. In one embodiment, the control logic prefetches the not-yet-prefetched cache lines into the microprocessor only when at least nine cache lines of the memory block have been accessed. In another embodiment, the control logic does so only when at least ten cache lines of the memory block have been accessed. To prefetch the cache lines of the memory block that have not yet been prefetched according to the pattern having the clear pattern period, the control logic is further configured to: locate the pattern over the bitmask register at a position given by a search pointer register; and make a prediction for each bit of the pattern, in which the cache line corresponding to the bit is predicted to be needed when the bit is set and, when the cache line is predicted to be needed, the control logic determines whether it has not yet been prefetched. The cache line is judged not yet prefetched when the bitmask bit corresponding to the pattern bit indicates that the cache line has not been accessed. In addition, the control logic prefetches a cache line that it predicts to be needed and that has not yet been prefetched only when that cache line is less than a predetermined distance from the cache line at the end of the accessed cache lines of the memory block.
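The prediction step can be illustrated with a short Python sketch. It is a simplification under several assumptions: the trend is taken to be upward, the pattern is a list of N bits laid over the bitmask starting at the search pointer, and the "end" of the accessed lines is the highest accessed line. The function name `predict_prefetches`, its parameters and the default distance of 16 lines are all hypothetical.

```python
from typing import List


def predict_prefetches(mask: int, pattern: List[int], search: int,
                       max_ptr: int, distance: int = 16,
                       lines_per_block: int = 64) -> List[int]:
    """Return the line indices the sketch would prefetch for one placement
    of the pattern at position `search` (upward direction assumed)."""
    chosen = []
    for offset, needed in enumerate(pattern):
        line = search + offset
        if line >= lines_per_block:
            break                                   # ran off the end of the block
        already_accessed = (mask >> line) & 1       # set bit: nothing to prefetch
        close_enough = (line - max_ptr) < distance  # limit how far ahead to run
        if needed and not already_accessed and close_enough:
            chosen.append(line)
    return chosen


# Lines 0, 3, 6 and 9 were accessed; the detected pattern is 1,0,0 (period 3).
mask = sum(1 << line for line in (0, 3, 6, 9))
print(predict_prefetches(mask, [1, 0, 0], search=9, max_ptr=9))   # []
print(predict_prefetches(mask, [1, 0, 0], search=12, max_ptr=9))  # [12]
```

Placing the pattern at line 9 yields nothing because line 9 was already accessed, while moving the search pointer one period ahead selects line 12.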

The present invention also provides a method that includes: updating, by a microprocessor, a plurality of period match counters in response to accesses to a memory block, the period match counters corresponding respectively to a plurality of different pattern periods; determining a clear pattern period based on the counts of the period match counters; and prefetching, according to a pattern having the clear pattern period determined from the period match counters, those cache lines of the memory block that have not yet been prefetched.

The present invention also provides a computer program product encoded on at least one computer-readable medium for use with a computing device. The computer program product includes computer-readable program code, stored on the computer-readable medium, for specifying a prefetch unit in a microprocessor. The computer-readable program code includes: first program code for specifying a plurality of period match counters corresponding respectively to a plurality of different pattern periods; and second program code for specifying control logic configured to: update the period match counters in response to accesses to a memory block; determine a clear pattern period based on the counts of the period match counters; and prefetch, according to a pattern having the clear pattern period determined from the period match counters, those cache lines of the memory block that have not yet been prefetched.

100‧‧‧microprocessor

102‧‧‧instruction cache

104‧‧‧instruction decoder

106‧‧‧register alias table

108‧‧‧reservation station

112‧‧‧execution unit

132‧‧‧other execution units

134‧‧‧load/store unit

124‧‧‧prefetch unit

114‧‧‧retire unit

116‧‧‧first-level data cache

118‧‧‧second-level cache

122‧‧‧bus interface unit

162‧‧‧virtual hash table

198‧‧‧queue

172‧‧‧first-level data search pointer

178‧‧‧first-level data pattern address

196‧‧‧first-level data memory address

194‧‧‧pattern-predicted cache line address

192‧‧‧cache line allocation request

188‧‧‧cache line data

354‧‧‧memory block virtual hash address field

356‧‧‧status field

302‧‧‧block bitmask register

303‧‧‧block number register

304‧‧‧min pointer register

306‧‧‧max pointer register

308‧‧‧min change counter

312‧‧‧max change counter

314‧‧‧total counter

316‧‧‧middle pointer register

318‧‧‧period match counter

342‧‧‧direction register

344‧‧‧pattern register

346‧‧‧pattern period register

348‧‧‧pattern location register

352‧‧‧search pointer register

332‧‧‧hardware unit

322‧‧‧control logic

328‧‧‧prefetch request queue

324‧‧‧pop pointer

326‧‧‧push pointer

2002‧‧‧hashed virtual address store

2102A‧‧‧core A

2102B‧‧‧core B

2104‧‧‧highly reactive prefetch unit

2106‧‧‧shared highly reactive prefetch unit

Figure 1 shows an access pattern to a second-level cache when a program that performs a sequence of stores through memory is executed.

Figure 2 is a block diagram of a microprocessor according to the invention.

Figure 3 is a detailed block diagram of the prefetch unit of Figure 2.

Figure 4 is a flowchart of the operation of the microprocessor of Figure 2 and, in particular, of the prefetch unit of Figure 3.

Figure 5 is a flowchart of the operation of the prefetch unit of Figure 3 in performing a step of Figure 4.

Figure 6 is a flowchart of the operation of the prefetch unit of Figure 3 in performing a step of Figure 4.

Figure 7 is a flowchart of the operation of the prefetch request queue of Figure 3.

Figures 8A and 8B are two graphs that plot access points in a memory block to illustrate the bounding box prefetch unit of the invention.

Figure 9 is a block diagram showing an example of the operation of the microprocessor of Figure 2.

Figure 10 is a block diagram showing an example of the operation of the microprocessor of Figure 2, continuing the example of Figure 9.

Figures 11A and 11B are block diagrams showing an example of the operation of the microprocessor of Figure 2, continuing the example of Figures 9 and 10.

Figure 12 is a block diagram of a microprocessor according to another embodiment of the invention.

Figure 13 is a flowchart of the operation of the prefetch unit of Figure 12.

Figure 14 is a flowchart of the operation of the prefetch unit of Figure 12 in performing a step of Figure 13.

Figure 15 is a block diagram of a microprocessor having a bounding box prefetch unit according to another embodiment of the invention.

Figure 16 is a block diagram of the virtual hash table of Figure 15.

Figure 17 is a flowchart of the operation of the microprocessor of Figure 15.

Figure 18 shows the contents of the virtual hash table of Figure 16 after operation of the prefetch unit as described in the example of Figure 17.

Figure 19 (comprising Figures 19A and 19B) is a flowchart of the operation of the prefetch unit of Figure 15.

Figure 20 is a block diagram of a hashed-physical-address-to-hashed-virtual-address store used in the prefetch unit of Figure 15 according to another embodiment of the invention.

Figure 21 is a block diagram of a multi-core microprocessor according to the invention.

Methods of making and using various embodiments of the invention are discussed in detail below. It should be noted, however, that the invention provides many applicable inventive concepts that can be embodied in a wide variety of specific contexts. The specific embodiments discussed merely illustrate ways to make and use the invention and do not limit its scope.

Bounding box prefetcher:

Broadly speaking, the solution to the problem described above can be explained as follows. If all accesses (instructions, operations or requests) to a memory were plotted on a graph, the set of all accesses could be enclosed by a bounding box. If additional access requests were plotted on the same graph, they could also be enclosed by resizing the bounding box. The first graph of Figure 8 shows two accesses to a memory block. The X axis represents the temporal order of the accesses, and the Y axis represents the index of the 64-byte cache line accessed within the 4 KB block. Initially, the first two accesses are plotted: the first is to cache line 5 and the second is to cache line 6. As shown, a bounding box encloses the two points that represent the accesses.

Next, a third access occurs to cache line 7, and the bounding box grows so that the new point representing the third access is enclosed. As new accesses continue to occur, the bounding box must grow along the X axis, and its upper edge also grows along the Y axis (this is an upward example). The history of the movement of the upper and lower edges of the bounding box is used to determine whether the trend of the access pattern is upward, downward or neither.
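The bounding box idea can be made concrete with a small Python sketch that replays the accesses to cache lines 5, 6 and 7 from the example. The class name `BoundingBox`, the choice not to count the first access as an edge movement and the trend rule (one edge has moved at least `margin` more times than the other) are assumptions for illustration; the passage above only states that the history of edge movements decides the trend.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoundingBox:
    lo: Optional[int] = None   # lowest cache line index seen so far
    hi: Optional[int] = None   # highest cache line index seen so far
    lo_moves: int = 0          # times the lower edge moved down
    hi_moves: int = 0          # times the upper edge moved up

    def add(self, line: int) -> None:
        """Grow the box just enough to enclose a newly accessed line."""
        if self.lo is None or self.hi is None:
            self.lo = self.hi = line   # first access: no edge movement counted
        elif line < self.lo:
            self.lo = line
            self.lo_moves += 1
        elif line > self.hi:
            self.hi = line
            self.hi_moves += 1

    def trend(self, margin: int = 2) -> str:
        """Classify the trend; `margin` is a hypothetical threshold."""
        diff = self.hi_moves - self.lo_moves
        if diff >= margin:
            return "up"
        if -diff >= margin:
            return "down"
        return "none"


box = BoundingBox()
for line in (5, 6, 7):
    box.add(line)
print(box.lo, box.hi, box.trend())  # 5 7 up
```

An access that falls inside the current box, such as a later access to line 6, changes neither edge, which is how a locally out-of-order access leaves the overall trend undisturbed in this sketch.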

In addition to tracking the trend of the upper and lower edges of the bounding box to determine a trend direction, it is also necessary to track the individual accesses, because accesses often skip one or two cache lines. Therefore, to avoid prefetching cache lines that are likely to be skipped, once an upward or downward trend is detected the prefetch unit uses additional criteria to decide which cache lines to prefetch. Because the accesses tend to be reordered, the prefetch unit represents the access history with the time ordering removed. It does so by marking bits in a bitmask, in which each bit corresponds to one cache line of a memory block; when a given cache line is accessed, the corresponding bit of the bitmask is set. Once a sufficient number of accesses have been made to the memory block, the prefetch unit uses the bitmask, which carries no indication of the temporal order of the accesses, to make prefetch decisions based on the large view of accesses to the entire block as described below, rather than on a small view based only on the time of the accesses as conventional prefetch units do.
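A short Python example shows why marking bits discards the time ordering: the same set of cache lines produces the same bitmask no matter what order the accesses arrive in. The helper name `mark` is hypothetical.

```python
from typing import Iterable


def mark(lines: Iterable[int]) -> int:
    """Build a bitmask with one bit set per accessed cache line index."""
    mask = 0
    for line in lines:
        mask |= 1 << line
    return mask


in_order = mark([5, 6, 7, 8])
reordered = mark([6, 5, 8, 7])        # same lines, shuffled by the core
print(in_order == reordered)          # True
print(bin(in_order))                  # 0b111100000
print(bin(in_order).count("1"))       # 4 distinct lines accessed
```

Repeated accesses to one line also leave the mask unchanged, so the mask records which lines were touched and nothing about when or how often.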

Figure 2 is a block diagram of a microprocessor 100 according to the invention. The microprocessor 100 includes a multi-stage pipeline that contains various functional units. The pipeline includes an instruction cache 102 coupled to an instruction decoder 104; the instruction decoder 104 is coupled to a register alias table (RAT) 106; the register alias table 106 is coupled to a reservation station 108; the reservation station 108 is coupled to an execution unit 112; and the execution unit 112 is coupled to a retire unit 114. The instruction decoder 104 may include an instruction translator that translates macroinstructions (for example, of the x86 architecture) into microinstructions of a RISC-like (reduced instruction set computer) instruction set of the microprocessor 100. The reservation station 108 issues instructions to the execution unit 112 so that they can be executed out of program order. The retire unit 114 includes a reorder buffer that retires instructions in program order. The execution unit 112 includes a load/store unit 134 and other execution units 132, such as an integer unit, a floating-point unit, a branch unit or a single instruction, multiple data (SIMD) unit. The load/store unit 134 reads data from and writes data to a first-level (L1) data cache 116. A second-level (L2) cache 118 backs the L1 data cache 116 and the instruction cache 102. The L2 cache 118 reads and writes system memory through a bus interface unit 122, which is the interface between the microprocessor 100 and a bus such as a local bus or a memory bus. The microprocessor 100 also includes a prefetch unit 124 that prefetches data from system memory into the L2 cache 118 and/or the L1 data cache 116.

Figure 3 is a detailed block diagram of the prefetch unit 124 of Figure 2. The prefetch unit 124 includes a block bitmask register 302. Each bit of the block bitmask register 302 corresponds to one cache line of a memory block whose block number is stored in a block number register 303. In other words, the block number register 303 holds the upper address bits of the memory block. A true value in a bit of the block bitmask register 302 indicates that the corresponding cache line has been accessed. The block bitmask register 302 is initialized so that all bits are false. In one embodiment, the memory block is 4 KB and the cache line is 64 bytes, so the block bitmask register 302 holds 64 bits. In some embodiments, the memory block may be the same size as a physical memory page. The cache line may, however, be other sizes in other embodiments. Furthermore, the size of the memory region tracked by the block bitmask register 302 may vary and need not correspond to the size of a physical memory page. Rather, the memory region (or block) tracked by the block bitmask register 302 may be any size (preferably a power of two), as long as it contains enough cache lines for a prefetch direction and pattern to be detected.
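For the 4 KB block and 64-byte line of this embodiment, the arithmetic behind the 64-bit mask and the block number can be written out as a brief Python sketch. The constant and function names are hypothetical, and the example address is arbitrary.

```python
BLOCK_SIZE = 4096                            # bytes per memory block (one embodiment)
LINE_SIZE = 64                               # bytes per cache line (one embodiment)
LINES_PER_BLOCK = BLOCK_SIZE // LINE_SIZE    # 64, hence a 64-bit bitmask


def split_address(addr: int):
    """Split a byte address into (block number, cache line index in block).

    The block number is the upper address bits; the line index selects
    one bit of the block bitmask.
    """
    block_number = addr // BLOCK_SIZE
    line_index = (addr % BLOCK_SIZE) // LINE_SIZE
    return block_number, line_index


block, line = split_address(0x12345)
print(LINES_PER_BLOCK)        # 64
print(hex(block), line)       # 0x12 13
```

Because both sizes are powers of two, the divisions reduce to simple shifts and masks in hardware, which is one practical reason a power-of-two block size is preferred.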

The prefetch unit 124 also includes a min pointer register 304 and a max pointer register 306. The min pointer register 304 and the max pointer register 306 are maintained to point to the index of the lowest and the highest cache line, respectively, that has been accessed within the memory block since the prefetch unit 124 began tracking accesses to the block. The prefetch unit 124 further includes a min change counter 308 and a max change counter 312, which count the number of times the min pointer register 304 and the max pointer register 306, respectively, have changed since the prefetch unit 124 began tracking accesses to the block. The prefetch unit 124 also includes a total counter 314 that counts the total number of cache lines accessed since the prefetch unit 124 began tracking accesses to the block. The prefetch unit 124 also includes a middle pointer register 316 that points to the index of the middle cache line of the block (for example, the average of the values of the min pointer register 304 and the max pointer register 306) since the prefetch unit 124 began tracking accesses to the block. The prefetch unit 124 also includes a direction register 342, a pattern register 344, a pattern period register 346, a pattern location register 348 and a search pointer register 352, whose functions are described below.

The prefetch unit 124 also includes a plurality of period match counters 318. Each period match counter 318 maintains a count for a different period. In one embodiment, the periods are 3, 4 and 5. The period is the number of bits to the left or right of the middle pointer register 316. The period match counters 318 are updated after each memory access to the block. When the block bitmask register 302 indicates that the accesses to the left of the middle pointer register 316 over the period match the accesses to the right of the middle pointer register 316, the prefetch unit 124 increments the period match counter 318 associated with that period. The use and operation of the period match counters 318 are described in more detail below, particularly with respect to Figures 4 and 5.

The prefetch unit 124 also includes a prefetch request queue 328, a pop pointer 324 and a push pointer 326. The prefetch request queue 328 is a circular queue of entries, each of which stores a prefetch request generated by the operation of the prefetch unit 124 (particularly as described with respect to Figures 4, 6 and 7). The push pointer 326 indicates the next entry of the prefetch request queue 328 to be allocated. The pop pointer 324 indicates the next entry to be removed from the prefetch request queue 328. In one embodiment, because prefetch requests may complete out of order, the prefetch request queue 328 can pop completed entries out of order. In one embodiment, the size of the prefetch request queue 328 is chosen to allow for all the requests in flight in the tag pipeline of the L2 cache 118, so that the number of entries in the prefetch request queue 328 is at least as great as the number of pipeline stages in the L2 cache 118. A prefetch request is retained until the end of the L2 cache 118 pipeline, at which point the request has one of three outcomes, described in more detail with respect to Figure 7: it hits in the L2 cache 118, it is replayed, or it causes a fill queue entry to be pushed in order to fetch the needed data from system memory.
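A circular queue with a push pointer and a pop pointer can be sketched in Python as follows. The paragraph above does not spell out how out-of-order completion is handled, so this sketch makes one possible choice and labels it as an assumption: entries may be marked complete in any order, and the pop pointer then advances past every completed entry at the head. The class name `PrefetchRequestQueue` and its methods are hypothetical.

```python
from typing import Any, List, Optional


class PrefetchRequestQueue:
    """Fixed-size circular queue; a sketch, not the hardware design."""

    def __init__(self, size: int) -> None:
        self.entries: List[Optional[Any]] = [None] * size
        self.done: List[bool] = [False] * size
        self.push_ptr = 0   # next entry to allocate
        self.pop_ptr = 0    # next entry to remove
        self.count = 0      # entries currently allocated

    def push(self, request: Any) -> Optional[int]:
        """Allocate the entry at the push pointer; return its slot or None if full."""
        if self.count == len(self.entries):
            return None
        slot = self.push_ptr
        self.entries[slot] = request
        self.done[slot] = False
        self.push_ptr = (slot + 1) % len(self.entries)
        self.count += 1
        return slot

    def complete(self, slot: int) -> None:
        """Mark a request finished (in any order), then free completed
        entries at the head so the pop pointer skips past them."""
        self.done[slot] = True
        while self.count and self.done[self.pop_ptr]:
            self.entries[self.pop_ptr] = None
            self.done[self.pop_ptr] = False
            self.pop_ptr = (self.pop_ptr + 1) % len(self.entries)
            self.count -= 1


queue = PrefetchRequestQueue(4)
a, b, c = queue.push("line 12"), queue.push("line 15"), queue.push("line 18")
queue.complete(b)                      # finishes first, but the head is still busy
print(queue.pop_ptr, queue.count)      # 0 3
queue.complete(a)                      # head finishes; both a and b are freed
print(queue.pop_ptr, queue.count)      # 2 1
```

Freeing entries only from the head keeps the two pointers sufficient to describe the occupied region; a design that frees arbitrary slots immediately would need a per-entry valid bit instead.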

The prefetch unit 124 also includes control logic 322, which controls the elements of the prefetch unit 124 so that they perform their functions.

Although FIG. 3 shows only one hardware unit 332 associated with one active memory block (the block bitmask register 302, block number register 303, min pointer register 304, max pointer register 306, min change counter 308, max change counter 312, total counter 314, middle pointer register 316, pattern period register 346, pattern location register 348, and search pointer register 352), the prefetch unit 124 may include a plurality of the hardware units 332 shown in FIG. 3 in order to track accesses to multiple active memory blocks.

In one embodiment, the microprocessor 100 also includes one or more highly reactive prefetch units (not shown) that prefetch using different algorithms based on a very small temporal sample of accesses and that work in conjunction with the prefetch unit 124, as described here. Because the prefetch unit 124 described herein analyzes a larger number of memory accesses than the highly reactive prefetch units do, it necessarily tends to take longer to begin prefetching a new memory block (as described below), but it is more accurate than the highly reactive prefetch units. Thus, by operating the highly reactive prefetch units together with the prefetch unit 124, the microprocessor 100 obtains both the faster reaction time of the highly reactive prefetch units and the high accuracy of the prefetch unit 124. Additionally, the prefetch unit 124 may monitor requests from the other prefetch units and use those requests in its own prefetch algorithm.

FIG. 4 is a flowchart illustrating the operation of the microprocessor 100 of FIG. 2, and in particular of the prefetch unit 124 of FIG. 3. Flow begins at step 402.

In step 402, the prefetch unit 124 receives a load/store memory access request to a memory address. In one embodiment, the prefetch unit 124 distinguishes between load and store memory access requests when determining which cache lines to prefetch; in other embodiments, it does not. In one embodiment, the prefetch unit 124 receives the memory access request from the load/store unit 134. The prefetch unit 124 may receive memory access requests from various sources including, but not limited to, the load/store unit 134, the first-level data cache memory 116 (for example, an allocation request generated by the first-level data cache memory 116 when a load/store unit 134 memory access misses in the first-level data cache memory 116), and/or other sources, such as other prefetch units (not shown) of the microprocessor 100 that prefetch data using algorithms different from that of the prefetch unit 124. Flow proceeds to step 404.

In step 404, the control logic 322 determines whether the memory access is to an active block by comparing the memory access address with the value of each block number register 303. That is, the control logic 322 determines whether a hardware unit 332 of FIG. 3 has already been allocated to the memory block containing the memory address specified by the memory access request. If so, flow proceeds to step 408; otherwise, flow proceeds to step 406.

In step 406, the control logic 322 allocates a hardware unit 332 of FIG. 3 to the memory block in question. In one embodiment, the control logic 322 allocates the hardware units 332 in a round-robin fashion. In other embodiments, the control logic 322 maintains least-recently-used information for the hardware units 332 and allocates on a least-recently-used basis. Additionally, the control logic 322 initializes the allocated hardware unit 332. In particular, the control logic 322 clears all bits of the block bitmask register 302, populates the block number register 303 with the upper bits of the memory access address, and clears to zero the min pointer register 304, max pointer register 306, min change counter 308, max change counter 312, total counter 314, and period match counters 318. Flow proceeds to step 408.
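The lookup of step 404 and the round-robin allocation of step 406 can be sketched in software as follows. This is only an illustrative model, not the hardware described here: the 4 KB block size (`BLOCK_SHIFT`), the class names, and the `find_or_allocate` helper are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

BLOCK_SHIFT = 12      # assumption: 4 KB memory blocks
PERIODS = (3, 4, 5)   # the periods named in the text


@dataclass
class HardwareUnit:
    """One set of tracking state, zeroed as step 406 describes."""
    block_number: Optional[int] = None
    block_bitmask: int = 0
    min_pointer: int = 0
    max_pointer: int = 0
    min_change: int = 0
    max_change: int = 0
    total: int = 0
    period_match: dict = field(default_factory=lambda: {n: 0 for n in PERIODS})


class UnitAllocator:
    """Hypothetical model of steps 404 and 406 with round-robin allocation."""

    def __init__(self, unit_count):
        self.units = [HardwareUnit() for _ in range(unit_count)]
        self.next_slot = 0

    def find_or_allocate(self, address):
        block = address >> BLOCK_SHIFT
        # Step 404: compare the address with every block number register.
        for unit in self.units:
            if unit.block_number == block:
                return unit, False
        # Step 406: take the next slot in round-robin order and reinitialize it.
        slot = self.next_slot
        self.next_slot = (slot + 1) % len(self.units)
        self.units[slot] = HardwareUnit(block_number=block)
        return self.units[slot], True
```

A least-recently-used policy would replace only the slot selection; the initialization of the allocated unit stays the same.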

In step 408, the control logic 322 updates the hardware unit 332 based on the memory access address, as described with respect to FIG. 5. Flow proceeds to step 412.

In step 412, the control logic 322 examines the total counter 314 of the hardware unit 332 to determine whether the program has made enough accesses to the memory block to detect an access pattern. In one embodiment, the control logic 322 determines whether the count of the total counter 314 is greater than a predetermined value. In one embodiment, the predetermined value is 10, although other values may be used and the invention is not limited in this respect. If enough accesses have been made, flow proceeds to step 414; otherwise, flow ends.

In step 414, the control logic 322 determines whether the accesses recorded in the block bitmask register 302 show a clear direction trend. That is, the control logic 322 determines whether the accesses are clearly trending upward (increasing access addresses) or downward (decreasing access addresses). In one embodiment, the control logic 322 decides whether there is a clear trend according to whether the difference between the min change counter 308 and the max change counter 312 is greater than a predetermined value. In one embodiment, the predetermined value is 2, although other values may be used in other embodiments. If the count of the min change counter 308 exceeds the count of the max change counter 312 by the predetermined value, the trend is clearly downward; conversely, if the count of the max change counter 312 exceeds the count of the min change counter 308 by the predetermined value, the trend is clearly upward. If there is a clear trend, flow proceeds to step 416; otherwise, flow ends.
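A minimal sketch of the direction test of step 414 follows. The function name and return values are hypothetical, and the comparison uses "at least the threshold", which is how the worked example of FIGS. 9 and 10 applies the criterion; a strict "greater than" is an equally plausible reading of the step itself.

```python
def detect_direction(min_change, max_change, threshold=2):
    """Return "up", "down", or None when neither counter clearly leads.

    threshold=2 is the value given for one embodiment; the >= comparison
    is an assumption taken from the worked example.
    """
    if max_change - min_change >= threshold:
        return "up"      # the max pointer moved more often: addresses rising
    if min_change - max_change >= threshold:
        return "down"    # the min pointer moved more often: addresses falling
    return None
```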

In step 416, the control logic 322 determines whether there is a clear pattern period winner among the accesses recorded in the block bitmask register 302. In one embodiment, the control logic 322 decides whether there is a clear pattern period winner according to whether the count of one of the period match counters 318 exceeds the counts of all the other period match counters 318 by more than a predetermined value. In one embodiment, the predetermined value is 2, although other values may be used in other embodiments. The updating of the period match counters 318 is described in detail with respect to FIG. 5. If there is a clear pattern period winner, flow proceeds to step 418; otherwise, flow ends.
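The winner test of step 416 can be sketched as below. The function name and the dictionary representation of the period match counters are assumptions for illustration; as with the direction test, the margin is applied as "at least 2", following the worked example.

```python
def winning_period(period_match, margin=2):
    """Return the period whose counter leads every other counter by at
    least `margin`, or None when no period clearly wins.

    period_match maps a period (for example 3, 4, 5) to its match count.
    """
    for period, count in period_match.items():
        others = [c for p, c in period_match.items() if p != period]
        if all(count - c >= margin for c in others):
            return period
    return None
```

For example, counts of `{3: 1, 4: 2, 5: 6}` select period 5, whereas `{3: 4, 4: 2, 5: 5}` selects nothing because period 5 leads period 3 by only one.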

In step 418, the control logic 322 populates the direction register 342 to indicate the clear direction trend determined in step 414. Additionally, the control logic 322 populates the pattern period register 346 with the clear winning pattern period (N) detected in step 416. Finally, the control logic 322 populates the pattern register 344 with the clearly winning pattern detected in step 416. That is, the control logic 322 populates the pattern register 344 with the N bits of the block bitmask register 302 to the right or to the left of the middle pointer register 316 (which match, as described with respect to step 518 of FIG. 5). Flow proceeds to step 422.
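As a rough illustration of step 418, the sketch below captures the three register values from a block bitmask held as a Python integer. The bit layout is an assumption: bit i stands for cache line i, and "the N bits to the right of the middle pointer" is taken to mean bits middle through middle + N - 1. The function and key names are hypothetical.

```python
def capture_registers(block_bitmask, middle, direction, period):
    """Model of step 418: record direction, winning period, and pattern.

    The pattern is returned as a tuple of 0/1 values, lowest line index first.
    """
    pattern = tuple((block_bitmask >> i) & 1
                    for i in range(middle, middle + period))
    return {"direction": direction,
            "pattern_period": period,
            "pattern": pattern}
```

With lines 12, 14, 17, and 19 accessed, a middle pointer of 16, and a period of 5, the captured pattern is `(0, 1, 0, 1, 0)`.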

In step 422, the control logic 322 begins prefetching the not-yet-fetched cache lines of the memory block according to the detected direction and pattern, as described with respect to FIG. 6. Flow ends at step 422.

FIG. 5 is a flowchart illustrating the operation of the prefetch unit 124 of FIG. 3 in performing step 408 of FIG. 4. Flow begins at step 502.

In step 502, the control logic 322 increments the total counter 314. Flow proceeds to step 504.

In step 504, the control logic 322 determines whether the current memory access address (more specifically, the index within the memory block of the cache line implicated by the current memory access address) is greater than the value of the max pointer register 306. If so, flow proceeds to step 506; otherwise, flow proceeds to step 508.

In step 506, the control logic 322 updates the max pointer register 306 with the index within the memory block of the cache line implicated by the current memory access address, and increments the max change counter 312. Flow proceeds to step 514.

In step 508, the control logic 322 determines whether the index within the memory block of the cache line implicated by the current memory access address is less than the value of the min pointer register 304. If so, flow proceeds to step 512; otherwise, flow proceeds to step 514.

In step 512, the control logic 322 updates the min pointer register 304 with the index within the memory block of the cache line implicated by the current memory access address, and increments the min change counter 308. Flow proceeds to step 514.

In step 514, the control logic 322 computes the average of the min pointer register 304 and the max pointer register 306 and updates the middle pointer register 316 with the computed average. Flow proceeds to step 516.

In step 516, the control logic 322 examines the block bitmask register 302 and isolates the N bits to the left and the N bits to the right of the middle pointer register 316, where N is the number of bits associated with each respective period match counter 318. Flow proceeds to step 518.

In step 518, the control logic 322 determines whether the N bits to the left of the middle pointer register 316 match the N bits to the right of the middle pointer register 316. If so, flow proceeds to step 522; otherwise, flow ends.

In step 522, the control logic 322 increments the period match counter 318 associated with period N. Flow ends at step 522.
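The update flow of FIG. 5 can be modeled in a few lines. The sketch below is an interpretation, not the circuit: it assumes 64 cache lines per block, treats the first access to a block as setting both pointers (as the discussion of FIG. 9 implies), takes "left" to mean the N lower-indexed bits and "right" to mean bits middle through middle + N - 1, and skips a period whose window would fall outside the block. All names are hypothetical.

```python
PERIODS = (3, 4, 5)
LINES_PER_BLOCK = 64   # assumption: 64 cache lines per memory block


def new_unit():
    """State of a freshly initialized hardware unit (all zero)."""
    return {"bitmask": 0, "min": 0, "max": 0, "min_change": 0,
            "max_change": 0, "total": 0, "middle": 0,
            "period_match": {n: 0 for n in PERIODS}}


def window(bitmask, start, count):
    """Bits start..start+count-1 as a tuple, or None if outside the block."""
    if start < 0 or start + count > LINES_PER_BLOCK:
        return None
    return tuple((bitmask >> i) & 1 for i in range(start, start + count))


def update(unit, index):
    """Apply one access to cache line `index` (steps 502 through 522)."""
    first_access = unit["total"] == 0
    unit["bitmask"] |= 1 << index                # record the access
    unit["total"] += 1                           # step 502
    if first_access:                             # assumption: seed both pointers
        unit["min"] = unit["max"] = index
        unit["min_change"] += 1
        unit["max_change"] += 1
    elif index > unit["max"]:                    # steps 504 and 506
        unit["max"] = index
        unit["max_change"] += 1
    elif index < unit["min"]:                    # steps 508 and 512
        unit["min"] = index
        unit["min_change"] += 1
    unit["middle"] = (unit["min"] + unit["max"]) // 2   # step 514
    for n in PERIODS:                            # steps 516 through 522
        left = window(unit["bitmask"], unit["middle"] - n, n)
        right = window(unit["bitmask"], unit["middle"], n)
        if left is not None and left == right:
            unit["period_match"][n] += 1
    return unit
```

Feeding the first two accesses of the example (cache lines 12 and then 9) leaves the min pointer at 9, the max pointer at 12, and the middle pointer at 10 under these assumptions.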

FIG. 6 is a flowchart illustrating the operation of the prefetch unit 124 of FIG. 3 in performing step 422 of FIG. 4. Flow begins at step 602.

In step 602, the control logic 322 initializes the search pointer register 352 and the pattern location register 348 to a position one pattern period (pattern period register 346) away from the middle pointer register 316 in the detected direction. That is, the control logic 322 initializes the search pointer register 352 and the pattern location register 348 to the sum or difference of the value of the middle pointer register 316 and the detected period (N). For example, if the value of the middle pointer register 316 is 16, N is 5, and the trend indicated by the direction register 342 is upward, the control logic 322 initializes the search pointer register 352 and the pattern location register 348 to 21. Thus, in this example, the five bits of the pattern register 344 are placed against bits 21 through 25 of the block bitmask register 302 for comparison purposes, as described below. Flow proceeds to step 604.

In step 604, the control logic 322 examines the bit of the block bitmask register 302 indicated by the search pointer register 352 and the corresponding bit of the pattern register 344 (which is placed against the block bitmask register 302 at the position given by the pattern location register 348) in order to predict whether to prefetch the corresponding cache line of the memory block. Flow proceeds to step 606.

In step 606, the control logic 322 predicts whether the examined cache line is needed. If the bit of the pattern register 344 is true, the control logic 322 predicts that the cache line is needed; that is, the pattern predicts that the program will access the cache line. If the cache line is needed, flow proceeds to step 614; otherwise, flow proceeds to step 608.

In step 608, the control logic 322 determines whether any unexamined cache lines remain in the memory block, according to whether the search pointer register 352 has reached the end of the block bitmask register 302. If no unexamined cache lines remain, flow ends; otherwise, flow proceeds to step 612.

In step 612, the control logic 322 increments or decrements the search pointer register 352. Additionally, if the search pointer register 352 has passed beyond the last bit of the pattern register 344, the control logic 322 updates the pattern location register 348 with the new value of the search pointer register 352; that is, it shifts the pattern register 344 to the new search pointer position. Flow proceeds to step 604.

In step 614, the control logic 322 determines whether the needed cache line has already been fetched. The control logic 322 determines that the needed cache line has already been fetched if the corresponding bit of the block bitmask register 302 is true. If the needed cache line has already been fetched, flow proceeds to step 608; otherwise, flow proceeds to step 616.

In decision step 616, the control logic 322 determines whether the cache line under consideration is more than a predetermined distance (16 cache lines in one embodiment) away from the min pointer register 304 if the direction register 342 indicates downward, or away from the max pointer register 306 if the direction register 342 indicates upward. If so, flow ends; otherwise, flow proceeds to decision step 618. Note that ending the flow because the cache line is too far from the min pointer register 304 or max pointer register 306 does not mean that the prefetch unit 124 will not subsequently prefetch other cache lines of the memory block, because, according to the steps of FIG. 4, a subsequent access to a cache line of the memory block triggers further prefetching within that memory block.

In step 618, the control logic 322 determines whether the prefetch request queue 328 is full. If the prefetch request queue 328 is full, flow proceeds to step 622; otherwise, flow proceeds to step 624.

In step 622, the control logic 322 stalls until the prefetch request queue 328 is no longer full. Flow proceeds to step 624.

In step 624, the control logic 322 pushes an entry into the prefetch request queue 328 in order to prefetch the cache line. Flow proceeds to step 608.
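The scan of FIG. 6 can be sketched for the upward direction only, which is the case the worked example covers. This is a simplified software model under stated assumptions: the block bitmask is an integer with bit i standing for cache line i, the block has 64 lines, the pattern is a tuple with the lowest line index first, and the queue-full stall of steps 618 and 622 is not modeled (predicted lines are simply collected in a list). The function and parameter names are hypothetical.

```python
def predict_upward(block_bitmask, pattern, middle, max_pointer,
                   lines_per_block=64, max_distance=16):
    """Return the cache line indexes that the FIG. 6 flow would request."""
    period = len(pattern)
    search = location = middle + period                # step 602
    requests = []
    while search < lines_per_block:                    # step 608: lines remain
        if search - location >= period:                # step 612: slide pattern
            location = search
        needed = pattern[search - location] == 1       # steps 604 and 606
        fetched = ((block_bitmask >> search) & 1) == 1 # step 614
        if needed and not fetched:
            if search - max_pointer > max_distance:    # step 616: too far ahead
                break
            requests.append(search)                    # step 624
        search += 1                                    # step 612: advance
    return requests
```

The distance check bounds how far ahead of the max pointer the scan runs; later accesses to the block restart the flow of FIG. 4 and extend the prefetching.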

FIG. 7 is a flowchart illustrating the operation of the prefetch request queue 328 of FIG. 3. Flow begins at step 702.

In step 702, a prefetch request that was pushed into the prefetch request queue 328 in step 624 is granted access to the second-level cache memory 118 and proceeds down the pipeline of the second-level cache memory 118. Flow proceeds to step 704.

In step 704, the second-level cache memory 118 determines whether the cache line address hits in the second-level cache memory 118. If so, flow proceeds to step 706; otherwise, flow proceeds to decision step 708.

In step 706, no prefetch of the cache line is required because the cache line is already present in the second-level cache memory 118, and flow ends.

In step 708, the control logic 322 determines whether the response of the second-level cache memory 118 indicates that the prefetch request must be replayed. If so, flow proceeds to step 712; otherwise, flow proceeds to step 714.

In step 712, the prefetch request for the cache line is re-pushed into the prefetch request queue 328. Flow ends at step 712.

In step 714, the second-level cache memory 118 pushes a request into a fill queue (not shown) of the microprocessor 100 to request that the bus interface unit 122 read the cache line into the microprocessor 100. Flow ends at step 714.
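The three outcomes of FIG. 7 amount to a small dispatch on the response of the second-level cache. The sketch below models them with plain Python queues; the `L2Response` enumeration, the `retire` function, and the returned strings are hypothetical names chosen for the example.

```python
from collections import deque
from enum import Enum


class L2Response(Enum):
    HIT = "hit"
    REPLAY = "replay"
    MISS = "miss"


def retire(request, response, prefetch_queue, fill_queue):
    """Handle one prefetch request leaving the L2 pipeline (steps 704-714)."""
    if response is L2Response.HIT:
        return "dropped"                 # step 706: line already in the L2
    if response is L2Response.REPLAY:
        prefetch_queue.append(request)   # step 712: re-push the request
        return "replayed"
    fill_queue.append(request)           # step 714: fetch via the fill queue
    return "fill"
```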

FIG. 9 illustrates an example of the operation of the microprocessor 100 of FIG. 2. FIG. 9 shows, for a sequence of ten accesses to a memory block, the contents of the block bitmask register 302 (an asterisk in a bit position indicates an access to the corresponding cache line), the min change counter 308, the max change counter 312, and the total counter 314 after the first, second, and tenth accesses. In FIG. 9, the min change counter 308 is labeled "cntr_min_change", the max change counter 312 is labeled "cntr_max_change", and the total counter 314 is labeled "cntr_total". The position of the middle pointer register 316 is indicated by "M" in FIG. 9.

Because the first access, to address 0x4dced300 (step 402 of FIG. 4), is to the cache line at index 12 of the memory block, the control logic 322 sets bit 12 of the block bitmask register 302 (step 408 of FIG. 4), as shown. Additionally, the control logic 322 updates the min change counter 308, the max change counter 312, and the total counter 314 (steps 502, 506, and 512 of FIG. 5).

Because the second access, to address 0x4dced260, is to the cache line at index 9 of the memory block, the control logic 322 sets bit 9 of the block bitmask register 302, as shown. Additionally, the control logic 322 updates the min change counter 308 and the total counter 314.

For the third through tenth accesses (the addresses of the third through ninth accesses are not shown; the address of the tenth access is 0x4dced6c0), the control logic 322 sets the appropriate bits of the block bitmask register 302, as shown. Additionally, the control logic 322 updates the min change counter 308, the max change counter 312, and the total counter 314 as appropriate for each access.
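The cache line indexes quoted for these accesses follow directly from the addresses if one assumes 4 KB memory blocks and 64-byte cache lines. Those two sizes are assumptions here, chosen because they reproduce the indexes given in the text; the helper names are likewise hypothetical.

```python
BLOCK_BYTES = 4096   # assumption: 4 KB memory block
LINE_BYTES = 64      # assumption: 64-byte cache line


def block_number(address):
    """Upper address bits, as held in the block number register."""
    return address // BLOCK_BYTES


def line_index(address):
    """Index of the cache line within its memory block."""
    return (address % BLOCK_BYTES) // LINE_BYTES


# First, second, and tenth accesses of the example.
for address in (0x4dced300, 0x4dced260, 0x4dced6c0):
    print(hex(address), hex(block_number(address)), line_index(address))
```

Under these assumptions all three addresses fall in block 0x4dced, at cache line indexes 12, 9, and 27 respectively.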

The bottom of FIG. 9 shows the contents of the period match counters 318 after the control logic 322 has performed steps 514 through 522 for each of the ten memory accesses. In FIG. 9, the period match counters 318 are labeled "cntr_period_N_matches", where N is 1, 2, 3, 4, or 5.

In the example of FIG. 9, although the criterion of step 412 is met (the total counter 314 is at least ten) and the criterion of step 416 is met (the period match counter 318 for period 5 exceeds all the other period match counters 318 by at least 2), the criterion of step 414 is not met (the difference between the min change counter 308 and the max change counter 312 is less than 2). Therefore, no prefetching is performed in this memory block at this time.

The bottom of FIG. 9 also shows, for each of periods 3, 4, and 5, the patterns to the right and to the left of the middle pointer register 316.

FIG. 10 illustrates the operation of the microprocessor 100 of FIG. 2, continuing the example of FIG. 9. FIG. 10 depicts information similar to that of FIG. 9, but after the eleventh and twelfth accesses to the memory block (the address of the twelfth access is 0x4dced760). As shown, the criterion of step 412 is met (the total counter 314 is at least ten), the criterion of step 414 is met (the difference between the min change counter 308 and the max change counter 312 is at least 2), and the criterion of step 416 is met (the period match counter 318 for period 5 exceeds all the other period match counters 318 by at least 2). Therefore, according to step 418 of FIG. 4, the control logic 322 populates the direction register 342 (to indicate an upward direction trend), the pattern period register 346 (with the value 5), and the pattern register 344 (with the pattern " * * ", or "01010"). The control logic 322 also performs prefetch prediction for the memory block according to step 422 of FIG. 4 and FIG. 6, as shown in FIG. 11. FIG. 10 also shows the position of the search pointer register 352 at bit 21, as initialized by the control logic 322 in step 602 of FIG. 6.

FIG. 11 illustrates the operation of the microprocessor 100 of FIG. 2, continuing the example of FIGS. 9 and 10. FIG. 11 depicts, through each of twelve instances (labeled 0 through 11), the progress through steps 604 to 616 of FIG. 6 until a cache line of the memory block is found that the prefetch unit 124 predicts needs to be prefetched. As shown, the value of the search pointer register 352 is incremented in each instance according to step 612 of FIG. 6. As shown in FIG. 11, in instances 5 and 10 the pattern location register 348 is updated according to step 612 of FIG. 6. As shown in instances 0, 2, 4, 5, 7, and 10, the pattern indicates that the cache line at the search pointer register 352 will not be needed, because the bit of the pattern register 344 at the search pointer is false. As further shown, in instances 1, 3, 6, and 8, the pattern register 344 indicates that the cache line at the search pointer register 352 will be needed, because the bit of the pattern register 344 at the search pointer is true; however, the cache line has already been fetched, as indicated by the corresponding bit of the block bitmask register 302 being true. Finally, as shown, in instance 11 the pattern register 344 indicates that the cache line at the search pointer register 352 will be needed, because the bit of the pattern register 344 at the search pointer is true, and the cache line has not yet been fetched, because the corresponding bit of the block bitmask register 302 is false. Therefore, the control logic 322 pushes a prefetch request into the prefetch request queue 328 according to step 624 of FIG. 6 in order to prefetch the cache line at address 0x4dced800, which corresponds to bit 32 of the block bitmask register 302.
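The walk through the twelve instances can be reproduced with a short trace. The sketch below is a reconstruction from the values quoted in the text, not a copy of FIG. 11: it assumes the search pointer starts at bit 21, the pattern is "01010" with the lowest bit first, bits 22, 24, 27, and 29 of the block bitmask are already set (the lines reported as already fetched), the block starts at 0x4dced000, and cache lines are 64 bytes. All names are hypothetical.

```python
BLOCK_BASE = 0x4dced000           # assumption: start of the 4 KB block
LINE_BYTES = 64                   # assumption: 64-byte cache lines
PATTERN = (0, 1, 0, 1, 0)         # "01010", lowest line index first
FETCHED_BITS = {22, 24, 27, 29}   # bits the text reports as already fetched


def trace(start=21, instances=12):
    """Return (instance, search pointer, pattern location, outcome) rows."""
    rows, location = [], start
    for instance in range(instances):
        search = start + instance
        if search - location >= len(PATTERN):   # step 612: slide the pattern
            location = search
        if PATTERN[search - location] == 0:     # step 606: not predicted
            outcome = "not needed"
        elif search in FETCHED_BITS:            # step 614: bitmask bit is true
            outcome = "already fetched"
        else:
            outcome = "prefetch"                # step 624
        rows.append((instance, search, location, outcome))
    return rows


def line_address(bit):
    """Address of the cache line tracked by a given bitmask bit."""
    return BLOCK_BASE + bit * LINE_BYTES


for row in trace():
    print(row)
print(hex(line_address(32)))
```

In this trace the pattern location moves at instances 5 and 10, instances 1, 3, 6, and 8 land on lines that are already fetched, and instance 11 reaches bit 32, whose line address is 0x4dced800.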

In one embodiment, one or more of the predetermined values described above are programmable, either by the operating system (for example, via a model specific register (MSR)) or via fuses of the microprocessor 100 that may be blown during manufacture of the microprocessor 100.

In one embodiment, the size of the block bitmask register 302 is reduced in order to save power and die real estate. That is, the number of bits in each block bitmask register 302 is smaller than the number of cache lines in a memory block. For example, in one embodiment, each block bitmask register 302 has only half as many bits as there are cache lines in the memory block. The block bitmask register 302 tracks accesses to only the upper half or the lower half of the block, depending on which half of the memory block is accessed first, and an additional bit indicates whether the lower half or the upper half of the memory block was accessed first.

In one embodiment, instead of examining the N bits above and below the middle pointer register 316 as described with respect to steps 516 and 518, the control logic 322 includes a serial engine that scans the block bitmask register 302 one or two bits at a time in order to find patterns whose period is greater than the maximum period (5 bits, as described above).

In one embodiment, if no clear direction trend is detected in step 414 or no clear pattern period is detected in step 416, and the count of the total counter 314 has reached a predetermined threshold (indicating that most of the cache lines in the memory block have been accessed), the control logic 322 goes ahead and prefetches the remaining cache lines of the memory block. The predetermined threshold is a relatively high percentage of the number of cache lines in the memory block, for example, of the number of bits of the block bitmask register 302.

Prefetch unit combining the second-level cache memory and the first-level data cache memory: Modern microprocessors include a hierarchy of cache memories. Typically, a microprocessor includes a small, fast first-level data cache memory and a larger but slower second-level cache memory, such as the first-level data cache memory 116 and the second-level cache memory 118 of FIG. 2. A cache hierarchy benefits from prefetching data into the cache memories to improve the cache hit rate. Because the first-level data cache memory 116 is faster, it is preferable to prefetch data into the first-level data cache memory 116. However, because the capacity of the first-level data cache memory 116 is small, the cache hit rate may actually become worse if the prefetch unit incorrectly prefetches data into the first-level data cache memory 116 that turns out not to be needed, because that data displaces other data that is needed. Thus, the choice of whether to prefetch into the first-level data cache memory 116 or into the second-level cache memory 118 is a function of how accurately the prefetch unit can predict whether the data will be needed. Because the first-level data cache memory 116 is required to be small, a first-level data cache prefetch unit tends to be small and therefore less accurate; in contrast, because the size of the second-level cache tag and data arrays dwarfs the size of a prefetch unit, a second-level cache prefetch unit can be larger and therefore more accurate.

An advantage of the microprocessor 100 described in the embodiments is that a single prefetch unit serves as the basis for the prefetching needs of both the second-level cache 118 and the first-level data cache 116. The embodiments leverage the accuracy of a prefetch unit sized for the second-level cache 118 to solve the problem, described above, of prefetching into the first-level data cache 116. Furthermore, the embodiments accomplish the goal of using a single body of logic to handle the prefetch operations of both the first-level data cache 116 and the second-level cache 118.

Fig. 12 shows a microprocessor 100 according to embodiments of the present invention. The microprocessor 100 of Fig. 12 is similar to the microprocessor 100 of Fig. 2 and has the additional features described below.

The first-level data cache 116 provides a first-level data memory address 196 to the prefetch unit 124. The first-level data memory address 196 is the physical address of a load/store access made to the first-level data cache 116 by the load/store unit 134. That is, the prefetch unit 124 eavesdrops as the load/store unit 134 accesses the first-level data cache 116. The prefetch unit 124 provides pattern-predicted cache line addresses 194 to a queue 198 of the first-level data cache 116. These are the addresses of cache lines that the prefetch unit 124 predicts, based on the first-level data memory address 196, the load/store unit 134 will soon request from the first-level data cache 116. The first-level data cache 116 issues cache line allocation requests 192 to request cache lines from the second-level cache 118, including the cache lines whose addresses are stored in the queue 198. Finally, the second-level cache 118 provides the requested cache line data 188 to the first-level data cache 116.

The prefetch unit 124 also includes a first-level data search pointer 172 and a first-level data pattern address 178, as shown in Fig. 12. Their use is described below with respect to Fig. 14.

Fig. 13 is a flowchart of the operation of the prefetch unit 124 of Fig. 12. Flow begins at step 1302.

In step 1302, the prefetch unit 124 receives the first-level data memory address 196 of Fig. 12 from the first-level data cache 116. Flow proceeds to step 1304.

In step 1304, the prefetch unit 124 detects that the first-level data memory address 196 falls within a memory block (e.g., a page) for which the prefetch unit 124 has previously detected an access pattern and has begun prefetching cache lines from system memory into the second-level cache 118, as described with respect to Figs. 1 through 11. Specifically, the prefetch unit 124 maintains a block number register 303 that specifies the base address of the memory block for which the access pattern was detected. The prefetch unit 124 detects whether the first-level data memory address 196 falls within the memory block by checking whether the bits of the block number register 303 match the corresponding bits of the first-level data memory address 196. Flow proceeds to step 1306.
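The block-membership check of step 1304 can be sketched in software as a comparison of upper address bits. This is only an illustrative model: the 4 KB block size is taken from one embodiment mentioned later in the description, while the 64-byte cache line size and the function names are assumptions made for the example.

```python
# Illustrative model of the step 1304 check (not the patent's circuit).
BLOCK_BYTES = 4096       # memory block size; 4 KB in one described embodiment
CACHE_LINE_BYTES = 64    # ASSUMED cache line size, for illustration only


def block_number(phys_addr):
    """Upper address bits that identify the memory block."""
    return phys_addr // BLOCK_BYTES


def falls_in_block(phys_addr, block_number_reg):
    """True if the address bits match the block number register (303)."""
    return block_number(phys_addr) == block_number_reg


def line_index(phys_addr):
    """Index of the cache line within its memory block."""
    return (phys_addr % BLOCK_BYTES) // CACHE_LINE_BYTES
```

With these assumed sizes, each block holds 64 cache lines, so `line_index` returns a value from 0 to 63.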

In step 1306, beginning at the first-level data memory address 196, the prefetch unit 124 searches in the detected access direction within the memory block for the next two cache lines implicated by the previously detected access pattern. The operation performed at step 1306 is described in more detail with respect to Fig. 14. Flow proceeds to step 1308.

In step 1308, the prefetch unit 124 provides the physical addresses of the next two cache lines found at step 1306 to the first-level data cache 116 as pattern-predicted cache line addresses 194. In other embodiments, the number of cache line addresses provided by the prefetch unit 124 may be more or fewer than two. Flow proceeds to step 1312.

In step 1312, the first-level data cache 116 pushes the addresses provided at step 1308 into the queue 198. Flow proceeds to step 1314.

In step 1314, whenever the queue 198 is non-empty, the first-level data cache 116 takes the next address out of the queue 198 and issues a cache line allocation request 192 to the second-level cache 118 for the cache line at that address. However, if an address in the queue 198 is already present in the first-level data cache 116, the first-level data cache 116 dumps the address and forgoes requesting its cache line from the second-level cache 118. The second-level cache 118 subsequently provides the requested cache line data 188 to the first-level data cache 116. Flow ends at step 1314.
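The queue handling of steps 1312 and 1314 can be modeled in a few lines. The sketch below is a behavioral illustration under simplifying assumptions: the class and attribute names are hypothetical, cache contents are modeled as a set of line addresses, and the second-level cache is assumed to return data immediately.

```python
from collections import deque


class L1DataCacheModel:
    """Toy model of queue 198 handling; all names are illustrative."""

    def __init__(self):
        self.present = set()      # line addresses currently held in the cache
        self.queue = deque()      # models queue 198
        self.l2_requests = []     # models allocation requests 192 sent to L2

    def push_predictions(self, line_addrs):
        # Step 1312: push the pattern-predicted addresses (194) into the queue.
        self.queue.extend(line_addrs)

    def drain(self):
        # Step 1314: while the queue is non-empty, take the next address out.
        while self.queue:
            addr = self.queue.popleft()
            if addr in self.present:
                continue          # already cached: dump the address
            self.l2_requests.append(addr)
            self.present.add(addr)  # assume L2 returns the line data (188)
```

For example, pushing `[0x1000, 0x1040]` when line `0x1000` is already present results in a single request for `0x1040`.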

Fig. 14 is a flowchart of the operation of the prefetch unit 124 of Fig. 12 according to step 1306 of Fig. 13. Fig. 14 describes the operation for the case in which the detected pattern direction of Fig. 3 is upward; the prefetch unit 124 performs a corresponding function if the detected pattern direction is downward. The operation of steps 1402 through 1408 positions the pattern of the pattern register 344 of Fig. 3 at the proper location within the memory block, so that the prefetch unit 124 can search the pattern for the next two cache lines beginning at the first-level data memory address 196, replicating the pattern of the pattern register 344 over the memory block as needed. Flow begins at step 1402.

In step 1402, the prefetch unit 124 initializes the first-level data search pointer 172 and the first-level data pattern address 178 of Fig. 12 with the sum of the pattern period register 346 and the middle pointer register 316 of Fig. 3, in a manner similar to the initialization of the search pointer register 352 and the pattern location register 348 at step 602 of Fig. 6. For example, if the value of the middle pointer register 316 is 16, the value of the pattern period register 346 is 5, and the direction register 342 indicates the upward direction, the prefetch unit 124 initializes the first-level data search pointer 172 and the first-level data pattern address 178 to 21. Flow proceeds to step 1404.

In step 1404, the prefetch unit 124 determines whether the first-level data memory address 196 falls within the pattern of the pattern register 344 at its currently specified location. The location of the pattern is initially determined by step 1402 and may be updated by step 1406. That is, the prefetch unit 124 determines whether the value of the relevant bits of the first-level data memory address 196 (i.e., excluding the bits that identify the memory block and the bits that specify the byte offset within the cache line) is greater than or equal to the value of the first-level data search pointer 172 and less than or equal to the sum of the values of the first-level data search pointer 172 and the pattern period register 346. If the first-level data memory address 196 falls within the pattern of the pattern register 344, flow proceeds to step 1408; otherwise, flow proceeds to step 1406.

In step 1406, the prefetch unit 124 increments the first-level data search pointer 172 and the first-level data pattern address 178 by the value of the pattern period register 346. In the operation of step 1406 (and of step 1418 below), the search ends if the first-level data search pointer 172 has reached the end of the memory block. Flow returns to step 1404.

In step 1408, the prefetch unit 124 sets the value of the first-level data search pointer 172 to the offset, within the memory page, of the cache line implicated by the first-level data memory address 196. Flow proceeds to step 1412.

In step 1412, the prefetch unit 124 examines the bit of the pattern register 344 at the position given by the first-level data search pointer 172. Flow proceeds to step 1414.

In step 1414, the prefetch unit 124 determines whether the bit examined at step 1412 is set. If the bit is set, flow proceeds to step 1416; otherwise, flow proceeds to step 1418.

In step 1416, the prefetch unit 124 marks the cache line predicted by the pattern register 344 at step 1414 as ready to have its physical address sent to the first-level data cache 116 as a pattern-predicted cache line address 194. Flow ends at step 1416.

In step 1418, the prefetch unit 124 increments the first-level data search pointer 172. Additionally, if the first-level data search pointer 172 has passed beyond the last bit of the pattern register 344, the prefetch unit 124 updates the first-level data pattern address 178 with the new value of the first-level data search pointer 172, i.e., shifts the pattern of the pattern register 344 to the new location of the first-level data search pointer 172. The operations of steps 1412 through 1418 are performed repeatedly until two cache lines (or another predetermined number of cache lines) have been found. Flow ends at step 1418.
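The search of Fig. 14 (upward direction only) can be sketched as follows. This is a simplified software model, not the hardware: the pattern register is represented as a list of 0/1 bits whose length equals the pattern period, cache lines are identified by their index within the block, and the function name, the 64-lines-per-block default, and the handling of an empty pattern are assumptions made for the example. As in step 1408, the search starts at the accessed line itself.

```python
def next_predicted_lines(pattern, middle, line_idx, lines_per_block=64, want=2):
    """Return up to `want` line indexes predicted by `pattern`, searching upward.

    pattern  -- list of 0/1 bits; its length plays the role of the period (346)
    middle   -- value of the middle pointer register (316)
    line_idx -- index within the block of the line being accessed (from 196)
    """
    period = len(pattern)
    if period == 0:
        return []                       # assumed guard: nothing to search
    loc = middle + period               # step 1402: initial pattern location
    # Steps 1404/1406: slide the pattern up by one period at a time until
    # the accessed line falls within [loc, loc + period].
    while not (loc <= line_idx <= loc + period):
        loc += period
        if loc >= lines_per_block:
            return []                   # reached the end of the memory block
    ptr = line_idx                      # step 1408: start at the accessed line
    found = []
    while len(found) < want and ptr < lines_per_block:
        if ptr - loc >= period:         # pointer passed the last pattern bit:
            loc = ptr                   # shift the pattern to the pointer
        if pattern[ptr - loc]:          # steps 1412/1414: test the pattern bit
            found.append(ptr)           # step 1416: mark the line as predicted
        ptr += 1                        # step 1418: advance the pointer
    return found
```

Using the values from the example in step 1402 (middle pointer 16, period 5) with a hypothetical pattern `[1, 0, 1, 0, 0]`, the pattern is first placed at line 21, so an access to line 23 yields lines 23 and 26.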

The benefit of the somewhat circuitous route of Fig. 13 for prefetching cache lines into the first-level data cache 116 is that it requires relatively small changes to the first-level data cache 116 and the second-level cache 118. However, in other embodiments, the prefetch unit 124 does not provide the pattern-predicted cache line addresses 194 to the first-level data cache 116. For example, in one embodiment, the prefetch unit 124 directly requests the bus interface unit 122 to obtain the cache lines from memory and then writes the received cache lines into the first-level data cache 116. In another embodiment, the prefetch unit 124 requests the cache lines from the second-level cache 118, which provides the data to the prefetch unit 124 (obtaining the cache lines from memory on a miss), and the prefetch unit 124 writes the received cache lines into the first-level data cache 116. In yet another embodiment, the prefetch unit 124 requests the cache lines from the second-level cache 118 (which obtains the cache lines from memory on a miss), and the second-level cache 118 writes the cache lines directly into the first-level data cache 116.

As noted above, an advantage of the embodiments of the present invention is that a single prefetch unit 124 serves as the basis for the prefetching needs of both the second-level cache 118 and the first-level data cache 116. Although shown as distinct blocks in Figs. 2, 12 and 15 (discussed below), the prefetch unit 124 may be physically located adjacent to the tag and data arrays of the second-level cache 118 and may conceptually be included in the second-level cache 118, as shown in Fig. 21. The embodiments give the prefetch unit 124 the larger space budget it needs to achieve its accuracy, and apply a single body of logic to handle the prefetch operations of both the first-level data cache 116 and the second-level cache 118, thereby addressing the problem, described above, of prefetching into the small first-level data cache 116.

Bounding-box prefetch unit with reduced warm-up penalty on page crossings:

The prefetch unit 124 described herein can detect relatively complex access patterns over a memory block (e.g., a physical memory page) that conventional prefetch units generally cannot detect. For example, the prefetch unit 124 can detect that a program is accessing a memory block according to a pattern even if the out-of-order execution pipeline of the microprocessor 100 reorders the memory accesses out of program order, which would generally cause a conventional prefetch unit to fail to detect the memory access pattern and therefore not prefetch. This is because the prefetch unit 124 considers only which accesses to the memory block have actually been made; the time order of the accesses is not a consideration.

However, in exchange for the ability to recognize more complex and/or reordered access patterns, the prefetch unit 124 of the present invention may require a longer time than a conventional prefetch unit to detect the access pattern, referred to below as the "warm-up time." A way to reduce the warm-up time of the prefetch unit 124 is therefore needed.

The prefetch unit 124 predicts whether a program that had been accessing a memory block according to an access pattern has crossed over into a new memory block that is virtually adjacent to the old memory block (i.e., adjacent in the virtual address space), and whether the program will continue to access the new memory block according to the same pattern. In response, the prefetch unit 124 uses the pattern, direction, and other relevant information from the old memory block to speed up detection of the access pattern in the new memory block, i.e., to reduce the warm-up time.

Fig. 15 is a block diagram of a microprocessor 100 having a prefetch unit 124. The microprocessor 100 of Fig. 15 is similar to the microprocessor 100 of Figs. 2 and 12 and has the additional features described below.

As described with respect to Fig. 3, the prefetch unit 124 includes a plurality of hardware units 332. In addition to the elements described in Fig. 3, each hardware unit 332 includes a hashed virtual address of memory block (HVAMB) field 354 and a status field 356. While initializing the allocated hardware unit 332 at step 406 of Fig. 4, the prefetch unit 124 takes the physical block number in the block number register 303, translates it back to its virtual address, hashes the virtual address according to the same hashing algorithm performed at step 1704 of Fig. 17 described below, and stores the result of the hash in the HVAMB field 354. The status field 356 has three possible values: inactive, active, or probationary, as described below. The prefetch unit 124 also includes a virtual hash table (VHT) 162; the organization and operation of the virtual hash table 162 are described in detail with respect to Figs. 16 through 19.

Fig. 16 shows the virtual hash table 162 of Fig. 15. The virtual hash table 162 includes a plurality of entries, preferably organized as a queue. Each entry includes a valid bit (not shown) and three fields: a minus-1 hashed virtual address 1602 (HVAM1), an unmodified hashed virtual address 1604 (HVAUN), and a plus-1 hashed virtual address 1606 (HVAP1). The generation of the values that populate these fields is described with respect to Fig. 17.

Fig. 17 is a flowchart of the operation of the microprocessor 100 of Fig. 15. Flow begins at step 1702.

In step 1702, the first-level data cache 116 receives a load/store request from the load/store unit 134. The load/store request includes a virtual address. Flow proceeds to step 1704.

In step 1704, the first-level data cache 116 performs a hash function on selected bits of the virtual address received at step 1702 to generate an unmodified hashed virtual address 1604 (HVAUN). Additionally, the first-level data cache 116 adds the memory block size (MBS) to the selected bits of the virtual address received at step 1702 to generate a sum, and performs the hash function on the sum to generate a plus-1 hashed virtual address 1606 (HVAP1). Additionally, the first-level data cache 116 subtracts the memory block size from the selected bits of the virtual address received at step 1702 to generate a difference, and performs the hash function on the difference to generate a minus-1 hashed virtual address 1602 (HVAM1). In one embodiment, the memory block size is 4 KB. In one embodiment, the virtual address is 40 bits, and bits 39:30 and 11:0 of the virtual address are ignored by the hash function. The remaining 18 virtual address bits are "dealt" across the hash bit positions. The idea is that the lower bits of the virtual address have the highest entropy and the upper bits have the lowest entropy, so dealing them in this manner keeps the entropy level relatively consistent across the bits of the hash. In one embodiment, the remaining 18 bits of the virtual address are hashed down to 6 bits according to Table 1 below. However, other embodiments may use different hashing algorithms; furthermore, where performance considerations dominate space and power-consumption considerations in the design, embodiments are contemplated that do not use a hashing algorithm. Flow proceeds to step 1706.

assign hash[5] = VA[29] ^ VA[18] ^ VA[17];
assign hash[4] = VA[28] ^ VA[19] ^ VA[16];
assign hash[3] = VA[27] ^ VA[20] ^ VA[15];
assign hash[2] = VA[26] ^ VA[21] ^ VA[14];
assign hash[1] = VA[25] ^ VA[22] ^ VA[13];
assign hash[0] = VA[24] ^ VA[23] ^ VA[12];

Table 1
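For readers who prefer software notation, the hash of Table 1 and the three values produced at step 1704 can be modeled as below. The bit selections are copied from Table 1; the function names, the 40-bit wrap-around mask applied to the sum and difference, and the use of Python are assumptions made for this sketch.

```python
MBS = 4096                      # memory block size: 4 KB in one embodiment
VA_MASK = (1 << 40) - 1         # 40-bit virtual address (wrap-around ASSUMED)

# For hash bit i, the three virtual-address bits XORed together (Table 1).
HASH_TAPS = [
    (24, 23, 12),  # hash[0]
    (25, 22, 13),  # hash[1]
    (26, 21, 14),  # hash[2]
    (27, 20, 15),  # hash[3]
    (28, 19, 16),  # hash[4]
    (29, 18, 17),  # hash[5]
]


def hash6(va):
    """Hash virtual-address bits 29:12 down to 6 bits per Table 1."""
    result = 0
    for i, taps in enumerate(HASH_TAPS):
        bit = 0
        for tap in taps:
            bit ^= (va >> tap) & 1
        result |= bit << i
    return result


def hashed_triple(va):
    """Return (HVAM1, HVAUN, HVAP1) for a virtual address, as in step 1704."""
    hvam1 = hash6((va - MBS) & VA_MASK)
    hvaun = hash6(va & VA_MASK)
    hvap1 = hash6((va + MBS) & VA_MASK)
    return hvam1, hvaun, hvap1
```

Because bits 11:0 are ignored, every address within the same 4 KB block produces the same triple; for instance, `hashed_triple(0x5000)` and `hashed_triple(0x5FFF)` are identical.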

In step 1706, the first-level data cache 116 provides the unmodified hashed virtual address (HVAUN) 1604, the plus-1 hashed virtual address (HVAP1) 1606, and the minus-1 hashed virtual address (HVAM1) 1602 generated at step 1704 to the prefetch unit 124. Flow proceeds to step 1708.

In step 1708, the prefetch unit 124 selectively updates the virtual hash table 162 with the unmodified hashed virtual address (HVAUN) 1604, the plus-1 hashed virtual address (HVAP1) 1606, and the minus-1 hashed virtual address (HVAM1) 1602 received at step 1706. That is, if the virtual hash table 162 already includes an entry with the new HVAUN 1604, HVAP1 1606, and HVAM1 1602, the prefetch unit 124 forgoes updating the virtual hash table 162. Otherwise, the prefetch unit 124 pushes the HVAUN 1604, HVAP1 1606, and HVAM1 1602 into the top entry of the virtual hash table 162 in first-in-first-out fashion and marks the pushed entry valid. Flow ends at step 1708.
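The selective first-in-first-out update of step 1708 can be sketched with a bounded queue. This is a minimal model under stated assumptions: the class name and the table depth of 8 entries are hypothetical (the description does not give a depth here), and the valid bit is modeled implicitly by an entry being present in the queue.

```python
from collections import deque


class VirtualHashTableModel:
    """Toy model of the virtual hash table (162); depth is an ASSUMED value."""

    def __init__(self, depth=8):
        # Each entry is a tuple (HVAM1, HVAUN, HVAP1). When the deque is
        # full, appending drops the oldest entry, giving FIFO behavior.
        self.entries = deque(maxlen=depth)

    def update(self, hvam1, hvaun, hvap1):
        """Step 1708: push the triple unless an identical entry exists."""
        entry = (hvam1, hvaun, hvap1)
        if entry in self.entries:
            return False          # already present: forgo the update
        self.entries.append(entry)
        return True
```

In this model the most recently pushed entry sits at the right end of the deque.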

Fig. 18 shows the contents of the virtual hash table 162 of Fig. 16 after the prefetch unit 124 has operated as described in Fig. 17 in an example in which the load/store unit 134, in response to program execution, has proceeded in an upward direction through two memory blocks (denoted A and A+MBS) and into a third memory block (denoted A+2*MBS), in response to which the prefetch unit 124 has populated the virtual hash table 162. Specifically, the entry that is two entries from the tail of the virtual hash table 162 includes the hash of A-MBS in the minus-1 hashed virtual address (HVAM1) 1602, the hash of A in the unmodified hashed virtual address (HVAUN) 1604, and the hash of A+MBS in the plus-1 hashed virtual address (HVAP1) 1606. The entry that is one entry from the tail includes the hash of A in HVAM1 1602, the hash of A+MBS in HVAUN 1604, and the hash of A+2*MBS in HVAP1 1606. The entry at the tail (i.e., the most recently pushed entry) includes the hash of A+MBS in HVAM1 1602, the hash of A+2*MBS in HVAUN 1604, and the hash of A+3*MBS in HVAP1 1606.

Fig. 19 (consisting of Figs. 19A and 19B) is a flowchart of the operation of the prefetch unit 124 of Fig. 15. Flow begins at step 1902.

In step 1902, the first-level data cache 116 sends a new allocation request (AR) to the second-level cache 118. The new allocation request is to a new memory block. That is, the prefetch unit 124 determines that the memory block implicated by the allocation request is new, meaning that no hardware unit 332 has yet been allocated for that memory block; in other words, the prefetch unit 124 has not recently encountered an allocation request to the new memory block. In one embodiment, the allocation request is the request generated when a load/store misses in the first-level data cache 116 and the same cache line is consequently requested from the second-level cache 118. In one embodiment, the allocation request specifies a physical address, and a virtual address is associated with the physical address, namely the virtual address from which the physical address was translated. The first-level data cache 116 hashes the virtual address associated with the physical address of the allocation request according to a hash function (namely, the same hash function as at step 1704 of Fig. 17) to generate a hashed virtual address of the allocation request (HVAAR), and provides the HVAAR to the prefetch unit 124. Flow proceeds to step 1903.

In step 1903, the prefetch unit 124 allocates a new hardware unit 332 for the new memory block. If an inactive hardware unit 332 exists, the prefetch unit 124 allocates an inactive hardware unit 332 for the new memory block. Otherwise, in one embodiment, the prefetch unit 124 allocates the least-recently-used hardware unit 332 for the new memory block. In one embodiment, once the prefetch unit 124 has prefetched all the cache lines of the memory block indicated by the pattern, the prefetch unit 124 inactivates the hardware unit 332. In one embodiment, the prefetch unit 124 has the ability to pin a hardware unit 332 so that it is not replaced even if it becomes the least-recently-used hardware unit 332. For example, if the prefetch unit 124 detects that a predetermined number of accesses have been made to the memory block according to the pattern, but the prefetch unit 124 has not yet completed all prefetching of the entire memory block according to the pattern, the prefetch unit 124 may pin the hardware unit 332 associated with the memory block so that it is ineligible for replacement even if it becomes the least-recently-used hardware unit 332. In one embodiment, the prefetch unit 124 maintains the relative age (since original allocation) of each hardware unit 332 and inactivates a hardware unit 332 when its age reaches a predetermined age threshold. In another embodiment, if the prefetch unit 124 detects a virtually adjacent memory block (through steps 1904 to 1926 below) and has finished prefetching from the virtually adjacent memory block, the prefetch unit 124 may selectively reuse the hardware unit 332 associated with the virtually adjacent memory block rather than allocating a new hardware unit 332. In this embodiment, the prefetch unit 124 initializes the various storage elements of the reused hardware unit 332 (e.g., the direction register 342, the pattern register 344, and the pattern location register 348) only selectively, in order to preserve the useful information stored in them. Flow proceeds to step 1904.
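The allocation policy of step 1903 (prefer an inactive unit, otherwise take the least-recently-used unit, never a pinned one) can be sketched as follows. The field names, the integer timestamp used to model recency, and the behavior of returning `None` when every unit is pinned are assumptions made for this illustration; the description does not specify what happens in that last case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HardwareUnitModel:
    """Toy stand-in for a hardware unit (332); fields are illustrative."""
    in_use: bool = False
    block: Optional[int] = None
    last_used: int = 0
    pinned: bool = False


def allocate_unit(units, block, now):
    """Pick a unit for a new memory block, as in step 1903."""
    # Prefer an inactive (unused) unit if one exists.
    victim = next((u for u in units if not u.in_use), None)
    if victim is None:
        # Otherwise take the least-recently-used unit that is not pinned.
        candidates = [u for u in units if not u.pinned]
        if not candidates:
            return None           # ASSUMED: no allocation if all are pinned
        victim = min(candidates, key=lambda u: u.last_used)
    victim.in_use = True
    victim.block = block
    victim.last_used = now
    victim.pinned = False
    return victim
```

Pinning a unit removes it from the replacement candidates, so an older pinned unit survives while a newer unpinned unit is replaced.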

In step 1904, the prefetch unit 124 compares the hashed virtual address (HVAAR) generated in step 1902 with the minus-one hashed virtual address 1602 (HVAM1) and the plus-one hashed virtual address 1606 (HVAP1) of each entry of the virtual hash table 162. Steps 1904 through 1922 determine whether an already active memory block is virtually adjacent to the new memory block. Steps 1924 through 1928 predict whether memory accesses will continue from that virtually adjacent active memory block into the new memory block according to the previously detected access pattern and direction. The purpose is to reduce the warm-up time of the prefetch unit 124 so that it can begin prefetching from the new memory block sooner. Flow proceeds to step 1906.

In step 1906, the prefetch unit 124 determines, based on the comparison performed in step 1904, whether the hashed virtual address (HVAAR) matches any entry of the virtual hash table 162. If the HVAAR matches an entry of the virtual hash table 162, flow proceeds to step 1908; otherwise, flow proceeds to step 1912.

In step 1908, the prefetch unit 124 sets a candidate direction flag (candidate_direction flag) to a value that indicates the upward direction. Flow proceeds to step 1916.

In step 1912, the prefetch unit 124 determines, based on the comparison performed in step 1904, whether the hashed virtual address (HVAAR) matches any entry of the virtual hash table 162. If the HVAAR matches an entry of the virtual hash table 162, flow proceeds to step 1914; otherwise, flow ends.

In step 1914, the prefetch unit 124 sets the candidate direction flag (candidate_direction flag) to a value that indicates the downward direction. Flow proceeds to step 1916.

In step 1916, the prefetch unit 124 sets a candidate hash register (candidate_hva register, not shown) to the value of the unmodified hashed virtual address 1604 (HVAUN) of the virtual hash table 162 entry identified in step 1906 or 1912. Flow proceeds to step 1918.

In step 1918, the prefetch unit 124 compares the candidate hash (candidate_hva) with the memory block hashed virtual address field (HVAMB) 354 of each active memory block in the prefetch unit 124. Flow proceeds to step 1922.

In step 1922, the prefetch unit 124 determines, based on the comparison performed in step 1918, whether the candidate hash (candidate_hva) matches any memory block hashed virtual address field (HVAMB) 354. If the candidate_hva matches an HVAMB 354, flow proceeds to step 1924; otherwise, flow ends.

In step 1924, the prefetch unit 124 has established that the matching active memory block found in step 1922 is indeed virtually adjacent to the new memory block. The prefetch unit 124 therefore compares the candidate direction (assigned in step 1908 or 1914) with the direction register 342 of the matching active memory block, in order to predict whether memory accesses will continue from the virtually adjacent active memory block into the new memory block according to the previously detected access pattern and direction. Specifically, if the candidate direction differs from the direction register 342 of the virtually adjacent memory block, memory accesses are unlikely to continue from the virtually adjacent active memory block into the new memory block according to the previously detected access pattern and direction. Flow proceeds to step 1926.

In step 1926, the prefetch unit 124 determines, based on the comparison performed in step 1924, whether the candidate direction matches the direction register 342 of the matching active memory block. If it does, flow proceeds to step 1928; otherwise, flow ends.
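Steps 1904 through 1926 can be summarized in a short Python sketch, offered only as an illustration under stated assumptions. The names `HashEntry`, `ActiveBlock`, and `find_continuing_neighbor` are hypothetical. The passage does not spell out which field is tested in step 1906 and which in step 1912, so the sketch assumes that a match on HVAP1 means the new block lies one block above the entry (upward direction) and that a match on HVAM1 means it lies one block below (downward direction).

```python
from dataclasses import dataclass
from typing import List, Optional

UP, DOWN = "up", "down"


@dataclass
class HashEntry:
    """One entry of the virtual hash table 162."""
    hvam1: int  # minus-one hashed virtual address 1602
    hvaun: int  # unmodified hashed virtual address 1604
    hvap1: int  # plus-one hashed virtual address 1606


@dataclass
class ActiveBlock:
    hvamb: int      # memory block hashed virtual address field 354
    direction: str  # contents of the direction register 342


def find_continuing_neighbor(hvaar: int, table: List[HashEntry],
                             active: List[ActiveBlock]) -> Optional[ActiveBlock]:
    """Return the virtually adjacent active block whose direction agrees
    with the candidate direction, or None when the flow would end."""
    candidate = None
    for entry in table:  # steps 1904-1908 (mapping to UP is an assumption)
        if hvaar == entry.hvap1:
            candidate = (UP, entry.hvaun)
            break
    if candidate is None:
        for entry in table:  # steps 1912-1914 (mapping to DOWN is an assumption)
            if hvaar == entry.hvam1:
                candidate = (DOWN, entry.hvaun)
                break
    if candidate is None:
        return None  # no virtual neighbor is known
    candidate_direction, candidate_hva = candidate  # step 1916
    for block in active:  # steps 1918-1922
        if block.hvamb == candidate_hva:
            # steps 1924-1926: continue only if the directions agree
            return block if block.direction == candidate_direction else None
    return None
```

A return value of `None` corresponds to the paths on which the flow ends without reusing the neighbor's pattern.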

In step 1928, the prefetch unit 124 determines whether the new allocation request received in step 1902 is directed to a cache line predicted by the pattern register 344 of the matching virtually adjacent active memory block detected in step 1926. In one embodiment, to make the determination of step 1928, the prefetch unit 124 effectively shifts and replicates the pattern register 344 of the matching virtually adjacent active memory block according to its pattern period register 346, continuing from the position given by the pattern location register 348 in the virtually adjacent memory block, so that the continuity of the pattern held in the pattern register 344 is maintained in the new memory block. If the new allocation request is for a cache line predicted by the pattern register 344 of the matching active memory block, flow proceeds to step 1934; otherwise, flow proceeds to step 1932.
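The shift-and-replicate idea of step 1928 can be illustrated with a small Python sketch. It is a simplified model under several assumptions that are not taken from this passage: a memory block holds 64 cache lines, the pattern is a list of `period` bits, the pattern location is the cache-line index at which one copy of the pattern starts, and only the upward direction is handled. The function names are hypothetical.

```python
from typing import List

LINES_PER_BLOCK = 64  # assumption: cache lines per memory block


def continue_location(old_location: int, period: int,
                      lines_per_block: int = LINES_PER_BLOCK) -> int:
    """Advance the pattern location by the period until it crosses into the
    next block, then return it as a line index within that new block."""
    location = old_location
    while location < lines_per_block:
        location += period
    return location - lines_per_block


def is_predicted(line_index: int, pattern: List[int], period: int,
                 location: int) -> bool:
    """True if the replicated pattern marks line_index of the new block.
    The modulo also covers lines below `location`, which belong to the
    pattern copy that straddles the block boundary."""
    return pattern[(line_index - location) % period] == 1
```

For example, with a period of 5 and a copy of the pattern starting at line 1 of the old block, the copies start at lines 1, 6, ..., 61, and 66, so the first copy that begins in the new block starts at its line 2.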

In step 1932, the prefetch unit 124 initializes and populates the new hardware unit 332 (allocated in step 1903) according to steps 406 and 408 of FIG. 4, in the expectation that it will eventually detect a new pattern of accesses to the new memory block by the method described above with respect to FIGS. 4 through 6, which requires warm-up time. Flow ends at step 1932.

In step 1934, the prefetch unit 124 predicts that access requests will continue into the new memory block according to the pattern register 344 and direction register 342 of the matching virtually adjacent active memory block. The prefetch unit 124 therefore populates the new hardware unit 332 in a manner similar to step 1932, with some differences. Specifically, the prefetch unit 124 populates the direction register 342, the pattern register 344, and the pattern period register 346 with the corresponding values from the hardware unit 332 of the virtually adjacent memory block. The new value of the pattern location register 348 is determined by continuing to shift it by the value of the pattern period register 346 until it crosses into the new memory block, so that the pattern register 344 continues without interruption into the new memory block, as described with respect to step 1928. Furthermore, the status field 356 of the new hardware unit 332 is set to mark the new hardware unit 332 as probationary. Finally, the search pointer register 352 is initialized so that searching begins at the start of the memory block. Flow proceeds to step 1936.

In step 1936, the prefetch unit 124 continues to monitor access requests to the new memory block. If the prefetch unit 124 detects that at least a predetermined number of subsequent access requests to the memory block are to cache lines predicted by the pattern register 344, the prefetch unit 124 changes the status field 356 of the hardware unit 332 from probationary to active and then begins prefetching from the new memory block as described with respect to FIG. 6. In one embodiment, the predetermined number of access requests is two, although other embodiments may use other predetermined numbers. Flow ends at step 1936.
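The probationary-to-active transition of step 1936 amounts to a small state machine, sketched below in Python for illustration. The class name `ProbationTracker` and its fields are hypothetical. The passage does not say what happens when an access falls on a line the pattern did not predict, so the sketch simply ignores such accesses, which is an assumption.

```python
from typing import Iterable


class ProbationTracker:
    """Hypothetical model of the status field 356 of a new hardware unit 332."""

    def __init__(self, predicted_lines: Iterable[int], required_hits: int = 2):
        self.predicted = set(predicted_lines)  # lines the pattern predicts
        self.required_hits = required_hits     # two in one described embodiment
        self.hits = 0
        self.state = "probationary"

    def observe(self, line_index: int) -> str:
        """Record one access request and return the resulting state."""
        if self.state == "probationary" and line_index in self.predicted:
            self.hits += 1
            if self.hits >= self.required_hits:
                self.state = "active"  # prefetching may begin from here on
        return self.state
```

With the default of two confirming accesses, a unit becomes active on the second access that lands on a predicted cache line.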

FIG. 20 shows a hashed physical address-to-hashed virtual address thesaurus 2002 used by the prefetch unit 124 of FIG. 15. The thesaurus 2002 comprises an array of entries. Each entry includes a physical address (PA) 2004 and a corresponding hashed virtual address (HVA) 2006. The corresponding HVA 2006 is the result of hashing the virtual address that was translated into the PA 2004. The prefetch unit 124 populates the entries of the thesaurus 2002 by snooping the most recent virtual/physical address pairs traversing the pipeline of the load/store unit 134. In an alternative embodiment, at step 1902 of FIG. 19, the first-level data cache 116 does not provide the hashed virtual address (HVAAR) to the prefetch unit 124, but provides only the physical address associated with the allocation request. The prefetch unit 124 looks up the physical address in the thesaurus 2002 to find a matching PA 2004 and obtain the associated HVA 2006, which then serves as the HVAAR in the remainder of FIG. 19. Including the thesaurus 2002 in the prefetch unit 124 relieves the first-level data cache 116 of the need to provide the hashed virtual address of the allocation request, which simplifies the interface between the first-level data cache 116 and the prefetch unit 124.

In one embodiment, each entry of the thesaurus 2002 holds a hashed physical address rather than the physical address 2004, and the prefetch unit 124 hashes the physical address of the allocation request received from the first-level data cache 116 into a hashed physical address, which it uses to search the thesaurus 2002 and obtain the corresponding hashed virtual address (HVA) 2006. This embodiment allows a smaller thesaurus 2002 but requires additional time to hash the physical address.
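The lookup structure of FIG. 20 and its hashed-key variant can be sketched in Python as follows, purely as an illustration. The function `toy_hash` is a hypothetical XOR-fold and is not the hash used by the disclosed prefetch unit. The class name, the capacity, and the oldest-first replacement policy are also assumptions, since the passage does not describe how entries are replaced.

```python
from collections import OrderedDict
from typing import Optional


def toy_hash(addr: int, bits: int = 16) -> int:
    """Hypothetical XOR-fold hash standing in for the real hash function."""
    mask = (1 << bits) - 1
    h = 0
    while addr:
        h ^= addr & mask
        addr >>= bits
    return h


class PaToHvaTable:
    """Sketch of the thesaurus 2002: physical address -> hashed virtual address."""

    def __init__(self, capacity: int = 8, hash_keys: bool = False):
        self.capacity = capacity
        self.hash_keys = hash_keys    # True models the hashed-PA embodiment
        self.entries = OrderedDict()  # insertion order doubles as entry age

    def _key(self, pa: int) -> int:
        return toy_hash(pa) if self.hash_keys else pa

    def snoop(self, va: int, pa: int) -> None:
        """Record a virtual/physical pair seen in the load/store pipeline."""
        key = self._key(pa)
        self.entries.pop(key, None)
        self.entries[key] = toy_hash(va)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # assumption: evict the oldest

    def lookup(self, pa: int) -> Optional[int]:
        """Return the HVA for the allocation request's PA, or None on a miss."""
        return self.entries.get(self._key(pa))
```

Setting `hash_keys=True` stores narrower keys at the cost of one extra hash per lookup, which is the trade-off the hashed-PA embodiment describes.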

FIG. 21 shows a multi-core microprocessor 100 according to an embodiment of the present invention. The multi-core microprocessor 100 includes two cores (denoted core A 2102A and core B 2102B), referred to collectively as cores 2102 (or individually as core 2102). Each core 2102 has elements similar to those of the single-core microprocessor 100 shown in FIG. 2, 12, or 15. In addition, each core 2102 has a highly reactive prefetch unit 2104 as described above. The two cores 2102 share the second-level cache 118 and the prefetch unit 124. In particular, the first-level data cache 116, the load/store unit 134, and the highly reactive prefetch unit 2104 of each core 2102 are coupled to the shared second-level cache 118 and prefetch unit 124. In addition, a shared highly reactive prefetch unit 2106 is coupled to the second-level cache 118 and the prefetch unit 124. In one embodiment, the highly reactive prefetch units 2104 and the shared highly reactive prefetch unit 2106 prefetch only the next adjacent cache line after the cache line implicated by a memory access.
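The next-line behavior attributed to the highly reactive prefetch units 2104 and 2106 reduces to one address computation, shown here as a minimal Python sketch. The 64-byte cache line size and the function name are assumptions made for illustration.

```python
CACHE_LINE_BYTES = 64  # assumption: cache line size in bytes


def next_line_prefetch_address(access_addr: int,
                               line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Return the base address of the cache line that follows the line
    containing access_addr."""
    line_base = access_addr - (access_addr % line_bytes)
    return line_base + line_bytes
```

Any access within a given line yields the same prefetch address, so repeated accesses to one line would not generate distinct prefetch targets in this model.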

In addition to monitoring the memory accesses of the load/store unit 134 and the first-level data cache 116, the prefetch unit 124 may also monitor the memory accesses generated by the highly reactive prefetch units 2104 and the shared highly reactive prefetch unit 2106 when making prefetch decisions. The prefetch unit 124 may monitor memory accesses from different combinations of memory access sources to perform the different functions described herein. For example, the prefetch unit 124 may monitor a first combination of memory accesses to perform the functions described with respect to FIGS. 2 through 11, a second combination of memory accesses to perform the functions described with respect to FIGS. 12 through 14, and a third combination of memory accesses to perform the functions described with respect to FIGS. 15 through 19. In one embodiment, timing constraints make it difficult for the shared prefetch unit 124 to monitor the behavior of the load/store unit 134 of each core 2102. The shared prefetch unit 124 therefore monitors the behavior of the load/store unit 134 indirectly, through the traffic that the first-level data cache 116 generates as a result of load/store misses.

While various embodiments of the present invention have been described herein, those of ordinary skill in the art should understand that they are presented by way of example and not limitation. Those of ordinary skill in the art may make various changes in form and detail without departing from the spirit of the invention. For example, software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. This can be accomplished with general programming languages (for example, C or C++), hardware description languages (HDL) including Verilog HDL and VHDL, or other available programming languages. Such software can be disposed in any known computer-usable medium, such as magnetic tape, semiconductor memory, magnetic disk, or optical disc (for example, CD-ROM or DVD-ROM), or transmitted over the Internet or another wired, wireless, or other communication medium. Embodiments of the apparatus and methods described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (embodied in HDL), and transformed into hardware in the production of integrated circuits. In addition, the apparatus and methods described herein may be implemented as a combination of hardware and software. The present invention should therefore not be limited to the disclosed embodiments, but should be defined by the appended claims and their equivalents. In particular, the present invention may be implemented within a microprocessor device used in a general-purpose computer. Finally, although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit its scope. Those of ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.

100‧‧‧Microprocessor

102‧‧‧Instruction cache

104‧‧‧Instruction decoder

106‧‧‧Register alias table

108‧‧‧Reservation stations

112‧‧‧Execution units

132‧‧‧Other execution units

134‧‧‧Load/store unit

124‧‧‧Prefetch unit

114‧‧‧Retire unit

116‧‧‧First-level data cache

118‧‧‧Second-level cache

122‧‧‧Bus interface unit

Claims (25)

1. A prefetch unit disposed in a microprocessor, comprising: a plurality of period match counters, each corresponding to a different one of a plurality of pattern periods; and control logic configured to: update the period match counters in response to accesses by the microprocessor to a memory block; determine a clear pattern period based on the count values of the period match counters; and prefetch those cache lines of the memory block that have not yet been prefetched, according to a pattern having the clear pattern period determined from the period match counters.

2. The prefetch unit of claim 1, further comprising: a bitmask register having a bit corresponding to each of the cache lines of the memory block; wherein the control logic updates the bitmask register in response to the accesses to the memory block to indicate which cache lines of the memory block have been accessed; and wherein the control logic determines the pattern having the clear pattern period from the bitmask register.

3. The prefetch unit of claim 2, further comprising: a middle pointer register holding a value, computed by the control logic in response to the accesses to the memory block, that points within the bitmask register to a middle cache line among the cache lines of the memory block that have been accessed; wherein each of the different pattern periods is a number of bits to the left or right of the middle pointer register.

4. The prefetch unit of claim 3, wherein, to update the period match counters, the control logic, for each of the period match counters: increments the period match counter when the N bits to the left of the middle pointer register match the N bits to the right of the middle pointer register; wherein N is the one of the different pattern periods that corresponds to the period match counter.

5. The prefetch unit of claim 3, further comprising: a minimum pointer register that points to the lowest of the cache lines of the memory block that have been accessed; and a maximum pointer register that points to the highest of the cache lines of the memory block that have been accessed; wherein the control logic computes the value of the middle pointer register as the average of the value of the minimum pointer register and the value of the maximum pointer register.

6. The prefetch unit of claim 3, wherein the pattern having the clear pattern period that the control logic determines from the bitmask register consists of the N bits of the bitmask register to the left or right of the middle pointer register, where N is the clear pattern period.

7. The prefetch unit of claim 1, wherein, to determine the clear pattern period based on the period match counters, the control logic determines whether the difference between the count value of one of the period match counters and the count values of the others of the period match counters is greater than a predetermined value.

8. The prefetch unit of claim 1, wherein the different pattern periods include at least three different pattern periods.

9. The prefetch unit of claim 1, wherein the control logic prefetches into the microprocessor the cache lines that have not yet been prefetched only when at least nine of the cache lines of the memory block have been accessed.

10. The prefetch unit of claim 1, wherein the control logic prefetches into the microprocessor the cache lines that have not yet been prefetched only when at least ten of the cache lines of the memory block have been accessed.

11. The prefetch unit of claim 1, wherein, to prefetch the cache lines of the memory block that have not yet been prefetched according to the pattern having the clear pattern period, the control logic: locates the pattern over the bitmask register at a search pointer register; and makes a prediction for each bit of the pattern, the prediction including: when the bit is set, predicting that the cache line corresponding to the bit is needed; and when the cache line corresponding to the bit is needed, determining whether the cache line has not yet been prefetched, wherein the cache line is determined not to have been prefetched when the bit of the bitmask that corresponds to the bit of the pattern indicates that the cache line has not been accessed.

12. The prefetch unit of claim 11, wherein, to prefetch the cache lines of the memory block that have not yet been prefetched according to the pattern having the clear pattern period, the control logic: prefetches a cache line that the control logic has predicted to be needed and determined not to have been prefetched only when that cache line is less than a predetermined number of cache lines away from the cache line at the end of the accessed cache lines of the memory block.

13. A data prefetching method, comprising: updating, by a microprocessor, a plurality of period match counters in response to accesses to a memory block, wherein each of the period match counters corresponds to a different one of a plurality of pattern periods; determining a clear pattern period based on the count values of the period match counters; and prefetching those cache lines of the memory block that have not yet been prefetched, according to a pattern having the clear pattern period determined from the period match counters.

14. The data prefetching method of claim 13, further comprising: updating a bitmask register in response to the accesses to the memory block to indicate which cache lines of the memory block have been accessed, wherein the bitmask register has a bit corresponding to each of the cache lines of the memory block; and determining the pattern having the clear pattern period from the bitmask register.

15. The data prefetching method of claim 14, further comprising: computing a middle pointer in response to the accesses to the memory block, the middle pointer pointing within the bitmask register to a middle cache line among the cache lines of the memory block that have been accessed; wherein each of the different pattern periods is a number of bits to the left or right of the middle pointer register.

16. The data prefetching method of claim 15, wherein the step of updating the period match counters further comprises, for each of the period match counters: incrementing the period match counter when the N bits to the left of the middle pointer register match the N bits to the right of the middle pointer register; wherein N is the one of the different pattern periods that corresponds to the period match counter.

17. The data prefetching method of claim 15, further comprising: maintaining a minimum pointer register so that it points to the lowest of the cache lines of the memory block that have been accessed; maintaining a maximum pointer register so that it points to the highest of the cache lines of the memory block that have been accessed; and computing the value of the middle pointer register as the average of the value of the minimum pointer register and the value of the maximum pointer register.

18. The data prefetching method of claim 15, wherein the pattern having the clear pattern period that is determined by the control logic from the bitmask register consists of the N bits of the bitmask register to the left or right of the middle pointer register, where N is the clear pattern period.

19. The data prefetching method of claim 13, wherein the step of determining the clear pattern period based on the period match counters further comprises: determining whether the difference between the count value of one of the period match counters and the count values of the others of the period match counters is greater than a predetermined value.

20. The data prefetching method of claim 13, wherein the different pattern periods include at least three different pattern periods.

21. The data prefetching method of claim 13, wherein the step of prefetching into the microprocessor the cache lines that have not yet been prefetched is performed only when at least nine of the cache lines of the memory block have been accessed.

22. The data prefetching method of claim 13, wherein the step of prefetching into the microprocessor the cache lines that have not yet been prefetched is performed only when at least ten of the cache lines of the memory block have been accessed.

23. The data prefetching method of claim 13, wherein the step of prefetching the cache lines of the memory block that have not yet been prefetched according to the pattern having the clear pattern period further comprises: locating the pattern over the bitmask register at a search pointer register; and making a prediction for each bit of the pattern, the prediction including: when the bit is set, predicting that the cache line corresponding to the bit is needed; and when the cache line corresponding to the bit is needed, determining whether the cache line has not yet been prefetched, wherein the cache line is determined not to have been prefetched when the bit of the bitmask that corresponds to the bit of the pattern indicates that the cache line has not been accessed.

24. The data prefetching method of claim 23, wherein the step of prefetching the cache lines of the memory block that have not yet been prefetched according to the pattern having the clear pattern period further comprises: prefetching a cache line that the control logic has predicted to be needed and determined not to have been prefetched only when that cache line is less than a predetermined number of cache lines away from the cache line at the end of the accessed cache lines of the memory block.

25. A computer program product encoded on at least one computer-readable medium for use with a computing device, the computer program product comprising: computer-readable program code stored on the computer-readable medium for specifying a prefetch unit in a microprocessor, the computer-readable program code comprising: first program code for specifying a plurality of period match counters, each corresponding to a different one of a plurality of pattern periods; and second program code for specifying control logic configured to: update the period match counters in response to accesses to a memory block; determine a clear pattern period based on the count values of the period match counters; and prefetch those cache lines of the memory block that have not yet been prefetched, according to a pattern having the clear pattern period determined from the period match counters.
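The period detection recited in the claims (a bitmask of accessed cache lines, a middle pointer computed as the average of the minimum and maximum pointers, one match counter per candidate period, and a margin test for the clear pattern period) can be sketched in Python as an illustration. The function names are hypothetical. The sketch assumes that "left" means lower list indices and "right" means the middle index and above, that the average is rounded down, and that the candidate periods are supplied by the caller as dictionary keys; none of these details is fixed by the claims.

```python
from typing import Dict, List, Optional


def update_counters(mask: List[int], counters: Dict[int, int]) -> Optional[int]:
    """Update one match counter per candidate period N and return the middle
    pointer, or None if no cache line has been accessed yet."""
    accessed = [i for i, bit in enumerate(mask) if bit]
    if not accessed:
        return None
    lo, hi = accessed[0], accessed[-1]  # minimum and maximum pointers
    mid = (lo + hi) // 2                # middle pointer (rounded down)
    for n in counters:
        if mid - n < 0 or mid + n > len(mask):
            continue                    # window would fall outside the block
        if mask[mid - n:mid] == mask[mid:mid + n]:
            counters[n] += 1            # N bits left match N bits right
    return mid


def clear_period(counters: Dict[int, int], margin: int) -> Optional[int]:
    """Return the period whose counter exceeds every other counter by more
    than margin, or None if there is no clear pattern period."""
    best = max(counters, key=counters.get)
    others = [v for n, v in counters.items() if n != best]
    if all(counters[best] - v > margin for v in others):
        return best
    return None
```

In use, `update_counters` would be called after each access to the block, and `clear_period` would be consulted once enough accesses have accumulated. The pattern itself could then be read as `mask[mid:mid + period]`, again under the left/right assumption stated above.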
TW103128257A 2010-03-29 2011-03-29 Prefetcher, method of prefetch data and computer program product TWI519955B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US31859410P 2010-03-29 2010-03-29
US13/033,765 US8762649B2 (en) 2010-03-29 2011-02-24 Bounding box prefetcher
US13/033,848 US8719510B2 (en) 2010-03-29 2011-02-24 Bounding box prefetcher with reduced warm-up penalty on memory block crossings
US13/033,809 US8645631B2 (en) 2010-03-29 2011-02-24 Combined L2 cache and L1D cache prefetcher

Publications (2)

Publication Number Publication Date
TW201447581A TW201447581A (en) 2014-12-16
TWI519955B TWI519955B (en) 2016-02-01

Family

ID=44490596

Family Applications (5)

Application Number Title Priority Date Filing Date
TW104118874A TWI547803B (en) 2010-03-29 2011-03-29 Prefetcher, method of prefetch data, computer program product and microprocessor
TW105108032A TWI574155B (en) 2010-03-29 2011-03-29 Method of prefetch data, computer program product and microprocessor
TW104118873A TWI534621B (en) 2010-03-29 2011-03-29 Prefetcher, method of prefetch data, computer program product and microprocessor
TW100110731A TWI506434B (en) 2010-03-29 2011-03-29 Prefetcher,method of prefetch data,computer program product and microprocessor
TW103128257A TWI519955B (en) 2010-03-29 2011-03-29 Prefetcher, method of prefetch data and computer program product

Country Status (2)

Country Link
CN (4) CN104636274B (en)
TW (5) TWI547803B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8959320B2 (en) * 2011-12-07 2015-02-17 Apple Inc. Preventing update training of first predictor with mismatching second predictor for branch instructions with alternating pattern hysteresis
US9442759B2 (en) * 2011-12-09 2016-09-13 Nvidia Corporation Concurrent execution of independent streams in multi-channel time slice groups
WO2013089682A1 (en) 2011-12-13 2013-06-20 Intel Corporation Method and apparatus to process keccak secure hashing algorithm
US10146545B2 (en) 2012-03-13 2018-12-04 Nvidia Corporation Translation address cache for a microprocessor
US9880846B2 (en) 2012-04-11 2018-01-30 Nvidia Corporation Improving hit rate of code translation redirection table with replacement strategy based on usage history table of evicted entries
US10241810B2 (en) 2012-05-18 2019-03-26 Nvidia Corporation Instruction-optimizing processor with branch-count table in hardware
US20140189310A1 (en) 2012-12-27 2014-07-03 Nvidia Corporation Fault detection in instruction translations
CN104133780B (en) * 2013-05-02 2017-04-05 华为技术有限公司 A kind of cross-page forecasting method, apparatus and system
US10514920B2 (en) * 2014-10-20 2019-12-24 Via Technologies, Inc. Dynamically updating hardware prefetch trait to exclusive or shared at program detection
CN105653199B (en) * 2014-11-14 2018-12-14 群联电子股份有限公司 Method for reading data, memory storage apparatus and memorizer control circuit unit
US10387318B2 (en) * 2014-12-14 2019-08-20 Via Alliance Semiconductor Co., Ltd Prefetching with level of aggressiveness based on effectiveness by memory access type
US10152421B2 (en) * 2015-11-23 2018-12-11 Intel Corporation Instruction and logic for cache control operations
CN106919367B (en) * 2016-04-20 2019-05-07 上海兆芯集成电路有限公司 Detect the processor and method of modification program code
US10579522B2 (en) * 2016-09-13 2020-03-03 Andes Technology Corporation Method and device for accessing a cache memory
US10353601B2 (en) * 2016-11-28 2019-07-16 Arm Limited Data movement engine
US10732858B2 (en) 2017-01-19 2020-08-04 International Business Machines Corporation Loading and storing controls regulating the operation of a guarded storage facility
US10579377B2 (en) 2017-01-19 2020-03-03 International Business Machines Corporation Guarded storage event handling during transactional execution
US10452288B2 (en) 2017-01-19 2019-10-22 International Business Machines Corporation Identifying processor attributes based on detecting a guarded storage event
US10725685B2 (en) * 2017-01-19 2020-07-28 International Business Machines Corporation Load logical and shift guarded instruction
US10496292B2 (en) 2017-01-19 2019-12-03 International Business Machines Corporation Saving/restoring guarded storage controls in a virtualized environment
US10496311B2 (en) 2017-01-19 2019-12-03 International Business Machines Corporation Run-time instrumentation of guarded storage event processing
CN109857786B (en) * 2018-12-19 2020-10-30 成都四方伟业软件股份有限公司 Page data filling method and device
CN111797052B (en) * 2020-07-01 2023-11-21 上海兆芯集成电路股份有限公司 System single chip and system memory acceleration access method
KR102253362B1 (en) * 2020-09-22 2021-05-20 쿠팡 주식회사 Electronic apparatus and information providing method using the same
CN112416437B (en) * 2020-12-02 2023-04-21 海光信息技术股份有限公司 Information processing method, information processing device and electronic equipment
WO2022233391A1 (en) * 2021-05-04 2022-11-10 Huawei Technologies Co., Ltd. Smart data placement on hierarchical storage
CN114116529A (en) * 2021-12-01 2022-03-01 上海兆芯集成电路有限公司 Fast loading device and data caching method

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5003471A (en) * 1988-09-01 1991-03-26 Gibson Glenn A Windowed programmable data transferring apparatus which uses a selective number of address offset registers and synchronizes memory access to buffer
SE515718C2 (en) * 1994-10-17 2001-10-01 Ericsson Telefon Ab L M Systems and methods for processing memory data and communication systems
US6484239B1 (en) * 1997-12-29 2002-11-19 Intel Corporation Prefetch queue
US6810466B2 (en) * 2001-10-23 2004-10-26 Ip-First, Llc Microprocessor and method for performing selective prefetch based on bus activity level
JP4067887B2 (en) * 2002-06-28 2008-03-26 富士通株式会社 Arithmetic processing device for performing prefetch, information processing device and control method thereof
US7310722B2 (en) * 2003-12-18 2007-12-18 Nvidia Corporation Across-thread out of order instruction dispatch in a multithreaded graphics processor
US7237065B2 (en) * 2005-05-24 2007-06-26 Texas Instruments Incorporated Configurable cache system depending on instruction type
US20070186050A1 (en) * 2006-02-03 2007-08-09 International Business Machines Corporation Self prefetching L2 cache mechanism for data lines
WO2008155815A1 (en) * 2007-06-19 2008-12-24 Fujitsu Limited Information processor and cache control method
US8103832B2 (en) * 2007-06-26 2012-01-24 International Business Machines Corporation Method and apparatus of prefetching streams of varying prefetch depth
CN100449481C (en) * 2007-06-29 2009-01-07 东南大学 Storage control circuit with multiple-passage instruction pre-fetching function
US8161243B1 (en) * 2007-09-28 2012-04-17 Intel Corporation Address translation caching and I/O cache performance improvement in virtualized environments
US7890702B2 (en) * 2007-11-26 2011-02-15 Advanced Micro Devices, Inc. Prefetch instruction extensions
US8140768B2 (en) * 2008-02-01 2012-03-20 International Business Machines Corporation Jump starting prefetch streams across page boundaries
JP2009230374A (en) * 2008-03-21 2009-10-08 Fujitsu Ltd Information processor, program, and instruction sequence generation method
US7958317B2 (en) * 2008-08-04 2011-06-07 International Business Machines Corporation Cache directed sequential prefetch
US8402279B2 (en) * 2008-09-09 2013-03-19 Via Technologies, Inc. Apparatus and method for updating set of limited access model specific registers in a microprocessor
US9032151B2 (en) * 2008-09-15 2015-05-12 Microsoft Technology Licensing, Llc Method and system for ensuring reliability of cache data and metadata subsequent to a reboot
CN101887360A (en) * 2009-07-10 2010-11-17 威盛电子股份有限公司 The data pre-acquisition machine of microprocessor and method
CN101667159B (en) * 2009-09-15 2012-06-27 威盛电子股份有限公司 High speed cache system and method of trb

Also Published As

Publication number Publication date
TW201535119A (en) 2015-09-16
TWI506434B (en) 2015-11-01
TWI574155B (en) 2017-03-11
CN105183663A (en) 2015-12-23
CN104636274B (en) 2018-01-26
CN104636274A (en) 2015-05-20
TW201535118A (en) 2015-09-16
CN105183663B (en) 2018-11-27
CN104615548B (en) 2018-08-31
TWI547803B (en) 2016-09-01
TW201447581A (en) 2014-12-16
TWI534621B (en) 2016-05-21
CN102169429B (en) 2016-06-29
TW201135460A (en) 2011-10-16
TW201624289A (en) 2016-07-01
CN104615548A (en) 2015-05-13
CN102169429A (en) 2011-08-31
