TWI590053B - Selective prefetching of physically sequential cache line to cache line that includes loaded page table - Google Patents


Info

Publication number
TWI590053B
TWI590053B TW105120463A
Authority
TW
Taiwan
Prior art keywords
cache line
page table
decision
microprocessor
request
Prior art date
Application number
TW105120463A
Other languages
Chinese (zh)
Other versions
TW201710911A (en)
Inventor
Rodney E. Hooker
Colin Eddy
Original Assignee
VIA Technologies, Inc.
Priority date
Filing date
Publication date
Priority claimed from US 14/790,467 (US9569363B2)
Application filed by VIA Technologies, Inc.
Publication of TW201710911A
Application granted granted Critical
Publication of TWI590053B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch

Description

Selective prefetching of a physically sequential cache line to a cache line that includes a loaded page table entry

The present invention relates to microprocessors, and more particularly to methods by which a microprocessor prefetches data.

Many modern microprocessors support virtual memory, and in particular a memory paging mechanism. As those skilled in the art will understand, the page tables that the operating system builds in system memory are used to translate virtual addresses into physical addresses. In an x86 architecture processor, as described in the IA-32 Intel® Architecture Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, June 2006 (which is incorporated herein by reference in its entirety), the page tables may be arranged in a hierarchical fashion. Specifically, a page table contains a plurality of page table entries (PTEs), each of which stores the physical page address of a physical memory page together with the attributes of that page. A tablewalk takes a virtual memory page address and traverses the page table hierarchy with it to obtain the page table entry corresponding to that virtual page address, so that the virtual address can be translated into a physical address.
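As a rough illustration of the hierarchical walk described above, the following sketch models a 32-bit, two-level x86-style walk with 4 KB pages and 4-byte entries. The toy memory array and `read_phys` function are hypothetical stand-ins for physical memory accesses, not the microprocessor's actual interface.

```c
#include <stdint.h>

/* Toy physical memory, word-addressed; stand-in for real DRAM accesses. */
static uint32_t fake_mem[1 << 20];

static uint32_t read_phys(uint32_t paddr) { return fake_mem[paddr >> 2]; }

/* Walk the page-directory/page-table hierarchy to translate vaddr:
 * VA bits 31:22 index the page directory, bits 21:12 index the page
 * table, bits 11:0 are the page offset. */
uint32_t tablewalk(uint32_t cr3, uint32_t vaddr)
{
    uint32_t pde_addr = (cr3 & 0xFFFFF000u) + ((vaddr >> 22) * 4);
    uint32_t pde      = read_phys(pde_addr);
    uint32_t pte_addr = (pde & 0xFFFFF000u) + (((vaddr >> 12) & 0x3FFu) * 4);
    uint32_t pte      = read_phys(pte_addr);
    return (pte & 0xFFFFF000u) | (vaddr & 0xFFFu);  /* page frame + offset */
}
```

Note that the walk itself costs two dependent physical-memory reads, which is the latency the document's prefetch mechanism aims to hide.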

Because the latency of a physical memory access is relatively long, and a tablewalk may require multiple physical memory accesses, performing a tablewalk is time-consuming. To avoid that cost, a processor typically includes a translation lookaside buffer (TLB) that stores virtual addresses together with the physical addresses into which they translate. However, a TLB is of finite size, and a tablewalk must still be performed whenever a lookup misses in the TLB. A way to shorten the time spent performing tablewalks is therefore needed.
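The hit/miss behavior described here can be sketched with a toy direct-mapped TLB; the entry count, structure, and function names below are illustrative assumptions and not part of the embodiment.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 16  /* illustrative; real TLBs differ */

typedef struct { uint32_t vpn; uint32_t pfn; bool valid; } tlb_entry_t;
static tlb_entry_t tlb[TLB_ENTRIES];

/* On a hit, return the cached translation; on a miss, the hardware
 * tablewalk engine would have to walk the page tables. */
bool tlb_lookup(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> 12;               /* 4 KB pages */
    tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) {
        *paddr = (e->pfn << 12) | (vaddr & 0xFFFu);
        return true;
    }
    return false;
}

/* Install a translation, as a completed tablewalk would. */
void tlb_fill(uint32_t vpn, uint32_t pfn)
{
    tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
    e->vpn = vpn; e->pfn = pfn; e->valid = true;
}
```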

In one embodiment, the present invention provides a microprocessor that includes a translation lookaside buffer, a first request, hardware logic, and a second request. The first request loads a page table entry into the microprocessor in response to a virtual address missing in the translation lookaside buffer. The requested page table entry is contained in a page table comprising a plurality of cache lines, including a first cache line that holds the requested page table entry. The hardware logic determines whether a second cache line, physically sequential to the first cache line, lies outside the page table. The second request prefetches the second cache line into the microprocessor, and is selectively generated based at least on the determination made by the hardware logic.

In another embodiment, the present invention provides a method that includes: generating a first request to load a page table entry into a microprocessor in response to a virtual address missing in a translation lookaside buffer of the microprocessor, wherein the requested page table entry is contained in a page table comprising a plurality of cache lines, including a first cache line that holds the requested page table entry; determining whether a second cache line, physically sequential to the first cache line, lies outside the page table; and selectively generating, based at least on the determination, a second request to prefetch the second cache line into the microprocessor.

In another embodiment, the present invention provides a computer program product encoded on at least one non-transitory computer-usable medium for use with a computing device, the computer program product comprising computer-usable program code embodied in the medium for specifying a microprocessor. The computer-usable program code includes: first program code for specifying a translation lookaside buffer; second program code for specifying a first request that loads a page table entry into the microprocessor in response to a virtual address missing in the translation lookaside buffer, wherein the requested page table entry is contained in a page table comprising a plurality of cache lines, including a first cache line that holds the requested page table entry; third program code for specifying hardware logic that determines whether a second cache line, physically sequential to the first cache line, lies outside the page table; and fourth program code for specifying a second request that prefetches the second cache line into the microprocessor, the second request being selectively generated based at least on the determination.

To make the above and other objects, features, and advantages of the present invention more readily understood, preferred embodiments are described in detail below with reference to the accompanying drawings.

100‧‧‧microprocessor

102‧‧‧instruction cache

104‧‧‧instruction translator

106‧‧‧instruction dispatcher

108‧‧‧load unit

112‧‧‧data cache

114‧‧‧bus interface unit

116‧‧‧translation lookaside buffer

118‧‧‧tablewalk engine

122‧‧‧prefetch unit

124‧‧‧first cache line

126‧‧‧second cache line

128‧‧‧physical memory

132‧‧‧virtual address

134‧‧‧miss signal

136‧‧‧page table entry load request signal

138‧‧‧confirmation signal

142‧‧‧prefetch request signal

144‧‧‧physical address

396‧‧‧last flag

398‧‧‧page table entry physical address

502‧‧‧page table entry address

504‧‧‧cache line index

506‧‧‧page table address

508‧‧‧page table

FIG. 1 is a block diagram of a microprocessor according to an embodiment of the present invention; FIG. 2 is a flowchart of the operation of the microprocessor of FIG. 1; FIG. 3 is a block diagram of a microprocessor according to an embodiment of the present invention; FIG. 4 is a flowchart of the operation of the microprocessor of FIG. 3; FIG. 5 is a block diagram illustrating how the tablewalk engine forms the page table entry physical address; FIG. 6 is a block diagram illustrating how the tablewalk engine forms the page table entry physical address; FIGS. 7 through 10 are block diagrams of embodiments for determining whether the second cache line is outside the page table; FIGS. 11 through 13 are block diagrams of microprocessors according to other embodiments.

To make the objects, features, and advantages of the present invention more readily understood, specific embodiments are described in detail below with reference to the accompanying drawings. The embodiments are intended to illustrate the spirit of the invention rather than to limit its scope, and it should be understood that the following embodiments may be implemented in software, hardware, firmware, or any combination thereof.

Referring to FIG. 1, a block diagram of a microprocessor 100 according to an embodiment of the present invention is shown; the microprocessor 100 is a pipelined microprocessor. The microprocessor 100 includes an instruction cache 102 that provides instructions to an instruction translator 104, which translates the received instructions and provides the translated instructions to an instruction dispatcher 106. The instruction dispatcher 106 provides instructions, which may include memory access instructions (e.g., load or store instructions), to a load unit 108. The load unit 108 provides the virtual address 132 specified by a memory access instruction to a translation lookaside buffer 116, which looks up the virtual address 132. If the virtual address 132 is present in the translation lookaside buffer 116, the translation lookaside buffer 116 returns to the load unit 108 the physical address 144 translated from the virtual address 132. If the virtual address 132 is not present in the translation lookaside buffer 116, the translation lookaside buffer 116 generates a miss signal 134 to a tablewalk engine 118. The tablewalk engine 118 is coupled to the load unit 108 and the translation lookaside buffer 116.

As shown in FIG. 1, a prefetch unit 122 and a data cache 112 are also coupled to the load unit 108, and a bus interface unit 114 is coupled to the data cache 112. The bus interface unit 114 couples the microprocessor 100 to a processor bus, which is coupled to the physical memory 128 of the computer system that contains the microprocessor 100. Specifically, the physical memory 128 stores a plurality of page tables, one of which includes a first cache line 124 located at physical address P and a second cache line 126 located at physical address P+64, each storing eight page table entries. In this embodiment a cache line is 64 bytes and a page table entry is 8 bytes, so each cache line can hold eight page table entries.

Referring to FIG. 2, a flowchart of the operation of the microprocessor 100 of FIG. 1 is shown, illustrating how the next cache line, relative to a cache line holding a page table entry loaded by the load unit, is prefetched. Flow begins at step 202.

In step 202, when the virtual address 132 is not present in the translation lookaside buffer 116, the translation lookaside buffer 116 generates a miss signal 134 to the tablewalk engine 118. Upon receiving the miss signal 134, the tablewalk engine 118 performs a tablewalk to obtain the physical address translation of the virtual address 132 that missed in the translation lookaside buffer 116. The tablewalk engine 118 performs the tablewalk by generating a page table entry load request (PTE load request) 136, which it sends to the load unit 108 to load the page table entry needed to perform the address translation. Flow proceeds to step 204.

In step 204, the load unit 108 detects the PTE load request 136 and loads the page table entry from the physical memory 128. In addition, the load unit 108 informs the prefetch unit 122 via a confirmation signal 138 that it has seen the PTE load request 136, and provides to the prefetch unit 122 the physical address of the first cache line 124 that holds the page table entry being loaded, which is P in the embodiment of FIG. 1. Flow proceeds to step 206.

In step 206, the prefetch unit 122 generates a prefetch request 142 to the load unit 108. The prefetch request 142 instructs the load unit 108 to prefetch into the data cache 112 the second cache line 126, located at physical address P+64. In other words, the load unit 108 prefetches into the data cache 112 the next cache line (the second cache line 126) after the first cache line 124 that holds the page table entry loaded by the load unit 108. Flow proceeds to step 208.
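The address arithmetic of step 206 (prefetching the line at P+64) can be expressed as a minimal sketch, assuming the embodiment's 64-byte line size:

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* line size from the embodiment */

/* Given the physical address of a PTE, return the physical address of
 * the next (physically sequential) cache line, i.e., P + 64. */
uint64_t next_cache_line(uint64_t pte_paddr)
{
    uint64_t line = pte_paddr & ~(uint64_t)(CACHE_LINE_SIZE - 1); /* align down to line P */
    return line + CACHE_LINE_SIZE;
}
```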

In step 208, the load unit 108 prefetches the next cache line (the second cache line 126) into the data cache 112 in response to the prefetch request 142. In some situations, however, the load unit 108 of the microprocessor 100 will not perform the load of the second cache line 126. For example, a functional requirement may forbid it, such as when the cache line falls in a non-cacheable memory region, or when the microprocessor 100 must perform only non-speculative allocations. If the load unit 108 decides to load the second cache line 126 from the physical memory 128, the load unit 108 instructs the bus interface unit 114 to perform the load. Flow ends at step 208.

Although the embodiments described herein prefetch the next cache line, in other embodiments the prefetch unit 122 generates a request that instructs the load unit 108 to prefetch the previous cache line, or both the next and the previous cache lines. Such embodiments suit programs that walk through memory pages in the other direction.

Furthermore, although the embodiments described herein prefetch the next cache line of page table entries, in other embodiments the prefetch unit 122 generates a request that instructs the load unit 108 to prefetch the next cache line at a different level of the paging information hierarchy, such as page descriptor entries (PDEs). It should be noted that although this may help the access patterns of some programs, it is uncommon for the large amount of physical memory beneath a single page descriptor entry to be populated, and a program traverses that much memory only slowly, so the approach is not only of limited benefit but also carries risk. In still other embodiments, the prefetch unit 122 generates a request that instructs the load unit 108 to prefetch the next cache line of another page table hierarchy, different from the PDE/PTE hierarchy described above.

As described above, the prefetch unit 122 generates a request that instructs the load unit 108 to prefetch the cache line that follows the cache line holding the page table entry needed to complete the tablewalk. Assuming each page table is 4 kilobytes (KB), each page table entry is 8 bytes, and each cache line is 64 bytes, a page table contains 64 cache lines that each hold eight page table entries. Consequently, the likelihood that the next cache line prefetched in step 208 holds the next eight page table entries of the page table is quite high, particularly where the operating system lays out page tables physically sequentially.
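The arithmetic in this paragraph can be checked directly:

```c
/* Sanity check of the sizes from the text: a 4 KB page table with
 * 64-byte cache lines and 8-byte PTEs spans 64 cache lines, each
 * holding 8 PTEs, for 512 PTEs in total. */
enum { PAGE_TABLE_BYTES = 4096, LINE_BYTES = 64, PTE_BYTES = 8 };
enum {
    LINES_PER_TABLE = PAGE_TABLE_BYTES / LINE_BYTES,   /* cache lines per table */
    PTES_PER_LINE   = LINE_BYTES / PTE_BYTES,          /* PTEs per cache line */
    PTES_PER_TABLE  = LINES_PER_TABLE * PTES_PER_LINE  /* PTEs per table */
};
```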

Where small pages (typically 4 KB) are used, the program will eventually access several of the eight memory pages involved, and there is a significant likelihood that those accessed pages lie beyond the page whose translation was looked up in the translation lookaside buffer 116 in step 202. In another embodiment, additional logic may be added to the prefetch unit 122 and the load unit 108 so that the prefetch unit 122 generates a request instructing the load unit 108 to prefetch the eight page table entries; this can greatly reduce the clock cycles needed to perform the tablewalks that store into the translation lookaside buffer 116 the translations of the eight memory pages whose physical addresses are held in those eight page table entries. Specifically, when the tablewalk engine 118 must perform a tablewalk that involves loading any of the eight page table entries located in the second cache line 126, those page table entries will already be present in the data cache 112 (unless they have since been evicted from the data cache 112), which shortens the latency otherwise required to read the physical memory 128 to obtain the page table entry.

Conventional prefetch mechanisms detect patterns in a program's memory accesses, i.e., its load and store instructions. If the prefetcher detects that a program is accessing memory according to a pattern, it anticipates the addresses of subsequent load or store instructions and prefetches from those addresses. If the program accesses memory sequentially, the prefetcher typically prefetches the next cache line based on the virtual address of the load or store instruction. In a processor architecture in which the operating system performs tablewalks, a program load/store-based prefetcher will prefetch the next cache line after a page table entry is loaded, because that load is an ordinary load instruction. However, in a processor that performs tablewalks in hardware rather than with software load or store instructions, a load/store-based prefetcher does not trigger off the page table entry load (because it is not a load instruction) and therefore does not prefetch the next cache line after the page table entry is loaded. In contrast, in the hardware-tablewalk processor of the present invention, the prefetch unit 122 can trigger off a non-program page table entry load, that is, a physical memory access initiated by the tablewalk engine 118. Thus, unlike load/store-based mechanisms, the prefetch unit 122 of the present invention instructs the load unit 108 to prefetch the next cache line, which is likely to contain several page table entries of the page table.

Selective prefetching

The page table entry prefetch mechanism described with respect to FIGS. 1 and 2 has the advantage of reducing tablewalk time. As noted above, it is quite likely that the next, prefetched physical cache line contains the next several page table entries of the page table; the likelihood is particularly high when the operating system lays out page tables physically sequentially. The benefit follows because there is a reasonably high probability that the program will access at least some of the next several memory pages beyond the page it is currently accessing virtually, which would otherwise cause misses in the translation lookaside buffer. However, if the operating system does not lay out the page tables physically sequentially, or at least not all of them, prefetching the next cache line may cause a cache line that is more useful than the prefetched one to be evicted from the cache hierarchy. The following embodiments address this and improve cache efficiency.

Terminology

A page table entry (PTE) stores the physical page address of a physical memory page and the attributes of the physical memory page. Page table entries are contained in the page tables of the microprocessor's memory paging mechanism. The physical memory address of a page table entry is naturally aligned to the size of a page table entry. In some embodiments a page table entry is 4 bytes; in other embodiments it is 8 bytes; however, still other embodiments are contemplated for use with the present invention.

A page table is a set of physically sequential page table entries. The physical memory address of a page table is naturally aligned to an address boundary equal to the size of the page table. In one embodiment, for example, a page table is 4 KB and comprises 1024 four-byte page table entries or 512 eight-byte page table entries; however, other embodiments contemplate page tables of different sizes. Each page table entry within a page table has an index, determined from a portion of the bits of the virtual address to be translated. For example, with a 4 KB page table and 4-byte page table entries, bits 21:12 of the virtual address index the page table entry within the page table; with a 4 KB page table and 8-byte page table entries, bits 20:12 of the virtual address index the page table entry within the page table.
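Assuming the bit ranges given above, the PTE index can be extracted as follows (function names are illustrative):

```c
#include <stdint.h>

/* PTE index within a 4 KB page table, per the bit ranges in the text:
 * VA bits 21:12 select among 1024 four-byte PTEs; VA bits 20:12 select
 * among 512 eight-byte PTEs. */
uint32_t pte_index_4byte(uint32_t vaddr) { return (vaddr >> 12) & 0x3FFu; }
uint32_t pte_index_8byte(uint32_t vaddr) { return (vaddr >> 12) & 0x1FFu; }
```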

A page table comprises multiple cache lines whose physical addresses are naturally aligned to the size of a cache line. In one embodiment the cache line size is 64 bytes, although other embodiments are contemplated for use with the present invention. Because a cache line is larger than a page table entry, each cache line holds multiple page table entries. Each cache line of a page table has an index, determined from a portion of the bits of the virtual address to be translated. For example, with a 4 KB page table and 64-byte cache lines, bits 21:16 of the virtual address index the cache line within the page table.

The last cache line of a page table is the cache line with the largest index among the cache lines of the page table. For example, with a 4 KB page table, 64-byte cache lines, and 4-byte page table entries, the index of the last cache line of the page table (bits 21:16 of the virtual address) is 0x3F (binary 111111). With a 4 KB page table, 64-byte cache lines, and 8-byte page table entries, the index of the last cache line of the page table (bits 20:15 of the virtual address) is 0x3F (binary 111111).
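Assuming the bit ranges above, the last-cache-line test reduces to comparing a six-bit line index against 0x3F; the function names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

/* Last-cache-line test from the text: the cache line is the last line of
 * its page table when its index is 0x3F. The index is VA bits 21:16 for
 * 4-byte PTEs and VA bits 20:15 for 8-byte PTEs. */
bool is_last_line_4byte_pte(uint32_t vaddr) { return ((vaddr >> 16) & 0x3Fu) == 0x3Fu; }
bool is_last_line_8byte_pte(uint32_t vaddr) { return ((vaddr >> 15) & 0x3Fu) == 0x3Fu; }
```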

Referring now to FIG. 3, a block diagram of a microprocessor 100 is shown. The microprocessor 100 of FIG. 3 is similar in many respects to the microprocessor 100 of FIG. 1, and like-numbered elements are similar unless otherwise indicated. FIG. 3 differs from FIG. 1 in that the load unit 308, the tablewalk engine 318, and the PTE load request 336 are modified (hence their different reference numerals). In particular, the PTE load request 336 includes a last flag 396 in addition to the requested page table entry address 398 (which lies within a cache line at physical address P). Additionally, the tablewalk engine 318 determines whether the cache line containing the page table entry is the last cache line of the page table that contains the page table entry, and populates the last flag 396 accordingly. Finally, the load unit 308 examines the last flag 396 to decide whether to provide the physical address 138 of the cache line to the prefetch unit 122. More details are described with respect to FIGS. 4 through 8.

Referring now to FIG. 4, a flowchart of the operation of the microprocessor 100 of FIG. 3 is shown. Flow begins at step 402.

In step 402, when the virtual address 132 is not found in the translation lookaside buffer 116, the translation lookaside buffer 116 generates a miss signal 134 to the tablewalk engine 318, which responsively performs a tablewalk to obtain the physical address translation of the virtual address 132 that missed in the translation lookaside buffer 116. The tablewalk includes the tablewalk engine 318 determining the physical address of the page table entry needed to perform the address translation. The tablewalk may include accessing other structures of the paging mechanism of the microprocessor 100 in order to determine the physical address of the page table entry. For example, in an x86 architecture embodiment, the tablewalk may include accessing a PML4 entry (PML4E), a PDPT entry (PDPTE), and/or a page directory entry (PDE), depending on whether the microprocessor 100 is in 32-bit, PAE, or IA-32e paging mode. All or part of these structures may be cached in caching structures of the paging mechanism of the microprocessor 100, such as a PML4E cache, a PDPTE cache, or a PDE cache, or in the various levels of the cache memories of the microprocessor 100, including the data cache 112. 
Other embodiments, including other processor architectures with virtual memory capability that perform tablewalks in their memory paging mechanisms, such as the SPARC, ARM, and PowerPC architectures and other conventional processor architectures, may also be employed with the present invention. Flow proceeds to step 404.

At step 404, the page table walk engine 318 determines whether the cache line that includes the page table entry of step 402 (the first cache line) is the last cache line of the page table that includes the page table entry. Here, the second cache line is the cache line that physically follows the first cache line (i.e., the second cache line has a physical address equal to the physical address of the first cache line incremented by the cache line size). Preferably, the page table walk engine 318 makes the determination by examining predetermined bits of the virtual address 132 that missed in the translation lookaside buffer 116 at step 402. Details of the operation of step 404 are described with respect to Figures 5 and 6. Flow proceeds to decision step 406.

At decision step 406, if the determination made at step 404 is true, flow proceeds to step 408; otherwise, flow proceeds to step 412.

At step 408, the page table walk engine 318 sets to true the last flag 396 of the request 336 that will be generated at step 414. Flow proceeds to step 414.

At step 412, the page table walk engine 318 sets to false the last flag 396 of the request 336 that will be generated at step 414. Flow proceeds to step 414.

At step 414, the page table walk engine 318 generates the request 336 to load the page table entry whose physical address was determined at step 402, and transmits the request 336 to the load unit 308. The request 336 includes the value of the last flag 396 set at step 408 or step 412. When the page table entry is subsequently obtained, the page table walk engine 318 uses it to translate the virtual address 132 and completes the page table walk by updating the translation lookaside buffer 116 with the physical address translation of the virtual address 132. Flow proceeds to decision step 416.

At decision step 416, the load unit 308 determines whether the last flag 396 is true. If so, flow proceeds to step 418; otherwise, flow proceeds to step 422.

At step 418, the load unit 308 does not provide the physical address 138 of the first cache line to the prefetch unit 122, and flow ends.

At step 422, the load unit 308 provides the physical address 138 of the first cache line to the prefetch unit 122. Flow proceeds to step 424.

At step 424, the prefetch unit 122 increments the physical address 138 of the first cache line by the size of a cache line (e.g., 64 bytes) and transmits a request 142 to the load unit 308 to prefetch the second cache line at the incremented address. Flow proceeds to step 426.

At step 426, the load unit 308 uses the prefetch request 142 as an indication to prefetch the second cache line into the microprocessor 100. Flow ends at step 426.

Reference is now made to Figure 5, a block diagram illustrating the page table entry address 502 formed by the page table walk engine 318. The page table entry address 502 is a physical address. In the embodiment of Figure 5, the size of a page table entry is 4 bytes and the page table is 4 KB. Figure 5 also shows the bits of the page table entry address 502 that constitute the index 504, within the page table 508 that includes the page table entry, of the cache line that contains the page table entry. The page table entry address 502 is formed according to the architecture of the microprocessor 100.

The page table walk engine 318 forms the page table entry address 502 from the virtual address 132 and the page table address 506, i.e., the physical memory address of the base of the page table 508, as shown. Generally, the page table address 506 is obtained from the page directory entry (PDE), which includes a pointer to the page table 508; however, in some paging modes (e.g., a paging structure with only one level), the page table address 506 may be obtained directly from a register of the microprocessor 100 (e.g., the CR3 register in the x86 architecture).

In the embodiment of Figure 5, because a page table entry is 4 bytes and is 4-byte aligned, the lower two bits of the page table entry address 502 are filled with zeros. Bits [21:12] of the virtual address 132 become bits [11:2] of the page table entry address 502, and bits [N:12] of the page table address 506 form bits [N:12] of the page table entry address 502, where N is the most significant bit of the page table address 506 and of the page table entry address 502 (e.g., bit 31 of a 32-bit physical address, bit 35 of a 36-bit physical address, bit 39 of a 40-bit physical address). The page table entry address 502 points to a page table entry within the page table 508; as shown, it is the physical memory address of the page table entry. In the example of Figure 5, the page table entry address 502 points to page table entry 13 within a cache line of 16 page table entries.

As shown, the cache line index 504 is bits [11:6] of the page table entry address 502, which correspond to bits [21:16] of the virtual address 132. Thus, the cache line index 504 may be determined from either the virtual address 132 or the already-formed page table entry address 502 (the latter, e.g., by the load unit 1108 of the embodiment of Figure 11). In the example of Figure 5, the cache line index 504 of the cache line containing the page table entry pointed to by the page table entry address 502 has the value 0x3C. As described above, because the page table 508 comprises 64 cache lines (i.e., in the embodiment in which a cache line is 64 bytes and the page table is 4 KB), the maximum cache line index 504 value is 0x3F.

Reference is now made to Figure 6, a block diagram illustrating the page table entry address 502 formed by the page table walk engine 318 according to an embodiment in which a page table entry is 8 bytes (rather than 4 bytes as in Figure 5). Figure 6 is similar to Figure 5, except as follows. First, because in this embodiment a page table entry is 8 bytes and is 8-byte aligned, the lower three bits of the page table entry address 502 are filled with zeros (rather than the lower two bits as in Figure 5). Second, bits [20:12] of the virtual address 132 become bits [11:3] of the page table entry address 502 (rather than bits [21:12] of the virtual address 132 becoming bits [11:2] as in Figure 5). In the example of Figure 6, the page table entry address 502 points to page table entry 5 within a cache line of 8 page table entries (rather than within a cache line of 16 page table entries as in Figure 5). As described above, the cache line index 504 is bits [11:6] of the page table entry address 502, which in the embodiment of Figure 6 correspond to bits [20:15] of the virtual address 132 (rather than bits [21:16] as in Figure 5). In the example of Figure 6, the cache line index 504 of the cache line containing the page table entry pointed to by the page table entry address 502 has the value 0x04.

Reference is now made to Figure 7, a block diagram illustrating a first embodiment of making the determination, e.g., by the page table walk engine 318 at step 404 of Figure 4, of whether the second cache line (i.e., the cache line physically sequential to the cache line (the first cache line) that contains the page table entry requested in response to the translation lookaside buffer miss) is outside the page table 508. The determination is made by examining the cache line index 504 of the first cache line and comparing it for equality with the maximum cache line index 504 value (e.g., 0x3F), i.e., the cache line index 504 of the last cache line of the page table 508. Specifically, if the first cache line is the last cache line of the page table 508 (i.e., at the end of the page table 508), then the physically sequential cache line (the second cache line) is outside the page table 508. If the cache line index 504 of the first cache line equals the maximum cache line index value, the determination is true, i.e., the second cache line is outside the page table 508; otherwise, the determination is false.

In the example of Figure 7, the virtual address 132 has the value 0x12345678. Consequently, bits [21:16] of the virtual address 132 are 0x34, which is bits [11:6] of the page table entry address 502 and is the first cache line index 504. Because the value 0x34 of the first cache line index 504 is less than the maximum cache line index value 0x3F, the determination is false, and the last flag 396 is set to false. As described above, the second cache line is included in the page table 508 rather than being outside it.

Reference is now made to Figure 8, a block diagram illustrating a second embodiment of making the determination of whether the second cache line is outside the page table 508. Figure 8 is similar to Figure 7, except that the value of the virtual address 132 is different. In the example of Figure 8, the virtual address 132 has the value 0x123F5678. Consequently, bits [21:16] of the virtual address 132 are 0x3F, which is bits [11:6] of the page table entry address 502 and is the first cache line index 504. Because the value 0x3F of the first cache line index 504 equals the maximum cache line index value 0x3F, the determination is true, and the last flag 396 is set to true. As described above, the second cache line is outside the page table 508 rather than being included in it. Consequently, the second cache line may or may not be a cache line that includes page table entries; even if it is, it may not include page table entries of the next page table, i.e., the page table pointed to by the next PDE in the paging structure. Hence, the embodiments described herein selectively prefetch the second cache line, which advantageously reduces pollution of the cache hierarchy of the microprocessor 100.

Reference is now made to Figure 9, a block diagram illustrating a third embodiment of making the determination. Figure 9 is similar to Figure 7, except that the embodiment of Figure 9 employs 8-byte page table entries, so each cache line includes only 8 page table entries. As in Figure 7, the determination is made by examining the cache line index 504 of the first cache line and comparing it for equality with the maximum cache line index value (e.g., 0x3F), i.e., the cache line index 504 of the last cache line of the page table 508. However, in Figure 9 the determination is made by examining bits [20:15] of the virtual address 132 (rather than bits [21:16] of the virtual address 132 as in Figure 7), which in both cases are bits [11:6] of the page table entry address 502.

In the example of Figure 9, the virtual address 132 has the value 0x12345678. Thus, bits [20:15] of the virtual address 132 are 0x28, which is bits [11:6] of the page table entry address 502 and is the first cache line index 504. Because the value 0x28 of the first cache line index 504 is less than the maximum cache line index value 0x3F, the determination is false, and the last flag 396 is set to false. As described above, the second cache line is included in the page table 508 rather than being outside it.

Reference is now made to Figure 10, a block diagram illustrating a fourth embodiment of making the determination of whether the second cache line is outside the page table 508. Figure 10 is similar to Figure 9, except that the value of the virtual address 132 is different. In the example of Figure 10, the virtual address 132 has the value 0x123FD678. Thus, bits [20:15] of the virtual address 132 are 0x3F, which is bits [11:6] of the page table entry address 502 and is the first cache line index 504. Because the value 0x3F of the first cache line index 504 equals the maximum cache line index value 0x3F, the determination is true, and the last flag 396 is set to true. As described above, the second cache line is outside the page table 508 rather than being included in it. Hence, the embodiments described herein selectively prefetch the second cache line, which advantageously reduces pollution of the cache hierarchy of the microprocessor 100.

It should be understood that although Figures 7 through 10 describe the making of the determination with respect to the embodiment of step 404 of Figure 4 (i.e., by the page table walk engine 318, which sets the last flag 396), the determination may also be made by other units of the microprocessor 100 (e.g., by the load unit 1108 of the embodiment of Figure 11), and some embodiments do not use the last flag 396 (i.e., the embodiments of Figures 11 through 13). Preferably, the determination is made by hardware logic, e.g., combinational logic in the relevant unit, such as the page table walk engine 318/1218/1318 or the load unit 1108, that compares the appropriate bits of the virtual address 132, or of the formed page table entry address 502, with the predetermined maximum cache line index value.

Reference is now made to Figure 11, a block diagram illustrating the microprocessor 100 according to an alternate embodiment. The microprocessor 100 of Figure 11 is similar in many respects to the microprocessor 100 of Figure 1, and like-numbered elements are similar unless otherwise noted. Figure 11 differs from Figure 1 in that the load unit 1108 is modified to include hardware logic for determining whether the second cache line is outside the page table 508. Consequently, in the embodiment of Figure 11, the page table entry load request 136 does not include the last flag 396. The microprocessor 100 of Figure 11 operates similarly to the manner described with respect to Figure 4, except that the determination is made not by the page table walk engine 118 (as at step 404) but by the load unit 1108 (similarly to the determination of step 416, after the page table walk engine 118 transmits the page table entry load request 136 as described at step 414); if the determination is true, the load unit 1108 does not provide the physical address 138 of the first cache line to the prefetch unit 122.

Reference is now made to Figure 12, a block diagram illustrating the microprocessor 100 according to an alternate embodiment. The microprocessor 100 of Figure 12 is similar in many respects to the microprocessor 100 of Figure 11, and like-numbered elements are similar unless otherwise noted. Figure 12 differs from Figure 11 in that the page table walk engine 1218, the load unit 1208, and the prefetch unit 1222 are modified. The load unit 1208 of Figure 12 is modified such that it does not provide the physical address 138 of the first cache line to the prefetch unit 1222; instead, the page table walk engine 1218 makes the determination and, if the determination is false, provides the physical address 1238 of the first cache line directly to the prefetch unit 1222. The microprocessor 100 of Figure 12 operates similarly to the manner described with respect to Figure 4, except that if the determination at step 406 is true, flow proceeds to step 418 (no prefetch of the second cache line is performed); if the determination is false, flow proceeds to step 414 and then directly to a modified step 422, in which the page table walk engine 1218 provides the physical address 1238 of the first cache line to the prefetch unit 1222.

Reference is now made to Figure 13, a block diagram illustrating the microprocessor 100 according to an alternate embodiment. The microprocessor 100 of Figure 13 is similar in many respects to the microprocessor 100 of Figure 12, and like-numbered elements are similar unless otherwise noted. Figure 13 differs from Figure 12 in that the page table walk engine 1318 and the prefetch unit 1322 are modified. The page table walk engine 1318 of Figure 13 increments the physical address of the first cache line to generate the physical address 1338 of the second cache line (rather than the prefetch unit 1322 doing so) and, if the determination is false, provides it to the prefetch unit 1322. The microprocessor 100 of Figure 13 operates similarly to the manner described with respect to Figure 4, except that if the determination at step 406 is true, flow proceeds to step 418 (no prefetch of the second cache line is performed); if the determination is false, flow proceeds to step 414 and then directly to a modified step 422, in which the page table walk engine 1318 provides the physical address 1338 of the second cache line to the prefetch unit 1322. Then, at a modified step 424, the prefetch unit 1322 need not perform the increment, but simply uses the received physical address 1338 of the second cache line in its request 142 to the load unit 1208.

In another embodiment (not shown), the load unit receives the page table entry load request from the page table walk engine, computes the physical address of the second cache line, and generates the prefetch request for the second cache line itself. In such an embodiment, the prefetch unit may be absent.

Although embodiments have been described using terminology common to the memory paging mechanism of x86 architecture processors, it should be understood that the embodiments contemplate other processor architectures that include virtual memory capability and employ page tables in their memory paging mechanisms, such as the SPARC, ARM, and PowerPC architectures, among other well-known processor architectures.

Furthermore, although embodiments have been described in which the second cache line is the next physically sequential cache line and the determination is made by deciding whether the first cache line is at the end of the page table, other embodiments are contemplated in which the second cache line is the previous physically sequential cache line and the determination is made by deciding whether the first cache line is at the beginning of the page table, which contemplates programs that run through multiple memory pages in the other direction.

While the invention has been disclosed above by way of various embodiments, they are provided only as examples for reference and are not intended to limit the scope of the invention; those skilled in the art may make modifications and refinements without departing from the spirit and scope of the invention. For example, software can implement the functions, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (e.g., C, C++), hardware description languages (including Verilog or VHDL), or other available programs. Such software can be disposed in any computer-usable medium, such as semiconductor memory, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.). The apparatus and methods described in the embodiments may be included in a semiconductor intellectual property core, such as a microprocessor core embodied in a hardware description language (HDL), and transformed into hardware-type integrated circuit products. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the invention should not be limited by any of the embodiments herein, but should be defined only in accordance with the appended claims and their equivalents. Specifically, the invention may be implemented within a microprocessor device of a general-purpose computer. Finally, those skilled in the art may make modifications and refinements without departing from the spirit and scope of the invention; accordingly, the scope of protection of the invention is defined by the appended claims.

Claims (21)

一種微處理器,包括:一轉譯查詢緩衝器;一第一要求,載入一分頁表項目至該微處理器,以回應未在該轉譯查詢緩衝器找到一虛擬位址,該被要求的分頁表項目被包含於一分頁表,該分頁表包括複數個快取線,該等快取線包括一第一快取線,該第一快取線包括該被要求的分頁表項目;硬體邏輯,決定實體接續該第一快取線的一第二快取線是否在該分頁表之外;一第二要求,預取該第二快取線至該微處理器,該第二要求係至少基於該硬體邏輯所作之該決定而被選擇性產生;決定該第二快取線是否在該分頁表之外,該硬體邏輯決定該第一快取線是否為該分頁表所包含的最後快取線;以及決定該第一快取線是否為該分頁表所包含的最後快取線,該硬體邏輯決定該虛擬位址的複數個預定位元的數值是否都為一。 A microprocessor comprising: a translation query buffer; a first request to load a page table entry to the microprocessor in response to not finding a virtual address in the translation query buffer, the requested page break The table item is included in a page table, the page table includes a plurality of cache lines, the cache line includes a first cache line, the first cache line includes the requested page table item; hardware logic Determining whether a second cache line of the first cache line is outside the page table; a second request prefetching the second cache line to the microprocessor, the second requirement being at least Selectively generated based on the decision made by the hardware logic; determining whether the second cache line is outside the page table, the hardware logic determining whether the first cache line is the last included in the page table a cache line; and determining whether the first cache line is the last cache line included in the page table, and the hardware logic determines whether the value of the plurality of predetermined bits of the virtual address is one. 如申請專利範圍第1項所述之微處理器,更包括:該虛擬位址的該等預定位元為N位元的較高的M位元並決定該分頁表中的該分頁表項目的一索引,其中N-M為該分頁表項目之位元組尺寸的對數(log2)。 The microprocessor of claim 1, further comprising: the predetermined bit of the virtual address being a higher M bit of N bits and determining the page table entry in the page table An index, where NM is the logarithm (log2) of the byte size of the page table entry. 
如申請專利範圍第1項所述之微處理器,更包括:當該決定為假時,產生該第二要求;以及當該決定為真時,不產生該第二要求。 The microprocessor of claim 1, further comprising: when the decision is false, generating the second request; and when the decision is true, the second request is not generated. 如申請專利範圍第1項所述之微處理器,更包括:一載入單元;以及一分頁表尋訪引擎,產生該第一要求至該載入單元。 The microprocessor of claim 1, further comprising: a load unit; and a page table search engine that generates the first request to the load unit. 如申請專利範圍第4項所述之微處理器,更包括:該第一要求包括一旗標,該旗標包括該分頁表尋訪引擎所做的決定;一預取單元;如果該旗標指示該決定為假,該載入單元提供該第一快取線的該實體位址至該預取單元;以及該預取單元產生該第二要求,以回應自該載入單元所接收之該第一快取線的該實體位址。 The microprocessor of claim 4, further comprising: the first requirement includes a flag, the flag including a decision made by the paging table search engine; a prefetch unit; if the flag indicates The decision is false, the loading unit provides the physical address of the first cache line to the prefetch unit; and the prefetch unit generates the second request in response to the first received from the loading unit The physical address of a cache line. 如申請專利範圍第4項所述之微處理器,更包括:該載入單元製作該決定;一預取單元;如果該決定為假,該載入單元提供該第一快取線的該實體位址至該預取單元;以及該預取單元產生該第二要求,以回應自該載入單元所接收之該第一快取線的該實體位址。 The microprocessor of claim 4, further comprising: the loading unit making the decision; a prefetching unit; if the decision is false, the loading unit provides the entity of the first cache line Addressing the prefetch unit; and the prefetch unit generates the second request in response to the physical address of the first cache line received from the load unit. 
如申請專利範圍第4項所述之微處理器,更包括:該分頁表尋訪引擎製作該決定;一預取單元;如果該決定為假,該分頁表尋訪引擎提供該第一快取線的該實體位址至該預取單元;以及該預取單元產生該第二要求,以回應自該分頁表尋訪引擎 所接收之該第一快取線的該實體位址。 The microprocessor of claim 4, further comprising: the paging table search engine making the decision; a prefetching unit; if the decision is false, the paging table searching engine provides the first cache line The physical address is addressed to the prefetch unit; and the prefetch unit generates the second request in response to the paging table search engine The physical address of the first cache line received. 如申請專利範圍第4項所述之微處理器,更包括:該分頁表尋訪引擎製作該決定;一預取單元;如果該決定為假,該分頁表尋訪引擎提供該第二快取線的該實體位址至該預取單元;以及該預取單元產生該第二要求,以回應自該分頁表尋訪引擎所接收之該第二快取線的該實體位址。 The microprocessor of claim 4, further comprising: the paging table search engine making the decision; a prefetching unit; if the decision is false, the paging table searching engine provides the second cache line The entity address is addressed to the prefetch unit; and the prefetch unit generates the second request in response to the physical address of the second cache line received by the paging table search engine. 如申請專利範圍第4項所述之微處理器,更包括:該載入單元製作該決定;以及如果該決定為假,該載入單元產生該第二要求。 The microprocessor of claim 4, further comprising: the loading unit making the decision; and if the decision is false, the loading unit generates the second request. 如申請專利範圍第1項所述之微處理器,更包括:一快取記憶體;以及該第二要求包括一要求以預取該第二快取線至該快取記憶體。 The microprocessor of claim 1, further comprising: a cache memory; and the second requirement includes a request to prefetch the second cache line to the cache memory. 
一種方法,包括:產生一第一要求以載入一分頁表項目至一微處理器,以回應在未在該微處理器之一轉譯查詢緩衝器找到一虛擬位址,該被要求的分頁表項目被包含於一分頁表,該分頁表包括複數個快取線,該等快取線包括一第一快取線,該第一快取線包括該被要求的分頁表項目;決定實體接續該第一快取線的一第二快取線是否在該分頁表之外;至少基於該決定而選擇性產生一第二要求以預取該第二快 取線至該微處理器;決定該第二快取線是否在該分頁表之外包括決定該第一快取線是否為該分頁表所包含的最後快取線;以及決定該第一快取線是否為該分頁表所包含的最後快取線包括決定該虛擬位址的複數個預定位元的數值是否都為一。 A method comprising: generating a first request to load a page table entry to a microprocessor in response to finding a virtual address in a query buffer not translated in the microprocessor, the requested page table The item is included in a page table, the page table includes a plurality of cache lines, the cache line includes a first cache line, the first cache line includes the requested page table item; Whether a second cache line of the first cache line is outside the page table; at least based on the decision, selectively generating a second request to prefetch the second fast Taking a line to the microprocessor; determining whether the second cache line is outside the page table includes determining whether the first cache line is the last cache line included in the page table; and determining the first cache Whether the line is the last cache line included in the page table includes whether the value of the plurality of predetermined bits determining the virtual address is one. 如申請專利範圍第11項所述之方法,更包括:該虛擬位址的該等預定位元為N位元的較高的M位元並決定該分頁表中的該分頁表項目的一索引,其中N-M為該分頁表項目之位元組尺寸的對數(log2)。 The method of claim 11, further comprising: the predetermined bit of the virtual address being a higher M bit of N bits and determining an index of the page table entry in the page table , where NM is the logarithm (log2) of the byte size of the page table entry. 如申請專利範圍第11項所述之方法,更包括:選擇性產生該第二要求包括:當該決定為假時,產生該第二要求;以及當該決定為真時,不產生該第二要求。 The method of claim 11, further comprising: selectively generating the second requirement comprises: generating the second request when the decision is false; and not generating the second when the decision is true Claim. 
The method of claim 11, further comprising: a page table walk engine of the microprocessor issuing the first request to a load unit of the microprocessor.

The method of claim 11, wherein the first request includes a flag indicating the decision made by the page table walk engine; wherein if the flag indicates the decision is false, the load unit provides the physical address of the first cache line to a prefetch unit of the microprocessor, and the prefetch unit generates the second request in response to receiving the physical address of the first cache line from the load unit.

The method of claim 14, further comprising: the load unit making the decision; wherein if the decision is false, the load unit provides the physical address of the first cache line to a prefetch unit of the microprocessor, and the prefetch unit generates the second request in response to receiving the physical address of the first cache line from the load unit.

The method of claim 14, further comprising: the page table walk engine making the decision; wherein if the decision is false, the page table walk engine provides the physical address of the first cache line to a prefetch unit of the microprocessor, and the prefetch unit generates the second request in response to receiving the physical address of the first cache line from the page table walk engine.
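The division of labor recited above, where one unit makes the last-line decision, passes it along with the line's physical address, and another unit conditionally issues the prefetch, can be sketched in C. The struct layouts, the 64-byte line size, and the function names are illustrative assumptions, not taken from the patent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request records illustrating the flag-passing flow:
 * the tablewalk engine attaches its "last line" decision to the load
 * request, and the load unit forwards the next line's address to the
 * prefetch unit only when the decision is false. */
typedef struct {
    uint64_t line_paddr;  /* physical address of the PTE's cache line */
    bool     last_line;   /* decision: line is last one in the table  */
} load_request_t;

typedef struct {
    uint64_t prefetch_paddr;  /* line to prefetch into the cache */
} prefetch_request_t;

/* Load unit: after issuing the PTE load, conditionally generate the
 * second (prefetch) request. Returns true when it was generated. */
static bool load_unit_handle(const load_request_t *req,
                             prefetch_request_t *out)
{
    if (!req->last_line) {
        /* The physically sequential next line still lies inside the
         * page table, so prefetching it is worthwhile. */
        out->prefetch_paddr = req->line_paddr + 64u;  /* 64-byte lines assumed */
        return true;
    }
    return false;  /* next line is outside the table: suppress the prefetch */
}
```

The design point the claims vary is only *where* this conditional lives: in the tablewalk engine, in the load unit, or split between them via the flag.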
The method of claim 14, further comprising: the page table walk engine making the decision; wherein if the decision is false, the page table walk engine provides the physical address of the second cache line to a prefetch unit of the microprocessor, and the prefetch unit generates the second request in response to receiving the physical address of the second cache line from the page table walk engine.

The method of claim 14, further comprising: the load unit making the decision; wherein if the decision is false, the load unit generates the second request.

The method of claim 11, wherein the second request comprises a request to prefetch the second cache line into a cache memory of the microprocessor.

A computer program product encoded in at least one non-transitory computer-usable medium for use with a computing device, the computer program product comprising: computer-usable program code embodied in the medium for specifying a microprocessor, the computer-usable program code comprising: first program code for specifying a translation lookaside buffer; second program code for specifying a first request to load a page table entry into a microprocessor in response to a virtual address missing in a translation lookaside buffer of the microprocessor, wherein the requested page table entry is contained in a page table, the page table comprises a plurality of cache lines, and the cache lines include a first cache line that contains the requested page table entry; third program code for specifying hardware logic that determines whether a second cache line physically sequential to the first cache line is outside the page table; and fourth program code for specifying a second request to prefetch the second cache line into the microprocessor, the second request being selectively generated based at least on the determination; wherein determining whether the second cache line is outside the page table comprises determining whether the first cache line is the last cache line contained in the page table, and determining whether the first cache line is the last cache line contained in the page table comprises determining whether a plurality of predetermined bits of the virtual address are all one.
TW105120463A 2015-07-02 2016-06-29 Selective prefetching of physically sequential cache line to cache line that includes loaded page table TWI590053B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/790,467 US9569363B2 (en) 2009-03-30 2015-07-02 Selective prefetching of physically sequential cache line to cache line that includes loaded page table entry

Publications (2)

Publication Number Publication Date
TW201710911A TW201710911A (en) 2017-03-16
TWI590053B true TWI590053B (en) 2017-07-01

Family

ID=58066157

Family Applications (1)

Application Number Title Priority Date Filing Date
TW105120463A TWI590053B (en) 2015-07-02 2016-06-29 Selective prefetching of physically sequential cache line to cache line that includes loaded page table

Country Status (2)

Country Link
CN (1) CN106168929B (en)
TW (1) TWI590053B (en)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018156573A (en) * 2017-03-21 2018-10-04 東芝メモリ株式会社 Memory device and information processing system
CN110389911A (en) * 2018-04-23 2019-10-29 珠海全志科技股份有限公司 A kind of forecasting method, the apparatus and system of device memory administrative unit
CN111198827B (en) * 2018-11-16 2022-10-28 展讯通信(上海)有限公司 Page table prefetching method and device
CN111552653B (en) * 2020-05-14 2021-01-29 上海燧原科技有限公司 Page table reading method, device and equipment and computer storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7111125B2 (en) * 2002-04-02 2006-09-19 Ip-First, Llc Apparatus and method for renaming a data block within a cache
US20060136696A1 (en) * 2004-12-16 2006-06-22 Grayson Brian C Method and apparatus for address translation
US8161246B2 (en) * 2009-03-30 2012-04-17 Via Technologies, Inc. Prefetching of next physically sequential cache line after cache line that includes loaded page table entry

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3837610A4 (en) * 2018-08-14 2021-10-27 Texas Instruments Incorporated Prefetch kill and revival in an instruction cache
US11314660B2 (en) 2018-08-14 2022-04-26 Texas Instruments Incorporated Prefetch kill and revival in an instruction cache
US11620236B2 (en) 2018-08-14 2023-04-04 Texas Instruments Incorporated Prefetch kill and revival in an instruction cache

Also Published As

Publication number Publication date
TW201710911A (en) 2017-03-16
CN106168929B (en) 2019-05-31
CN106168929A (en) 2016-11-30
