TWI420311B - Set-based modular cache partitioning method - Google Patents

Set-based modular cache partitioning method

Publication number: TWI420311B
Application number: TW99108010A
Authority: TW (Taiwan)
Prior art keywords: cache memory, cache, physical, virtual, module
Original language: Chinese (zh)
Other versions: TW201133237A (en)
Inventor: Tsung Lee
Original Assignee: Univ Nat Sun Yat Sen
Application filed by Univ Nat Sun Yat Sen; priority to TW99108010A
Publication of application TW201133237A; application granted; publication of grant TWI420311B
Classification: Memory System Of A Hierarchy Structure (AREA)

Description

Set-based modular cache partitioning method

The present invention relates to a cache memory partitioning method, and more particularly to a set-based modular cache partitioning method.

At present, single-core processors, limited by power and frequency, can no longer meet the operational demands and applications of modern computer systems, so multi-core processor technology with higher performance has gradually replaced them. However, because the cores of a multi-core processor share cache memory, a derived problem arises: how to partition the cache properly and effectively, so that every program running on each core obtains an appropriate amount of cache and can hold its contents, improving access performance and reducing power consumption. This has clearly become a pressing goal for practitioners in the field.

Existing dynamic cache partitioning schemes fall into three main categories: associativity-based partitioning, set-based partitioning, and sharing-based schemes. Associativity-based cache partitioning has been disclosed in many papers and patents, for example:

1. "Utility-based cache partitioning: a low-overhead, high-performance runtime mechanism to partition shared caches," 39th IEEE/ACM Int'l Symp. on Microarchitecture, 2006, pp. 423-432;

2. "An adaptive shared/private NUCA cache partitioning scheme for chip multiprocessors," 13th Int'l Symp. on High Performance Computer Architecture, 2007, pp. 2-12;

3. "Dynamic partitioning of shared cache memory," J. of Supercomputing, 2004, pp. 2-12;

4. "An adaptive bloom filter cache partitioning scheme for multicore architectures," Int'l Conf. on Embedded Computer Systems: Architectures, Modeling, and Simulation, 2008, pp. 25-32; and

5. R.O.C. (Taiwan) Patent No. I285810, "Method, apparatus, and system for partitioning the shared cache of a chip multiprocessor, and computer-readable recording medium storing related instructions."

The prior art above discloses conventional associativity-based dynamic cache partitioning. As shown in FIG. 1, the cache is divided into several cache modules, each module contains several cache sets, and each set contains a fixed number of cache ways. When multiple programs execute, the ways of each cache module are partitioned among the programs according to their demands, so more cache lines must be provided. With this style of partitioning, every access must operate on all, or many, of the cache lines, which requires considerable power; moreover, the multiple programs of a multi-core processor readily compete for the same cache module, incurring extra access latency and thereby degrading system performance.

In view of this, set-based dynamic cache partitioning was proposed. In this style of partitioning the cache modules have lower associativity, and each cache access only needs to touch its own corresponding cache module, which removes the drawbacks of the associativity-based dynamic partitioning noted above. An example is "Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems," 14th Int'l Symp. on High Performance Computer Architecture, 2008, pp. 367-378. It distributes cache blocks uniformly over the allocated cache modules, and when the cache is repartitioned the individual accesses proceed independently, which reduces waiting time. However, repartitioning requires moving the entire cache contents, costing the power and time of that movement; and every cache access requires a division, i.e., a divider circuit, which increases logic and computational complexity and again costs considerable power and time.

As for sharing-based cache partitioning, "Molecular caches: a caching structure for dynamic creation of application-specific heterogeneous cache regions," 39th IEEE/ACM Int'l Symp. on Microarchitecture, 2006, pp. 433-442, selects the memory blocks stored in each cache module in a randomly distributed manner; on average, an access must search several memory blocks where the data may be distributed, so it needs more search time and power.

Likewise, "Organizing the last line of defense before hitting the memory wall for CMPs," 10th Int'l Symp. on High Performance Computer Architecture, 2004, pp. 176-185, assigns each cache module to one or more programs; a cache access must search the cache modules where the data may reside, so it too requires more search time and power.

Furthermore, R.O.C. Patent No. I297832, "System and method for non-uniform cache memory in a multi-core processor," distributes memory blocks over the cache modules from far to near, and can dynamically migrate a block according to usage, adjusting its distance from the processor using it. A cache access must therefore search multiple cache modules from near to far, again consuming more time and power.

In addition, R.O.C. Patent No. I317065, "Cache data access method suitable for parallel processors," likewise searches the cache modules from near to far on each cache access, and therefore also needs more time and power.

A primary object of the present invention is to provide a set-based modular cache partitioning method that reduces access waiting time, search time, and power consumption.

A second object of the present invention is to provide a set-based modular cache partitioning method that reduces reconfiguration work, further lowering the power and time required for reallocation.

Another object of the present invention is to provide a set-based modular cache partitioning method that eliminates the waiting time a program would otherwise incur for cache accesses while the cache is being reconfigured.

The set-based modular cache partitioning method according to the present invention comprises the following steps. A program physical-address mapping step maps at least one program physical address to at least one of a number of virtual cache modules. A virtual cache module mapping step maps the contents of the virtual cache modules to a number of physical cache modules, using a virtual-to-physical cache-module region-number computation together with a lookup in a region-to-global physical cache module table, where the number of physical cache modules to allocate and the entries of the region-to-global table are decided by a software algorithm in the operating system or by a hardware algorithm according to the hit rates of the programs currently accessing the cache. The virtual-to-physical region-number computation comprises: computing a value k from the number P of allocated physical cache modules by the formula k = ⌈log₂P⌉; numbering the virtual cache modules sequentially s1N, N = 0 to u-1, where u is the number of virtual cache modules of a program and is a power of two, and giving the physical cache modules sequential region numbers sPN, N = 0 to v-1, where v is the number of physical cache modules dynamically allocated to the program; computing for each virtual cache module the remainder of its number s1N by 2^k, i.e., s1N' = s1N mod 2^k; testing whether the remainder s1N' is less than P; performing a physical-cache-module region-number step sPN according to the result: if s1N' < P, then s1N' is the region number sPN of the corresponding physical cache module, otherwise the remainder is recomputed as s1N' = s1N mod 2^(k-1) and that value is the region number sPN; and returning the region number sPN of the physical cache module the virtual cache module is to access. Finally, a physical cache module access step carries out the cache accesses of each virtual cache module on the physical cache, through the physical cache module it maps to.
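The two-step remainder rule above can be sketched as a short Python function. The function name is ours, not the patent's; it assumes P ≥ 1 and takes k = ⌈log₂P⌉ as in the summary:

```python
import math

def virtual_to_region(s1n, p):
    """Map a virtual cache-module number s1n to a physical-module region
    number sPN, given p allocated physical modules (two-step remainder rule)."""
    # k is the smallest exponent with 2**k >= p, i.e. k = ceil(log2 p)
    k = math.ceil(math.log2(p)) if p > 1 else 0
    r = s1n % (2 ** k)            # first remainder: s1N' = s1N mod 2^k
    if r < p:
        return r                  # s1N' < P: already a valid region number
    return s1n % (2 ** (k - 1))   # otherwise fold back with 2^(k-1)
```

When P is itself a power of two the first remainder always succeeds; the second remainder only fires for the 2^k - P "overflow" virtual numbers, which fold back onto lower region numbers.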

The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments, taken together with the accompanying drawings, as follows:

Referring to FIG. 2, the set-based modular cache partitioning method of a preferred embodiment of the present invention comprises: a program physical-address mapping step S1, which maps at least one program physical address to one of several virtual cache modules; a virtual cache module mapping step S2, which maps the contents of the virtual cache modules to several physical cache modules using the virtual-to-physical cache-module region-number computation and a lookup in the region-to-global physical cache module table, where the number of physical cache modules to allocate and the table entries are decided by a software algorithm in the operating system or a hardware algorithm according to the hit rates of the programs currently accessing the cache; and a physical cache module access step, which carries out each virtual cache module's cache accesses on the physical cache through the physical cache module it maps to.

The physical-cache-module region number is defined as follows: when a program is allocated some number P of physical cache modules, those modules are numbered 0 to P-1, and these numbers are the physical-cache-module region numbers.

Referring to FIGS. 3A, 3B, and 3C, in the program physical-address mapping step S1 each program segment has a program physical address, and several segments may share the same virtual cache space. Each program physical address contains three fields D1, D2, and D3, and a segment is mapped onto the virtual cache from the two fields D1 and D2 by the flow of FIG. 3B. When the program physical address must be displaced, the virtual-cache-module address translation mechanism of FIG. 3C is used: individual segments, or whole programs, are mapped to the same or to different virtual cache modules of the virtual cache. The aim is to map low-usage and complementary segments into the same virtual cache space, and highly competitive segments into different virtual cache spaces, eliminating the cache pollution that competition among cache accesses would otherwise cause.

Address mapping within one virtual cache space can be arranged by memory interleaving, or, as shown in FIGS. 3A and 3C of the present invention, by displacing the several segments within one virtual cache space so that they are staggered relative to the other segments mapped to the same space, avoiding mutual competition during cache accesses.

Referring to FIG. 4, the field D1 supplies an index that, through the virtual mapping and physical mapping, yields the global number of the corresponding physical cache module; the fields D2 and D3 are then used to perform the cache access within that physical cache module.
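The split of an address into D1, D2, and D3 might be sketched as bit slicing. The field widths here are illustrative assumptions only (64-byte blocks, 16 sets per module, 8 virtual modules); the patent does not fix them, nor the exact bit positions:

```python
def split_address(addr, offset_bits=6, set_bits=4, module_bits=3):
    """Split a program physical address into the three fields of FIG. 4.
    Widths are hypothetical; the patent does not specify them."""
    d3 = addr & ((1 << offset_bits) - 1)                # D3: block offset
    d2 = (addr >> offset_bits) & ((1 << set_bits) - 1)  # D2: set index in module
    d1 = (addr >> (offset_bits + set_bits)) & ((1 << module_bits) - 1)
    return d1, d2, d3                                   # D1: virtual module number
```

Under this layout D1 indexes the virtual-to-physical mapping, while D2 and D3 address the set and byte inside the selected physical cache module.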

The main purpose is to allocate, for each virtual cache space, a number of physical cache modules according to dynamic demand, and to provide the mapping from virtual to physical cache modules. The software or hardware algorithm in the operating system can then adjust the physical cache capacity each program uses according to its dynamic needs, improving the cache hit rate. Put differently, a virtual cache space forms a unit of address-space consolidation; physical cache is then allocated to each virtual cache space, so the physical cache capacity each program uses can be tuned from its cache hit rate and total miss count, and programs that might compete in the same address space can be mapped to different physical cache modules without causing cache pollution.

The physical-cache-module global number is defined as follows: the operating system or hardware algorithm looks up, through the region-to-global physical cache module table, the actual location of the real physical cache module; that location is the global number of the physical cache module.

The set-based modular cache partitioning method of the present invention is detailed as follows. Referring to FIG. 5, it shows the mapping flow from a program physical address space to a virtual cache, and on to a physical cache.

The program physical address space is mapped onto the virtual cache by a virtual cache mapping, where the virtual cache consists of several virtual cache modules whose count is a power of two; at least one program physical address of the address space is assigned to one corresponding virtual cache module. Using the result of the region-to-global physical cache module table lookup, the data of that virtual cache module is stored into one corresponding physical cache module of a physical cache. The software or hardware algorithm in the operating system decides the number of physical cache modules to allocate according to current program demand, so the allocated count may be any positive integer, whether a power of two or not, and the number of physical cache modules never exceeds the number of virtual cache modules. In the usual case, because of incremental local adjustment, the allocated count is often not a power of two.

By recording in the table the dynamic cache partitioning performed on the physical cache modules, the present invention lets each program be allocated a suitable amount of physical memory, improving both its individual and the overall cache hit rate.

To explain clearly how the table's index value is obtained by computation from the memory address, and how that index value determines the correspondence between each virtual cache module and each physical cache module, please refer to FIGS. 6 and 7 of the present invention.

FIG. 6 of the present invention shows a regular, module-granular computation that maps the virtual cache space onto the physical cache space; it needs only a fast, low-power calculation. The method comprises: a power-exponent computation step S21, which computes the exponent k from the number P of allocated physical cache modules by the formula k = ⌈log₂P⌉; a step S22 of taking each virtual cache module's number modulo 2^k, in which the virtual cache modules to be accessed are numbered sequentially s1N = 0 to u-1 (that is, s10, s11, ..., s1u-1), the physical cache modules are given sequential region numbers sPN = 0 to v-1 (that is, sP0, sP1, ..., sPv-1), and each virtual cache module computes from its number s1N the remainder s1N' = s1N mod 2^k; a decision step S23, which tests whether the remainder s1N' is less than P; a physical-cache-module region-number step S24 performed according to the result: if s1N' < P, a correspondence step takes the remainder s1N' as the region number sPN of the corresponding physical cache module; if s1N' is not less than P, each such virtual cache module recomputes the remainder as s1N' = s1N mod 2^(k-1), and a correspondence step takes that remainder as the region number sPN; and a reply step S25, which returns the region number sPN of the physical cache module the virtual cache module is to access.

As described above, given the number P of physical cache modules allocated to the virtual cache space and the virtual cache module number s1N to be looked up, a simple and fast hardware computation returns the region number sPN of the allocated physical cache module to access. In the physical cache module access step, each virtual cache module then performs its cache access according to the region number sPN of the physical cache module it maps to. Moreover, the circuit realization of FIG. 6 needs no divider, so the correspondence between virtual and physical cache modules can be computed by a simple arithmetic circuit architecture.
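The reason no divider is needed is that both moduli are powers of two, so each remainder reduces to a bitwise AND with a mask. A sketch of this combinational form (function name ours; k must be precomputed as ⌈log₂P⌉):

```python
def region_bitwise(s1n, p, k):
    """Divider-free region-number computation: x mod 2**k == x & (2**k - 1).
    Assumes 2**(k-1) < p <= 2**k (and k = 0 for p = 1)."""
    r = s1n & ((1 << k) - 1)           # mask replaces the first mod
    if r < p:                          # one k-bit magnitude comparison
        return r
    return s1n & ((1 << (k - 1)) - 1)  # drop one mask bit for the second mod
```

In hardware this is just two mask networks, one comparator, and a multiplexer, which is what makes the mapping fast and low-power compared with the division required by earlier set-based schemes.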

Referring to FIG. 7, it illustrates the correspondence between the virtual cache space and the physical cache space under the virtual-to-physical region-number computation of FIG. 6. Suppose the virtual cache space has 8 virtual cache modules, a power of two, numbered sequentially 0 to 7; and suppose the software or hardware algorithm in the operating system selects P physical cache modules to allocate, given sequential region numbers 0 to P-1, where P is any positive integer. Taking P = 1 to 8 as examples, the figure lists the correspondences for eight different values of P.

As an example of how the virtual cache modules of FIG. 7 map to the physical cache modules by the computation flow of FIG. 6, take the fifth row of FIG. 7, in which 5 physical cache modules are allocated, so that 2^2 < 5 ≤ 2^3. Each virtual cache module number s1N is taken modulo 2^3 = 8; if the resulting remainder s1N' is less than 5 (in this example, the remainders for s1N = 0 to 4 are all less than 5), then s1N' is the region number sPN of the corresponding physical cache module, i.e., virtual cache modules s1N = 0 to 4 map to physical-module region numbers sPN = 0 to 4.

If, however, the remainder s1N' is not less than P (in this example, for s1N = 5 to 7 the remainders modulo 2^3 = 8 are all not less than 5), those virtual cache modules take their numbers modulo 2^2 = 4 instead, and the remainder s1N' so obtained is the region number sPN of the corresponding physical cache module, i.e., virtual cache modules s1N = 5 to 7 map to physical-module region numbers sPN = 1 to 3.
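Under these two rules every row of FIG. 7 (8 virtual modules, P = 1 to 8) can be regenerated; the fifth row comes out as 0, 1, 2, 3, 4, 1, 2, 3, exactly as traced above. A sketch (function name ours):

```python
import math

def fig7_row(p, num_virtual=8):
    """Region numbers for virtual modules 0..num_virtual-1 when p
    physical modules are allocated (one row of FIG. 7)."""
    k = math.ceil(math.log2(p)) if p > 1 else 0
    row = []
    for s1n in range(num_virtual):
        r = s1n % (1 << k)                          # s1N mod 2^k
        row.append(r if r < p else s1n % (1 << (k - 1)))
    return row

for p in range(1, 9):
    print(p, fig7_row(p))
```

Note that each region number 0 to P-1 appears at least once in every row, so every allocated physical module is used, and the 2^k - P overflow virtual numbers share modules with the low-numbered ones.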

In summary, referring again to FIGS. 6 and 7, when physical cache modules are allocated to a virtual cache space according to program demand, the virtual cache module number computed from the program physical address, together with the number of allocated physical cache modules, serves as the input of the virtual-to-physical region-number computation; the computation yields the region number of the allocated physical cache module, and that region number is then used as the index into the region-to-global physical cache module table of FIG. 8 to perform the cache access on the corresponding physical cache module.

Referring to FIG. 8, it shows the mapping flow from the program physical address through the virtual cache to the physical cache. The program physical address is translated into a virtual cache module number s1N, from which the physical-cache-module region number sP is computed; that region number indexes the region-to-global physical cache module table to obtain the global number (ci, mi) of the physical cache module to access, the table having been pre-filled by the operating-system software algorithm or a hardware algorithm with the global numbers of the sequentially allocated physical cache modules. This scheme achieves two important properties. First, because a virtual cache module number s1N maps to a physical-module region number sPN, the physical cache size used by each running program can be reconfigured locally. Second, when the physical cache modules are reconfigured, only a minimal number of modules need to be reassigned and only the affected sets of cache lines moved, so the time and power spent on reconfiguration are kept as small as possible.
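The full lookup path can be sketched end to end. The table contents below follow the FIG. 9 allocation only where the text specifies it (index 0 at cluster c0 module m0; indices 1 and 5 resolving to c0, m1); the placement of the remaining entries is a hypothetical fill-in by an assumed allocator:

```python
import math

def virtual_to_region(s1n, p):
    # the region-number computation of FIG. 6
    k = math.ceil(math.log2(p)) if p > 1 else 0
    r = s1n % (1 << k)
    return r if r < p else s1n % (1 << (k - 1))

# Hypothetical region-to-global table for a 5-module allocation,
# filled in allocation order by the OS/hardware algorithm.
REGION_TO_GLOBAL = {0: ("c0", "m0"), 1: ("c0", "m1"), 2: ("c0", "m2"),
                    3: ("c0", "m3"), 4: ("c1", "m0")}

def lookup_global(s1n, p=5):
    """FIG. 8 path: virtual module number -> region number sPN
    -> global number (cluster, module) via the table."""
    return REGION_TO_GLOBAL[virtual_to_region(s1n, p)]
```

With this table, virtual modules 1 and 5 both resolve to ("c0", "m1"), matching the sharing described for FIG. 9; changing an allocation only rewrites table entries (and moves the affected line sets), never the computation.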

Referring again to FIG. 8, the translation from a program physical address to the virtual cache memory module may follow the mappings of FIGS. 3A, 3B and 3C. The mapping from the virtual cache memory modules to the physical cache memory modules via the mapping table, and the corresponding hardware architecture, are illustrated by the following example:

Referring to FIG. 9, the virtual cache memory has several virtual cache memory modules, numbered 0 to 7, which are the targets of the program physical address translation. The physical cache memory has several physical cache memory clusters c0 to c3, each containing four physical cache memory modules m0 to m3.

Referring again to FIG. 9, the virtual cache memory and the physical cache memory are related through the virtual-to-physical cache memory module region number computation and the region-to-global physical cache memory module mapping table. The region number computation is as described for FIGS. 6 and 7, while the region-to-global mapping table is pre-stored in hardware; a software algorithm in the operating system or a hardware algorithm decides, based on the hit rate and miss count of the current program's cache accesses, how many physical cache memory modules to allocate. As shown in FIG. 9, the physical cache memory supports allocation of a non-power-of-two number of physical cache memory modules (here, five physical cache memory modules are allocated and indexed 0 to 4). The physical cache memory module with index 0 in the mapping table is assigned to module m0 of physical cache memory cluster c0, so data of virtual cache memory module 0 is cached, along the corresponding path, in module m0 of cluster c0. Because five physical cache memory modules are allocated, the fifth row of FIG. 7 shows that data in the two virtual cache memory modules numbered 1 and 5 is cached in module m1 of cluster c0; the remaining virtual cache memory modules are mapped according to the same row of FIG. 7.
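The FIG. 7-style rows can be reproduced for all eight allocation sizes under the same assumed formula for k (smallest k with 2^k ≥ P); the row for P = 5 indeed sends virtual modules 1 and 5 to the same region 1, as stated above.

```python
import math

def region_number(s, p):
    # Assumed formula: smallest k with 2**k >= p (the "formula" of FIG. 6).
    k = max(1, math.ceil(math.log2(p)))
    r = s % (2 ** k)
    return r if r < p else s % (2 ** (k - 1))

# Rebuild a FIG. 7-style table: one row per allocated-module count
# P = 1..8, each row giving the region for virtual modules 0..7.
table = {p: [region_number(s, p) for s in range(8)] for p in range(1, 9)}
for p, row in table.items():
    print(p, row)
```

For P = 5 this prints the row [0, 1, 2, 3, 4, 1, 2, 3]: virtual modules 1 and 5 both land in region 1.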

Furthermore, the software algorithm in the operating system or the hardware algorithm checks whether the cache accesses currently handled for each process (P1, P2, and so on) meet a predetermined hit-rate requirement. If so, the number of physical cache memory modules allocated to the process is maintained; if not, the allocation is reconfigured. FIG. 10 illustrates a cache reconfiguration process in which the cache accesses handled for process P1 are assumed not to meet the predetermined hit-rate requirement while process P2 is judged to have an excessively high cache hit rate, so one physical cache memory module allocated to P2 is reassigned to P1.
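The patent does not fix a concrete policy beyond "meets / does not meet the hit-rate requirement"; the sketch below therefore assumes a simple threshold pair (target, surplus) and moves one module per needy/rich process pair, purely to illustrate the FIG. 10 decision.

```python
def rebalance(alloc, hit_rate, target, surplus):
    """alloc/hit_rate: dicts keyed by process id; target: required hit
    rate; surplus: hit rate above which a process can give a module up.
    Returns the list of (donor, recipient) module moves performed."""
    needy = [p for p, h in hit_rate.items() if h < target]
    rich = [p for p, h in hit_rate.items() if h > surplus and alloc[p] > 1]
    moves = []
    for p1, p2 in zip(needy, rich):
        alloc[p2] -= 1          # P2 releases one physical cache module...
        alloc[p1] += 1          # ...which is reconfigured for P1's use.
        moves.append((p2, p1))
    return moves

alloc = {"P1": 2, "P2": 5}
moves = rebalance(alloc, {"P1": 0.62, "P2": 0.97}, target=0.80, surplus=0.95)
print(moves, alloc)  # [('P2', 'P1')] {'P1': 3, 'P2': 4}
```

The thresholds and the one-module-per-step granularity are assumptions; the text only requires that an under-performing process gain modules at the expense of an over-provisioned one.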

Referring to FIG. 11, reconfiguration between physical cache memory modules is carried out by scanning the addresses of a physical cache memory module and migrating cache line sets, so as to achieve the reconfiguration goal shown in FIG. 10. In this cache-line-set migration method, the stepwise flushing of cache lines from, and migration of cache lines into, the reconfigured physical cache memory module is controlled by a single hardware counter that drives the scan. During the scan, for the physical cache memory module being reconfigured, the cache line sets that have already been scanned belong to the post-reconfiguration virtual cache memory space, so cache accesses to them must use the new address mapping; the cache line sets not yet scanned still belong to the original virtual cache memory space, so cache accesses to them must use the original address mapping.
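A minimal software model of the counter-driven scan, with hypothetical data structures (the class, callback, and method names are illustrative, not from the patent): one counter walks the line sets of the module being reconfigured; sets below the counter have been migrated and use the new mapping, the rest still use the old one.

```python
class ModuleScan:
    def __init__(self, num_sets, move_set):
        self.num_sets = num_sets
        self.scan_addr = 0          # models the single hardware counter
        self.move_set = move_set    # callback migrating one cache line set

    def step(self):
        """Migrate one cache line set in the background."""
        if not self.done():
            self.move_set(self.scan_addr)
            self.scan_addr += 1

    def done(self):
        return self.scan_addr >= self.num_sets

    def use_new_mapping(self, set_addr):
        """Scanned sets belong to the new virtual space, unscanned
        sets to the old one (the rule stated above)."""
        return set_addr < self.scan_addr

moved = []
scan = ModuleScan(4, moved.append)
scan.step(); scan.step()            # background progress between accesses
print(moved)                        # [0, 1]
print(scan.use_new_mapping(1))      # True  (already scanned)
print(scan.use_new_mapping(3))      # False (not yet scanned)
```

Here the counter is taken to point at the next set to scan, so "already scanned" means strictly below it; the hardware could equally keep the last-scanned index, which is the convention the comparison in FIG. 12 suggests.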

Referring to FIG. 12, two decision steps are added to the flow of FIG. 8. The first adds a reconfiguration flag, controlled by the software algorithm in the operating system or by the hardware algorithm, indicating whether a physical cache memory module is being reconfigured; it determines whether a cache access of process P1, which is gaining the physical cache memory module, should go to the module being reconfigured or to a module that is not being reconfigured, and whether a cache access of process P2, which is releasing the module, should go to the module being reconfigured or be redirected to another physical cache memory module. The second decision step compares the set address accessed within the physical cache memory module against the scan address of the cache line sets, to decide whether the access should currently use the newly mapped physical cache memory address or the originally mapped one.

Referring again to FIG. 12, when the software algorithm in the operating system or the hardware algorithm sets the reconfiguration flag to 1, the physical cache memory modules are being reconfigured. If a virtual cache memory space gains (or loses) n physical cache memory modules in one reallocation, the physical cache memory module region numbers sPN and sPN' on the mapping table can be computed simultaneously from P and P+n (or P-n). The set address accessed within the physical cache memory module is then compared with the scan address of the cache line sets: if the accessed set address is not greater than the scan address, sPN' is selected, corresponding to the already reconfigured physical cache memory module; otherwise sPN is selected, corresponding to the not-yet-reconfigured physical cache memory module, and the cache memory access proceeds accordingly.
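Putting the flag and the address comparison together, the selection logic of FIG. 12 can be sketched as follows. The formula for k and the "not greater than" comparison convention (scan pointer at the last migrated set) are assumptions consistent with the text, and the function names are illustrative.

```python
import math

def region_number(s, p):
    k = max(1, math.ceil(math.log2(p)))   # assumed: smallest k with 2**k >= p
    r = s % (2 ** k)
    return r if r < p else s % (2 ** (k - 1))

def select_region(s1n, p_old, p_new, set_addr, scan_addr, reconf_flag):
    if not reconf_flag:
        return region_number(s1n, p_old)
    s_pn = region_number(s1n, p_old)       # original mapping (from P)
    s_pn2 = region_number(s1n, p_new)      # new mapping (from P +/- n)
    # Sets at or below the scan pointer have already been migrated.
    return s_pn2 if set_addr <= scan_addr else s_pn

# Growing from P = 4 to P = 5: virtual module 4 maps to region 0 under
# the old allocation but to region 4 under the new one.
print(select_region(4, 4, 5, set_addr=10, scan_addr=20, reconf_flag=True))   # 4 (new)
print(select_region(4, 4, 5, set_addr=30, scan_addr=20, reconf_flag=True))   # 0 (old)
```

Both region numbers are computed in parallel, as in the figure, and a single comparison picks the physical module actually accessed, so only one physical cache access is ever issued.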

With the above reconfiguration and cache address mapping method, cache accesses to a virtual cache memory module that is being reconfigured are always directed to the correct physical cache memory module. This has two notable properties. First, once a cache reconfiguration has been set up, the stepwise migration proceeds in the background whenever the processor is not using the cache, and the programs whose physical cache memory modules are being reconfigured can continue executing without waiting for the reconfiguration to complete, which benefits the execution speed of every program. Second, a cache access at any moment requires only a single physical cache memory module access, which lowers power consumption and reduces contention for cache modules, and can also improve performance.

In summary, through the operation of the above steps the present invention overcomes the performance problems of conventional cache memory partitioning methods. Compared with the prior art, the invention further reduces the waiting and searching time of accesses and lowers the power consumed by the circuitry during accesses.

While the invention has been disclosed by way of the preferred embodiments above, they are not intended to limit the invention. Any person skilled in the art may make various changes and modifications to the above embodiments without departing from the spirit and scope of the invention, and such changes remain within the technical scope protected by the invention; the scope of protection of the invention is therefore defined by the appended claims.

FIG. 1: Schematic diagram of a conventional associativity-based dynamic cache memory partitioning structure.

FIG. 2: Flow chart of the set-based modular cache memory partitioning method according to a preferred embodiment of the present invention.

FIG. 3A: Schematic diagram of multiple program segments mapped to one virtual cache memory space.

FIG. 3B: Schematic diagram of a program's physical addresses mapped to a virtual cache memory space.

FIG. 3C: Schematic diagram of a program's physical addresses mapped, with an offset, to another virtual cache memory space.

FIG. 4: Schematic diagram of a program's physical addresses mapped to a physical cache memory module through a virtual mapping and a physical mapping process.

FIG. 5: Schematic diagram of the correspondence among a program physical address space, a virtual cache memory, and a physical cache memory according to a preferred embodiment of the present invention.

FIG. 6: Flow chart of the computation from the virtual cache memory space to the allocated physical cache memory space according to a preferred embodiment of the present invention.

FIG. 7: The mappings produced by the computation of FIG. 6 for eight different numbers of allocated physical cache memory modules, when the virtual cache memory space has eight virtual cache memory modules.

FIG. 8: Flow in which a program physical address is mapped to a virtual cache memory module according to the computation of FIG. 3, the region number of the physical cache memory module is computed according to FIG. 6, and that region number is used to query the region-to-global physical cache memory module mapping table for the global number of the physical cache memory module, according to a preferred embodiment of the present invention.

FIG. 9: Schematic diagram of the correspondence between virtual cache memory modules and physical cache memory modules according to a preferred embodiment of the present invention, established through the virtual-to-physical cache memory module region number computation and a region-to-global physical cache memory module mapping table governed by a software algorithm in the operating system or a hardware algorithm.

FIG. 10: Schematic diagram of reconfiguration between two processes.

FIG. 11: Schematic diagram of reconfiguration between physical cache memory modules via address scanning and migration of cache line sets.

FIG. 12: Schematic diagram of the complete physical cache memory address translation flow of FIG. 8 with a reconfiguration flag and the scan address of the cache line sets added.

Claims (10)

1. A set-based modular cache memory partitioning method, comprising the steps of: a program physical address mapping step, mapping at least one program physical address to at least one of a plurality of virtual cache memory modules; a virtual cache memory module mapping step, mapping the contents of the plurality of virtual cache memory modules to a plurality of physical cache memory modules through a virtual-to-physical cache memory module region number computation and a query of a region-to-global physical cache memory module mapping table, wherein the number of physical cache memory modules to allocate and the contents of the region-to-global physical cache memory module mapping table are decided by a software algorithm in an operating system or by a hardware algorithm according to the hit rates of the programs currently performing cache accesses, and wherein the virtual-to-physical cache memory module region number computation comprises the steps of: computing a value k by formula from the number P of allocated physical cache memory modules; numbering the plurality of virtual cache memory modules sequentially as s1N, N being 0 to u-1, where u is the number of virtual cache memory modules corresponding to a process and is a power of two; numbering the plurality of physical cache memory modules sequentially with region numbers sPN, N being 0 to v-1, where v is the number of physical cache memory modules dynamically allocated to the process; computing for each virtual cache memory module the remainder of its number modulo 2^k, as s1N' = s1N mod 2^k; determining whether the remainder s1N' is less than P; performing a physical cache memory module region number sPN mapping step according to the determination, wherein if the remainder s1N' is less than P, the remainder s1N' is the region number sPN of the corresponding physical cache memory module, and if the remainder s1N' is not less than P, the remainder is recomputed as s1N' = s1N mod 2^(k-1), and this remainder is the region number sPN of the corresponding physical cache memory module; and returning the region number sPN of the physical cache memory module to be accessed by the virtual cache memory module; and a physical cache memory module access step, performing the cache access of each virtual cache memory module on the physical cache memory according to its mapped physical cache memory module.
2. The set-based modular cache memory partitioning method according to claim 1, wherein the number of physical cache memory modules is not greater than the number of virtual cache memory modules, and the number of physical cache memory modules may be any integer.

3. The set-based modular cache memory partitioning method according to claim 1, wherein in the program physical address mapping step, program segments with low usage or complementary usage are mapped to the same virtual cache memory space, and highly competing program segments are mapped to different virtual cache memory spaces.

4. The set-based modular cache memory partitioning method according to claim 3, wherein address mapping within the same virtual cache memory space may use a memory-interleaved arrangement, or a technique in which several program segments are mapped with offsets within a virtual cache memory space so as to stagger them from the other program segments mapped to the same virtual cache space.

5. The set-based modular cache memory partitioning method according to claim 1, wherein in the virtual cache memory space mapping step, the corresponding physical cache memory space may use a shared, non-power-of-two number of physical cache memory modules for cache access operation.
6. The set-based modular cache memory partitioning method according to claim 1, wherein the software algorithm in the operating system checks whether the currently handled cache accesses meet the relative requirements on hit rate and miss count; if so, the allocated number of physical cache memory modules is maintained, and if not, the allocation of physical cache memory modules is reconfigured by the software algorithm in the operating system or by the hardware algorithm.

7. The set-based modular cache memory partitioning method according to claim 6, wherein during the reconfiguration, the reconfiguration between physical cache memory modules is performed by migrating cache line sets.

8. The set-based modular cache memory partitioning method according to claim 7, wherein in the cache-line-set migration method, the stepwise flushing of cache lines from, and migration of cache lines into, the physical cache memory modules is controlled by a hardware counter that drives the scan.
9. The set-based modular cache memory partitioning method according to claim 7, wherein during the scan that migrates cache line sets, for the physical cache memory module being reconfigured, cache accesses to cache line sets that have already been scanned use the newly mapped physical cache memory address, and cache accesses to cache line sets not yet scanned use the originally mapped physical cache memory address.

10. The set-based modular cache memory partitioning method according to claim 6, wherein the reconfiguration includes two decision steps: one adds a reconfiguration flag, controlled by the software algorithm in the operating system or by the hardware algorithm, to decide whether the physical cache memory modules are being reconfigured; the other compares the set address accessed within the physical cache memory module with the scan address of the cache line sets, to decide whether the physical cache memory module access should currently use the newly mapped physical cache memory address or the originally mapped one.
TW99108010A 2010-03-18 2010-03-18 Set-based modular cache partitioning method TWI420311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW99108010A TWI420311B (en) 2010-03-18 2010-03-18 Set-based modular cache partitioning method


Publications (2)

Publication Number Publication Date
TW201133237A TW201133237A (en) 2011-10-01
TWI420311B true TWI420311B (en) 2013-12-21

Family

ID=46751121


Country Status (1)

Country Link
TW (1) TWI420311B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI788890B (en) * 2021-03-24 2023-01-01 日商鎧俠股份有限公司 memory system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060015686A1 (en) * 2004-07-14 2006-01-19 Silicon Optix Inc. Cache memory management system and method
US7069387B2 (en) * 2003-03-31 2006-06-27 Sun Microsystems, Inc. Optimized cache structure for multi-texturing
TWI285810B (en) * 2004-06-30 2007-08-21 Intel Corp Method, apparatus, and system for partitioning a shared cache of a chip multi-processor, and computer-readable recording medium having stored thereon related instructions
TWI297832B (en) * 2004-12-27 2008-06-11 Intel Corp System and method for non-uniform cache in a multi-core processor
TW200849012A (en) * 2007-02-22 2008-12-16 Qualcomm Inc Dynamic configurable texture cache for multi-texturing
TWI321726B (en) * 2003-10-14 2010-03-11 Ibm Method of dynamically controlling cache size




Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees