WO2006082554A2 - Data processing system comprising a cache memory unit - Google Patents
- Publication number
- WO2006082554A2 (PCT/IB2006/050319)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cache
- data processing
- processing system
- mask
- bits
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0864—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0844—Multiple simultaneous or quasi-simultaneous cache accessing
- G06F12/0846—Cache with multiple tag or data arrays being simultaneously accessible
- G06F12/0848—Partitioned cache, e.g. separate instruction and operand caches
Definitions
- the invention relates to a data processing system comprising a cache unit, the cache unit comprising an address space, wherein the address space is divided into a plurality of sets.
- Modern CPUs employ caches to hide the latency of accessing data in memory.
- Today's systems run multiple processes simultaneously, time-sharing a single CPU and/or utilizing multiple CPUs in a System-on-Chip (SoC).
- These processes typically operate on large quantities of data that are stored in low-cost off-chip memory, such as high-definition video frames.
- Caches are needed to reduce the average memory access time for the processes, while operating transparently to an application programmer. With multiple processes competing for cache storage space, and off-chip memory accesses being relatively slow, the processes seriously impact each other's performance. To maintain predictable process performance, effective cache management techniques are needed.
- An example of such a cache management technique is known from US 5,584,014.
- the apparatus includes a set identification device to specify a group of set-associative data blocks in a cache memory or translation lookaside buffer.
- a block replacement logic circuit is used to identify replaceable blocks within the set-associative data blocks that can store new information.
- the block replacement logic circuit is also used to identify un-replaceable blocks within the set-associative data blocks that cannot store new information.
- the block replacement logic circuit only writes new information to the replaceable blocks of the set-associative data blocks.
- the block replacement logic circuit can be implemented using a block replacement mask to identify within the set-associative data blocks the replaceable blocks and the un-replaceable blocks.
- Cache management requires different components to function properly.
- a system resource manager must collect the requirements of the different processes that (want to) operate. It should implement some policy to decide on cache space to be granted to the requesting process, depending upon real-time requirements and user priorities. Such a manager is typically implemented in software, making decisions on time granularities of tens of milliseconds to seconds. Once these decisions are made, the system should implement and enforce the granted spaces. Enforcing this cache management policy, once chosen, requires dedicated hardware support to operate at the rate of individual cache misses, at sub-microsecond speed. Clearly, for programmable systems the chip hardware is not application specific, and the cache control hardware must be able to flexibly adopt different configurations as directed by the cache management software.
- cache management techniques such as the dynamic partitioning technique as set forth in US 5,584,014 make use of 'way partitioning' with associative caches. Unfortunately, such solutions do not allow management of many individual tasks or memory objects.
- the invention relies on the insight that set-associative caches have relatively few ways (typically fewer than 16) and therefore way partitioning is not a good solution.
- the invention implements a cache partitioning technique that operates on cache sets. As caches typically have many sets (often several hundred), set partitioning can provide the granularity to control many individual objects.
- the problem with conventional caches is that the sets are selected by direct addressing. This selection must be unique and uniform, otherwise different processes cannot access each other's shared data.
- the proposed solution adopts a lookup-table that can contain a generic set selection function, and that is integrated close to the cache itself to provide a single global mapping shared between all processes and processors, and that can operate at the speed of the cache itself.
- the data processing system uses a mapping function to partition the address space into segments, and the mapping function provides as output an index value for selecting a set within a segment.
- the mapping function translates a plurality of input bits into at least one output bit, the output bit being used to select the set within the segment.
- the plurality of input bits is derived from a tag portion and an index portion of a requested address.
- the mapping function is implemented as a lookup table.
- the mapping function further provides mask bits, the mask bits being used to control victim assignment if a cache miss occurs.
- the lookup table comprises an additional field, the additional field comprising the mask bits.
- the data processing system further comprises a victim mask table, wherein the lookup table comprises an additional field, the additional field comprising a cache domain indicator which is used to select the mask bits in the victim mask table.
- the mask bits are selected from a victim mask table, and an index value provided by an application is used to select the mask bits from the victim mask table.
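Pulling the pieces above together, a minimal software model of such a table-based set selection might look as follows. All names, bit widths, and table contents here are illustrative assumptions for the sketch, not values taken from the patent.

```python
# Illustrative model of the set-select lookup table described above.
# Bit widths, table contents, and names are assumptions for this sketch.

SET_BITS = 8        # assume a cache with 256 sets
LINE_BITS = 6       # assume 64-byte cache lines
SEGMENT_SHIFT = 18  # assume 256 Kbyte segments

# One entry per mapped segment: (index_mask, index_value, way_mask).
# Index bits cleared in index_mask are replaced by index_value, confining
# the segment to a power-of-two subset of the sets. In way_mask a set bit
# means that way is excluded from victim selection for this segment.
segment_table = {
    0: (0b00001111, 0b10100000, 0b1100),  # sets 0xA0-0xAF, ways 2-3 excluded
    1: (0b00111111, 0b01000000, 0b0011),  # sets 0x40-0x7F, ways 0-1 excluded
}

def select_set(address):
    """Return (set index, way mask) for a requested address."""
    natural_index = (address >> LINE_BITS) & ((1 << SET_BITS) - 1)
    entry = segment_table.get(address >> SEGMENT_SHIFT)
    if entry is None:        # 'has map?' fails: traditional direct indexing
        return natural_index, 0b0000
    index_mask, index_value, way_mask = entry
    return (natural_index & index_mask) | index_value, way_mask
```

In this sketch every address in segment 0 lands in sets 0xA0 to 0xAF regardless of its natural index, so that segment occupies at most 16 of the 256 sets, which is exactly the footprint control the invention is after.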
- Fig. 1 illustrates an example of a cache organization according to the prior art;
- Fig. 2 illustrates an example of a cache organization according to the invention;
- Fig. 3 illustrates an extension of the cache organization according to the invention illustrated in Fig. 2;
- Fig. 4 illustrates an example of cache partitioning according to the invention;
- Fig. 5 illustrates an example of a table look-up based function according to the invention.
- Fig. 6 illustrates an example of a cache organization according to the invention, extended with application-driven way masking.
- a preferred embodiment encompasses a system with one or more processors, each having a first level (L1) cache.
- the cache management solution is implemented for the shared and larger second level (L2) cache.
- a table-lookup facility (set table select) provides a translation for addresses of individual segments, spanning the full memory range of interest, which is typically a large part of the system address space.
- the table lookup provides the set of cache sets that shall be used for that memory segment. Preferably, this is a power-of-two number of sets, indicated by masking out a number of address bits and replacing them with a new value, where both the mask and the new value are provided by the table lookup.
- the table lookup actually provides a flexible (programmable) function that selects a cache set for every incoming address, allowing improved control in comparison with the standard function that just uses a sub-range of the incoming address.
- the traditional cache organization is shown for clarity in Fig. 1.
- the new table look-up based function is indicated in pseudo-code semantics in Fig. 5, and the corresponding cache organization is depicted in Fig. 2.
- the mapping function would be implemented in a cache controller unit. This proposed function partitions the address space into segments, where each segment is identified by a set of address bits taken from both the tag and index part of the address. Each segment typically covers an addressed data space of a size considerably larger than the sizes of the cache lines that belong to one set.
- addresses belonging to the segment are translated to a set index number by a partial masking and re-assigning expression.
- a particular implementation might use a map function that spans a 1 Gbyte address range, using 4K segments, each of size 256 Kbyte. In a 32-bit address system, the remaining 3 Gbyte of address space might remain unmapped, for which set index selection remains according to the traditional scheme (as indicated by the 'has map?' in Fig. 2 and Fig. 5).
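The dimensioning in this example is self-consistent, as a quick check shows. The bit positions below follow from the stated sizes; treating 'has map?' as a simple range check is an assumption of this sketch.

```python
# Sanity check of the example dimensioning above: 4K segments of 256 Kbyte
# cover exactly 1 Gbyte; the remaining 3 Gbyte of a 32-bit space stay unmapped.
SEGMENT_SIZE = 256 * 1024        # 256 Kbyte -> 18 offset bits per segment
NUM_SEGMENTS = 4 * 1024          # 4K table entries -> 12-bit segment id
MAPPED_RANGE = SEGMENT_SIZE * NUM_SEGMENTS

def has_map(address):
    """The 'has map?' test of Fig. 2 and Fig. 5, assumed to be a range check."""
    return address < MAPPED_RANGE

def segment_id(address):
    """Lookup-table index for a mapped address."""
    return address >> 18
```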
- the map function typically translates multiple input bits (from the original tag and index address section) into fewer output bits for the actual set selection. As a result of this many-to-one property, the cache architecture needs to store and match longer tags in comparison with conventional caches without this invention.
- the number of bits to store and compare per tag would encompass the sum of the traditional tag and index bits.
- the lowest few index bits could remain unmapped to reduce the extra cost of longer tags, though this also increases the granularity of cache management.
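As a back-of-envelope illustration of this cost trade-off (all widths assumed: 32-bit addresses, 64-byte lines, 256 sets):

```python
# Tag width with and without set remapping. All widths are assumptions.
ADDR_BITS, OFFSET_BITS, INDEX_BITS = 32, 6, 8
CONVENTIONAL_TAG = ADDR_BITS - OFFSET_BITS - INDEX_BITS  # 18 bits

def tag_bits(unmapped_low_index_bits):
    """Remapped index bits can no longer be inferred from the set number,
    so they must be stored in the tag; leaving low index bits unmapped
    shrinks that extra cost again."""
    return CONVENTIONAL_TAG + (INDEX_BITS - unmapped_low_index_bits)
```

Under these assumptions, mapping all index bits costs 26 stored tag bits per line (tag plus full index, matching the sum stated above), while leaving all 8 index bits unmapped falls back to the conventional 18 bits.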
- Way masking is a previously known, but rarely (if ever) used, method for cache partitioning.
- Way masking (for example as described in US 5,584,014) can be used with great advantage in combination with set partitioning.
- Way masking is used to control victim assignment when a cache miss occurs.
- a cache controller supports a 'locked' status in conjunction with the tag information per cache line. Locked cache lines will never be selected as victim and are thus retained in the cache.
- the proposed way masking ORs this locked information with a set of mask bits (one bit per way) to further reduce the victim-selection freedom. It is proposed that such mask bits are also provided by the set index select function. In one embodiment, these mask bits could be provided directly from an extra field in the table lookup.
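A minimal sketch of this victim filtering. The bit polarity, where a set bit blocks a way, is an assumption, chosen to be consistent with masks "reducing the victim selection freedom":

```python
def eligible_victims(locked_bits, way_mask, num_ways=4):
    """Ways that may still be chosen as victim on a cache miss.
    locked_bits: per-way 'locked' status from the tag store (bit set = locked)
    way_mask:    per-way mask from the set-select function (bit set = masked)
    The two sources are ORed, as proposed above, so a way survives only if
    neither source blocks it."""
    blocked = locked_bits | way_mask
    return [w for w in range(num_ways) if not (blocked >> w) & 1]
```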
- An alternative embodiment would store a smaller set of masks, of which one is selected through an extra 'cache domain' field in the table lookup. This latter scheme is depicted in Fig. 3.
- the 'cache domain' field could alternatively be provided as a function of the application that generated this particular lookup request, depicted with the 'application id.' input in Fig. 6.
- the application could directly provide a value for the way mask bits, or alternatively, an 'application id.' is provided as index into a table that stores the actual way mask bits, as shown in Fig. 6. In either case, the application input value is stored in a dedicated register. Multi-processor systems would have one such register for every processor. It is a responsibility of the operating system (or the thread scheduler) to update the value in this register according to the new application that starts executing after a context switch.
- the hardware could provide functionality to create run-time combinations of application-driven way masks with address-driven way masks. Promising combination functions are (a) concatenating the bits of these two inputs, or (b) selecting the application input when the table lookup results in a 0-value domain id, and the non-zero table-lookup result otherwise.
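The two suggested combination functions could be sketched as follows. The exact semantics, in particular the domain-id convention in (b), are assumptions of this sketch.

```python
def combine_concat(app_mask, table_mask, table_bits=4):
    """(a) concatenate the application-driven and address-driven masks."""
    return (app_mask << table_bits) | table_mask

def combine_select(app_mask, table_domain, table_mask):
    """(b) use the application input when the table lookup yields a 0-value
    domain id, and the non-zero table-lookup result otherwise."""
    return app_mask if table_domain == 0 else table_mask
```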
- This invention describes a method to optimize cache behavior, in particular in situations where multiple processes compete for cache storage space.
- the invention proposes a new method for cache footprint control.
- the method allows explicit mapping of memory objects into restricted areas of the cache.
- the method operates orthogonally to known process-based cache protection, allowing best-of-both-worlds system behavior.
- the method implements in hardware the low-level, high-speed cache control that is required for quality-of-service resource management software, and is thereby an important prerequisite for predictable performance in complex systems.
- the method is implemented for a multi-processor shared level-2 cache, but is equally applicable to single-processor caches. It is remarked that the scope of protection of the invention is not restricted to the embodiments described herein.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present invention relates to a cache partitioning method that operates on cache sets. As caches typically have many sets (several hundred, for example), set partitioning can provide the granularity required to control many individual objects. The problem with conventional caches is that the sets are selected by direct addressing. This selection must be unique and uniform; otherwise, the various processes cannot access each other's shared data. The solution according to the invention uses a lookup table that can contain a generic set-selection function, that is integrated close to the cache itself so as to provide a single global mapping shared between all processes and processors, and that can operate at the speed of the cache itself.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05100705.2 | 2005-02-02 | ||
EP05100705 | 2005-02-02 | ||
EP05104652.2 | 2005-05-31 | ||
EP05104652 | 2005-05-31 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2006082554A2 true WO2006082554A2 (fr) | 2006-08-10 |
WO2006082554A3 WO2006082554A3 (fr) | 2006-10-12 |
Family
ID=36602455
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2006/050319 WO2006082554A2 (fr) | 2005-02-02 | 2006-01-30 | Systeme de traitement de donnees comprenant une unite memoire cache |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2006082554A2 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5584014A (en) * | 1994-12-20 | 1996-12-10 | Sun Microsystems, Inc. | Apparatus and method to preserve data in a set associative memory device |
US20020002657A1 (en) * | 1997-01-30 | 2002-01-03 | Sgs-Thomson Microelectronics Limited | Cache system for concurrent processes |
US20020174301A1 (en) * | 2001-05-17 | 2002-11-21 | Conway Patrick N. | Method and system for logical partitioning of cache memory structures in a partitioned computer system |
US6493800B1 (en) * | 1999-03-31 | 2002-12-10 | International Business Machines Corporation | Method and system for dynamically partitioning a shared cache |
US20030196041A1 (en) * | 1997-01-30 | 2003-10-16 | Stmicroelectronics Limited | Cache system |
- 2006-01-30: WO application PCT/IB2006/050319 (WO2006082554A2), not active (application discontinued)
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009107048A2 (fr) | 2008-02-25 | 2009-09-03 | Telefonaktiebolaget L M Ericsson (Publ) | Procédés et systèmes de partitionnement de mémoire cache dynamique pour des applications réparties fonctionnant sur des architectures multiprocesseurs |
WO2009107048A3 (fr) * | 2008-02-25 | 2009-11-26 | Telefonaktiebolaget L M Ericsson (Publ) | Procédés et systèmes de partitionnement de mémoire cache dynamique pour des applications réparties fonctionnant sur des architectures multiprocesseurs |
US8095736B2 (en) | 2008-02-25 | 2012-01-10 | Telefonaktiebolaget Lm Ericsson (Publ) | Methods and systems for dynamic cache partitioning for distributed applications operating on multiprocessor architectures |
US8543769B2 (en) | 2009-07-27 | 2013-09-24 | International Business Machines Corporation | Fine grained cache allocation |
CN102483718A (zh) * | 2009-08-25 | 2012-05-30 | 国际商业机器公司 | 虚拟化环境中的高速缓存分区 |
GB2485328A (en) * | 2009-08-25 | 2012-05-09 | Ibm | Cache partitioning in virtualized environments |
WO2011023617A1 (fr) * | 2009-08-25 | 2011-03-03 | International Business Machines Corporation | Segmentation de cache dans des environnements virtualisés |
US8739159B2 (en) | 2009-08-25 | 2014-05-27 | International Business Machines Corporation | Cache partitioning with a partition table to effect allocation of shared cache to virtual machines in virtualized environments |
US8745618B2 (en) | 2009-08-25 | 2014-06-03 | International Business Machines Corporation | Cache partitioning with a partition table to effect allocation of ways and rows of the cache to virtual machine in virtualized environments |
CN102483718B (zh) * | 2009-08-25 | 2014-12-24 | 国际商业机器公司 | 虚拟化环境中的高速缓存分区 |
GB2485328B (en) * | 2009-08-25 | 2016-07-13 | Ibm | Cache partitioning in virtualized environments |
CN102323909A (zh) * | 2011-09-13 | 2012-01-18 | 北京北大众志微系统科技有限责任公司 | 实现使用大容量高速缓存的内存管理方法及装置 |
WO2023130316A1 (fr) * | 2022-01-06 | 2023-07-13 | 中国科学院计算技术研究所 | Procédé et système de division dynamique de cache prenant en compte aussi bien la qualité de service qu'un taux d'utilisation |
Also Published As
Publication number | Publication date |
---|---|
WO2006082554A3 (fr) | 2006-10-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10963387B2 (en) | Methods of cache preloading on a partition or a context switch | |
US6381676B2 (en) | Cache management for a multi-threaded processor | |
US8095736B2 (en) | Methods and systems for dynamic cache partitioning for distributed applications operating on multiprocessor architectures | |
US5809563A (en) | Method and apparatus utilizing a region based page table walk bit | |
US7085890B2 (en) | Memory mapping to reduce cache conflicts in multiprocessor systems | |
US10838864B2 (en) | Prioritizing local and remote memory access in a non-uniform memory access architecture | |
EP1010080B1 (fr) | Mecanisme de gestion de l'attribution de tampons de memoire virtuelle a des processus virtuels, fonde sur l'anciennete (lru - least recently used) | |
US8190839B2 (en) | Using domains for physical address management in a multiprocessor system | |
US7461209B2 (en) | Transient cache storage with discard function for disposable data | |
US20100318742A1 (en) | Partitioned Replacement For Cache Memory | |
US20110010503A1 (en) | Cache memory | |
US5978888A (en) | Hardware-managed programmable associativity caching mechanism monitoring cache misses to selectively implement multiple associativity levels | |
JP2020529656A (ja) | アドレス変換キャッシュ | |
JPH10307756A (ja) | キャッシュ除外方法及びシステム | |
US11604733B1 (en) | Limiting allocation of ways in a cache based on cache maximum associativity value | |
US8694755B1 (en) | Virtual memory management for real-time embedded devices | |
US10289565B2 (en) | Cache drop feature to increase memory bandwidth and save power | |
US11232042B2 (en) | Process dedicated in-memory translation lookaside buffers (TLBs) (mTLBs) for augmenting memory management unit (MMU) TLB for translating virtual addresses (VAs) to physical addresses (PAs) in a processor-based system | |
US11256625B2 (en) | Partition identifiers for page table walk memory transactions | |
KR101893966B1 (ko) | 메모리 관리 방법 및 장치, 및 메모리 컨트롤러 | |
US6026470A (en) | Software-managed programmable associativity caching mechanism monitoring cache misses to selectively implement multiple associativity levels | |
WO2006082554A2 (fr) | Systeme de traitement de donnees comprenant une unite memoire cache | |
JP2009015509A (ja) | キャッシュメモリ装置 | |
US8266379B2 (en) | Multithreaded processor with multiple caches | |
US5983322A (en) | Hardware-managed programmable congruence class caching mechanism |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 06710787; Country of ref document: EP; Kind code of ref document: A2
 | WWW | Wipo information: withdrawn in national office | Ref document number: 6710787; Country of ref document: EP