WO2006082554A2 - Data processing system comprising a cache memory unit - Google Patents

Data processing system comprising a cache memory unit

Info

Publication number: WO2006082554A2 (application PCT/IB2006/050319)
Authority: WO (WIPO, PCT)
Prior art keywords: cache, data processing, processing system, mask, bits
Priority date: 2005-02-02
Filing date: 2006-01-30
Publication date: 2006-08-10
Other languages: English (en)
Other versions: WO2006082554A3 (fr)
Inventors: Josephus T. J. Van Eijndhoven, Paul Stravers, Anca M. Molnos
Original assignee: Koninklijke Philips Electronics N.V.
Application filed by Koninklijke Philips Electronics N.V.
Publication of WO2006082554A2: 2006-08-10; publication of WO2006082554A3: 2006-10-12

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0864: Addressing of a memory level requiring associative addressing means, using pseudo-associative means, e.g. set-associative or hashing
    • G06F 12/0806: Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/084: Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G06F 12/0844: Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F 12/0846: Cache with multiple tag or data arrays being simultaneously accessible
    • G06F 12/0848: Partitioned cache, e.g. separate instruction and operand caches

Definitions

  • The invention relates to a data processing system comprising a cache unit, the cache unit comprising an address space, wherein the address space is divided into a plurality of sets.
  • Modern CPUs employ caches to hide the latency of accessing data in memory.
  • Today's systems run multiple processes simultaneously, time-sharing a single CPU and/or utilizing multiple CPUs in a System-on-Chip (SoC).
  • These processes typically operate on large quantities of data, such as high-definition video frames, that are stored in low-cost off-chip memory.
  • Caches are needed to reduce the average memory access time for the processes while remaining transparent to the application programmer. With multiple processes competing for cache storage space, and off-chip memory accesses being relatively slow, the processes seriously impact each other's performance. To maintain predictable process performance, effective cache management techniques are needed.
  • An example of such a cache management technique is known from US 5,584,014.
  • The apparatus includes a set identification device to specify a group of set-associative data blocks in a cache memory or translation lookaside buffer.
  • A block replacement logic circuit is used to identify replaceable blocks within the set-associative data blocks that can store new information.
  • The block replacement logic circuit is also used to identify non-replaceable blocks within the set-associative data blocks that cannot store new information.
  • The block replacement logic circuit only writes new information to the replaceable blocks of the set-associative data blocks.
  • The block replacement logic circuit can be implemented using a block replacement mask to identify the replaceable and non-replaceable blocks within the set-associative data blocks.
  • Cache management requires several components that must function properly together.
  • A system resource manager must collect the requirements of the different processes that (want to) operate. It should implement some policy for deciding on the cache space to be granted to each requesting process, depending upon real-time requirements and user priorities. Such a manager is typically implemented in software, making decisions at time granularities of tens of milliseconds to seconds. Once these decisions are made, the system should implement and enforce the granted spaces. Enforcing this cache management policy, once chosen, requires dedicated hardware support operating at the rate of individual cache misses, at sub-microsecond speed. Clearly, for programmable systems the chip hardware is not application specific, and the cache control hardware must be able to flexibly adopt different configurations as directed by the cache management software.
  • Cache management techniques such as the dynamic partitioning technique set forth in US 5,584,014 make use of 'way partitioning' in associative caches. Unfortunately, such solutions do not allow management of many individual tasks or memory objects.
  • The invention relies on the observation that set-associative caches have relatively few ways (typically fewer than 16), so way partitioning alone is not a good solution.
  • The invention instead implements a cache partitioning technique that operates on cache sets. As caches typically have many sets (often several hundred), set partitioning can provide the granularity to control many individual objects.
  • The problem with conventional caches is that the sets are selected by direct addressing. This selection must be unique and uniform, otherwise different processes cannot access each other's shared data.
  • The proposed solution adopts a lookup table that can contain a generic set-selection function, is integrated close to the cache itself so that it provides a single global mapping shared by all processes and processors, and can operate at the speed of the cache itself.
  • The data processing system uses a mapping function to partition the address space into segments, and the mapping function provides as output an index value for selecting a set within a segment.
  • The mapping function translates a plurality of input bits into at least one output bit, the output bit being used to select the set within the segment.
  • The plurality of input bits is derived from a tag portion and an index portion of a requested address.
  • The mapping function is implemented as a lookup table (a minimal code sketch of such a lookup-based mapping appears after this section).
  • The mapping function further provides mask bits, the mask bits being used to control victim assignment if a cache miss occurs (see the victim-selection sketch after this section).
  • In one embodiment, the lookup table comprises an additional field, the additional field comprising the mask bits.
  • In another embodiment, the data processing system further comprises a victim mask table, wherein the lookup table comprises an additional field, the additional field comprising a cache domain indicator which is used to select the mask bits in the victim mask table.
  • Alternatively, the mask bits are selected from a victim mask table, and an index value provided by an application is used to select the mask bits from the victim mask table (see the mask-combination sketch after this section).
  • Fig. 1 illustrates an example of a cache organization according to the prior art;
  • Fig. 2 illustrates an example of a cache organization according to the invention;
  • Fig. 3 illustrates an extension of the cache organization according to the invention illustrated in Fig. 2;
  • Fig. 4 illustrates an example of cache partitioning according to the invention;
  • Fig. 5 illustrates an example of a table-lookup based function according to the invention;
  • Fig. 6 illustrates an example of a cache organization according to the invention, extended with application-driven way masking.
  • A preferred embodiment encompasses a system with one or more processors, each having a first-level (L1) cache.
  • The cache management solution is implemented for the shared and larger second-level (L2) cache.
  • A table-lookup facility (set table select) provides a translation for addresses of individual segments, spanning the full memory range of interest, which is typically a large part of the system address space.
  • The table lookup provides the set of cache sets to be used for that memory segment. Preferably, this is a power-of-two number of sets, indicated by masking out a number of address bits and replacing them with a new value, where both the mask and the new value are provided by the table lookup.
  • The table lookup thus provides a flexible (programmable) function that selects a cache set for every incoming address, allowing improved control in comparison with the standard function that just uses a sub-range of the incoming address.
  • The traditional cache organization is shown for clarity in Fig. 1.
  • The new table-lookup based function is indicated in pseudo-code semantics in Fig. 5, and the corresponding cache organization is depicted in Fig. 2.
  • The mapping function would be implemented in a cache controller unit. This proposed function partitions the address space into segments, where each segment is identified by a set of address bits taken from both the tag and index parts of the address. Each segment typically covers an addressed data space considerably larger than the size of the cache lines that belong to one set.
  • Addresses belonging to a segment are translated to a set index number by a partial masking and re-assigning expression.
  • A particular implementation might use a map function that spans a 1 Gbyte address range, using 4K segments, each of size 256 Kbyte. In a 32-bit address system, the remaining 3 Gbyte of address space might remain unmapped, for which set index selection remains according to the traditional scheme (as indicated by the 'has map?' test in Fig. 2 and Fig. 5).
  • The map function typically translates multiple input bits (from the original tag and index address sections) into fewer output bits for the actual set selection. As a result of this many-to-one property, the cache architecture needs to store and match longer tags than conventional caches without this invention.
  • The number of bits to store and compare per tag would encompass the sum of the traditional tag and index bits.
  • The lowest few index bits could remain unmapped to reduce the extra cost of longer tags, although this also coarsens the granularity of cache management.
  • Way masking is a previously known, but rarely used, method for cache partitioning.
  • Way masking (for example as described in US 5,584,014) can be used to great advantage in combination with set partitioning.
  • Way masking is used to control victim assignment when a cache miss occurs.
  • A cache controller supports a 'locked' status in conjunction with the tag information per cache line. Locked cache lines will never be selected as victims and are thereby retained in the cache.
  • The proposed way masking ORs this locked information with a set of mask bits (one bit per way) to further reduce the victim-selection freedom; a sketch of such a victim-selection routine appears after this section. It is proposed that these mask bits are also provided by the set index select function. In one embodiment, the mask bits could be provided directly from an extra field in the table lookup.
  • An alternative embodiment would store a smaller set of masks, one of which is selected through an extra 'cache domain' field in the table lookup. This latter scheme is depicted in Fig. 3.
  • The 'cache domain' field could alternatively be provided as a function of the application that generated this particular lookup request, depicted with the 'application id.' input in Fig. 6.
  • The application could directly provide a value for the way mask bits, or alternatively, an 'application id.' is provided as an index into a table that stores the actual way mask bits, as shown in Fig. 6. In either case, the application input value is stored in a dedicated register. Multi-processor systems would have one such register for every processor. It is the responsibility of the operating system (or the thread scheduler) to update the value in this register according to the new application that starts executing after a context switch.
  • The hardware could provide functionality to create run-time combinations of application-driven way masks with address-driven way masks. Good candidate combination functions are (a) concatenating the bits of these two inputs, or (b) selecting the application input when the table lookup results in a zero-valued domain id, and the non-zero table-lookup result otherwise (see the mask-combination sketch after this section).
  • This invention describes a method to optimize cache behavior, in particular in situations where multiple processes compete for cache storage space.
  • The invention proposes a new method for cache footprint control.
  • The method allows explicit mapping of memory objects into restricted areas of the cache.
  • The method operates orthogonally to known process-based cache protection, allowing a best-of-both-worlds system behavior.
  • The method implements in hardware the low-level and high-speed cache control that is required for quality-of-service resource management software, and is thereby an important prerequisite for predictable performance in complex systems.
  • The method is implemented for a multi-processor shared level-2 cache, but is equally applicable to single-processor caches. It is remarked that the scope of protection of the invention is not restricted to the embodiments described herein.
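
To make the mapping concrete, the following minimal C sketch renders the table-lookup based set selection that Fig. 5 gives in pseudo-code. All identifiers and the concrete geometry (a 1 Gbyte mapped range, 4K segments of 256 Kbyte each, a cache with 512 sets of 64-byte lines) are illustrative assumptions layered on the example embodiment above, not a literal transcription of the figures.

    #include <stdint.h>

    /* Assumed geometry: 1 GByte mapped range, 4K segments of 256 KByte
     * each, 512 cache sets, 64-byte cache lines. */
    #define NUM_SEGMENTS  4096u            /* 1 GB / 256 KB             */
    #define SEGMENT_SHIFT 18u              /* log2(256 KB)              */
    #define NUM_SETS      512u
    #define SET_MASK      (NUM_SETS - 1u)  /* 9 set-index bits          */
    #define LINE_SHIFT    6u               /* log2(64-byte cache line)  */
    #define MAPPED_LIMIT  0x40000000u      /* addresses below 1 GB map  */

    /* One lookup-table entry per segment: which set-index bits to
     * override and the value to force into them. Both fields are
     * programmed by the cache management software. */
    typedef struct {
        uint32_t index_mask;   /* set-index bits to replace          */
        uint32_t index_value;  /* replacement value for masked bits  */
    } set_table_entry;

    static set_table_entry set_table[NUM_SEGMENTS];

    /* Select a cache set for an incoming address. Addresses in the
     * mapped range go through the table (the 'has map?' test of
     * Fig. 2 and Fig. 5); all others use traditional direct indexing.
     * In general the segment-identifying bits may be drawn from both
     * the tag and index portions of the address. */
    uint32_t select_set(uint32_t addr)
    {
        uint32_t direct_index = (addr >> LINE_SHIFT) & SET_MASK;

        if (addr < MAPPED_LIMIT) {
            uint32_t segment = addr >> SEGMENT_SHIFT;
            set_table_entry e = set_table[segment];
            /* The 'partial masking and re-assigning expression':
             * keep the unmasked index bits and force the masked
             * bits to the table-supplied value. */
            return (direct_index & ~e.index_mask)
                 | (e.index_value & e.index_mask);
        }
        return direct_index;   /* unmapped: traditional set selection */
    }

Because this mapping is many-to-one, a real implementation must store and match the full tag-plus-index bits per cache line, as the description notes above.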
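
The victim-selection sketch referenced above: the per-line 'locked' status is ORed with the way-mask bits, and only unprotected ways are eligible as victims. The 8-way associativity and the round-robin replacement policy are assumptions for illustration; the description prescribes only the OR of lock and mask information.

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS 8u   /* assumed associativity */

    typedef struct {
        uint32_t tag[WAYS];
        bool     valid[WAYS];
        bool     locked[WAYS];   /* 'locked' status per cache line   */
        uint32_t next_victim;    /* round-robin replacement pointer  */
    } cache_set_state;

    /* way_mask carries one bit per way; a set bit forbids victimizing
     * that way. Returns the chosen way, or -1 if every way is either
     * locked or masked out. */
    int select_victim(cache_set_state *set, uint32_t way_mask)
    {
        for (uint32_t i = 0; i < WAYS; i++) {
            uint32_t w = (set->next_victim + i) % WAYS;
            /* OR of lock bit and mask bit: either one protects the way. */
            bool is_protected = set->locked[w] || ((way_mask >> w) & 1u);
            if (!is_protected) {
                set->next_victim = (w + 1u) % WAYS;
                return (int)w;
            }
        }
        return -1;   /* no eligible victim in this set */
    }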
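
Finally, the mask-combination sketch for option (b) above: the application-provided mask is selected when the table lookup yields a zero-valued domain id, and the table-selected mask otherwise. The table size and variable names are assumptions; a multi-processor system would replicate the application register per processor, with the operating system or thread scheduler rewriting it at each context switch.

    #include <stdint.h>

    #define NUM_DOMAINS 16u   /* assumed size of the victim mask table */

    /* Victim mask table programmed by the resource management software;
     * domain id 0 is taken here to mean 'defer to the application mask'. */
    static uint32_t victim_mask_table[NUM_DOMAINS];

    /* Per-processor register holding the current application's way mask,
     * updated by the OS or thread scheduler on every context switch. */
    static uint32_t application_way_mask;

    /* Combine address-driven and application-driven way masks.
     * Caller guarantees domain_id < NUM_DOMAINS. */
    uint32_t effective_way_mask(uint32_t domain_id)
    {
        if (domain_id == 0u)
            return application_way_mask;       /* application-driven */
        return victim_mask_table[domain_id];   /* address-driven     */
    }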

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present invention relates to a cache partitioning method that operates on cache sets. As caches typically have many sets (several hundred, for example), set partitioning can provide the granularity required to control many individual objects. The problem with conventional caches is that the sets are selected by direct addressing. This selection must be unique and uniform, otherwise the various processes cannot access each other's shared data. The solution according to the invention uses a lookup table that can contain a generic set-selection function, which is integrated close to the cache itself in order to provide a single global mapping shared between all processes and processors, and which can operate at the speed of the cache itself.
PCT/IB2006/050319 2005-02-02 2006-01-30 Data processing system comprising a cache memory unit WO2006082554A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP05100705.2 2005-02-02
EP05100705 2005-02-02
EP05104652.2 2005-05-31
EP05104652 2005-05-31

Publications (2)

Publication Number Publication Date
WO2006082554A2 (fr) 2006-08-10
WO2006082554A3 (fr) 2006-10-12

Family

Family ID: 36602455

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/050319 WO2006082554A2 (fr) 2005-02-02 2006-01-30 Data processing system comprising a cache memory unit

Country Status (1)

Country Link
WO (1) WO2006082554A2 (fr)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5584014A (en) * 1994-12-20 1996-12-10 Sun Microsystems, Inc. Apparatus and method to preserve data in a set associative memory device
US20020002657A1 (en) * 1997-01-30 2002-01-03 Sgs-Thomson Microelectronics Limited Cache system for concurrent processes
US20020174301A1 (en) * 2001-05-17 2002-11-21 Conway Patrick N. Method and system for logical partitioning of cache memory structures in a partitioned computer system
US6493800B1 (en) * 1999-03-31 2002-12-10 International Business Machines Corporation Method and system for dynamically partitioning a shared cache
US20030196041A1 (en) * 1997-01-30 2003-10-16 Stmicroelectronics Limited Cache system


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009107048A2 (fr) 2008-02-25 2009-09-03 Telefonaktiebolaget L M Ericsson (Publ) Methods and systems for dynamic cache partitioning for distributed applications operating on multiprocessor architectures
WO2009107048A3 (fr) * 2008-02-25 2009-11-26 Telefonaktiebolaget L M Ericsson (Publ) Methods and systems for dynamic cache partitioning for distributed applications operating on multiprocessor architectures
US8095736B2 (en) 2008-02-25 2012-01-10 Telefonaktiebolaget Lm Ericsson (Publ) Methods and systems for dynamic cache partitioning for distributed applications operating on multiprocessor architectures
US8543769B2 (en) 2009-07-27 2013-09-24 International Business Machines Corporation Fine grained cache allocation
CN102483718A (zh) * 2009-08-25 2012-05-30 International Business Machines Corporation Cache partitioning in virtualized environments
GB2485328A (en) * 2009-08-25 2012-05-09 Ibm Cache partitioning in virtualized environments
WO2011023617A1 (fr) * 2009-08-25 2011-03-03 International Business Machines Corporation Cache partitioning in virtualized environments
US8739159B2 (en) 2009-08-25 2014-05-27 International Business Machines Corporation Cache partitioning with a partition table to effect allocation of shared cache to virtual machines in virtualized environments
US8745618B2 (en) 2009-08-25 2014-06-03 International Business Machines Corporation Cache partitioning with a partition table to effect allocation of ways and rows of the cache to virtual machine in virtualized environments
CN102483718B (zh) * 2009-08-25 2014-12-24 International Business Machines Corporation Cache partitioning in virtualized environments
GB2485328B (en) * 2009-08-25 2016-07-13 Ibm Cache partitioning in virtualized environments
CN102323909A (zh) * 2011-09-13 2012-01-18 Beijing Beida Zhongzhi Microsystem Science and Technology Co., Ltd. Memory management method and device using a large-capacity cache
WO2023130316A1 (fr) * 2022-01-06 2023-07-13 Institute of Computing Technology, Chinese Academy of Sciences Dynamic cache partitioning method and system accounting for both quality of service and utilization

Also Published As

Publication number Publication date
WO2006082554A3 (fr) 2006-10-12

Similar Documents

Publication Publication Date Title
US10963387B2 (en) Methods of cache preloading on a partition or a context switch
US6381676B2 (en) Cache management for a multi-threaded processor
US8095736B2 (en) Methods and systems for dynamic cache partitioning for distributed applications operating on multiprocessor architectures
US5809563A (en) Method and apparatus utilizing a region based page table walk bit
US7085890B2 (en) Memory mapping to reduce cache conflicts in multiprocessor systems
US10838864B2 (en) Prioritizing local and remote memory access in a non-uniform memory access architecture
EP1010080B1 Mechanism for managing the allocation of virtual memory buffers to virtual processes on a least recently used (LRU) basis
US8190839B2 (en) Using domains for physical address management in a multiprocessor system
US7461209B2 (en) Transient cache storage with discard function for disposable data
US20100318742A1 (en) Partitioned Replacement For Cache Memory
US20110010503A1 (en) Cache memory
US5978888A (en) Hardware-managed programmable associativity caching mechanism monitoring cache misses to selectively implement multiple associativity levels
JP2020529656A (ja) Address translation cache
JPH10307756A (ja) Cache exclusion method and system
US11604733B1 (en) Limiting allocation of ways in a cache based on cache maximum associativity value
US8694755B1 (en) Virtual memory management for real-time embedded devices
US10289565B2 (en) Cache drop feature to increase memory bandwidth and save power
US11232042B2 (en) Process dedicated in-memory translation lookaside buffers (TLBs) (mTLBs) for augmenting memory management unit (MMU) TLB for translating virtual addresses (VAs) to physical addresses (PAs) in a processor-based system
US11256625B2 (en) Partition identifiers for page table walk memory transactions
KR101893966B1 (ko) Memory management method and device, and memory controller
US6026470A (en) Software-managed programmable associativity caching mechanism monitoring cache misses to selectively implement multiple associativity levels
WO2006082554A2 (fr) Data processing system comprising a cache memory unit
JP2009015509A (ja) Cache memory device
US8266379B2 (en) Multithreaded processor with multiple caches
US5983322A (en) Hardware-managed programmable congruence class caching mechanism

Legal Events

Code  Description
121   EP: the EPO has been informed by WIPO that EP was designated in this application
NENP  Non-entry into the national phase (ref country code: DE)
122   EP: PCT application non-entry in European phase (ref document number: 06710787; country of ref document: EP; kind code of ref document: A2)
WWW   WIPO information: withdrawn in national office (ref document number: 6710787; country of ref document: EP)