US20140201456A1 - Control Of Processor Cache Memory Occupancy - Google Patents
- Publication number
- US20140201456A1 (US 14/218,724)
- Authority
- US
- United States
- Prior art keywords
- cache
- quota
- entity
- entities
- lines
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0891—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0842—Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/126—Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/50—Control mechanisms for virtual memory, cache or TLB
- G06F2212/502—Control mechanisms for virtual memory, cache or TLB using adaptive policy
Definitions
- the subject matter described herein relates to systems, techniques, and articles for allocating cache space to entities based on their corresponding performance requirements.
- the “vacating” (replacement) policies often used are referred to as Random and Least Recently Used (LRU).
- the Random method the cache location to be vacated is selected randomly while with the LRU method, the location containing data that has been least recently accessed is vacated making the assumption that the data least recently accessed is of less importance.
- a cache occupancy value is calculated for each of a plurality of entities executing in the processor system, with the cache occupancy value for each entity based on associated entity identifiers.
- a cache replacement algorithm uses the occupancy values to determine which cache lines in the cache memory to replace when vacating entities.
- the occupancy value can be calculated by repeatedly counting a number of cache lines allocated to the entity offset by a number of cache lines vacated for the entity.
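The running count described above can be sketched as follows. This is a minimal illustration, not the patented hardware: the class name `OccupancyTracker`, its method names, and the string entity ID are assumptions made for the example.

```python
from collections import Counter


class OccupancyTracker:
    """Hypothetical per-entity count of owned cache lines."""

    def __init__(self):
        # entity ID -> number of cache lines currently attributed to it
        self.occupancy = Counter()

    def on_allocate(self, entity_id):
        # A line was filled on behalf of this entity.
        self.occupancy[entity_id] += 1

    def on_vacate(self, entity_id):
        # A line owned by this entity was chosen as a victim.
        if self.occupancy[entity_id] == 0:
            raise ValueError("entity owns no cache lines")
        self.occupancy[entity_id] -= 1


tracker = OccupancyTracker()
for _ in range(3):
    tracker.on_allocate("A")   # three lines allocated to entity A
tracker.on_vacate("A")         # one of them is vacated
print(tracker.occupancy["A"])
```

The occupancy is the number of allocations offset by the number of vacations, so no scan of the tag memory is needed to obtain it.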
- cache lines can be shared by multiple entities and techniques such as first to access can be used to determine to which entity the cache line is associated.
- the entities can be one or more of: a single task, a group of tasks, a thread, a group of threads, a single state machine, a group of state machines, a single virtual machine, and a group of virtual machines, or any combination thereof.
- Each entity can have an associated occupancy profile.
- the occupancy profile can include a minimum quota specifying a minimum number of cache lines the corresponding entity should occupy.
- the occupancy profile can include a maximum quota specifying a maximum number of cache lines the corresponding entity should occupy.
- Performance of at least one of the entities by the processor system can be monitored (e.g., cache hit rate, cache miss rate, execution time, etc.) so that at least one of the minimum quota and the maximum quota can be varied to affect subsequent performance for the associated entity. If it is determined that a cache hit value is below a predetermined level for one of the entities, the minimum quota can be increased for the corresponding entity. Similarly, if it is determined that a cache miss penalty is above a predetermined level for one of the entities, the maximum quota for the corresponding entity can be decreased. In some arrangements there are multiple levels of quotas.
- the relationships of the occupancy value compared to quotas/thresholds can be encoded into an n-bit compliance value by comparing a number of lines specified by the occupancy value with the minimum quota and the maximum quota for the entity.
- the occupancy values for each entity can be encoded in an n-bit code stored in a compliance table.
- the cache replacement algorithm can read the compliance values for entities in the compliance table and compare those compliance values to select cache lines to be replaced.
- the cache replacement algorithm selects a cache line to replace (e.g., a victim, etc.) by taking into account, in prioritized order, whether an entity: occupies a number of cache lines substantially exceeding its corresponding maximum quota, occupies a number of cache lines exceeding its corresponding maximum quota, occupies a number of cache lines less than its corresponding maximum quota and more than its corresponding minimum quota, occupies a number of cache lines less than its minimum quota, and occupies a number of cache lines substantially less than its minimum quota.
- a default method such as random or least recently used can be used for replacement selection. If a cache replacement algorithm identifies multiple cache lines eligible for replacement, a default method such as random or least recently used can be used to select which of those cache lines to replace.
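A prioritized selection with a default tie-break can be sketched as below. The integer level encoding (4 for substantially above the maximum quota down to 0 for substantially below the minimum quota), the tuple layout, and the function name `select_victim` are assumptions for illustration; LRU is used here as the default method.

```python
def select_victim(ways):
    """Pick a way to replace within one cache set.

    ways: list of (way_index, compliance_level, last_access_time).
    compliance_level is a hypothetical encoding where a larger value
    means the owner is a worse offender against its quotas.
    """
    # Step 1: find the worst compliance level present in the set.
    worst = max(level for _, level, _ in ways)
    candidates = [w for w in ways if w[1] == worst]
    # Step 2: break ties with the default policy (LRU here):
    # the candidate with the oldest access time is replaced.
    return min(candidates, key=lambda w: w[2])[0]


ways = [
    (0, 2, 100),  # owner within its quotas
    (1, 4, 90),   # owner substantially above max quota
    (2, 4, 50),   # same level, but accessed less recently
    (3, 1, 10),   # owner below min quota: protected despite being oldest
]
print(select_victim(ways))
```

Way 3 holds the least recently used line, yet it is not chosen because its owner is below its minimum quota; the quota comparison takes precedence and LRU only decides between ways 1 and 2.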
- performance of a plurality of entities in a processor system is monitored.
- Each entity has an associated maximum quota specifying a maximum number of cache lines that the entity should occupy and an associated minimum quota specifying a minimum number of cache lines that the entity should occupy.
- a number of cache lines occupied by the entity is also determined.
- one or more of the maximum quota or the minimum quota for an entity is dynamically adjusted if such entity is performing outside desired performance criteria.
- a cache replacement algorithm is used to replace cache lines in the cache memory. The cache replacement algorithm selects cache lines to be replaced based on a number of cache lines occupied by an entity in relation to its associated maximum quota and/or its associated minimum quota.
- systems and methods for controlling execution of entities using cache memory within a processor system are provided. With such systems and methods, performance of a plurality of entities is monitored. Thereafter, based on the monitoring, at least one of a minimum cache quota and a maximum cache quota is selectively adjusted.
- Articles of manufacture are also described that comprise computer executable instructions permanently stored on computer readable media, which, when executed by a computer, cause the computer to perform operations herein.
- computer systems are also described that may include a processor and a memory coupled to the processor.
- the memory may temporarily or permanently store one or more programs that cause the processor to perform one or more of the operations described herein.
- FIG. 1 is a general block diagram of a processor cache system.
- FIG. 2 is a detailed block diagram of a processor cache system.
- FIG. 3 is a process flow diagram illustrating a method for controlling processor cache memory occupancy.
- the current subject matter provides cache quota replacement policies that select which cache locations to vacate in order to effectively control the amount of cache space an entity may occupy within a processor system.
- the caching techniques utilized herein can be characterized as set associative caching, which in turn is sometimes described as a compromise between a direct mapped cache and a fully associative cache where each address is mapped to a certain set of cache locations.
- a set of cache locations utilized by a particular entity is sometimes referred to as a working set.
- the address space can be divided into blocks of 2^m bytes (i.e., the cache line size), discarding the bottom m address bits.
- An “n-way set associative” cache with S sets has n cache locations in each set.
- Block b can be mapped to set “b mod S” and may be stored in any of the n locations in that set with its upper address bits stored in the tag. To determine whether block b is in the cache, the tags in set “b mod S” are searched associatively for the upper address bits of the block.
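The block, set, and tag arithmetic above can be worked through in a few lines. The function name `locate` and the example parameters (64-byte lines, 128 sets) are assumptions chosen for illustration.

```python
def locate(address, line_bits, num_sets):
    """Split a byte address into (block, set index, tag).

    line_bits is m, so the line size is 2**m bytes;
    num_sets is S.
    """
    block = address >> line_bits      # discard the bottom m address bits
    set_index = block % num_sets      # block b maps to set "b mod S"
    tag = block // num_sets           # remaining upper bits kept in the tag
    return block, set_index, tag


# 64-byte lines (m = 6) and S = 128 sets
print(locate(0x12345, 6, 128))   # (1165, 13, 9)
```

A lookup for this address would compare the value 9 against the tag of each of the n locations in set 13.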
- a tag as used herein can be characterized as an object that stores status (state) information and the entity ID for each cache line. Stated differently, each cache line can have an associated tag which stores upper address bits, status, ID.
- entity refers to tasks, groups of tasks, threads, groups of threads, state machines, groups of state machines, virtual machines, groups of virtual machines and/or other software or hardware requiring cache.
- a task can be characterized as a set of instructions to be executed by the processor system.
- the entities can be instances of computer programs that are being executed, threads of execution such as one or more simultaneously, or pseudo-simultaneously, executing instances of a computer program closely sharing resources, etc. that execute within one or more processor systems (e.g., microprocessors, etc.) or virtual machines such as virtual execution environments on one or more processors.
- a virtual machine can be characterized as a software implementation of a machine (computer) that executes programs like a real machine.
- the entities can be state machines such as DMA controllers and the collection of commands for such state machines (e.g., DMA channels).
- the current subject matter can be implemented to be upwards compatible with existing cache replacement policies such as LRU or Random. If no quotas are set by the software or a decision among entities with similar relationships to their quotas is required or no meaningful decision based on quotas can be reached, the replacement policy reverts to the default policy such as LRU or Random.
- the cache quotas techniques described herein can be implemented in hardware and they can be used as replacements to conventional cache circuits (which can be the only part of the cache hardware affected) within processor systems.
- the current subject matter can ensure that the cache controller speed of operation is not affected by the new circuits. Examples of processor systems that can utilize the current subject matter are described and illustrated in U.S. Pat. Pub. No. 2009/0055829 and U.S. patent application Ser. No. 13/072,596 (Attorney Docket No. 42497-502001US filed on the same day as this application) claiming priority to U.S. Pat. App. Ser. No. 61/341,069, the contents of all three of which are hereby fully incorporated by reference.
- a large portion of program execution timing variability comes from cache hit variability and high cache miss penalties. In some cases, such variability cannot be mitigated by just increasing the run time and/or clock speed. As a result, real-time deadlines or desired response times may be missed when multiple applications are sharing the cache space.
- FIG. 1 is a block diagram illustrating a processor cache system 100 having the following components: cache line owner identification (ID) 110 (the cache line is the object that includes a tag and storage for data), address tag/state 120 , minimum quota 130 , maximum quota 140 , actual occupancy 150 , compliance table 160 , and victim selection 170 .
- the amount of cache space an entity can occupy can be controlled dynamically by adjusting one or more of the cache quotas 130 , 140 .
- the quotas can be defined as a minimum/maximum number of cache locations (sometimes referred to herein as “lines” or “cache lines”) an entity being executed by a processor system is guaranteed to be able to occupy and/or cannot exceed.
- the term “guarantee” in relation to cache quotas references a targeted number of cache lines (whether minimum or maximum) to be allocated to an entity and does not necessarily require an absolute guarantee of cache line occupancy. For example, aliasing as well as minimum quota oversubscription are sometimes difficult/impossible to avoid. Similarly, undersubscription of the cache can allow occupancy above the maximum quota. If the goal is to guarantee that a working set (e.g., an amount of cache required for the instructions and/or data for the entity to execute properly) for the entity remains in the cache, including when the entity is swapped out (not running), the minimum quota 130 can be established for the working set.
- each location in main memory and in a cache has an index which is a unique number referencing such location. Each location has a tag that contains the index of the data in the main memory that has been cached.
- the maximum quota 140 can be used to provide performance isolation between entities and/or prevent some entities from excess use (e.g., hogging, etc.) of the cache.
- the maximum quota 140 can also be used to free up some cache space (by reducing the maximum quota) to allow other entities to expand (by increasing their maximum quota) their share of the cache.
- a scheduler is utilized in order to prioritize execution of entities and/or to define schedules for execution of the entities (e.g., execution initiation, execution termination, etc.).
- the current arrangement provides another mechanism to control the “execution speed” of an entity, by accelerating those entities which are falling behind (i.e., entities that are likely to be finalized subsequent to their corresponding execution deadline) while decelerating entities that are ahead (i.e., entities that are likely to be finalized prior to their corresponding execution deadline, etc.).
- This is dictated by the finite size of the corresponding cache.
- the cache occupancy quota method (as described herein) can keep track of how many cache lines each entity occupies and it can decide which line, within a set, will be replaced based on the line owner's (entity) Min/Max settings (as defined in the minimum quota 130 and the maximum quota 140 ), both of which can be dynamically adjusted at run time.
- Cache quotas can effectively partition the cache dynamically amongst entities and groups of entities, which in turn can control cache misses.
- Cache misses occur when the processor, while executing an entity, accesses data that is not presently in the cache. These accesses may be instruction fetches, data reads or writes.
- the current cache replacement algorithm can control the amount of cache space an entity is allowed/guaranteed to have and therefore the cache miss rate is controlled. More cache can result in fewer cache misses.
- Cache space isolation (i.e., a guarantee of a certain amount of cache space for the entity to respond/function accordingly, etc.) can also be used to assist with hardware convergence.
- Hardware convergence in this regard, refers to reducing the number of processors by consolidating entities in fewer processors and/or utilizing one or more processors in a common system as opposed to multiple computer systems.
- One of the problems is that there can be real-time entities mixed with non-real-time entities in the same processor, and such entities can require a guaranteed response time. A major part of guaranteeing the response time is guaranteeing that the data/instructions are in the cache. Without such a guarantee, real-time entities cannot be intermingled with non-real-time entities.
- Cache occupancy quotas can be allocated to individual entities (for critical code) or to groups of entities (sometimes referred to herein as “cache groups”) to limit the size of the implementation hardware.
- Hardware required to implement the cache replacement algorithm increases with the number of entities.
- entities can be grouped such that there is one entry for multiple entities within a particular group.
- the effectiveness of the cache quota replacement technique, as described herein, can depend, in part, on the number of sets a set-associative cache contains. With more sets, there are more options to choose from as a line replacement candidate (which can be defined by the victim selection 170 ).
- additional information can be added to the tag memory to include the ID 110 to which the entity(s) belongs.
- This number can be uniquely associated with an entity.
- Associated with each ID there is a Max Quota 140 specifying the max cache lines the entity cannot exceed (unless the cache is underutilized), a Min Quota 130 specifying the minimum cache lines the entity is guaranteed to be able to occupy, and an Actual Occupancy 150 indicating the actual number of lines the entity occupies.
- FIG. 2 is a detailed block diagram of a processor cache system 200. It will be appreciated that the current subject matter is applicable to a wide variety of processor cache systems. N sets are illustrated, with each individual entity, or each group of which an entity is a member, being identified by the corresponding ID. ID- 0 is the ID associated with a currently running entity while ID- 1 to N identify the groups owning the respective cache lines. The illustrated components include cache tags 210 (1 to N for an N-set associative cache), victim selection logic 220 , a compliance table 230 (N+1 read ports, 1 write port, 2-bit output), a quota table 240 (2 read ports: victim and new owner), a MUX 250 that selects the victim ID, logic 260 to encode the result of comparing the Occupancy to the Min and Max Quota values and threshold registers 270 into one or more N-bit values (as described in further detail below).
- the cache tags 210 can contain both an upper address of the data residing there and a state of the associated cache line (e.g., valid, invalid, etc.).
- the tag state can be augmented with the ID of the entity to which the data residing there is associated.
- the number of bits for the ID field is implementation-dependent (e.g., 5, 8 bits for 32, 256 IDs respectively). IDs can be reclaimed, to be utilized by a new group, when the corresponding group is no longer active in the processor cache system 200 .
- the compliance table 230 can be indexed by the IDs from all tags that have valid entries.
- the content of the table can be a 2-bit value indicating the level of compliance of the ID owner of that cache location to the current quotas.
- “over-exceeding” a quota can be characterized as exceeding a maximum quota and a maximum threshold over the maximum quota and “exceeding” a quota can be characterized as exceeding the quota but being below such maximum threshold.
- a quota can be characterized as being less than a minimum quota and a minimum threshold below the minimum quota and “under-achieving” a quota can be characterized as being less than the minimum quota but greater than the minimum threshold.
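The comparison of an occupancy value against the quotas and their thresholds can be sketched as a five-way classification. This is an illustrative assumption, not the 2-bit encoding of the source: the level names, the `Compliance` enum, and the use of absolute margins (`max_margin`, `min_margin`) in place of the threshold registers are all hypothetical.

```python
from enum import IntEnum


class Compliance(IntEnum):
    # Hypothetical names; larger value = stronger replacement candidate.
    FAR_BELOW_MIN = 0
    BELOW_MIN = 1
    WITHIN = 2
    ABOVE_MAX = 3
    FAR_ABOVE_MAX = 4


def classify(occupancy, min_quota, max_quota, min_margin, max_margin):
    """Compare an occupancy value with the quotas and their thresholds."""
    if occupancy > max_quota + max_margin:
        return Compliance.FAR_ABOVE_MAX   # beyond quota plus threshold
    if occupancy > max_quota:
        return Compliance.ABOVE_MAX       # beyond quota, within threshold
    if occupancy < min_quota - min_margin:
        return Compliance.FAR_BELOW_MIN   # below quota minus threshold
    if occupancy < min_quota:
        return Compliance.BELOW_MIN       # below quota, within threshold
    return Compliance.WITHIN


for occ in (50, 45, 20, 7, 3):
    print(occ, classify(occ, min_quota=10, max_quota=40,
                        min_margin=5, max_margin=8).name)
```

With a minimum quota of 10, a maximum quota of 40, and margins of 5 and 8 lines, the five sample occupancies fall into the five levels in descending order.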
- Max quota compliance (2 bits) can comprise:
- Min quota compliance (2 bits) can comprise:
- a single bit compliance code can be used as shown below:
- Compliance codes (C-0, C-1 . . . C-N, etc.) for all IDs in the tags indexed by the lower bits of the address which resulted in a cache miss can be read out simultaneously (N read ports) and supplied to the victim selection logic 220 which determines the set where the selected victim resides.
- the set selection can be provided to the cache control logic to write the ID of the missing address in the victim's tag position as the new owner of that location.
- the valid bit can also be set.
- the victim may be selected in the following decreasing priority order:
- the ID of the victim can be selected by the MUX 250 and supplied to the quota table 240 which can contain an entry for each ID in the system 200 .
- if the ID field has 5 bits, providing for 32 IDs, the quota table 240 will have 32 entries.
- a management module can set table entries in the quota table 240 .
- Each table entry data in the quota table 240 can include:
- the Occupancy level field can be incremented for the ID owner of the newly-fetched cache data and decremented for the ID of the victim.
- the encode logic 260 can use the new occupancy levels of the new owner and the victim and the threshold registers 270 to generate new compliance codes which can be stored in the compliance table 230 at the locations corresponding to the new owner and the victim.
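The bookkeeping performed after a replacement can be sketched as follows. The dictionary layout of the tables, the three-level `encode` function (a simplification that ignores the threshold registers), and the name `on_miss_fill` are assumptions for illustration only.

```python
def encode(entry):
    """Simplified, hypothetical compliance code for one quota-table entry."""
    if entry["occupancy"] > entry["max"]:
        return "over_max"
    if entry["occupancy"] < entry["min"]:
        return "under_min"
    return "within"


def on_miss_fill(quota_table, compliance_table, new_owner, victim):
    """Update occupancy and compliance after a victim line is refilled."""
    quota_table[new_owner]["occupancy"] += 1   # owner of the fetched data
    quota_table[victim]["occupancy"] -= 1      # owner of the replaced line
    # Only two entries changed, so only two codes are regenerated.
    for ident in (new_owner, victim):
        compliance_table[ident] = encode(quota_table[ident])


quota_table = {
    1: {"min": 2, "max": 4, "occupancy": 4},
    2: {"min": 3, "max": 8, "occupancy": 3},
}
compliance_table = {1: "within", 2: "within"}
on_miss_fill(quota_table, compliance_table, new_owner=1, victim=2)
print(compliance_table)
```

After the fill, ID 1 holds five lines against a maximum of four and ID 2 holds two lines against a minimum of three, so both codes change.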
- the information in the quota table 240 should, in most cases, be sufficient to select the victim.
- the compliance table 230 (as illustrated) can be characterized as an optimization aimed at reducing hardware size and improving circuit timing as it has only 4 bits of information albeit being N-multi-ported. Applying the same level of multi porting to the quota table 240 could result in a large and slow structure.
- the victim selection can be in the time-critical path but can be sped up by the small data width of the compliance table 230 . Updating the compliance table 230 through the MUX 250 , quota table 240 and encoding logic 260 is not time-critical as it only needs to be done before the next cache miss occurs (or within the processing pipeline).
- below is an example of a victim selection algorithm to be implemented by the victim selection logic 220 .
- the above described victim selection algorithm attempts to take cache locations away from the worst offenders of their pre-set quotas. Exceeding the max quota is the worst “offense” while exceeding the min quota is not an “offense” but it means the respective entity(s) associated with the ID has more than the minimum guaranteed number of locations and therefore can afford to donate/surrender one or more.
- the selection of the victim can be done by the default mechanism of the cache controller (e.g., LRU, Random, etc.). Similarly, if there are multiple cache lines that qualify as a “victim” (i.e., there is a “tie”), a default mechanism, such as LRU or Random techniques, can be used to select a victim amongst such cache lines. Setting the max quota equal to the total number of locations in the cache and the min Quota to zero will force the use of a default selection under all conditions effectively turning the quota-based selection mechanism off in the above example.
- the two “quota threshold” registers 270 can be set by software to any arbitrary values to set the boundary between “exceeding” and “over-exceeding” as a percentage of the individual quotas.
- the multiplier can be restricted to a power of 2. Only two compliance codes (for the new owner and the victim) need to be generated for each cache miss, and they may be calculated sequentially using the same hardware.
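One way a power-of-2 restriction can simplify the threshold calculation is by replacing a multiply with a shift. The sketch below assumes, purely for illustration, that the threshold is the quota divided by 2^k and that the boundary sits that far above the maximum quota; the function name and the `shift` parameter are hypothetical.

```python
def over_exceed_boundary(max_quota, shift):
    """Occupancy above which an entity is treated as far over its quota.

    Assumes the margin is max_quota / 2**shift, computed with a right
    shift so that no multiplier or divider circuit is needed.
    """
    return max_quota + (max_quota >> shift)


print(over_exceed_boundary(64, 2))   # 64 + 64/4 = 80
print(over_exceed_boundary(64, 3))   # 64 + 64/8 = 72
```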
- a threshold value specific to each quota can be required (instead of a unique multiplier) and such threshold values can be stored along with the max/min quota values in the quota table 240 .
- the main use of the cache quota can be to influence the cache miss ratio (or cache miss rate) for individual entities, or groups of entities, identified by their IDs.
- the min quota results in guaranteeing a minimum hit ratio while the max quota limits maximum occupancy (which is often correlated to a higher hit ratio). In some cases a “zero” miss ratio for a certain entity and memory region can be required.
- the base quota algorithm deals with “number of locations” but not where those locations are. In simple cases where all the memory locations “touched” by an entity need to stay in the cache, the min quota can be set equal to the number of locations.
- the above algorithm can pick the victim based on LRU or Random method effectively making the victim own fewer locations than its guaranteed minimum. The likelihood of such occurrences can be reduced by limiting the amount of “guaranteed number of locations” for an entity.
- Cache occupancy can include mapping virtual memory (a memory management technique allowing tasks to utilize virtual memory address space(s), which may be separate from physical address space(s)) to physical memory.
- the physical memory in effect acts as a cache allowing a plurality of entities to share physical memory wherein the total size of the virtual memory space(s) may be larger than the size of physical memory, or larger than the physical memory allocated to one or more entities, and thus the physical memory, and/or a portion thereof, acts as a “cache”.
- Entity physical memory occupancy can be managed as described elsewhere and as in the co-pending applications.
- FIG. 3 is a process flow diagram illustrating a method 300 for controlling processor cache memory within a processor system.
- a cache occupancy value is calculated, at 310 , for each of a plurality of entities executing in the processor system.
- the cache occupancy value for the entity can be calculated based on the number of cache lines in the cache memory having identifiers associated with the entity.
- cache lines in the cache memory are replaced, using a cache replacement algorithm, in connection with the subsequent execution of entities.
- the cache replacement algorithm uses the occupancy values in order to determine which cache lines to replace.
- aspects of the subject matter described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
- These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Techniques are described for controlling processor cache memory within a processor system. Cache occupancy values for each of a plurality of entities executing in the processor system can be calculated. A cache replacement algorithm uses the cache occupancy values when making subsequent cache line replacement decisions. In some variations, entities can have occupancy profiles specifying a maximum cache quota and/or a minimum cache quota which can be adjusted to achieve desired performance criteria. Related methods, systems, and articles are also described.
Description
- This patent application is a continuation of and claims the benefit of priority under 35 U.S.C. §120 of U.S. patent application Ser. No. 13/072,529, filed Mar. 25, 2011, now U.S. Pat. No. 8,677,701, entitled “CONTROL OF PROCESSOR CACHE MEMORY OCCUPANCY”, and claims priority under 35 U.S.C. §119 to U.S. Provisional Application Ser. No. 61/341,069, filed Mar. 26, 2010, entitled “METHOD AND APPARATUS FOR THE CONTROL OF PROCESSOR CACHE MEMORY OCCUPANCY”, the disclosure of which is incorporated herein by reference.
- The subject matter described herein relates to systems, techniques, and articles for allocating cache space to entities based on their corresponding performance requirements.
- The presence/absence of instructions and data in a processor cache memory has a significant impact on the processor performance. With main memory being 100 (or more) clocks “away” from the processor, the execution speed decreases dramatically if data/instructions have to be fetched from there. This arrangement creates a challenge for real-time applications that have to guarantee a certain response time to a triggering event. Most conventional cache designs employ a structure called “set associative”, meaning there are multiple cache locations available for a certain cache address. If two memory accesses alias/reference the same location in cache, multiple data items can be stored in alternate locations (sets); otherwise the later data will vacate and occupy the space of the first data. If there are several sets (e.g., 4 or 8), and all locations are occupied, a determination must be made as to which space is to be vacated to make room for new data.
- The “vacating” (replacement) policies often used are referred to as Random and Least Recently Used (LRU). With the Random method, the cache location to be vacated is selected randomly while with the LRU method, the location containing data that has been least recently accessed is vacated making the assumption that the data least recently accessed is of less importance.
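The two conventional policies can be contrasted with a small model of a single cache set. This is a generic textbook sketch rather than anything specific to this application; the class `LRUSet` and the function `random_victim` are names chosen for the example.

```python
import random
from collections import OrderedDict


class LRUSet:
    """One cache set that vacates the least recently accessed line."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> None, oldest access first

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True                      # hit
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # vacate least recently used
        self.lines[tag] = None
        return False                         # miss


def random_victim(num_ways, rng=random):
    """Random policy: any location in the set is equally likely."""
    return rng.randrange(num_ways)


s = LRUSet(ways=2)
for tag in ("a", "b", "a", "c"):
    s.access(tag)
print(list(s.lines))   # ['a', 'c']
```

Line "b" is vacated because "a" was touched again before "c" arrived. Neither policy considers which entity owns a line.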
- Both methods fail to guarantee response time. Even if, in the case of LRU, certain data is rarely used and statistically has less impact on performance, this can offer no performance “comfort” for a particular application. In the case of hard real-time software (i.e., software in which failing to meet timing has serious consequences on system behavior, etc.), programmers use the method of reserving (locking) a portion of the cache and then re-arranging the code to ensure all critical data will permanently reside in the reserved area. This method certainly guarantees response time but at the expense of potentially “permanently” crippling the performance of other resident software.
- Because of ever increasing processor speeds and with the proliferation of multi-core implementations, caches are growing in size. With more space available and more software running, the need to “police” the cache space allocation is obvious. The traditional LRU and Random methods have provided adequate performance in the past but they are unable to keep up with the evolution of processors.
- In one aspect, methods and systems for controlling processor cache memory within a processor system are provided. For each of a plurality of entities executing in the processor system, a cache occupancy value is calculated based on associated entity identifiers. A cache replacement algorithm uses the occupancy values to determine which cache lines in the cache memory to replace when vacating cache lines.
- The occupancy value can be calculated by repeatedly counting a number of cache lines allocated to the entity offset by a number of cache lines vacated for the entity. In some situations, cache lines can be shared by multiple entities and techniques such as first to access can be used to determine to which entity the cache line is associated. The entities can be one or more of: a single task, a group of tasks, a thread, a group of threads, a single state machine, a group of state machines, a single virtual machine, and a group of virtual machines, or any combination thereof.
- Each entity can have an associated occupancy profile. The occupancy profile can include a minimum quota specifying a minimum number of cache lines the corresponding entity should occupy. The occupancy profile can include a maximum quota specifying a maximum number of cache lines the corresponding entity should occupy. Performance of at least one of the entities by the processor system can be monitored (e.g., cache hit rate, cache miss rate, execution time, etc.) so that at least one of the minimum quota and the maximum quota can be varied to affect subsequent performance for the associated entity. If it is determined that a cache hit value is below a predetermined level for one of the entities, the minimum quota can be increased for the corresponding entity. Similarly, if it is determined that a cache miss penalty is above a predetermined level for one of the entities, the maximum quota for the corresponding entity can be decreased. In some arrangements there are multiple levels of quotas.
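- The two adjustment rules above can be sketched in software as follows. This is a minimal illustration only, not the described implementation: the names `OccupancyProfile` and `adjust_quotas`, the default levels, and the fixed `step` are all assumptions introduced here, as is the choice to keep the minimum quota from rising above the maximum quota.

```python
from dataclasses import dataclass


@dataclass
class OccupancyProfile:
    min_quota: int  # cache lines the entity should at least occupy
    max_quota: int  # cache lines the entity should at most occupy


def adjust_quotas(profile, hit_rate, miss_penalty,
                  hit_rate_floor=0.90, miss_penalty_ceiling=200.0, step=16):
    """Return a new profile adjusted per the two rules in the text.

    hit_rate_floor, miss_penalty_ceiling and step are hypothetical
    "predetermined levels"; the source does not give concrete values.
    """
    min_q, max_q = profile.min_quota, profile.max_quota
    if hit_rate < hit_rate_floor:
        # Cache hit value below the predetermined level: raise the minimum
        # quota, but never above the maximum quota (assumed constraint).
        min_q = min(min_q + step, max_q)
    if miss_penalty > miss_penalty_ceiling:
        # Cache miss penalty above the predetermined level: lower the maximum
        # quota, but never below the minimum quota (assumed constraint).
        max_q = max(max_q - step, min_q)
    return OccupancyProfile(min_q, max_q)
```

A monitoring loop would call `adjust_quotas` periodically for each monitored entity and write the result back to that entity's profile.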
- The relationships of the occupancy value compared to quotas/thresholds can be encoded into an n-bit compliance value by comparing a number of lines specified by the occupancy value with the minimum quota and the maximum quota for the entity. The occupancy values for each entity can be encoded in an n-bit code stored in a compliance table. The cache replacement algorithm can read the compliance values for entities in the compliance table and compare those compliance values to select cache lines to be replaced. The cache replacement algorithm selects a cache line to replace (e.g., a victim, etc.) by taking into account, in prioritized order, whether an entity: occupies a number of cache lines substantially exceeding its corresponding maximum quota, occupies a number of cache lines exceeding its corresponding maximum quota, occupies a number of cache lines less than its corresponding maximum quota and more than its corresponding minimum quota, occupies a number of cache lines less than its minimum quota, and occupies a number of cache lines substantially less than its minimum quota.
- If the cache replacement algorithm is not able to identify a cache line to be replaced, a default method such as random or least recently used can be used for replacement selection. If the cache replacement algorithm identifies multiple cache lines eligible for replacement, a default method such as random or least recently used can be used to select which of those cache lines to replace.
- In another aspect, performance of a plurality of entities in a processor system is monitored. Each entity has an associated maximum quota specifying a maximum number of cache lines that the entity should occupy and an associated minimum quota specifying a minimum number of cache lines that the entity should occupy. The number of cache lines occupied by each entity is also determined. Thereafter, one or more of the maximum quota or the minimum quota for an entity is dynamically adjusted if such entity is performing outside desired performance criteria. A cache replacement algorithm is used to replace cache lines in the cache memory. The cache replacement algorithm selects cache lines to be replaced based on a number of cache lines occupied by an entity in relation to its associated maximum quota and/or its associated minimum quota.
- In a further aspect, systems and methods for controlling execution of entities using cache memory within a processor system are provided. With such systems and methods, performance of a plurality of entities is monitored. Thereafter, based on the monitoring, at least one of a minimum cache quota and a maximum cache quota is selectively adjusted.
- Articles of manufacture are also described that comprise computer executable instructions permanently stored on computer readable media, which, when executed by a computer, cause the computer to perform operations herein. Similarly, computer systems are also described that may include a processor and a memory coupled to the processor. The memory may temporarily or permanently store one or more programs that cause the processor to perform one or more of the operations described herein.
- The subject matter described herein provides many advantages. For example, overall entity performance can be more effectively controlled by specifying minimum and maximum cache quotas and allowing for the dynamic adjustment of both and by replacing cache lines based on cache occupancy values and/or compliance values.
- The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
- FIG. 1 is a general block diagram of a processor cache system;
- FIG. 2 is a detailed block diagram of a processor cache system; and
- FIG. 3 is a process flow diagram illustrating a method for controlling processor cache memory occupancy.
- The current subject matter provides cache quota replacement policies that select which cache locations to vacate in order to effectively control the amount of cache space an entity may occupy within a processor system. The caching techniques utilized herein can be characterized as set associative caching, which in turn is sometimes described as a compromise between a direct mapped cache and a fully associative cache where each address is mapped to a certain set of cache locations. A set of cache locations utilized by a particular entity is sometimes referred to as a working set. The address space can be divided into blocks of 2^m bytes (i.e., the cache line size), discarding the bottom m address bits. An “n-way set associative” cache with S sets has n cache locations in each set. Block b can be mapped to set “b mod S” and may be stored in any of the n locations in that set with its upper address bits stored in the tag. To determine whether block b is in the cache, the tags of set “b mod S” are searched associatively for the upper address bits. A tag as used herein can be characterized as an object that stores status (state) information and the entity ID for each cache line. Stated differently, each cache line can have an associated tag which stores upper address bits, status, and ID.
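- The address-to-set mapping described above can be illustrated with a short sketch. The function names `split_address` and `lookup` are hypothetical, and the tag store is modeled as a plain list of lists holding only the upper address bits (with `None` standing in for an invalid line); status and ID fields are omitted for brevity.

```python
def split_address(addr, m, num_sets):
    """Split a byte address into (set index, tag) for 2**m-byte lines."""
    block = addr >> m              # discard the bottom m address bits
    set_index = block % num_sets   # block b maps to set "b mod S"
    tag = block // num_sets        # remaining upper bits are kept in the tag
    return set_index, tag


def lookup(tags, addr, m):
    """Search the n ways of the indexed set for a matching tag.

    tags is a list of S sets, each a list of n stored tags (None = invalid).
    Returns (set index, way) on a hit and None on a miss.
    """
    set_index, tag = split_address(addr, m, len(tags))
    for way, stored in enumerate(tags[set_index]):
        if stored == tag:
            return set_index, way
    return None
```

For example, with 64-byte lines (m = 6) and S = 128 sets, address 0x12345 falls in block 1165, which maps to set 13 with tag 9. An address 128 × 64 bytes higher maps to the same set with tag 10, so the two blocks compete for the n locations of set 13.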
- As used herein, the term “entity” or “entities” (unless otherwise noted) refers to tasks, groups of tasks, threads, groups of threads, state machines, groups of state machines, virtual machines, groups of virtual machines and/or other software or hardware requiring cache. A task can be characterized as a set of instructions to be executed by the processor system. The entities can be instances of computer programs that are being executed, threads of execution such as one or more simultaneously, or pseudo-simultaneously, executing instances of a computer program closely sharing resources, etc. that execute within one or more processor systems (e.g., microprocessors, etc.) or virtual machines such as virtual execution environments on one or more processors. A virtual machine (VM) can be characterized as a software implementation of a machine (computer) that executes programs like a real machine. In some implementations, the entities can be state machines such as DMA controllers and the collection of commands for such state machines (e.g., DMA channels).
- As will be described further, the current subject matter can be implemented to be upwards compatible with existing cache replacement policies such as LRU or Random. If no quotas are set by the software or a decision among entities with similar relationships to their quotas is required or no meaningful decision based on quotas can be reached, the replacement policy reverts to the default policy such as LRU or Random.
- The cache quota techniques described herein can be implemented in hardware and can be used as replacements for conventional cache replacement circuits (which can be the only part of the cache hardware affected) within processor systems. The current subject matter can ensure that the cache controller speed of operation is not affected by the new circuits. Examples of processor systems that can utilize the current subject matter are described and illustrated in U.S. Pat. Pub. No. 2009/0055829 and U.S. patent application Ser. No. 13/072,596 (Attorney Docket No. 42497-502001US filed on the same day as this application) claiming priority to U.S. Pat. App. Ser. No. 61/341,069, the contents of all three applications are hereby fully incorporated by reference.
- A large portion of program execution timing variability comes from cache hit variability and high cache miss penalties. In some cases, such variability cannot be mitigated by just increasing the run time and/or clock speed. As a result, real-time deadlines or desired response times may be missed when multiple applications are sharing the cache space.
- FIG. 1 is a block diagram illustrating a processor cache system 100 having the following components: cache line owner identification (ID) 110 (the cache line is the object that includes a tag and storage for data), address tag/state 120, minimum quota 130, maximum quota 140, actual occupancy 150, compliance table 160, and victim selection 170. With such an arrangement, the amount of cache space an entity can occupy can be controlled dynamically by adjusting one or more of the cache quotas. A minimum quota 130 can be established for the working set. As with conventional cache processor systems, each location in main memory and in a cache has an index which is a unique number referencing such location. Each location has a tag that contains the index of the data in the main memory that has been cached.
- The maximum quota 140 can be used to provide performance isolation between entities and/or prevent some entities from excess use (e.g., hogging, etc.) of the cache. The maximum quota 140 can also be used to free up some cache space (by reducing the maximum quota) to allow other entities to expand (by increasing their maximum quota) their share of the cache. In some cases, a scheduler is utilized in order to prioritize execution of entities and/or to define schedules for execution of the entities (e.g., execution initiation, execution termination, etc.). In such cases, the current arrangement provides another mechanism to control the “execution speed” of an entity, by accelerating those entities which are falling behind (i.e., entities that are likely to be finalized subsequent to their corresponding execution deadline) while decelerating entities that are ahead (i.e., entities that are likely to be finalized prior to their corresponding execution deadline, etc.). This is dictated by the finite size of the corresponding cache. The cache occupancy quota method (as described herein) can keep track of how many cache lines each entity occupies and it can decide which line, within a set, will be replaced based on the line owner's (entity) Min/Max settings (as defined in the minimum quota 130 and the maximum quota 140), both of which can be dynamically adjusted at run time.
- Cache quotas can effectively partition the cache dynamically amongst entities and groups of entities, which in turn can control cache misses. Cache misses occur when the processor, while executing an entity, accesses data that is not presently in the cache. These accesses may be instruction fetches, data reads or writes. The current cache replacement algorithm can control the amount of cache space an entity is allowed/guaranteed to have and therefore the cache miss rate is controlled. More cache can result in fewer cache misses.
- Cache space isolation (i.e., a guarantee of a certain amount of cache space for the entity to respond/function accordingly, etc.) can also be used to assist with hardware convergence. Hardware convergence, in this regard, refers to reducing the number of processors by consolidating entities in fewer processors and/or utilizing one or more processors in a common system as opposed to multiple computer systems. One of the problems is that there can be real-time entities mixed with non-real-time entities in the same processor, and the real-time entities can require a guaranteed response time. A major part of guaranteeing the response time is guaranteeing that the data/instructions are in the cache. Without such a guarantee, real-time entities cannot be intermingled with non-real-time entities.
- Cache occupancy quotas can be allocated to individual entities (for critical code) or to groups of entities (sometimes referred to herein as “cache groups”) to limit the size of the implementation hardware. Hardware required to implement the cache replacement algorithm increases with the number of entities. To limit an increase in a number of entities (and thus limiting the hardware size/requirements), entities can be grouped such that there is one entry for multiple entities within a particular group.
- The effectiveness of the cache quota replacement technique, as described herein, can depend in part, on the number of sets a set-associative cache contains. With more sets, there are more options to choose from as a line replacement candidate (which can be defined by the victim selection 170).
- With reference again to FIG. 1, additional information can be added to the tag memory to include the ID 110 to which the entity(s) belongs. This number can be uniquely associated with an entity. Associated with each ID, there is a Max Quota 140 specifying the maximum number of cache lines the entity cannot exceed (unless the cache is underutilized), a Min Quota 130 specifying the minimum number of cache lines the entity is guaranteed to be able to occupy, and an Actual Occupancy 150 indicating the actual number of lines the entity occupies. These three values can be encoded and stored in the compliance table 160. When a miss occurs, the data in the compliance table 160 is used to determine the “victim” (which is then selected in the victim selection 170), namely the cache location where new data will be stored.
- FIG. 2 is a detailed block diagram of a processor cache system 200. It will be appreciated that the current subject matter is applicable to a wide variety of processor cache systems. N sets are illustrated, with an individual entity, or a group of which an entity is a member, being identified by the corresponding ID. ID-0 is the ID associated with a currently running entity while ID-1 to ID-N identify groups owning the respective cache lines. In FIG. 2, there are cache tags 210 (1 to N for an N-set associative cache), victim selection logic 220, a compliance table 230 (N+1 read ports, 1 write port, 2-bit output), a quota table 240 (2 read ports: victim, new owner), a MUX 250 that selects the victim ID, and logic 260 to encode the result of comparing the Occupancy to the Min and Max Quota values and threshold registers 270 into one or more N-bit values (as described in further detail below).
- The cache tags 210 can contain both an upper address of the data residing there and a state of the associated cache line (e.g., valid, invalid, etc.). The tag state can be augmented with the ID of the entity to which the data residing there is associated. The number of bits for the ID field is implementation-dependent (e.g., 5 or 8 bits for 32 or 256 IDs, respectively). IDs can be reclaimed, to be utilized by a new group, when the corresponding group is no longer active in the processor cache system 200.
- When a cache access (read or write) results in a cache miss, indicated by the associated address in the tag for each set not matching the cache access address, a new location for the missing data needs to be selected. The compliance table 230 can be indexed by the IDs from all tags that have valid entries. The content of the table can be a 2-bit value indicating the level of compliance of the ID owner of that cache location to the current quotas. As used herein, “over-exceeding” a quota can be characterized as exceeding a maximum quota and a maximum threshold over the maximum quota and “exceeding” a quota can be characterized as exceeding the quota but being below such maximum threshold. As used herein, “greatly under-achieving” a quota can be characterized as being less than a minimum quota and a minimum threshold below the minimum quota and “under-achieving” a quota can be characterized as being less than the minimum quota but greater than the minimum threshold.
- Max quota compliance (2 bits) can comprise:
-
- ID over-exceeds its max quota/has far more lines than its max quota
- ID exceeds its max quota/has more lines than its max quota
- ID complies with its max quota/has no more lines than its max quota
- Min quota compliance (2 bits) can comprise:
-
- ID greatly under-achieves its min quota/has far fewer lines than its min quota
- ID under-achieves its min quota/has fewer lines than its min quota
- ID complies with its min quota/has at least the number of lines specified by the min quota
- To reduce the size of the hardware, a single 2-bit compliance code can be used as shown below:
- ID over-exceeds its max quota/has far more lines than its max quota
- ID exceeds its max quota/has more lines than its max quota
- ID under-achieves its min quota/has fewer lines than its min quota
- ID in compliance/has no more lines than its max quota and at least the number of lines specified by its min quota
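- The four-state single compliance code listed above fits in two bits and could be generated as in the following sketch. The numeric values assigned to the states are an assumption (the bit patterns are not given here), as are the names `Compliance` and `encode_compliance`; `max_threshold` models the threshold register as a number of lines above the max quota.

```python
from enum import IntEnum


class Compliance(IntEnum):
    # Numeric values are assumed; the source does not give the bit patterns.
    IN_COMPLIANCE = 0
    UNDER_MIN = 1
    EXCEEDS_MAX = 2
    OVER_EXCEEDS_MAX = 3


def encode_compliance(occupancy, min_quota, max_quota, max_threshold):
    """Encode one ID's occupancy against its quotas into a 2-bit code."""
    if occupancy > max_quota + max_threshold:
        return Compliance.OVER_EXCEEDS_MAX   # far more lines than max quota
    if occupancy > max_quota:
        return Compliance.EXCEEDS_MAX        # more lines than max quota
    if occupancy < min_quota:
        return Compliance.UNDER_MIN          # fewer lines than min quota
    return Compliance.IN_COMPLIANCE          # within [min_quota, max_quota]
```

Because the comparison needs only the occupancy, the two quotas and a threshold, it can run off the critical path and leave just the small code in the multi-ported compliance table.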
- Compliance codes (C-0, C-1 . . . C-N, etc.) for all IDs in the tags indexed by the lower bits of the address which resulted in a cache miss can be read out simultaneously (N read ports) and supplied to the victim selection logic 220, which determines the set where the selected victim resides. The set selection can be provided to the cache control logic to write the ID of the missing address in the victim's tag position as the new owner of that location. The valid bit can also be set.
- The victim (i.e., the cache location to be replaced, etc.) may be selected in the following decreasing priority order:
-
- 1. Unoccupied cache line
- 2. Over-exceeds max quota
- 3. Exceeds max quota
- 4. In compliance with both max quota and min quota
- 5. Under-achieves min quota
- 6. Greatly under-achieves min quota
- 7. Default selection method such as LRU or Random.
It will be appreciated that a subset of the items above can be used for determining the victim.
- At the same time, the ID of the victim can be selected by the MUX 250 and supplied to the quota table 240, which can contain an entry for each ID in the system 200. For example, if the ID field has 5 bits, providing for 32 IDs, the quota table 240 will have 32 entries.
- A management module can set table entries in the quota table 240. Each table entry in the quota table 240 can include:
-
- 1. Occupancy (Occ) level: The number of cache lines each group or entity associated with the ID owns.
- 2. Maximum (Max) Quota: The maximum number of cache locations the group or entity associated with the ID is allowed to occupy.
- 3. Minimum (Min) Quota: The minimum number of cache locations the group or entity associated with the ID is guaranteed to occupy.
- In one implementation, there can be two registers 270 set by software to determine the thresholds for “over-exceeding the max quota” and “greatly under-achieving the min quota”.
- The Occupancy level field can be incremented for the ID owner of the newly-fetched cache data and decremented for the ID of the victim. After that, the encode logic 260 can use the new occupancy levels of the new owner and the victim and the threshold registers 270 to generate new compliance codes which can be stored in the compliance table 230 at the locations corresponding to the new owner and the victim.
- The information in the quota table 240 should, in most cases, be sufficient to select the victim. The compliance table 230 (as illustrated) can be characterized as an optimization aimed at reducing hardware size and improving circuit timing as it has only 4 bits of information albeit being N-multi-ported. Applying the same level of multi-porting to the quota table 240 could result in a large and slow structure.
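- The occupancy bookkeeping performed on each replacement (increment for the new owner, decrement for the victim, then re-encode both) can be modeled in software as below. This is a sketch under assumptions: `QuotaEntry`, `record_replacement` and the `encode` callback are hypothetical names, the tables are modeled as dictionaries keyed by ID, and `victim=None` is used here to represent filling an unoccupied line.

```python
from dataclasses import dataclass


@dataclass
class QuotaEntry:
    occupancy: int = 0  # lines currently owned by this ID
    max_quota: int = 0  # lines the ID is allowed to occupy
    min_quota: int = 0  # lines the ID is guaranteed to occupy


def record_replacement(quota_table, compliance_table, encode,
                       new_owner, victim=None):
    """Update occupancy and compliance after one cache line changes owner.

    encode is any function mapping a QuotaEntry to a compliance code.
    victim is None when the replaced line was unoccupied (assumed convention).
    """
    touched = [new_owner]
    quota_table[new_owner].occupancy += 1
    if victim is not None:
        quota_table[victim].occupancy -= 1
        touched.append(victim)
    # Only the (at most two) affected IDs need new compliance codes.
    for entity_id in touched:
        compliance_table[entity_id] = encode(quota_table[entity_id])
```

Only two table entries change per miss, which is consistent with the quota table needing just two read ports (victim and new owner) while the compliance table is read for every way in the set.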
- The victim selection can be in the time-critical path but can be sped up by the small data width of the compliance table 230. Updating the compliance table 230 through the MUX 250, quota table 240 and encoding logic 260 is not time-critical as it only needs to be done before the next cache miss occurs (or within the processing pipeline).
- Below is an example of a victim selection algorithm to be implemented by victim selection logic 220.
- Pick an un-occupied location (invalid)
- Else: Pick a location with corresponding ID that over-exceeds its max quota
- Else: Pick a location with corresponding ID that exceeds its max quota
- Else: Pick a location with corresponding ID that is in compliance with both its max quota and its min quota
- Else: Pick a location with corresponding ID that under-achieves its min quota
- Else: Pick a location with corresponding ID that greatly under-achieves its min quota
- Else: Pick location based on the default method: LRU or Random
- The above described victim selection algorithm attempts to take cache locations away from the worst offenders of their pre-set quotas. Exceeding the max quota is the worst “offense” while exceeding the min quota is not an “offense” but it means the respective entity(s) associated with the ID has more than the minimum guaranteed number of locations and therefore can afford to donate/surrender one or more.
- If none of the IDs associated with the cache lines qualifies as a donor, the selection of the victim can be done by the default mechanism of the cache controller (e.g., LRU, Random, etc.). Similarly, if there are multiple cache lines that qualify as a “victim” (i.e., there is a “tie”), a default mechanism, such as LRU or Random techniques, can be used to select a victim amongst such cache lines. Setting the max quota equal to the total number of locations in the cache and the min quota to zero will force the use of a default selection under all conditions, effectively turning the quota-based selection mechanism off in the above example.
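- The prioritized selection with a default tie-break can be sketched as follows. The sketch follows the decreasing priority order listed earlier (unoccupied, over-exceeds max, exceeds max, in compliance, under-achieves min, greatly under-achieves min); the names `Level` and `select_victim`, the numeric level values, and the representation of LRU state as an ordered list of way indices are assumptions made for illustration.

```python
import random
from enum import IntEnum


class Level(IntEnum):
    # Lower value = preferred victim; values themselves are assumed.
    UNOCCUPIED = 0
    OVER_EXCEEDS_MAX = 1
    EXCEEDS_MAX = 2
    IN_COMPLIANCE = 3
    UNDER_MIN = 4
    GREATLY_UNDER_MIN = 5


def select_victim(levels, lru_order=None, rng=random):
    """Pick the way to replace within one indexed set.

    levels: one Level per way, as read from the compliance table.
    lru_order: way indices from least to most recently used; when None,
    ties are broken by the Random default instead of LRU.
    """
    best = min(levels)
    candidates = [way for way, lvl in enumerate(levels) if lvl == best]
    if len(candidates) == 1:
        return candidates[0]
    # Tie: fall back to the cache controller's default policy.
    if lru_order is not None:
        return next(way for way in lru_order if way in candidates)
    return rng.choice(candidates)
```

If every ID has its max quota set to the whole cache and its min quota set to zero, every valid way reports `IN_COMPLIANCE`, all ways tie, and the result is decided entirely by the default policy, which matches the "quota mechanism off" behavior described above.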
- In one variation, the two “quota threshold” registers 270, one for max quota and the other for min quota, can be set by software to any arbitrary values to set the boundary between “exceeding” and “over-exceeding” as a percentage of the individual quotas. As multiplication operations may be expensive to implement, the multiplier can be restricted to a power of 2. Only two compliance codes (for the new owner and the victim) need to be generated for each cache miss, and they may be calculated sequentially using the same hardware.
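- One way to read the power-of-2 restriction is that each threshold is the quota scaled by 2^-k, so that a right shift replaces the multiplication. The sketch below assumes that reading; the function names and the interpretation of the register contents as a shift amount are not taken from the source.

```python
def over_exceed_boundary(max_quota, shift):
    """Occupancy above this value counts as "over-exceeding" the max quota.

    The margin is max_quota / 2**shift, computed with a shift only.
    """
    return max_quota + (max_quota >> shift)


def greatly_under_boundary(min_quota, shift):
    """Occupancy below this value counts as "greatly under-achieving".

    The margin is min_quota / 2**shift, computed with a shift only.
    """
    return min_quota - (min_quota >> shift)
```

With a max quota of 256 lines and a shift of 2, the margin is 25% of the quota, so an ID is treated as over-exceeding once it owns more than 320 lines. With a min quota of 64 lines and a shift of 1, an ID is treated as greatly under-achieving below 32 lines.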
- In another variation, a threshold value specific to each quota can be required (instead of a unique multiplier) and such threshold values can be stored along with the max/min quota values in the quota table 240.
- The main use of the cache quota can be to influence the cache miss ratio (or cache miss rate) for individual entities, or groups of entities, identified by their IDs. The min quota results in guaranteeing a minimum hit ratio while the max quota limits maximum occupancy (which is often correlated to a higher hit ratio). In some cases a “zero” miss ratio for a certain entity and memory region can be required. The base quota algorithm deals with “number of locations” but not where those locations are. In simple cases where all the memory locations “touched” by an entity need to stay in the cache, the min quota can be set equal to the number of locations. In the rare case when all IDs in the cache corresponding to the cache miss address have occupancy levels below their min quota and therefore none is a candidate for victim selection, the above algorithm can pick the victim based on LRU or Random method effectively making the victim own fewer locations than its guaranteed minimum. The likelihood of such occurrences can be reduced by limiting the amount of “guaranteed number of locations” for an entity.
- Cache occupancy can include mapping virtual memory to physical memory, where virtual memory refers to memory management techniques allowing tasks to utilize virtual memory address space(s) which may be separate from physical address space(s). The physical memory in effect acts as a cache, allowing a plurality of entities to share physical memory where the total size of the virtual memory space(s) may be larger than the size of physical memory, or larger than the physical memory allocated to one or more entities; thus the physical memory, and/or a portion thereof, acts as a “cache”. Entity physical memory occupancy can be managed as described elsewhere and as in the co-pending applications.
- FIG. 3 is a process flow diagram illustrating a method 300 for controlling processor cache memory within a processor system. A cache occupancy value is calculated, at 310, for each of a plurality of entities executing in the processor system. The cache occupancy value for the entity can be calculated based on the number of cache lines in the cache memory having identifiers associated with the entity. Thereafter, at 320, cache lines in the cache memory are replaced, using a cache replacement algorithm, in connection with the subsequent execution of entities. The cache replacement algorithm uses the occupancy values in order to determine which cache lines to replace.
- Various aspects of the subject matter described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
- These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
- Although a few variations have been described in detail above, other modifications are possible. For example, the logic flow depicted in the accompanying figures and described herein does not require the particular order shown, or sequential order, to achieve desirable results. Other embodiments may be within the scope of the following claims.
Claims (20)
1. A method for controlling processor cache memory within a processor system, the method comprising:
calculating, for each of a plurality of entities executing in the processor system, a cache occupancy value for the entity based on cache lines in the cache memory having identifiers associated with the entity; and
replacing, using a cache replacement algorithm that provides for varying treatments based on a number of occupied cache lines, cache lines in the cache memory required for execution of at least a portion of the entities based on the occupancy values.
2. A method as in claim 1 , wherein the calculating comprises by repeatedly counting a number of cache lines allocated to the entity offset by a number of cache lines vacated for the entity.
3. A method as in claim 1 , wherein the entities are each selected from a group comprising: a single task, a group of tasks, a thread, a group of threads, a single state machine, a group of state machines, a single virtual machine, and a group of virtual machines, and any combination thereof.
4. A method as in claim 1 , wherein each entity has an associated occupancy profile.
5. A method as in claim 4 , wherein the occupancy profile comprises a minimum quota specifying a minimum number of cache lines a corresponding entity should occupy.
6. A method as in claim 5 , wherein the occupancy profile comprises a maximum quota specifying a maximum number of cache lines the corresponding entity should occupy.
7. A method as in claim 6 , further comprising:
monitoring performance of at least one of the entities by the processor system; and
varying, based on the monitoring of the performance, at least one of the minimum quota and the maximum quota to affect subsequent performance for one or more of the at least one monitored entity.
8. A method as in claim 7 , wherein performance is based on at least one of: cache hit values and cache miss penalty.
9. A method as in claim 8 , further comprising:
determining that a cache hit value is below a predetermined level for one of the entities; and
increasing the minimum quota for the entity having the cache hit value below the predetermined level.
10. A method as in claim 8 , further comprising:
determining that a cache miss penalty is above a predetermined level for one of the entities; and
decreasing the maximum quota for the entity having the cache hit value above the predetermined level.
11. A method as in claim 5 , further comprising:
encoding, for each entity, a corresponding occupancy value into an n-bit compliance value by comparing a number of lines specified by the occupancy value with the minimum quota and the maximum quota for the entity.
12. A method as in claim 11 , wherein the occupancy values for each entity are encoded in an N-bit code that is stored in a compliance table.
13. A method as in claim 12 , wherein the cache replacement algorithm reads the compliance values for entities in the compliance table and compares those compliance values to select cache lines to be replaced.
14. A method as in claim 5 , wherein the cache replacement algorithm selects a cache line to replace by taking into account, in prioritized order, whether an entity:
occupies a number of cache lines exceeding its corresponding maximum quota above a first threshold,
occupies a number of cache lines exceeding its corresponding maximum quota above a second threshold, but below the first threshold,
occupies a number of cache lines less than its corresponding maximum quota and more than its corresponding minimum quota,
occupies a number of cache lines less than its minimum quota by more than a third threshold, and
occupies a number of cache lines substantially less than its minimum quota below a fourth threshold that is lower than the third threshold.
15. A method as in claim 1, wherein if the cache replacement algorithm is not able to identify a cache line to be replaced, a randomly selected cache line is replaced.
16. A method as in claim 1, wherein if the cache replacement algorithm is not able to identify a cache line to be replaced and there is not an empty cache line, a least recently used cache line is replaced.
17. A method as in claim 1, wherein if the cache replacement algorithm identifies multiple cache lines to be replaced and there is not an empty cache line, a randomly selected or least recently used cache line among the identified multiple cache lines is replaced.
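The compliance encoding of claims 11 to 13 and the prioritized selection of claims 14 to 17 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `Quota`, `Line`, `encode_compliance`, and `select_victim` are hypothetical, the four thresholds of claim 14 are assumed to be `max + margin`, `max`, `min`, and `min - margin`, and the choice to treat code 0 as "no victim identified" before falling back to least recently used is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Quota:
    min_lines: int  # minimum quota, in cache lines
    max_lines: int  # maximum quota, in cache lines
    margin: int     # assumed distance past a quota that counts as "far"


@dataclass
class Line:
    owner: Optional[str]  # entity identifier tagged on the line; None = empty
    last_used: int        # smaller value = less recently used


def encode_compliance(occupancy: int, q: Quota) -> int:
    """Map an occupancy (line count) to a code 0..4, which fits in 3 bits."""
    if occupancy > q.max_lines + q.margin:
        return 4  # far above the maximum quota: evict first
    if occupancy > q.max_lines:
        return 3  # above the maximum quota
    if occupancy >= q.min_lines:
        return 2  # between the minimum and maximum quota
    if occupancy >= q.min_lines - q.margin:
        return 1  # below the minimum quota
    return 0      # far below the minimum quota: evict last


def select_victim(lines: List[Line], table: Dict[str, int]) -> int:
    """Return the index of the line to replace within one non-empty set."""
    for i, line in enumerate(lines):
        if line.owner is None:
            return i  # an empty line needs no eviction
    # Compare the compliance codes of the owners of the resident lines.
    best = max(table.get(line.owner, 0) for line in lines)
    if best > 0:
        candidates = [i for i, line in enumerate(lines)
                      if table.get(line.owner, 0) == best]
    else:
        # No owner stands out: fall back to every line in the set.
        candidates = list(range(len(lines)))
    # Break ties among candidates by least recently used.
    return min(candidates, key=lambda i: lines[i].last_used)
```

In this sketch the compliance table is a plain dictionary from entity identifier to code, refreshed by calling `encode_compliance` whenever an occupancy value changes. A random choice among `candidates` would serve equally well as the tie-break.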
18. A method for controlling execution of entities using cache memory within a processor system, the method comprising:
monitoring performance of a plurality of entities and their usage of cache memory within the processor system; and
selectively adjusting at least one of a minimum cache quota and a maximum cache quota for entities based on the monitoring.
19. A method as in claim 18, wherein the monitoring and adjusting are implemented by at least one data processor.
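The monitoring and adjustment described in claims 9, 10, and 18 can be pictured as a small feedback step. The sketch below is only an illustration under stated assumptions: `EntityStats`, `adjust_quotas`, the default levels `hit_floor` and `penalty_ceiling`, the fixed `step`, and the rule that the minimum never exceeds the maximum are all hypothetical and do not come from the claims.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EntityStats:
    hit_rate: float      # fraction of accesses that hit, 0.0 to 1.0
    miss_penalty: float  # average cost of a miss, e.g. in cycles
    min_quota: int       # minimum quota, in cache lines
    max_quota: int       # maximum quota, in cache lines


def adjust_quotas(s: EntityStats, hit_floor: float = 0.8,
                  penalty_ceiling: float = 200.0, step: int = 16) -> EntityStats:
    """Return a copy of the stats with quotas adjusted from monitored values."""
    min_q, max_q = s.min_quota, s.max_quota
    if s.hit_rate < hit_floor:
        # Hit value below the predetermined level: raise the minimum quota.
        min_q += step
        max_q = max(max_q, min_q)  # assumed invariant: min <= max
    if s.miss_penalty > penalty_ceiling:
        # Miss penalty above the predetermined level: lower the maximum quota.
        max_q = max(min_q, max_q - step)
    return replace(s, min_quota=min_q, max_quota=max_q)
```

A monitor would call `adjust_quotas` periodically for each entity and write the returned quotas back to whatever structure the replacement algorithm reads.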
20. An apparatus for controlling processor cache memory within a processor system, the apparatus comprising:
means for calculating, for each of a plurality of entities executing in the processor system, a cache occupancy value for the entity based on cache lines in the cache memory having identifiers associated with the entity; and
means for replacing, using a cache replacement algorithm that provides for varying treatments based on a number of occupied cache lines, cache lines in the cache memory required for execution of at least a portion of the entities based on the occupancy values.
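The occupancy calculation recited in claim 20 counts, per entity, the cache lines whose identifier is associated with that entity. One way to picture it is a counter that is updated on every fill. This is a sketch under assumptions: the class `OccupancyTracker`, its `fill` method, and the use of string identifiers are hypothetical, and real hardware would keep such counts in dedicated registers or tables rather than in a dictionary.

```python
from collections import Counter
from typing import List, Optional


class OccupancyTracker:
    """Tracks how many cache lines each entity currently occupies."""

    def __init__(self, num_lines: int) -> None:
        # One identifier tag per cache line; None marks an empty line.
        self.tags: List[Optional[str]] = [None] * num_lines
        self.occupancy: Counter = Counter()

    def fill(self, index: int, entity: str) -> None:
        """Record that `entity` now owns the line at `index`."""
        previous = self.tags[index]
        if previous is not None:
            self.occupancy[previous] -= 1  # the old owner loses one line
        self.tags[index] = entity
        self.occupancy[entity] += 1        # the new owner gains one line
```

The per-entity counts in `occupancy` are the values a replacement algorithm would compare against each entity's quotas.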
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/218,724 US20140201456A1 (en) | 2010-03-26 | 2014-03-18 | Control Of Processor Cache Memory Occupancy |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US34106910P | 2010-03-26 | 2010-03-26 | |
US13/072,529 US8677071B2 (en) | 2010-03-26 | 2011-03-25 | Control of processor cache memory occupancy |
US14/218,724 US20140201456A1 (en) | 2010-03-26 | 2014-03-18 | Control Of Processor Cache Memory Occupancy |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/072,529 Continuation US8677071B2 (en) | 2010-03-26 | 2011-03-25 | Control of processor cache memory occupancy |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140201456A1 (en) | 2014-07-17 |
Family ID: 44657656
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/072,529 Expired - Fee Related US8677071B2 (en) | 2010-03-26 | 2011-03-25 | Control of processor cache memory occupancy |
US14/218,724 Abandoned US20140201456A1 (en) | 2010-03-26 | 2014-03-18 | Control Of Processor Cache Memory Occupancy |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/072,529 Expired - Fee Related US8677071B2 (en) | 2010-03-26 | 2011-03-25 | Control of processor cache memory occupancy |
Country Status (1)
Country | Link |
---|---|
US (2) | US8677071B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160098193A1 (en) * | 2014-10-07 | 2016-04-07 | Google Inc. | Method and apparatus for monitoring system performance and dynamically updating memory sub-system settings using software to optimize performance and power consumption |
US9740631B2 (en) | 2014-10-07 | 2017-08-22 | Google Inc. | Hardware-assisted memory compression management using page filter and system MMU |
Families Citing this family (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9235443B2 (en) | 2011-11-30 | 2016-01-12 | International Business Machines Corporation | Allocation enforcement in a multi-tenant cache mechanism |
US10140219B2 (en) | 2012-11-02 | 2018-11-27 | Blackberry Limited | Multi-port shared cache apparatus |
EP2728485B1 (en) * | 2012-11-02 | 2018-01-10 | BlackBerry Limited | Multi-Port Shared Cache Apparatus |
US9201803B1 (en) * | 2012-12-31 | 2015-12-01 | Emc Corporation | System and method for caching data |
US9400544B2 (en) | 2013-04-02 | 2016-07-26 | Apple Inc. | Advanced fine-grained cache power management |
US9396122B2 (en) * | 2013-04-19 | 2016-07-19 | Apple Inc. | Cache allocation scheme optimized for browsing applications |
CN103257900B (en) * | 2013-05-24 | 2016-05-18 | 杭州电子科技大学 | Resource reservation method for real-time task sets on multiprocessors that reduces CPU occupancy |
US10140210B2 (en) * | 2013-09-24 | 2018-11-27 | Intel Corporation | Method and apparatus for cache occupancy determination and instruction scheduling |
US20160050112A1 (en) * | 2014-08-13 | 2016-02-18 | PernixData, Inc. | Distributed caching systems and methods |
US9703951B2 (en) | 2014-09-30 | 2017-07-11 | Amazon Technologies, Inc. | Allocation of shared system resources |
US9754103B1 (en) | 2014-10-08 | 2017-09-05 | Amazon Technologies, Inc. | Micro-architecturally delayed timer |
US9378363B1 (en) | 2014-10-08 | 2016-06-28 | Amazon Technologies, Inc. | Noise injected virtual timer |
US9864636B1 (en) * | 2014-12-10 | 2018-01-09 | Amazon Technologies, Inc. | Allocating processor resources based on a service-level agreement |
US9491112B1 (en) | 2014-12-10 | 2016-11-08 | Amazon Technologies, Inc. | Allocating processor resources based on a task identifier |
US10719434B2 (en) * | 2014-12-14 | 2020-07-21 | Via Alliance Semiconductors Co., Ltd. | Multi-mode set associative cache memory dynamically configurable to selectively allocate into all or a subset of its ways depending on the mode |
EP3230874B1 (en) * | 2014-12-14 | 2021-04-28 | VIA Alliance Semiconductor Co., Ltd. | Fully associative cache memory budgeted by memory access type |
EP3274840A4 (en) * | 2015-03-27 | 2018-12-26 | Intel Corporation | Dynamic cache allocation |
GB2540761B (en) * | 2015-07-23 | 2017-12-06 | Advanced Risc Mach Ltd | Cache usage estimation |
WO2017030714A1 (en) * | 2015-08-19 | 2017-02-23 | Board Of Regents, The University Of Texas System | Evicting appropriate cache line using a replacement policy utilizing belady's optimal algorithm |
CN106484310B (en) * | 2015-08-31 | 2020-01-10 | 华为数字技术(成都)有限公司 | Storage array operation method and device |
US20190034337A1 (en) * | 2017-12-28 | 2019-01-31 | Intel Corporation | Multi-level system memory configurations to operate higher priority users out of a faster memory level |
US10795364B1 (en) | 2017-12-29 | 2020-10-06 | Apex Artificial Intelligence Industries, Inc. | Apparatus and method for monitoring and controlling of a neural network using another neural network implemented on one or more solid-state chips |
US10802488B1 (en) | 2017-12-29 | 2020-10-13 | Apex Artificial Intelligence Industries, Inc. | Apparatus and method for monitoring and controlling of a neural network using another neural network implemented on one or more solid-state chips |
US10802489B1 (en) | 2017-12-29 | 2020-10-13 | Apex Artificial Intelligence Industries, Inc. | Apparatus and method for monitoring and controlling of a neural network using another neural network implemented on one or more solid-state chips |
US10324467B1 (en) | 2017-12-29 | 2019-06-18 | Apex Artificial Intelligence Industries, Inc. | Controller systems and methods of limiting the operation of neural networks to be within one or more conditions |
US10672389B1 (en) | 2017-12-29 | 2020-06-02 | Apex Artificial Intelligence Industries, Inc. | Controller systems and methods of limiting the operation of neural networks to be within one or more conditions |
US10620631B1 (en) | 2017-12-29 | 2020-04-14 | Apex Artificial Intelligence Industries, Inc. | Self-correcting controller systems and methods of limiting the operation of neural networks to be within one or more conditions |
US11232033B2 (en) | 2019-08-02 | 2022-01-25 | Apple Inc. | Application aware SoC memory cache partitioning |
US11372769B1 (en) * | 2019-08-29 | 2022-06-28 | Xilinx, Inc. | Fine-grained multi-tenant cache management |
US11366434B2 (en) | 2019-11-26 | 2022-06-21 | Apex Artificial Intelligence Industries, Inc. | Adaptive and interchangeable neural networks |
US10691133B1 (en) | 2019-11-26 | 2020-06-23 | Apex Artificial Intelligence Industries, Inc. | Adaptive and interchangeable neural networks |
US12081646B2 (en) | 2019-11-26 | 2024-09-03 | Apex Ai Industries, Llc | Adaptively controlling groups of automated machines |
US10956807B1 (en) | 2019-11-26 | 2021-03-23 | Apex Artificial Intelligence Industries, Inc. | Adaptive and interchangeable neural networks utilizing predicting information |
US11656997B2 (en) * | 2019-11-26 | 2023-05-23 | Intel Corporation | Flexible cache allocation technology priority-based cache line eviction algorithm |
US11367290B2 (en) | 2019-11-26 | 2022-06-21 | Apex Artificial Intelligence Industries, Inc. | Group of neural networks ensuring integrity |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5943691A (en) * | 1995-12-27 | 1999-08-24 | Sun Microsystems, Inc. | Determination of array padding using collision vectors |
US20110072218A1 (en) * | 2009-09-24 | 2011-03-24 | Srilatha Manne | Prefetch promotion mechanism to reduce cache pollution |
Family Cites Families (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4914657A (en) * | 1987-04-15 | 1990-04-03 | Allied-Signal Inc. | Operations controller for a fault tolerant multiple node processing system |
JPH07253893A (en) | 1994-03-15 | 1995-10-03 | Toshiba Corp | System for managing task priority order |
US6418478B1 (en) * | 1997-10-30 | 2002-07-09 | Commvault Systems, Inc. | Pipelined high speed data transfer mechanism |
US6671762B1 (en) * | 1997-12-29 | 2003-12-30 | Stmicroelectronics, Inc. | System and method of saving and restoring registers in a data processing system |
US7386586B1 (en) | 1998-12-22 | 2008-06-10 | Computer Associates Think, Inc. | System for scheduling and monitoring computer processes |
US6493741B1 (en) * | 1999-10-01 | 2002-12-10 | Compaq Information Technologies Group, L.P. | Method and apparatus to quiesce a portion of a simultaneous multithreaded central processing unit |
US7228546B1 (en) * | 2000-01-28 | 2007-06-05 | Hewlett-Packard Development Company, L.P. | Dynamic management of computer workloads through service level optimization |
US7035932B1 (en) * | 2000-10-27 | 2006-04-25 | Eric Morgan Dowling | Federated multiprotocol communication |
US6845456B1 (en) * | 2001-05-01 | 2005-01-18 | Advanced Micro Devices, Inc. | CPU utilization measurement techniques for use in power management |
US7082610B2 (en) * | 2001-06-02 | 2006-07-25 | Redback Networks, Inc. | Method and apparatus for exception handling in a multi-processing environment |
JP2003131892A (en) | 2001-10-25 | 2003-05-09 | Matsushita Electric Ind Co Ltd | Task execution control device and method therefor |
US7539994B2 (en) * | 2003-01-03 | 2009-05-26 | Intel Corporation | Dynamic performance and resource management in a processing system |
US7653912B2 (en) * | 2003-05-30 | 2010-01-26 | Steven Frank | Virtual processor methods and apparatus with unified event notification and consumer-producer memory operations |
JP3920818B2 (en) | 2003-07-22 | 2007-05-30 | 株式会社東芝 | Scheduling method and information processing system |
US8544005B2 (en) * | 2003-10-28 | 2013-09-24 | International Business Machines Corporation | Autonomic method, system and program product for managing processes |
US7770034B2 (en) * | 2003-12-16 | 2010-08-03 | Intel Corporation | Performance monitoring based dynamic voltage and frequency scaling |
US20050198636A1 (en) * | 2004-02-26 | 2005-09-08 | International Business Machines Corporation | Dynamic optimization of batch processing |
JP4327008B2 (en) * | 2004-04-21 | 2009-09-09 | 富士通株式会社 | Arithmetic processing device and control method of arithmetic processing device |
US7281145B2 (en) * | 2004-06-24 | 2007-10-09 | International Business Machines Corporation | Method for managing resources in a CPU by allocating a specified percentage of CPU resources to high priority applications |
US8037474B2 (en) * | 2005-09-27 | 2011-10-11 | Sony Computer Entertainment Inc. | Task manager with stored task definition having pointer to a memory address containing required code data related to the task for execution |
US8286183B2 (en) * | 2005-10-22 | 2012-10-09 | Cisco Technology, Inc. | Techniques for task management using presence |
JP4781089B2 (en) * | 2005-11-15 | 2011-09-28 | 株式会社ソニー・コンピュータエンタテインメント | Task assignment method and task assignment device |
US7721127B2 (en) * | 2006-03-28 | 2010-05-18 | Mips Technologies, Inc. | Multithreaded dynamic voltage-frequency scaling microprocessor |
JP2008282150A (en) * | 2007-05-09 | 2008-11-20 | Matsushita Electric Ind Co Ltd | Signal processor and signal processing system |
US8122448B2 (en) * | 2007-06-29 | 2012-02-21 | International Business Machines Corporation | Estimation method and system |
WO2009029549A2 (en) * | 2007-08-24 | 2009-03-05 | Virtualmetrix, Inc. | Method and apparatus for fine grain performance management of computer systems |
US9143554B2 (en) * | 2008-10-13 | 2015-09-22 | Hewlett-Packard Development Company, L.P. | Control of a computing system having adjustable inputs |
- 2011-03-25: US application US13/072,529, patent US8677071B2 (en), status: not active, Expired - Fee Related
- 2014-03-18: US application US14/218,724, patent US20140201456A1 (en), status: not active, Abandoned
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160098193A1 (en) * | 2014-10-07 | 2016-04-07 | Google Inc. | Method and apparatus for monitoring system performance and dynamically updating memory sub-system settings using software to optimize performance and power consumption |
US9740631B2 (en) | 2014-10-07 | 2017-08-22 | Google Inc. | Hardware-assisted memory compression management using page filter and system MMU |
US9785571B2 (en) | 2014-10-07 | 2017-10-10 | Google Inc. | Methods and systems for memory de-duplication |
US9892054B2 (en) * | 2014-10-07 | 2018-02-13 | Google Llc | Method and apparatus for monitoring system performance and dynamically updating memory sub-system settings using software to optimize performance and power consumption |
Also Published As
Publication number | Publication date |
---|---|
US8677071B2 (en) | 2014-03-18 |
US20110238919A1 (en) | 2011-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8677071B2 (en) | Control of processor cache memory occupancy | |
CN110178124B (en) | Method and apparatus for partitioning TLB or cache allocation | |
CN110140111B (en) | Partitioning of memory system resource or performance monitoring | |
CN110168502B (en) | Apparatus and method for memory partitioning | |
US9734070B2 (en) | System and method for a shared cache with adaptive partitioning | |
CN110175136B (en) | Cache management method, cache, and storage medium | |
CN110168501B (en) | Partitioning of memory system resource or performance monitoring | |
CN110168500B (en) | Partitioning of memory system resource or performance monitoring | |
US5978888A (en) | Hardware-managed programmable associativity caching mechanism monitoring cache misses to selectively implement multiple associativity levels | |
US20100318742A1 (en) | Partitioned Replacement For Cache Memory | |
JP2011018196A (en) | Cache memory | |
GB2509755A (en) | Partitioning a shared cache using masks associated with threads to avoiding thrashing | |
US8544008B2 (en) | Data processing system and method for cache replacement using task scheduler | |
WO2017218026A1 (en) | Scaled set dueling for cache replacement policies | |
US11256625B2 (en) | Partition identifiers for page table walk memory transactions | |
CN106372007B (en) | Cache utilization estimation | |
WO2005015408A1 (en) | A method of data caching | |
KR102344008B1 (en) | Data store and method of allocating data to the data store | |
US11604733B1 (en) | Limiting allocation of ways in a cache based on cache maximum associativity value | |
US11237985B2 (en) | Controlling allocation of entries in a partitioned cache | |
KR20200080142A (en) | Bypass predictor for an exclusive last-level cache | |
US6026470A (en) | Software-managed programmable associativity caching mechanism monitoring cache misses to selectively implement multiple associativity levels | |
US20200065246A1 (en) | Coherency directory entry allocation based on eviction costs | |
US20090157968A1 (en) | Cache Memory with Extended Set-associativity of Partner Sets | |
US20070101064A1 (en) | Cache controller and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: VIRTUALMETRIX, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: GIBSON, GARY ALLEN; POPESCU, VALERI; REEL/FRAME: 032534/0342. Effective date: 20110325 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |