WO2015073009A1

WO2015073009A1 - Mark cache entry

Info

Publication number: WO2015073009A1
Application number: PCT/US2013/070125
Authority: WO
Inventors: Derek Alan Sherlock
Original assignee: Hewlett-Packard Development Company, L.P.
Priority date: 2013-11-14
Filing date: 2013-11-14
Publication date: 2015-05-21

Abstract

An interrupt mask may be examined to determine if interrupts are enabled or disabled, if at least one of a load and store operation is being executed by a processor of a device. A cache entry of a cache associated with at least one of an instruction and operand address of the at least one of load and store operation may be marked, if the interrupts are disabled when the interrupt mask is examined. The marked cache entry may be locked into the cache such that the marked cache entry is not selected for eviction during subsequent cache eviction activity.

Description

MARK CACHE ENTRY

BACKGROUND

[0001] In some operating systems (OSs), interrupt handlers may be divided into two parts: a First-Level Interrupt Handler (FLIH) and a Second-Level Interrupt Handler (SLIH). A job of the FLIH may be to quickly service the interrupt, or to record platform-specific critical information which is only available at the time of the interrupt, and to schedule the execution of a SLIH for further long-lived interrupt handling.

[0002] FLIHs mask interrupts and may cause jitter in process execution. Reducing the jitter is generally important for real-time OSs, since these OSs are usually required to maintain a guarantee that execution of specific code will complete within an agreed amount of time. To reduce jitter and to reduce the potential for losing data from masked interrupts, manufacturers, vendors, and/or programmers are challenged to reduce the execution time of a FLIH, instead moving as much interrupt handling as possible to the SLIH.

BRIEF DESCRIPTION OF THE DRAWINGS

[0003] The following detailed description references the drawings, wherein:

[0004] FIG. 1 is an example block diagram of a device to mark a cache entry;

[0005] FIG. 2 is another example block diagram of a device to mark a cache entry;

[0006] FIG. 3 is an example block diagram of a computing device including instructions for marking a cache entry; and

[0007] FIG. 4 is an example flowchart of a method for marking a cache entry.

DETAILED DESCRIPTION

[0008] Specific details are given in the following description to provide a thorough understanding of embodiments. However, it will be understood that embodiments may be practiced without these specific details. For example, systems may be shown in block diagrams in order not to obscure embodiments in unnecessary detail. In other instances, well-known processes, structures and techniques may be shown without unnecessary detail in order to avoid obscuring embodiments.

[0009] Cache load or store miss latency may have a variable amount of impact on system performance. For example, in deeply pipelined or threaded application code, a cache miss may have relatively little performance cost, because the CPU is not blocked from doing other useful work during the resulting memory access latency. On the other hand, cache misses in serialized code may be costly. For instance, cache misses that occur during interrupts-masked code, as found in first-level interrupt handlers (FLIHs), may have high latency costs. This is because such FLIHs tend to serialize all activity system-wide; even other interrupt handlers and OS scheduler activity are inhibited.

[0010] In scalable chipset/system designs, a persistent design difficulty is that RAS architecture (scalability fabric, route-failover capability, hierarchical timeouts, etc.) may need to guarantee low latency completion of every CPU cache miss. This is because, although only a tiny fraction of all CPU cache misses may be in critical sections, such as for FLIHs, external hardware must assume a worst-case scenario in case the delay impacts an FLIH and results in an operating system (OS) crash.

[0011] Examples may prevent or reduce cache misses in FLIHs. An example device may include an interrupt unit and a marking unit. The interrupt unit may examine an interrupt mask to determine if interrupts are one of enabled and disabled, if at least one of a load and store operation is being executed by a processor of the device. The marking unit may mark a cache entry of a cache associated with at least one of an instruction and operand address of the at least one of load and store operation, if the interrupts are disabled when the interrupt mask is examined. The marked cache entry may be locked into the cache such that the marked cache entry is not selected for eviction during subsequent cache eviction activity.

[0012] These marked cache entries may allow for automatic preservation of code and data accesses associated with critical FLIH code sections in the cache, as opposed to less critical code and data that may be evicted. As a result, subsequent critical-section misses after an initial one may be eliminated. Thus, examples may ensure that, in steady-state operation, there are few or no cache misses in FLIHs.

[0013] In addition to performance benefits (since interrupt handlers are points of serialization of activity), examples may also allow relaxed external latency for worst-case cache-miss-handling, thus easing timeout and allowing for improved RAS design. While relaxed "worst-case miss" latency constraints are not the same as relaxed "typical-case cache-miss" constraints, typical-case latency will impact performance, and thus is an important design consideration.

[0014] Referring now to the drawings, FIG. 1 is an example block diagram of a device 100 to mark a cache entry. The device 100 may interface with or be included in any type of device including a cache, a notebook computer, a desktop computer, an all-in-one system, a server, a network device, a wireless device, a storage device, a mobile device, a thin client, a retail point of sale device, a gaming device, a scientific instrument, and the like.

[0015] In FIG. 1, the device 100 is shown to include an interrupt unit 110, a marking unit 120 and an interrupt mask 130. The interrupt unit 110 and marking unit 120 may include, for example, a hardware device including electronic circuitry for implementing the functionality described below, such as control logic and/or memory. In addition or as an alternative, the interrupt unit 110 and marking unit 120 may be implemented as a series of instructions encoded on a machine-readable storage medium and executable by a processor. The interrupt unit 110 may examine the interrupt mask 130 to determine if interrupts are one of enabled and disabled, if at least one of a load and store operation is being executed by a processor (not shown) of the device 100.

[0016] An interrupt may be a signal to the processor emitted by hardware or software indicating an event that needs immediate attention. The interrupt may alert the processor to a high-priority condition requiring the interruption of the current code the processor is executing, e.g. the current thread. The processor may respond by suspending its current activities, saving its state, and/or executing a small program called an interrupt handler (or interrupt service routine, ISR) to deal with the event. This interruption may be temporary, and after the interrupt handler finishes, the processor resumes execution of the previous thread.

[0017] Processors may have an internal interrupt mask which allows software to ignore all external hardware interrupts while it is set. Setting or clearing this mask may be faster than accessing an interrupt mask register (IMR) in a programmable interrupt controller (PIC) or disabling interrupts in the device itself. The IMR (not shown) specifies which interrupts are to be ignored and not acknowledged. A hardware interrupt may be ignored by setting a bit in an interrupt mask register's (IMR) bit-mask. The PIC (not shown) may set or reset the bits of the IMR.
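
As an illustration only, the bit-mask behavior of an IMR described above can be sketched as follows. The 8-line register width and the function names `mask_line`, `unmask_line` and `is_masked` are assumptions made for this sketch and do not correspond to any particular PIC.

```python
# Hypothetical IMR model: bit N set means interrupt line N is ignored.
# Names and register width are illustrative assumptions.

def mask_line(imr: int, line: int) -> int:
    """Return the IMR value with the bit for `line` set (interrupt ignored)."""
    return imr | (1 << line)


def unmask_line(imr: int, line: int) -> int:
    """Return the IMR value with the bit for `line` cleared (interrupt acknowledged)."""
    return imr & ~(1 << line)


def is_masked(imr: int, line: int) -> bool:
    """True if interrupts on `line` are currently ignored."""
    return bool(imr & (1 << line))


imr = 0b00000000          # all lines enabled
imr = mask_line(imr, 3)   # ignore interrupt line 3
```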

[0018] In some cases, such as the x86 architecture, disabling and enabling interrupts on the processor itself may act as a memory barrier. The interrupt mask 130 may refer to any combination of the above described interrupt masks, such as the internal interrupt mask or the IMR. The interrupt mask 130 may include or be stored upon any electronic, magnetic, optical, or other physical storage device, such as Random Access Memory (RAM) and the like.

[0019] The term load operation may refer to a process for loading a value from main memory (not shown), such as copying data from the main memory or a cache to a register (not shown). The term store operation may refer to a process for storing a value to the main memory or cache, such as copying the data from the register to the main memory or cache. The cache may be used by the processor to reduce the average time to access the main memory, such as dynamic random access memory (DRAM). The cache may be a smaller, faster memory which stores copies of the data from frequently used main memory locations. As long as most memory accesses are cached memory locations, the average latency of memory accesses may be closer to the cache latency than to the latency of main memory.
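
The relationship between hit rate and average latency noted above can be illustrated with a simple two-level model. This is a sketch under the assumption that every miss costs exactly the main memory latency; the function name and the latency figures are hypothetical.

```python
def average_access_latency(hit_rate: float,
                           cache_latency_ns: float,
                           memory_latency_ns: float) -> float:
    """Weighted average latency for a simplified cache-plus-main-memory model.

    Assumes a hit costs `cache_latency_ns` and a miss costs `memory_latency_ns`.
    """
    return hit_rate * cache_latency_ns + (1.0 - hit_rate) * memory_latency_ns


# With most accesses hitting the cache, the average stays near the cache latency.
mostly_cached = average_access_latency(0.95, 1.0, 100.0)
rarely_cached = average_access_latency(0.10, 1.0, 100.0)
```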

[0020] The marking unit 120 may mark a cache entry of the cache associated with at least one of an instruction and operand address of the at least one of load and store operation, if the interrupts are disabled when the interrupt mask 130 is examined. The marked cache entry may be locked into the cache such that the marked cache entry is not selected for eviction during subsequent cache eviction activity. The cache entry associated with the at least one of instruction and operand address may not be marked if the interrupts are enabled when the interrupt mask 130 is examined. The interrupt unit 110, the marking unit 120 and the cache will be explained in greater detail below, with respect to FIG. 2.

[0021] FIG. 2 is another example block diagram of a device 200 to mark a cache entry. The device 200 may interface with or be included in any type of device including a cache, a notebook computer, a desktop computer, an all-in-one system, a server, a network device, a wireless device, a storage device, a mobile device, a thin client, a retail point of sale device, a gaming device, a scientific instrument, and the like.

[0022] The device 200 of FIG. 2 may include the functionality and/or hardware of the device 100 of FIG. 1. For example, the device 200 includes the interrupt unit 110, the marking unit 120 and the interrupt mask 130. Here, the device 200 is shown to interface with a cache 140 and a processor 150. While FIG. 2 shows the device 200 to be separate from the cache 140 and processor 150, examples may include the device 200 to be integrated with the cache 140 and/or processor 150.

[0023] The cache 140 may include or be stored upon any electronic, magnetic, optical, or other physical storage device, such as Random Access Memory (RAM) like synchronous dynamic random-access memory (SDRAM), dynamic random access memory (DRAM), or graphics RAM. The processor 150 may be at least one central processing unit (CPU), at least one semiconductor-based microprocessor, at least one graphics processing unit (GPU), a microcontroller, special purpose logic hardware controlled by microcode, and/or the like.

[0024] Data is transferred between a main memory (not shown) and the cache 140 in blocks of fixed size, called cache lines. When a cache line is copied from main memory into the cache, a cache entry 142 may be created. While the cache 140 is shown to have four cache entries 142-1 to 142-4, examples may include more or fewer than four cache entries 142. The cache entry 142 may include the copied data as well as the requested memory location, such as a tag. The copied data may include data that is likely to be used again or frequently used, such as application files, user documents, and/or metadata. The tag may contain at least part of the address of the actual data fetched from the main memory.

[0025] When the processor 150 seeks to read from or write to a location in main memory, the processor 150 may first check for a corresponding entry 142 in the cache 140. The cache 140 may check for the contents of the requested memory location in any cache lines that might contain that address. If the processor 150 finds that the memory location is in the cache 140, a cache hit has occurred. However, if the processor 150 does not find the memory location in the cache 140, a cache miss has occurred.

[0026] For a cache hit, the processor 150 may immediately read or write the data in the cache line. For a cache miss, the cache 140 may allocate a new entry 142, and copy in data from the main memory. Then, the request may be fulfilled from the contents of the cache 140. The proportion of accesses that result in a cache hit is known as the hit rate, and can be a measure of the effectiveness of the cache 140 for a given program or algorithm.
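
The hit and miss handling described above may be sketched, purely for illustration, as a small software model. The class `SimpleCache`, its 64-byte line size, and the use of a dictionary as main memory are assumptions of this sketch; capacity limits and eviction are deliberately left out.

```python
class SimpleCache:
    """Toy cache model: entries are keyed by a tag derived from the address."""

    def __init__(self, line_size: int = 64):
        self.line_size = line_size  # assumed cache line size in bytes
        self.entries = {}           # tag -> data copied from main memory
        self.hits = 0
        self.misses = 0

    def access(self, address: int, main_memory: dict):
        """Look up `address`; on a miss, allocate an entry and copy in the line."""
        tag = address // self.line_size
        if tag in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[tag] = main_memory.get(tag, 0)
        # Hit or miss, the request is fulfilled from the cache contents.
        return self.entries[tag]

    def hit_rate(self) -> float:
        """Proportion of accesses so far that were cache hits."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


memory = {0: "line-0", 1: "line-1"}
cache = SimpleCache()
cache.access(0, memory)    # miss: line 0 is copied in
cache.access(8, memory)    # hit: same cache line as address 0
cache.access(64, memory)   # miss: line 1 is copied in
```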

[0027] While the cache 140 is shown as a single cache, examples of the cache 140 may include multiple independent caches, such as an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and/or a translation lookaside buffer (TLB) to speed up virtual-to-physical address translation for both executable instructions and data. Further, the data cache may be organized as a hierarchy of more cache levels (L1, L2, etc.).

[0028] As noted above, the interrupt unit 110 may examine the interrupt mask 130 to determine if interrupts are one of enabled and disabled, if the at least one of a load and store operation is being executed by the processor 150. For example, interrupts may be disabled or masked during execution of first level interrupt handler (FLIH) code. As explained earlier, FLIH code is usually executed immediately in response to hardware interrupts, and masks other interrupts in the meantime. The invoked FLIH code may serialize all activity of an operating system (OS) (not shown) of the device 200. During execution of FLIH code, load and store operations may be accessing critical sections of the cache 140, where a cache miss during a subsequent invocation of the FLIH code may result in high latency costs.

[0029] The marking unit 120 may mark a cache entry 142, such as the first cache entry 142-1, of the cache 140 associated with at least one of the instruction and operand address of the at least one of load and store operation, if the interrupts are disabled when the interrupt mask 130 is examined. The cache entry 142 associated with the at least one of instruction and operand address of at least one of load and store operation is not marked if the interrupts are enabled when the interrupt mask 130 is examined. The marked cache entry 142-1 may be locked in the cache 140 and therefore not selected for eviction from the cache 140 in response to subsequent cache eviction activity. Hence, the marked cache entry may prevent a cache miss for a subsequent invocation of the FLIH code during steady-state operation.

[0030] As shown in FIG. 2, each of the cache entries 142-1 to 142-4 is associated with a marking bit 144-1 to 144-4. The marking bit 144 is to be set for the marked cache entry 142-1. Otherwise, the marking bit 144 may remain reset for unmarked cache entries 142-2 to 142-4. While only one cache entry 142-1 is shown as marked, a plurality of cache entries 142 may be marked, such as during invocation of FLIH code.
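
By way of a non-limiting software sketch, the per-entry marking bit may be modeled as a boolean field. The `CacheEntry` class and the `maybe_mark` helper are hypothetical names introduced here; an actual implementation may be in hardware.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    tag: int
    data: int = 0
    marked: bool = False  # plays the role of a marking bit 144


def maybe_mark(entry: CacheEntry, interrupts_disabled: bool) -> bool:
    """Set the marking bit only if interrupts are disabled at access time.

    An access made with interrupts enabled leaves the bit unchanged.
    """
    if interrupts_disabled:
        entry.marked = True
    return entry.marked


# Four entries, analogous to cache entries 142-1 to 142-4.
entries = [CacheEntry(tag=i) for i in range(4)]
maybe_mark(entries[0], interrupts_disabled=True)    # accessed with interrupts masked
maybe_mark(entries[1], interrupts_disabled=False)   # ordinary access, stays unmarked
```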

[0031] The marked cache entry 142-1 may include at least one of code and data associated with FLIH code. As noted above, the marked cache entry 142-1 may relate to at least one of an operand address and an instruction address of the at least one of load and store operation. For example, assume the following instruction is being executed: STORE R3 to address 12345. Also, assume the STORE instruction resides at address 67890 in the main memory. Then, 12345 may be the operand address and 67890 may be the instruction address. These addresses may be stored at separate cache entries 142, both of which may need to be marked during FLIH code execution to prevent subsequent eviction from the cache. This is because eviction of either entry may result in a cache miss for a subsequent invocation by the FLIH code.
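
The STORE example above can be worked through in a short sketch. The 64-byte line size and the function `lines_to_mark` are assumptions for illustration; the sketch only computes which cache lines would hold the instruction address 67890 and the operand address 12345.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


def lines_to_mark(instruction_address: int,
                  operand_address: int,
                  interrupts_disabled: bool,
                  line_size: int = LINE_SIZE) -> set:
    """Return the cache line numbers to mark for one load or store operation.

    Nothing is marked when interrupts are enabled.
    """
    if not interrupts_disabled:
        return set()
    return {instruction_address // line_size, operand_address // line_size}


# STORE R3 to address 12345, with the instruction residing at address 67890.
marked_lines = lines_to_mark(67890, 12345, interrupts_disabled=True)
```

Under the assumed line size, the instruction falls in line 1060 and the operand in line 192, so two separate entries would be marked.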

[0032] Cache eviction may normally occur when the cache 140 evicts one of the existing cache entries 142 in order to make room for a new cache entry on a cache miss. The cache 140 may choose from various types of replacement policies to choose the cache entry 142 to evict, such as least-recently used (LRU). However, as noted above, the cache 140 may be prevented from evicting a marked cache entry 142-1. Nonetheless, if eviction is required due to the cache 140 reaching capacity, at least one of the cache entries 142 will be evicted. Thus, the marked cache entry 142-1 may be evicted despite being locked to avoid deadlock, if all ways of a cache set have been locked. Further, the marked cache entry 142-1 may be snooped away by another processor or other coherent entity, to avoid deadlock.
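
One possible victim-selection routine consistent with the above is sketched here, assuming an LRU policy within a single cache set. The function `choose_victim` and the use of an ordered mapping to track recency are assumptions of the sketch. Choosing the least-recently used marked entry when all ways are locked is one option among several.

```python
from collections import OrderedDict


def choose_victim(cache_set: OrderedDict):
    """Pick the tag to evict from one non-empty cache set.

    `cache_set` maps tag -> marked flag, ordered from least to most recently used.
    Marked entries are skipped; if every way is marked, the least recently used
    entry is evicted anyway so that the cache cannot deadlock.
    """
    for tag, marked in cache_set.items():
        if not marked:
            return tag
    return next(iter(cache_set))


# Tag 10 is least recently used but marked, so tag 11 is chosen instead.
mixed_set = OrderedDict([(10, True), (11, False), (12, False)])
# Every way is marked: fall back to the least recently used entry.
locked_set = OrderedDict([(20, True), (21, True)])
```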

[0033] The marking unit 120 may subsequently unmark the marked cache entry 142-1 after it is no longer needed, such as if a memory management unit (MMU) page mapping (not shown) associated with the marked cache entry 142-1 is decommissioned. Second level interrupt handler (SLIH) code may maintain the interrupts as masked while the SLIH accesses a marked cache entry 142-1 from a work queue. Thus, examples may reduce memory latencies of the critical section of the cache 140.

[0034] FIG. 3 is an example block diagram of a computing device including instructions for marking a cache entry. In the embodiment of FIG. 3, the computing device 300 includes a processor 310 and a machine-readable storage medium 320. The machine-readable storage medium 320 further includes instructions 322, 324 and 326 for marking a cache entry.

[0035] The computing device 300 may be included in or part of, for example, a microprocessor, a controller such as a memory controller, a memory module or device, a notebook computer, a desktop computer, an all-in-one system, a server, a network device, a wireless device, or any other type of device capable of executing the instructions 322, 324 and 326. In certain examples, the computing device 300 may include or be connected to additional components such as memories, controllers, etc.

[0036] The processor 310 may be at least one central processing unit (CPU), at least one semiconductor-based microprocessor, at least one graphics processing unit (GPU), a microcontroller, special purpose logic hardware controlled by microcode or other hardware devices suitable for retrieval and execution of instructions stored in the machine-readable storage medium 320, or combinations thereof. The processor 310 may fetch, decode, and execute instructions 322, 324 and 326 to implement marking the cache entry. As an alternative or in addition to retrieving and executing instructions, the processor 310 may include at least one integrated circuit (IC), other control logic, other electronic circuits, or combinations thereof that include a number of electronic components for performing the functionality of instructions 322, 324 and 326.

[0037] The machine-readable storage medium 320 may be any electronic, magnetic, optical, or other physical storage device that contains or stores executable instructions. Thus, the machine-readable storage medium 320 may be, for example, Random Access Memory (RAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage drive, a Compact Disc Read Only Memory (CD-ROM), and the like. As such, the machine-readable storage medium 320 can be non-transitory. As described in detail below, the machine-readable storage medium 320 may be encoded with a series of executable instructions for marking the cache entry.

[0038] Moreover, the instructions 322, 324 and 326 when executed by a processor (e.g., via one processing element or multiple processing elements of the processor) can cause the processor to perform processes, such as, the process of FIG. 4. For example, the determine instructions 322 may be executed by the processor 310 to determine if at least one of a load and store operation is being executed by the processor of the device. The check instructions 324 may be executed by the processor 310 to check if interrupts are masked, if the at least one of load and store operation is being executed.

[0039] The mark instructions 326 may be executed by a processor 310 to mark a cache entry of a cache (not shown) of the device that is associated with the at least one of an instruction and operand address of the at least one of the load and store operation, if the checked interrupts are masked. The marked cache entry may not be selected during subsequent cache eviction of cache entries from the cache. The at least one of instruction and operand address included in the marked cache entry may be related to first level interrupt handler (FLIH) code.

[0040] FIG. 4 is an example flowchart of a method 400 for marking a cache entry. Although execution of the method 400 is described below with reference to the device 200, other suitable components for execution of the method 400 can be utilized, such as the device 100. Additionally, the components for executing the method 400 may be spread among multiple devices (e.g., a processing device in communication with input and output devices). In certain scenarios, multiple devices acting in coordination can be considered a single device to perform the method 400. The method 400 may be implemented in the form of executable instructions stored on a machine-readable storage medium, such as storage medium 320, and/or in the form of electronic circuitry.

[0041] At block 410, the device 200 determines if at least one of a load and store operation is being executed by a processor 150 of the device 200. If the at least one of a load and store operation is not being executed, the device 200 continues to wait at block 410. However, if the at least one of a load and store operation is being executed, the method 400 flows to block 420. At block 420, the device 200 examines an interrupt mask 130, and then proceeds to block 430.

[0042] If interrupts are not masked when the at least one of load and store operation is being executed, the method 400 flows from block 430 back to block 410. However, if interrupts are masked when the at least one of load and store operation is being executed, the method 400 flows from block 430 to block 440. At block 440, the device 200 marks a cache entry 142-1 of a cache 140 associated with at least one of an instruction and operand address of the at least one of load and store operation, if the interrupts are masked when the interrupt mask 130 is examined. The marked cache entry 142-1 may be exempt from selection during subsequent cache eviction activity of the cache 140. The marking, at block 440, may set a bit 144-1 of the cache 140 associated with the cache entry 142-1 to mark the cache entry 142-1. The cache entry 142-1 is not marked if the interrupts are not masked when the interrupt mask 130 is examined. The unmarked cache entry 142-1 is not exempt from selection during subsequent cache eviction activity.
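
The flow through blocks 410 to 440 may be summarized, as an illustrative sketch only, by a single function that handles one operation at a time. The function `method_400_step`, its string return values, and the use of a set of tags to stand in for the marking bits are all assumptions of this sketch.

```python
def method_400_step(op_kind: str,
                    interrupts_masked: bool,
                    marked_tags: set,
                    entry_tag: int) -> str:
    """Run one pass through blocks 410-440 for a single operation."""
    # Block 410: only load and store operations are of interest.
    if op_kind not in ("load", "store"):
        return "wait"
    # Blocks 420 and 430: the examined interrupt mask is passed in as a flag.
    if not interrupts_masked:
        return "not_marked"
    # Block 440: set the marking bit for the associated cache entry.
    marked_tags.add(entry_tag)
    return "marked"


marks = set()
first = method_400_step("add", True, marks, entry_tag=1)     # not a load or store
second = method_400_step("load", False, marks, entry_tag=1)  # interrupts enabled
third = method_400_step("store", True, marks, entry_tag=1)   # interrupts masked
```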

[0043] The marked cache entry 142-1 may be associated with first level interrupt handler (FLIH) code. The marked cache entry 142-1 in the cache 140 may prevent a cache miss for a subsequent invocation of the FLIH code during steady-state operation. The invoked FLIH code may serialize all activity of an operating system (OS) of the device 200. The marked cache entry 142-1 may include at least one of code and data associated with the FLIH code.

Claims

CLAIMS We claim:

1. A device, comprising:

an interrupt unit to examine an interrupt mask to determine if interrupts are one of enabled and disabled, if at least one of a load and store operation is being executed by a processor of the device; and

a marking unit to mark a cache entry of a cache associated with at least one of an instruction and operand address of the at least one of load and store operation, if the interrupts are disabled when the interrupt mask is examined, wherein

the marked cache entry is locked into the cache such that the marked cache entry is not selected for eviction during subsequent cache eviction activity.

2. The device of claim 1, wherein the cache entry associated with the at least one of instruction and operand address is not marked if the interrupts are enabled when the interrupt mask is examined.

3. The device of claim 1, wherein the marked cache entry includes at least one of code and data associated with first level interrupt handler (FLIH) code.

4. The device of claim 3, wherein the marked cache entry prevents a cache miss for a subsequent invocation of the FLIH code during steady-state operation.

5. The device of claim 4, wherein,

the invoked FLIH code serializes all activity of an operating system (OS) of the device, and

the interrupt unit is to examine the interrupt mask only if FLIH code is being executed by the processor.

6. The device of claim 1, wherein,

each cache entry of the cache is associated with a marking bit, and the marking bit is to be set for the marked cache entry.

7. The device of claim 1, wherein the marking unit is to unmark the marked cache entry if a memory management unit (MMU) page mapping associated with the marked cache entry is decommissioned.

8. The device of claim 1, wherein second level interrupt handler (SLIH) code is to maintain the interrupts as disabled while the SLIH accesses a marked cache entry from a work queue.

9. A method, comprising:

determining if at least one of a load and store operation is being executed for execution by a processor of a device;

examining an interrupt mask, if the at least one of load and store operation is being executed; and

marking a cache entry of a cache associated with at least one of an instruction and operand address of the at least one of load and store operation, if the interrupts are masked when the interrupt mask is examined, wherein

the marked cache entry is exempt from selection during subsequent cache eviction activity.

10. The method of claim 9, wherein,

the cache entry is not marked if the interrupts are not masked when the interrupt mask is examined, and

the unmarked cache entry is not exempt from selection during subsequent cache eviction activity.

11. The method of claim 9, wherein,

the marked cache entry is associated with first level interrupt handler (FLIH) code, and

the marked cache entry in the cache is to prevent a cache miss for a subsequent invocation of the FLIH code during steady-state operation.

12. The method of claim 11, wherein,

the invoked FLIH code serializes all activity of an operating system (OS) of the device, and

the marked cache entry includes at least one of code and data associated with the FLIH code.

13. The method of claim 9, wherein the marking sets a bit associated with a cache line of the cache to mark the cache entry.

14. A non-transitory computer-readable storage medium storing instructions that, if executed by a processor of a device, cause the processor to:

determine if at least one of a load and store operation is being executed by the processor of the device;

check if interrupts are masked, if the at least one of the load and store operation is being executed; and

mark a cache entry of a cache of the device that is associated with at least one of an instruction and operand address of the at least one of the load and store operation, if the checked interrupts are masked, wherein

the marked cache entry is not selected for eviction during subsequent cache eviction of cache entries from the cache.

15. The non-transitory computer-readable storage medium of claim 14, wherein the at least one of instruction and operand address is related to first level interrupt handler (FLIH) code.