CN117396828A - System support for persistent cache flushing

System support for persistent cache flushing

Info

Publication number
CN117396828A
Authority
CN (China)
Prior art keywords
energy, power, threshold, determining, persistent
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280038692.1A
Other languages
Chinese (zh)
Inventor
B. J. Fuller
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority claimed from US 17/704,889 (published as US 2022/0318152 A1)
Application filed by Oracle International Corp
Priority claimed from PCT/US2022/022321 (published as WO 2022/212358 A1)
Publication of CN117396828A

Abstract

Techniques for flushing volatile system memory to persistent memory after loss of alternating current (AC) power are described herein. In some embodiments, these techniques include implementing an extended retention window, long enough to complete a full flush of the processor caches and memory controller buffers after a power-down event, using the energy available in the bulk capacitors of one or more power supplies. The voltage on the bulk capacitors within one or more power supply units may be monitored, and a notification may be triggered when a programmable threshold voltage is detected on a bulk capacitor. The system may configure the voltage threshold to indicate that the minimum amount of energy needed to successfully complete the cache flush operation is available. These techniques allow volatile system caches to be flushed without reliance on a Battery Backup Unit (BBU), which can be cumbersome to install and maintain.

Description

System support for persistent cache flushing
Technical Field
The present disclosure relates to cache management techniques. In particular, the present disclosure relates to techniques for flushing volatile cache state upon a power-down event.
Background
Modern server designs often incorporate persistent memory (PMEM), such as Data Center Persistent Memory Modules (DCPMMs) or nonvolatile dual inline memory modules (NVDIMMs), into the memory architecture. Compared to block-based persistent media, persistent memory has several advantages, including low-latency random access times and the ability to perform Remote Direct Memory Access (RDMA) operations directly on the persistent memory.
Committing data directly to persistent memory devices is expensive, so servers with persistent memory often support treating some volatile on-chip state as persistent in order to limit the number of explicit commit operations that software needs to perform. If the system can guarantee that volatile buffer state will be flushed to persistent memory on every reset or power transition that would otherwise destroy the buffered content, a program can treat any data committed to those volatile buffers as persistent. One such method for flushing volatile buffers is known as asynchronous dynamic random access memory refresh (ADR), whereby the volatile buffers in the memory controller are included in the persistence domain. Under this method, the system retains the small amount of energy necessary to keep the system powered long enough after a power loss to flush the volatile memory controller buffers out to the persistent memory devices.
Another technique, known as enhanced ADR (eADR) or persistent cache flush (PCF), expands the volatile state that can be treated as persistent to include all processor caches and on-chip buffers. The processor caches are typically several orders of magnitude larger than the volatile memory buffers in the memory controller, so the system requires significantly more energy to complete the flush process. A server supporting persistent cache flush must include some form of auxiliary energy storage to power the system during the persistent cache flush operation. Some servers include a Battery Backup Unit (BBU) to provide enough energy to flush data out of the processor caches into persistent memory after a power loss. BBUs can store a large amount of energy; however, they face many challenges, including a large footprint, limited ability to supply the high currents required by server systems, thermal constraints, and additional cost.
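To illustrate the difference from software's perspective, the sketch below shows how a store to persistent memory would be committed under each scheme, using the x86 CLWB/SFENCE intrinsics; the helper function and its pcf_enabled flag are illustrative assumptions rather than anything defined in this disclosure.

```c
#include <immintrin.h>
#include <stdint.h>

/* Commit one value so that it survives power loss.
 * Under ADR, only the memory controller buffers are in the persistence
 * domain, so software must explicitly write the cache line back toward the
 * memory controller and fence the write-back. Under eADR/PCF, the processor
 * caches are also in the persistence domain, so an ordinary store is
 * durable once it becomes globally visible. */
static void commit_u64(uint64_t *pmem_dst, uint64_t value, int pcf_enabled)
{
    *pmem_dst = value;       /* store into a cached mapping of PMEM */
    if (!pcf_enabled) {
        _mm_clwb(pmem_dst);  /* write the line back to the memory controller */
        _mm_sfence();        /* order the write-back before continuing */
    }
    /* with PCF enabled, the platform flushes the caches on power loss */
}
```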
Asynchronous hardware reset events further complicate the implementation of a persistent cache flush mechanism. An asynchronous hardware reset is typically accomplished by directly asserting a reset request pin and may not be detected by the power sequencing logic of the processor or chipset. If the system allows an externally initiated reset event to trigger a hardware reset without invoking a persistent flush handler prior to the reset, the persistent memory state may not be properly flushed. If an application relies on persistent cache flush support that the platform hardware implements incompletely, application data may be lost or corrupted during a power interruption event.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Drawings
Embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. It should be noted that references to "an embodiment" or "one embodiment" in this disclosure are not necessarily to the same embodiment, and they mean at least one. In the drawings:
FIG. 1 illustrates a system for performing persistent cache flush operations, according to some embodiments.
FIG. 2 illustrates an example set of operations for performing a persistent cache flush operation to maintain a persistent memory state, according to some embodiments.
FIG. 3 illustrates an example set of operations for managing persistent cache flush operations in a system having multiple power supplies, according to some embodiments.
FIG. 4 illustrates an example system for managing multiple power sources, according to some embodiments.
FIG. 5 illustrates an example timing diagram with staggered warning signals from different power supply units, according to some embodiments.
FIG. 6 illustrates an example set of operations for handling externally initiated asynchronous reset events, according to some embodiments.
FIG. 7 illustrates an example system for intercepting and handling externally initiated asynchronous reset events, according to some embodiments.
FIG. 8 illustrates an example set of operations for coordinating persistent memory modes of operation in accordance with some embodiments.
FIG. 9 shows a block diagram illustrating a computer system in accordance with some embodiments.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding. One or more embodiments may be practiced without these specific details. Features described in one embodiment may be combined with features described in a different embodiment. In some instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
1. General overview
Techniques are described herein for using the system Power Supply Unit (PSU) to provide the auxiliary energy needed to flush volatile system memory to persistent memory after loss of Alternating Current (AC) power. In some embodiments, these techniques include implementing an extended retention window, long enough to complete a full flush of the processor caches and memory controller buffers after a power-down event, using the energy available in the PSU bulk capacitors. Even though the amount of energy available in a PSU is relatively small compared to most BBUs, these techniques may enable volatile system caches to be flushed without a BBU.
Many PSUs contain bulk capacitors that allow the system to ride through brief AC power losses of up to 10 milliseconds (ms). An example PSU implementation assumes a worst-case output load and provides a 10 ms timer that turns off the power supply output when the timer expires. This implementation limits the maximum retention window the PSU can provide to 10 ms regardless of system power consumption, which may not allow enough time to flush all system caches.
In some embodiments, the PSU is implemented to extend the hold-up window to a variable-duration window determined by system power consumption rather than a fixed time window. The voltage across the bulk capacitors within one or more PSUs may be monitored, and a notification may be triggered when a programmable power failure warning threshold voltage is detected across a bulk capacitor. The system may configure the voltage threshold to indicate that the minimum amount of energy required to successfully complete the cache flush operation is available in the PSU. The PSU may also implement a second voltage threshold associated with the minimum amount of energy required to safely sequence down the system power rails. Because both notifications are based on the amount of energy available in the PSU bulk capacitors, the system can implement a configurable hold-up window whose duration is determined by the power consumption of the system rather than by a fixed timer. Thus, the system may define an operating point that minimizes energy consumption rather than being constrained by a fixed-duration timer.
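The relationship between the programmable voltage threshold and the required energy follows from the standard capacitor energy formula; the symbols below are illustrative labels, not terms from the claims:

```latex
E_{\text{avail}} = \tfrac{1}{2} C \left( V_{\text{warn}}^{2} - V_{\text{min}}^{2} \right) \ge E_{\text{flush}}
\quad\Longrightarrow\quad
V_{\text{warn}} \ge \sqrt{\, V_{\text{min}}^{2} + \frac{2\, E_{\text{flush}}}{C} \,}
```

For example, with a hypothetical 1 mF of bulk capacitance, a 300 V regulation floor, and a 12 J flush budget, the warning threshold would need to be at least about 338 V.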
In some embodiments, the system logic may implement an energy counter that estimates the total amount of energy available across all installed PSUs and generates an interrupt signal to invoke the persistent flush handler when the estimated total system energy has reached a threshold associated with the minimum energy required to successfully complete the cache flush operation. The system logic may implement an energy counter for each PSU installed in the system. After a PSU has generated a power failure warning signal to the system logic, the system logic may begin to decrement the energy counter associated with that PSU at a rate proportional to the number of active power supplies in the system and the operating mode of the system. The system logic may estimate the total available energy by summing the per-PSU counters. When the total estimated energy falls below a critical threshold, the system logic may generate an interrupt signal to invoke the persistent cache flush handler.
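A minimal sketch of this counter scheme in C follows; the constants and names are hypothetical, since the disclosure does not publish an implementation. The decrement rate uses the fact that 1 W equals 1 mJ/ms, which reproduces the example rates discussed in Section 3.2 below.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PSUS        4
#define E_VWARN_MJ      25000u  /* hypothetical energy per PSU at vwarn assertion */
#define E_FLUSH_MIN_MJ  12000u  /* hypothetical minimum energy to finish the flush */
#define P_WORST_CASE_W  1200u   /* assumed worst-case system load in watts */

struct psu_state {
    bool     vwarn_asserted;    /* PSU has signaled a low bulk-capacitor voltage */
    uint32_t energy_mj;         /* estimated usable energy, starts at E_VWARN_MJ */
};

static struct psu_state psus[MAX_PSUS];

/* Called once per millisecond by the system logic. Decrements the counter of
 * every PSU that has asserted vwarn and raises an SMI to invoke the
 * persistent flush handler once the aggregate estimate reaches the
 * critical threshold. */
void energy_tick_1ms(unsigned num_active, void (*raise_flush_smi)(void))
{
    /* Load is shared across active PSUs, so each drains at P/N per tick;
     * e.g., 1200 W across 2 PSUs is 600 mJ/ms per PSU. */
    uint32_t drain_mj = P_WORST_CASE_W / num_active;
    uint32_t total_mj = 0;

    for (unsigned i = 0; i < num_active; i++) {
        if (psus[i].vwarn_asserted)
            psus[i].energy_mj -= (psus[i].energy_mj > drain_mj)
                                 ? drain_mj : psus[i].energy_mj;
        total_mj += psus[i].energy_mj;
    }

    if (total_mj <= E_FLUSH_MIN_MJ)
        raise_flush_smi();  /* only flush + rail-sequencing energy remains */
}
```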
In some embodiments, the system is configured to reduce power consumption during the flush process to minimize the amount of energy required to complete the flush. The processor, persistent memory devices, and supporting circuitry may remain powered. Other system components that do not participate in the flush process, such as fans, input/output (I/O) devices, and hard disk drives, may have their power disabled. The power control loops in the system may also contain hooks that reduce Central Processing Unit (CPU) power consumption, such as by reducing the processor frequency.
To ensure that volatile system resources containing state that is treated as persistent are properly flushed to the persistent medium prior to a system reset or power transition, in some embodiments every reset or power transition is preceded by a persistent flush handler that is responsible for pushing all volatile state out to the persistent medium before the reset or power transition. The system may trap accesses to the registers used to initiate a reset or power state transition and initiate a persistent cache flush before allowing the trapped write to complete. Trapping register accesses allows the system to run the cache flush handler prior to performing the requested reset or power transition action. A similar mechanism may be implemented to handle resets and power transitions requested by platform entities external to the host subsystem.
System resets and power transitions may occur not only in response to a power outage but also in response to events initiated by an external agent. For example, certain system errors may trigger a hardware (HW) initiated system reset. As another example, a user may initiate a warm reset or a forced power-down by pressing a button or toggling a switch. If the system allows an externally initiated reset or power transition event to trigger a hardware reset without invoking the flush handler prior to the reset, data residing in the volatile processor caches or memory buffers may be lost. To ensure that an externally initiated system reset or power transition correctly invokes the persistent flush handler, the system may proxy these asynchronous events through system logic that generates an interrupt to invoke a special persistent flush interrupt handler, which performs a persistent cache flush prior to invoking the requested HW operation. Additionally or alternatively, the system may include a HW backup mechanism to ensure that all resets and power transitions requested in HW reliably complete within a bounded time window, regardless of whether the persistent cache flush handler succeeds.
The techniques described herein also provide a handshake mechanism and protocol for informing the operating system whether the system hardware supports persistent cache flush. The system may determine whether the hardware is capable of supporting a full flush of the processor caches and volatile memory buffers in the event of a power loss or asynchronous reset. If the hardware is capable, persistent cache flush may be selectively enabled and advertised to the operating system. Once persistent cache flush is enabled, the operating system may treat data committed to the volatile processor caches as persistent. If persistent cache flush is disabled or unsupported by the system hardware, such data may be lost on a power failure or reset event, and the platform may not advertise support for persistent cache flush to the operating system.
One or more embodiments described in the specification and/or recited in the claims may not be included in this general overview section.
2. System architecture
In some embodiments, the techniques described herein are implemented on one or more computing devices (such as a server apparatus or other network host) that include persistent memory in the memory hierarchy. While example computing architectures are provided herein, the techniques are applicable to a variety of different computing architectures, which may vary depending on the particular implementation. The techniques may be used to (a) determine whether a particular combination of system components is capable of supporting a persistent cache flush, (b) configure the system components to enable persistent cache flush if supported, and/or (c) execute a persistent cache flush handler prior to a power transition or reset when persistent cache flush is enabled.
FIG. 1 illustrates a system for performing persistent cache flush operations, according to some embodiments. As shown, FIG. 1 includes PSU 102a, PSU 102b, power management subsystem 104, persistent cache flush handler 106, memory subsystem 108, CPU 116, system management module 118, system firmware 120, peripheral components 124, and operating system 122. In other embodiments, system 100 may include more or fewer components than those shown in FIG. 1. The components shown in FIG. 1 may be located locally to, or remotely from, each other.
PSU 102a and PSU 102b convert input power into a form that allows the components of system 100 to operate properly. In some embodiments, PSU 102a and PSU 102b convert AC power to Direct Current (DC) power for the components of system 100. Additionally or alternatively, PSU 102a and PSU 102b may include DC-DC power converters, such as converters that step up or step down the input voltage. PSU 102a and PSU 102b may be electrically coupled to other components of system 100 via one or more power rails, such as +3 volt (V), +5 V, and/or +12 V rails. Although two PSUs are illustrated, the system may have only a single PSU or additional PSUs, depending on the particular implementation.
The power management subsystem 104 controls the power delivery of the system PSU to the components of the system 100. In some embodiments, the power management subsystem 104 selectively powers down components during a reset or power down event to gracefully shut down the system 100. Additionally or alternatively, the power management subsystem 104 may monitor the voltage levels across bulk capacitors in PSU 102a and PSU 102 b. If the voltage level falls below a programmable threshold, the power management subsystem 104 may assert or de-assert a signal to notify other components of the system 100.
The memory subsystem 108 includes volatile and nonvolatile memory regions. In some embodiments, the volatile storage area includes a processor cache 110 and a memory buffer 112. Processor cache 110 may include caches within CPU 116, such as level 3 (L3) and level 4 (L4) caches, which may be used by CPU 116 to reduce the number of data accesses to main memory. Memory buffer 112 may include registers in CPU 116 and/or a memory controller that provides intermediate storage for data transferred between different regions. For example, the memory controller buffers may provide temporary storage for data being transferred between the processor cache 110 and main memory.
Persistent memory 114 includes one or more nonvolatile memory devices, such as a Data Center Persistent Memory Module (DCPMM) or a nonvolatile dual inline memory module (NVDIMM). In some embodiments, persistent memory 114 is byte-addressable and resides on a memory bus, providing speed and latency similar to volatile DRAM, which is typically much faster than peripheral nonvolatile storage devices, such as hard disks and flash drives, that do not reside on a memory bus. Furthermore, persistent memory 114 may be paged and mapped by operating system 122 in the same manner as volatile DRAM, which is typically not the case for other forms of persistent storage. Persistent memory 114 may be used as main memory within system 100. In other cases, main memory may include one or more volatile memory modules, such as DRAM.
When a persistent cache flush handler is installed and the platform signaling mechanisms are enabled, data stored within the volatile memory regions, including the processor cache 110 and the memory buffer 112, may be treated as part of the persistent memory state, even in the event of a power outage or other power transition event. To maintain the persistent state, the cache flush handler 106 performs and manages cache flush operations in response to detecting a trigger event. If persistent cache flush operations are not enabled, a full cache flush may not be performed during a power transition event, and some or all of the data within the volatile memory regions may be lost. Without the persistent cache flush handler, data from the memory buffer 112 may still be flushed, but the processor cache 110 may not be, which reduces the amount of time required to perform the flush operation.
The system management module 118 includes software and/or hardware for managing system level operations. In some embodiments, the system management module 118 includes a Service Processor (SP) and a CPU chipset. The system management module 118 may interface with one or more sensors to monitor hardware components. Additionally or alternatively, the system management module 118 may perform other functions including capturing writes to system registers, generating System Management Interrupts (SMIs), and monitoring system boot (boot) states.
The system firmware 120 includes software that provides low-level control of the system hardware. In some embodiments, system firmware 120 includes software, such as basic input/output system (BIOS) firmware, that manages the boot process when the system is powered on or reset. System firmware 120 may also provide runtime services for operating system 122, such as managing persistent cache refresh operations and peripheral components 124.
The operating system 122 includes software that supports operations including scheduling execution of instructions on the CPU 116, providing services to software applications, and controlling access to peripheral components 124. In some embodiments, system firmware 120 may advertise the ability to include cache content in the persistence domain if supported by system 100. Operating system 122 may then selectively enable or disable persistent cache flushing. When enabled, the operating system 122 may treat data submitted to volatile memory, including the processor cache 110 and the memory buffer 112, as persistent.
Peripheral components 124 include auxiliary hardware devices such as hard disks, input devices, display devices, and/or other output devices that may be electrically coupled to other components of system 100. The power consumption of the system 100 may vary based in part on the connected and active peripheral components 124. The maximum power load for a worst case scenario may be calculated by assuming that all hardware components, including peripheral component 124, are operating at full capacity.
3. Persistent cache flush
3.1 Managing cache flush operations during power interruption events
When AC power is interrupted, it may not be desirable to trigger a cache flush operation immediately, since power may be restored quickly. However, there is a risk that if too much time passes without power being restored, the energy retained within PSU 102a and PSU 102b will be insufficient to perform a full cache flush. If persistent cache flush is enabled, the persistent memory state may then be corrupted. To maintain the persistent state, the power management subsystem 104 may generate a warning signal when the remaining energy within the bulk capacitors of PSU 102a and PSU 102b falls below a threshold level.
FIG. 2 illustrates an example set of operations for performing a cache flush operation to maintain a persistent memory state, in accordance with some embodiments. One or more of the operations shown in FIG. 2 may be modified, rearranged, or omitted altogether. Accordingly, the particular order of operations illustrated in FIG. 2 should not be construed as limiting the scope of one or more embodiments.
Referring to FIG. 2, process 200 includes estimating the ride-through time and hold time based on the system load (operation 202). The ride-through time corresponds to the estimated amount of time that system 100 can operate without AC power while still retaining enough energy to perform a cache flush and sequence down the power rails. The hold time corresponds to the amount of time needed to perform a full cache flush and sequence down the power rails given the system load. The estimates may be calculated based on the full system load or on a reduced system load capped at a maximum value, as further described herein.
In some embodiments, process 200 programs one or more energy thresholds based on the estimated ride-through and hold times (operation 204). For example, process 200 may estimate the voltage level in a PSU bulk capacitor that guarantees system 100 the estimated hold time, under a limited system load, to complete the cache flush and the in-order power-rail shutdown operations. That voltage level may then be programmed as the threshold. In other embodiments, a time-based threshold derived from the estimated ride-through time may be used instead of a voltage/energy-based threshold.
In some embodiments, operations 202 and 204 are implemented as a separate process from the remaining operations described herein. For example, operations 202 and 204 may be performed during a boot sequence of system 100, which may calculate the amount of energy required for each operating point and the associated voltage thresholds. The calculation may be based in part on the system components detected during the boot sequence and the estimated power requirements of the active components during normal operation and/or reduced-power modes of operation. The boot sequence may then set the programmable voltage thresholds of system 100. In other embodiments, the programmable thresholds may be set or modified through user input. For example, a system administrator may set a programmable voltage threshold for each operating point, allowing the administrator to inject domain knowledge about the system's power requirements.
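A boot-time calculation along these lines might look as follows; the capacitance, load, and timing constants are hypothetical placeholders for values a real platform would measure or obtain from its PSU firmware.

```c
#include <math.h>

#define BULK_CAP_F   0.001   /* aggregate bulk capacitance (hypothetical) */
#define V_RAIL_MIN   300.0   /* bulk voltage floor for regulation, volts */
#define P_FLUSH_W    400.0   /* reduced system load during the flush, watts */
#define T_HOLD_S     0.030   /* estimated hold time: flush + rail sequencing */

/* Operation 204: derive the programmable warning-threshold voltage that
 * guarantees the estimated hold time at the reduced load. */
double vwarn_threshold_volts(void)
{
    double e_hold_j = P_FLUSH_W * T_HOLD_S;  /* energy the hold window needs */
    return sqrt(V_RAIL_MIN * V_RAIL_MIN + 2.0 * e_hold_j / BULK_CAP_F);
}
```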
Referring again to FIG. 2, process 200 includes monitoring for loss of AC power (operation 206). In some embodiments, system 100 includes sensors and/or circuitry, which may be embedded in PSU 102a and/or PSU 102b, that detect when input AC power is interrupted. In other embodiments, external circuitry and/or sensors may signal system 100 when AC power is lost.
Based on the monitoring circuitry, process 200 may detect a loss of AC power (operation 208). In response, process 200 triggers a notification (operation 210). In some embodiments, the notification is triggered by de-asserting the ack signal. De-asserting this signal warns that input power is no longer stable and that, unless AC power is restored, the energy reserve within the PSU bulk capacitors will fall to a critical point at which a system shutdown must be initiated to preserve the persistent state of the data; this marks the beginning of the ride-through window. In other words, the notification alerts the power management subsystem 104 to reserve enough energy in the system PSUs to keep the power rails up long enough to perform a full flush of the processor cache 110 and the memory buffer 112.
In some implementations, the early warning mechanism is associated with a fixed time interval prior to shutdown. In that case, the power management subsystem 104 must assume that system 100 is operating at maximum load and can only guarantee the minimum amount of time needed to complete a cache flush at maximum load and sequence down the power rails. This approach leads to a conservative implementation in which a system shutdown may be initiated earlier than necessary, especially if the maximum PSU load is significantly higher than the actual system load during a persistent cache flush. In contrast, a programmable early warning threshold allows the system to trade off the energy consumed during ride-through, before the warning signal is asserted, against the energy consumed during the hold window, after the warning signal is asserted.
After the notification has been triggered, process 200 continues powering the system components in a first mode of operation (operation 212). When operating in the first mode, the system components may be powered using the energy in the PSU bulk capacitors. In some embodiments, power may be provided as if AC power had not been interrupted. In other embodiments, power-saving adjustments may be made within system 100. For example, the processor frequency may be reduced, display brightness may be dimmed, and/or other power-saving actions may be taken. Additionally or alternatively, data may continue to be written to and updated in the processor cache 110 and the memory buffer 112.
Process 200 also monitors the energy level within one or more system PSUs against the programmed thresholds (operation 214). In some embodiments, system 100 includes sensors for monitoring the voltage across the bulk capacitors in the PSUs. Because the capacitance of a bulk capacitor is fixed, the capacitor voltage can serve as a proxy for the PSU energy level when AC power is interrupted. In other embodiments, the energy level may be calculated from the capacitance of the bulk capacitor and the measured voltage.
The process 200 also determines whether the energy level of the one or more PSUs meets a threshold (operation 216). For example, if the measured voltage across one or more bulk capacitors is below the voltage threshold programmed at operation 204, process 200 may determine that the threshold is met. If the threshold is not met, process 200 may continue to monitor the PSU energy level until power is restored or the voltage in the PSU bulk capacitor reaches or drops below a programmable threshold. Once the threshold is met, an alert signal may be asserted to trigger a cache flush and power down sequence.
In some embodiments, process 200 enters a second mode of operation by reducing the system load to minimize power consumption (operation 218). During this stage, the power management subsystem 104 may shut down components not involved in the cache flush operation. For example, the power management subsystem 104 may power down the peripheral components 124, which may include hard drives, fans, displays, peripheral component interconnect express (PCIe) devices, and/or other peripheral hardware. Additionally or alternatively, the power management subsystem 104 may reduce the clock speed and frequency of the CPU 116 to minimize power consumption.
Process 200 also performs a cache flush (operation 220). During the cache flush operation, the CPU 116 may write data stored in the processor cache 110 and the memory buffer 112 to the persistent memory 114 to maintain the persistent state of the data. In some embodiments, process 200 may continue to monitor the PSU energy levels during this operation. If a PSU energy level falls below the second voltage threshold, process 200 may trigger the power-down sequence, even if the cache flush is not complete, to prevent all power rails from collapsing at the same time. The second voltage threshold may be programmed to a much lower level than the first threshold, leaving just enough energy to sequence down the power rails.
Once the cache flush is complete, process 200 powers down the remaining system components (operation 224). Process 200 may sequentially shut down the power rails to gracefully shut down system 100. The order in which the power rails are turned off may vary from system to system.
The process depicted in FIG. 2 can maintain the persistent memory state without installing a BBU or otherwise relying on energy from a BBU. Instead, the energy within the bulk capacitors of one or more PSUs may be managed by the power management subsystem 104 to ensure persistence. Further, the power management subsystem 104 accounts for the run-time electrical load, which allows variable ride-through and hold times that use the stored energy more efficiently and effectively.
3.2 Managing multiple power supply units
When there are multiple PSUs in the system and one or more of the PSUs lose AC power, the amount of energy in a single PSU may not be sufficient to complete the cache flush operation. However, the aggregate energy across multiple PSUs may be sufficient to complete the cache flush and maintain the persistent state of the data. If there are multiple PSUs, the power management subsystem 104 may monitor the total energy available across all power supplies. When the aggregate energy level falls below a threshold, the power management subsystem 104 may issue a power failure warning signal to trigger the cache flush operation.
In some embodiments, the power management subsystem 104 detects the following events with respect to each PSU being managed:

AC power loss, which may be detected by de-assertion of the ack signal. If this signal is received from one or more PSUs, the power management subsystem 104 may enter the first mode of operation, reducing power during the ride-through window in case AC power is restored.

An indication that the energy or voltage level in the PSU has fallen below the first threshold, which may be detected by assertion of the vwarn signal. The power management subsystem 104 may combine the voltage warning information from all PSUs to determine when to enter the second mode of operation, in which the persistent cache flush handler 106 initiates the cache flush operation. The power management subsystem 104 may further reduce power during the second mode of operation, as previously described.

An indication that the energy or voltage level in the PSU has fallen below the second threshold, which may be detected by de-assertion of the pwrok signal. The power management subsystem 104 may combine the pwrok signal information to determine whether to immediately power down the system. A shutdown may be triggered if a further decrease in PSU energy levels would not leave enough energy to safely sequence down the power rails.
In some embodiments, the power management subsystem 104 maintains a set of per-PSU counters to track the estimated energy level in each PSU in the event of AC power loss. The initial value of each per-PSU counter may be hard-coded or programmable to correspond to the amount of energy available in the PSU when vwarn is asserted. When the power management subsystem 104 detects that a PSU has asserted vwarn, it may begin to decrement the energy counter of the associated PSU at a rate proportional to the number of active power supplies in the system and the maximum load per power supply. For example, if there is a single active PSU and the maximum load is 1200 watts (W), the counter may be decremented at a rate of 1.2 joules (J) per millisecond. If there are two active power supplies, the load per power supply is 600 W, and each energy counter may be decremented by 600 mJ/ms. With four active power supplies, each energy counter may be decremented by 300 mJ/ms. As another example, if the worst-case system load is reduced to 1000 W, the counter decrement rate may be modified to 1 J/ms for a single power supply, 500 mJ/ms for two power supplies, and 250 mJ/ms for four power supplies. The counters may be tuned to provide maximum ride-through time while retaining sufficient energy to complete a persistent cache flush during long power outages.
FIG. 3 illustrates an example set of operations for managing persistent cache flush operations in a system having multiple power supplies, according to some embodiments. One or more of the operations shown in FIG. 3 may be modified, rearranged, or omitted altogether. Accordingly, the particular order of operations illustrated in FIG. 3 should not be construed as limiting the scope of one or more embodiments.
Referring to fig. 3, process 300 detects assertion of one or more vwarn signals from one or more PSUs (operation 302). As previously described, each PSU may be configured to assert a signal when AC power is lost and the energy in the PSU bulk capacitor(s) falls below a threshold, which may be programmable.
In response to detecting the vwarn signal(s), the process 300 initiates one or more associated countdown timers (operation 304). In some embodiments, a countdown timer tracks the estimated energy level of each PSU that has asserted the vwarn signal. The process 300 may decrement the counter at a rate proportional to the number of PSUs in the system and the maximum load per power supply. In other embodiments, other mechanisms may be used to track the energy level within the PSU. For example, process 300 may increment a counter instead of decrementing the counter until a threshold is reached, or use other tracking logic.
Additionally or alternatively, the process 300 may cause the system 100 to enter a reduced power mode in response to detecting one or more vwarn signals. The reduced power mode of operation may be triggered by a single signal or a threshold number of signals, depending on the particular implementation. In other embodiments, the process 300 may gradually decrease power with each newly detected signal. For example, process 300 may utilize each new vwarn signal to gradually adjust the CPU frequency and/or initiate or increase other power saving actions as previously described.
Process 300 also monitors (a) the aggregate energy level of the combined PSUs based on the countdown timers (or other tracking logic), (b) assertion of additional vwarn signals from other PSUs, and (c) de-assertion of pwrok signals from the PSUs (operation 306). If an additional vwarn signal is detected, process 300 initiates the associated countdown timer for the PSU(s) asserting the signal (operation 304).
If the aggregate energy level meets the first threshold, then the process 300 performs a cache flush operation (operation 308). For example, process 300 may determine that the aggregate energy level on all PSUs is at or below a minimum threshold. During a cache flush operation, the CPU 116 may write data stored in the processor cache 110 and the memory buffer 112 to the persistent memory 114 to maintain a persistent state of the data. In some embodiments, the process 300 may continue to monitor PSU energy levels during this operation.
If the cache flush operation completes, or a PSU drops below the second voltage threshold and de-asserts its pwrok signal, process 300 sequentially shuts down the power rails (operation 310). When a pwrok signal is de-asserted, process 300 may initiate the power-down sequence, even if the cache flush is not complete, to prevent all power rails from collapsing at the same time. The second voltage threshold may be programmed to a much lower level than the first threshold, leaving just enough energy to sequence down the power rails.
FIG. 4 illustrates an example system 400 for managing multiple power sources, according to some embodiments. The system 400 includes PSUs 402 and 404. However, the number of PSUs may vary depending on the particular implementation. Each PSU may include one or more bulk capacitors, such as capacitor 406, that store electrostatic energy obtained from a connected AC power network. The capacitor-based storage allows the PSU to be implemented with a smaller footprint than the BBU and provides faster charge and discharge rates. PSUs 402 and 404 may be connected to the same AC power network or different AC power networks, depending on the particular implementation. If connected to a different AC power network, one PSU may lose AC power while another PSU continues to be powered by a different AC network. In such a scenario, each PSU may provide a separate ack signal (not shown) to the power management subsystem 408. When AC power is lost, these signals may be independently de-asserted by the individual PSUs to signal which PSU lost power. In other cases, de-assertion of the ack signal may signal that a group of PSUs or all PSUs have lost AC power.
In some embodiments, each PSU asserts its vwarn signal when the energy in a bulk capacitor (e.g., bulk capacitor 406) reaches a threshold. Thus, the vwarn signal informs the power management subsystem 408 that the available energy of the associated PSU is at the first threshold level. The power management subsystem 408 maintains a separate energy counter for each PSU, and each counter is triggered when the associated PSU asserts its vwarn signal. For example, when PSU 402 asserts the vwarn signal, the power management subsystem 408 may decrement energy counter 410 at a rate proportional to the number of PSUs in the system and the maximum load per power supply. Energy counter 412 is managed independently of energy counter 410 (a vwarn signal from one PSU does not trigger counting on counters associated with other PSUs) and is decremented in response to PSU 404 asserting its vwarn signal. The power management subsystem 408 includes an adder 414 that sums the estimated energy counts of the PSUs to compute the aggregate energy counter 416.
In some embodiments, the power management subsystem 408 monitors the aggregate energy counter 416 to determine whether the aggregate energy across all PSUs has reached or fallen below a system threshold, which may be programmable and may vary depending on the particular implementation. If the threshold is reached, the power management subsystem 408 asserts an SMI signal to stop the current task performed by the CPU/chipset 422 in preparation for a persistent cache flush and reset. In response to the SMI, the persistent cache flush handler 424 may initiate the persistent cache flush operation previously described.
FIG. 5 illustrates an example timing diagram 500 with staggered warning signals from different power supply units, according to some embodiments. The upper half of diagram 500 depicts the timing of a power failure, while the lower half depicts the potential power reductions that system 100 may achieve in response to an impending power failure. Diagram 500 assumes worst-case behavior in which the system operates at maximum load until the power failure cache flush is triggered. When the power failure cache flush is triggered, system power consumption is reduced, minimizing the load while the flush operation completes.
The variables in diagram 500 may be defined as follows:
· t_psu0_v1warn: time when the first PSU asserts v1warn
· t_psu1_v1warn: time when the second PSU asserts v1warn
· t_flush_start: time when the power management subsystem 104 initiates the power failure flush operation
· t_psu1_pwrok: time when the second PSU de-asserts pwrok and the system 100 begins the power-down sequence
· T_v1warn_delay: time delay between t_psu0_v1warn and t_psu1_v1warn
· T_v1warn_debounce: time delay between t_psu1_v1warn and t_flush_start
· T_v1warn: time to consume all of the E_v1warn energy in a power supply when at maximum load
· P_max: maximum system load
· P_throttle: system load after v1warn is asserted by the first PSU
· P_debounce: system load after all PSUs have asserted v1warn
· P_flush: system load after the power failure flush operation is triggered
· E_v1warn: usable energy available in a PSU after v1warn assertion but before pwrok de-assertion
· E_pwrok: usable energy available in a PSU after pwrok de-assertion and before loss of the primary power rails
· E_psu0_reserve: usable energy remaining in the first PSU when the second PSU asserts v1warn at t_psu1_v1warn
· E_reserve: total usable energy remaining in all PSUs after v1warn has been asserted on all PSUs but before pwrok is de-asserted on the last PSU
· E_v1warn_delay: energy consumed by system 100 during T_v1warn_delay
· E_flush: energy required to successfully complete the power failure flush operation
· N: number of active PSUs providing power to the system
Referring to FIG. 5, when the first PSU asserts v1warn at t_psu0_v1warn, that PSU has (E_v1warn + E_pwrok) of usable energy. There are N active PSUs, and the system load is divided among all active PSUs in the system. At some later point in time, the second PSU asserts v1warn at t_psu1_v1warn. The amount of energy consumed during the period between t_psu0_v1warn and t_psu1_v1warn is denoted E_v1warn_delay.

The system 100 draws energy from all N active PSUs during the period after the first power supply has asserted v1warn but before the second power supply has asserted v1warn. The maximum energy consumed from the first PSU after it asserts v1warn is (E_v1warn + E_pwrok). The energy remaining in the first PSU when the second PSU asserts v1warn is denoted E_psu0_reserve.
If T_v1warn_delay is sufficiently short, the second PSU de-asserts pwrok before the system has consumed all of the energy from the first power supply. In the worst case, when both power supplies assert v1warn simultaneously, both power supplies will also de-assert pwrok simultaneously. In these cases, if the system shuts down when pwrok has been de-asserted on all power supplies, none of the E_pwrok energy in the first power supply may be usable. To allow for this possibility, the system 100 may be configured to assume that the E_pwrok energy is not available in the first power supply.
To harness all of the energy in both power supplies, the system 100 may be configured such that the power failure flush does not begin immediately when the second PSU asserts v1warn. Instead, the system 100 may delay the flush trigger until the amount of energy retained in all active PSUs equals the amount required to complete the flush. The system 100 may also be configured to guarantee E_reserve ≥ E_flush, reserving sufficient energy to complete the cache flush operation.
If both PSUs assert v1warn at the same time, then T_v1warn_delay = 0 and T_v1warn_debounce = T_v1warn. If the PSUs assert v1warn a long interval apart, then T_v1warn_debounce = 0, and the power failure flush can be triggered as soon as the second PSU asserts v1warn. The power management subsystem 104 may program the energy/voltage thresholds accordingly.
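The two boundary cases can be restated compactly; this is a summary of the relationships above, not an equation from the source:

```latex
T_{\mathrm{v1warn\_debounce}} = \max\bigl(0,\; T_{\mathrm{v1warn}} - T_{\mathrm{v1warn\_delay}}\bigr),
\qquad
\text{flush is triggered only while } E_{\mathrm{reserve}} \ge E_{\mathrm{flush}}
```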
4. Managing externally initiated asynchronous reset events
Power interruption events are not the only cause of a system shutdown or reset. In some cases, a system error or user action may trigger a system shutdown or reset. For these externally initiated asynchronous events, monitoring for power loss may not be sufficient to maintain the persistent memory state, since AC power may remain stable. Asynchronous hardware resets are typically implemented by directly asserting a reset request pin that initiates the reset in HW, which may not provide any opportunity to invoke a software cache flush handler prior to the reset. In some embodiments, to prevent data loss, board logic is configured to generate an SMI signal to initiate a cache flush when an externally initiated reset request is detected.
FIG. 6 illustrates an example set of operations for handling externally initiated asynchronous reset events, according to some embodiments. One or more of the operations illustrated in fig. 6 may be modified, rearranged, or omitted altogether. Accordingly, the particular sequence of operations illustrated in FIG. 6 should not be construed as limiting the scope of one or more embodiments.
Referring to FIG. 6, process 600 intercepts the assertion of a HW reset request signal (operation 602). In some embodiments, platform-initiated reset requests (including those initiated by the system management module 118) are proxied through the power management subsystem 104. This allows system 100 to run the persistent cache flush handler 106 prior to performing the requested reset or power transition action.
In some embodiments, the process 600 determines whether persistent cache flushing is enabled (operation 604). As described further below, the system firmware (or other system logic) may selectively enable or disable persistent cache flushing to configure whether data in the processor cache 110 is included in the persistent memory state.
If persistent cache flushing is not enabled, then the process 600 routes the request to a reset pin (operation 606). In some embodiments, the power management subsystem 104 routes the request to the system chipset. The chipset may initiate a HW reset sequence.
If persistent cache flushing is enabled, the process 600 routes the request to the system management module 118 (operation 608). In this case, the reset pin is not immediately asserted in response to a platform or user initiated reset to allow time for invoking a software-based cache flush handler.
In some embodiments, process 600 generates an SMI signal to place system 100 in system management mode (operation 610). The SMI signal may be asserted by the system management module 118 over a special signaling line bound directly to the CPU 116. This signal may cause system firmware 120 (e.g., the BIOS) to suspend the current task being performed by the CPU 116 in preparation for a cache flush and reset.
In some embodiments, if persistent cache flush is enabled, system firmware 120 (e.g., the BIOS) configures a general-purpose input/output (GPIO) pin within the system management module 118 as a trigger for an SMI. The GPIO pin may be used to signal to system firmware 120 when a cache flush and a subsequent warm reset are to be performed. This GPIO may be different from the GPIO used to signal an upcoming power failure to the chipset, to convey that the persistent cache flush handler should terminate by requesting a warm reset rather than by powering down.
Process 600 next performs a cache flush operation (operation 612). In response to the SMI signal, system firmware 120 may invoke the cache flush handler 106 to manage the cache flush operation, as previously described. Data is thus transferred from volatile memory, such as the processor cache 110 and the memory buffer 112, to persistent memory 114, maintaining the persistent state.

Process 600 then determines whether the flush is complete (operation 614). When the data in the processor cache 110 and the memory buffer 112 has been written to persistent memory 114, the persistent cache flush handler 106 may assert a signal or otherwise provide a notification.
Once the cache flush is complete, process 600 generates a reset request (operation 622). For example, the persistent cache flush handler 106 may initiate a system reset by writing a particular value to a specific I/O port/register of the PCH (e.g., 0x06 to port CF9) or by requesting that the system logic assert a HW reset request signal to the chipset.
If the flush is not complete, process 600 may determine whether a timeout has been reached (operation 616). For example, process 600 may allow one second, or another threshold period that may be configured by system 100, for completing the flush operation. In some cases, the system state associated with the reset event may prevent the flush from completing. Implementing a timeout may prevent system 100 from entering a state in which a warm reset can never be performed.

If the timeout is reached, process 600 generates a reset request signal directly to the chipset (operation 618). The reset request in operation 622 may also be a direct reset request to the chipset or may be a software-based request. Thus, the mechanism by which the system is reset may vary based on whether the flush completed successfully.

In response to the reset signal, the system 100 is then reset (operation 620). A reset in this scenario may cause the system 100 to shut down or restart.
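A firmware-side sketch of the proxied flow is shown below in C. The helper routines are hypothetical stand-ins for the platform's SMM services; only the value 0x06 written to port 0xCF9 comes from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

extern bool flush_caches_to_pmem(void);        /* returns true on success */
extern void outb(uint16_t port, uint8_t val);  /* port I/O primitive */
extern void assert_hw_reset(void);             /* direct chipset reset request */

#define PCH_RESET_PORT 0xCF9  /* PCH reset control register (operation 622) */

/* SMI handler invoked when board logic proxies an external reset request.
 * Board logic raises this SMI only when persistent cache flush is enabled;
 * otherwise the request is routed straight to the reset pin (operation 606). */
void reset_request_smi_handler(void)
{
    if (flush_caches_to_pmem())
        outb(PCH_RESET_PORT, 0x06);  /* warm reset once the flush completes */
    else
        assert_hw_reset();           /* flush failed: direct HW reset request */
}
```

A hardware timer (described with FIG. 7 below) backstops this handler so that the reset still completes within a bounded time if the SMI handler hangs.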
FIG. 7 illustrates an example system 700 for intercepting and handling externally initiated asynchronous reset events, according to some embodiments. The system 700 includes a system management module 702. The system management module 702 may be implemented in programmable hardware, such as a field-programmable gate array (FPGA), or by other hardware components (or a combination of hardware and software), as described further below. The system management module 702 acts as a proxy, intercepting assertion of a hardware reset request signal that may be triggered by a user pressing a reset button or by a reset request asserted by a baseboard management controller (BMC) or debug header.
The system management module 702 includes a logic gate 704 that routes an asserted reset request signal to a demultiplexer 706. The select line coupled to the demultiplexer 706 is set based on whether persistent cache flush is enabled or disabled. A "0" or low voltage state represents a memory mode of operation, in which persistent cache flush is disabled and data in the processor cache 110 and memory buffer 112 is not managed as part of the persistence domain. A "1" or high voltage state represents a persistent cache mode of operation, in which persistent cache flush is enabled and the data in the processor cache 110 and memory buffer 112 is part of the persistence domain. However, depending on the particular implementation, the values on the select line may be swapped.
When persistent cache flush is disabled, the system management module 702 asserts the reset request signal on a pin electrically coupled to the CPU/chipset 712. In response, reset control logic 714 on the CPU/chipset 712 suspends the current task being performed and initiates a hardware reset, which may include signaling the reset finite state machine (FSM) 710. Reset FSM 710 may sequentially shut down the power rails in a particular order to avoid damaging hardware components. As previously mentioned, the order in which the power rails are shut down may vary depending on the system architecture.
When persistent cache flush is enabled, the system management module 702 asserts an SMI using a special signaling line bound directly to another pin on the CPU/chipset 712. This signaling line is different from the previously described line used to perform a HW reset when persistent cache flush is not enabled. In response to detecting the SMI, the CPU/chipset 712 sends a software-based request to the power failure flush handler 716 to initiate a persistent cache flush.

In response to the request, the persistent cache flush handler 716 initiates a persistent cache flush operation to transfer data from the processor caches and memory buffers to the persistent storage medium. If the cache flush completes successfully, the power failure flush handler 716 sends a software reset request to the reset control logic 714, which may trigger the power-down sequence as previously described.
The system management module 702 also initializes the timer 708 when persistent cache flush is enabled. Timer 708 may be decremented or incremented until it is cancelled or a timeout value is reached. The timer may be cancelled in response to detecting assertion of the signal on the input pin to reset FSM 710; this signal indicates that the power failure flush handler 716 successfully flushed the processor caches and memory buffers to the persistent storage medium and has initiated the reset sequence. If the timeout value is reached before the timer is cancelled, the system management module 702 may directly assert the rst_req_in pin on the CPU/chipset 712 to trigger a HW reset.
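A minimal sketch of that backstop as board-logic event handlers follows; the function names, tick granularity, and timeout value are assumptions for illustration, not details from the source.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLUSH_TIMEOUT_MS 1000u  /* hypothetical bound on the flush handler */

struct reset_proxy {
    bool     timer_running;
    uint32_t elapsed_ms;
};

/* External reset request intercepted while persistent cache flush is enabled. */
void proxy_on_reset_request(struct reset_proxy *p, void (*raise_smi)(void))
{
    raise_smi();               /* ask firmware to run the flush handler */
    p->timer_running = true;   /* arm the backstop (timer 708) */
    p->elapsed_ms = 0;
}

/* Flush handler signaled the reset FSM: the flush completed in time. */
void proxy_on_flush_ack(struct reset_proxy *p)
{
    p->timer_running = false;  /* cancel the backstop */
}

/* 1 ms tick: force a HW reset if the handler never completes. */
void proxy_tick_1ms(struct reset_proxy *p, void (*assert_rst_req_in)(void))
{
    if (p->timer_running && ++p->elapsed_ms >= FLUSH_TIMEOUT_MS) {
        p->timer_running = false;
        assert_rst_req_in();   /* bounded-time reset guarantee */
    }
}
```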
5. Coordinating persistent cache flush state among system components
The system boot firmware may expose persistent cache flush support via a user-configurable option. However, boot firmware may be deployed on a wide variety of hardware platforms, and exposing this option does not mean that the particular platform hardware is capable of supporting persistent cache flush. Whether a platform can support persistent cache flush may depend on the hardware configuration, the presence and/or health of energy storage modules, and the capabilities of the underlying hardware components. In some embodiments, components of system 100 engage in a handshake to (a) determine whether the hardware has sufficient capability to support a persistent cache flush; (b) selectively enable/disable persistent cache flush; (c) when persistent cache flush is enabled, configure the system components to support it; and (d) communicate to the operating system whether persistent cache flush has been successfully enabled.
FIG. 8 illustrates an example set of operations for coordinating persistent memory modes of operation in accordance with some embodiments. One or more of the operations illustrated in fig. 8 may be modified, rearranged, or omitted altogether. Accordingly, the particular order of operations illustrated in FIG. 8 should not be construed as limiting the scope of one or more embodiments.
Referring to FIG. 8, process 800 initiates a boot sequence (operation 802). In some embodiments, the boot sequence loads system firmware 120, which may include BIOS firmware. The firmware may be configured to expose a user-configurable option for persistent cache flush. For example, the firmware may prompt the user to indicate whether persistent cache flush should be enabled, or the user may navigate a user interface, such as a BIOS setup utility screen.
In some embodiments, the user interface exposes a "durability domain" setting option with a plurality of settings that configure whether the platform operates in ADR mode or in persistent cache flush mode. For example, the user interface may expose options for selecting a "memory controller" setting or a "CPU cache hierarchy" setting. Under the "memory controller" setting, ADR is enabled but persistent cache flushing is disabled. When this setting is selected, memory buffer 112 is flushed during a power-down event, but the flush operation is not applied to processor cache 110. In some embodiments, the system hardware is configured with this setting by default.
In a "CPU cache hierarchy" setting, persistent cache flushing is enabled. Thus, if this option is selected, then the data in memory buffer 112 and processor cache 110 is flushed at the power down event, in the event that the platform hardware supports a persistent cache flush operation.
Other settings may additionally or alternatively be supported. For example, a "standard domain" setting may be selected in which cached data is not flushed upon a power failure event. The user may select the preferred setting via the user interface, as previously described. If the user does not select a setting, system firmware 120 may select a default setting, which may vary depending on the particular implementation.
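By way of illustration, the three settings may be modeled as an enumeration, as in the following C sketch; the identifier names and the assumed default are hypothetical, not actual BIOS tokens.

```c
#include <stdio.h>

/* Hypothetical encoding of the "durability domain" setup option. The
 * three values mirror the settings described above. */
enum durability_domain {
    DOMAIN_STANDARD,          /* no flush on power failure        */
    DOMAIN_MEMORY_CONTROLLER, /* ADR: flush memory buffers only   */
    DOMAIN_CPU_CACHE          /* flush caches and memory buffers  */
};

/* Assumed platform default when the user makes no selection. */
static enum durability_domain setting = DOMAIN_MEMORY_CONTROLLER;

int main(void) {
    switch (setting) {
    case DOMAIN_CPU_CACHE:
        printf("persistent cache flush requested\n");
        break;
    case DOMAIN_MEMORY_CONTROLLER:
        printf("ADR only: memory buffers flushed, caches not\n");
        break;
    case DOMAIN_STANDARD:
        printf("no flush on power failure\n");
        break;
    }
    return 0;
}
```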
In some embodiments, system firmware 120 checks whether the persistent cache mode has been selected, either by the user or by default (operation 804). Even if this option is selected, the platform hardware may not support a persistent cache flush operation in some cases. Furthermore, system hardware may evolve over time as components are added, removed, age, and/or fail.
If the persistent cache mode has not been selected, system firmware 120 continues the boot sequence without advertising support for the persistent cache mode (operation 822). The boot sequence may include initializing hardware components, loading an operating system, and/or processing system boot files that have not yet been processed. The boot sequence may continue without performing a hardware capability check as described further below.
If the persistent cache mode has been selected, system firmware 120 sends a request to system management module 118 to determine if system 100 is capable of supporting a persistent cache refresh operation (operation 806).
In response to the request, system management module 118 evaluates the hardware capabilities of system 100 (operation 808). In some embodiments, system management module 118 may engage in a handshake with one or more hardware components to determine settings, configurations, and/or other information indicating whether persistent cache flushing is supported. For example, during the boot sequence, a connected hardware component may include firmware that provides the system firmware with a list of features supported by that component. System management module 118 may scan the provided feature lists and/or other information to determine whether the features are compatible with persistent cache flushing.
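A simplified sketch of such a feature-list scan follows; the feature strings and the component structure are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-component feature report collected during boot. */
struct component {
    const char *name;
    const char *features[8]; /* NULL-terminated list */
};

static bool has_feature(const struct component *c, const char *f) {
    for (int i = 0; c->features[i] != NULL; i++)
        if (strcmp(c->features[i], f) == 0)
            return true;
    return false;
}

int main(void) {
    struct component psu = {
        "PSU0", { "vwarn-signal", "programmable-vwarn-threshold", NULL }
    };
    /* Both capabilities must be present for this component to be
     * compatible with persistent cache flushing. */
    bool ok = has_feature(&psu, "vwarn-signal") &&
              has_feature(&psu, "programmable-vwarn-threshold");
    printf("PSU0 %s persistent cache flush\n",
           ok ? "supports" : "does not support");
    return 0;
}
```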
In some embodiments, evaluating the hardware capabilities of the system 100 includes determining whether the PSU 102a and/or the PSU 102b supports generating an early warning signal and configuring a programmable vwarn threshold. For example, the system management module 118 may determine whether the PSU 402 includes a pin for asserting the vwarn signal. If the PSU does not have these capabilities, the system management module 118 may determine that the platform hardware does not support a persistent cache flush operation.
Additionally or alternatively, the system management module 118 may determine whether the power management subsystem 104 includes logic for detecting a vwarn signal, monitoring an aggregate energy level across a plurality of PSUs, and/or triggering an interrupt when a system-wide energy level is below a threshold. If the power management subsystem 104 does not have these capabilities, the system management module 118 may determine that the platform hardware does not support a persistent cache flush operation.
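For illustration, the system-wide energy check may be sketched as follows; the per-PSU counters, their units, and the threshold values are arbitrary assumptions for the sketch.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_PSUS 2

/* One counter per PSU approximates the energy remaining in that
 * supply (e.g., decremented after a vwarn assertion); units are
 * illustrative. */
static unsigned energy_counter[NUM_PSUS] = { 120, 80 };

/* Compare the aggregate energy level against a programmed minimum
 * needed to complete a cache flush. */
static bool energy_below_threshold(unsigned min_energy) {
    unsigned total = 0;
    for (int i = 0; i < NUM_PSUS; i++)
        total += energy_counter[i];
    return total <= min_energy; /* true: trigger the flush interrupt */
}

int main(void) {
    if (energy_below_threshold(250))
        printf("asserting interrupt to start persistent cache flush\n");
    return 0;
}
```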
Additionally or alternatively, the system management module 118 may evaluate other hardware capabilities. For example, the system management module 118 may evaluate the system 100 to determine whether the system supports intercepting reset signals and configuring GPIO pins to handle asynchronous reset events. As another example, the system management module 118 may evaluate the CPU 116 to determine whether it includes a special signaling line for invoking the persistent cache refresh handler.
Additionally or alternatively, system management module 118 can determine whether any BBUs that support the persistent cache flush operation have been installed. If a BBU has been installed, system management module 118 may determine that persistent cache flushing is supported even if the PSU architecture does not provide support. On the other hand, if no BBU is installed and the PSUs and/or the power management subsystem do not support a persistent cache flush operation, system management module 118 may determine that persistent cache flushing is not supported.
Additionally or alternatively, the system management module 118 may evaluate other hardware capabilities. For example, the system management module 118 may evaluate the capacity of an auxiliary energy storage device (such as a BBU) installed in the platform and determine whether the device provides enough energy to power the system components that are active during the refresh process. Additionally or alternatively, the system management module 118 may evaluate the health of the battery, such as by measuring battery impedance, to determine whether the platform hardware supports persistent cache flushing.
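Taken together, the checks of operation 808 may reduce to a decision such as the following C sketch; the structure fields and threshold parameters are hypothetical stand-ins for the hardware queries described above.

```c
#include <stdbool.h>
#include <stdio.h>

/* The platform supports a persistent cache flush if either the
 * PSU/power-management path or a healthy, sufficiently large BBU can
 * carry the flush. All inputs are illustrative. */
struct platform_caps {
    bool psu_vwarn_pin;        /* PSU can assert early warning   */
    bool pm_energy_monitoring; /* subsystem tracks energy level  */
    bool bbu_installed;
    double bbu_energy_joules;  /* measured usable capacity       */
    double bbu_impedance_ohms; /* battery health proxy           */
};

static bool flush_supported(const struct platform_caps *c,
                            double flush_energy_joules,
                            double max_impedance_ohms) {
    bool psu_path = c->psu_vwarn_pin && c->pm_energy_monitoring;
    bool bbu_path = c->bbu_installed &&
                    c->bbu_energy_joules >= flush_energy_joules &&
                    c->bbu_impedance_ohms <= max_impedance_ohms;
    return psu_path || bbu_path;
}

int main(void) {
    struct platform_caps caps = { true, true, false, 0.0, 0.0 };
    printf("persistent cache flush %s\n",
           flush_supported(&caps, 50.0, 0.2) ? "supported"
                                             : "unsupported");
    return 0;
}
```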
Based on the evaluation, system management module 118 returns a response to system firmware 120 indicating whether the platform is capable of supporting a persistent cache flush (operation 810). If so, the response may grant system firmware 120 permission to enable persistent cache flushing. Otherwise, system management module 118 denies system firmware 120 permission to enable persistent cache flushing.
Upon receiving the response, system firmware 120 determines whether system 100 supports persistent cache flushing (operation 812).
If the platform hardware does not support a persistent cache flush, system firmware 120 continues the boot sequence without advertising support for the persistent cache flush to operating system 122 (operation 822). When persistent cache flushing is not advertised as enabled, operating system 122 may prevent applications from attempting to treat the processor cache in system 100 as persistent.
If persistent cache flushing is supported, then system firmware 120 and/or system management module 118 configures the system components to support the persistent cache flush operation (operation 814). For example, system firmware 120 may configure GPIO pins, initialize per-PSU timers, configure the PSUs, and otherwise configure system hardware/software to perform cache flush operations, as previously described.
System firmware 120 and/or system management module 118 then advertises support for the persistent cache flush to operating system 122 (operation 816). In some embodiments, system firmware 120 may provide operating system 122 with a list of supported features and/or configuration settings. The list may include an entry indicating that persistent cache flushing is supported and enabled. However, the manner in which support is advertised may vary depending on the particular implementation.
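For illustration, the advertisement may be modeled as appending entries to a firmware feature table that the operating system scans at boot. The table format below is an assumption; a real platform would use a platform-defined structure such as an ACPI table.

```c
#include <stdio.h>

#define MAX_FEATURES 16

/* Hypothetical feature table populated by firmware during boot. */
static const char *feature_table[MAX_FEATURES];
static int feature_count = 0;

static void advertise(const char *feature) {
    if (feature_count < MAX_FEATURES)
        feature_table[feature_count++] = feature;
}

int main(void) {
    advertise("adr");
    /* Added only if the configuration step (operation 814) succeeded. */
    advertise("persistent-cache-flush");

    /* The OS later scans this table (operation 818). */
    for (int i = 0; i < feature_count; i++)
        printf("firmware advertises: %s\n", feature_table[i]);
    return 0;
}
```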
Based on the notification, operating system 122 detects whether the persistent cache mode is supported (operation 818). For example, operating system 122 may scan the list of supported features during the boot sequence to determine whether system firmware 120 or system management module 118 is advertising support for persistent cache flushing.
If the persistent cache mode is enabled and supported by the platform hardware, operating system 122 advertises the persistent cache mode to one or more applications (operation 820). In some embodiments, an application may query operating system 122 to determine whether the persistent cache mode is available and supported. Operating system 122 may provide a response indicating whether the application may rely on the persistent cache. Applications may implement different logic depending on whether the persistent cache is enabled and supported. For example, if it is enabled, a database application may treat reads and writes as committed without implementing complex software-based checks, which may simplify application code and allow reads and writes to execute more efficiently.
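A sketch of such application-side logic follows; the query function stands in for a hypothetical operating system interface rather than an actual system call.

```c
#include <stdbool.h>
#include <stdio.h>

static bool os_persistent_cache_available(void) {
    return true; /* stand-in for querying the operating system */
}

static void commit_record(int id) {
    if (os_persistent_cache_available()) {
        /* Data written into the cache hierarchy is already durable;
         * no explicit flush or software-based commit check needed. */
        printf("record %d committed via persistent cache\n", id);
    } else {
        /* Fall back to explicit flush/commit to persistent media. */
        printf("record %d committed with explicit flush\n", id);
    }
}

int main(void) {
    commit_record(42);
    return 0;
}
```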
As system components evolve, process 800 may be repeated to determine whether support for the persistent cache mode has changed. Changes in hardware, such as the installation of a BBU or a PSU upgrade, may allow system 100 to advertise support for persistent cache flushing that it did not previously provide. In other cases, the advertisement may be removed if a component such as a BBU is removed or fails.
6. Hardware implementation
According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices, such as one or more Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or Network Processing Units (NPUs), or may include one or more general-purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, FPGAs, or NPUs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices, or any other device that incorporates hard-wired and/or program logic to implement the techniques.
For example, FIG. 9 is a block diagram that illustrates a computer system 900 upon which an embodiment of the invention may be implemented. Computer system 900 includes a bus 902 or other communication mechanism for communicating information, and a hardware processor 904 coupled with bus 902 for processing information. The hardware processor 904 may be, for example, a general purpose microprocessor.
Computer system 900 also includes a main memory 906, such as a Random Access Memory (RAM) or other dynamic storage device, coupled to bus 902 for storing information and instructions to be executed by processor 904. Main memory 906 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 904. Such instructions, when stored in a non-transitory storage medium accessible to processor 904, render computer system 900 into a special-purpose machine that is customized to perform the operations specified in the instructions.
Computer system 900 also includes a Read Only Memory (ROM) 908 or other static storage device coupled to bus 902 for storing static information and instructions for processor 904. A storage device 910, such as a magnetic disk or optical disk, is provided and coupled to bus 902 for storing information and instructions.
Computer system 900 may be coupled via bus 902 to a display 912, such as a Cathode Ray Tube (CRT) or Light Emitting Diode (LED) monitor, for displaying information to a computer user. An input device 914, which may include alphanumeric and other keys, is coupled to bus 902 for communicating information and command selections to processor 904. Another type of user input device is cursor control 916, such as a mouse, a trackball, a touch screen, or cursor direction keys for communicating direction information and command selections to processor 904 and for controlling cursor movement on display 912. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane.
Computer system 900 can implement the techniques described herein using custom hardwired logic, one or more ASICs or FPGAs, firmware, and/or program logic in combination with a computer system to make computer system 900 a special purpose machine or to program computer system 900 into a special purpose machine. According to one embodiment, the techniques herein are performed by computer system 900 in response to processor 904 executing one or more sequences of one or more instructions contained in main memory 906. Such instructions may be read into main memory 906 from another storage medium, such as storage device 910. Execution of the sequences of instructions contained in main memory 906 causes processor 904 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
The term "storage medium" as used herein refers to any non-transitory medium that stores data and/or instructions that cause a machine to operate in a specific manner. Such storage media may include non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 910. Volatile media includes dynamic memory, such as main memory 906. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, a Content Addressable Memory (CAM), and a Ternary Content Addressable Memory (TCAM).
Storage media are different from, but may be used in conjunction with, transmission media. Transmission media participate in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 902. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 904 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a network cable, such as a telephone line, a fiber optic cable, or a coaxial cable, using a modem. A modem local to computer system 900 can receive the data on the network line and use an infrared transmitter to convert the data to an infrared signal. The infrared detector may receive the data carried in the infrared signal and appropriate circuitry may place the data on bus 902. Bus 902 carries the data to main memory 906, from which main memory 906 processor 904 retrieves and executes the instructions. The instructions received by main memory 906 may optionally be stored on storage device 910 either before or after execution by processor 904.
Computer system 900 also includes a communication interface 918 coupled to bus 902. Communication interface 918 provides a two-way data communication coupling to a network link 920 that is connected to a local network 922. For example, communication interface 918 may be an Integrated Services Digital Network (ISDN) card, a cable modem, a satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 918 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 918 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 920 typically provides data communication through one or more networks to other data devices. For example, network link 920 may provide a connection through local network 922 to a host computer 924 or to data equipment operated by an Internet Service Provider (ISP) 926. ISP 926 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 928. Local network 922 and internet 928 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 920 and through communication interface 918, which carry the digital data to computer system 900 or from computer system 900, are exemplary forms of transmission media.
Computer system 900 can send messages and receive data, including program code, through the network(s), network link 920 and communication interface 918. In the internet example, a server 930 might transmit a requested code for an application program through internet 928, ISP 926, local network 922 and communication interface 918.
The received code may be executed by processor 904 as it is received, and/or stored in storage device 910, or other non-volatile storage for later execution.
7. Miscellaneous; extensions
Embodiments are directed to a system having one or more devices including a hardware processor and configured to perform any of the operations described herein and/or in any of the following claims.
In an embodiment, a non-transitory computer-readable storage medium includes instructions that, when executed by one or more hardware processors, cause performance of any of the operations described and/or claimed herein.
Any combination of the features and functions described herein may be used in accordance with one or more embodiments. In the foregoing specification, embodiments have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what is intended by the applicants to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.

Claims (20)

1. A method, comprising:
storing a set of data in a volatile memory of a computing system, wherein the set of data is part of a persistent memory state but is not stored in non-volatile storage;
determining that input power in a group of one or more power supply units has been lost;
in response to determining that input power in the set of one or more power supply units has been lost, operating the computing system in a first mode of operation, wherein the set of data is not refreshed to the persistent storage in the first mode of operation;
determining that available energy in the set of one or more power sources has fallen to a threshold; and
in response to determining that the available energy in the set of one or more power supplies has fallen to the threshold, performing a cache refresh that transfers the set of data to persistent memory in the computing system to maintain the persistent memory state.
2. The method of claim 1, wherein the threshold is programmed based on an estimated level of energy used by the computing system to complete the cache refresh operation.
3. The method of claim 1, wherein determining that available energy in the one or more power supply units falls below the threshold comprises determining that a voltage across at least one bulk capacitor in the one or more power supply units has fallen below a threshold voltage.
4. The method of claim 1, further comprising estimating an amount of energy available within the power supply using a counter that begins to decrement when the power supply unit signals that an energy threshold has been exceeded, wherein the counter decrements at a rate proportional to a worst case estimated load on the power supply.
5. The method of claim 1, further comprising estimating a total amount of energy available within the plurality of power sources by summing a set of energy counters maintained for the plurality of power sources.
6. The method of claim 1, wherein the system-wide energy threshold is programmed based on an amount of energy used to complete a cache refresh operation.
7. The method of claim 1, wherein determining that the available energy in the set of one or more power supplies has fallen to a threshold value comprises comparing a sum of a plurality of counters to a minimum energy threshold value, wherein the plurality of counters includes at least one counter for each of the one or more power supply units, wherein the threshold value is met when the sum of the plurality of counters is less than or equal to the minimum energy threshold value.
8. The method of claim 1, wherein one or more hardware components not participating in performing cache flushing are powered down in response to determining that available energy in the set of one or more power supplies has fallen to the threshold.
9. The method of claim 1, wherein a processor frequency in the computing system is reduced in response to determining that available energy in the set of one or more power sources has fallen to the threshold.
10. The method of claim 1, further comprising de-asserting a first signal in response to determining that the input power has been lost, and asserting a second signal in response to determining that available energy in the set of one or more power sources has fallen to the threshold.
11. A system, comprising:
one or more power supply units;
a memory subsystem comprising volatile memory and persistent memory, wherein the volatile memory stores a set of data that is part of a persistent memory state and is not stored in the persistent memory; and
system logic coupled to the one or more power supply units, the system logic to:
determining that input power in the one or more power supply units has been lost;
in response to determining that input power in the one or more power supply units has been lost, causing the system to operate in a first mode of operation, wherein the set of data is not refreshed to the persistent storage in the first mode of operation;
determining that available energy in a set of one or more power sources has fallen to a threshold; and
in response to determining that the available energy in the set of one or more power supplies has fallen to the threshold, performing a cache refresh that transfers the set of data to persistent memory in the system to maintain the persistent memory state.
12. The system of claim 11, wherein the system logic is to program the threshold based on an estimated level of energy used by the computing system to complete the cache refresh operation.
13. The system of claim 11, wherein determining that the available energy in the one or more power supply units falls below the threshold comprises determining that a voltage across at least one bulk capacitor in the one or more power supply units has fallen below a threshold voltage.
14. The system of claim 11, wherein the system logic estimates the amount of energy available within the power supply using a counter that begins to decrement when the power supply unit signals that an energy threshold has been exceeded, wherein the counter decrements at a rate proportional to a worst case estimated load on the power supply.
15. The system of claim 11, wherein the system logic estimates the total amount of energy available within the plurality of power sources by summing a set of energy counters maintained for the plurality of power sources.
16. The system of claim 11, wherein the system logic is to program a system-wide energy threshold based on an amount of energy used to complete a cache refresh operation.
17. The system of claim 11, wherein determining that the available energy in the set of one or more power supplies has fallen to a threshold comprises comparing a sum of a plurality of counters to a minimum energy threshold, wherein the plurality of counters includes at least one counter for each of the one or more power supply units, wherein the threshold is met when the sum of the plurality of counters is less than or equal to the minimum energy threshold.
18. The system of claim 11, wherein in response to determining that available energy in the set of one or more power supplies has fallen to the threshold, the system logic powers down one or more hardware components that are not involved in performing cache flushing.
19. The system of claim 11, wherein the system further comprises a processor, and the system logic is to decrease a frequency of the processor in response to determining that available energy in the set of one or more power sources has fallen to the threshold.
20. A system, comprising:
means for storing a set of data in volatile memory of a computing system, wherein the set of data is part of a persistent memory state but is not stored in non-volatile storage;
means for determining that input power in a group of one or more power supply units has been lost;
means for operating the computing system in a first mode of operation in response to determining that input power in the set of one or more power supply units has been lost, wherein the set of data is not refreshed to persistent storage in the first mode of operation;
means for determining that available energy in the set of one or more power sources has fallen to a threshold;
means for performing a cache refresh that transfers the set of data to persistent memory in the computing system to maintain the persistent memory state in response to determining that available energy in the set of one or more power supplies has fallen to the threshold.