WO2015094373A1 - Apparatus and method for adaptive guard-band reduction - Google Patents

Apparatus and method for adaptive guard-band reduction

Info

Publication number
WO2015094373A1
Authority
WO
WIPO (PCT)
Prior art keywords
flip
clock signal
flop
clock
comparator
Prior art date
Application number
PCT/US2013/077263
Other languages
French (fr)
Inventor
Stefan Rusu
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Priority to PCT/US2013/077263
Publication of WO2015094373A1

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03K PULSE TECHNIQUE
    • H03K3/00 Circuits for generating electric pulses; Monostable, bistable or multistable circuits
    • H03K3/02 Generators characterised by the type of circuit or by the means used for producing pulses
    • H03K3/027 Generators characterised by the type of circuit or by the means used for producing pulses by the use of logic circuits, with internal or external positive feedback
    • H03K3/037 Bistable circuits
    • H03K3/0375 Bistable circuits provided with means for increasing reliability; for protection; for ensuring a predetermined initial state when the supply voltage has been applied; for storing the actual state when the supply voltage fails

Definitions

  • This disclosure relates generally to electronic circuits. More particularly but not exclusively, the present disclosure relates to adaptive guard-band reduction using canary flip-flops with reduced power and area overhead.
  • Timing margins for modern multi-core processors are usually checked for worst-case loading and operating conditions. These worst-case conditions may rarely happen during normal operations, so the amount of over-provisioning of timing margins, typically called “guard-band” and expressed in either voltage or timing terms, may cause inefficiencies if excessive.
  • guard-band may be increased when portions of the chip are turned off while not in use, which reduces the total chip power consumption and increases the local power supply voltage provided to the circuits that remain operational. Similar conditions arise when the application code running on each core does not fully stress the operating conditions or if the process corner and the operating temperature are further away from the assumptions modeled during the design phase.
  • Functioning after guard-band reduction is opportunistic, so guard-band reduction may be used by each processor or graphics core in an adaptive process whenever possible.
  • adaptive circuit techniques for reducing voltage and/or timing margins usually have limited coverage or high area and power overhead that make them unfit for large volume commercial applications.
  • Figure 1 is a schematic diagram of one system for adaptive guard-band reduction using canary flip-flops, incorporating aspects of the present disclosure, in accordance with various embodiments.
  • Figure 2 is a schematic diagram of another system for adaptive guard-band reduction using canary flip-flops, incorporating aspects of the present disclosure, in accordance with various embodiments.
  • Figure 3 is a schematic diagram of an integrated canary flip-flop cell, incorporating aspects of the present disclosure, in accordance with various embodiments.
  • Figure 4 is a schematic diagram of an integrated canary flip-flop cell including a pulse catcher, incorporating aspects of the present disclosure, in accordance with various embodiments.
  • Figure 5 is a schematic diagram that illustrates control loops of a power control unit, incorporating aspects of the present disclosure, in accordance with various embodiments.
  • FIG. 6 is a block diagram that illustrates an example computer system suitable for practicing the disclosed embodiments, in accordance with various embodiments.
  • Embodiments of adaptive guard-band reduction apparatus and method are described herein.
  • numerous specific details are given to provide a thorough understanding of embodiments.
  • the embodiments can be practiced without one or more of the specific details, or with other methods, components, materials, etc.
  • well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the embodiments.
  • the phrase “A and/or B” means (A), (B), or (A and B).
  • the phrase “A, B, and/or C” means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B and C).
  • connection with the embodiment is included in at least one embodiment of the invention.
  • the description may use the phrases “in one embodiment,” “in an embodiment,” “in another embodiment,” “in embodiments,” “in various embodiments,” or the like, which may each refer to one or more of the same or different embodiments. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Additionally, the terms “comprising,” “including,” “having,” and the like, as used with respect to embodiments of the present disclosure, are synonymous.
  • the canary flip-flop may include a first flip-flop to generate a first input to a comparator based at least in part on a first clock signal and a data input.
  • the canary flip-flop may also include a second flip-flop to generate a second input to the comparator based at least in part on a second clock signal and the data input.
  • the first clock signal, the second clock signal, and the data input may be configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop.
  • the second flip-flop may be gated to save power.
  • the area overhead of the canary flip-flop may be reduced by integrating the two flip-flops in a single cell together with the comparator. Moreover, a pulse catcher may also be integrated into the same cell. As such, the second flip-flop may be operated only during the adaptive operation for guard-band reduction, and be gated during the rest of the time. Moreover, the area overhead may be reduced by integrating all required circuits in a single cell.
  • System 100 may include clock grid 110 coupled to regional clock buffer (RCB) 122 as well as RCB 124.
  • RCB 122 may be coupled to local clock buffer (LCB) 126.
  • LCB 126 may be coupled to data storage 134 and main flip-flop (MFF) 142 in canary flip-flop (CFF) 140.
  • Data storage 134 may receive data D on one side and send data to cloud 136.
  • Cloud 136 may be coupled to MFF 142.
  • MFF 142 may generate output signal Q.
  • RCB 124 may be coupled to shadow flip-flop (SFF) 144 in CFF 140.
  • SFF 144 may receive data D, and may be coupled to comparator 146.
  • Comparator 146 may also be coupled to MFF 142.
  • comparator 146 may generate an output signal W.
  • Clock grid 110 may generate and amplify high-frequency clock signals, and distribute them to various RCBs.
  • clock signals may be distributed using many drivers in a tree schema, e.g., binary tree, H-tree, X-tree, etc.
  • RCB 122 and RCB 124 may be located at the same level of the clock distribution tree. The clock signals produced by clock grid 110 may reach RCB 122 and RCB 124 at the same time. Therefore, RCB 122 and RCB 124 may form a dual RCB structure to drive in-phase clock signals to CFF 140.
  • RCB 122 and RCB 124 each may be an adjustable regional clock buffer whose clock signals may be adjusted.
  • RCB 122 may be coupled to LCB 126.
  • LCB 126 may be a fixed clock buffer where clock signals cannot be adjusted.
  • LCB 126 may add a delay to its output clock signals.
  • Clock signals from LCB 126 may be sent to data storage 134 which may receive and store data D.
  • Data D may be sent to cloud 136 for processing.
  • Cloud 136 represents logic that may perform various operations with respect to, or based on, data D.
  • RCB 122 may drive clock signals to MFF 142 via LCB 126 wherein the clock signals may be delayed due to LCB 126.
  • the delay by LCB 126 may be adjustable or programmable.
  • the line between RCB 122 and MFF 142 thus may become a programmable delay line. The delay may be adjusted to control a timing of a warning generated by CFF 140.
  • RCB 124 may drive clock signals to SFF 144.
  • SFF 144 may receive clock signals from RCB 124 earlier than MFF 142, e.g., because there is no LCB between RCB 124 and SFF 144 to cause any delay.
  • MFF 142 and SFF 144 each may sample data according to their respective clock signals. Data may be sampled earlier in SFF 144 than in MFF 142 because clock signals are earlier in SFF 144 than in MFF 142. Thus, sampling data may be correct at MFF 142, but fail at SFF 144 because the data may have less time to propagate under a relatively earlier clock at SFF 144. In other words, SFF 144 may see a failure, but the correct data may still be latched at MFF 142.
  • Comparator 146 may generate a warning signal W if the input received from MFF 142 differs from the input received from SFF 144, e.g., when MFF 142 timely latches the data, but SFF 144 fails to latch the data due to its earlier clock.
  • MFF 142 may have latched the correct data when a warning signal is generated because SFF 144 is supposed to fail first due to its tighter timing requirement, e.g., caused by its earlier clock.
  • a warning signal may indicate that CFF 140 is going to fail soon if its operating conditions remain the same or worsen.
  • Once a warning signal is generated, it may be sent to a controller (not shown), which may adjust system 100 accordingly, e.g., by raising the voltage or lowering the frequency, to mitigate the risk of timing failure for MFF 142. Therefore, CFF 140 may always generate the correct output Q when such a failure prevention mechanism is in place.
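The early-clock canary behavior described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit itself: the class name `CanaryFlipFlop`, the parameters `period`, `early_delta`, and `setup`, and the idea of describing a data transition by its arrival time are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CanaryFlipFlop:
    """Behavioral model of a main/shadow flip-flop pair with an XOR comparator."""
    period: float        # clock period seen by the main flip-flop (hypothetical units)
    early_delta: float   # how much earlier the shadow flip-flop samples (assumption)
    setup: float = 0.0   # setup time, assumed equal for both flip-flops

    def sample(self, new_value: int, old_value: int, arrival: float):
        """Return (Q, W) for one cycle.

        arrival is the time, measured from the launching clock edge, at
        which the data input settles to new_value; before that it still
        holds old_value.
        """
        main_deadline = self.period - self.setup
        shadow_deadline = main_deadline - self.early_delta
        q = new_value if arrival <= main_deadline else old_value
        shadow = new_value if arrival <= shadow_deadline else old_value
        w = q ^ shadow  # XOR comparator: 1 means the two flip-flops disagree
        return q, w


cff = CanaryFlipFlop(period=10.0, early_delta=1.0, setup=0.5)
print(cff.sample(1, 0, 8.0))  # enough margin: correct Q, no warning
print(cff.sample(1, 0, 9.0))  # inside the early window: correct Q, warning raised
```

In this model a warning appears only when the data lands between the shadow deadline and the main deadline, which matches the text: the shadow flip-flop fails first while the main flip-flop still latches the correct value.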
  • comparator 146 may receive different inputs from MFF 142 and SFF 144 in different cycles of the clock signal, and a comparison may be performed in each cycle. There may be many paths and logics in system 100.
  • CFF 140 may be located at a critical path of system 100.
  • CFF 140 may indicate sufficient timing margins for data coming from one source, but generate a warning signal for data coming from another source. Thus, CFF 140 may generate an indication of whether it is going to fail the timing in each cycle, and it may continuously consume power in doing so.
  • RCB 124 may receive a control signal EN, which may enable or disable RCB 124.
  • When RCB 124 is enabled, SFF 144 will attempt to latch data in every cycle.
  • SFF 144 can be turned off. Therefore, SFF 144 may be selectively enabled when there is a need to monitor a critical path of system 100.
  • RCB 124 may be enabled when the operating conditions of a critical path have changed, so that CFF 140 may be used to monitor the critical path for a period of time.
  • RCB 124 may be disabled, or at least the clock signals from RCB 124 may be turned off, such that SFF 144 and/or comparator 146 can go off-duty to reduce power overhead at CFF 140, but MFF 142 may continue to latch the correct data.
  • the clock to SFF 144 may be configured to be earlier than the clock to MFF 142.
  • the early clock to SFF 144 may be generated by directly tapping the output from regional clock buffers, e.g., RCB 122 and/or 124. As illustrated above, the clock to SFF 144 may be made earlier by the delay of LCB 126.
  • dual RCB 120 may use the
  • the clock to SFF 144 may be made earlier than the regular clock to MFF 142 by making RCB 124 earlier than the regular clock.
  • the clock to SFF 144 may be made earlier than the clock to MFF 142 by making RCB 122 later than the regular clock.
  • the clock to SFF 144 may be earlier than the clock to MFF 142 by a programmable clock delay delta.
  • the adjustable time window between the two input clocks to CFF 140 may enable a shmoo function to sweep the delay to observe different failure points within a given clock region.
  • the time difference between the early and the regular clocks may be adjusted through the separate RCB 122 and 124.
  • the adjustability of RCBs may be used for debug purposes to sweep the entire delay range and monitor the increase in warning signals at all the canary flip-flop outputs.
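The shmoo-style sweep mentioned above can be sketched as follows. This is a minimal illustration assuming each monitored path is summarized by one data arrival time and that the data toggles every cycle; the function name `shmoo_sweep` and all parameter names and values are hypothetical.

```python
def shmoo_sweep(arrivals, period, deltas, setup=0.0):
    """Count canary warnings for each early-clock offset in deltas.

    arrivals: data arrival times of the monitored paths (hypothetical values)
    period:   clock period of the regular clock
    deltas:   candidate offsets between the early and the regular clock
    """
    results = {}
    for delta in deltas:
        main_deadline = period - setup
        shadow_deadline = main_deadline - delta
        # A path warns when it meets the main deadline but misses the shadow one.
        results[delta] = sum(
            1 for t in arrivals if shadow_deadline < t <= main_deadline
        )
    return results


print(shmoo_sweep([6.0, 8.0, 9.0], period=10.0, deltas=[0.5, 1.5, 3.0]))
```

As the offset grows, more paths fall inside the early window, so the warning count rises; this is the trend a debug sweep would be expected to show.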
  • the data may be sampled earlier at SFF 144 than MFF 142.
  • MFF 142 may latch the correct data without any error correction process.
  • comparator 146 will receive same inputs, and no warning signals would be generated.
  • SFF 144 may fail to latch the correct data if the data arrives within the clock delay delta from the end of the cycle. In this case, the data may meet the timing into MFF 142 but be late going into SFF 144, thus causing comparator 146 to warn of an impending timing failure.
  • the warning may prompt system 100 to make appropriate adjustments to gain sufficient timing margins to prevent MFF 142 from actually failing the timing criteria.
  • system 100 may stay away from a real timing failure and/or the trouble of recovering from the timing failure.
  • CFF 140 may be configured to reduce voltage and/or frequency guard-bands with reduced power and area overhead.
  • RCB 124 may be coupled to many canary flip-flops, and all clocks to these canary flip-flops may be gated, e.g., via the control signal EN to RCB 124, to reduce power overhead.
  • the area overhead of CFF 140 may be reduced by integrating the two flip-flops, MFF 142 and SFF 144, in a single cell together with comparator 146.
  • FIG. 2 illustrates system 200 for adaptive guard-band reduction using canary flip-flops, in accordance with various embodiments.
  • System 200 may include clock grid 210 coupled to RCB 222.
  • RCB 222 may be coupled to LCB 226.
  • LCB 226 may be coupled to data storage 234 and MFF 242 in CFF 240.
  • Data storage 234 may receive data D on one side and send data D to cloud 236.
  • Cloud 236 may be coupled to MFF 242.
  • MFF 242 may generate output signal Q.
  • RCB 222 may be coupled to LCB 224.
  • LCB 224 may be coupled to SFF 244 in CFF 240.
  • SFF 244 may receive data via local delay buffer (LDB) 248, which may receive data from cloud 236.
  • SFF 244 may also be coupled to comparator 246, which may also be coupled to MFF 242.
  • comparator 246 may generate output signal W.
  • Clock grid 210 may generate and amplify high-frequency clock signals, and distribute them to various RCBs, including RCB 222.
  • RCB 222 may be an adjustable regional clock buffer whose clock signals may be adjusted. RCB 222 may be coupled to LCB 224 or 226, which may be a fixed clock buffer. In some embodiments, LCB 224 and 226 may add a similar delay to their respective output clock signals. In some embodiments, LCB 224 and 226 may add different amounts of delay to their respective output clock signals, thereby, for example, making the clock reach SFF 244 earlier than MFF 242. Data storage 234 and cloud 236 may function similarly to data storage 134 and cloud 136, respectively, in connection with Figure 1.
  • RCB 222 may drive clock signals to MFF 242 via LCB 226, or to SFF 244 via LCB 224.
  • the delay caused by LCB 224 or 226 may be adjustable or programmable. The delay may be adjusted to control the timing of the warning that may be generated by CFF 240.
  • the delays by LCB 224 and 226 may be configured to be substantially identical to each other, so that MFF 242 and SFF 244 may receive similar clock signals.
  • MFF 242 and SFF 244 may each latch data according to their respective clocks.
  • data may be sampled earlier by SFF 244 than MFF 242, at least due to the delay of the data coming to SFF 244, e.g., caused by LDB 248.
  • Data may be latched correctly at MFF 242, but fail at SFF 244 because the data may come to SFF 244 late under similar clocks to MFF 242 and SFF 244.
  • Comparator 246 may generate a warning signal W if the input received from MFF 242 differs from the input received from SFF 244, e.g., when MFF 242 timely latches the data, but SFF 244 fails to do the same due to a data delay.
  • MFF 242 may have latched the correct data when a warning signal is generated because SFF 244 is supposed to fail first due to its earlier sampling of the data signal, e.g., caused by the data delay. Warning signals may prompt system 200 to conduct certain adjustments, e.g., raising the voltage or lowering the frequency, to mitigate the risk of timing failure for MFF 242. Therefore, CFF 240 may generate the correct output Q under such a failure prevention scheme.
  • LCB 224 may receive a control signal EN, which may enable or disable LCB 224.
  • LCB 224 may include an AND gate.
  • SFF 244 may attempt to latch data in every cycle.
  • SFF 244 can be turned off. Therefore, SFF 244 may be selectively enabled, e.g., when the operating conditions of system 200 have been changed.
  • LCB 224 may be disabled, such that power overhead at CFF 240 may be reduced while MFF 242 may continue to latch correct data and produce correct output Q.
  • data may be sampled earlier at SFF 244 than MFF 242.
  • MFF 242 may latch the correct data without any error correction process.
  • comparator 246 will receive same inputs, and the output signal from comparator 246 will indicate no warnings.
  • SFF 244 may fail to correctly latch the data if the data delayed by LDB 248 misses the setup time at the end of the cycle for SFF 244. In this case, the data may meet timing criteria at MFF 242, but may become too late for SFF 244, and thus cause comparator 246 to warn of an impending timing failure.
  • the warning may prompt system 200 to make appropriate adjustments to gain sufficient timing margins to prevent MFF 242 from failing the timing criteria.
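The data-delay variant of Figure 2 can be sketched in the same behavioral style. The sketch assumes both flip-flops see the same clock timing and that the shadow flip-flop receives the data later by a fixed buffer delay; the function name `canary_with_data_delay` and its parameters, including `ldb_delay`, are hypothetical.

```python
def canary_with_data_delay(new_value, old_value, arrival, period,
                           ldb_delay, setup=0.0):
    """Return (Q, W) when both flip-flops share clock timing and only the
    shadow flip-flop's data input is delayed by ldb_delay."""
    deadline = period - setup
    # Main flip-flop sees the data as it leaves the logic cloud.
    q = new_value if arrival <= deadline else old_value
    # Shadow flip-flop sees the same data after the local delay buffer.
    shadow = new_value if arrival + ldb_delay <= deadline else old_value
    return q, q ^ shadow


print(canary_with_data_delay(1, 0, arrival=8.0, period=10.0, ldb_delay=1.0))
print(canary_with_data_delay(1, 0, arrival=9.5, period=10.0, ldb_delay=1.0))
```

Delaying the shadow data by `ldb_delay` tightens the shadow path by the same amount that an earlier shadow clock would, which is why both schemes can serve as a warning of shrinking margin.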
  • CFF 240 may be configured to reduce voltage and/or frequency guard-bands with reduced power and area overhead.
  • the shadow flip-flops used for monitoring may be operated only when required for adaptive operation, and their clocks may be gated during the rest of the time.
  • RCB 222 may be coupled to many canary flip-flops, and clocks to these canary flip-flops may be gated, e.g., via the control signal EN to LCB 224, to reduce power overhead.
  • the area overhead may be reduced by integrating all desired circuits in a single cell.
  • the area overhead of CFF 240 may be reduced by integrating the two flip-flops, MFF 242 and SFF 244, in a single cell together with LDB 248 as well as comparator 246.
  • FIG. 3 illustrates an integrated canary flip-flop cell 300, in accordance with various embodiments.
  • Cell 300 may be practiced in place of CFF 140 in connection with Figure 1.
  • Cell 300 may include MFF 310, SFF 320, and comparator 330.
  • MFF 310 may receive data D and clock signal CLK1.
  • SFF 320 may receive data D and clock signal CLK2.
  • MFF 310 may generate output Q and also generate input to comparator 330.
  • comparator 330 may be coupled with and receive input from SFF 320.
  • Comparator 330 may generate an output signal W based on the inputs from MFF 310 and SFF 320.
  • CLK2 may be in-phase with but earlier than CLK1.
  • CLK1 may control MFF 310, and CLK2 may control SFF 320.
  • CLK2 may be configured to be earlier than CLK1 according to different embodiments disclosed in connection with Figure 1.
  • CLK1 may be a regular clock, but CLK2 may be an early clock.
  • comparator 330 may include an XOR gate to receive both inputs from MFF 310 and SFF 320.
  • If SFF 320 fails to latch the correct data, e.g., due to the early clock, then comparator 330 may detect a mismatch between MFF 310 and SFF 320.
  • a timing path including cell 300 may be just about to fail timing criteria, and comparator 330 may generate a warning signal.
  • MFF 310 may still operate to latch the correct data and generate the correct output, but MFF 310 may be about to fail the timing criteria within the time margin between CLK1 and CLK2. If the operating conditions are expected to worsen, e.g., the voltage drops or the temperature changes, then MFF 310 may actually fail the timing criteria.
  • the warning signal may be latched in a pulse catcher (not shown) and routed to a controller (not shown) in the system or directly to the clock generator (e.g., for faster response).
  • the warning signal may be used to trigger the controller to adjust its operating conditions, e.g., to increase the cycle time or voltage to avoid this potential upcoming failure.
  • Because MFF 310 may always latch the right data, no correction would be required.
  • CFF cell 300 may provide a warning that a timing error is imminent for MFF 310, while MFF 310 may be expected to sample the correct data.
  • CFFs may be used to monitor critical paths and facilitate voltage and/or frequency guard-band reduction in a system, e.g., system 100 in connection with Figure 1.
  • a larger number of paths in the system may need to be monitored.
  • CFFs may be configured to monitor those paths with reduced power and area overhead.
  • SFF 320 may be controlled by an early clock, e.g., CLK2 that can be gated to save power.
  • the area overhead may be reduced by integrating the shadow and main flip-flops in a single cell together with the comparator.
  • the comparator may be integrated together with two flip-flops into a single library cell with its own pre-characterized timing parameters. Such integration may at least save a few inverters in the XOR function of the comparator and provide deterministic timing parameters compared to a place-and-route design using discrete library cells. Such integration may also remove the routing variations that arise when distinct flip-flop and XOR cells are routed together. As illustrated by the current figure, for instance, MFF 310, SFF 320, and comparator 330 may be integrated into one single library cell.
  • FIG. 4 illustrates an integrated canary flip-flop cell 400, including a pulse catcher, in accordance with various embodiments.
  • Cell 400 may be practiced in place of CFF 140 in connection with Figure 1.
  • Cell 400 may include MFF 410, SFF 420, comparator 430, and pulse catcher 440.
  • MFF 410 may receive data D and clock signal CLK1.
  • SFF 420 may receive data D and clock signal CLK2.
  • MFF 410 may generate output Q and also generate input to comparator 430.
  • comparator 430 may be coupled with and receive input from SFF 420.
  • the output of comparator 430 may be latched into pulse catcher 440.
  • Pulse catcher 440 may generate output W.
  • CLK1 may control MFF 410.
  • CLK2 may control SFF 420.
  • CLK2 may be in-phase with but earlier than CLK1.
  • CLK1 may be a regular clock, but CLK2 may be an early clock.
  • comparator 430 may include an XOR gate to receive both inputs from MFF 410 and SFF 420. When the values latched by MFF 410 and SFF 420 agree, there is likely enough timing margin in the path. When comparator 430 detects a mismatch between MFF 410 and SFF 420, the timing path including cell 400 may be about to fail timing criteria, and comparator 430 may generate a warning signal. The output generated by comparator 430 is only valid for a full cycle as comparator 430 may generate different outputs in different cycles. The output signal from comparator 430 may be latched in pulse catcher 440.
  • Pulse catcher 440 may hold the output signal from comparator 430, e.g., a pulse warning, and output it to a controller for further actions.
  • pulse catcher 440 may be integrated with MFF 410, SFF 420, and comparator 430 in a single cell.
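Because the comparator output may change every cycle, the pulse catcher acts as a sticky latch. The sketch below models that behavior only; the class name `PulseCatcher`, the `clock` method, and in particular the `clear` method are assumptions made for illustration, since the text here does not describe how the latch is reset.

```python
class PulseCatcher:
    """Sticky latch: once a warning pulse is seen, hold it until cleared."""

    def __init__(self):
        self.w = 0

    def clock(self, comparator_out: int) -> int:
        """Capture this cycle's comparator output and return the held value."""
        self.w |= comparator_out & 1
        return self.w

    def clear(self) -> None:
        """Reset the latch; the clear mechanism is an assumption."""
        self.w = 0


pc = PulseCatcher()
print([pc.clock(x) for x in [0, 1, 0, 0]])  # a one-cycle pulse stays visible
```

Holding the pulse gives a slower controller time to observe a warning that was only valid for a single cycle at the comparator.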
  • FIG. 5 illustrates control loops 500 of a power control unit (PCU) 510, incorporating aspects of the present disclosure, in accordance with various embodiments.
  • PCU 510 may be coupled to and receive signals from processor or graphics core 540, where temperature sensor 550 and/or timing sensor 560 may reside.
  • PCU 510 may be coupled to and send signals to frequency regulator 520 and/or voltage regulator 530, while core 540 may be coupled to and receive signals from frequency regulator 520 and/or voltage regulator 530.
  • Control loops 500 may provide various corrective operations triggered by a warning signal of an upcoming timing failure.
  • core 540 may be a processor core or a graphics core. In various circumstances, it may be desirable to move the core operating point to a lower voltage for better power efficiency or to a higher frequency for higher performance via guard-band reduction. Temperature sensor 550 and/or timing sensor 560 may be used to monitor various operating conditions relating to guard-band reduction.
  • temperature sensor 550 may include one or more digital temperature sensors. Temperature sensor 550 may measure the real-time temperature at a particular location of core 540 or of core 540 as a whole. PCU 510 may not only receive the temperature information from temperature sensor 550, but also analyze whether core 540 is warming up or cooling off based on the historical temperature data. PCU 510 may take the temperature information into account to decide whether and how to react to a warning signal of an upcoming timing failure. As an example, if temperature sensor 550 indicates that core 540 is cooling off quickly, then a warning signal of an upcoming timing failure may not even warrant a response, as it is expected to go away by itself due to improved operating conditions. On the contrary, if temperature sensor 550 indicates that core 540 is warming up quickly, then a warning signal of an upcoming timing failure may warrant responsive actions as the temperature evolves.
  • The reaction of PCU 510 to temperature changes reported by temperature sensor 550 may be programmed to be inverted to account for the reverse temperature dependence observed in high-K metal-gate process technologies.
  • timing sensor 560 may include one or more CFFs as disclosed in connection with Figures 1-4.
  • timing sensor 560 may be in-situ timing margin sensors distributed across core 540 at the end-points of one or more known timing critical paths. In-situ sensors may measure the real signal passing through those critical paths. Thus, the actual guard-band on that signal may be measured.
  • the output from these in-situ timing margin sensors may be aggregated into a single warning signal that alerts PCU 510 of an impending timing failure.
  • the outputs from various comparators or pulse catchers illustrated in Figures 1-4 may be aggregated into a single error signal that is latched into a pulse catcher before being sent to PCU 510.
  • PCU 510 may be an online micro controller that may receive signals from temperature sensor 550 and/or timing sensor 560, and respond accordingly based on received signals.
  • PCU 510 may set up one or more thresholds to filter warning signals from timing sensor 560.
  • PCU 510 may not respond to warning signals below a threshold.
  • PCU 510 may decide to sit on a warning for a while to see if more warning signals may follow suit from the same block or the same area of the chip.
  • PCU 510 may decide to respond to a warning signal if it is a repetitive warning signal from a critical block of the chip.
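One way to picture the aggregation and threshold filtering described above is a sliding-window counter. This is a sketch under assumptions: the text does not say how the threshold is defined, so counting warning cycles in a recent window is only one plausible interpretation, and the class name `WarningFilter` and its parameters `threshold` and `window` are hypothetical.

```python
from collections import deque


class WarningFilter:
    """Aggregate canary outputs and respond only to repeated warnings."""

    def __init__(self, threshold: int, window: int):
        self.threshold = threshold            # warning cycles needed to respond
        self.history = deque(maxlen=window)   # most recent aggregated warnings

    def update(self, sensor_outputs) -> bool:
        """Record one cycle of sensor outputs; return True if a response is due."""
        aggregated = int(any(sensor_outputs))  # OR of all canary outputs
        self.history.append(aggregated)
        return sum(self.history) >= self.threshold


f = WarningFilter(threshold=2, window=3)
for outputs in ([0, 0], [0, 1], [1, 0]):
    print(f.update(outputs))
```

A single isolated warning is ignored in this sketch, while a second warning inside the window triggers a response, loosely mirroring the idea of waiting to see whether more warnings follow.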
  • PCU 510 may respond to a warning signal by controlling the operating voltage through voltage regulator 530, and/or the frequency through frequency regulator 520.
  • Voltage regulator 530 controlled by PCU 510, may be either onboard or on-die. Higher voltage may make timing faster, thus PCU 510 may command voltage regulator 530 to raise the voltage to core 540 to alleviate the condition indicated by the warning signal.
  • An alternative resolution that may be taken by PCU 510 is to command frequency regulator 520 to relax the frequency to subdue the warning signals.
  • Frequency regulator 520 may include analog or digital phase-locked loops (PLLs).
  • PCU 510 may adopt thermal solutions to control temperature, for example, to speed up a fan to cool off core 540.
  • PCU 510 needs to communicate with voltage regulator 530, which could be onboard or on-die. Communication with an onboard voltage regulator may take longer than communication with an on-die voltage regulator. However, it will generally take much longer for voltage regulator 530 to adjust the voltage to core 540 compared to the time required for frequency regulator 520 to adjust the frequency. Therefore, in embodiments, PCU 510 may choose to adjust frequency first in response to a warning signal. Subsequently, PCU 510 may decide to raise the voltage to core 540 in some embodiments. Once the voltage is raised, the frequency may be brought back to its previous level in some embodiments.
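The frequency-first response ordering described above can be sketched as a sequence of operating points. The function name `respond_to_warning`, the step sizes, and the units (MHz and mV) are hypothetical and serve only to make the ordering concrete.

```python
def respond_to_warning(freq, volt, freq_step, volt_step):
    """Return the sequence of (frequency, voltage) operating points visited."""
    steps = []
    freq_lowered = freq - freq_step
    steps.append((freq_lowered, volt))         # 1. fast action: relax the frequency
    volt_raised = volt + volt_step
    steps.append((freq_lowered, volt_raised))  # 2. slow action: raise the voltage
    steps.append((freq, volt_raised))          # 3. restore the original frequency
    return steps


# Hypothetical values: 3000 MHz at 900 mV, 100 MHz and 10 mV steps.
print(respond_to_warning(3000, 900, 100, 10))
```

The frequency drop covers the interval in which the slower voltage regulator is still settling, after which the original frequency can be restored at the higher voltage.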
  • PCU 510 may activate an adaptive feedback process for guard-band reduction after an operating condition has changed. For example, PCU 510 may activate the EN signal to RCB 124 in Figure 1 or the EN signal to LCB 224 in Figure 2 when voltage, frequency, or large enough temperature changes occur at core 540. Afterwards, PCU 510 may tighten the operating margins, e.g., by reducing voltage or increasing frequency, until a warning is received from timing sensor 560. In embodiments, PCU 510 may activate some or all critical paths for guard-band reduction. Once a first warning is detected, PCU 510 may relax the voltage and frequency settings by an
  • Such an adaptive feedback process may be repeated for timing and voltage guard-band reduction every time the operating conditions change.
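The adaptive feedback process can be sketched for the voltage case as a simple search loop. This is a minimal illustration, not the method as claimed: the function name `calibrate_voltage`, the `warns` callback standing in for timing sensor 560, and especially the `backoff` amount are hypothetical, since the size of the relaxation step is not given in the text here.

```python
def calibrate_voltage(v_start, v_min, step, backoff, warns):
    """Lower the voltage until the timing sensor warns, then back off.

    warns: callable taking a voltage and returning True if any canary warns
           (stands in for timing sensor 560; hypothetical interface).
    backoff: amount by which the setting is relaxed after the first warning
             (hypothetical parameter).
    """
    v = v_start
    while v - step >= v_min:
        v -= step                             # tighten the margin by one step
        if warns(v):
            return min(v + backoff, v_start)  # relax after the first warning
    return v                                  # reached the floor without a warning


# Hypothetical example in millivolts: canaries start warning below 850 mV.
print(calibrate_voltage(1000, 700, 25, 50, lambda v: v < 850))
```

The same loop shape would apply to raising the frequency instead of lowering the voltage, with the comparison directions reversed.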
  • Core 540 may continue operating all the time while the margin calibration process is conducted in parallel.
  • This scheme may work with both analog and digital PLLs as well as multiple types of voltage regulators, e.g., on-board, on-package or integrated on-die. Generally, the faster the frequency and voltage response, the higher the savings obtained through such adaptive operations.
  • PCU 510 may run such adaptive operations based on a timer, which may be adjustable. After operating conditions change, PCU 510 may activate timing sensor 560 to monitor critical paths until the timer elapses, after which part or all of control loops 500 may be turned off to save power.
  • the shadow flip-flops may be enabled only during an adaptive operation.
  • PCU 510 may enable the clocks to those shadow flip-flops for an adequate period of time to reduce the guard-band at the new voltage, frequency and temperature settings. Once the guard-band has been satisfactorily reduced or minimized, as long as the voltage, frequency and temperature conditions do not change, there is no need to burn power to continuously calibrate the system. Thus those shadow flip-flops may be gated off to reduce the power overhead.
  • PCU 510 may maintain a soft look-up table of valid recent operating conditions. Based on the look-up table, PCU 510 may jump to some known voltage, frequency, and temperature operating points without turning on the adaptive feedback process. Thus, power overhead may be further reduced, and system efficiency may be further improved. In embodiments, this look-up table may be lost when core 540 is reset; however, PCU 510 may reconstruct this look-up table by storing recent valid operating points again. In embodiments, this look-up table may be stored in a non-volatile storage array (e.g., either on-die or contained in the system) and may be reloaded after core 540 is reset.
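A soft look-up table of recent valid operating points could be modeled as a small bounded cache. The layout below is purely an assumption for illustration: the class name `OperatingPointTable`, keying by frequency and a temperature bucket, storing a validated voltage, and evicting the oldest entry are all hypothetical choices.

```python
class OperatingPointTable:
    """Small cache of recently validated operating points (hypothetical layout)."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.points = {}  # (freq, temp_bucket) -> validated voltage

    def record(self, freq, temp_bucket, volt):
        """Store a validated point, evicting the oldest entry when full."""
        key = (freq, temp_bucket)
        self.points.pop(key, None)  # re-insert to mark the key as most recent
        self.points[key] = volt
        if len(self.points) > self.capacity:
            oldest = next(iter(self.points))  # dicts preserve insertion order
            del self.points[oldest]

    def lookup(self, freq, temp_bucket):
        """Return the stored voltage, or None if calibration is still needed."""
        return self.points.get((freq, temp_bucket))


table = OperatingPointTable(capacity=2)
table.record(3000, "hot", 900)
print(table.lookup(3000, "hot"))
print(table.lookup(2000, "cold"))
```

A hit lets the controller jump straight to a known setting, while a miss (`None`) would fall back to running the adaptive feedback process.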
  • Embodiments for the guard-band reduction using canary flip-flops described herein may be used in a number of implementations and applications.
  • mobile devices including but not limited to smart phones, tablets, and other Mobile Internet Devices (MIDs) may have processors or graphic cores that would benefit from improved guard-band reduction operations.
  • FIG. 6 is a block diagram that illustrates an example computer system 600 suitable for practicing the disclosed embodiments, in accordance with various embodiments.
  • the computer system 600 may include a power supply unit 620, a number of processors or processor cores 610, a system memory 630 having processor-readable and processor-executable instructions 680 stored therein, a non-volatile memory (NVM)/storage 640 that may also store the instructions 680, an I/O interface 650, and a communication interface 660.
  • At least one of the processors 610 may generate or cause to be generated a signal to trigger activation/deactivation of a guard-band reduction operation, in response to the processor 610 receiving or otherwise evaluating the state of the output signal provided by one or more canary flip-flops (e.g., residing in a timing sensor).
  • various other components may generate one or more of such signals in response to the output signal from the one or more canary flip-flops in the processors 610.
  • the one or more NVM/storage 640 and/or the memory 630 may comprise a tangible, non-transitory computer-readable storage device (such as a diskette, hard drive, compact disc read only memory (CDROM), hardware storage unit, flash memory, phase change memory (PCM), solid-state drive (SSD) memory, and so forth).
  • the instructions 680 stored in the NVM/storage 640 and/or the memory 630 may be executable by one or more of the
  • the computer system 600 may also comprise input/output devices
  • Communication interface 660 may provide an interface for computing device 600 to communicate over one or more network(s) and/or with any other suitable device.
  • Communication interface 660 may include any suitable hardware and/or firmware, such as a network adapter, one or more antennas, wireless
  • communication interface 660 may include an interface for computing device 600 to use near field
  • communication interface 660 may interoperate with radio communications technologies such as, for example, Wideband Code Division Multiple Access (WCDMA), Global System for Mobile communications (GSM), Long Term Evolution (LTE), WiFi, Bluetooth®, Zigbee, and the like.
  • system bus 670 which represents one or more buses. In the case of multiple buses, they may be bridged by one or more bus bridges (not shown). Data may pass through the system bus 670 through the I/O interface 650, for example, between an output terminal and the processors 610.
  • the system memory 630 and the mass storage device 640 may be employed to store a working copy and a permanent copy of the programming instructions implementing one or more operating systems, firmware modules or drivers, applications, and so forth, herein collectively denoted as instructions 680.
  • instructions 680 may include logic for guard-band reduction described in connection with Figures 1-5.
  • the permanent copy of the programming instructions may be placed into permanent storage in the factory, or in the field, via, for example, a distribution medium (not shown), such as a compact disc (CD), or through the communication interface 660 (from a distribution server (not shown)).
  • at least one of the processor(s) 610 may be packaged together with memory having instruction 680.
  • at least one of the processor(s) 610 may be packaged together with memory having instruction 680 to form a System in Package (SiP).
  • At least one of the processor(s) 610 may be integrated on the same die with memory having instruction 680. In some embodiments, at least one of the processor(s) 610 may be integrated on the same die with memory having instruction 680 to form a System on Chip (SoC).
  • one or more of the depicted components of the system 600 and/or other element(s) may include a keyboard, LCD screen, non-volatile memory port, multiple antennas, graphics processor, application processor, speakers, or other associated mobile device elements, including a camera.
  • the remaining constitution of the various elements of the computer system 600 is known, and accordingly will not be further described in detail.
  • Example 1 is an apparatus, which may include a comparator; a first flip-flop, coupled to the comparator, to generate a first input to the comparator based at least in part on a first clock signal and a data input; and a second flip-flop, coupled to the comparator, to generate a second input to the comparator based at least in part on a second clock signal and the data input.
  • the first clock signal, the second clock signal, and the data input may be configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop, and the comparator may be configured to generate a signal based on a comparison of the first and second inputs.
  • Example 2 may include the subject matter of Example 1, and may further include a first regional clock buffer to supply a third clock signal that is in phase with the second clock signal; and a local clock buffer, coupled to the first regional clock buffer and the first flip-flop, to receive the third clock signal from the first regional clock buffer and to output the first clock signal to the first flip-flop.
  • Example 3 may include the subject matter of Examples 1 or 2, and may further include a second regional clock buffer, coupled to the second flip-flop, to supply the second clock signal to the second flip-flop, wherein the second regional clock buffer is gated by an enabling signal.
  • Example 4 may include the subject matter of Example 3, and may further specify that the second regional clock buffer may be gated on only during an adaptive operation to establish a new voltage or frequency setting for a processor or graphics core, e.g., by a power control unit.
  • Example 5 may include the subject matter of any one of Examples
  • the second clock signal and the first clock signal may have a programmable clock delay delta, and the second clock signal is earlier than the first clock signal by the programmable clock delay delta.
  • Example 6 may include the subject matter of Example 5, and may further specify that the first flip-flop may always latch data correctly from the data input based on the first clock signal, and the second flip-flop may fail to latch the data correctly from the data input based on the second clock signal when the data arrives within the clock delay delta from the end of a cycle of the second clock signal.
  • Example 7 may include the subject matter of any one of Examples
  • Example 8 may include the subject matter of any one of Examples 1-7, and may further specify that the comparator may be integrated with the first flip-flop and the second flip-flop in a single cell.
  • Example 9 may include the subject matter of any one of Examples 1-8, and may further include a pulse catcher, coupled with the comparator, to latch the signal.
  • Example 10 may include the subject matter of Example 9, and may further specify that the pulse catcher is integrated with the first flip-flop and the second flip-flop in a single cell.
  • Example 11 may include the subject matter of any one of Examples 1-10, and may further include a regional clock buffer to supply a third clock signal in phase with the first clock signal and the second clock signal; a first local clock buffer, coupled to the regional clock buffer and the first flip-flop, to receive the third clock signal from the regional clock buffer and to output the first clock signal to the first flip-flop; a second local clock buffer, coupled to the regional clock buffer and the second flip-flop, to receive the third clock signal from the regional clock buffer and to output the second clock signal to the second flip-flop, wherein the second local clock buffer is gated by an enabling signal; and a third local buffer, coupled to the second flip-flop, configured to receive the data input and to output the data input to the second flip-flop.
  • Example 12 may include the subject matter of Example 11, and may further specify that the second local clock buffer is gated on only during an adaptive operation to establish a new voltage or frequency setting for a processor or graphics core, e.g., by a power control unit.
  • Example 13 may include the subject matter of Example 11 or
  • the second flip-flop receives the data input later than the first flip-flop by a delay based at least in part on the local buffer.
  • Example 14 may include the subject matter of any one of Examples 11-13, and may further specify that the first flip-flop always latches data correctly from the data input based on the first clock signal, and the second flip-flop will fail to latch the data correctly from the data input based on the second clock signal when the data arrives within the delay from the end of a cycle of the second clock signal.
  • Example 15 is a method, which may include generating a first output signal, by a first flip-flop to a comparator, based at least in part on a first clock signal and a data input; generating a second output signal, by a second flip-flop to the comparator, based at least in part on a second clock signal and the data input, wherein the first clock signal, the second clock signal, and the data input are configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop; and generating a warning signal, by the comparator, when the second output signal differs from the first output signal.
  • Example 16 may include the subject matter of Example 15, and may further include latching the warning signal, e.g., by a pulse catcher.
  • Example 17 may include the subject matter of Example 15 or 16, and may further include supplying the first clock signal and the second clock signal from a regional clock buffer; providing a first local clock buffer to the first clock signal; and providing a second local clock buffer to the second clock signal.
  • Example 18 may include the subject matter of Example 17, and may further include gating on the second local clock buffer only during an adaptive operation to establish a new voltage or frequency setting to a processor or graphics core.
  • Example 19 may include the subject matter of Example 15, and may further include supplying the first clock signal from a first regional clock buffer; supplying the second clock signal from a second regional clock buffer wherein the second clock signal is in phase with the first clock signal; and providing a clock delay delta to the first clock signal to cause the second clock signal to be earlier than the first clock signal.
  • Example 20 may include the subject matter of Example 19, and may further include gating on the second regional clock buffer only during an adaptive operation to establish a new voltage or frequency setting to a processor or graphics core.
  • Example 21 may include the subject matter of Example 18 or 20, and may further include reducing a voltage or increasing a frequency to the processor or graphics core, by a power control unit, until a warning signal from the comparator is detected; and increasing the voltage or reducing the frequency by an acceptable margin, by the power control unit, once the first warning signal is detected.
  • Example 22 may include the subject matter of any one of Examples 18, 20, and 21, and may further specify that the processor or graphics core continues to operate during the adaptive operation.
  • Example 23 may include the subject matter of any one of Examples 18 and 20-22, and may further specify that the power control unit is configured to activate the adaptive operation when an operating condition changes.
  • Example 24 may include the subject matter of any one of Examples 18 and 20-23, and may further specify that the power control unit is configured to maintain a look-up table of valid recent operating conditions, and activate the adaptive operation only for a new operating condition not maintained at the look-up table.
  • Example 25 is a system, which may include a processor or graphics core having a plurality of in-situ timing margin sensors and a
  • Each of the plurality of in-situ timing margin sensors may include a comparator; a first flip-flop, coupled to the comparator, to generate a first input to the comparator based at least in part on a first clock signal and a data input; and a second flip-flop, coupled to the comparator, to generate a second input to the comparator based at least in part on a second clock signal and the data input.
  • the first clock signal, the second clock signal, and the data input are configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop, and the comparator is configured to generate a warning signal when the second input differs from the first input.
  • Example 26 may include the subject matter of Example 25, and may further specify that at least more than one of the plurality of in-situ timing margin sensors are distributed at respective end-points of a plurality of known timing critical paths of the processor or graphics core.
  • Example 27 may include the subject matter of Example 26, and may further specify that respective warning signals of the at least more than one of the plurality of in-situ timing margin sensors are aggregated into a single warning signal to alert the power control unit of an impending timing failure.
  • Example 28 may include the subject matter of any one of Examples 25-27, and may further specify that the processor or graphics core further includes a first regional clock buffer to supply a third clock signal that is in phase with the second clock signal; a plurality of local clock buffers, respectively coupled to the first flip-flops in a portion of the plurality of in-situ timing margin sensors, to receive the third clock signal from the first regional clock buffer and to output the first clock signal to the first flip-flops; and a second regional clock buffer, coupled to a plurality of second flip-flops in the block, to supply the second clock signal to the second flip-flops in the portion of the plurality of in-situ timing margin sensors, wherein the second regional clock buffer is gated.
  • Example 29 may include the subject matter of Example 28, and may further specify that the second regional clock buffer is gated on only during an adaptive operation to establish a new voltage or frequency setting for the processor or graphics core by the power control unit.
  • Example 30 may include the subject matter of Example 29, and may further specify that the processor or graphics core continues to operate during the adaptive operation.
  • Example 31 may include the subject matter of Example 29 or 30, and may further specify that the power control unit is configured to maintain a look-up table of recent operating conditions, and activate the adaptive operation only for a new operating condition not maintained at the look-up table.
  • Example 32 may include the subject matter of any one of Examples 29-31, and may further specify that the power control unit is configured to control the voltage regulator to reduce the voltage or control the frequency regulator to increase the clock frequency until a warning signal from the plurality of in-situ timing margin sensors is detected.
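
The adaptive operation of Example 21 and the look-up table of Example 24 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `adapt_voltage`, `PowerControlUnit`, the millivolt step sizes, and the `toy_warns` threshold are hypothetical and do not come from the disclosure, which describes hardware control loops rather than software.

```python
def adapt_voltage(warns, start_mv, step_mv, margin_mv, floor_mv):
    """Step the voltage down until `warns(mv)` is True, then add a margin back."""
    mv = start_mv
    while True:
        if warns(mv):
            return mv + margin_mv      # back off once the first warning is seen
        if mv - step_mv < floor_mv:
            return mv                  # reached the floor without any warning
        mv -= step_mv


class PowerControlUnit:
    def __init__(self, warns_at):
        self.warns_at = warns_at       # (freq_mhz, temp_c, mv) -> bool
        self.lut = {}                  # (freq_mhz, temp_c) -> settled voltage in mV
        self.adaptive_runs = 0         # how often the feedback loop was activated

    def set_operating_point(self, freq_mhz, temp_c):
        key = (freq_mhz, temp_c)
        if key not in self.lut:        # new condition: run the adaptive operation
            self.adaptive_runs += 1
            self.lut[key] = adapt_voltage(
                lambda mv: self.warns_at(freq_mhz, temp_c, mv),
                start_mv=1000, step_mv=10, margin_mv=20, floor_mv=600,
            )
        return self.lut[key]           # known condition: jump straight to it


def toy_warns(freq_mhz, temp_c, mv):
    """Made-up stand-in for the aggregated canary warning signal."""
    return mv < freq_mhz // 4 + 2 * temp_c + 100


pcu = PowerControlUnit(toy_warns)
print(pcu.set_operating_point(2000, 50))  # 710
print(pcu.set_operating_point(2000, 50))  # 710, served from the table
print(pcu.set_operating_point(3000, 50))  # 960
print(pcu.adaptive_runs)                  # 2
```

The second request for the same operating point returns the stored voltage without re-running the loop, which is the power saving the look-up table is meant to provide.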

Abstract

An apparatus is configured for adaptive guard-band reduction using canary flip-flops with reduced power and area overhead. The apparatus may include a comparator configured to generate a signal by comparing two inputs. The apparatus may also include a first flip-flop to generate the first input based at least in part on a first clock signal and a data input. The apparatus may further include a second flip-flop to generate the second input to the comparator based at least in part on a second clock signal and the data input. The first clock signal, the second clock signal, and the data may be configured to enable the data to be sampled earlier by the second flip-flop than the first flip-flop.

Description

APPARATUS AND METHOD FOR ADAPTIVE GUARD-BAND REDUCTION
TECHNICAL FIELD
This disclosure relates generally to electronic circuits. More particularly but not exclusively, the present disclosure relates to adaptive guard-band reduction using canary flip-flops with reduced power and area overhead.
BACKGROUND INFORMATION
Timing margins for modern multi-core processors are usually checked for worst-case loading and operating conditions. These worst-case conditions may rarely happen during normal operations, so the amount of over-provisioning of timing margins, typically called "guard-band" and expressed in either voltage or timing terms, may cause inefficiencies if excessive.
There are opportunities to enhance the core operating point to a lower voltage for better power efficiency or a higher frequency for higher performance via guard-band reduction. As an example, the guard-band may be increased when portions of the chip are turned off while not in use, which reduces the total chip power consumption and increases the local power supply voltage provided to the circuits that remain operational. Similar conditions arise when the application code running on each core does not fully stress the operating conditions or if the process corner and the operating temperature are further away from the assumptions modeled during the design phase.
Functioning after guard-band reduction is opportunistic, so guard-band reduction may be used by each processor or graphics core in an adaptive process whenever possible. However, adaptive circuit techniques for reducing voltage and/or timing margins usually have limited coverage or high area and power overhead that make them unfit for large volume commercial applications.
BRIEF DESCRIPTION OF THE DRAWINGS
Non-limiting and non-exhaustive embodiments are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified.
Figure 1 is a schematic diagram of one system for adaptive guard-band reduction using canary flip-flops, incorporating aspects of the present disclosure, in accordance with various embodiments.
Figure 2 is a schematic diagram of another system for adaptive guard-band reduction using canary flip-flops, incorporating aspects of the present disclosure, in accordance with various embodiments.
Figure 3 is a schematic diagram of an integrated canary flip-flop cell, incorporating aspects of the present disclosure, in accordance with various embodiments.
Figure 4 is a schematic diagram of an integrated canary flip-flop cell including a pulse catcher, incorporating aspects of the present disclosure, in accordance with various embodiments.
Figure 5 is a schematic diagram that illustrates control loops of a power control unit, incorporating aspects of the present disclosure, in accordance with various embodiments.
Figure 6 is a block diagram that illustrates an example computer system suitable for practicing the disclosed embodiments, in accordance with various embodiments.
DETAILED DESCRIPTION
Embodiments of adaptive guard-band reduction apparatus and method are described herein. In the following description, numerous specific details are given to provide a thorough understanding of embodiments. The embodiments can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the embodiments.
In the following description, reference is made to the accompanying drawings, which form a part hereof, wherein like numerals designate like parts throughout, and in which is shown by way of illustration embodiments that may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. Therefore, the following detailed description is not to be taken in a limiting sense, and the scope of embodiments is defined by the appended claims and their equivalents.
Various operations may be described as multiple discrete actions or operations in turn, in a manner that is most helpful in understanding the claimed subject matter. However, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations may not be performed in the order of presentation. Operations described may be performed in a different order than the described embodiment. Various additional operations may be performed and/or described operations may be omitted in additional embodiments.
For the purposes of the present disclosure, the phrase "A and/or B" means (A), (B), or (A and B). For the purposes of the present disclosure, the phrase "A, B, and/or C" means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B and C). Where the disclosure recites "a" or "a first" element or the equivalent thereof, such disclosure includes one or more such elements, neither requiring nor excluding two or more such elements. Further, ordinal indicators (e.g., first, second or third) for identified elements are used to distinguish between the elements, and do not indicate or imply a required or limited number of such elements, nor do they indicate a particular position or order of such elements unless otherwise specifically stated.
Reference in the description to one embodiment or an embodiment means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The description may use the phrases "in one embodiment," "in an embodiment," "in another embodiment," "in embodiments," "in various embodiments," or the like, which may each refer to one or more of the same or different embodiments. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Additionally, the terms "comprising," "including," "having," and the like, as used with respect to embodiments of the present disclosure, are synonymous.
Various embodiments provide at least a lower power and area canary flip-flop implementation that reduces voltage and frequency guard-bands. The canary flip-flop may include a first flip-flop to generate a first input to a comparator based at least in part on a first clock signal and a data input. The canary flip-flop may also include a second flip-flop to generate a second input to the comparator based at least in part on a second clock signal and the data input. The first clock signal, the second clock signal, and the data input may be configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop. The second flip-flop may be gated to save power. The area overhead of the canary flip-flop may be reduced by integrating the two flip-flops in a single cell together with the comparator. Moreover, a pulse catcher may also be integrated into the same cell. As such, the second flip-flop may be operated only during the adaptive operation for guard-band reduction, and be gated during the rest of the time. Moreover, the area overhead may be reduced by integrating all required circuits in a single cell.
Referring now to Figure 1, system 100 for adaptive guard-band reduction using canary flip-flops, in accordance with various embodiments, is illustrated. System 100 may include clock grid 110 coupled to regional clock buffer (RCB) 122 as well as RCB 124. On one branch of system 100, RCB 122 may be coupled to local clock buffer (LCB) 126. LCB 126 may be coupled to data storage 134 and main flip-flop (MFF) 142 in canary flip-flop (CFF) 140.
Data storage 134 may receive data D on one side and send data to cloud 136. Cloud 136 may be coupled to MFF 142. MFF 142 may generate output signal Q. On another branch of system 100, RCB 124 may be coupled to shadow flip-flop (SFF) 144 in CFF 140. SFF 144 may receive data D, and may be coupled to comparator 146. Comparator 146 may also be coupled to MFF 142. Moreover, comparator 146 may generate an output signal W.
Clock grid 110 may generate and amplify high-frequency clock signals, and distribute them to various RCBs. In various embodiments, clock signals may be distributed using many drivers in a tree schema, e.g., binary tree, H-tree, X-tree, etc. In embodiments, RCB 122 and RCB 124 may be located at the same level of the clock distribution tree. The clock signals produced by clock grid 110 may reach RCB 122 and RCB 124 at the same time. Therefore, RCB 122 and RCB 124 may form a dual RCB structure to drive in-phase clock signals to CFF 140. In embodiments, RCB 122 and RCB 124 may each be an adjustable regional clock buffer whose clock signals may be adjusted.
In embodiments, RCB 122 may be coupled to LCB 126. LCB 126 may be a fixed clock buffer where clock signals cannot be adjusted. LCB 126 may add a delay to its output clock signals. Clock signals from LCB 126 may be sent to data storage 134 which may receive and store data D. Data D may be sent to cloud 136 for processing. Cloud 136 represents logic that may perform various operations with respect to, or based on, data D. RCB 122 may drive clock signals to MFF 142 via LCB 126 wherein the clock signals may be delayed due to LCB 126. In embodiments, the delay by LCB 126 may be adjustable or programmable. The line between RCB 122 and MFF 142 thus may become a programmable delay line. The delay may be adjusted to control a timing of a warning generated by CFF 140.
In embodiments, RCB 124 may drive clock signals to SFF 144. SFF 144 may receive clock signals from RCB 124 earlier than MFF 142, e.g., because there is no LCB between RCB 124 and SFF 144 to cause any delay. MFF 142 and SFF 144 may each sample data according to their respective clock signals. Data may be sampled earlier in SFF 144 than in MFF 142 because clock signals are earlier in SFF 144 than in MFF 142. Thus, sampling data may be correct at MFF 142, but fail at SFF 144 because the data may have less time to propagate under a relatively earlier clock at SFF 144. In other words, SFF 144 may see a failure, but the correct data may still be latched at MFF 142.
Comparator 146 may generate a warning signal W if the input received from MFF 142 differs from the input received from SFF 144, e.g., when MFF 142 timely latches the data, but SFF 144 fails to latch the data due to its earlier clock. In embodiments, MFF 142 may have latched the correct data when a warning signal is generated because SFF 144 is supposed to fail first due to its tighter timing requirement, e.g., caused by its earlier clock. A warning signal may indicate that CFF 140 is going to fail soon if its operating conditions remain the same or worsen. Once a warning signal is generated, it may be sent to a controller (not shown), which may adjust system 100 accordingly, e.g., raising the voltage or lowering the frequency, to mitigate the risk of timing failure for MFF 142. Therefore, CFF 140 may always generate the correct output Q when such a failure prevention mechanism is in place.
In embodiments, comparator 146 may receive different inputs from MFF 142 and SFF 144 in different cycles of the clock signal, and a comparison may be performed in each cycle. There may be many paths and logics in system 100. CFF 140 may be located at a critical path of system 100. CFF 140 may indicate sufficient timing margins for data coming from one source, but generate a warning signal for data coming from another source. Thus, CFF 140 may generate an indication of whether it is going to fail the timing in each cycle, and it may continuously consume power in doing so.
In embodiments, RCB 124 may receive a control signal EN, which may enable or disable RCB 124. When RCB 124 is enabled, SFF 144 will attempt to latch data in every cycle. When RCB 124 is disabled, SFF 144 can be turned off. Therefore, SFF 144 may be selectively enabled when there is a need to monitor a critical path of system 100. As an example, RCB 124 may be enabled when the operating conditions of a critical path have changed, so that CFF 140 may be used to monitor the critical path for a period of time. When system 100 establishes sufficient timing margins under the new operating conditions, RCB 124 may be disabled, or at least the clock signals from RCB 124 may be turned off, such that SFF 144 and/or comparator 146 can go off-duty to reduce power overhead at CFF 140, but MFF 142 may continue to latch the correct data.
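
The selective enabling described above can be sketched in Python as a small cycle-level model. This is a minimal illustration under stated assumptions: the class name `ShadowClockGate`, the fixed monitoring window, and the cycle counter are hypothetical, and the disclosure does not prescribe how long EN stays asserted.

```python
class ShadowClockGate:
    """Tracks the EN control for the buffer that clocks the shadow flip-flop."""

    def __init__(self, window_cycles: int):
        self.window_cycles = window_cycles  # hypothetical monitoring window length
        self.remaining = 0                  # cycles left with EN asserted
        self.shadow_clock_cycles = 0        # cycles in which the shadow flip-flop was clocked

    def on_condition_change(self) -> None:
        """Operating conditions changed: (re)start the monitoring window."""
        self.remaining = self.window_cycles

    def tick(self) -> bool:
        """Advance one cycle; return the EN value used during that cycle."""
        en = self.remaining > 0
        if en:
            self.remaining -= 1
            self.shadow_clock_cycles += 1
        return en


gate = ShadowClockGate(window_cycles=3)
trace = []
for cycle in range(8):
    if cycle == 2:                   # operating conditions change at cycle 2
        gate.on_condition_change()
    trace.append(int(gate.tick()))
print(trace)                         # [0, 0, 1, 1, 1, 0, 0, 0]
print(gate.shadow_clock_cycles)      # 3
```

In this model the shadow flip-flop is clocked in only three of the eight cycles, while the main flip-flop (not modelled here) would keep running in all of them.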
In embodiments, the clock to SFF 144 may be configured to be earlier than the clock to MFF 142. The early clock to SFF 144 may be generated by directly tapping the output from regional clock buffers, e.g., RCB 122 and/or 124. As illustrated above, the clock to SFF 144 may be made earlier by the delay of LCB 126. In other embodiments, dual RCB 120 may use the adjustability of RCBs 122 and 124 to create an adjustable time window between their respective output clocks. As an example, the clock to SFF 144 may be made earlier than the regular clock to MFF 142 by making RCB 124 earlier than the regular clock. As another example, the clock to SFF 144 may be made earlier than the clock to MFF 142 by making RCB 122 later than the regular clock. In embodiments, the clock to SFF 144 may be earlier than the clock to MFF 142 by a programmable clock delay delta. The adjustable time window between the two input clocks to CFF 140 may enable a shmoo function to sweep the delay to observe different failure points within a given clock region. For example, the time difference between the early and the regular clocks may be adjusted through the separate RCBs 122 and 124. The adjustability of RCBs may be used for debug purposes to sweep the entire delay range and monitor the increase in warning signals at all the canary flip-flop outputs.
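
A shmoo-style sweep of the clock delay delta can be sketched as follows. This is an illustrative Python model, not the disclosed circuit: the function names, the arrival times, and the period of 100 time units are all made up, and the model assumes each monitored path toggles its data every cycle so that a late arrival always shows up as a mismatch.

```python
def count_warnings(arrivals, period, delta):
    """Count paths whose data arrives after the early edge but by the regular edge."""
    early_edge = period - delta
    return sum(1 for t in arrivals if early_edge < t <= period)


def shmoo(arrivals, period, deltas):
    """Return (delta, warning_count) pairs for each swept delay value."""
    return [(d, count_warnings(arrivals, period, d)) for d in deltas]


# Hypothetical data arrival times for five monitored paths, period = 100 units.
arrivals = [70, 82, 88, 93, 97]
for delta, warnings in shmoo(arrivals, period=100, deltas=[0, 5, 10, 15, 20, 30]):
    print(f"delta={delta:>2}  warnings={warnings}")
```

With these numbers the warning count rises from 0 at delta 0 to 4 at delta 20 and stays at 4 for delta 30, because the path arriving at 70 still meets the earliest swept edge.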
With the present disclosure, the data may be sampled earlier at SFF 144 than at MFF 142. Thus, MFF 142 may latch the correct data without any error correction process. During normal operations, if the data meets the timing margin at both MFF 142 and SFF 144 such that the same data is correctly latched into both flip-flops, then comparator 146 will receive the same inputs, and no warning signal would be generated. However, using an early clock, SFF 144 may fail to latch the correct data if the data arrives within the clock delay delta from the end of the cycle. In this case, the data may meet the timing into MFF 142 but arrive late at SFF 144, thus causing comparator 146 to warn of an impending timing failure. The warning may prompt system 100 to make appropriate adjustments to gain sufficient timing margins to prevent MFF 142 from actually failing the timing criteria. Advantageously, system 100 may stay away from a real timing failure and/or the trouble of recovering from the timing failure.
In embodiments, CFF 140 may be configured to reduce voltage and/or frequency guard-bands with reduced power and area overhead. RCB 124 may be coupled to many canary flip-flops, and all clocks to these canary flip-flops may be gated, e.g., via the control signal EN to RCB 124, to reduce power overhead. The area overhead of CFF 140 may be reduced by integrating the two flip-flops, MFF 142 and SFF 144, in a single cell together with comparator 146.
Figure 2 illustrates system 200 for adaptive guard-band reduction using canary flip-flops, in accordance with various embodiments. System 200 may include clock grid 210 coupled to RCB 222. On one branch of system 200, RCB 222 may be coupled to LCB 226. LCB 226 may be coupled to data storage 234 and MFF 242 in CFF 240. Data storage 234 may receive data D on one side and send data D to cloud 236. Cloud 236 may be coupled to MFF 242. MFF 242 may generate output signal Q. On another branch of system 200, RCB 222 may be coupled to LCB 224. LCB 224 may be coupled to SFF 244 in CFF 240. SFF 244 may receive data via local delay buffer (LDB) 248, which may receive data from cloud 236. SFF 244 may also be coupled to comparator 246, which may also be coupled to MFF 242. Moreover, comparator 246 may generate output signal W.
Clock grid 210 may generate and amplify high-frequency clock signals, and distribute them to various RCBs, including RCB 222. In embodiments, RCB 222 may be an adjustable regional clock buffer whose clock signals may be adjusted. RCB 222 may be coupled to LCB 224 or 226, each of which may be a fixed clock buffer. In some embodiments, LCB 224 and 226 may add a similar delay to their respective output clock signals. In some embodiments, LCB 224 and 226 may add different amounts of delay to their respective output clock signals, thereby, for example, making the clock reach SFF 244 earlier than MFF 242. Data storage 234 and cloud 236 may function similarly to data storage 134 and cloud 136, respectively, in connection with Figure 1.
RCB 222 may drive clock signals to MFF 242 via LCB 226, or to SFF 244 via LCB 224. In embodiments, the delay caused by LCB 224 or 226 may be adjustable or programmable. The delay may be adjusted to control the timing of the warning that may be generated by CFF 240. In embodiments, the delays by LCB 224 and 226 may be configured to be substantially identical to each other, so that MFF 242 and SFF 244 may receive similar clock signals.
MFF 242 and SFF 244 may each latch data according to their respective clocks. In embodiments, data may be sampled earlier by SFF 244 than MFF 242, at least due to the delay of the data coming to SFF 244, e.g., caused by LDB 248. Data may be latched correctly at MFF 242, but fail at SFF 244 because the data may come to SFF 244 late under similar clocks to MFF 242 and SFF 244.
Comparator 246 may generate a warning signal W if the input received from MFF 242 differs from the input received from SFF 244, e.g., when MFF 242 timely latches the data, but SFF 244 fails to do the same due to a data delay. In embodiments, MFF 242 may have latched the correct data when a warning signal is generated because SFF 244 is supposed to fail first due to its earlier sampling of the data signal, e.g., caused by the data delay. Warning signals may prompt system 200 to conduct certain adjustments, e.g., raising the voltage or lowering the frequency, to mitigate the risk of timing failure for MFF 242. Therefore, CFF 240 may generate the correct output Q under such a failure prevention scheme.
In embodiments, LCB 224 may receive a control signal EN, which may enable or disable LCB 224. LCB 224 may include an AND gate. When LCB 224 is enabled, SFF 244 may attempt to latch data in every cycle. When LCB 224 is disabled, SFF 244 can be turned off. Therefore, SFF 244 may be selectively enabled, e.g., when the operating conditions of system 200 have been changed. When system 200 establishes sufficient timing margins under the new operating conditions, LCB 224 may be disabled, such that power overhead at CFF 240 may be reduced while MFF 242 may continue to latch correct data and produce correct output Q.
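As a rough illustration of the gating described above, the following Python sketch ANDs a sampled clock waveform with an enable waveform and counts the rising edges that reach the shadow flip-flop. The function names and the waveforms are hypothetical, and the sketch assumes an ideal AND gate; a practical clock gate would typically also guard against glitches on the enable signal.

```python
def gate_clock(clk, en):
    """AND-gate a sampled clock waveform with an enable waveform (ideal gate)."""
    return [c & e for c, e in zip(clk, en)]


def rising_edges(wave):
    """Count 0-to-1 transitions, i.e. the edges on which a flip-flop would sample."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


# Hypothetical waveforms: eight clock cycles, with EN asserted only for the first two.
clk = [0, 1] * 8
en = [1] * 4 + [0] * 12
gated = gate_clock(clk, en)
```

With EN deasserted, the gated clock stops toggling, so the shadow flip-flop no longer samples while the main flip-flop continues on the ungated clock.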
Similar to what has been illustrated in connection with Figure 1, data may be sampled earlier at SFF 244 than MFF 242. Thus, MFF 242 may latch the correct data without any error correction process. During normal operations, if the data meets timing criteria at both MFF 242 and SFF 244, then comparator 246 will receive the same inputs, and the output signal from comparator 246 will indicate no warnings. However, SFF 244 may fail to correctly latch the data if the data delayed by LDB 248 misses the setup time at the end of the cycle for SFF 244. In this case, the data may meet timing criteria at MFF 242, but may become too late for SFF 244, and thus cause comparator 246 to warn of an impending timing failure. The warning may prompt system 200 to make appropriate adjustments to gain sufficient timing margins to prevent MFF 242 from failing the timing criteria.
In embodiments, CFF 240 may be configured to reduce voltage and/or frequency guard-bands with reduced power and area overhead. The shadow flip-flops used for monitoring may be operated only when required for adaptive operation, and their clocks may be gated during the rest of the time. For example, RCB 222 may be coupled to many canary flip-flops, and clocks to these canary flip-flops may be gated, e.g., via the control signal EN to LCB 224, to reduce power overhead. In embodiments, the area overhead may be reduced by integrating all desired circuits in a single cell. For example, the area overhead of CFF 240 may be reduced by integrating the two flip-flops, MFF 242 and SFF 244, in a single cell together with LDB 248 as well as comparator 246.
Figure 3 illustrates an integrated canary flip-flop cell 300, in accordance with various embodiments. Cell 300 may be practiced in place of CFF 140 in connection with Figure 1. Cell 300 may include MFF 310, SFF 320, and comparator 330. MFF 310 may receive data D and clock signal CLK1, while SFF 320 may receive data D and clock signal CLK2. MFF 310 may generate output Q and also generate input to comparator 330. Similarly, comparator 330 may be coupled with and receive input from SFF 320. Comparator 330 may generate an output signal W based on the inputs from MFF 310 and SFF 320.
In embodiments, CLK2 may be in-phase with but earlier than CLK1. CLK1 may control MFF 310, and CLK2 may control SFF 320. CLK2 may be configured to be earlier than CLK1 according to different embodiments disclosed in connection with Figure 1. For example, CLK1 may be a regular clock, but CLK2 may be an early clock.
In embodiments, comparator 330 may include an XOR gate to receive both inputs from MFF 310 and SFF 320. When the values latched by MFF 310 and SFF 320 agree, there is likely enough timing margin in the system. If SFF 320 fails to latch the correct data, e.g., due to the early clock, then comparator 330 may detect a mismatch between MFF 310 and SFF 320. In this case, a timing path including cell 300 may be just about to fail timing criteria, and comparator 330 may generate a warning signal. When the warning signal is generated, MFF 310 may still operate to latch the correct data and generate the correct output, but MFF 310 may be about to fail the timing criteria within the time margin between CLK1 and CLK2. If the operating conditions are expected to worsen, e.g., the voltage drops or the temperature changes, then MFF 310 may actually fail the timing criteria.
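The behavior of the main flip-flop, shadow flip-flop, and XOR comparator can be illustrated with a small behavioral model. The sketch below is written under simplifying assumptions: the class and parameter names are hypothetical, setup and hold times are ignored, and a flip-flop that misses the new data is assumed to sample the stale value still present on the data wire.

```python
class CanaryCell:
    """Behavioral sketch of a canary flip-flop: main FF, shadow FF, XOR comparator."""

    def __init__(self):
        self.d = 0         # value currently on the data wire
        self.q_main = 0    # value held by the main flip-flop (output Q)
        self.q_shadow = 0  # value held by the shadow flip-flop

    def clock(self, new_value, arrival_ps, period_ps, delta_ps):
        """Advance one cycle and return the warning bit W for that cycle."""
        stale = self.d
        # The shadow flip-flop samples delta_ps earlier than the main flip-flop.
        self.q_shadow = new_value if arrival_ps <= period_ps - delta_ps else stale
        self.q_main = new_value if arrival_ps <= period_ps else stale
        self.d = new_value
        return self.q_main ^ self.q_shadow  # XOR: 1 only when the two values differ


cell = CanaryCell()
w_on_time = cell.clock(1, arrival_ps=700, period_ps=1000, delta_ps=100)
w_late_toggle = cell.clock(0, arrival_ps=950, period_ps=1000, delta_ps=100)
w_late_no_toggle = cell.clock(0, arrival_ps=950, period_ps=1000, delta_ps=100)
```

In this model a warning appears only when late data also changes value, since an unchanged value is sampled identically by both flip-flops; the main flip-flop still holds the correct data in the warning cycle.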
The warning signal may be latched in a pulse catcher (not shown) and routed to a controller (not shown) in the system or directly to the clock generator (e.g., for faster response). The warning signal may be used to trigger the controller to adjust its operating conditions, e.g., to increase the cycle time or voltage to avoid this potential upcoming failure. Enhanced with this disclosure, MFF 310 may always latch the right data, so no correction would be required. Thus, CFF cell 300 may provide a warning that a timing error is imminent for MFF 310, while MFF 310 may be expected to sample the correct data.
CFFs may be used to monitor critical paths and facilitate voltage and/or frequency guard-band reduction in a system, e.g., system 100 in connection with Figure 1. In considering the large variations that manifest in extreme low-voltage operation (e.g., in near- or sub-threshold mode), a larger number of paths in the system may need to be monitored.
Enhanced with this disclosure, CFFs may be configured to monitor those paths with reduced power and area overhead. In embodiments, SFF 320 may be controlled by an early clock, e.g., CLK2 that can be gated to save power. In embodiments, when compared to a discrete standard cell implementation, the area overhead may be reduced by integrating the shadow and main flip-flops in a single cell together with the comparator. For example, the comparator may be integrated together with two flip-flops into a single library cell with its own pre-characterized timing parameters. Such integration may at least save a few inverters in the XOR function of the comparator and provide deterministic timing parameters compared to a place-and-route design using discrete library cells. Such integration may also remove the different routing variations when distinct flip-flop and XOR cells are routed together. Illustrated by the current figure, for instance, MFF 310, SFF 320, and comparator 330 may be integrated into one single library cell.
Figure 4 illustrates an integrated canary flip-flop cell 400, including a pulse catcher, in accordance with various embodiments. Cell 400 may be practiced in place of CFF 140 in connection with Figure 1. Cell 400 may include MFF 410, SFF 420, comparator 430, and pulse catcher 440. MFF 410 may receive data D and clock signal CLK1, while SFF 420 may receive data D and clock signal CLK2. MFF 410 may generate output Q and also generate input to comparator 430. Similarly, comparator 430 may be coupled with and receive input from SFF 420. The output of comparator 430 may be latched into pulse catcher 440. Pulse catcher 440 may generate output W. CLK1 may control MFF 410, and CLK2 may control SFF 420.
Similar to what is disclosed in connection with Figure 3, in embodiments, CLK2 may be in-phase with but earlier than CLK1. For example, CLK1 may be a regular clock, but CLK2 may be an early clock.
In embodiments, comparator 430 may include an XOR gate to receive both inputs from MFF 410 and SFF 420. When the values latched by MFF 410 and SFF 420 agree, there is likely enough timing margin in the path. When comparator 430 detects a mismatch between MFF 410 and SFF 420, the timing path including cell 400 may be about to fail timing criteria, and comparator 430 may generate a warning signal. The output generated by comparator 430 is only valid for a full cycle as comparator 430 may generate different outputs in different cycles. The output signal from comparator 430 may be latched in pulse catcher 440. Pulse catcher 440 may hold the output signal from comparator 430, e.g., a pulse warning, and output it to a controller for further actions. In embodiments, to further reduce area overhead, pulse catcher 440 may be integrated with MFF 410, SFF 420, and comparator 430 in a single cell.
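Because the comparator output may change from cycle to cycle, a pulse catcher can be thought of as a sticky latch. The following Python sketch illustrates that idea only; the class name, the method names, and in particular the clear operation are assumptions, since the description does not state how the latched warning is reset.

```python
class PulseCatcher:
    """Sticky latch: once a warning pulse is seen, hold it until explicitly cleared."""

    def __init__(self):
        self.w = 0  # latched warning output W

    def capture(self, comparator_out):
        """Fold this cycle's comparator output into the latched warning."""
        self.w |= comparator_out & 1
        return self.w

    def clear(self):
        """Hypothetical reset, e.g. after a controller has handled the warning."""
        self.w = 0


catcher = PulseCatcher()
# A single-cycle pulse in the second cycle stays visible in later cycles.
held = [catcher.capture(bit) for bit in [0, 1, 0, 0]]
```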
Figure 5 illustrates control loops 500 of a power control unit (PCU) 510, incorporating aspects of the present disclosure, in accordance with various embodiments. In embodiments, PCU 510 may be coupled to and receive signals from processor or graphics core 540, where temperature sensor 550 and/or timing sensor 560 may reside. PCU 510 may be coupled to and send signals to frequency regulator 520 and/or voltage regulator 530, while core 540 may be coupled to and receive signals from frequency regulator 520 and/or voltage regulator 530. Control loops 500 may provide various corrective operations triggered by a warning signal of an upcoming timing failure.
In embodiments, core 540 may be a processor core or a graphic core. In various circumstances, it may be desirable to enhance the core operating point to a lower voltage for a better power efficiency or a higher frequency for higher performance via guard-band reduction. Temperature sensor 550 and/or timing sensor 560 may be used to monitor various operating conditions relating to guard-band reduction.
In embodiments, temperature sensor 550 may include one or more digital temperature sensors. Temperature sensor 550 may measure the real-time temperature at a particular location of core 540 or core 540 as a whole. PCU 510 may not only receive the temperature information from temperature sensor 550, but also analyze whether core 540 is warming up or cooling off based on the historical temperature data. PCU 510 may take the temperature information into account to decide whether and how to react to a warning signal of an upcoming timing failure. As an example, if temperature sensor 550 indicates that core 540 is cooling off quickly, then a warning signal of an upcoming timing failure may not even warrant a response as it is expected to go away by itself due to improved operating conditions. On the contrary, if temperature sensor 550 indicates that core 540 is warming up quickly, then a warning signal of an upcoming timing failure may warrant responsive actions as the temperature evolves. In embodiments, the reaction of the PCU 510 to temperature changes reported by temperature sensor 550 may be programmed to be inverted to account for the reverse temperature dependence observed in high-K metal-gate process technologies.
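One way to picture this temperature-aware decision is the short Python sketch below. It is a simplified illustration under stated assumptions: the function name, the use of the first and last samples as the trend, and the cooling-rate threshold of 2 degrees per window are all hypothetical and do not come from the description.

```python
def warning_needs_response(temps_c, cooling_rate_c=2.0, inverted=False):
    """Decide whether a timing warning warrants action, given a temperature history."""
    trend = temps_c[-1] - temps_c[0]  # positive means warming over the window
    if inverted:
        # Reverse temperature dependence: treat cooling as the worsening direction.
        trend = -trend
    # Ignore the warning only when conditions are improving quickly.
    return trend > -cooling_rate_c


cooling = warning_needs_response([80.0, 78.0, 75.0])
warming = warning_needs_response([70.0, 74.0, 79.0])
cooling_inverted = warning_needs_response([80.0, 78.0, 75.0], inverted=True)
```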
In embodiments, timing sensor 560 may include one or more CFFs as disclosed in connection with Figures 1-4. In embodiments, timing sensor 560 may include in-situ timing margin sensors distributed across core 540 at the end-points of one or more known timing critical paths. In-situ sensors may measure the real signal passing through those critical paths. Thus, the actual guard-band on that signal may be measured. In embodiments, the output from these in-situ timing margin sensors may be aggregated into a single warning signal that alerts PCU 510 of an impending timing failure. For example, the outputs from various comparators or pulse catchers illustrated in Figures 1-4 may be aggregated into a single error signal that is latched into a pulse catcher before sending to PCU 510.
In embodiments, PCU 510 may be an online microcontroller that may receive signals from temperature sensor 550 and/or timing sensor 560, and respond accordingly based on received signals. In some embodiments, PCU 510 may set up one or more thresholds to filter warning signals from timing sensor 560. PCU 510 may not respond to warning signals below a threshold. As an example, PCU 510 may decide to sit on a warning for a while to see if more warning signals may follow suit from the same block or the same area of the chip. As another example, PCU 510 may decide to respond to a warning signal if it is a repetitive warning signal from a critical block of the chip.
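The filtering policy described above might be sketched as follows. This Python example is only an assumption-laden illustration: the function name, the block names, the observation window, and the rule that two warnings count as repetitive for a critical block are hypothetical choices, not values given in the description.

```python
from collections import Counter


def blocks_needing_response(warning_blocks, threshold, critical=frozenset()):
    """Return the blocks whose warnings in the observation window justify a response."""
    counts = Counter(warning_blocks)
    respond = set()
    for block, n in counts.items():
        # Respond when a block reaches the threshold, or when a critical block
        # repeats a warning (assumed here to mean at least two warnings).
        if n >= threshold or (block in critical and n >= 2):
            respond.add(block)
    return respond


# Hypothetical warnings collected during one observation window.
window = ["alu", "fpu", "fpu", "cache", "cache", "cache", "alu"]
selected = blocks_needing_response(window, threshold=3, critical={"fpu"})
```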
In embodiments, PCU 510 may respond to a warning signal by controlling the operating voltage through voltage regulator 530, and/or the frequency through frequency regulator 520. Voltage regulator 530, controlled by PCU 510, may be either onboard or on-die. Higher voltage may make timing faster, thus PCU 510 may command voltage regulator 530 to raise the voltage to core 540 to alleviate the condition indicated by the warning signal. An alternative resolution that may be taken by PCU 510 is to command frequency regulator 520 to relax the frequency to subdue the warning signals. Frequency regulator 520 may include analog or digital phase-locked loops (PLLs). In embodiments, PCU 510 may adopt thermal solutions to control temperature, for example, to speed up a fan to cool off core 540.
There are differences between increasing the voltage and reducing the frequency. Reducing the frequency may be generally much faster to accomplish, which can sometimes be done in two or three clock cycles. To increase the voltage, PCU 510 needs to communicate to voltage regulator 530, which could be onboard or on-die. Communication with an onboard voltage regulator may take longer than communication with an on-die voltage regulator. However, generally it will take much longer for voltage regulator 530 to adjust the voltage to core 540 compared to the time required to adjust frequency by frequency regulator 520. Therefore, in embodiments, PCU 510 may choose to adjust frequency first in response to a warning signal. Subsequently, PCU 510 may decide to raise the voltage to core 540 in some embodiments. Once the voltage is raised, the frequency may be brought back to its previous level in some embodiments.
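The ordering of these responses can be written out as a simple sequence. The Python sketch below is illustrative only: the function name, the action labels, the units, and the step sizes are hypothetical, and it models just the order of operations rather than any regulator latency.

```python
def respond_to_warning(freq_mhz, volt_mv, freq_step_mhz, volt_step_mv):
    """Return the ordered (action, frequency, voltage) steps of one warning response."""
    steps = []
    lowered = freq_mhz - freq_step_mhz
    # Fast action first: relax the frequency.
    steps.append(("lower_frequency", lowered, volt_mv))
    raised = volt_mv + volt_step_mv
    # Slower action next: raise the voltage through the voltage regulator.
    steps.append(("raise_voltage", lowered, raised))
    # Once the voltage is raised, bring the frequency back to its previous level.
    steps.append(("restore_frequency", freq_mhz, raised))
    return steps


sequence = respond_to_warning(2000, 900, freq_step_mhz=100, volt_step_mv=25)
```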
In embodiments, PCU 510 may activate an adaptive feedback process for guard-band reduction after an operating condition changes. For example, PCU 510 may activate the EN signal to RCB 124 in Figure 1 or the EN signal to LCB 224 in Figure 2 when voltage, frequency, or large enough temperature changes occur at core 540. Afterwards, PCU 510 may tighten the operating margins, e.g., to reduce voltage or increase frequency, until a warning is received from timing sensor 560. In embodiments, PCU 510 may activate some or all critical paths for guard-band reduction. Once a first warning is detected, PCU 510 may relax the voltage and frequency settings by an acceptable margin to account for voltage droops and sudden temperature changes.
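The tighten-until-warning loop described above can be sketched for the voltage case as follows. This is a minimal model under stated assumptions: the function name, the millivolt units, the floor voltage, and the `warns_at` callback standing in for the timing sensor are all hypothetical.

```python
def calibrate_voltage(start_mv, step_mv, margin_mv, warns_at, floor_mv):
    """Lower the voltage until the timing sensor warns, then relax by a margin."""
    v = start_mv
    while not warns_at(v):
        if v - step_mv < floor_mv:
            return v  # reached the floor without any warning
        v -= step_mv
    return v + margin_mv  # first warning seen: back off by the acceptable margin


# Hypothetical sensor model: canary flip-flops start warning at or below 820 mV.
calibrated = calibrate_voltage(900, 10, 30, lambda mv: mv <= 820, floor_mv=700)
no_warning = calibrate_voltage(900, 50, 30, lambda mv: False, floor_mv=800)
```

The same loop shape would apply to raising the frequency instead of lowering the voltage, with the margin applied in the opposite direction.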
Such an adaptive feedback process may be repeated for timing and voltage guard-band reduction every time the operating conditions change. Core 540 may continue operating all the time while the margin calibration process is conducted in parallel. This scheme may work with both analog and digital PLLs as well as multiple types of voltage regulators, e.g., on-board, on-package or integrated on-die. Generally, the faster the frequency and voltage response, the higher the savings obtained through such adaptive operations.
Such adaptive operations may only be needed for a period of time after operating conditions change. In embodiments, PCU 510 may run such adaptive operations based on a timer, which may be adjustable. After operating conditions change, PCU 510 may activate timing sensor 560 to monitor critical paths till the timer elapses, then a part or the whole control loops 500 may be turned off to save power.
Relating back to Figures 1-4, to further reduce the power overhead, the shadow flip-flops may be enabled only during an adaptive operation. When PCU 510 needs to adjust the voltage and frequency to core 540 under a new operating condition, or it detects a large enough local temperature change, PCU 510 may enable the clocks to those shadow flip-flops for an adequate period of time to reduce the guard-band at the new voltage, frequency and temperature settings. Once the guard-band has been satisfactorily reduced or minimized, as long as the voltage, frequency and temperature conditions do not change, there is no need to burn power to continuously calibrate the system. Thus those shadow flip-flops may be gated off to reduce the power overhead.
In embodiments, PCU 510 may maintain a soft look-up table of valid recent operating conditions. Based on the look-up table, PCU 510 may jump to some known voltage, frequency, and temperature operating points without turning on the adaptive feedback process. Thus, power overhead may be further reduced, and system efficiency may be further improved. In embodiments, this look-up table may be lost when core 540 is reset; however, PCU 510 may reconstruct this look-up table by storing recent valid operating points again. In embodiments, this look-up table may be stored in a non-volatile storage array (e.g., either on-die or contained in the system) and may be re-loaded after core 540 is reset.
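A look-up table of recent valid operating points could be organized along the lines of the Python sketch below. Every detail here is an assumption made for illustration: the class name, the key of frequency plus a 10-degree temperature bucket, the stored voltage value, the capacity, and the oldest-first eviction are hypothetical and are not specified in the description.

```python
class OperatingPointTable:
    """Small table of recently calibrated operating points, oldest entry evicted first."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.points = {}  # (freq_mhz, temp_bucket_c) -> calibrated voltage in mV

    @staticmethod
    def _key(freq_mhz, temp_c):
        # Group temperatures into 10-degree buckets (a hypothetical granularity).
        return (freq_mhz, int(temp_c // 10) * 10)

    def store(self, freq_mhz, temp_c, volt_mv):
        key = self._key(freq_mhz, temp_c)
        self.points.pop(key, None)  # re-insert so this entry becomes the most recent
        self.points[key] = volt_mv
        if len(self.points) > self.capacity:
            del self.points[next(iter(self.points))]  # drop the oldest entry

    def lookup(self, freq_mhz, temp_c):
        """Return a known-good voltage, or None if calibration is still needed."""
        return self.points.get(self._key(freq_mhz, temp_c))


table = OperatingPointTable(capacity=2)
table.store(2000, 63, 850)
hit = table.lookup(2000, 68)
miss = table.lookup(2000, 71)
```

A miss in such a table would be the case in which the adaptive feedback process is turned on to calibrate a new point.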
Embodiments for the guard-band reduction using canary flip-flops described herein may be used in a number of implementations and applications. For example, mobile devices, including but not limited to smart phones, tablets, and other Mobile Internet Devices (MIDs) may have processors or graphic cores that would benefit from improved guard-band reduction operations.
Figure 6 is a block diagram that illustrates an example computer system 600 suitable for practicing the disclosed embodiments, in accordance with various embodiments. As shown, the computer system 600 may include a power supply unit 620, a number of processors or processor cores 610, a system memory 630 having processor-readable and processor-executable instructions 680 stored therein, a non-volatile memory (NVM)/storage 640 that may also store the instructions 680, an I/O interface 650, and a communication interface 660. For the purpose of this application, including the claims, the terms "processor" and "processor cores" may be considered synonymous, unless the context clearly requires otherwise.
In various embodiments of the present disclosure, at least one of the processors 610, including a controller, may generate or cause to be generated a signal to trigger activation/deactivation of a guard-band reduction operation, in response to the processor 610 receiving or otherwise evaluating the state of the output signal provided by one or more canary flip-flops (e.g., residing in a timing sensor). In other embodiments, various other components (internal or external to the system 600) may generate one or more of such signals in response to the output signal from the one or more canary flip-flops in the processors 610.
The one or more NVM/storage 640 and/or the memory 630 may comprise a tangible, non-transitory computer-readable storage device (such as a diskette, hard drive, compact disc read only memory (CDROM), hardware storage unit, flash memory, phase change memory (PCM), solid-state drive (SSD) memory, and so forth). The instructions 680 stored in the NVM/storage 640 and/or the memory 630 may be executable by one or more of the processors 610.
The computer system 600 may also comprise input/output devices (not shown) coupled to the computer system 600 via I/O interface 650.
Communication interface 660 may provide an interface for computing device 600 to communicate over one or more network(s) and/or with any other suitable device. Communication interface 660 may include any suitable hardware and/or firmware, such as a network adapter, one or more antennas, wireless interface(s), and so forth. In various embodiments, communication interface 660 may include an interface for computing device 600 to use near field communication (NFC), optical communications, or other similar technologies to communicate directly (e.g., without an intermediary) with another device. In various embodiments, communication interface 660 may interoperate with radio communications technologies such as, for example, Wideband Code Division Multiple Access (WCDMA), Global System for Mobile communications (GSM), Long Term Evolution (LTE), WiFi, Bluetooth®, Zigbee, and the like.
The various elements of Figure 6 may be coupled to each other via a system bus 670, which represents one or more buses. In the case of multiple buses, they may be bridged by one or more bus bridges (not shown). Data may pass through the system bus 670 through the I/O interface 650, for example, between an output terminal and the processors 610.
The system memory 630 and the mass storage device 640 may be employed to store a working copy and a permanent copy of the programming instructions implementing one or more operating systems, firmware modules or drivers, applications, and so forth, herein collectively denoted as instructions 680. In embodiments, instructions 680 may include logic for guard-band reduction described in connection with Figures 1-5. The permanent copy of the programming instructions may be placed into permanent storage in the factory, or in the field, via, for example, a distribution medium (not shown), such as a compact disc (CD), or through the communication interface 660 (from a distribution server (not shown)). In some embodiments, at least one of the processor(s) 610 may be packaged together with memory having instructions 680. In some embodiments, at least one of the processor(s) 610 may be packaged together with memory having instructions 680 to form a System in Package (SiP). In some embodiments, at least one of the processor(s) 610 may be integrated on the same die with memory having instructions 680. In some embodiments, at least one of the processor(s) 610 may be integrated on the same die with memory having instructions 680 to form a System on Chip (SoC).
According to various embodiments, one or more of the depicted components of the system 600 and/or other element(s) may include a keyboard, LCD screen, non-volatile memory port, multiple antennas, graphics processor, application processor, speakers, or other associated mobile device elements, including a camera. The remaining constitution of the various elements of the computer system 600 is known, and accordingly will not be further described in detail.
The above description of illustrated embodiments, including what is described in the Abstract, is not intended to be exhaustive or to be limited to the precise forms disclosed. While specific embodiments and examples are described herein for illustrative purposes, various modifications are possible. For example, the configuration and connection of certain elements in various embodiments have been described above in the context of regular clock, early clock, and so forth. In other embodiments, data may be still sampled earlier in a shadow flip-flop than in the main flip-flop in a canary flip-flop with appropriate settings of clocks and data feeding into the canary flip-flop.
These and other modifications can be made in light of the above detailed description. The terms used in the following claims should not be construed to be limited to the specific embodiments disclosed in the specification. The following paragraphs describe examples of various embodiments.
Example 1 is an apparatus, which may include a comparator; a first flip-flop, coupled to the comparator, to generate a first input to the comparator based at least in part on a first clock signal and a data input; and a second flip-flop, coupled to the comparator, to generate a second input to the comparator based at least in part on a second clock signal and the data input. The first clock signal, the second clock signal, and the data input may be configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop, and the comparator may be configured to generate a signal based on a comparison of the first and second inputs.
Example 2 may include the subject matter of Example 1, and may further include a first regional clock buffer to supply a third clock signal that is in phase with the second clock signal; and a local clock buffer, coupled to the first regional clock buffer and the first flip-flop, to receive the third clock signal from the first regional clock buffer and to output the first clock signal to the first flip-flop.
Example 3 may include the subject matter of Examples 1 or 2, and may further include a second regional clock buffer, coupled to the second flip-flop, to supply the second clock signal to the second flip-flop, wherein the second regional clock buffer is gated by an enabling signal.
Example 4 may include the subject matter of Example 3, and may further specify that the second regional clock buffer may be gated on only during an adaptive operation to establish a new voltage or frequency setting for a processor or graphics core, e.g., by a power control unit.
Example 5 may include the subject matter of any one of Examples 1-4, and may further specify that the second clock signal and the first clock signal may have a programmable clock delay delta, and the second clock signal is earlier than the first clock signal by the programmable clock delay delta.
Example 6 may include the subject matter of Example 5, and may further specify that the first flip-flop may always latch data correctly from the data input based on the first clock signal, and the second flip-flop may fail to latch the data correctly from the data input based on the second clock signal when the data arrives within the clock delay delta from the end of a cycle of the second clock signal.
Example 7 may include the subject matter of any one of Examples 1-6, and may further specify that the comparator may include an XOR gate.
Example 8 may include the subject matter of any one of Examples 1-7, and may further specify that the comparator may be integrated with the first flip-flop and the second flip-flop in a single cell.
Example 9 may include the subject matter of any one of Examples 1-8, and may further include a pulse catcher, coupled with the comparator, to latch the signal.
Example 10 may include the subject matter of Example 9, and may further specify the pulse catcher is integrated with the first flip-flop and the second flip-flop in a single cell.
Example 11 may include the subject matter of any one of Examples 1-10, and may further include a regional clock buffer to supply a third clock signal in phase with the first clock signal and the second clock signal; a first local clock buffer, coupled to the regional clock buffer and the first flip-flop, to receive the third clock signal from the regional clock buffer and to output the first clock signal to the first flip-flop; a second local clock buffer, coupled to the regional clock buffer and the second flip-flop, to receive the third clock signal from the first regional clock buffer and to output the second clock signal to the second flip-flop, wherein the second local clock buffer is gated by an enabling signal; and a third local buffer, coupled to the second flip-flop, configured to receive the data input and to output the data input to the second flip-flop.
Example 12 may include the subject matter of Example 11, and may further specify the second local clock buffer is gated on only during an adaptive operation to establish a new voltage or frequency setting for a processor or graphics core, e.g., by a power control unit.
Example 13 may include the subject matter of Example 11 or 12, and may further specify the second flip-flop receives the data input later than the first flip-flop by a delay based at least in part on the local buffer.
Example 14 may include the subject matter of any one of Examples 11-13, and may further specify that the first flip-flop always latches data correctly from the data input based on the first clock signal, and the second flip-flop will fail to latch the data correctly from the data input based on the second clock signal when the data arrives within the delay from the end of a cycle of the second clock signal.
Example 15 is a method, which may include generating a first output signal, by a first flip-flop to a comparator, based at least in part on a first clock signal and a data input; generating a second output signal, by a second flip-flop to the comparator, based at least in part on a second clock signal and the data input, wherein the first clock signal, the second clock signal, and the data input are configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop; and generating a warning signal, by the comparator, when the second output signal differs from the first output signal.
Example 16 may include the subject matter of Example 15, and may further include latching the warning signal, e.g., by a pulse catcher.
Example 17 may include the subject matter of Example 15 or 16, and may further include supplying the first clock signal and the second clock signal from a regional clock buffer; providing a first local clock buffer to the first clock signal; and providing a second local clock buffer to the second clock signal.
Example 18 may include the subject matter of Example 17, and may further include gating on the second local clock buffer only during an adaptive operation to establish a new voltage or frequency setting to a processor or graphics core.
Example 19 may include the subject matter of Example 15, and may further include supplying the first clock signal from a first regional clock buffer; supplying the second clock signal from a second regional clock buffer wherein the second clock signal is in phase with the first clock signal; and providing a clock delay delta to the first clock signal to cause the second clock signal to be earlier than the first clock signal.
Example 20 may include the subject matter of Example 19, and may further include gating on the second regional clock buffer only during an adaptive operation to establish a new voltage or frequency setting to a processor or graphics core.
Example 21 may include the subject matter of Example 18 or 20, and may further include reducing a voltage or increasing a frequency to the processor or graphics core, by a power control unit, until a warning signal from the comparator is detected; and increasing the voltage or reducing the frequency by an acceptable margin, by the power control unit, once the first warning signal is detected.
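The adaptive operation of Example 21 can be sketched as a simple search loop for the voltage case. The following is a minimal illustration only: the function name `adapt_voltage`, the millivolt units, the fixed step size, the lower bound `floor_mv`, and the `warning_at` callback standing in for the comparator output are all assumptions that do not appear in the disclosure.

```python
from typing import Callable


def adapt_voltage(start_mv: int, step_mv: int, margin_mv: int, floor_mv: int,
                  warning_at: Callable[[int], bool]) -> int:
    """Lower the voltage until the first warning, then back off by a margin.

    `warning_at(mv)` is a hypothetical stand-in for reading the warning
    signal while the core runs at `mv` millivolts.
    """
    voltage = start_mv
    # Step down only while the next step stays at or above the assumed floor.
    while voltage - step_mv >= floor_mv:
        voltage -= step_mv
        if warning_at(voltage):
            # First warning detected: raise the voltage by the margin and stop.
            return voltage + margin_mv
    # No warning was seen before reaching the floor; keep the last setting.
    return voltage
```

For instance, with `warning_at = lambda mv: mv < 800`, a start of 1000 mV, a 25 mV step, and a 50 mV margin, the loop steps down to 775 mV, sees the first warning there, and settles at 825 mV. The frequency case would follow the same shape with the direction reversed.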
Example 22 may include the subject matter of any one of Examples 18, 20, and 21, and may further specify that the processor or graphics core continues to operate during the adaptive operation.
Example 23 may include the subject matter of any one of Examples 18 and 20-22, and may further specify that the power control unit is configured to activate the adaptive operation when an operating condition changes.
Example 24 may include the subject matter of any one of Examples 18 and 20-23, and may further specify that the power control unit is configured to maintain a look-up table of valid recent operating conditions, and activate the adaptive operation only for a new operating condition not maintained at the look-up table.
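One way to picture the look-up table of Example 24 is as a small cache keyed by operating condition. The sketch below is an assumption-laden illustration: the class name `OperatingPointTable`, the bounded capacity with least-recently-used eviction (one possible reading of "recent"), the shape of the condition key, and the `calibrate` callback standing in for the adaptive operation are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class OperatingPointTable:
    """Cache of settings for recently seen operating conditions (illustrative)."""

    def __init__(self, capacity: int, calibrate: Callable[[Hashable], int]):
        self._capacity = capacity
        self._calibrate = calibrate  # hypothetical hook that runs the adaptive operation
        self._table: "OrderedDict[Hashable, int]" = OrderedDict()

    def setting_for(self, condition: Hashable) -> int:
        """Return the stored setting, running the adaptive operation only on a miss."""
        if condition in self._table:
            # Known condition: reuse the stored setting and mark it as recent.
            self._table.move_to_end(condition)
            return self._table[condition]
        # New condition: run the adaptive operation and remember the result.
        setting = self._calibrate(condition)
        self._table[condition] = setting
        if len(self._table) > self._capacity:
            # Drop the least recently used entry to keep the table small.
            self._table.popitem(last=False)
        return setting
```

A condition key might be, for example, a tuple of a temperature bin and a target frequency; repeated requests for the same key then return the stored setting without re-running the adaptive operation.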
Example 25 is a system, which may include a processor or graphics core having a plurality of in-situ timing margin sensors and a temperature sensor; a power control unit, coupled with the processor or graphics core, to control voltage or frequency settings of the processor or graphics core; a voltage regulator, coupled with the power control unit, to regulate a voltage to the processor or graphics core; and a frequency regulator, coupled with the power control unit, to regulate a clock frequency to the processor or graphics core. Each of the plurality of in-situ timing margin sensors may include a comparator; a first flip-flop, coupled to the comparator, to generate a first input to the comparator based at least in part on a first clock signal and a data input; and a second flip-flop, coupled to the comparator, to generate a second input to the comparator based at least in part on a second clock signal and the data input. The first clock signal, the second clock signal, and the data input are configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop, and the comparator is configured to generate a warning signal when the second input differs from the first input.
Example 26 may include the subject matter of Example 25, and may further specify that at least more than one of the plurality of in-situ timing margin sensors are distributed at respective end-points of a plurality of known timing critical paths of the processor or graphics core.
Example 27 may include the subject matter of Example 26, and may further specify that respective warning signals of the at least more than one of the plurality of in-situ timing margin sensors are aggregated into a single warning signal to alert the power control unit of an impending timing failure.
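The aggregation described in Example 27 can be pictured as a reduction over the per-sensor warning signals. The sketch below assumes a logical OR as the aggregation function and adds a sticky flag in the spirit of the pulse catcher of Example 16; the class name `AggregatedWarning` and its methods are hypothetical and are not taken from the disclosure.

```python
from typing import Iterable


class AggregatedWarning:
    """OR-reduce per-sensor warnings into one latched flag (illustrative)."""

    def __init__(self) -> None:
        self.latched = False

    def update(self, warnings: Iterable[int]) -> bool:
        """Fold one cycle of per-sensor warnings into the latched flag."""
        # Any asserted sensor sets the flag; it stays set until cleared.
        self.latched = self.latched or any(warnings)
        return self.latched

    def clear(self) -> None:
        """Reset the flag, e.g. after the power control unit has reacted."""
        self.latched = False
```

Latching matters in such a model because a single-cycle mismatch on one critical path would otherwise disappear before slower control logic had a chance to observe it.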
Example 28 may include the subject matter of any one of Examples 25-27, and may further specify that the processor or graphics core further includes a first regional clock buffer to supply a third clock signal that is in phase with the second clock signal; a plurality of local clock buffers, respectively coupled to the first flip-flops in a portion of the plurality of in-situ timing margin sensors, to receive the third clock signal from the first regional clock buffer and to output the first clock signal to the first flip-flops; and a second regional clock buffer, coupled to a plurality of second flip-flops in the block, to supply the second clock signal to the second flip-flops in the portion of the plurality of in-situ timing margin sensors, wherein the second regional clock buffer is gated.
Example 29 may include the subject matter of Example 28, and may further specify that the second regional clock buffer is gated on only during an adaptive operation to establish a new voltage or frequency setting for the processor or graphics core by the power control unit.
Example 30 may include the subject matter of Example 29, and may further specify that the processor or graphics core continues to operate during the adaptive operation.
Example 31 may include the subject matter of Example 29 or 30, and may further specify that the power control unit is configured to maintain a look-up table of recent operating conditions, and activate the adaptive operation only for a new operating condition not maintained at the look-up table.
Example 32 may include the subject matter of any one of Examples 29-31, and may further specify that the power control unit is configured to control the voltage regulator to reduce the voltage or control the frequency regulator to reduce the clock frequency until a warning signal from the plurality of in-situ timing margin sensors is detected.

Claims

What is claimed is:
1. An apparatus, comprising:
a comparator;
a first flip-flop, coupled to the comparator, to generate a first input to the comparator based at least in part on a first clock signal and a data input; and
a second flip-flop, coupled to the comparator, to generate a second input to the comparator based at least in part on a second clock signal and the data input,
wherein the first clock signal, the second clock signal, and the data input are configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop, and the comparator is configured to generate a signal based on a comparison of the first and second inputs.
2. The apparatus of claim 1, further comprising:
a first regional clock buffer to supply a third clock signal that is in phase with the second clock signal; and
a local clock buffer, coupled to the first regional clock buffer and the first flip-flop, to receive the third clock signal from the first regional clock buffer and to output the first clock signal to the first flip-flop.
3. The apparatus of claim 2, further comprising:
a second regional clock buffer, coupled to the second flip-flop, to supply the second clock signal to the second flip-flop, wherein the second regional clock buffer is gated by an enabling signal.
4. The apparatus of claim 3, wherein the second regional clock buffer is gated on only during an adaptive operation to establish a new voltage or frequency setting for a processor or graphics core.
5. The apparatus of claim 1, wherein the second clock signal and the first clock signal have a programmable clock delay delta, and the second clock signal is earlier than the first clock signal by the programmable clock delay delta.
6. The apparatus of claim 5, wherein the first flip-flop latches data correctly from the data input based on the first clock signal, and the second flip-flop will fail to latch the data correctly from the data input based on the second clock signal when the data arrives within the clock delay delta from an end of a cycle of the second clock signal.
7. The apparatus of claim 1, wherein the comparator comprises an XOR gate, and the XOR gate is integrated with the first flip-flop and the second flip-flop in a single cell.
8. The apparatus of claim 1, further comprising:
a pulse catcher, coupled with the comparator, to latch the signal, and the pulse catcher is integrated with the first flip-flop and the second flip-flop in a single cell.
9. The apparatus of claim 1, further comprising:
a regional clock buffer to supply a third clock signal in phase with the first clock signal and the second clock signal;
a first local clock buffer, coupled to the regional clock buffer and the first flip-flop, to receive the third clock signal from the regional clock buffer and to output the first clock signal to the first flip-flop;
a second local clock buffer, coupled to the regional clock buffer and the second flip-flop, to receive the third clock signal from the regional clock buffer and to output the second clock signal to the second flip-flop, wherein the second local clock buffer is gated by an enabling signal; and
a local buffer, coupled to the second flip-flop, configured to receive the data input and to output the data input to the second flip-flop.
10. The apparatus of claim 9, wherein the second local clock buffer is gated on only during an adaptive operation to establish a new voltage or frequency setting for a processor or graphics core.
11. The apparatus of claim 9, wherein the second flip-flop receives the data input later than the first flip-flop by a delay based at least in part on the local buffer.
12. The apparatus of any one of claims 9-11, wherein the first flip-flop latches data correctly based on the first clock signal, and the second flip-flop will fail to latch the data correctly based on the second clock signal when the data misses a setup time at an end of a cycle for the second flip-flop.
13. A method, comprising:
generating a first output signal, by a first flip-flop to a comparator, based at least in part on a first clock signal and a data input;
generating a second output signal, by a second flip-flop to the comparator, based at least in part on a second clock signal and the data input, wherein the first clock signal, the second clock signal, and the data input are configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop; and
generating a warning signal, by the comparator, when the second output signal differs from the first output signal.
14. The method of claim 13, further comprising:
supplying the first clock signal and the second clock signal from a regional clock buffer;
providing a first local clock buffer to the first clock signal; and
providing a second local clock buffer to the second clock signal.
15. The method of claim 14, further comprising:
gating on the second local clock buffer only during an adaptive operation to establish a new voltage or frequency setting to a processor or graphics core.
16. The method of claim 13, further comprising:
supplying the first clock signal from a first regional clock buffer;
supplying the second clock signal from a second regional clock buffer wherein the second clock signal is in phase with the first clock signal; and
providing a clock delay delta to the first clock signal to cause the second clock signal to be earlier than the first clock signal.
17. The method of claim 16, further comprising:
gating on the second regional clock buffer only during an adaptive operation to establish a new voltage or frequency setting to a processor or graphics core.
18. The method of claim 17, further comprising:
reducing a voltage or increasing a frequency to the processor or graphics core, by a power control unit, until a warning signal from the comparator is detected; and
increasing the voltage or reducing the frequency by an acceptable margin, by the power control unit, once the first warning signal is detected.
19. The method of claim 18, wherein the power control unit is configured to maintain a look-up table of valid recent operating conditions, and activate the adaptive operation only for a new operating condition not maintained at the look-up table.
20. A system, comprising:
a processor or graphics core having a plurality of in-situ timing margin sensors and a temperature sensor;
a power control unit, coupled with the processor or graphics core, to control voltage or frequency settings of the processor or graphics core;
a voltage regulator, coupled with the power control unit, to regulate a voltage to the processor or graphics core; and
a frequency regulator, coupled with the power control unit, to regulate a clock frequency to the processor or graphics core;
wherein each of the plurality of in-situ timing margin sensors comprises:
a comparator;
a first flip-flop, coupled to the comparator, to generate a first input to the comparator based at least in part on a first clock signal and a data input; and
a second flip-flop, coupled to the comparator, to generate a second input to the comparator based at least in part on a second clock signal and the data input,
wherein the first clock signal, the second clock signal, and the data input are configured to enable the data input to be sampled earlier by the second flip-flop than the first flip-flop, and the comparator is configured to generate a warning signal when the second input differs from the first input.
21. The system of claim 20, wherein at least more than one of the plurality of in-situ timing margin sensors are distributed at respective end-points of a plurality of known timing critical paths of the processor or graphics core.
22. The system of claim 20, wherein the processor or graphics core further comprises:
a first regional clock buffer to supply a third clock signal that is in phase with the second clock signal;
a plurality of local clock buffers, respectively coupled to the first flip-flops in a portion of the plurality of in-situ timing margin sensors, to receive the third clock signal from the first regional clock buffer and to output the first clock signal to the first flip-flops; and
a second regional clock buffer, coupled to a plurality of second flip-flops in the block, to supply the second clock signal to the second flip-flops in the portion of the plurality of in-situ timing margin sensors, wherein the second regional clock buffer is gated.
23. The system of claim 22, wherein the second regional clock buffer is gated on only during an adaptive operation to establish a new voltage or frequency setting for the processor or graphics core by the power control unit.
24. The system of claim 23, wherein the power control unit is configured to maintain a look-up table of recent operating conditions, and activate the adaptive operation only for a new operating condition not maintained at the look-up table.
25. The system of any one of claims 20-24, wherein the power control unit is configured to control the voltage regulator to reduce a voltage or control the frequency regulator to reduce a clock frequency until a warning signal from the plurality of in-situ timing margin sensors is detected.
PCT/US2013/077263 2013-12-20 2013-12-20 Apparatus and method for adaptive guard-band reduction WO2015094373A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2013/077263 WO2015094373A1 (en) 2013-12-20 2013-12-20 Apparatus and method for adaptive guard-band reduction

Publications (1)

Publication Number Publication Date
WO2015094373A1 true WO2015094373A1 (en) 2015-06-25

Family

ID=53403452

Country Status (1)

Country Link
WO (1) WO2015094373A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100052730A1 (en) * 2005-09-23 2010-03-04 Edward Grochowski Method and apparatus for late timing transition detection
US20110006827A1 (en) * 2008-02-20 2011-01-13 Hidekichi Shimura Semiconductor integrated circuit
US20110068858A1 (en) * 2009-09-18 2011-03-24 Stmicroelectronics Pvt. Ltd. Fail safe adaptive voltage/frequency system
JP2013055199A (en) * 2011-09-02 2013-03-21 Fujitsu Ltd Semiconductor device
US20130166980A1 (en) * 2011-12-23 2013-06-27 Arm Limited Error recovery in a data processing apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13899539

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13899539

Country of ref document: EP

Kind code of ref document: A1