US20200106424A1 - Semi dynamic flop and single stage pulse flop with shadow latch and transparency on both input data edges - Google Patents
- Publication number
- US20200106424A1 (Application No. US16/143,973)
- Authority
- US
- United States
- Prior art keywords
- output
- data
- bypass circuit
- clock
- clock signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03K—PULSE TECHNIQUE
- H03K3/00—Circuits for generating electric pulses; Monostable, bistable or multistable circuits
- H03K3/02—Generators characterised by the type of circuit or by the means used for producing pulses
- H03K3/027—Generators characterised by the type of circuit or by the means used for producing pulses by the use of logic circuits, with internal or external positive feedback
- H03K3/037—Bistable circuits
- H03K3/0372—Bistable circuits of the master-slave type
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03K—PULSE TECHNIQUE
- H03K3/00—Circuits for generating electric pulses; Monostable, bistable or multistable circuits
- H03K3/02—Generators characterised by the type of circuit or by the means used for producing pulses
- H03K3/353—Generators characterised by the type of circuit or by the means used for producing pulses by the use, as active elements, of field-effect transistors with internal or external positive feedback
- H03K3/356—Bistable circuits
- H03K3/3562—Bistable circuits of the master-slave type
- H03K3/35625—Bistable circuits of the master-slave type using complementary field-effect transistors
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03K—PULSE TECHNIQUE
- H03K19/00—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
- H03K19/02—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components
- H03K19/08—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using semiconductor devices
- H03K19/094—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using semiconductor devices using field-effect transistors
- H03K19/096—Synchronous circuits, i.e. using clock signals
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03K—PULSE TECHNIQUE
- H03K3/00—Circuits for generating electric pulses; Monostable, bistable or multistable circuits
- H03K3/64—Generators producing trains of pulses, i.e. finite sequences of pulses
Definitions
- FIG. 1 is a block diagram of one embodiment of a sequential element.
- FIG. 2 is a block diagram of one embodiment of clock waveforms.
- FIG. 3 is a block diagram of another embodiment of a sequential element.
- FIG. 4 is a flow diagram of one embodiment of a method for efficiently storing and driving data between pipeline stages.
- FIG. 5 is a block diagram of another embodiment of a sequential element.
- FIG. 6 is a flow diagram of one embodiment of a method for efficiently storing and driving data between pipeline stages.
- FIG. 7 is a block diagram of one embodiment of a system.
- Various units, circuits, or other components may be described as “configured to” perform a task or tasks.
- In such contexts, “configured to” is a broad recitation of structure generally meaning “having circuitry that” performs the task or tasks during operation.
- As such, the unit/circuit/component can be configured to perform the task even when the unit/circuit/component is not currently on.
- In general, the circuitry that forms the structure corresponding to “configured to” may include hardware circuits.
- Similarly, various units/circuits/components may be described as performing a task or tasks, for convenience in the description.
- Sequential element 100 includes a master latch 120, a shadow latch 160 and a bypass circuit 150.
- Sequential element 100 receives a data signal 110 (or Data 110), a clock signal 102 (or Clock 102), an inverted level of Clock 102, which is ClockBar 104, a delayed (and buffered) version of Clock 102, which is ClockBuf 106, and an inverted level of ClockBuf 106, which is ClockBarBuf 108.
- With the received inputs, sequential element 100 generates Output 180.
- Sequential element 100 is used between pipeline stages of a processor.
- Sequential element 100 is a flip-flop circuit.
- As used herein, a “shadow latch” refers to a latch with a tri-state output.
- Each of the shadow latch 160 and the bypass circuit 150 receives the output from the master latch 120, which is DataBar 114.
- The master latch 120 is a data input stage of the sequential element 100.
- At any given time, only one of the shadow latch 160 and the bypass circuit 150 sets a voltage level on Output 180.
- When Clock 102 is negated, the shadow latch 160 sets the voltage level on Output 180 and the bypass circuit 150 is disabled.
- While Clock 102 is negated, the pre-charge device Q140 (a PFET) pre-charges the dynamic node DataBar 114 to a logic high level, which disables device Q152 (a PFET) in the bypass circuit 150.
- In addition, the device Q156 (an NFET) is disabled by Clock 102. Therefore, the bypass circuit 150 does not set a voltage level on its output, which is Output 180, and the bypass circuit 150 is considered disabled or tri-stated.
- When Clock 102 is asserted, the shadow latch 160 is prevented from sending an output to Output 180.
- Instead, the bypass circuit 150 sends an inverted level of DataBar 114 to Output 180.
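The driver arbitration on Output 180 can be checked exhaustively at the logic level. The Python sketch below is an illustration, not the patent's method: it encodes the enables as described in this section (Q152 conducting whenever DataBar 114 is low, Q154 in series with Q156, and tri-state inverter 174 enabled while Clock 102 is low), assumes ideal zero-delay signals, and ignores the brief interval around clock edges. The helpers `output_drivers` and `reachable` are hypothetical names. It shows that the only state in which both drivers would be on is Clock low with DataBar low, which the Q140 pre-charge rules out.

```python
from itertools import product


def output_drivers(clock, data_bar):
    """Return (bypass_drives, shadow_drives) for Output 180 (assumed enables)."""
    bypass_pull_up = data_bar == 0                   # Q152 (PFET), not clock-gated
    bypass_pull_down = data_bar == 1 and clock == 1  # Q154 in series with Q156
    shadow_drives = clock == 0                       # tri-state inverter 174
    return (bypass_pull_up or bypass_pull_down), shadow_drives


def reachable(clock, data_bar):
    # Q140 pre-charges DataBar 114 high whenever Clock 102 is negated.
    return clock == 1 or data_bar == 1


# States in which both drivers would fight over Output 180.
conflicts = [
    (clock, data_bar)
    for clock, data_bar in product((0, 1), repeat=2)
    if all(output_drivers(clock, data_bar))
]

# Among reachable states, exactly one driver is enabled.
exactly_one = all(
    sum(output_drivers(c, d)) == 1
    for c, d in product((0, 1), repeat=2)
    if reachable(c, d)
)

print(conflicts, exactly_one)  # [(0, 0)] True
```

Because the pull-up device Q152 is not gated by Clock 102, the pre-charge of DataBar 114 is what keeps the bypass circuit quiet while the shadow latch drives the output.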
- As used herein, a “device” refers to a resistor, a transistor, or other suitable type of transconductance device coupled between a circuit node and either a power node or a ground node.
- As used herein, a “logic low level,” a “logic 0 value,” or a “Boolean logic low level” corresponds to a voltage level sufficiently low to enable a p-type metal oxide semiconductor (MOS) field effect transistor (FET), which is also referred to simply as a “PFET.”
- Similarly, a “logic high level,” a “logic 1 value,” or a “Boolean logic high level” corresponds to a voltage level sufficiently high to enable an n-type metal oxide semiconductor (MOS) field effect transistor (FET), which is also referred to simply as an “NFET.”
- The master latch 120 receives Data 110 on the gate terminals of devices Q124 (a PFET) and Q130 (an NFET).
- The clock signals Clock 102 and its inverted version ClockBar 104 are received on the gate terminals of Q132 and Q122, respectively.
- External circuitry generates delayed versions of Clock 102 and ClockBar 104, such as ClockBuf 106 and ClockBarBuf 108.
- In an embodiment, a delayed clock generator includes a number of series connected inverters for receiving Clock 102 and generating ClockBar 104, ClockBuf 106 and ClockBarBuf 108. For example, ClockBar 104 has a delay of one inverter logic gate from Clock 102, ClockBuf 106 has a delay of four inverter logic gates from Clock 102, and ClockBarBuf 108 has a delay of five inverter logic gates from Clock 102.
- In other embodiments, other numbers of logic gates and types of logic gates are used to generate clock signals from Clock 102.
- Sequential element 100 is described as being “transparent” or “open” when data is capable of being transmitted from Data 110 through the master latch 120 to DataBar 114, which is received by each of the shadow latch 160 and bypass circuit 150. Sequential element 100 is described as being “opaque” or “closed” when data is incapable of being transmitted from Data 110 through the master latch 120 to DataBar 114.
- Clock 102, ClockBuf 106 and a pulse are illustrated in FIG. 2 and described later.
- When Clock 102 is asserted, or set at a logic high level to enable Q132, ClockBar 104 becomes negated a relatively short time later, such as the delay of one inverter logic gate (or inverter). Since ClockBuf 106 is still at a logic low level due to the previous level of Clock 102, ClockBarBuf 108 is still at a logic high level. Accordingly, the devices Q126 and Q128 are enabled, and an inverted level of Data 110 is capable of being set on DataBar 114. Sequential element 100 is considered to be open.
- Later, ClockBuf 106 transitions from the logic low level to the logic high level, which disables Q126 (a PFET).
- Similarly, ClockBarBuf 108 transitions from the logic high level to the logic low level, which disables Q128 (an NFET).
- The amount of time between a first point in time when Clock 102 transitions to the logic high level and a second point in time when the devices Q126 and Q128 become disabled is referred to as the width of the pulse.
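The transparency window follows directly from the delay chain, and a unit-delay waveform model makes it visible. This Python sketch assumes one sample per inverter delay, the one, four and five inverter delays given above for ClockBar 104, ClockBuf 106 and ClockBarBuf 108, and (as an assumption about the schematic) that Q122, Q124 and Q126 form the pull-up stack and Q128, Q130 and Q132 the pull-down stack of a clocked inverter driving DataBar 114. The helpers `delay`, `invert` and `high_span` are hypothetical.

```python
def delay(signal, n, initial):
    """Shift a sampled waveform right by n samples (n unit inverter delays)."""
    return [initial] * n + signal[: len(signal) - n]


def invert(signal):
    return [1 - s for s in signal]


def high_span(signal):
    """Return (first_high_index, last_high_index + 1), or None if never high."""
    idx = [i for i, s in enumerate(signal) if s]
    return (idx[0], idx[-1] + 1) if idx else None


# Clock 102 sampled once per inverter delay: high from index 4 through 15.
clock = [0] * 4 + [1] * 12 + [0] * 4
clock_bar = delay(invert(clock), 1, 1)      # ClockBar 104: 1 inverter delay
clock_buf = delay(clock, 4, 0)              # ClockBuf 106: 4 inverter delays
clock_bar_buf = delay(invert(clock), 5, 1)  # ClockBarBuf 108: 5 inverter delays

# Assumed pull-down stack Q128 (ClockBarBuf), Q130 (Data), Q132 (Clock):
# NFETs conduct when their gate is 1.
pull_down_open = [c & cbb for c, cbb in zip(clock, clock_bar_buf)]
# Assumed pull-up stack Q122 (ClockBar), Q124 (Data), Q126 (ClockBuf):
# PFETs conduct when their gate is 0.
pull_up_open = [(1 - cb) & (1 - cbf) for cb, cbf in zip(clock_bar, clock_buf)]

print(high_span(pull_down_open), high_span(pull_up_open))  # (4, 9) (5, 8)
```

With Clock 102 rising at index 4, the pull-down stack (which passes rising Data transitions) is open for five inverter delays, and the pull-up stack (which passes falling Data transitions) opens one delay later and stays open for three. Under these assumptions, both transitions reach DataBar 114 only where the two windows overlap.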
- The pre-charge device Q140 (a PFET) is used to pre-charge the dynamic node DataBar 114 when Clock 102 is negated.
- Although the output of the master latch, which is DataBar 114, is pre-charged, each of a late arriving rising data transition and a late arriving falling data transition for Data 110 is included in the critical path of the sequential element 100 when Clock 102 is asserted.
- Typically, semi-dynamic flip-flop circuits remove low-to-high (rising) transitions from the critical path, and consequently, have transparency on only one input data transition.
- In contrast, the sequential element 100 uses semi-dynamic circuitry (e.g., master latch 120 is dynamic while shadow latch 160 is static) with transparency for both input data transitions.
- The master latch 120 also includes a delayed-clock sequential feedback circuit, which includes inverter 142 and tri-state inverter 144.
- Inverter 142 receives DataBar 114.
- Tri-state inverter 144 receives the output of inverter 142 while sending its output to the dynamic node DataBar 114.
- Tri-state inverter 144 receives ClockBuf 106 and ClockBarBuf 108, which are delayed clock signals compared to Clock 102. Therefore, the sequential feedback circuit lags behind the front-end of the master latch 120, which incorporates devices Q122-Q132.
- When Clock 102 transitions, the sequential feedback circuit with logic gates 142-144 is still disabled for the period of time until ClockBuf 106 and ClockBarBuf 108 transition. For example, after a rising transition of Clock 102, the sequential feedback circuit remains disabled until ClockBuf 106 transitions to a logic high level.
- The shadow latch 160 receives the output of the master latch 120 using the gate terminals of the devices Q162 (a PFET) and Q164 (an NFET).
- In addition, the device Q166 (an NFET) receives Clock 102.
- The shadow latch 160 also includes a sequential feedback circuit, which includes inverter 170 and tri-state inverter 172.
- Inverter 170 receives the output on the drain terminals of the devices Q162 and Q164.
- Tri-state inverter 172 receives the output of inverter 170 while sending its output to the input of inverter 170.
- Tri-state inverter 172 receives Clock 102 and ClockBar 104, with Clock 102 being used as an inverted clock to enable the PFETs included in tri-state inverter 172 and ClockBar 104 being used to enable the NFETs included in tri-state inverter 172.
- Similarly, tri-state inverter 174 receives Clock 102 and ClockBar 104, with Clock 102 being used as an inverted clock to enable the PFETs included in tri-state inverter 174 and ClockBar 104 being used to enable the NFETs included in tri-state inverter 174. Therefore, each of tri-state inverters 172 and 174 is disabled when Clock 102 is set at a logic high level and enabled when Clock 102 is set at a logic low level.
- The bypass circuit 150 receives the output of the master latch 120 using the gate terminals of the devices Q152 (a PFET) and Q154 (an NFET).
- In addition, the device Q156 (an NFET) receives Clock 102. Therefore, the bypass circuit 150 is a tri-state inverter with the rising input data transition gated by Clock 102, but with the falling input data transition ungated by Clock 102.
- When Clock 102 is asserted, the path from Data 110 through the master latch 120 and through the bypass circuit 150 to Output 180 reduces the clock-to-output (clk-to-q) delay of the sequential element 100.
- Because tri-state inverter 174 is disabled when Clock 102 is set at a logic high level, the shadow latch 160 and the bypass circuit 150 do not contend with one another.
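The overall behavior of sequential element 100 can be summarized in a cycle-level Python model. This is a sketch under simplifying assumptions, not a circuit description: signals are ideal logic values, the transparency window is approximated as Clock 102 high while ClockBuf 106 is still low (ignoring the one-delay difference between the pull-up and pull-down stacks), and the shadow latch is tracked by the level it drives onto Output 180 rather than by its internal node polarity. `SemiDynamicFlop` and `step` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SemiDynamicFlop:
    """Hypothetical logic-level model of sequential element 100 (FIG. 1)."""
    data_bar: int = 1  # dynamic node DataBar 114, pre-charged high by Q140
    shadow: int = 0    # level the shadow latch 160 will drive onto Output 180
    output: int = 0    # Output 180

    def step(self, clock: int, clock_buf: int, data: int) -> int:
        if not clock:
            # Clock negated: Q140 pre-charges DataBar, the bypass circuit 150 is
            # tri-stated (Q152 off because DataBar is high, Q156 off because
            # Clock is low), and tri-state inverter 174 drives the stored value.
            self.data_bar = 1
            self.output = self.shadow
            return self.output
        if not clock_buf:
            # Window open (approximated as Clock high, ClockBuf low): the front
            # end Q122-Q132 sets DataBar = NOT Data for rising and falling Data.
            self.data_bar = 1 - data
        # Otherwise the keeper (inverter 142, tri-state inverter 144) holds DataBar.
        # Clock asserted: bypass 150 inverts DataBar onto Output; 174 is disabled.
        self.output = 1 - self.data_bar
        # The shadow latch input stage (Q162-Q166) follows DataBar while Clock is high.
        self.shadow = self.output
        return self.output


flop = SemiDynamicFlop()
trace = [
    flop.step(clock=1, clock_buf=0, data=1),  # window open, Data rises
    flop.step(clock=1, clock_buf=0, data=0),  # window open, Data falls
    flop.step(clock=1, clock_buf=0, data=1),  # window open, Data rises again
    flop.step(clock=1, clock_buf=1, data=0),  # window closed: keeper holds
    flop.step(clock=0, clock_buf=1, data=0),  # Clock low: shadow latch drives
    flop.step(clock=0, clock_buf=0, data=0),
]
print(trace)  # [1, 0, 1, 1, 1, 1]
```

In the trace, a rising and then a falling Data transition both pass to Output 180 within one window, which is the transparency on both input data edges named in the title; the value is then held once the window closes and driven by the shadow latch after Clock 102 falls.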
- Turning now to FIG. 2, a generalized block diagram of one embodiment of clock waveforms 200 is shown.
- Signals previously described are numbered identically.
- Clock 102 transitions from a logic low level to a logic high level (a rising transition) at time t1.
- In an embodiment, a clock generator receives Clock 102 and generates ClockBuf 106 through multiple series connected inverters. In other embodiments, other types of logic gates or other circuitry are used to generate ClockBuf 106 from Clock 102.
- ClockBuf 106 transitions from a logic low level to a logic high level (a rising transition) at time t3.
- The signal ClockBuf 106 has a similar delay from Clock 102 for falling transitions.
- For example, Clock 102 has a falling transition at time t6 and ClockBuf 106 has a falling transition at time t7.
- ClockBarBuf 108 is an inverted version of ClockBuf 106.
- Whereas ClockBuf 106 has a rising transition at time t3, ClockBarBuf 108 has a falling transition at time t4. Similarly, ClockBarBuf 108 has a rising transition at time t8.
- In an embodiment, a pulse generator receives Clock 102 as well as one or more inverted versions of Clock 102 and other buffered (delayed) versions of Clock 102.
- For example, the pulse generator combines Clock 102 and one or more of the other received versions of Clock 102 in a Boolean AND gate to generate Pulse 202.
- Whereas Clock 102 has a rising transition at time t1, Pulse 202 has a rising transition at time t2.
- Pulse 202 has a falling transition at time t5.
- Therefore, the pulse width of Pulse 202 is measured between times t2 and t5.
- The pulse width of Pulse 202 is used to determine a transparency window for a sequential element.
- The use of Pulse 202 as an input clock signal provides a better setup time, since Pulse 202 is generated at a later time than Clock 102, but the delay also increases the clock-to-output (clk-to-q) delay of a sequential element.
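The text states only that Pulse 202 is produced by ANDing Clock 102 with other versions of it. One common arrangement, assumed here purely for illustration, ANDs Clock 102 with an inverted copy of itself delayed through an odd-length inverter chain; the pulse then starts one AND-gate delay after Clock 102 (compare t1 and t2 in FIG. 2) and lasts as long as the chain delay. The function `pulse_generator` and its parameters `chain_delay` and `and_delay` are hypothetical.

```python
def delay(signal, n, initial):
    """Shift a sampled waveform right by n unit delays."""
    return [initial] * n + signal[: len(signal) - n]


def high_span(signal):
    """Return (first_high_index, last_high_index + 1), or None if never high."""
    idx = [i for i, s in enumerate(signal) if s]
    return (idx[0], idx[-1] + 1) if idx else None


def pulse_generator(clock, chain_delay=3, and_delay=1):
    """AND Clock with an inverted, delayed copy of itself (assumed topology).

    chain_delay: total delay of the odd-length inverter chain.
    and_delay:   propagation delay of the AND gate itself.
    """
    inverted_delayed = delay([1 - c for c in clock], chain_delay, 1)
    anded = [c & d for c, d in zip(clock, inverted_delayed)]
    return delay(anded, and_delay, 0)


clock = [0] * 4 + [1] * 10 + [0] * 4  # Clock 102 rises at index 4 (t1)
pulse = pulse_generator(clock)
print(high_span(clock), high_span(pulse))  # (4, 14) (5, 8)
```

In this sketch, changing `chain_delay` widens the pulse without moving its start, which is one way the transparency window set by the pulse width could be tuned.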
- In the sequential element 300 of FIG. 3, the master latch 320 receives Data 110 on the gate terminals of devices Q324 (a PFET) and Q330 (an NFET).
- The master latch 320 is a data input stage of the sequential element 300.
- The devices Q326 and Q328 of the master latch 320 use Pulse 202 and the inverted level of Pulse 202, which is PulseBar 304.
- Pulse 202 is received on the gate terminal of device Q328 (an NFET) and PulseBar 304 is received on the gate terminal of device Q326 (a PFET).
- The pulse width of Pulse 202 is used to determine a transparency window for sequential element 300.
- The use of Pulse 202 as an input clock signal provides a better setup time, but the delay of Pulse 202 also increases the clock-to-output (clk-to-q) delay of sequential element 300. Therefore, in some embodiments, sequential element 100 is selected for use in a design when the clock-to-output latency is prioritized over the setup time, but sequential element 300 is selected for use in a design when the setup time is prioritized over the clock-to-output latency.
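The selection rule above can be made concrete with a slack calculation. Every flop sits between two stages: its setup time is charged to the stage feeding it, and its clock-to-output delay to the stage it drives, so the better choice is the one that leaves more worst-case slack. The Python sketch below uses invented timing numbers, chosen only so that element 300 has the smaller setup time and element 100 the smaller clock-to-output delay, as described; `FlopTiming`, `worst_slack_ps` and `pick_flop` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlopTiming:
    name: str
    setup_ps: float     # hypothetical setup time
    clk_to_q_ps: float  # hypothetical clock-to-output delay


def worst_slack_ps(flop, period_ps, upstream_arrival_ps,
                   downstream_logic_ps, downstream_setup_ps):
    """Worst slack of the two pipeline stages that share `flop`.

    upstream_arrival_ps: launching flop's clk-to-q plus upstream logic delay.
    """
    # Stage feeding the flop: data must arrive one setup time before the edge.
    capture_slack = period_ps - (upstream_arrival_ps + flop.setup_ps)
    # Stage driven by the flop: its clk-to-q adds to the downstream path.
    launch_slack = period_ps - (flop.clk_to_q_ps + downstream_logic_ps
                                + downstream_setup_ps)
    return min(capture_slack, launch_slack)


def pick_flop(candidates, **path):
    """Choose the candidate that maximizes the worst-case slack."""
    return max(candidates, key=lambda f: worst_slack_ps(f, **path))


# Invented numbers, for illustrating the tradeoff direction only.
element_100 = FlopTiming("element 100", setup_ps=20, clk_to_q_ps=40)
element_300 = FlopTiming("element 300", setup_ps=5, clk_to_q_ps=55)
flops = [element_100, element_300]

upstream_critical = pick_flop(flops, period_ps=500, upstream_arrival_ps=490,
                              downstream_logic_ps=300, downstream_setup_ps=20)
downstream_critical = pick_flop(flops, period_ps=500, upstream_arrival_ps=300,
                                downstream_logic_ps=430, downstream_setup_ps=20)
print(upstream_critical.name, "|", downstream_critical.name)
# element 300 | element 100
```

When the stage feeding the flop is tight, the smaller setup time wins; when the stage it drives is tight, the smaller clock-to-output delay wins.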
- Referring now to FIG. 4, a generalized flow diagram of one embodiment of a method 400 for efficiently storing and driving data between pipeline stages is shown.
- For purposes of discussion, the steps in this embodiment are shown in sequential order. However, in other embodiments some steps may occur in a different order than shown, some steps may be performed concurrently, some steps may be combined with other steps, and some steps may be absent.
- Data is received at a master latch included in a flip-flop circuit (block 402).
- In some embodiments, the master latch uses a combination of a clock signal and one or more delayed versions of the clock signal to determine when the master latch is transparent.
- In other embodiments, the master latch uses a pulse signal and one or more delayed versions of the pulse signal to determine when the master latch is transparent. If a received clock signal is negated (de-asserted) (“no” branch of the conditional block 404), then outputs from the master latch and a bypass circuit are prevented from being sent (block 406). For example, tri-state inverters or other circuitry are used to prevent sending the outputs.
- A value of the data sent from the master latch is stored when the clock signal becomes negated (block 408).
- In some embodiments, a sequential feedback circuit latches the data value.
- In an embodiment, the sequential feedback circuit uses a tri-state inverter that receives delayed clock inputs.
- The stored value of the data is sent from a shadow latch as the output of the flip-flop circuit (block 410).
- For example, the shadow latch receives the stored value of the master latch, an input stage of the shadow latch closes, and a back-end of the shadow latch becomes open (transparent) when the clock signal is negated.
- In an embodiment, the back-end of the shadow latch uses a sequential feedback circuit with a tri-state inverter that receives inverted values of the clock signal.
- The output of the master latch is pre-charged while the clock signal is negated (block 412). Therefore, the flip-flop circuit is a semi-dynamic flip-flop circuit, since the master latch uses dynamic logic while the shadow latch uses static logic. Typically, semi-dynamic flip-flop circuits remove low-to-high (rising) transitions from the critical path, and consequently, have transparency on only one input data transition. Here, the flip-flop circuit uses semi-dynamic circuitry with transparency for both input data transitions.
- The pre-charged output is sent to each of the shadow latch and the bypass circuit (block 414).
- If a received clock signal is asserted (“yes” branch of the conditional block 404), then the output from the shadow latch is prevented from being sent as the output of the flip-flop circuit (block 416).
- Pre-charging of the output of the master latch is prevented (block 418).
- The received data from the master latch is sent to each of the shadow latch and the bypass circuit (block 420).
- The received data is sent from the bypass circuit as the output of the flip-flop circuit (block 422).
- When the clock signal is asserted, the path from the output of the master latch through the bypass circuit to the output of the flip-flop circuit reduces the clock-to-output delay of the flip-flop circuit.
- In the embodiment of FIG. 5, sequential element 500 receives Pulse 202 and PulseBar 304 without receiving source Clock 102.
- The data input stage 520 of the sequential element 500 uses Pulse 202 and the inverted level of Pulse 202, which is PulseBar 304.
- Pulse 202 is received on the gate terminal of device Q530 (an NFET) and PulseBar 304 is received on the gate terminal of device Q524 (a PFET).
- The pulse width of Pulse 202 is used to determine a transparency window for sequential element 500.
- The data input stage 520 receives Data 110 on the gate terminals of devices Q526 (a PFET) and Q528 (an NFET).
- Unlike the master latch 120, the data input stage 520 does not use dynamic logic, so there is no pre-charged node.
- The output of the data input stage 520, which is DataBar 514, is received by shadow latch 560.
- In an embodiment, shadow latch 560 uses a sequential feedback circuit with inverter 562 and tri-state inverter 564.
- Tri-state inverter 564 receives both Pulse 202 and PulseBar 304. When Pulse 202 is asserted with a logic high level, the tri-state inverter 564 does not drive a voltage level on its output, which is DataBar 514.
- When Pulse 202 is negated, the tri-state inverter 564 drives the inverted level of DataBuf 566 on its output DataBar 514.
- The final tri-state stage in shadow latch 560 receives Pulse 202 on the gate terminal of Q572 (a PFET) and PulseBar 304 on the gate terminal of Q574 (an NFET).
- The voltage level of DataBuf 566 is received on the gate terminal of Q570 (a PFET) and on the gate terminal of Q576 (an NFET).
- Unlike the bypass circuit 150, the bypass circuit 550 directly receives Data 110.
- Data 110 is received on the gate terminal of Q554 (a PFET) and on the gate terminal of Q556 (an NFET).
- The bypass circuit 550 receives Pulse 202 on the gate terminal of Q558 (an NFET) and PulseBar 304 on the gate terminal of Q552 (a PFET).
- When Pulse 202 is asserted, the path from Data 110 through the bypass circuit 550 (skipping the data input stage 520) to Output 180 reduces the clock-to-output delay of sequential element 500 while also providing a smaller setup time.
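Sequential element 500 can be sketched the same way at the logic level. The Python model below is an assumption-laden illustration: Pulse 202 is treated as an ideal signal, and only the paths described above are modeled. Reading the device connections literally, both the bypass circuit 550 and the final tri-state stage drive the complement of Data 110 onto Output 180 (DataBuf 566 is the inverse of DataBar 514, and the final stage inverts DataBuf 566), so the model keeps that polarity. `PulseFlop` and `step` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PulseFlop:
    """Hypothetical logic-level model of sequential element 500 (FIG. 5)."""
    data_bar: int = 1  # DataBar 514, held by inverter 562 / tri-state inverter 564
    output: int = 0    # Output 180

    def step(self, pulse: int, data: int) -> int:
        if pulse:
            # Data input stage 520 (Q524-Q530) is transparent; 564 is off.
            self.data_bar = 1 - data
            # Bypass 550 (Q552-Q558) inverts Data directly onto Output 180,
            # skipping stage 520; the final stage (Q570-Q576) is off.
            self.output = 1 - data
        else:
            # Stage 520 and bypass 550 are off; the feedback loop holds DataBar,
            # and the final stage drives NOT DataBuf 566, which equals DataBar.
            self.output = self.data_bar
        return self.output


flop = PulseFlop()
trace = [flop.step(pulse=p, data=d)
         for p, d in [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]]
print(trace)  # [0, 1, 1, 1, 0, 0]
```

Because the bypass circuit 550 takes Data 110 directly, the output follows Data during the pulse without waiting for the data input stage 520, and the shadow latch 560 takes over once Pulse 202 falls.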
- Referring now to FIG. 6, a generalized flow diagram of one embodiment of a method 600 for efficiently storing and driving data between pipeline stages is shown.
- Data is received at a master latch included in a flip-flop circuit (block 602).
- Data is received at a bypass circuit included in the flip-flop circuit (block 604). It is noted that the bypass circuit does not receive an output of the master latch, but rather, directly receives the input data signal.
- If a received pulse signal is negated (de-asserted) (“no” branch of the conditional block 606), then outputs from the master latch and the bypass circuit are prevented from being sent (block 608).
- For example, tri-state inverters or other circuitry are used to prevent sending the outputs.
- A value of the data sent from the master latch is stored when the pulse signal becomes negated (block 610).
- In some embodiments, a feedback circuit latches the data value.
- In an embodiment, the feedback circuit uses a tri-state inverter that receives no clock inputs; in FIG. 5, for example, tri-state inverter 564 receives Pulse 202 and PulseBar 304.
- The stored value of the data is sent from a shadow latch included in the flip-flop circuit as the output of the flip-flop circuit (block 612).
- If the received pulse signal is asserted (“yes” branch of the conditional block 606), then the output from the shadow latch is prevented from being sent as the output of the flip-flop circuit (block 614).
- The received data from the master latch is sent to the shadow latch (block 616).
- The received data is sent from the bypass circuit as the output of the flip-flop circuit (block 618).
- In various embodiments, the bypass circuit is a tri-state inverter with each of the rising input data transition and the falling input data transition gated by a pulse signal. When the received pulse signal is asserted, the path from the input data signal through the bypass circuit (skipping the master latch) to the output of the flip-flop circuit reduces the clock-to-output delay of the flip-flop circuit.
- Referring to FIG. 7, system 700 represents chip, circuitry, components, etc., of a desktop computer 710, laptop computer 720, tablet computer 730, cell or mobile phone 740, television 750 (or set top box coupled to a television), wrist watch or other wearable item 760, or otherwise.
- The system 700 includes at least one instance of a system on chip (SoC) 706, which includes multiple types of processing units, such as a central processing unit (CPU), a graphics processing unit (GPU), or other, a communication fabric, and interfaces to memories and input/output devices.
- One or more processors in SoC 706 include multiple sequential elements between processor pipeline stages similar to sequential elements 100, 300 and 500 as illustrated in FIG. 1, FIG. 3 and FIG. 5.
- SoC 706 is coupled to external memory 702, peripherals 704, and power supply 708.
- A power supply 708 is also provided, which supplies the supply voltages to SoC 706 as well as one or more supply voltages to the memory 702 and/or the peripherals 704.
- In an embodiment, power supply 708 represents a battery (e.g., a rechargeable battery in a smart phone, laptop or tablet computer).
- In some embodiments, more than one instance of SoC 706 is included (and more than one external memory 702 is included as well).
- The memory 702 is any type of memory, such as dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM (including mobile versions of the SDRAMs such as mDDR3, etc., and/or low power versions of the SDRAMs such as LPDDR2, etc.), RAMBUS DRAM (RDRAM), static RAM (SRAM), etc.
- One or more memory devices are coupled onto a circuit board to form memory modules such as single inline memory modules (SIMMs), dual inline memory modules (DIMMs), etc.
- Alternatively, the devices are mounted with a SoC or an integrated circuit in a chip-on-chip configuration, a package-on-package configuration, or a multi-chip module configuration.
- The peripherals 704 include any desired circuitry, depending on the type of system 700.
- For example, in an embodiment, peripherals 704 include devices for various types of wireless communication, such as Wi-Fi, Bluetooth, cellular, global positioning system, etc.
- In some embodiments, the peripherals 704 also include additional storage, including RAM storage, solid state storage, or disk storage.
- The peripherals 704 include user interface devices such as a display screen, including touch display screens or multitouch display screens, keyboard or other input devices, microphones, speakers, etc.
- In various embodiments, program instructions of a software application may be used to implement the methods and/or mechanisms previously described.
- The program instructions may describe the behavior of hardware in a high-level programming language, such as C.
- Alternatively, a hardware design language (HDL) may be used.
- The program instructions may be stored on a non-transitory computer readable storage medium. Numerous types of storage media are available. The storage medium may be accessible by a computer during use to provide the program instructions and accompanying data to the computer for program execution.
- In some embodiments, a synthesis tool reads the program instructions in order to produce a netlist including a list of gates from a synthesis library.
Landscapes
- Engineering & Computer Science (AREA)
- Power Engineering (AREA)
- Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Logic Circuits (AREA)
- Pulse Circuits (AREA)
Abstract
A system and method for efficiently storing and driving data between pipeline stages. In various embodiments, a flip-flop circuit includes a bypass circuit, which is a tri-state inverter, and the bypass circuit receives a clock signal and a version of a data signal. When the clock signal received by the flip-flop circuit is asserted, the output of the bypass circuit is sent as the output of the flip-flop circuit. In one example, the version of the data signal received by the bypass circuit is the data signal. In another example, the version of the data signal received by the bypass circuit is the output of a master latch. Although the output of the master latch is pre-charged, each of a late arriving rising data transition and a late arriving falling data transition is included in the critical path of the flip-flop circuit when the clock is asserted.
Description
- Embodiments described herein relate to the field of computing systems and, more particularly, to efficiently storing and driving data between pipeline stages.
- Sequential elements are used for storing and driving data in a variety of circuits such as general-purpose central processing unit (CPU), data parallel processors like graphics processing units (GPUs), digital signal processors (DSPs), and so forth. Modern processors are typically pipelined. For example, the processors include one or more data processing stages connected in series with sequential elements placed between the stages for storing and driving the data. The output of one stage is made the input of the next stage during each transition of a clock signal. The sequential elements typically are flip-flop circuits.
- A processor's performance depends at least upon the operating frequency of a clock signal. The duration of a clock cycle period corresponding to the operating frequency is determined by the amount of time required for processing of data between the flip-flop circuits. The clock cycle period increases based at least upon the setup time and the clock-to-output delay of the flip-flop circuit. A variety of versions of flip-flop circuits are designed for different end purposes. Flip-flop circuits designed for low latency typically favor one input data transition, either the rising or the falling edge transition, while eliminating the other transition from the critical path. The tradeoff is that design flexibility is reduced while the setup overhead still exists.
- In view of the above, methods and mechanisms for efficiently storing and driving data between pipeline stages are desired.
- Systems and methods for efficiently storing and driving data between pipeline stages are contemplated. In various embodiments, a flip-flop circuit used between pipeline stages of a processor includes a master latch, which receives a clock signal and a data signal, and a shadow latch, which receives an output of the master latch and a clock signal. In addition, the flip-flop circuit includes a bypass circuit capable of receiving the clock signal and a version of the data signal. When the clock signal is asserted, the output of the master latch is sent to the shadow latch, and the output of the shadow latch is prevented from being sent as an output of the flip-flop circuit. Rather, the output of the bypass circuit is sent as the output of the flip-flop circuit. In various embodiments, the bypass circuit is a tri-state inverter used to reduce the clock-to-output delay of the flip-flop circuit.
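The rule that the shadow latch and the bypass circuit never drive the flip-flop output at the same time can be sketched with a simple tri-state resolution model. This is an abstraction built on several assumptions. Logic levels are 0 and 1, and `None` stands for a tri-stated (high-impedance) driver. The function names are hypothetical and are not part of the disclosure.

```python
from typing import Optional

# None stands for high impedance: the driver sets no level on the node.
HIGH_Z = None


def bypass_drive(clock: int, bypass_value: int) -> Optional[int]:
    """The bypass circuit drives the output only while the clock is asserted."""
    return bypass_value if clock else HIGH_Z


def shadow_drive(clock: int, stored_value: int) -> Optional[int]:
    """The shadow latch output driver is tri-stated while the clock is asserted."""
    return HIGH_Z if clock else stored_value


def resolve(*drivers: Optional[int]) -> int:
    """Resolve a shared output node that must have exactly one active driver."""
    active = [d for d in drivers if d is not HIGH_Z]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active driver, got {len(active)}")
    return active[0]


def flop_output(clock: int, bypass_value: int, stored_value: int) -> int:
    return resolve(bypass_drive(clock, bypass_value), shadow_drive(clock, stored_value))


print(flop_output(1, bypass_value=0, stored_value=1))  # 0: bypass drives
print(flop_output(0, bypass_value=0, stored_value=1))  # 1: shadow latch drives
```

Because the two drivers are enabled by opposite clock levels, `resolve` never sees contention for any clock value. Calling it directly with two active drivers raises an error.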
- In some embodiments, the version of the data signal received by the bypass circuit is the data signal. Therefore, the data signal is received by each of the master latch and the bypass circuit. In an embodiment, the clock signal is a pulse signal generated from a source clock signal, and each of the master latch, the shadow latch and the bypass circuit receives the pulse signal.
- In some embodiments, the version of the data signal received by the bypass circuit is the output of the master latch. In an embodiment, the output of the master latch is pre-charged to a Boolean logic high level when the clock is negated. Although the output of the master latch is pre-charged, when the clock is asserted, both a late-arriving rising data transition and a late-arriving falling data transition are included in the critical path of the flip-flop circuit.
- In some embodiments, the bypass circuit is a tri-state inverter with the rising input data transition gated by the clock signal, but with the falling input data transition remaining ungated by the clock signal. When the clock signal is asserted, the path from the output of the master latch through the bypass circuit to the output of the flip-flop circuit reduces the clock-to-output delay of the flip-flop circuit. A sequential feedback circuit in the master latch receives a delayed version of the clock signal while a sequential feedback circuit in the shadow latch receives the clock signal.
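The asymmetric gating of the bypass tri-state inverter can be illustrated at the switch level. The sketch assumes the device arrangement described for FIG. 1. PFET Q152 sits in the pull-up path. NFETs Q154 and Q156 sit in series in the pull-down path, with Q156 driven by the clock. Transistors are treated as ideal switches with no timing, and `bypass_150` is a hypothetical name. The model shows why the pre-charged master-latch output is enough to tri-state the circuit while the clock is negated, even though the pull-up path contains no clock device.

```python
from typing import Optional


def bypass_150(clock: int, data_bar: int) -> Optional[int]:
    """Switch-level sketch of bypass circuit 150 (ideal switches, no timing).

    Pull-up: PFET Q152 with its gate on DataBar; no clock device in series.
    Pull-down: NFET Q154 (gate on DataBar) in series with NFET Q156 (gate on Clock).
    Returns the driven level, or None when the output is tri-stated.
    """
    pull_up_on = data_bar == 0                   # falling DataBar edge: not clock-gated
    pull_down_on = data_bar == 1 and clock == 1  # rising DataBar edge: gated by the clock
    if pull_up_on:
        return 1
    if pull_down_on:
        return 0
    return None


# Clock negated: DataBar is pre-charged high, so both networks are off.
print(bypass_150(clock=0, data_bar=1))  # None
# Clock asserted: the circuit behaves as an inverter of DataBar.
print(bypass_150(clock=1, data_bar=1))  # 0
print(bypass_150(clock=1, data_bar=0))  # 1
```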
- These and other embodiments will be further appreciated upon reference to the following description and drawings.
- The above and further advantages of the methods and mechanisms may be better understood by referring to the following description in conjunction with the accompanying drawings, in which:
- FIG. 1 is a block diagram of one embodiment of a sequential element.
- FIG. 2 is a block diagram of one embodiment of clock waveforms.
- FIG. 3 is a block diagram of another embodiment of a sequential element.
- FIG. 4 is a flow diagram of one embodiment of a method for efficiently storing and driving data between pipeline stages.
- FIG. 5 is a block diagram of another embodiment of a sequential element.
- FIG. 6 is a flow diagram of one embodiment of a method for efficiently storing and driving data between pipeline stages.
- FIG. 7 is a block diagram of one embodiment of a system.
- While the embodiments described in this disclosure may be susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the appended claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.
- Various units, circuits, or other components may be described as “configured to” perform a task or tasks. In such contexts, “configured to” is a broad recitation of structure generally meaning “having circuitry that” performs the task or tasks during operation. As such, the unit/circuit/component can be configured to perform the task even when the unit/circuit/component is not currently on. In general, the circuitry that forms the structure corresponding to “configured to” may include hardware circuits. Similarly, various units/circuits/components may be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase “configured to.” Reciting a unit/circuit/component that is configured to perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) for that unit/circuit/component.
- In the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments described in this disclosure. However, one having ordinary skill in the art should recognize that the embodiments might be practiced without these specific details. In some instances, well-known circuits, structures, and techniques have not been shown in detail for ease of illustration and to avoid obscuring the description of the embodiments.
- Referring to FIG. 1, a generalized block diagram of one embodiment of a sequential element 100 is shown. In the illustrated embodiment, sequential element 100 includes a master latch 120, a shadow latch 160 and a bypass circuit 150. As shown, sequential element 100 receives a data signal 110 (or Data 110), a clock signal 102 (or Clock 102), an inverted level of Clock 102, which is ClockBar 104, a delayed (and buffered) version of Clock 102, which is ClockBuf 106, and an inverted level of ClockBuf 106, which is ClockBarBuf 108. With the received inputs, sequential element 100 generates Output 180. In various embodiments, sequential element 100 is used between pipeline stages of a processor. In some embodiments, sequential element 100 is a flip-flop circuit.
- As used herein, a “shadow latch” refers to a latch with a tri-state output. Each of the shadow latch 160 and the bypass circuit 150 receives the output of the master latch 120, which is DataBar 114. The master latch 120 is a data input stage of the sequential element 100. As shown, only one of the shadow latch 160 and the bypass circuit 150 sets a voltage level on Output 180 at any time. When Clock 102 is negated, the shadow latch 160 sets the voltage level on Output 180 and the bypass circuit 150 is disabled. For example, when Clock 102 is negated with a logic low level, the pre-charge device Q140 (a PFET) pre-charges the dynamic node DataBar 114 to a logic high level, which disables device Q152 (a PFET) in the bypass circuit 150. The device Q156 (an NFET) is disabled by Clock 102. Therefore, the bypass circuit 150 does not set a voltage level on its output, which is Output 180, and the bypass circuit 150 is considered disabled, or tri-stated. When Clock 102 is asserted, the shadow latch 160 is prevented from sending an output to Output 180. In addition, when Clock 102 is asserted, the bypass circuit 150 sends an inverted level of DataBar 114 to Output 180.
- When Clock 102 is asserted with a logic high level, the path from the output of the master latch, which is DataBar 114, through the bypass circuit 150 to Output 180 reduces the clock-to-output (clk-to-q) delay of the sequential element 100. Before the components included in the master latch 120, the shadow latch 160 and the bypass circuit 150 are described in further detail, the terms used to describe them are defined next.
- As used herein, a “device” refers to a resistor, a transistor, or other suitable type of transconductance device coupled between a circuit node and either a power node or a ground node. In addition, as used herein, a “logic low level,” a “logic 0 value,” or a “Boolean logic low level” corresponds to a voltage level sufficiently low to enable a p-type metal oxide semiconductor (MOS) field effect transistor (FET), which is also referred to simply as a “PFET.” Similarly, a “logic high level,” a “logic 1 value,” or a “Boolean logic high level” corresponds to a voltage level sufficiently high to enable an n-type metal oxide semiconductor (MOS) field effect transistor (FET), which is also referred to simply as an “NFET.” In various other embodiments, different technologies, including technologies other than complementary metal-oxide semiconductor (CMOS), result in different voltage levels for “low” and “high.” As used herein, a signal is considered “asserted” when the signal has a particular voltage level used for enabling combinatorial logic or devices. A signal is considered “de-asserted” or “negated” when the signal has a particular voltage level used for disabling combinatorial logic or devices.
- In the illustrated embodiment, the master latch 120 receives Data 110 on the gate terminals of devices Q124 (a PFET) and Q130 (an NFET). The clock signals Clock 102 and its inverted version ClockBar 104 are received on the gate terminals of Q132 and Q122, respectively. External circuitry generates delayed versions of Clock 102 and ClockBar 104, such as ClockBuf 106 and ClockBarBuf 108. In an embodiment, a delayed clock generator includes a number of series-connected inverters for receiving Clock 102 and generating ClockBar 104, ClockBuf 106 and ClockBarBuf 108. In one embodiment, ClockBar 104 has a one-inverter delay from Clock 102, ClockBuf 106 has a four-inverter delay from Clock 102, and ClockBarBuf 108 has a five-inverter delay from Clock 102. In other embodiments, other numbers and types of logic gates are used to generate clock signals from Clock 102.
- Combining the clock signals 102-108 using the devices Q122, Q126, Q128 and Q132 as shown creates a window for transmitting Data 110 into the master latch 120 that is shorter than the asserted phase of Clock 102. This shorter duration of time is referred to as a “pulse,” and it determines a “transparency window” for the sequential element 100. Sequential element 100 is described as being “transparent” or “open” when data can be transmitted from Data 110 through the master latch 120 to DataBar 114, which is received by each of the shadow latch 160 and the bypass circuit 150. Sequential element 100 is described as being “opaque” or “closed” when data cannot be transmitted from Data 110 through the master latch 120 to DataBar 114.
- Clock 102, ClockBuf 106 and a pulse are shown in FIG. 2 and described later. For example, when Clock 102 is asserted, or set at a logic high level to enable Q132, ClockBar 104 becomes negated a relatively short time later, such as the delay of one inverter. Since ClockBuf 106 is still at a logic low level due to the previous level of Clock 102, ClockBarBuf 108 is still at a logic high level. Accordingly, the devices Q126 and Q128 are enabled, and an inverted level of Data 110 can be set on DataBar 114. Sequential element 100 is considered to be open. Once the logic high level of Clock 102 propagates through multiple logic gate delays in the external clock generator, ClockBuf 106 transitions from the logic low level to the logic high level, which disables Q126 (a PFET). Similarly, ClockBarBuf 108 transitions from the logic high level to the logic low level, which disables Q128 (an NFET). The time from the point at which Clock 102 transitions to the logic high level to the point at which the devices Q126 and Q128 become disabled is referred to as the width of the pulse.
- As shown, the pre-charge device Q140 (a PFET) is used to pre-charge the dynamic node DataBar 114 when Clock 102 is negated. Although the output of the master latch, which is DataBar 114, is pre-charged, when Clock 102 is again asserted, both a late-arriving rising data transition and a late-arriving falling data transition for Data 110 are included in the critical path of the sequential element 100. Typically, semi-dynamic flip-flop circuits remove low-to-high (rising) transitions from the critical path and, consequently, have transparency on only one input data transition. Here, the sequential element 100 uses semi-dynamic circuitry (e.g., master latch 120 is dynamic while shadow latch 160 is static) with transparency for both input data transitions.
- The master latch 120 also includes a delayed-clock sequential feedback circuit, which includes inverter 142 and tri-state inverter 144. Inverter 142 receives DataBar 114. Tri-state inverter 144 receives the output of inverter 142 and sends its output to the dynamic node DataBar 114. In the illustrated embodiment, tri-state inverter 144 receives ClockBuf 106 and ClockBarBuf 108, which are delayed relative to Clock 102. Therefore, the sequential feedback circuit lags behind the front end of the master latch 120, which incorporates devices Q122-Q132. For example, when the inverted version of Data 110 initially transfers to DataBar 114, the sequential feedback circuit with logic gates 142-144 is still disabled until ClockBuf 106 and ClockBarBuf 108 transition following the transition of Clock 102. That is, when Clock 102 transitions to a logic high value to enable device Q132, the sequential feedback circuit with logic gates 142-144 remains disabled until ClockBuf 106 transitions to a logic high level.
- The shadow latch 160 receives the output of the master latch 120 on the gate terminals of the devices Q162 (a PFET) and Q164 (an NFET). The device Q166 (an NFET) receives Clock 102. The shadow latch 160 also includes a sequential feedback circuit, which includes inverter 170 and tri-state inverter 172. Inverter 170 receives the output on the drain terminals of the devices Q162 and Q164. Tri-state inverter 172 receives the output of inverter 170 and sends its output to the input of inverter 170. In the illustrated embodiment, tri-state inverter 172 receives Clock 102 and ClockBar 104, with Clock 102 being used as an inverted clock to enable PFETs included in tri-state inverter 172 and ClockBar 104 being used to enable NFETs included in tri-state inverter 172.
- The output of inverter 170 is received by tri-state inverter 174. Similar to tri-state inverter 172, tri-state inverter 174 receives Clock 102 and ClockBar 104, with Clock 102 being used as an inverted clock to enable PFETs included in tri-state inverter 174 and ClockBar 104 being used to enable NFETs included in tri-state inverter 174. Therefore, each of tri-state inverters 172 and 174 is disabled when Clock 102 is set at a logic high level, and each of tri-state inverters 172 and 174 is enabled when Clock 102 is set at a logic low level.
- Similar to the input stage of the shadow latch 160, the bypass circuit 150 receives the output of the master latch 120 on the gate terminals of the devices Q152 (a PFET) and Q154 (an NFET). The device Q156 (an NFET) receives Clock 102. Therefore, the bypass circuit 150 is a tri-state inverter with the rising input data transition gated by Clock 102 and the falling input data transition ungated by Clock 102. As described earlier, when Clock 102 is set at a logic high level, the path from Data 110 through the master latch 120 and through the bypass circuit 150 to Output 180 reduces the clock-to-output (clk-to-q) delay of the sequential element 100. Because tri-state inverter 174 is disabled when Clock 102 is set at a logic high level, the shadow latch 160 and the bypass circuit 150 do not contend with one another.
- Referring to FIG. 2, a generalized block diagram of one embodiment of clock waveforms 200 is shown. In the illustrated embodiment, signals previously described are numbered identically. As shown, Clock 102 transitions from a logic low level to a logic high level (a rising transition) at time t1. As described earlier, a clock generator receives Clock 102 and generates ClockBuf 106 through multiple series-connected inverters. In other embodiments, other types of logic gates or other circuitry are used to generate ClockBuf 106 from Clock 102.
- As shown, ClockBuf 106 transitions from a logic low level to a logic high level (a rising transition) at time t3. ClockBuf 106 has a similar delay from Clock 102 for falling transitions: Clock 102 has a falling transition at time t6, and ClockBuf 106 has a falling transition at time t7. ClockBarBuf 108 is an inverted version of ClockBuf 106. When ClockBuf 106 has a rising transition at time t3, ClockBarBuf 108 has a falling transition at time t4, and when ClockBuf 106 has a falling transition at time t7, ClockBarBuf 108 has a rising transition at time t8. In some embodiments, a pulse generator receives Clock 102 as well as one or more inverted versions of Clock 102 and other buffered (delayed) versions of Clock 102.
- In an embodiment, the pulse generator combines Clock 102 and one or more of the other received versions of Clock 102 in a Boolean AND gate to generate Pulse 202. As shown, when Clock 102 has a rising transition at time t1, Pulse 202 has a rising transition at time t2 and a falling transition at time t5. The pulse width of Pulse 202 is measured between times t2 and t5, and it determines a transparency window for a sequential element. Using Pulse 202 as an input clock signal provides a better setup time, since Pulse 202 is generated at a later time, but the same delay also increases the clock-to-output (clk-to-q) delay of a sequential element.
- Referring to FIG. 3, a generalized block diagram of another embodiment of a sequential element 300 is shown. Circuitry, logic and signals described earlier are numbered identically. In the illustrated embodiment, the master latch 320 receives Data 110 on the gate terminals of devices Q324 (a PFET) and Q330 (an NFET). The master latch 320 is a data input stage of the sequential element 300. Rather than receive Clock 102, the devices Q326 and Q328 of the master latch 320 use Pulse 202 and the inverted level of Pulse 202, which is PulseBar 304. Pulse 202 is received on the gate terminal of device Q328 (an NFET), and PulseBar 304 is received on the gate terminal of device Q326 (a PFET). The pulse width of Pulse 202 determines a transparency window for sequential element 300. Using Pulse 202 as an input clock signal provides a better setup time, but the delay of Pulse 202 also increases the clock-to-output (clk-to-q) delay of sequential element 300. Therefore, in some embodiments, sequential element 100 is selected for use in a design when clock-to-output latency is prioritized over setup time, and sequential element 300 is selected when setup time is prioritized over clock-to-output latency.
- Referring now to FIG. 4, a generalized flow diagram of one embodiment of a method 400 for efficiently storing and driving data between pipeline stages is shown. For purposes of discussion, the steps in this embodiment (as well as for FIG. 6) are shown in sequential order. However, in other embodiments some steps may occur in a different order than shown, some steps may be performed concurrently, some steps may be combined with other steps, and some steps may be absent.
- Data is received at a master latch included in a flip-flop circuit (block 402). In some embodiments, the master latch uses a combination of a clock signal and one or more delayed versions of the clock signal to determine when the master latch is transparent. In other embodiments, the master latch uses a pulse signal and one or more delayed versions of the pulse signal to determine when the master latch is transparent. If a received clock signal is negated (de-asserted) (“no” branch of the conditional block 404), then outputs from the master latch and a bypass circuit are prevented from being sent (block 406). For example, tri-state inverters or other circuitry are used to prevent sending the outputs.
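The master latch combines a clock signal with delayed versions of it to set its transparency window, and a sampled-waveform sketch can show how. The sketch uses the delays given for one embodiment of FIG. 1: ClockBar 104 is one inverter, ClockBuf 106 four inverters and ClockBarBuf 108 five inverters after Clock 102. It also assumes that Q122 and Q126 are series PFETs in the pull-up stack and that Q128 and Q132 are series NFETs in the pull-down stack. That arrangement is consistent with the device descriptions but is not spelled out by them. One sample equals one inverter delay, and all function names are hypothetical. Under these idealized assumptions, the pull-down stack conducts for five inverter delays and the pull-up stack for three. Neither stack conducts after the falling clock edge.

```python
from typing import List, Tuple

Wave = List[int]  # one sample per inverter delay


def delay(wave: Wave, gates: int) -> Wave:
    """Delay a sampled waveform by `gates` inverter delays (initial level held)."""
    if gates == 0:
        return list(wave)
    return [wave[0]] * gates + wave[:-gates]


def invert(wave: Wave) -> Wave:
    return [1 - v for v in wave]


def transparency_windows(clock: Wave) -> Tuple[int, int]:
    """Return how many samples the pull-down and pull-up stacks conduct."""
    clock_bar = invert(delay(clock, 1))      # one inverter after Clock
    clock_buf = delay(clock, 4)              # four inverters: same polarity
    clock_bar_buf = invert(delay(clock, 5))  # five inverters: inverted
    # Assumed NFET stack: Q132 (Clock) and Q128 (ClockBarBuf) both high.
    pull_down = [c & cbb for c, cbb in zip(clock, clock_bar_buf)]
    # Assumed PFET stack: Q122 (ClockBar) and Q126 (ClockBuf) both low.
    pull_up = [int(cb == 0 and cf == 0) for cb, cf in zip(clock_bar, clock_buf)]
    return sum(pull_down), sum(pull_up)


clock = [0] * 3 + [1] * 10 + [0] * 10  # one rising and one falling edge
print(transparency_windows(clock))     # (5, 3)
```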
- A value of the data sent from the master latch is stored when the clock signal becomes negated (block 408). For example, a sequential feedback circuit latches the data value. In some embodiments, the sequential feedback circuit uses a tri-state inverter that receives delayed clock inputs. The stored value of the data is sent from a shadow latch as the output of the flip-flop circuit (block 410). For example, the shadow latch receives the stored value of the master latch, an input stage of the shadow latch closes, and the back end of the shadow latch becomes open (transparent) when the clock signal is negated. In some embodiments, the back end of the shadow latch uses a sequential feedback circuit with a tri-state inverter that receives inverted values of the clock signal.
- The output of the master latch is pre-charged while the clock signal is negated (block 412). Therefore, the flip-flop circuit is a semi-dynamic flip-flop circuit, since the master latch uses dynamic logic while the shadow latch uses static logic. Typically, semi-dynamic flip-flop circuits remove low-to-high (rising) transitions from the critical path and, consequently, have transparency on only one input data transition. Here, the flip-flop circuit uses semi-dynamic circuitry with transparency for both input data transitions. The pre-charged output is sent to each of the shadow latch and the bypass circuit (block 414).
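The following sketch contrasts a conventional semi-dynamic front end with one that is transparent on both data transitions. It is an idealized, hypothetical model and not the circuit itself. Setting `has_pull_up=False` stands for an evaluate-only dynamic node with no pull-up stack, and the keeper is reduced to holding the last value once the window closes. When the data falls late but still inside the window, only the node with a pull-up stack recovers to the level that matches the new data (DataBar = 1).

```python
from typing import List


def databar_trace(clock: List[int], window: List[int], data: List[int],
                  has_pull_up: bool = True) -> List[int]:
    """Idealized sample-by-sample sketch of a pre-charged master-latch node.

    clock:  1 while the clock is asserted
    window: 1 while the front-end clock devices conduct (transparency window)
    data:   sampled data input
    has_pull_up: False models an evaluate-only node with no pull-up stack
    """
    node = 1
    trace: List[int] = []
    for clk, win, d in zip(clock, window, data):
        if not clk:
            node = 1  # pre-charge device on while the clock is negated
        elif win:
            if d == 1:
                node = 0  # pull-down stack discharges the node
            elif has_pull_up:
                node = 1  # pull-up stack restores the node after a late falling data edge
            # evaluate-only node: once discharged it cannot recover in this phase
        # window closed: a keeper/feedback loop holds the node (value unchanged)
        trace.append(node)
    return trace


clock = [0, 1, 1, 1, 1, 1, 1]
window = [0, 1, 1, 1, 0, 0, 0]
data = [1, 1, 1, 0, 0, 0, 0]  # data falls late, but still inside the window
print(databar_trace(clock, window, data))                     # [1, 0, 0, 1, 1, 1, 1]
print(databar_trace(clock, window, data, has_pull_up=False))  # [1, 0, 0, 0, 0, 0, 0]
```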
- If a received clock signal is asserted (“yes” branch of the conditional block 404), then the output from the shadow latch is prevented from being sent as the output of the flip-flop circuit (block 416). Pre-charging of the output of the master latch is prevented (block 418). The received data from the master latch is sent to each of the shadow latch and the bypass circuit (block 420). The received data is sent from the bypass circuit as the output of the flip-flop circuit (block 422). When the clock signal is asserted, the path from the output of the master latch through the bypass circuit to the output of the flip-flop circuit reduces the clock-to-output delay of the flip-flop circuit.
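Blocks 402-422 of method 400 can be summarized as a cycle-level state update. The class below is a hypothetical sketch that abstracts away transistor timing. Passing `in_window=False` models data that changes after the transparency window has closed, and each branch is commented with the blocks it corresponds to. The output tracks the data value captured while the clock was asserted and holds that value through the negated phase.

```python
from dataclasses import dataclass


@dataclass
class SemiDynamicFlopModel:
    """Cycle-level sketch of method 400; field names are hypothetical."""
    data_bar: int = 1  # master latch output (dynamic node DataBar)
    stored: int = 1    # DataBar value captured by the shadow latch
    output: int = 0

    def step(self, clock: int, data: int, in_window: bool = True) -> int:
        if clock:
            # Blocks 416-420: the shadow latch driver is tri-stated, pre-charge is off,
            # and the master latch passes inverted data only while its window is open.
            if in_window:
                self.data_bar = 1 - data
            self.stored = self.data_bar      # shadow latch input stage sees DataBar
            self.output = 1 - self.data_bar  # block 422: bypass inverts DataBar
        else:
            # Blocks 406-414: the bypass is tri-stated, the shadow latch drives the
            # captured data value (inverse of stored DataBar), and DataBar is pre-charged.
            self.output = 1 - self.stored
            self.data_bar = 1
        return self.output


flop = SemiDynamicFlopModel()
print([flop.step(1, 1), flop.step(0, 0), flop.step(1, 0), flop.step(0, 1)])  # [1, 1, 0, 0]
```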
- Referring to FIG. 5, a generalized block diagram of another embodiment of a sequential element 500 is shown. As shown, sequential element 500 receives Pulse 202 and PulseBar 304 without receiving source Clock 102. Rather than receive Clock 102, the data input stage 520 of the sequential element 500 uses Pulse 202 and the inverted level of Pulse 202, which is PulseBar 304. Pulse 202 is received on the gate terminal of device Q530 (an NFET), and PulseBar 304 is received on the gate terminal of device Q524 (a PFET). The pulse width of Pulse 202 determines a transparency window for sequential element 500. In the illustrated embodiment, the data input stage 520 receives Data 110 on the gate terminals of devices Q526 (a PFET) and Q528 (an NFET).
- The data input stage 520 does not use dynamic logic, so there is no pre-charged node. The output of the data input stage 520, which is DataBar 514, is received by shadow latch 560. As shown, shadow latch 560 uses a sequential feedback circuit with inverter 562 and tri-state inverter 564. Tri-state inverter 564 receives both Pulse 202 and PulseBar 304. When Pulse 202 is asserted with a logic high level, the tri-state inverter 564 does not drive a voltage level on its output, which is DataBar 514. In contrast, when Pulse 202 is negated with a logic low level, the tri-state inverter 564 drives the inverted level of DataBuf 566 on its output DataBar 514. After the sequential feedback circuit with gates 562 and 564, the shadow latch 560 receives Pulse 202 on the gate terminal of Q572 (a PFET) and PulseBar 304 on the gate terminal of Q574 (an NFET). The voltage level of DataBuf 566 is received on the gate terminal of Q570 (a PFET) and on the gate terminal of Q576 (an NFET).
- In the illustrated embodiment, the bypass circuit 550 directly receives Data 110. As shown, Data 110 is received on the gate terminal of Q554 (a PFET) and on the gate terminal of Q556 (an NFET). The bypass circuit 550 receives Pulse 202 on the gate terminal of Q558 (an NFET) and PulseBar 304 on the gate terminal of Q552 (a PFET). When Pulse 202 is asserted, the path from Data 110 through the bypass circuit 550 (skipping the data input stage 520) to Output 180 reduces the clock-to-output delay of sequential element 500 while also providing a smaller setup time.
- Referring now to FIG. 6, a generalized flow diagram of one embodiment of a method 600 for efficiently storing and driving data between pipeline stages is shown. Data is received at a master latch included in a flip-flop circuit (block 602). Data is also received at a bypass circuit included in the flip-flop circuit (block 604). It is noted that the bypass circuit does not receive an output of the master latch, but rather directly receives the input data signal.
- If a received pulse signal is negated (de-asserted) (“no” branch of the conditional block 606), then outputs from the master latch and the bypass circuit are prevented from being sent (block 608). For example, tri-state inverters or other circuitry are used to prevent sending the outputs. A value of the data sent from the master latch is stored when the pulse signal becomes negated (block 610). For example, a feedback circuit latches the data value. In various embodiments, the feedback circuit uses a tri-state inverter that receives the pulse signal rather than a source clock signal. The stored value of the data is sent from a shadow latch included in the flip-flop circuit as the output of the flip-flop circuit (block 612).
- If a received pulse signal is asserted (“yes” branch of the conditional block 606), then the output from the shadow latch is prevented from being sent as the output of the flip-flop circuit (block 614). The received data from the master latch is sent to the shadow latch (block 616). The received data is sent from the bypass circuit as the output of the flip-flop circuit (block 618). In some embodiments, the bypass circuit is a tri-state inverter with each of the rising input data transition and the falling input data transition gated by a pulse signal. When the received pulse signal is asserted, the path from the input data signal through the bypass circuit (skipping the master latch) to the output of the flip-flop circuit reduces the clock-to-output delay of the flip-flop circuit.
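Method 600 for sequential element 500 can be sketched as a similar cycle-level update. Unlike sequential element 100, whose output is non-inverted, this element produces the inverted data value through one inversion in the bypass path. The class is hypothetical. Beyond what the text states explicitly, it assumes that inverter 562 produces DataBuf 566 from DataBar 514 and that the tri-state stage built from Q570-Q576 drives Output 180 while Pulse 202 is negated.

```python
class PulseFlopModel:
    """Cycle-level sketch of sequential element 500 / method 600 (hypothetical names).

    The output is the inverted data value, produced by one inversion in the
    bypass path while the pulse is asserted.
    """

    def __init__(self) -> None:
        self.data_bar = 1  # DataBar 514, held by the feedback loop when the pulse is negated
        self.output = 0

    def step(self, pulse: int, data: int) -> int:
        if pulse:
            self.data_bar = 1 - data  # block 616: data input stage 520 is transparent
            self.output = 1 - data    # block 618: bypass 550 inverts Data directly
        else:
            data_buf = 1 - self.data_bar  # assumed: DataBuf 566 from inverter 562
            self.output = 1 - data_buf    # block 612: assumed shadow latch output stage
        return self.output


pf = PulseFlopModel()
print([pf.step(1, 1), pf.step(0, 0), pf.step(1, 0), pf.step(0, 1)])  # [0, 0, 1, 1]
```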
- Turning next to FIG. 7, a block diagram of one embodiment of a system 700 is shown. As shown, system 700 represents the chip, circuitry, components, etc., of a desktop computer 710, laptop computer 720, tablet computer 730, cell or mobile phone 740, television 750 (or set top box coupled to a television), wrist watch or other wearable item 760, or otherwise. Other devices are possible and are contemplated. In the illustrated embodiment, the system 700 includes at least one instance of a system on chip (SoC) 706, which includes multiple types of processing units, such as a central processing unit (CPU), a graphics processing unit (GPU), or other, a communication fabric, and interfaces to memories and input/output devices. In some embodiments, one or more processors in SoC 706 include multiple sequential elements between processor pipeline stages similar to sequential elements 100, 300 and 500 of FIG. 1, FIG. 3 and FIG. 5. In various embodiments, SoC 706 is coupled to external memory 702, peripherals 704, and power supply 708.
- A power supply 708 is also provided, which supplies the supply voltages to SoC 706 as well as one or more supply voltages to the memory 702 and/or the peripherals 704. In various embodiments, power supply 708 represents a battery (e.g., a rechargeable battery in a smart phone, laptop or tablet computer). In some embodiments, more than one instance of SoC 706 is included (and more than one external memory 702 is included as well).
- The memory 702 is any type of memory, such as dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate (DDR, DDR2, DDR3, etc.) SDRAM (including mobile versions of the SDRAMs such as mDDR3, etc., and/or low power versions of the SDRAMs such as LPDDR2, etc.), RAMBUS DRAM (RDRAM), static RAM (SRAM), etc. One or more memory devices are coupled onto a circuit board to form memory modules such as single inline memory modules (SIMMs), dual inline memory modules (DIMMs), etc. Alternatively, the devices are mounted with a SoC or an integrated circuit in a chip-on-chip configuration, a package-on-package configuration, or a multi-chip module configuration.
- The peripherals 704 include any desired circuitry, depending on the type of system 700. For example, in one embodiment, peripherals 704 include devices for various types of wireless communication, such as Wi-Fi, Bluetooth, cellular, global positioning system, etc. In some embodiments, the peripherals 704 also include additional storage, including RAM storage, solid state storage, or disk storage. The peripherals 704 include user interface devices such as a display screen, including touch display screens or multitouch display screens, keyboard or other input devices, microphones, speakers, etc.
- In various embodiments, program instructions of a software application may be used to implement the methods and/or mechanisms previously described. The program instructions may describe the behavior of hardware in a high-level programming language, such as C. Alternatively, a hardware description language (HDL), such as Verilog, may be used. The program instructions may be stored on a non-transitory computer readable storage medium. Numerous types of storage media are available. The storage medium may be accessible by a computer during use to provide the program instructions and accompanying data to the computer for program execution. In some embodiments, a synthesis tool reads the program instructions in order to produce a netlist including a list of gates from a synthesis library.
- It should be emphasized that the above-described embodiments are only non-limiting examples of implementations. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
Claims (20)
1. An apparatus comprising:
a data input stage of a sequential element configured to receive a data signal;
a shadow latch configured to receive only an output of the data input stage and a clock signal; and
a bypass circuit configured to receive the clock signal and a version of the data signal; and
in response to determining the clock signal is asserted:
send the output of the data input stage to the shadow latch;
prevent sending an output of the shadow latch as an output of the apparatus; and
send an output of the bypass circuit as the output of the apparatus.
2. The apparatus as recited in claim 1, wherein the version of the data signal received by the bypass circuit is the data signal.
3. The apparatus as recited in claim 2, wherein the output of the bypass circuit is an inverted value of the data signal generated by one stage of inversion.
4. The apparatus as recited in claim 2, wherein the data input stage is configured to receive the clock signal.
5. The apparatus as recited in claim 4, wherein the clock signal is a pulse clock signal generated from a source clock signal.
6. The apparatus as recited in claim 1, wherein the version of the data signal received by the bypass circuit is the output of the data input stage, wherein the data input stage is a master latch.
7. The apparatus as recited in claim 6, wherein the output of the bypass circuit is a non-inverted value of the data signal generated by two data input stages of inversion.
8. The apparatus as recited in claim 6, wherein:
a sequential feedback circuit in the master latch is configured to receive a delayed version of the clock signal; and
a sequential feedback circuit in the shadow latch is configured to receive the clock signal.
9. A method, comprising:
receiving, by a data input stage of a sequential element, a data signal;
receiving, by a shadow latch, only an output of the data input stage and a clock signal;
receiving, by a bypass circuit, the clock signal and a version of the data signal;
in response to determining the clock signal is asserted:
sending the output of the data input stage to the shadow latch;
preventing sending an output of the shadow latch as an output of a flip-flop circuit; and
sending an output of the bypass circuit as the output of the flip-flop circuit.
10. The method as recited in claim 9, wherein the version of the data signal received by the bypass circuit is the data signal.
11. The method as recited in claim 10, wherein the output of the bypass circuit is an inverted value of the data signal generated by one stage of inversion.
12. The method as recited in claim 9, wherein the version of the data signal received by the bypass circuit is the output of the data input stage, wherein the data input stage is a master latch.
13. The method as recited in claim 12, wherein the output of the bypass circuit is a non-inverted value of the data signal generated by two data input stages of inversion.
14. The method as recited in claim 12, further comprising:
receiving, by a sequential feedback circuit in the master latch, a delayed version of the clock signal; and
receiving, by a sequential feedback circuit in the shadow latch, the clock signal.
15. A non-transitory computer readable storage medium storing program instructions, wherein the program instructions are executable by a processor to:
receive, by a data input stage of a sequential element, a data signal;
receive, by a shadow latch, only an output of the data input stage and a clock signal;
receive, by a bypass circuit, the clock signal and a version of the data signal;
in response to determining the clock signal is asserted:
send the output of the data input stage to the shadow latch;
prevent sending an output of the shadow latch as an output of the flip-flop circuit; and
send an output of the bypass circuit as the output of the flip-flop circuit.
16. The non-transitory computer readable storage medium as recited in claim 15, wherein the version of the data signal received by the bypass circuit is the data signal.
17. The non-transitory computer readable storage medium as recited in claim 16, wherein the output of the bypass circuit is an inverted value of the data signal generated by one stage of inversion.
18. The non-transitory computer readable storage medium as recited in claim 16, wherein the version of the data signal received by the bypass circuit is the output of the data input stage, wherein the data input stage is a master latch.
19. The non-transitory computer readable storage medium as recited in claim 18, wherein the output of the bypass circuit is a non-inverted value of the data signal generated by two data input stages of inversion.
20. The non-transitory computer readable storage medium as recited in claim 18, wherein the program instructions are executable by a processor to:
receive, by a sequential feedback circuit in the master latch, a delayed version of the clock signal; and
receive, by a sequential feedback circuit in the shadow latch, the clock signal.
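The claims define the apparatus only structurally. As a reading aid, here is a minimal behavioral sketch under stated assumptions: ideal zero-delay logic sampled at discrete time steps, a shadow latch that is written while the clock is asserted and holds otherwise, and a shadow-latch value whose polarity matches the bypass output. The names `pulse_clock`, `ShadowLatchFlop`, and `simulate` are hypothetical, and the AND-with-delayed-clock pulse generator is a common technique assumed here; the claims only say the pulse clock is generated from a source clock.

```python
from typing import List


def pulse_clock(src: List[int], width: int) -> List[int]:
    """Derive a pulse clock from a sampled source clock (hypothetical helper).

    Assumes the common scheme pulse = src AND NOT delayed(src), so each
    rising edge of src yields a pulse `width` samples wide. The claims do
    not specify how the pulse clock is generated.
    """
    pulses = []
    for t, level in enumerate(src):
        # Treat the source clock as low before the first sample.
        delayed = src[t - width] if t >= width else 0
        pulses.append(level & (1 - delayed))
    return pulses


class ShadowLatchFlop:
    """Behavioral sketch of the claim 1 structure (hypothetical model).

    inverting=True mimics the single-inversion bypass of claim 3;
    inverting=False mimics the non-inverted bypass of claim 7.
    """

    def __init__(self, inverting: bool = False) -> None:
        self.inverting = inverting
        self.shadow = 0  # assumed reset value of the shadow latch

    def step(self, clk: int, d: int) -> int:
        bypass = 1 - d if self.inverting else d
        if clk:
            # Clock asserted: the shadow latch is written (assumed to store
            # the same polarity as the bypass output), but the output mux
            # selects the bypass path instead of the shadow latch.
            self.shadow = bypass
            return bypass
        # Clock deasserted: the shadow latch holds and drives the output.
        return self.shadow


def simulate(clk: List[int], d: List[int], inverting: bool = False) -> List[int]:
    """Run the model over equal-length clock and data sample lists."""
    flop = ShadowLatchFlop(inverting)
    return [flop.step(c, x) for c, x in zip(clk, d)]


if __name__ == "__main__":
    clk = pulse_clock([0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1], width=2)
    d = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
    print(clk)               # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
    print(simulate(clk, d))  # [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

In this trace the output follows a rising data edge (index 3) and a falling data edge (index 9) while the pulse is high, then holds once the pulse ends, which is one reading of the title's "transparency on both input data edges". A real circuit adds setup, hold, and propagation delays that this model ignores.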
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/143,973 US20200106424A1 (en) | 2018-09-27 | 2018-09-27 | Semi dynamic flop and single stage pulse flop with shadow latch and transparency on both input data edges |
US17/173,055 US11303268B2 (en) | 2018-09-27 | 2021-02-10 | Semi dynamic flop and single stage pulse flop with shadow latch and transparency on both input data edges |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/143,973 US20200106424A1 (en) | 2018-09-27 | 2018-09-27 | Semi dynamic flop and single stage pulse flop with shadow latch and transparency on both input data edges |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/173,055 Continuation US11303268B2 (en) | 2018-09-27 | 2021-02-10 | Semi dynamic flop and single stage pulse flop with shadow latch and transparency on both input data edges |
Publications (1)
Publication Number | Publication Date |
---|---|
US20200106424A1 (en) | 2020-04-02 |
Family ID: 69945235
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/143,973 Abandoned US20200106424A1 (en) | 2018-09-27 | 2018-09-27 | Semi dynamic flop and single stage pulse flop with shadow latch and transparency on both input data edges |
US17/173,055 Active US11303268B2 (en) | 2018-09-27 | 2021-02-10 | Semi dynamic flop and single stage pulse flop with shadow latch and transparency on both input data edges |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/173,055 Active US11303268B2 (en) | 2018-09-27 | 2021-02-10 | Semi dynamic flop and single stage pulse flop with shadow latch and transparency on both input data edges |
Country Status (1)
Country | Link |
---|---|
US (2) | US20200106424A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11303268B2 (en) | 2018-09-27 | 2022-04-12 | Apple Inc. | Semi dynamic flop and single stage pulse flop with shadow latch and transparency on both input data edges |
US11693387B2 (en) | 2014-01-22 | 2023-07-04 | Omax Corporation | Generating optimized tool paths and machine commands for beam cutting tools |
US20240014811A1 (en) * | 2020-03-31 | 2024-01-11 | Taiwan Semiconductor Manufacturing Company, Ltd. | Gated tri-state inverter, and method of operating same |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5148052A (en) * | 1991-10-10 | 1992-09-15 | Intel Corporation | Recirculating transparent latch employing a multiplexing circuit |
US5250852A (en) * | 1992-04-16 | 1993-10-05 | Texas Instruments Incorporated | Circuitry and method for latching a logic state |
US5349255A (en) * | 1993-03-08 | 1994-09-20 | Altera Corporation | Programmable tco circuit |
US5378934A (en) * | 1990-09-12 | 1995-01-03 | Hitachi, Ltd. | Circuit having a master-and-slave and a by-pass |
US5656962A (en) * | 1994-11-30 | 1997-08-12 | Intel Corporation | Master-slave flip-flop circuit with bypass |
US6661121B2 (en) * | 2001-09-19 | 2003-12-09 | International Business Machines Corporation | Pulse generator with controlled output characteristics |
US6831495B2 (en) * | 2002-09-26 | 2004-12-14 | International Business Machines Corporation | Method and circuit for optimizing power consumption in a flip-flop |
US20050280459A1 (en) * | 2004-06-17 | 2005-12-22 | Matsushita Electric Industrial Co., Ltd. | Flip-flop circuit |
US7427875B2 (en) * | 2005-09-29 | 2008-09-23 | Hynix Semiconductor Inc. | Flip-flop circuit |
US7725792B2 (en) * | 2006-03-01 | 2010-05-25 | Qualcomm Incorporated | Dual-path, multimode sequential storage element |
US8067970B2 (en) * | 2006-03-31 | 2011-11-29 | Masleid Robert P | Multi-write memory circuit with a data input and a clock input |
US8497721B1 (en) * | 2011-06-15 | 2013-07-30 | Applied Micro Circuits Corporation | Pass gate shadow latch |
US8975933B1 (en) * | 2012-07-02 | 2015-03-10 | Marvell Israel (M.I.S.L.) Ltd. | Systems and methods for a bypass flip flop with transparency |
US20150207496A1 (en) * | 2014-01-22 | 2015-07-23 | Apple Inc. | Latch circuit with dual-ended write |
US9425771B2 (en) * | 2014-09-26 | 2016-08-23 | Texas Instruments Incorporated | Low area flip-flop with a shared inverter |
US9680450B2 (en) * | 2015-02-19 | 2017-06-13 | Advanced Micro Devices, Inc. | Flip-flop circuit with latch bypass |
US9911470B2 (en) * | 2011-12-15 | 2018-03-06 | Nvidia Corporation | Fast-bypass memory circuit |
US10338136B2 (en) * | 2016-08-29 | 2019-07-02 | Nxp Usa, Inc. | Integrated circuit with low power scan system |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5552738A (en) * | 1995-04-21 | 1996-09-03 | Texas Instruments Incorporated | High performance energy efficient push pull D flip flop circuits |
TW305958B (en) | 1995-05-26 | 1997-05-21 | Matsushita Electric Ind Co Ltd | |
US5767716A (en) * | 1995-09-26 | 1998-06-16 | Texas Instruments Incorporated | Noise insensitive high performance energy efficient push pull isolation flip-flop circuits |
JP3530422B2 (en) * | 1999-06-16 | 2004-05-24 | NEC Electronics Corporation | Latch circuit and register circuit |
US6501315B1 (en) * | 2001-12-12 | 2002-12-31 | Xilinx, Inc. | High-speed flip-flop operable at very low voltage levels with set and reset capability |
US7319344B2 (en) | 2005-12-15 | 2008-01-15 | P.A. Semi, Inc. | Pulsed flop with embedded logic |
JP2008219491A (en) | 2007-03-05 | 2008-09-18 | Nec Electronics Corp | Master slave type flip-flop circuit and latch circuit |
KR20110105153A (en) * | 2010-03-18 | 2011-09-26 | Samsung Electronics Co., Ltd. | Flipflop circuit and scan flipflop circuit |
JP5416008B2 (en) | 2010-03-24 | 2014-02-12 | Renesas Electronics Corporation | Level shift circuit, data driver, and display device |
US8631213B2 (en) | 2010-09-16 | 2014-01-14 | Apple Inc. | Dynamic QoS upgrading |
US10049177B1 (en) * | 2015-07-07 | 2018-08-14 | Xilinx, Inc. | Circuits for and methods of reducing power consumed by routing clock signals in an integrated |
US10340898B1 (en) * | 2017-06-23 | 2019-07-02 | Xilinx, Inc. | Configurable latch circuit |
US10491197B2 (en) | 2017-09-20 | 2019-11-26 | Apple Inc. | Flop circuit with integrated clock gating circuit |
US20200106424A1 (en) | 2018-09-27 | 2020-04-02 | Apple Inc. | Semi dynamic flop and single stage pulse flop with shadow latch and transparency on both input data edges |
Also Published As
Publication number | Publication date |
---|---|
US20210167759A1 (en) | 2021-06-03 |
US11303268B2 (en) | 2022-04-12 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: APPLE INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: VENUGOPAL, VIVEKANANDAN; BHATIA, AJAY KUMAR; REEL/FRAME: 046994/0255. Effective date: 20180926 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |