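WO2023056709A1 - Pipeline clock driving circuit, computing chip, hash board, and computing device - Google Patents

Pipeline clock driving circuit, computing chip, hash board, and computing device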

Info

Publication number
WO2023056709A1
Authority
WO
WIPO (PCT)
Prior art keywords
clock
driving circuit
signal
pipeline
clock driving
Prior art date
Application number
PCT/CN2021/140016
Other languages
English (en)
French (fr)
Inventor
李楠
范志军
许超
段恋华
郭海丰
Original Assignee
深圳比特微电子科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳比特微电子科技有限公司
Priority to KR1020237015561A (patent KR102575572B1)
Priority to US17/795,777 (patent US20240313753A1)
Priority to CA3165378A (patent CA3165378A1)
Publication of WO2023056709A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00: Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/04: Generating or distributing clock signals or signals derived directly therefrom

Definitions

  • the present disclosure relates to circuits for performing hash algorithms. More specifically, it relates to a pipeline clock drive circuit, and a computing chip, a hash board and a computing device including the pipeline clock drive circuit.
  • A mining-machine chip used to generate cryptocurrency usually adopts a pipeline structure including multiple operation stages.
  • According to the algorithm used, the operation logic is divided into several operation stages, each of which has a similar functional design and operation structure.
  • In particular, when latches are used as the sequential devices in the operation stages of the pipeline, the latch in each operation stage requires a working clock (i.e., a pulse clock). Therefore, for each operation stage, a pulse clock is input to it through a corresponding stage of the clock driving circuit.
  • Usually, the working clocks for all operation stages come from the same clock source, and the clock signal generated by the clock source is passed stage by stage through the pipeline clock driving circuit.
  • the basic principle of generating the working clock for the latch for each operation stage is to input both the input clock signal of the clock drive circuit of this stage and the delayed input clock signal to the gate circuit (such as NOR gate, NAND gate etc.) to generate a pulse clock, wherein the delayed input clock signal is generated after the input clock signal passes through the delay module.
  • the width of the pulse clock is basically determined by the delay time of the delay module.
  • the width of the pulse clock needs to meet the minimum pulse width requirements of the pipeline. That is, when the pulse clock is valid, the state (high level or low level) of the input clock signal of the clock driving circuit of this stage needs to remain unchanged, so as to maintain the state of the generated pulse clock for a time above the minimum pulse width. Therefore, the duty cycle of the input clock signal of each stage of the clock driving circuit needs to meet certain requirements.
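  • The dependence of the related-art pulse on the input duty cycle can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the text, an AND-type gate whose output is active while the input clock is high and an inverted, delayed copy of the input clock is also high. The function names and the numeric values are hypothetical.

```python
def clock_level(t, period, high_time):
    """Level of a clock that is high for the first `high_time` units of each period."""
    return 1 if (t % period) < high_time else 0


def related_art_pulse_width(period, high_time, t0):
    """Count, over one full period, the time units in which the gate output is active.

    Assumed gate (hypothetical): active while the input clock is high and the
    inverted copy of the input clock, delayed by t0, is also high.
    """
    width = 0
    for t in range(period, 2 * period):  # skip the first period (start-up)
        clk = clock_level(t, period, high_time)
        delayed_inverted = 1 - clock_level(t - t0, period, high_time)
        width += clk & delayed_inverted
    return width


# Healthy input (duty cycle 0.5): the pulse width equals the delay t0.
print(related_art_pulse_width(period=2000, high_time=1000, t0=300))
# Degraded input (duty cycle 0.1): the pulse is cut short by the input clock itself.
print(related_art_pulse_width(period=2000, high_time=200, t0=300))
```

  • In this model the pulse is as wide as the delay only while the input clock holds its level for at least that long, which is why the duty cycle of each stage's input must stay within bounds.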
  • One of the objects of the present disclosure is to provide an improved pipeline clock driving circuit.
  • According to one aspect of the present disclosure, a pipeline clock driving circuit is provided for supplying pulse clock signals to a pipeline including multiple operation stages.
  • The pipeline clock driving circuit includes: multi-stage clock driving circuits, wherein each stage of clock driving circuit is used to provide a pulse clock signal to a corresponding one of the multiple operation stages of the pipeline; and a clock source, coupled to the input of the first-stage clock driving circuit, for providing a basic clock signal.
  • Among the multi-stage clock driving circuits, the input of each stage other than the first is coupled to the output of the preceding stage, and each stage includes: a flip-flop coupled to the input of the current stage; a delay module coupled to the output of the flip-flop, which delays the pulse signal output by the flip-flop, feeds the delayed pulse signal back to the flip-flop, and outputs it to the next-stage clock driving circuit; and a combinational logic module coupled to the outputs of the flip-flop and the delay module, which performs a combinational logic operation on the pulse signal output by the flip-flop and the delayed pulse signal output by the delay module to generate a pulse clock signal provided to the corresponding operation stage of the pipeline.
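  • According to another aspect of the present disclosure, a computing chip is provided, which includes one or more pipeline clock driving circuits as described above.
  • According to yet another aspect of the present disclosure, a hash board is provided, which includes one or more computing chips as described above.
  • According to yet another aspect of the present disclosure, a computing device is provided, which includes one or more hash boards as described above.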
  • FIG. 1 shows a schematic diagram of a related art pipeline clock driving circuit.
  • FIG. 2A shows a schematic diagram of a pipeline clock driving circuit according to some embodiments of the present disclosure.
  • FIG. 2B shows a timing diagram of pulsed clock signals generated by the pipeline clock driving circuit according to some embodiments of the present disclosure.
  • FIG. 3A shows a schematic diagram of a pipeline clock driving circuit according to other embodiments of the present disclosure.
  • FIG. 3B shows a timing diagram of a pulse clock signal generated by a pipeline clock driving circuit according to other embodiments of the present disclosure.
  • FIG. 4 shows a schematic diagram of a pipeline clock driving circuit according to some other embodiments of the present disclosure.
  • FIG. 1 shows a schematic diagram of a related art pipeline clock driving circuit 100 .
  • the pipeline clock driving circuit 100 is used to provide a pulse clock signal for the pipeline 101 including a plurality of operation stages 101-1, . . . , 101-(N-1), 101-N.
  • the pipeline clock driving circuit 100 includes a clock source 110 and multi-stage clock driving circuits 120 - 1 , 120 - 2 , . . . , 120 -N.
  • the clock source 110 is coupled to the input of the first-stage clock driving circuit 120-1 for providing a basic clock signal.
  • Each stage of the multi-stage clock driving circuits 120-1, 120-2, ..., 120-N provides a pulse clock signal to a corresponding one of the operation stages 101-N, 101-(N-1), ..., 101-1.
  • Each stage of the clock driving circuits 120-1, 120-2, ..., 120-N includes a delay module 130-1, 130-2, ..., 130-N and a combinational logic module 140-1, 140-2, ..., 140-N (such as a NOR gate or a NAND gate).
  • The delay modules 130-1, 130-2, ..., 130-N delay the input clock signals of the clock driving circuits 120-1, 120-2, ..., 120-N.
  • The combinational logic modules 140-1, 140-2, ..., 140-N perform a logic operation (such as NOR or NAND) on the input clock signal of the clock driving circuit 120-1, 120-2, ..., 120-N of the current stage and the input clock signal delayed by the delay module 130-1, 130-2, ..., 130-N, and output the result as the pulse clock signal of that stage, which is supplied to a corresponding one of the operation stages 101-N, 101-(N-1), ..., 101-1 of the pipeline 101.
  • The basic clock signal generated by the clock source has a duty cycle of 0.5.
  • As the clock signal is passed through the stages of the clock driving circuit, however, its duty cycle becomes progressively worse.
  • The main reason for the deterioration of the duty cycle of the clock signal is the accumulation of manufacturing errors of combinational logic devices.
  • The clock driving circuits contain combinational logic devices such as buffers and inverters. Due to the manufacturing process, the performance parameters of these devices have errors, and these errors cause the clock duty cycle to deviate.
  • The influence of the parameter errors of the combinational logic devices in each stage accumulates continuously, so the deviation of the clock duty cycle gradually increases. Therefore, the farther a clock driving circuit is from the clock source, the worse the duty cycle of its input clock signal and the worse the pulse clock it generates, so that the minimum pulse width requirement of the corresponding operation stage may not be met.
  • In the pipeline clock driving circuit 100, the duty cycle of the clock signal therefore deviates, and the deviation accumulates stage by stage, so that the farther a clock driving circuit is from the clock source 110, the worse the duty cycle of its input clock signal.
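  • The accumulation described above can be put into numbers with a deliberately simple model. The sketch assumes that every stage shortens the active phase of the clock by the same fixed amount; real manufacturing errors vary from device to device, and all names and values here are hypothetical.

```python
def first_failing_stage(high_time, skew_per_stage, min_pulse, stages):
    """Return the 1-based index of the first stage whose input active phase is
    shorter than the minimum pulse width, or None if every stage is served.

    Assumption (hypothetical): each stage shortens the active phase of the
    clock it forwards by a constant `skew_per_stage`.
    """
    for stage in range(1, stages + 1):
        if high_time < min_pulse:
            return stage
        high_time -= skew_per_stage  # distortion added while passing this stage
    return None


# 1000 ps active phase at the source, 20 ps lost per stage, 300 ps required.
print(first_failing_stage(high_time=1000, skew_per_stage=20, min_pulse=300, stages=64))
```

  • Even a small per-stage error bounds the usable pipeline depth in this model, which is the problem the improved circuit is meant to avoid.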
  • The present disclosure proposes an improved pipeline clock driving circuit in which the pulse width of the pulse clock signal generated by each stage of the clock driving circuit is independent of that stage's input clock signal.
  • FIG. 2A shows a schematic diagram of a pipeline clock driving circuit 200 according to some embodiments of the present disclosure.
  • FIG. 2B shows a timing diagram of pulsed clock signals generated by the pipeline clock driving circuit 200 according to some embodiments of the present disclosure.
  • the pipeline clock driving circuit 200 is used to provide a pulse clock signal for the pipeline 201 including a plurality of operation stages 201-1, . . . , 201-N.
  • the pipeline clock driving circuit 200 includes a clock source 210 and multi-stage clock driving circuits 220 - 1 , . . . , 220 -N.
  • the clock source 210 is coupled to the input of the first-stage clock driving circuit 220-1 for providing a basic clock signal.
  • the duty ratio of the basic clock signal provided by the clock source 210 may be 0.5, and the frequency may be several hundred megahertz, for example, 400-700MHz.
  • Among the multi-stage clock driving circuits 220-1, ..., 220-N, the input of each stage other than the first-stage clock driving circuit 220-1 is coupled to the output of the preceding-stage clock driving circuit.
  • Each stage of the clock driving circuits 220-1, ..., 220-N includes a flip-flop 230-1, ..., 230-N, a delay module 240-1, ..., 240-N and a combinational logic module 250-1, ..., 250-N.
  • The flip-flops 230-1, ..., 230-N are coupled to the input of the clock driving circuit of the current stage. That is, the flip-flop 230-1 in the first-stage clock driving circuit 220-1 is coupled to the output of the clock source 210, while the flip-flops in the other stages are coupled to the output of the preceding-stage clock driving circuit.
  • The flip-flops 230-1, ..., 230-N may be edge-triggered flip-flops.
  • The type and connection mode of the flip-flops 230-1, ..., 230-N can be configured as required.
  • FIG. 2A shows an embodiment in which the flip-flops 230-1, ..., 230-N are rising-edge D flip-flops.
  • The SET terminals of the flip-flops 230-1, ..., 230-N are coupled to the outputs of the delay modules 240-1, ..., 240-N, the D terminals are fixed at a low level (i.e., logic "0"), the CP terminals are coupled to the output of the preceding-stage clock driving circuit, and the output terminals Q are coupled to the delay modules 240-1, ..., 240-N as their inputs.
  • the flip-flops 230-1, . . . , 230-N may be, for example, falling-edge flip-flops, and their connection manners may also be adjusted accordingly (details will be described in the embodiment shown in FIG. 3A below).
  • the inputs of the delay blocks 240-1, . . . , 240-N are coupled to the outputs of the flip-flops 230-1, . . . , 230-N.
  • the delay modules 240-1,...,240-N are used to delay the pulse signals output by the flip-flops 230-1,...,230-N, and feed back the delayed pulse signals to the flip-flops 230-1,... , 230-N and output to the next-level clock drive circuit.
  • the delay modules 240-1, . . . , 240-N also invert the pulse signals output by the flip-flops 230-1, . . . , 230-N.
  • the delay modules 240 - 1 , . . . , 240 -N can be realized by several buffers and/or inverters.
  • the delay modules 240 - 1 , . . . , 240 -N may be composed of an odd number of inverters.
  • the delay modules 240-1, . . . , 240-N may be composed of several buffers and an odd number of inverters.
  • The combinational logic modules 250-1, ..., 250-N are coupled to the outputs of the flip-flops 230-1, ..., 230-N and the delay modules 240-1, ..., 240-N.
  • The combinational logic modules 250-1, ..., 250-N perform a combinational logic operation on the pulse signals output by the flip-flops 230-1, ..., 230-N and the delayed pulse signals output by the delay modules 240-1, ..., 240-N to generate pulse clock signals, each of which is provided to a corresponding one of the operation stages 201-N, ..., 201-1 of the pipeline 201.
  • The combinational logic modules 250-1, ..., 250-N can be composed of OR gates or NOR gates.
  • the combinational logic modules 250-1, . . . , 250-N may be composed of AND gates or NAND gates (details will be described in the embodiment shown in FIG. 3A below).
  • The direction in which the pulse signal propagates through the multi-stage clock driving circuits 220-1, ..., 220-N is opposite to the direction in which the data signal propagates through the operation stages 201-1, ..., 201-N. That is, the first-stage clock driving circuit 220-1 provides the pulse clock signal for the last operation stage 201-N, the last-stage clock driving circuit 220-N provides the pulse clock signal for the first operation stage 201-1, and so on.
  • Such an arrangement can more easily meet the requirements of the operation timing of each operation stage 201-1, . . . , 201-N.
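  • The reversed pairing of clock driving stages and operation stages amounts to a simple index mapping. The helper below is only an illustrative sketch; its name and the 1-based numbering are assumptions chosen to match the reference numerals 220-k and 201-k.

```python
def operation_stage_for(clock_stage, num_stages):
    """Map clock driving stage k (1-based) to the operation stage it clocks.

    The clock travels against the data direction, so clock stage 1 serves the
    last operation stage and clock stage N serves the first.
    """
    if not 1 <= clock_stage <= num_stages:
        raise ValueError("clock_stage must be between 1 and num_stages")
    return num_stages - clock_stage + 1


# For a hypothetical 4-stage pipeline: 220-1 -> 201-4, 220-2 -> 201-3, ...
print([operation_stage_for(k, 4) for k in range(1, 5)])
```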
  • the timing of generating the pulse clock signal is described below by taking the first-stage clock driving circuit 220 - 1 as an example.
  • The CP terminal of the flip-flop 230-1 receives the basic clock signal S201 from the clock source 210 as its input signal (correspondingly, the CP terminal of the flip-flop in each subsequent stage receives, as its input signal, the output signal S203 from the delay module of the preceding-stage clock driving circuit), and provides the pulse signal S202 at its output terminal Q to the delay module 240-1 and to one input terminal of the combinational logic module 250-1 (a NOR gate in this embodiment).
  • The delay module 240-1 inverts and delays the pulse signal S202 to obtain the output signal S203, provides the output signal S203 to the SET terminal of the flip-flop 230-1 and to the other input terminal of the combinational logic module 250-1, and provides it as the input signal of the next-stage clock driving circuit.
  • Before the clock source 210 starts to output the basic clock signal, the pulse signal S202 at the output terminal Q of the flip-flop 230-1 is stable at a high level.
  • The output signal S203 of the delay module 240-1 is stable at a low level, that is, the SET terminal of the flip-flop 230-1 is at a low level, and the input signal of the next-stage clock driving circuit (corresponding to the input signal S201 of the first-stage clock driving circuit 220-1) is also at a low level. Therefore, the input signals of the combinational logic module 250-1 (NOR gate) are high level (S202) and low level (S203), respectively, and the output pulse clock signal S204 is at a low level.
  • The clock source 210 then starts to output the basic clock signal S201.
  • The period of the basic clock signal S201 is T.
  • When the signal S201 changes from low level to high level, a rising edge arrives at the CP terminal of the flip-flop 230-1 while the signal at the SET terminal (S203) is still at a low level, so the signal S202 at the output terminal Q of the flip-flop 230-1 takes the value of the D terminal, that is, low level. Therefore, the input signals of the combinational logic module 250-1 (NOR gate) are low level (S202) and low level (S203), respectively, and the pulse clock signal S204 it outputs becomes high level.
  • t0 is the delay between the signal S203 and the signal S202, which is determined by the configuration of the delay module 240-1. In the embodiment shown in FIG. 2A, t0 is the sum of the delay times of the multiple inverters in the delay module 240-1.
  • After the delay t0, the output signal S203 of the delay module 240-1 becomes high level, so the SET terminal of the flip-flop 230-1 becomes high level and the signal S202 at the output terminal Q of the flip-flop 230-1 becomes high level.
  • The input signals of the combinational logic module 250-1 are then high level (S202) and high level (S203), respectively, and the pulse clock signal S204 it outputs becomes low level.
  • After a further delay t0, the output signal S203 of the delay module 240-1 becomes low level.
  • The SET terminal of the flip-flop 230-1 becomes low level, but no rising edge arrives at the CP terminal, so the signal S202 at the output terminal Q of the flip-flop 230-1 remains at a high level.
  • The input signals of the combinational logic module 250-1 are high level (S202) and low level (S203), respectively, and the output pulse clock signal S204 remains at a low level.
  • the output signal S203 of the delay module 240-1 becomes high level.
  • the SET terminal of the flip-flop 230-1 becomes high level, so that the signal S202 of the output terminal Q of the flip-flop 230-1 becomes high level.
  • the pulse clock signal S204 at the output terminal of the combinational logic module 250-1 becomes low level.
  • the output signal S203 of the delay module 240-1 becomes low level.
  • the signal S202 at the output terminal Q of the flip-flop 230 - 1 remains at a high level
  • the pulse clock signal S204 at the output terminal of the combinational logic module 250 - 1 remains at a low level.
  • a pulse clock signal S204 with period T and pulse width t0 is generated at the output terminal of the combinational logic module 250-1.
  • the pulsed clock signal S204 is provided to the corresponding operation stage 201-N as an operating clock.
  • the output signal S203 is generated at the output terminal of the delay module 240-1, and the output signal S203 is simultaneously used as the input signal of the next-stage clock driving circuit (equivalent to the input signal S201 of the first-stage clock driving circuit 220-1) .
  • the rising edge of the output signal S203 is used to trigger the flip-flop of the clock driving circuit of the next stage. As shown in FIG. 2B , the rising edge of the output signal S203 is delayed by t 0 from the rising edge of the input signal S201 . Similarly, the rising edge of the output signal generated by each stage of the clock driving circuit is delayed by t 0 from the rising edge of the input signal of the clock driving circuit, which meets the working requirements of each operation stage of the pipeline.
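  • The timing walk-through for FIG. 2A can be reproduced with a small discrete-time model. This is a sketch under simplifying assumptions that are not in the text: time advances in unit steps, the flip-flop and the NOR gate have zero delay, and the delay module is an ideal inverting delay of t0 steps. The function names are hypothetical.

```python
def simulate_stage(inp, t0):
    """Simulate one clock driving stage as described for FIG. 2A.

    inp : list of 0/1 samples of the stage input (S201)
    t0  : delay of the inverting delay module, in samples (t0 >= 1)
    Returns (s203, s204): delay-module output and pulse clock output.
    """
    q = 1                      # S202 rests at high level
    prev = inp[0]              # avoid a spurious edge on the first sample
    q_hist, s203, s204 = [], [], []
    for t, clk in enumerate(inp):
        delayed_q = q_hist[t - t0] if t >= t0 else 1
        fb = 1 - delayed_q     # S203: inverted, delayed copy of S202
        if fb:                 # SET terminal high forces Q high
            q = 1
        elif clk and not prev: # rising edge on CP loads D = 0
            q = 0
        q_hist.append(q)
        s203.append(fb)
        s204.append(1 - (q | fb))  # NOR of S202 and S203
        prev = clk
    return s203, s204


def high_runs(signal):
    """Lengths of the consecutive runs of 1s in a sampled signal."""
    runs, n = [], 0
    for v in signal:
        if v:
            n += 1
        elif n:
            runs.append(n)
            n = 0
    if n:
        runs.append(n)
    return runs


# Period T = 20 samples, duty cycle 0.5, rising edges at t = 10, 30, 50.
s201 = [1 if (t % 20) >= 10 else 0 for t in range(60)]
s203, s204 = simulate_stage(s201, t0=3)
print(high_runs(s204))                  # widths of the generated pulses
print(s204.index(1), s203.index(1))     # first pulse start, first S203 rising edge
```

  • Feeding the same model an input that is high for only one sample per period yields the same pulses, since only the rising edge of the input is used.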
  • The pulse width t0 of the pulse clock signal generated by each stage of the clock driving circuit is determined only by the configuration of that stage and is independent of the input signal of that stage.
  • Although manufacturing errors of the combinational logic devices in the clock driving circuits of the various stages may still cause deviations in the pulse widths of the input and output signals of each stage, the pulse width of the pulse clock signal generated by each stage is independent of the pulse width of its input signal, so these deviations do not accumulate as the signal passes through the stages of the clock driving circuit.
  • In other words, any deviation in the pulse width of the pulse clock signal generated by a given stage is unrelated to manufacturing errors of the combinational logic devices in the preceding stages, and depends only on manufacturing errors of the combinational logic devices in that stage.
  • Such manufacturing tolerances are usually small, and thus the resulting pulse width deviation is acceptable.
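  • The non-accumulation property can be checked by chaining several stages whose delays differ slightly, as a stand-in for per-stage manufacturing deviation. The sketch below uses an idealized unit-step model (zero flip-flop and gate delay, ideal inverting delay); the function names and delay values are hypothetical.

```python
def simulate_stage(inp, t0):
    """One FIG. 2A style stage: returns (delay-module output, pulse clock)."""
    q, prev = 1, inp[0]
    q_hist, out, pulse = [], [], []
    for t, clk in enumerate(inp):
        fb = 1 - (q_hist[t - t0] if t >= t0 else 1)  # inverted, delayed Q
        if fb:
            q = 1              # SET high forces Q high
        elif clk and not prev:
            q = 0              # rising edge loads D = 0
        q_hist.append(q)
        out.append(fb)
        pulse.append(1 - (q | fb))  # NOR gate
        prev = clk
    return out, pulse


def high_runs(signal):
    """Lengths of the consecutive runs of 1s in a sampled signal."""
    runs, n = [], 0
    for v in signal + [0]:     # sentinel closes a run at the end
        if v:
            n += 1
        elif n:
            runs.append(n)
            n = 0
    return runs


def chain_pulse_widths(base_clock, delays):
    """Pulse widths produced by each stage of a chain with per-stage delays."""
    widths, signal = [], base_clock
    for t0 in delays:
        signal, pulse = simulate_stage(signal, t0)  # output feeds the next stage
        widths.append(high_runs(pulse))
    return widths


base = [1 if (t % 20) >= 10 else 0 for t in range(90)]
for stage, runs in enumerate(chain_pulse_widths(base, [3, 4, 2, 3]), start=1):
    print(stage, runs)
```

  • In this model each stage's pulse width equals that stage's own delay, regardless of the delays of the stages before it.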
  • FIG. 3A shows a schematic diagram of a pipeline clock driving circuit 300 according to other embodiments of the present disclosure.
  • FIG. 3B shows a timing diagram of pulsed clock signals generated by the pipeline clock driving circuit 300 according to other embodiments of the present disclosure.
  • the pipeline clock driving circuit 300 is used to provide a pulse clock signal for the pipeline 301 including a plurality of operation stages 301-1, . . . , 301-N. As shown in FIG. 3A , the pipeline clock driving circuit 300 includes a clock source 310 and multi-stage clock driving circuits 320 - 1 , . . . , 320 -N.
  • the clock source 310 is coupled to the input of the first-stage clock driving circuit 320-1 for providing a basic clock signal.
  • Among the multi-stage clock driving circuits 320-1, ..., 320-N, the input of each stage other than the first-stage clock driving circuit 320-1 is coupled to the output of the preceding-stage clock driving circuit.
  • Each stage of the clock driving circuits 320-1, ..., 320-N includes a flip-flop 330-1, ..., 330-N, a delay module 340-1, ..., 340-N and a combinational logic module 350-1, ..., 350-N.
  • The flip-flops 330-1, ..., 330-N are coupled to the input of the clock driving circuit of the current stage. That is, the flip-flop 330-1 in the first-stage clock driving circuit 320-1 is coupled to the output of the clock source 310, while the flip-flops in the other stages are coupled to the output of the preceding-stage clock driving circuit.
  • FIG. 3A shows an embodiment in which the flip-flops 330-1, ..., 330-N are falling-edge D flip-flops.
  • The RESET terminals of the flip-flops 330-1, ..., 330-N are coupled to the outputs of the delay modules 340-1, ..., 340-N, the D terminals are fixed at a high level (i.e., logic "1"), the CPN terminals are coupled to the output of the preceding-stage clock driving circuit, and the output terminals Q are coupled to the delay modules 340-1, ..., 340-N as their inputs.
  • When the signal at the RESET terminal of the falling-edge D flip-flop is at a low level, the output terminal Q is always at a low level.
  • When the signal at the RESET terminal is at a high level, the output terminal Q takes the signal value at the D terminal whenever a falling edge of the signal at the CPN terminal arrives.
  • the inputs of the delay blocks 340-1, . . . , 340-N are coupled to the outputs of the flip-flops 330-1, . . . , 330-N.
  • the delay modules 340-1,...,340-N are used to delay the pulse signals output by the flip-flops 330-1,...,330-N, and feed back the delayed pulse signals to the flip-flops 330-1,... , 330-N and output to the next level of clock drive circuit.
  • the delay modules 340-1, . . . , 340-N also invert the pulse signals output by the flip-flops 330-1, . . . , 330-N.
  • the delay modules 340 - 1 , . . . , 340 -N may be composed of an odd number of inverters.
  • Combinational logic modules 350-1,...,350-N are coupled to the outputs of flip-flops 330-1,...,330-N and delay modules 340-1,...,340-N.
  • The combinational logic modules 350-1, ..., 350-N perform a combinational logic operation on the pulse signals output by the flip-flops 330-1, ..., 330-N and the delayed pulse signals output by the delay modules 340-1, ..., 340-N to generate pulse clock signals, each of which is provided to a corresponding one of the operation stages 301-N, ..., 301-1 of the pipeline 301.
  • The combinational logic modules 350-1, ..., 350-N can be formed by NAND gates.
  • the timing of generating the pulse clock signal is described below by taking the first-stage clock driving circuit 320 - 1 as an example.
  • The CPN terminal of the flip-flop 330-1 receives the basic clock signal S301 from the clock source 310 as its input signal (correspondingly, the CPN terminal of the flip-flop in each subsequent stage receives, as its input signal, the output signal S303 from the delay module of the preceding-stage clock driving circuit), and provides the pulse signal S302 at its output terminal Q to the delay module 340-1 and to one input terminal of the combinational logic module 350-1 (a NAND gate in this embodiment).
  • The delay module 340-1 inverts and delays the pulse signal S302 to obtain the output signal S303, provides the output signal S303 to the RESET terminal of the flip-flop 330-1 and to the other input terminal of the combinational logic module 350-1, and provides it as the input signal of the next-stage clock driving circuit.
  • Before the clock source 310 starts to output the basic clock signal, the pulse signal S302 at the output terminal Q of the flip-flop 330-1 is stable at a low level.
  • The output signal S303 of the delay module 340-1 is stable at a high level, that is, the RESET terminal of the flip-flop 330-1 is at a high level, and the input signal of the next-stage clock driving circuit (corresponding to the input signal S301 of the first-stage clock driving circuit 320-1) is also at a high level. Therefore, the input signals of the combinational logic module 350-1 (NAND gate) are low level (S302) and high level (S303), respectively, and the pulse clock signal S304 it outputs is at a high level.
  • The clock source 310 then starts to output the basic clock signal S301.
  • The period of the basic clock signal S301 is T.
  • When the signal S301 changes from high level to low level, a falling edge arrives at the CPN terminal of the flip-flop 330-1 while the signal at the RESET terminal (S303) is still at a high level, so the signal S302 at the output terminal Q of the flip-flop 330-1 takes the value of the D terminal, that is, high level. Therefore, the input signals of the combinational logic module 350-1 (NAND gate) are high level (S302) and high level (S303), respectively, and the pulse clock signal S304 it outputs becomes low level.
  • t0 is the delay between the signal S303 and the signal S302, which is determined by the configuration of the delay module 340-1. In the embodiment shown in FIG. 3A, t0 is the sum of the delay times of the multiple inverters in the delay module 340-1.
  • After the delay t0, the output signal S303 of the delay module 340-1 becomes low level, so the RESET terminal of the flip-flop 330-1 becomes low level and the signal S302 at the output terminal Q of the flip-flop 330-1 becomes low level.
  • The input signals of the combinational logic module 350-1 are then low level (S302) and low level (S303), respectively, and the pulse clock signal S304 it outputs becomes high level.
  • After a further delay t0, the output signal S303 of the delay module 340-1 becomes high level.
  • The RESET terminal of the flip-flop 330-1 becomes high level, but no falling edge arrives at the CPN terminal, so the signal S302 at the output terminal Q of the flip-flop 330-1 remains at a low level.
  • The input signals of the combinational logic module 350-1 are low level (S302) and high level (S303), respectively, and the pulse clock signal S304 it outputs remains at a high level.
  • the output signal S303 of the delay module 340-1 becomes low level.
  • the RESET terminal of the flip-flop 330-1 becomes low level, so that the signal S302 of the output terminal Q of the flip-flop 330-1 becomes low level.
  • the pulse clock signal S304 at the output terminal of the combinational logic module 350-1 becomes high level.
  • the output signal S303 of the delay module 340-1 becomes high level.
  • the signal S302 at the output terminal Q of the flip-flop 330 - 1 remains at a low level
  • the pulse clock signal S304 at the output terminal of the combinational logic module 350 - 1 remains at a high level.
  • a pulse clock signal S304 with period T and pulse width t0 is generated at the output terminal of the combinational logic module 350-1.
  • the pulsed clock signal S304 is provided to the corresponding operation stage 301-N as an operating clock.
  • the output signal S303 is generated at the output terminal of the delay module 340-1, and the output signal S303 is also used as the input signal of the next-stage clock driving circuit (equivalent to the input signal S301 of the first-stage clock driving circuit 320-1) .
  • the falling edge of the output signal S303 is used to trigger the flip-flop of the clock driving circuit of the next stage. As shown in FIG. 3B , the falling edge of the output signal S303 is delayed by t 0 from the falling edge of the input signal S301 . Similarly, the falling edge of the output signal generated by each stage of the clock driving circuit is delayed by t 0 from the falling edge of the input signal of the clock driving circuit, which meets the working requirements of each operation stage of the pipeline.
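  • The FIG. 3A variant can be modeled the same way, with the polarities swapped: a falling-edge D flip-flop with D = 1, a RESET terminal that forces Q low when its signal is low, and a NAND gate producing an active-low pulse. The sketch assumes unit time steps, zero flip-flop and gate delay, and an ideal inverting delay; the function names are hypothetical.

```python
def simulate_stage_falling(inp, t0):
    """Simulate one clock driving stage as described for FIG. 3A.

    inp : list of 0/1 samples of the stage input (S301)
    t0  : delay of the inverting delay module, in samples (t0 >= 1)
    Returns (s303, s304): delay-module output and active-low pulse clock.
    """
    q = 0                      # S302 rests at low level
    prev = inp[0]
    q_hist, s303, s304 = [], [], []
    for t, clk in enumerate(inp):
        delayed_q = q_hist[t - t0] if t >= t0 else 0
        fb = 1 - delayed_q     # S303: inverted, delayed copy of S302
        if fb == 0:            # RESET low forces Q low
            q = 0
        elif prev and not clk: # falling edge on CPN loads D = 1
            q = 1
        q_hist.append(q)
        s303.append(fb)
        s304.append(1 - (q & fb))  # NAND of S302 and S303
        prev = clk
    return s303, s304


def low_runs(signal):
    """Lengths of the consecutive runs of 0s in a sampled signal."""
    runs, n = [], 0
    for v in signal + [1]:     # sentinel closes a run at the end
        if not v:
            n += 1
        elif n:
            runs.append(n)
            n = 0
    return runs


# Period T = 20 samples, falling edges at t = 10, 30, 50.
s301 = [1 if (t % 20) < 10 else 0 for t in range(70)]
s303, s304 = simulate_stage_falling(s301, t0=3)
print(low_runs(s304))                   # widths of the active-low pulses
print(s304.index(0), s303.index(0))     # first pulse start, first S303 falling edge
```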
  • the pulse width of the pulse clock generated by the pipeline clock driving circuit according to the present disclosure is determined by the time t0 delayed by the delay module.
  • The delay module is composed of inverters. The larger the number of inverters, the larger the pulse width of the generated pulse clock signal, and the lower the operating frequency of the pipeline. In engineering practice, it is generally desired to make the operating frequency of the pipeline as high as possible, provided that the pulse width of the pulse clock meets the requirements. For this reason, the present disclosure provides a further improved pipeline clock driving circuit in which the number of inverters constituting the delay module can be flexibly adjusted.
  • FIG. 4 shows a schematic diagram of a pipeline clock driving circuit 400 according to still other embodiments of the present disclosure.
  • the pipeline clock driving circuit 400 is used to provide a pulse clock signal for the pipeline. As shown in FIG. 4 , the pipeline clock driving circuit 400 includes a clock source 410 and a multi-stage clock driving circuit.
  • the first-stage clock driving circuit 420-1 is used to provide a pulse clock signal for the last-stage operation stage 401-N of the pipeline.
  • the first stage clock driving circuit 420-1 includes a flip-flop 430-1, a delay module 440-1 and a combinational logic module 450-1.
  • The configuration of the flip-flop 430-1, the delay module 440-1 and the combinational logic module 450-1 is similar to that of the embodiment shown in FIG. 2A, and the timing of generating the pulse clock signal is similar to the timing shown in FIG. 2B, so they are not repeated here.
  • The delay module 440-1 is composed of multiple inverters and one or more data selectors.
  • The data selectors are configured such that the inverters form a plurality of signal paths, and the number of inverters in each signal path is odd.
  • In the embodiment shown in FIG. 4, the delay module 440-1 is composed of seven inverters and three data selectors, which form four signal paths containing one, three, five and seven inverters, respectively. Therefore, by switching the states of the three data selectors, the generated pulse clock signal can have four different pulse widths (corresponding to the sum of the delay times of one, three, five and seven inverters, respectively).
  • the state of the data selector can be changed flexibly and conveniently according to the actual pulse width requirement, so that the operating frequency of the pipeline can be as high as possible, thereby improving the working efficiency of the chip.
  • the configuration of the delay module 440-1 shown in FIG. 4 is only an example.
  • The delay module 440-1 can be composed of any appropriate number of inverters and data selectors in any appropriate configuration to form multiple signal paths, so that each signal path includes an appropriate number of inverters.
  • the number of inverters in each signal path is different.
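  • Choosing a data-selector setting can be sketched as picking the shortest path whose delay still meets the required pulse width. The sketch assumes that every inverter contributes the same delay and that the data selectors add none, which is a simplification; the names and picosecond values are hypothetical.

```python
# Inverter counts of the four signal paths in the FIG. 4 embodiment.
PATH_INVERTERS = (1, 3, 5, 7)


def pulse_width_ps(path_index, inverter_delay_ps):
    """Pulse width for a path, assuming identical inverters and ideal selectors."""
    return PATH_INVERTERS[path_index] * inverter_delay_ps


def shortest_adequate_path(min_pulse_ps, inverter_delay_ps):
    """Index of the shortest path meeting the minimum pulse width, or None."""
    for index in range(len(PATH_INVERTERS)):
        if pulse_width_ps(index, inverter_delay_ps) >= min_pulse_ps:
            return index
    return None


# With 30 ps per inverter and a 100 ps requirement, the five-inverter path is chosen.
choice = shortest_adequate_path(min_pulse_ps=100, inverter_delay_ps=30)
print(choice, pulse_width_ps(choice, 30))
```

  • Selecting the shortest adequate path keeps the pulse as narrow as the requirement allows, which is what lets the pipeline frequency stay as high as possible.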
  • The arithmetic circuit can be implemented in various appropriate ways, such as software, hardware, or a combination of software and hardware.
  • A computing chip may include one or more of the pipeline clock driving circuits described above.
  • A hash board may include one or more computing chips.
  • A computing device may include one or more hash boards. Multiple hash boards can perform computing tasks in parallel.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Logic Circuits (AREA)
  • Power Sources (AREA)
  • Pulse Circuits (AREA)

Abstract

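The present disclosure relates to a pipeline clock driving circuit, a computing chip, a hash board and a computing device. Disclosed is a pipeline clock driving circuit that provides pulse clock signals to a pipeline comprising multiple operation stages, comprising: multi-stage clock driving circuits, each stage of which provides a pulse clock signal to a corresponding operation stage; and a clock source, coupled to the input of the first-stage clock driving circuit, for providing a basic clock signal, wherein the input of each other stage of clock driving circuit is coupled to the output of the preceding stage, and each stage of clock driving circuit comprises: a flip-flop, coupled to the input of that stage; a delay module, coupled to the output of the flip-flop, which delays the pulse signal output by the flip-flop, feeds it back to the flip-flop and outputs it to the next stage of clock driving circuit; and a combinational logic module, coupled to the outputs of the flip-flop and the delay module, which performs a combinational logic operation on them to generate the pulse clock signal provided to the corresponding operation stage of the pipeline.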

Description

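Pipeline clock driving circuit, computing chip, hash board and computing device
Cross-reference to related applications
This application is based on, and claims priority to, CN application No. 202111174118.2, filed on October 9, 2021, the disclosure of which is incorporated herein in its entirety.
Technical field
The present disclosure relates to circuits for performing hash algorithms. More specifically, it relates to a pipeline clock driving circuit, and to a computing chip, a hash board and a computing device that include the pipeline clock driving circuit.
Background
Mining-machine chips used to generate cryptocurrency usually adopt a pipeline structure comprising multiple operation stages. Depending on the algorithm used, the operation logic is divided into several operation stages, each of which has a similar functional design and operation structure. In particular, when latches are used as the sequential devices in the operation stages of the pipeline, the latches in each operation stage require a working clock (i.e., a pulse clock). Therefore, each operation stage receives a pulse clock from a corresponding stage of clock driving circuit. Usually, the working clocks of all operation stages originate from the same clock source, and the clock signal generated by that clock source is passed stage by stage through the pipeline clock driving circuit.
The basic principle of generating the working clock for the latches of each operation stage is to feed both the input clock signal of that stage's clock driving circuit and a delayed copy of that input clock signal into a gate circuit (such as a NOR gate or a NAND gate) to generate the pulse clock, where the delayed input clock signal is produced by passing the input clock signal through a delay module. The width of the pulse clock is essentially determined by the delay time of the delay module.
Note that the width of the pulse clock must meet the minimum pulse width requirement of the pipeline. That is, while the pulse clock is active, the state (high or low) of the input clock signal of that stage's clock driving circuit must remain unchanged, so that the generated pulse clock holds its state for at least the minimum pulse width. Therefore, the duty cycle of the input clock signal of each stage of clock driving circuit must meet certain requirements.
Summary
One object of the present disclosure is to provide an improved pipeline clock driving circuit.
According to one aspect of the present disclosure, there is provided a pipeline clock driving circuit for providing pulse clock signals to a pipeline comprising multiple operation stages, the pipeline clock driving circuit comprising: multi-stage clock driving circuits, wherein each stage of clock driving circuit provides a pulse clock signal to a corresponding one of the operation stages of the pipeline; and a clock source, coupled to the input of the first-stage clock driving circuit, for providing a basic clock signal, wherein the input of each stage of clock driving circuit other than the first stage is coupled to the output of the preceding stage of clock driving circuit, and wherein each stage of clock driving circuit comprises: a flip-flop, coupled to the input of that stage of clock driving circuit; a delay module, coupled to the output of the flip-flop, the delay module delaying the pulse signal output by the flip-flop, feeding the delayed pulse signal back to the flip-flop and outputting it to the next stage of clock driving circuit; and a combinational logic module, coupled to the outputs of the flip-flop and the delay module, the combinational logic module performing a combinational logic operation on the pulse signal output by the flip-flop and the delayed pulse signal output by the delay module to generate the pulse clock signal provided to the corresponding one of the operation stages of the pipeline.
According to another aspect of the present disclosure, there is provided a computing chip comprising one or more pipeline clock driving circuits as described above.
According to yet another aspect of the present disclosure, there is provided a hash board comprising one or more computing chips as described above.
According to yet another aspect of the present disclosure, there is provided a computing device comprising one or more hash boards as described above.
Other features and advantages of the present disclosure will become clear from the following description with reference to the accompanying drawings.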
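Brief description of the drawings
The accompanying drawings are included for illustrative purposes and serve only to provide examples of possible structures and arrangements of the inventive apparatus disclosed herein and of methods of applying it to computing devices. These drawings in no way limit any changes in form and detail that those skilled in the art may make to the embodiments without departing from the spirit and scope of the embodiments. The embodiments will be more readily understood from the following detailed description taken in conjunction with the accompanying drawings, in which like reference numerals denote like structural elements.
FIG. 1 shows a schematic diagram of a pipeline clock driving circuit of the related art.
FIG. 2A shows a schematic diagram of a pipeline clock driving circuit according to some embodiments of the present disclosure.
FIG. 2B shows a timing diagram of the pulse clock signals generated by a pipeline clock driving circuit according to some embodiments of the present disclosure.
FIG. 3A shows a schematic diagram of a pipeline clock driving circuit according to other embodiments of the present disclosure.
FIG. 3B shows a timing diagram of the pulse clock signals generated by a pipeline clock driving circuit according to other embodiments of the present disclosure.
FIG. 4 shows a schematic diagram of a pipeline clock driving circuit according to further embodiments of the present disclosure.
Note that in the embodiments described below, the same reference numeral is sometimes used in common across different drawings to denote the same part or parts having the same function, and repeated description thereof is omitted. In this specification, similar numerals and letters denote similar items, so once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
For ease of understanding, the position, size, range, etc. of each structure shown in the drawings sometimes do not represent the actual position, size, range, etc. Therefore, the disclosed invention is not limited to the positions, sizes, ranges, etc. disclosed in the drawings. In addition, the drawings are not necessarily drawn to scale, and some features may be exaggerated to show details of particular components.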
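Detailed description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure or its application or uses. That is, the hash engines herein are shown by way of example to illustrate different embodiments of the circuits of the present disclosure, and are not intended to be limiting. Those skilled in the art will understand that they merely illustrate exemplary, not exhaustive, ways in which the present disclosure may be implemented.
Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be regarded as part of the granted specification.
FIG. 1 shows a schematic diagram of a pipeline clock driving circuit 100 of the related art. The pipeline clock driving circuit 100 provides pulse clock signals to a pipeline 101 comprising multiple operation stages 101-1, …, 101-(N-1), 101-N.
As shown in FIG. 1, the pipeline clock driving circuit 100 comprises a clock source 110 and multi-stage clock driving circuits 120-1, 120-2, …, 120-N. The clock source 110 is coupled to the input of the first-stage clock driving circuit 120-1 and provides a basic clock signal. Each stage of the clock driving circuits 120-1, 120-2, …, 120-N provides a pulse clock signal to a corresponding one of the operation stages 101-N, 101-(N-1), …, 101-1 of the pipeline 101.
Each stage of clock driving circuit 120-1, 120-2, …, 120-N comprises a delay module 130-1, 130-2, …, 130-N and a combinational logic module 140-1, 140-2, …, 140-N (such as a NOR gate or a NAND gate). The delay modules 130-1, 130-2, …, 130-N delay the input clock signal of the respective stage of clock driving circuit 120-1, 120-2, …, 120-N. The combinational logic modules 140-1, 140-2, …, 140-N perform a logic operation (such as NOR or NAND) on the input clock signal of the respective stage and the input clock signal delayed by the delay module 130-1, 130-2, …, 130-N, and output the result as the output pulse clock signal of that stage, which is provided to the corresponding one of the operation stages 101-N, 101-(N-1), …, 101-1 of the pipeline 101.
Usually, the duty cycle of the basic clock signal generated by the clock source is 0.5. However, as the clock signal is passed stage by stage through the pipeline clock driving circuit, its duty cycle becomes progressively worse. The main cause of this degradation is the accumulation of manufacturing errors in combinational logic devices. While being passed stage by stage through the pipeline clock driving circuit, the clock signal traverses many combinational logic devices such as buffers and inverters. Because of the manufacturing process, the performance parameters of these devices deviate from their nominal values, and these deviations shift the clock duty cycle. Moreover, as the clock signal propagates from stage to stage, the effects of the parameter errors of the combinational logic devices in each stage keep accumulating, so the duty-cycle deviation gradually grows. Consequently, the farther a clock driving circuit is from the clock source, the worse the duty cycle of its input clock signal and, correspondingly, the worse the pulse clock it generates, to the point that the minimum pulse width requirement of the corresponding operation stage may not be met.
That is, because of manufacturing errors in the combinational logic devices (such as buffers and inverters) of the delay modules 130-1, 130-2, …, 130-N, the duty cycle of the clock signal deviates. As the basic clock signal output by the clock source 110 is passed through the stages of clock driving circuits 120-1, 120-2, …, 120-N, this duty-cycle deviation gradually accumulates, so that the farther an operation stage is from the clock source 110 (operation stage 101-1 being the farthest), the worse its clock pulse width, until the requirement of that operation stage on the pulse width of its working clock can no longer be met.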
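The accumulation described above can be illustrated with a rough numerical sketch. The model is an assumption made purely for illustration and is not the circuit of FIG. 1: each stage is assumed to shorten the high phase of the clock it passes on by a fixed, hypothetical `skew_ps`, and the pulse a stage produces is taken to be the shorter of the delay-module delay `t0_ps` and the remaining high phase. All numbers are made up.

```python
def related_art_pulse_widths(period_ps, t0_ps, skew_ps, n_stages):
    """Pulse width seen at each stage of a related-art style chain.

    Simplified, hypothetical model: every stage shortens the high phase
    of the clock it passes on by skew_ps, and a stage can only hold its
    pulse for as long as its input clock stays high.
    """
    high_ps = period_ps // 2          # basic clock starts at 50% duty cycle
    widths = []
    for _ in range(n_stages):
        widths.append(min(t0_ps, max(high_ps, 0)))
        high_ps -= skew_ps            # duty-cycle error accumulates
    return widths


widths = related_art_pulse_widths(period_ps=2000, t0_ps=400, skew_ps=50, n_stages=20)
print(widths[0], widths[-1])  # the first stage keeps the full width, the last does not
```

Under these assumed numbers the early stages still produce the full 400 ps pulse, while stages far from the clock source are truncated because their input clock no longer stays high long enough.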
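To address this problem, the present disclosure proposes an improved pipeline clock driving circuit in which the pulse width of the pulse clock signal generated by each stage of clock driving circuit does not depend on its input clock signal.
FIG. 2A shows a schematic diagram of a pipeline clock driving circuit 200 according to some embodiments of the present disclosure. FIG. 2B shows a timing diagram of the pulse clock signals generated by the pipeline clock driving circuit 200 according to some embodiments of the present disclosure.
The pipeline clock driving circuit 200 provides pulse clock signals to a pipeline 201 comprising multiple operation stages 201-1, …, 201-N.
As shown in FIG. 2A, the pipeline clock driving circuit 200 comprises a clock source 210 and multi-stage clock driving circuits 220-1, …, 220-N.
The clock source 210 is coupled to the input of the first-stage clock driving circuit 220-1 and provides a basic clock signal. The duty cycle of the basic clock signal provided by the clock source 210 may be 0.5, and its frequency may be several hundred megahertz, for example 400-700 MHz.
The input of each stage of the clock driving circuits 220-1, …, 220-N other than the first-stage clock driving circuit 220-1 is coupled to the output of the preceding stage of clock driving circuit, and each stage of clock driving circuit 220-1, …, 220-N provides a pulse clock signal to a corresponding one of the operation stages 201-N, …, 201-1 of the pipeline 201.
Each stage of clock driving circuit 220-1, …, 220-N comprises a flip-flop 230-1, …, 230-N, a delay module 240-1, …, 240-N and a combinational logic module 250-1, …, 250-N.
The flip-flops 230-1, …, 230-N are coupled to the inputs of their respective stages of clock driving circuit. That is, the flip-flop 230-1 in the first-stage clock driving circuit 220-1 is coupled to the output of the clock source 210, while the flip-flops in the other stages are coupled to the output of the preceding stage of clock driving circuit. The flip-flops 230-1, …, 230-N may be edge-triggered flip-flops. The type and connection of the flip-flops 230-1, …, 230-N can be configured as needed.
FIG. 2A shows an embodiment in which the flip-flops 230-1, …, 230-N are rising-edge D flip-flops. In the embodiment shown in FIG. 2A, the SET terminal of each flip-flop 230-1, …, 230-N is coupled to the output of the delay module 240-1, …, 240-N, the D terminal is tied to a low level (i.e., logic "0"), the CP terminal is coupled to the output of the preceding stage of clock driving circuit (or, for the first stage, of the clock source 210), and the output terminal Q is coupled to the delay module 240-1, …, 240-N as its input. While the SET signal of the rising-edge D flip-flop is high, the output terminal Q is always high. While the SET signal is low, the output terminal Q takes the value at the D terminal whenever a rising edge arrives at the CP terminal. In other embodiments, the flip-flops 230-1, …, 230-N may be, for example, falling-edge flip-flops, and their connection may be adjusted accordingly (described in detail below for the embodiment shown in FIG. 3A).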
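The SET and CP behaviour just described can be captured in a small behavioural model. This is only a sketch: the class name `RisingEdgeDFFWithSet` is hypothetical, and setup/hold times and propagation delay are ignored.

```python
class RisingEdgeDFFWithSet:
    """Behavioural model of a rising-edge D flip-flop with an active-high SET."""

    def __init__(self, q=1):
        self.q = q
        self._prev_cp = 0

    def update(self, cp, d, set_):
        if set_:                          # SET high forces Q high
            self.q = 1
        elif cp and not self._prev_cp:    # rising edge on CP while SET is low
            self.q = d                    # Q takes the value on D
        self._prev_cp = cp
        return self.q


ff = RisingEdgeDFFWithSet()
steps = [(0, 0), (1, 0), (1, 1), (0, 0), (1, 0)]   # (CP, SET) pairs, D tied low
trace = [ff.update(cp, d=0, set_=s) for cp, s in steps]
print(trace)
```

With D tied low, Q drops on each rising edge of CP and is pulled back high as soon as SET goes high, which is the mechanism the feedback through the delay module relies on.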
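The input of each delay module 240-1, …, 240-N is coupled to the output of the flip-flop 230-1, …, 230-N. The delay modules 240-1, …, 240-N delay the pulse signal output by the flip-flops 230-1, …, 230-N, feed the delayed pulse signal back to the flip-flops 230-1, …, 230-N and output it to the next stage of clock driving circuit. In preferred embodiments, the delay modules 240-1, …, 240-N also invert the pulse signal output by the flip-flops 230-1, …, 230-N. The delay modules 240-1, …, 240-N may be implemented with a number of buffers and/or inverters. In preferred embodiments, as shown in FIG. 2A, the delay modules 240-1, …, 240-N may consist of an odd number of inverters. In other embodiments, the delay modules 240-1, …, 240-N may consist of a number of buffers and an odd number of inverters.
The combinational logic modules 250-1, …, 250-N are coupled to the outputs of the flip-flops 230-1, …, 230-N and of the delay modules 240-1, …, 240-N. The combinational logic modules 250-1, …, 250-N perform a combinational logic operation on the pulse signal output by the flip-flops 230-1, …, 230-N and the delayed pulse signal output by the delay modules 240-1, …, 240-N to generate the pulse clock signal provided to the corresponding one of the operation stages 201-N, …, 201-1 of the pipeline 201. The combinational logic modules 250-1, …, 250-N can be designed according to the type of the flip-flops 230-1, …, 230-N. In the embodiment shown in FIG. 2A, where the flip-flops 230-1, …, 230-N are rising-edge D flip-flops, the combinational logic modules 250-1, …, 250-N may consist of OR gates or NOR gates. In other embodiments, the combinational logic modules 250-1, …, 250-N may consist of AND gates or NAND gates (described in detail below for the embodiment shown in FIG. 3A).
In preferred embodiments, as shown in FIG. 2A, the direction in which the pulse signal is passed through the multi-stage clock driving circuits 220-1, …, 220-N is opposite to the direction in which the data signal is passed through the operation stages 201-1, …, 201-N of the pipeline. That is, the first-stage clock driving circuit 220-1 provides the pulse clock signal to the last operation stage 201-N, the last-stage clock driving circuit 220-N provides the pulse clock signal to the first operation stage 201-1, and so on. Such an arrangement makes it easier to meet the timing requirements of the operation stages 201-1, …, 201-N.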
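The reversed mapping between clock driving stages and operation stages can be written down explicitly. The sketch below uses hypothetical helper names and assumes, as the timing description of FIG. 2B states, that each clock driving stage passes its trigger edge on t0 later; gate and wire delays are ignored.

```python
def served_operation_stage(driver_index, n_stages):
    """1-based index of the operation stage clocked by clock driver `driver_index`."""
    return n_stages - driver_index + 1


def pulse_start_ps(operation_stage, n_stages, t0_ps):
    """Start of the pulse at an operation stage, relative to the first driver.

    Assumes each clock driving stage passes the trigger edge on t0_ps later.
    """
    driver_index = n_stages - operation_stage + 1
    return (driver_index - 1) * t0_ps


N = 4
print([served_operation_stage(k, N) for k in range(1, N + 1)])
print([pulse_start_ps(j, N, t0_ps=100) for j in range(1, N + 1)])
```

In this model the last operation stage is clocked first and the first operation stage last, so a downstream latch has already captured its input by the time the upstream latch updates.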
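With reference to FIG. 2B, the timing with which a pulse clock signal is generated is described below, taking the first-stage clock driving circuit 220-1 as an example.
The CP terminal of the flip-flop 230-1 receives the basic clock signal S201 from the clock source 210 as its input signal (correspondingly, the CP terminal of the flip-flop in each subsequent stage receives, as its input signal, the output signal S203 from the output of the delay module in the preceding stage of clock driving circuit), and at the output terminal Q the flip-flop provides the pulse signal S202 to the delay module 240-1 and to one input of the combinational logic module 250-1 (a NOR gate in this embodiment). The delay module 240-1 inverts and delays the pulse signal S202 to obtain the output signal S203, and provides the output signal S203 to the SET terminal of the flip-flop 230-1, to the other input of the combinational logic module 250-1, and to the next stage of clock driving circuit as its input signal.
After the whole system is powered up and before the clock source 210 outputs the basic clock signal S201, the pulse signal S202 at the output terminal Q of the flip-flop 230-1 settles at a high level. The output signal S203 of the delay module 240-1 therefore settles at a low level; that is, the SET terminal of the flip-flop 230-1 is low, and the input signal of the next stage of clock driving circuit (which corresponds to the input signal S201 of the first-stage clock driving circuit 220-1) is likewise low. The inputs of the combinational logic module 250-1 (NOR gate) are thus high (S202) and low (S203), and the pulse clock signal S204 it outputs is low.
At time t1, the clock source 210 starts to output the basic clock signal S201. The period of the basic clock signal S201 is T.
As shown in FIG. 2B, when the signal S201 changes from low to high, a rising edge arrives at the CP terminal of the flip-flop 230-1 while the SET signal (S203) is still low, so the signal S202 at the output terminal Q of the flip-flop 230-1 takes the value at the D terminal, i.e., a low level. The inputs of the combinational logic module 250-1 (NOR gate) are then low (S202) and low (S203), and the pulse clock signal S204 it outputs goes high.
After t0 has elapsed, at time t2, the output signal S203 of the delay module 240-1 goes high. t0 is the delay between the signal S203 and the signal S202 and is determined by the configuration of the delay module 240-1. In the embodiment shown in FIG. 2A, t0 is the sum of the delay times of the inverters in the delay module 240-1.
Then, as shown in FIG. 2B, on the one hand the SET terminal of the flip-flop 230-1 goes high, so the signal S202 at the output terminal Q of the flip-flop 230-1 goes high. On the other hand, the inputs of the combinational logic module 250-1 (NOR gate) are high (S202) and high (S203), and the pulse clock signal S204 it outputs goes low.
After a further t0, at time t3, the output signal S203 of the delay module 240-1 goes low.
Then, as shown in FIG. 2B, on the one hand the SET terminal of the flip-flop 230-1 goes low, but no rising edge has yet arrived at the CP terminal, so the signal S202 at the output terminal Q of the flip-flop 230-1 remains high. On the other hand, the inputs of the combinational logic module 250-1 (NOR gate) are high (S202) and low (S203), and the pulse clock signal S204 it outputs remains low.
Thereafter, the values of the signals S202, S203 and S204 remain unchanged until time t4, when the next period of the basic clock signal S201 begins. One period T of the basic clock signal S201 elapses between time t1 and time t4.
At time t4, the signal S201 goes high.
As shown in FIG. 2B, when the signal S201 changes from low to high, a rising edge arrives at the CP terminal of the flip-flop 230-1 while the SET signal (S203) is still low, so the signal S202 at the output terminal Q of the flip-flop 230-1 goes low. The pulse clock signal S204 at the output of the combinational logic module 250-1 (NOR gate) then goes high.
After t0 has elapsed, at time t5, the output signal S203 of the delay module 240-1 goes high.
Then, as shown in FIG. 2B, on the one hand the SET terminal of the flip-flop 230-1 goes high, so the signal S202 at the output terminal Q of the flip-flop 230-1 goes high. On the other hand, the pulse clock signal S204 at the output of the combinational logic module 250-1 goes low.
After a further t0, at time t6, the output signal S203 of the delay module 240-1 goes low.
Then, as shown in FIG. 2B, the signal S202 at the output terminal Q of the flip-flop 230-1 remains high, and the pulse clock signal S204 at the output of the combinational logic module 250-1 remains low.
In this way, a pulse clock signal S204 with period T and pulse width t0 is produced at the output of the combinational logic module 250-1. This pulse clock signal S204 is provided to the corresponding operation stage 201-N as its working clock.
In addition, the output signal S203 produced at the output of the delay module 240-1 also serves as the input signal of the next stage of clock driving circuit (playing the role that the input signal S201 plays for the first-stage clock driving circuit 220-1). The rising edge of the output signal S203 triggers the flip-flop of the next stage of clock driving circuit. As shown in FIG. 2B, the rising edge of the output signal S203 lags the rising edge of the input signal S201 by t0. Likewise, the rising edge of the output signal produced by every stage of clock driving circuit lags the rising edge of that stage's input signal by t0, which matches the operating needs of the operation stages of the pipeline.
In this way, the pulse width of the pulse clock signal generated by each stage of clock driving circuit is t0, which is determined solely by the configuration of that stage and is independent of its input signal. Although manufacturing errors in the combinational logic devices of each stage may still cause the pulse widths of the input and output signals of each stage to deviate, the pulse width of the pulse clock signal generated by each stage does not depend on the pulse width of its input signal, so this deviation does not keep accumulating as the signal is passed through the stages of clock driving circuits. In other words, the possible deviation of the pulse width of the pulse clock signal generated by a stage is unrelated to possible manufacturing errors of the combinational logic devices in the preceding stages, and depends only on those of the combinational logic devices in that stage itself. Such manufacturing errors are usually small, so the resulting pulse-width deviation is acceptable.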
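FIG. 3A shows a schematic diagram of a pipeline clock driving circuit 300 according to other embodiments of the present disclosure. FIG. 3B shows a timing diagram of the pulse clock signals generated by the pipeline clock driving circuit 300 according to other embodiments of the present disclosure.
The pipeline clock driving circuit 300 provides pulse clock signals to a pipeline 301 comprising multiple operation stages 301-1, …, 301-N. As shown in FIG. 3A, the pipeline clock driving circuit 300 comprises a clock source 310 and multi-stage clock driving circuits 320-1, …, 320-N.
The clock source 310 is coupled to the input of the first-stage clock driving circuit 320-1 and provides a basic clock signal. The input of each stage of the clock driving circuits 320-1, …, 320-N other than the first-stage clock driving circuit 320-1 is coupled to the output of the preceding stage of clock driving circuit, and each stage of clock driving circuit 320-1, …, 320-N provides a pulse clock signal to a corresponding one of the operation stages 301-N, …, 301-1 of the pipeline 301.
Each stage of clock driving circuit 320-1, …, 320-N comprises a flip-flop 330-1, …, 330-N, a delay module 340-1, …, 340-N and a combinational logic module 350-1, …, 350-N.
The flip-flops 330-1, …, 330-N are coupled to the inputs of their respective stages of clock driving circuit. That is, the flip-flop 330-1 in the first-stage clock driving circuit 320-1 is coupled to the output of the clock source 310, while the flip-flops in the other stages are coupled to the output of the preceding stage of clock driving circuit.
FIG. 3A shows an embodiment in which the flip-flops 330-1, …, 330-N are falling-edge D flip-flops. In the embodiment shown in FIG. 3A, the RESET terminal of each flip-flop 330-1, …, 330-N is coupled to the output of the delay module 340-1, …, 340-N, the D terminal is tied to a high level (i.e., logic "1"), the CPN terminal is coupled to the output of the preceding stage of clock driving circuit, and the output terminal Q is coupled to the delay module 340-1, …, 340-N as its input. While the RESET signal of the falling-edge D flip-flop is low, the output terminal Q is always low. While the RESET signal is high, the output terminal Q takes the value at the D terminal whenever a falling edge arrives at the CPN terminal.
The input of each delay module 340-1, …, 340-N is coupled to the output of the flip-flop 330-1, …, 330-N. The delay modules 340-1, …, 340-N delay the pulse signal output by the flip-flops 330-1, …, 330-N, feed the delayed pulse signal back to the flip-flops 330-1, …, 330-N and output it to the next stage of clock driving circuit. In preferred embodiments, the delay modules 340-1, …, 340-N also invert the pulse signal output by the flip-flops 330-1, …, 330-N. In the embodiment shown in FIG. 3A, the delay modules 340-1, …, 340-N may consist of an odd number of inverters.
The combinational logic modules 350-1, …, 350-N are coupled to the outputs of the flip-flops 330-1, …, 330-N and of the delay modules 340-1, …, 340-N. The combinational logic modules 350-1, …, 350-N perform a combinational logic operation on the pulse signal output by the flip-flops 330-1, …, 330-N and the delayed pulse signal output by the delay modules 340-1, …, 340-N to generate the pulse clock signal provided to the corresponding one of the operation stages 301-N, …, 301-1 of the pipeline 301. In the embodiment shown in FIG. 3A, where the flip-flops 330-1, …, 330-N are falling-edge D flip-flops, the combinational logic modules 350-1, …, 350-N may consist of NAND gates.
With reference to FIG. 3B, the timing with which a pulse clock signal is generated is described below, taking the first-stage clock driving circuit 320-1 as an example.
The CPN terminal of the flip-flop 330-1 receives the basic clock signal S301 from the clock source 310 as its input signal (correspondingly, the CPN terminal of the flip-flop in each subsequent stage receives, as its input signal, the output signal S303 from the output of the delay module in the preceding stage of clock driving circuit), and at the output terminal Q the flip-flop provides the pulse signal S302 to the delay module 340-1 and to one input of the combinational logic module 350-1 (a NAND gate in this embodiment). The delay module 340-1 inverts and delays the pulse signal S302 to obtain the output signal S303, and provides the output signal S303 to the RESET terminal of the flip-flop 330-1, to the other input of the combinational logic module 350-1, and to the next stage of clock driving circuit as its input signal.
After the whole system is powered up and before the clock source 310 outputs the basic clock signal S301, the pulse signal S302 at the output terminal Q of the flip-flop 330-1 settles at a low level. The output signal S303 of the delay module 340-1 therefore settles at a high level; that is, the RESET terminal of the flip-flop 330-1 is high, and the input signal of the next stage of clock driving circuit (which corresponds to the input signal S301 of the first-stage clock driving circuit 320-1) is likewise high. The inputs of the combinational logic module 350-1 (NAND gate) are thus low (S302) and high (S303), and the pulse clock signal S304 it outputs is high.
At time t1, the clock source 310 starts to output the basic clock signal S301. The period of the basic clock signal S301 is T.
As shown in FIG. 3B, when the signal S301 changes from high to low, a falling edge arrives at the CPN terminal of the flip-flop 330-1 while the RESET signal (S303) is still high, so the signal S302 at the output terminal Q of the flip-flop 330-1 takes the value at the D terminal, i.e., a high level. The inputs of the combinational logic module 350-1 (NAND gate) are then high (S302) and high (S303), and the pulse clock signal S304 it outputs goes low.
After t0 has elapsed, at time t2, the output signal S303 of the delay module 340-1 goes low. t0 is the delay between the signal S303 and the signal S302 and is determined by the configuration of the delay module 340-1. In the embodiment shown in FIG. 3A, t0 is the sum of the delay times of the inverters in the delay module 340-1.
Then, as shown in FIG. 3B, on the one hand the RESET terminal of the flip-flop 330-1 goes low, so the signal S302 at the output terminal Q of the flip-flop 330-1 goes low. On the other hand, the inputs of the combinational logic module 350-1 (NAND gate) are low (S302) and low (S303), and the pulse clock signal S304 it outputs goes high.
After a further t0, at time t3, the output signal S303 of the delay module 340-1 goes high.
Then, as shown in FIG. 3B, on the one hand the RESET terminal of the flip-flop 330-1 goes high, but no falling edge has yet arrived at the CPN terminal, so the signal S302 at the output terminal Q of the flip-flop 330-1 remains low. On the other hand, the inputs of the combinational logic module 350-1 (NAND gate) are low (S302) and high (S303), and the pulse clock signal S304 it outputs remains high.
Thereafter, the values of the signals S302, S303 and S304 remain unchanged until time t4, when the next period of the basic clock signal S301 begins. One period T of the basic clock signal S301 elapses between time t1 and time t4.
At time t4, the signal S301 goes low.
As shown in FIG. 3B, when the signal S301 changes from high to low, a falling edge arrives at the CPN terminal of the flip-flop 330-1, so the signal S302 at the output terminal Q of the flip-flop 330-1 goes high. The pulse clock signal S304 at the output of the combinational logic module 350-1 (NAND gate) then goes low.
After t0 has elapsed, at time t5, the output signal S303 of the delay module 340-1 goes low.
Then, as shown in FIG. 3B, on the one hand the RESET terminal of the flip-flop 330-1 goes low, so the signal S302 at the output terminal Q of the flip-flop 330-1 goes low. On the other hand, the pulse clock signal S304 at the output of the combinational logic module 350-1 goes high.
After a further t0, at time t6, the output signal S303 of the delay module 340-1 goes high.
Then, as shown in FIG. 3B, the signal S302 at the output terminal Q of the flip-flop 330-1 remains low, and the pulse clock signal S304 at the output of the combinational logic module 350-1 remains high.
In this way, a pulse clock signal S304 with period T and pulse width t0 is produced at the output of the combinational logic module 350-1. This pulse clock signal S304 is provided to the corresponding operation stage 301-N as its working clock.
In addition, the output signal S303 produced at the output of the delay module 340-1 also serves as the input signal of the next stage of clock driving circuit (playing the role that the input signal S301 plays for the first-stage clock driving circuit 320-1). The falling edge of the output signal S303 triggers the flip-flop of the next stage of clock driving circuit. As shown in FIG. 3B, the falling edge of the output signal S303 lags the falling edge of the input signal S301 by t0. Likewise, the falling edge of the output signal produced by every stage of clock driving circuit lags the falling edge of that stage's input signal by t0, which matches the operating needs of the operation stages of the pipeline.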
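The falling-edge variant can be checked the same way. The sketch below is a hypothetical discrete-time model of one FIG. 3A style stage, assuming an ideal inverting delay of `t0` samples, an active-low RESET, D tied high, and zero gate delay; the input is assumed to rest high before the first sample.

```python
from collections import deque


def simulate_falling_edge_stage(clock, t0):
    """Return (S302, S303, S304) sample lists for one FIG. 3A style stage."""
    q = 0                          # S302 rests low before the clock starts
    line = deque([q] * t0)         # past S302 samples inside the delay module
    prev = 1                       # input assumed high before the first sample
    s302, s303, s304 = [], [], []
    for cpn in clock:
        delayed = 1 - line[0]      # S303: S302 from t0 samples ago, inverted
        if not delayed:            # RESET low forces Q low
            q = 0
        elif prev and not cpn:     # falling edge with RESET high: Q takes D = 1
            q = 1
        prev = cpn
        line.popleft()
        line.append(q)
        s302.append(q)
        s303.append(delayed)
        s304.append(int(not (q and delayed)))  # NAND gate
    return s302, s303, s304


clock = ([1] * 5 + [0] * 5) * 2          # falling edges at samples 5 and 15
s302, s303, s304 = simulate_falling_edge_stage(clock, t0=3)
low_samples = [i for i, v in enumerate(s304) if v == 0]
print(low_samples)
print(s303.index(0) - clock.index(0))    # lag of the S303 falling edge
```

Here the working clock S304 is an active-low pulse: it is low for `t0` samples after each falling edge of the input, and the falling edge handed to the next stage lags the input by the same `t0`.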
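As described above, the pulse width of the pulse clock generated by the pipeline clock driving circuit according to the present disclosure is determined by the time t0 by which the delay module delays the signal. In preferred embodiments, the delay module consists of inverters. The more inverters there are, the wider the generated pulse clock signal and the lower the operating frequency of the pipeline. In engineering practice, it is usually desirable to make the operating frequency of the pipeline as high as possible provided that the pulse width of the pulse clock meets the requirement. To this end, the present disclosure provides a further improved pipeline clock driving circuit in which the number of inverters constituting the delay module can be flexibly adjusted.
FIG. 4 shows a schematic diagram of a pipeline clock driving circuit 400 according to further embodiments of the present disclosure.
The pipeline clock driving circuit 400 provides pulse clock signals to a pipeline. As shown in FIG. 4, the pipeline clock driving circuit 400 comprises a clock source 410 and multi-stage clock driving circuits.
Taking the first-stage clock driving circuit 420-1 as an example, it provides the pulse clock signal to the last operation stage 401-N of the pipeline. The first-stage clock driving circuit 420-1 comprises a flip-flop 430-1, a delay module 440-1 and a combinational logic module 450-1. The flip-flop 430-1, the delay module 440-1 and the combinational logic module 450-1 are configured similarly to the embodiment shown in FIG. 2A, and the timing with which the pulse clock signal is generated is similar to the timing shown in FIG. 2B, so the details are not repeated here.
Unlike the embodiment shown in FIG. 2A, the delay module 440-1 consists of multiple inverters and one or more data selectors. The data selectors are configured such that the inverters form multiple signal paths, with an odd number of inverters in each signal path. In the embodiment shown in FIG. 4, the delay module 440-1 consists of seven inverters and three data selectors, which form four signal paths containing one, three, five and seven inverters respectively. Therefore, in the embodiment shown in FIG. 4, by switching the states of the three data selectors, the generated pulse clock signal can have four different pulse widths (corresponding to the sum of the delay times of one, three, five and seven inverters respectively).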
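A minimal sketch of such a selectable delay module follows. The text does not say how the three data selectors are wired or encoded, so the single `select` index (0 to 3) is a hypothetical stand-in for their combined state; identical inverter delays are assumed and the delay of the data selectors themselves is ignored.

```python
PATH_INVERTER_COUNTS = (1, 3, 5, 7)   # the four paths named for FIG. 4


def delay_module(bit, select, inverter_delay_ps):
    """Return (output_bit, delay_ps) for the path chosen by `select` (0..3)."""
    n_inverters = PATH_INVERTER_COUNTS[select]
    out = bit
    for _ in range(n_inverters):
        out ^= 1                   # each inverter flips the signal
    return out, n_inverters * inverter_delay_ps


for select in range(4):
    print(delay_module(1, select, inverter_delay_ps=20))
```

Because every path contains an odd number of inverters, the output is always the inverse of the input, so switching paths changes only the delay (and hence the pulse width), not the polarity that the feedback to the flip-flop relies on.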
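Thus, the state of the data selectors can be changed flexibly and conveniently according to the actual pulse width requirement, so that the operating frequency of the pipeline is as high as possible, thereby improving the working efficiency of the chip.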
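One way to read this selection rule is: pick the shortest path whose delay still meets the minimum pulse width. The helper below is a hypothetical illustration of that rule with made-up numbers; the disclosure does not specify a selection procedure.

```python
def narrowest_sufficient_path(min_pulse_ps, inverter_delay_ps, counts=(1, 3, 5, 7)):
    """Smallest inverter count whose total delay meets min_pulse_ps, or None."""
    for n in sorted(counts):
        if n * inverter_delay_ps >= min_pulse_ps:
            return n
    return None


print(narrowest_sufficient_path(90, 20))    # 3 inverters give 60 ps, 5 give 100 ps
print(narrowest_sufficient_path(150, 20))   # even 7 inverters give only 140 ps
```

Choosing the shortest sufficient path keeps the pulse no wider than necessary, which is what leaves room for a higher pipeline frequency.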
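The configuration of the delay module 440-1 shown in FIG. 4 is only an example. In other embodiments, the delay module 440-1 may be composed of any appropriate number of inverters and data selectors in any appropriate configuration to form multiple signal paths, such that each signal path includes an appropriate number of inverters. In preferred embodiments, the number of inverters differs from one signal path to another.
The arithmetic circuit according to the present disclosure may be implemented in various appropriate ways, such as software, hardware, or a combination of software and hardware. In one implementation, a computing chip may include one or more of the pipeline clock driving circuits described above. In one implementation, a hash board may include one or more computing chips. In one implementation, a computing device may include one or more hash boards. Multiple hash boards can perform computing tasks in parallel.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should also be understood that the terms "comprise/include", when used herein, specify the presence of the stated features, integers, steps, operations, units and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, units and/or components and/or combinations thereof.
Although some specific embodiments of the present disclosure have been shown in detail by way of example, those skilled in the art should understand that the above examples are intended to be illustrative only and do not limit the scope of the present disclosure. Those skilled in the art should understand that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.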

Claims (12)

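  1. A pipeline clock driving circuit for providing pulse clock signals to a pipeline comprising multiple operation stages, wherein the pipeline clock driving circuit comprises:
    multi-stage clock driving circuits, wherein each stage of clock driving circuit is used to provide a pulse clock signal to a corresponding one of the multiple operation stages of the pipeline; and
    a clock source, coupled to the input of the first-stage clock driving circuit, for providing a basic clock signal,
    wherein the input of each stage of the multi-stage clock driving circuits other than the first-stage clock driving circuit is coupled to the output of the preceding stage of clock driving circuit, and
    wherein each stage of clock driving circuit comprises:
    a flip-flop, coupled to the input of that stage of clock driving circuit;
    a delay module, coupled to the output of the flip-flop, the delay module being used to delay the pulse signal output by the flip-flop, feed the delayed pulse signal back to the flip-flop and output it to the next stage of clock driving circuit; and
    a combinational logic module, coupled to the outputs of the flip-flop and the delay module, the combinational logic module being configured to perform a combinational logic operation on the pulse signal output by the flip-flop and the delayed pulse signal output by the delay module to generate a pulse clock signal to be provided to the corresponding one of the operation stages of the pipeline.
  2. The pipeline clock driving circuit according to claim 1, wherein the flip-flop is a rising-edge flip-flop.
  3. The pipeline clock driving circuit according to claim 2, wherein the combinational logic module is an OR gate or a NOR gate.
  4. The pipeline clock driving circuit according to claim 1, wherein the flip-flop is a falling-edge flip-flop.
  5. The pipeline clock driving circuit according to claim 4, wherein the combinational logic module is an AND gate or a NAND gate.
  6. The pipeline clock driving circuit according to any one of claims 1-5, wherein the delay module consists of an odd number of inverters.
  7. The pipeline clock driving circuit according to any one of claims 1-5, wherein the delay module consists of multiple inverters and one or more data selectors, wherein the one or more data selectors are configured such that the multiple inverters form multiple signal paths, and the number of inverters in each signal path is odd.
  8. The pipeline clock driving circuit according to claim 7, wherein the number of inverters differs from one signal path to another.
  9. The pipeline clock driving circuit according to any one of claims 1-5, wherein the direction in which the pulse signal is passed through the multi-stage clock driving circuits is opposite to the direction in which the data signal is passed through the multiple operation stages of the pipeline.
  10. A computing chip, comprising one or more pipeline clock driving circuits according to any one of claims 1-9.
  11. A hash board, comprising one or more computing chips according to claim 10.
  12. A computing device, comprising one or more hash boards according to claim 11.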
PCT/CN2021/140016 2021-10-09 2021-12-21 Pipeline clock driving circuit, computing chip, hash board and computing device WO2023056709A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020237015561A KR102575572B1 (ko) 2021-10-09 2021-12-21 Pipeline clock driving circuit, computing chip, computing board and computing device
US17/795,777 US20240313753A1 (en) 2021-10-09 2021-12-21 Pipeline clock driving circuit, computing chip, hashboard and computing device
CA3165378A CA3165378A1 (en) 2021-10-09 2021-12-21 Pipeline clock driving circuit, computing chip, hashboard and computing device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111174118.2 2021-10-09
CN202111174118.2A CN113608575B (zh) 2021-10-09 2021-10-09 Pipeline clock driving circuit, computing chip, hash board and computing device

Publications (1)

Publication Number Publication Date
WO2023056709A1 true WO2023056709A1 (zh) 2023-04-13

Family

ID=78343409

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140016 WO2023056709A1 (zh) 2021-10-09 2021-12-21 Pipeline clock driving circuit, computing chip, hash board and computing device

Country Status (3)

Country Link
CN (1) CN113608575B (zh)
TW (1) TWI784864B (zh)
WO (1) WO2023056709A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737651A (zh) * 2023-06-27 2023-09-12 无锡中微亿芯有限公司 Low-power storage-computing architecture FPGA

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
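CN115081370B (zh) * 2022-06-27 2024-10-01 东科半导体(安徽)股份有限公司 Drive unit with flexibly configurable drive capability
CN115543016B (zh) * 2022-11-30 2023-03-10 苏州浪潮智能科技有限公司 Clock architecture and processing module
CN116088635A (zh) * 2023-02-02 2023-05-09 深圳比特微电子科技有限公司 Pipeline clock driving circuit, computing chip, hash board and computing device
CN118520837A (zh) * 2023-02-17 2024-08-20 华为技术有限公司 Chiplet and electronic device
CN116938198B (zh) * 2023-07-20 2024-06-21 上海奎芯集成电路设计有限公司 Pulse rising/falling edge delay circuit and pulse rising/falling edge delay chip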

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6323688B1 (en) * 1999-03-08 2001-11-27 Elbrus International Limited Efficient half-cycle clocking scheme for self-reset circuit
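CN111404550A (zh) * 2019-01-03 2020-07-10 无锡华润上华科技有限公司 Analog-to-digital converter and clock generation circuit thereof
CN112422116A (zh) * 2019-08-23 2021-02-26 长鑫存储技术有限公司 Multi-stage drive data transmission circuit and data transmission method
TW202131632A (zh) * 2020-06-22 2021-08-16 大陸商深圳比特微電子科技有限公司 Clock circuit system, computing chip, hash board and data processing device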

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5539337A (en) * 1994-12-30 1996-07-23 Intel Corporation Clock noise filter for integrated circuits
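CN101446843A (zh) * 2008-12-30 2009-06-03 北京中星微电子有限公司 High-frequency clock generator, clock frequency conversion method, and chip
CN104021246B (zh) * 2014-05-28 2017-02-15 复旦大学 Adaptive length predictor for low-power fault-tolerant circuits
US9590602B2 (en) * 2014-06-13 2017-03-07 Stmicroelectronics International N.V. System and method for a pulse generator
CN109120257B (zh) * 2018-08-03 2020-06-12 中国电子科技集团公司第二十四研究所 Low-jitter frequency-division clock circuit
CN111510137A (zh) * 2020-06-04 2020-08-07 深圳比特微电子科技有限公司 Clock circuit, computing chip, hash board and digital currency mining machine
CN212160484U (zh) * 2020-06-22 2020-12-15 深圳比特微电子科技有限公司 Clock circuit system, computing chip, hash board and digital currency mining machine
CN111651403B (zh) * 2020-07-16 2024-10-01 深圳比特微电子科技有限公司 Clock tree, hash engine, computing chip, hash board and computing device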

Also Published As

Publication number Publication date
CN113608575B (zh) 2022-02-08
CN113608575A (zh) 2021-11-05
TWI784864B (zh) 2022-11-21
TW202230074A (zh) 2022-08-01

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 3165378

Country of ref document: CA

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20237015561

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21959795

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21959795

Country of ref document: EP

Kind code of ref document: A1