WO2021258801A1 - Clock circuit system, computing chip, hash board, and data processing device - Google Patents


Info

Publication number
WO2021258801A1
Authority
WO
WIPO (PCT)
Prior art keywords
clock
circuit
clock circuit
port
signal
Application number
PCT/CN2021/083764
Other languages
French (fr)
Chinese (zh)
Inventor
范志军
刘建波
杨作兴
Original Assignee
深圳比特微电子科技有限公司 (Shenzhen MicroBT Electronics Technology Co., Ltd.)
Application filed by 深圳比特微电子科技有限公司 (Shenzhen MicroBT Electronics Technology Co., Ltd.)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00: Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/04: Generating or distributing clock signals or signals derived directly therefrom
    • G06F1/06: Clock generators producing several clock signals

Definitions

  • the present disclosure relates to the field of electronic circuits, and more specifically to a clock circuit system, and a computing chip, a computing power board, and a data processing device using the clock circuit system.
  • a pipeline structure is a common technique in chip design; using a pipeline structure can effectively improve the efficiency and throughput of data processing tasks.
  • in conventional designs, such a pipeline is usually related to instruction execution, so the processing times of the individual pipeline stages are not exactly the same.
  • in pure-hardware computing, such as virtual digital currency computing and artificial intelligence (AI) computing, strict timing requirements are usually imposed. For example, the duration of each pipeline stage needs to be precisely controlled to be consistent. Therefore, in these fields, the clock circuit system that provides the clock signal for a pipeline structure often has a specific structure and function.
  • the embodiments of the present disclosure aim to use a simplified clock circuit system to generate a pulse signal with good performance, and the pulse signal can be used in a pipeline structure for performing computationally intensive data processing tasks.
  • a clock circuit system including a main clock circuit and one or more local clock circuits.
  • the master clock circuit includes a plurality of cascaded clock driving circuits, each of which includes one or more delay elements that delay a clock signal, and the master clock circuit is configured to drive a clock signal to propagate along the plurality of clock driving circuits.
  • Each of the one or more local clock circuits is associated with a corresponding clock driving circuit in the master clock circuit and includes: a first input terminal, coupled to a first port of the master clock circuit to draw a first clock signal from the master clock circuit; a second input terminal, coupled to a second port of the master clock circuit to draw a second clock signal from the master clock circuit; and a logic gate element, coupled to the first input terminal and the second input terminal and configured to generate a pulse signal based on the first clock signal and the second clock signal.
  • the second port is located downstream of the first port in the main clock circuit, and at least one delay element of the corresponding clock driving circuit of the main clock circuit lies between the first port and the second port.
  • the local clock circuit further includes one or more additional delay elements that delay the second clock signal, and the one or more additional delay elements are provided between the logic gate element and the second input terminal.
  • the local clock circuit has one of the following configurations: a first configuration, in which the first port and the second port associated with the local clock circuit are located in the same stage of clock driving circuit of the main clock circuit; a second configuration, in which the first port and the second port associated with the local clock circuit are located in two adjacent stages of clock driving circuits of the main clock circuit; or a third configuration, in which at least one stage of clock driving circuit of the master clock circuit lies between the first port and the second port associated with the local clock circuit.
  • the one or more local clock circuits include a first local clock circuit and a second local clock circuit, and the first local clock circuit and the second local clock circuit have different configurations, each selected from the first configuration, the second configuration, and the third configuration.
  • no delay element is provided between the logic gate element and the first input terminal and the second input terminal of the local clock circuit.
  • the logic gate element is selected from one of an AND gate, a NAND gate, an OR gate, and a NOR gate, and the selection of the logic gate element is determined based on at least one of the following: the type and number of the at least one delay element between the first port and the second port; the type and number of the delay elements between the logic gate element and the second input terminal; and/or the type of pulse signal required.
  • the one or more delay elements include at least one of a buffer and an inverter.
  • the local clock circuit is coupled to a corresponding stage of pipeline circuit in a pipeline structure for performing data processing tasks, so as to provide the pulse signal to that stage of pipeline circuit.
  • the pulse signal is provided to one or more sets of registers in the corresponding stage of pipeline circuit, and additional buffers or inverters are provided between the output terminal of the local clock circuit and each of the one or more sets of registers.
  • the register is a latch-type register that can be triggered by a high-level pulse or a low-level pulse of the pulse signal.
  • the data processing task includes executing a hash algorithm or executing an AI calculation.
  • the hash algorithm includes the SHA-256 algorithm.
  • a computing chip including any clock circuit system as described herein.
  • a computing power board which includes the computing chip as described herein.
  • a data processing device including the computing power board as described herein.
  • a pulse signal with good performance can be generated with a simplified clock circuit system and lower power consumption.
  • FIG. 1 shows a block diagram of a system according to an embodiment of the present disclosure;
  • FIG. 2 shows an exemplary configuration of a clock circuit system;
  • FIG. 3 shows an example of generating a pulse signal based on a first clock signal and a second clock signal that are delayed relative to each other;
  • FIG. 5 shows an exemplary configuration of a further improved clock circuit system;
  • FIG. 6 shows a schematic diagram of a pipeline structure that can be used to implement the SHA-256 algorithm;
  • FIG. 7 shows a schematic block diagram of a computing chip according to an embodiment of the present disclosure;
  • FIG. 8 shows a schematic block diagram of a computing power board according to an embodiment of the present disclosure;
  • FIG. 9 shows a schematic block diagram of a digital currency mining machine according to an embodiment of the present disclosure.
  • when performing computationally intensive data processing tasks such as virtual digital currency calculations and artificial intelligence (AI) calculations, computing hardware often needs to run for long periods. For example, to obtain digital currency efficiently, a data processing device such as a digital currency mining machine needs to perform a large number of hash operations without interruption. Such computing hardware consumes significant power and incurs corresponding costs, such as electricity costs.
  • the power consumption ratio is defined as the power consumed per unit of computing power of computing hardware, and it is one of the important performance indicators for measuring computing hardware.
  • the computing hardware includes or is implemented as a computing chip
  • the power consumption ratio can be reduced by reducing the number of components used by the computing chip.
  • the chip area of the computing chip can also be reduced.
  • FIG. 1 shows a block diagram of a system 100 according to an embodiment of the present disclosure.
  • the system 100 may include a clock circuit system 1000, a clock source 2000, and a pipeline structure 3000.
  • the clock circuit system 1000 may be coupled with the clock source 2000 and the pipeline structure 3000.
  • the clock source 2000 does not provide a separate clock signal for each stage of pipeline circuit of the pipeline structure 3000; instead, it provides the initial clock signal to the clock circuit system 1000, and the clock circuit system 1000 provides a corresponding clock signal to each stage of pipeline circuit of the pipeline structure 3000.
  • the clock circuit system 1000 may be designed to include multiple stages of clock driving circuits, and each stage of the clock driving circuit may provide a clock signal for an associated one-stage pipeline circuit.
  • Such a multi-level clock driving circuit of the clock circuit system 1000 may be called a master clock circuit, or also called a "master clock tree".
  • the main clock circuit can be extended correspondingly as the pipeline structure is extended with more stages of pipeline circuits.
  • the main clock circuit of the clock circuit system 1000 may include a plurality of clock driving circuits 1100, 1200, and 1300 connected in series.
  • the initial clock signal provided by the clock source 2000 may be provided to the first-stage clock driving circuit 1100.
  • the output clock signal of the first-stage clock driving circuit 1100 may be provided to the second-stage clock driving circuit 1200.
  • the output clock signal of the second stage clock driving circuit 1200 may be provided to the third stage clock driving circuit 1300.
  • the clock drive circuit of the subsequent stage generates a new clock signal in response to receiving the clock signal of the clock drive circuit of the previous stage.
  • the clock signal generated by each stage of the clock driving circuit can be provided to the associated stage of pipeline circuit.
  • the master clock circuit enables a clock signal derived from the same initial clock signal to propagate through tens or hundreds of stages of clock driving circuits, thereby providing a corresponding clock signal to each stage of a pipeline structure containing tens or even hundreds of pipeline circuits.
  • the clock driving circuits 1100, 1200, and 1300 can respectively provide corresponding clock signals for the pipeline circuits 3100, 3200, and 3300 in the pipeline structure 3000.
  • each stage of the clock driving circuit of the main clock circuit may be configured to include one or more circuit elements connected in series.
  • the clock signal can propagate through these circuit elements in turn.
  • the clock driving circuit 1100 may include circuit elements 1110, 1120, and 1130 connected in series in sequence
  • the clock driving circuit 1200 may include circuit elements 1210, 1220, and 1230 connected in series in sequence
  • the clock driving circuit 1300 may include circuit elements 1310, 1320, and 1330 connected in series in sequence.
  • these circuit elements may be active elements. Active elements can compensate for the power of the input signal, so that the amplitude of the clock signal propagating through the clock driving circuit can be maintained.
  • Typical active components used in clock drive circuits may include inverters and buffers.
  • the output signal of the inverter has an opposite level and phase relative to the input signal of the inverter. That is, in response to an input signal at a high level, the output signal of the inverter will be at a low level; and in response to an input signal at a low level, the output signal of the inverter will be at a high level.
  • the output signal of the buffer has the same level and phase relative to the input signal of the buffer.
  • the circuit elements 1110, 1120, and 1130 in the clock driving circuit 1100 may all be inverters, all buffers, or any combination of inverters and buffers.
  • the clock driving circuit 1200 and the clock driving circuit 1300 should have the same configuration as the clock driving circuit 1100. Keeping the clock driving circuits of all stages consistent helps ensure precise timing of the clock signals they provide.
  • in practice, no circuit element is ideal.
  • the response (output signal) of each circuit element will have a certain delay relative to the stimulus (input signal). Therefore, each of the circuit elements 1110, 1120, 1130, 1210, 1220, 1230, 1310, 1320, and 1330 in the clock driving circuit delays the clock signal passing through the circuit element.
  • These circuit elements can also be referred to as delay elements. The delay characteristics of circuit elements can be used to generate specific signals, which will be described further below.
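A minimal Python sketch of this delay behaviour, assuming each of the nine circuit elements of FIG. 1 has the same hypothetical 30 ps propagation delay (an illustrative value, not one given in the disclosure): a clock edge entering the master clock circuit at t = 0 reaches each element's output after the accumulated delay of every element before it.

```python
from itertools import accumulate

# Hypothetical propagation delays (ps) for circuit elements 1110..1330 of FIG. 1.
# The uniform 30 ps value is an assumption chosen for illustration only.
ELEMENT_DELAYS_PS = {
    "1110": 30, "1120": 30, "1130": 30,
    "1210": 30, "1220": 30, "1230": 30,
    "1310": 30, "1320": 30, "1330": 30,
}


def edge_arrival_times(delays_ps):
    """Return the time (ps) at which a clock edge entering the chain at t=0
    appears at the output of each element, in chain order."""
    names = list(delays_ps)  # dicts preserve insertion order (Python 3.7+)
    return dict(zip(names, accumulate(delays_ps[n] for n in names)))


arrivals = edge_arrival_times(ELEMENT_DELAYS_PS)
for name, t in arrivals.items():
    print(f"output of element {name}: {t} ps")
```

The difference between any two arrival times is the delay contributed by the elements lying between those two points of the chain.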
  • the clock signal directly output by the clock driving circuit of the master clock circuit cannot be directly used in the pipeline circuit.
  • the clock signal propagating along the main clock circuit is usually a square wave signal (for example, a square wave with a 50% duty cycle), and the pipeline circuit may use a latch type (Latch) register.
  • the latch type register needs to be triggered by a pulse signal.
  • the pulse signal is a signal that only has a short-term high-level state (or a short-term low-level state) in each clock cycle.
  • the square wave signal output by the main clock circuit is not suitable for being directly used to trigger the latch-type register in the pipeline circuit.
  • the clock signal output by the clock driving circuit of each stage of the main clock circuit needs to be preprocessed before being provided to the pipeline circuit.
  • the clock circuit system 1000 may further include local clock circuits 4100, 4200, and 4300.
  • Each local clock circuit can be associated with a corresponding clock driving circuit, and with a corresponding pipeline circuit. Unlike the clock driving circuits in the master clock circuit that are connected in series, each local clock circuit can be coupled between the corresponding clock driving circuit and the corresponding pipeline circuit.
  • the local clock circuit 4100 can be coupled between the clock driving circuit 1100 and the pipeline circuit 3100
  • the local clock circuit 4200 can be coupled between the clock driving circuit 1200 and the pipeline circuit 3200
  • the local clock circuit 4300 can be coupled between the clock driving circuit 1300 and the pipeline circuit 3300.
  • the local clock circuit may draw the clock signal from the corresponding clock driving circuit, preprocess the clock signal to generate an appropriate signal (for example, a pulse signal), and provide the generated appropriate signal to the corresponding pipeline circuit.
  • A specific example of the configuration of the local clock circuit is described in detail below in conjunction with FIG. 2, and improved embodiments of the configuration of the local clock circuit are further described in conjunction with FIGS. 4A-4D and FIG. 5.
  • the structure of the system 100 shown in FIG. 1 is only exemplary.
  • the pipeline structure 3000 in FIG. 1 includes 3-stage pipeline circuits
  • the pipeline structure according to an embodiment of the present disclosure may include more or fewer pipeline circuits, such as 2 stages, 10 stages, 50 stages, or more than 100 stages.
  • accordingly, the master clock circuit according to an embodiment of the present disclosure is not limited to 3 clock driving circuits, but may include more or fewer clock driving circuits, such as 2, 10, 50, or more than 100.
  • a box with an ellipsis indicates an additional module 1400 that receives the output clock signal of the clock driving circuit 1300.
  • the additional module 1400 may represent a plurality of clock driving circuits not specifically shown, or may represent a tail load element. If the additional module 1400 represents multiple clock driving circuits not specifically shown, each of them will also include multiple circuit elements connected in series and may also have an associated local clock circuit.
  • the border of the clock circuit system 1000 in FIG. 1 is shown with a dashed line, which means that the border shown in FIG. 1 is only exemplary.
  • the clock source 2000 may be part of the clock circuit system 1000.
  • the local clock circuits 4100, 4200, 4300 may be located inside the corresponding one-stage pipeline circuit. However, from a functional point of view, such a local clock circuit can still be regarded as a part of the clock circuit system 1000.
  • FIG. 2 shows an exemplary configuration of the clock circuit system 1000. Compared with FIG. 1, FIG. 2 enlarges the size of the local clock circuit 4200 to specifically show the configuration of the local clock circuit 4200.
  • the local clock circuit 4200 may include one or more delay elements 4221, 4222, 4223 and a logic gate element 4230. Each of the delay elements 4221, 4222, 4223 may be an inverter or a buffer.
  • the input 4211 of the local clock circuit 4200 may be coupled to the clock driving circuit 1200 associated with the local clock circuit 4200 in the main clock circuit, so as to receive the clock signal output by the clock driving circuit 1200.
  • the logic gate element 4230 may be a logic gate element having two input terminals.
  • a first signal path and a second signal path may exist between the input terminal 4211 and the two input terminals of the logic gate element 4230.
  • the delay elements 4221, 4222, 4223 may be provided on the second signal path.
  • the clock signal received by the input terminal 4211 can be input directly to one input terminal of the logic gate element 4230 via the first signal path as the first clock signal, and can also pass through the delay elements 4221, 4222, 4223 on the second signal path and be input to the other input terminal of the logic gate element 4230 as the second clock signal.
  • due to the delay elements 4221, 4222, 4223, the second clock signal has a certain delay relative to the first clock signal. The amount of this delay is associated with the delay elements 4221, 4222, 4223.
  • the logic gate element 4230 can perform logic operations on the first clock signal and the second clock signal that are delayed between each other, thereby generating a pulse signal.
  • the generated pulse signal may be provided to a corresponding pipeline circuit (for example, the pipeline circuit 3200 of FIG. 1).
  • FIG. 3 shows an example of generating the pulse signal PLS based on the first clock signal CLK1 and the second clock signal CLK2 that are delayed from each other.
  • both the first clock signal CLK1 and the second clock signal CLK2 may be square wave signals.
  • although both the first clock signal CLK1 and the second clock signal CLK2 come from the input terminal 4211, because of the delay elements 4221, 4222, 4223 the second clock signal CLK2 can be inverted relative to the first clock signal CLK1 and delayed relative to it by d.
  • The two signals CLK1 and CLK2 can be input to the logic gate element 4230.
  • the logic gate element 4230 may be an AND gate (AND2).
  • the obtained PLS is a high-level pulse signal with a pulse width of d.
  • a high-level pulse signal refers to a signal with a short-term high-level state.
  • the phase difference and delay between the first clock signal CLK1 and the second clock signal CLK2 are related to the type and number of delay elements on the second signal path. If an odd number of inverters are provided on the second signal path, the obtained second clock signal CLK2 will have the opposite phase to the first clock signal CLK1. In addition, the delay of the second clock signal CLK2 relative to the first clock signal CLK1 depends on the sum of the delays of all delay elements provided on the second signal path. It should be understood that although FIG. 2 shows that the local clock circuit 4200 includes three delay elements, more or fewer delay elements may be used.
  • the pulse width of the obtained pulse signal is associated with the delay between the two input signals, and therefore is also associated with the sum of the delays of all the delay elements provided on the second signal path.
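The relationship above can be sketched in Python: count the inverters on the second signal path (an odd count means CLK2 is inverted relative to CLK1) and sum the element delays to get d. `DelayElement` and `second_path_relation` are hypothetical names, and the per-element delays are assumed values; the first signal path is assumed to add no delay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayElement:
    kind: str         # "inverter" or "buffer"
    delay_ps: float   # assumed propagation delay of this element


def second_path_relation(elements):
    """Return (inverted, d): whether CLK2 is inverted relative to CLK1 and the
    delay d (ps) of CLK2 relative to CLK1, for the delay elements on the
    second signal path. The first signal path is assumed to add no delay."""
    for e in elements:
        if e.kind not in ("inverter", "buffer"):
            raise ValueError(f"unknown element kind: {e.kind}")
    n_inverters = sum(1 for e in elements if e.kind == "inverter")
    d = sum(e.delay_ps for e in elements)
    return n_inverters % 2 == 1, d


# Hypothetical stand-ins for delay elements 4221, 4222, 4223 of FIG. 2.
path = [DelayElement("inverter", 25.0),
        DelayElement("buffer", 35.0),
        DelayElement("buffer", 35.0)]
inverted, d = second_path_relation(path)
print(inverted, d)  # -> True 95.0
```

With inverted inputs and an AND2 gate, as in FIG. 3, the resulting high-level pulse width equals d.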
  • a high-level pulse signal may be provided to a latch-type register that can be triggered by a high-level pulse
  • a low-level pulse signal may be provided to a latch-type register that can be triggered by a low-level pulse.
  • although FIG. 3 shows a situation where the first clock signal CLK1 and the second clock signal CLK2 are inverted, in other embodiments the first clock signal CLK1 and the second clock signal CLK2 may also be in phase.
  • the type of logic gate element can be selected accordingly, such as OR gate (OR2) or NOR gate (NOR2).
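For the inverted-input case of FIG. 3, the following discrete-time Python sketch checks what each of the four gate types named in the disclosure produces when CLK2 is CLK1 inverted and delayed by d. Time is in abstract ticks, and the period, d, and the assumption that CLK1 was low before t = 0 are illustrative choices; the in-phase case is not modelled. Under these assumptions AND2 and NOR2 produce high-level pulses and NAND2 and OR2 produce low-level pulses, all of width d; the AND2/NAND2 pulses start at the rising edge of CLK1 and the OR2/NOR2 pulses at its falling edge.

```python
PERIOD, D, N_TICKS = 20, 3, 100  # assumed values, in abstract time ticks


def square_wave(period, n_ticks):
    """50% duty-cycle square wave that is high in the first half of each period."""
    return [1 if (t % period) < period // 2 else 0 for t in range(n_ticks)]


def invert_and_delay(signal, d, initial_level=0):
    """Model an odd number of inverters with total delay d ticks. The input is
    assumed to have been at `initial_level` before t=0."""
    delayed = [initial_level] * d + signal[:len(signal) - d]
    return [1 - s for s in delayed]


GATES = {
    "AND2": lambda a, b: a & b,
    "NAND2": lambda a, b: 1 - (a & b),
    "OR2": lambda a, b: a | b,
    "NOR2": lambda a, b: 1 - (a | b),
}


def first_pulse(out):
    """Return (kind, start_tick, width) of the first pulse. A mostly-low output
    carries high-level pulses; a mostly-high output carries low-level pulses."""
    level = 1 if sum(out) < len(out) / 2 else 0
    start = out.index(level)
    width = 0
    while start + width < len(out) and out[start + width] == level:
        width += 1
    return ("high-level" if level else "low-level"), start, width


clk1 = square_wave(PERIOD, N_TICKS)
clk2 = invert_and_delay(clk1, D)
results = {name: first_pulse([g(a, b) for a, b in zip(clk1, clk2)])
           for name, g in GATES.items()}
for name, (kind, start, width) in results.items():
    print(f"{name}: {kind} pulse at tick {start}, width {width}")
```

Which gate to use then follows from whether the target latch-type registers are triggered by high-level or low-level pulses.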
  • although the delay and pulse width shown in FIG. 3 are large relative to the period of the clock signal, this is only for clarity.
  • the delay caused by the delay element and the pulse width of the generated pulse signal may be smaller with respect to the period of the clock signal.
  • the delay caused by each delay element may be on the order of tens of picoseconds, and one clock cycle of the clock signal may be on the order of several nanoseconds.
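As a rough order-of-magnitude check, assume three delay elements of 30 ps each on the second signal path and a 2 ns clock period (500 MHz). Both figures are illustrative assumptions within the ranges stated above, not values from the disclosure.

```python
ELEMENT_DELAY_PS = 30   # assumed delay per element ("tens of picoseconds")
N_ELEMENTS = 3          # e.g. delay elements 4221, 4222, 4223
PERIOD_PS = 2_000       # assumed 2 ns clock period ("several nanoseconds")

pulse_width_ps = ELEMENT_DELAY_PS * N_ELEMENTS
fraction = pulse_width_ps / PERIOD_PS
print(f"pulse width {pulse_width_ps} ps = {fraction:.1%} of the clock period")
```

Under these assumptions the pulse occupies only a few percent of each clock cycle.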
  • although the local clock circuit 4200 shown in FIG. 2 can generate the pulse signal required by the pipeline circuit, such a local clock circuit still has room for improvement.
  • the local clock circuit 4200 requires necessary delay elements to be provided in the local clock circuit itself. These delay elements will occupy the chip area and increase the power consumption of the chip. In the case of pipeline circuits containing tens or hundreds of stages (correspondingly including tens or hundreds of local clock circuits), the number of delay elements used cannot be ignored. Moreover, when the available chip area is limited or the total power is limited, it may not be possible to provide enough delay elements in the local clock circuit.
  • the delay d of the second clock signal CLK2 relative to the first clock signal CLK1 may be too small, which may cause the pulse width of the generated pulse signal to be too narrow.
  • triggering a register requires a minimum pulse width, and a wider pulse helps trigger the register reliably. If the pulse width of the pulse signal generated by the local clock circuit is too narrow, it may fail to trigger the register in the pipeline circuit effectively, which may cause the pipeline circuit to perform data processing tasks incorrectly.
  • FIG. 4A shows an exemplary configuration of an improved clock circuit system 1000A according to an embodiment of the present disclosure.
  • the clock circuit system 1000A may include a main clock circuit and one or more local clock circuits 4100, 4200, 4300.
  • the master clock circuit may include multiple clock driving circuits 1100, 1200, and 1300 cascaded.
  • the main clock circuit is configured to drive a clock signal to propagate along the plurality of clock drive circuits 1100, 1200, and 1300.
  • Each clock driving circuit may include one or more circuit elements (1110, 1120, 1130, 1210, 1220, 1230, 1310, 1320, 1330). On the one hand, these circuit elements drive the propagation of the clock signal; on the other hand, they also delay the clock signal.
  • Each of the local clock circuits 4100, 4200, and 4300 is respectively associated with a corresponding clock driving circuit in the main clock circuit.
  • the local clock circuits 4100, 4200, 4300 of the clock circuit system 1000A may have a different configuration from the example of FIG. 2. The following description takes the local clock circuit 4200 as an example.
  • the local clock circuit 4200 may have two different input terminals 4212 and 4213 that draw clock signals from the main clock circuit.
  • the input terminal 4212 and the input terminal 4213 may be respectively coupled to the first port and the second port in the main clock circuit.
  • the second port may be located downstream of the first port in the main clock circuit, and there is at least one circuit element in the clock driving circuit of the main clock circuit that can cause the clock signal delay between the first port and the second port.
  • the second clock signal drawn by the input 4213 of the local clock circuit 4200 from the second port will have a delay relative to the first clock signal drawn by the input 4212 from the first port. This delay is caused by one or more circuit elements in the clock driving circuit in the master clock circuit, and does not depend on the delay element in the local clock circuit 4200.
  • the input terminal 4212 of the local clock circuit 4200 may be coupled to the output terminal (first port) of the second circuit element 1220 in the clock driving circuit 1200 to draw the first clock signal
  • the input terminal 4213 of the local clock circuit 4200 may be coupled to the output terminal (second port) of the third circuit element 1230 in the clock driving circuit 1200 to draw the second clock signal.
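  • the output clock signal of the circuit element 1220 (i.e., the first clock signal drawn by the input terminal 4212) is further delayed by the circuit element 1230 before being drawn by the input terminal 4213 as the second clock signal.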
  • the local clock circuit 4200 may have a logic gate element 4230.
  • the logic gate element 4230 can perform logic operations on each input signal.
  • the first clock signal and the second clock signal drawn by the input terminals 4212 and 4213 may be provided to the logic gate element 4230.
  • An input terminal of the logic gate element 4230 may be connected to the input terminal 4212 through a first signal path, thereby receiving the first clock signal.
  • the other input terminal of the logic gate element 4230 may be connected to the input terminal 4213 through a second signal path, thereby receiving the second clock signal.
  • the logic gate element 4230 may be configured to perform logic operations on the two input clock signals, thereby generating pulse signals, as discussed in relation to FIG. 3.
  • One or more delay elements 4221, 4222, 4223 may be provided on the second signal path, so as to further delay the second clock signal on the second signal path.
  • the delay between the two clock signals received by the two input terminals of the logic gate element 4230 includes not only the delay caused by the delay elements 4221, 4222, 4223 in the local clock circuit 4200, but also the delay caused by the circuit element 1230 in the clock driving circuit 1200.
  • This increases the delay between the two clock signals received by the two input ends of the logic gate element 4230 without adding a delay element. Accordingly, the pulse width of the pulse signal generated by the logic gate element 4230 is increased, so that a better pulse signal can be provided to the pipeline circuit.
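A small Python sketch of this effect, using hypothetical delays (30 ps per master-clock element and 25 ps per local delay element; neither value comes from the disclosure): the skew seen by the logic gate element 4230 is the arrival-time difference between the two ports plus the local delay on the second signal path.

```python
from itertools import accumulate

# Hypothetical delays in ps; element names follow FIG. 4A.
CHAIN = [("1210", 30), ("1220", 30), ("1230", 30)]              # clock driving circuit 1200
LOCAL_SECOND_PATH = [("4221", 25), ("4222", 25), ("4223", 25)]  # inside local clock circuit 4200


def port_times(chain):
    """Edge arrival time at each element output, with the chain input at t=0."""
    names = [name for name, _ in chain]
    return dict(zip(names, accumulate(delay for _, delay in chain)))


def logic_gate_skew(chain, first_port, second_port, local_path):
    """Delay between the two inputs of the logic gate element: the chain delay
    between the two ports plus the local delay on the second signal path."""
    t = port_times(chain)
    if t[second_port] <= t[first_port]:
        raise ValueError("second port must lie downstream of the first port")
    return (t[second_port] - t[first_port]) + sum(d for _, d in local_path)


fig2_style = sum(d for _, d in LOCAL_SECOND_PATH)  # FIG. 2: one tap, local delay only
fig4a = logic_gate_skew(CHAIN, "1220", "1230", LOCAL_SECOND_PATH)
print(fig2_style, fig4a)  # -> 75 105
```

Moving the first port back to the output of element 1210 would, under the same assumptions, raise the skew to 135 ps with no extra local elements.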
  • the two input terminals 4212 and 4213 of the local clock circuit 4200 can be connected to any other pair of first and second ports on the clock driving circuits of the master clock circuit, as long as at least one circuit element of the clock driving circuits that can delay the clock signal lies between the first port and the second port.
  • the first port and the second port can have various configurations.
  • the first port and the second port respectively coupled to the two input terminals 4212 and 4213 of the local clock circuit 4200 may be located in the same level of clock driving circuit of the main clock circuit.
  • Figure 4A shows an embodiment of this exemplary configuration.
  • alternatively, the input terminal 4212 of the local clock circuit 4200 may be connected to the output terminal of the circuit element 1210 instead of the output terminal of the circuit element 1220.
  • the delay between the two clock signals received by the two input terminals of the logic gate element 4230 will further include the delay caused by the circuit element 1220, thereby further increasing the pulse width of the generated pulse signal.
  • FIG. 4B shows an exemplary configuration of an improved clock circuit system 1000B.
  • the input terminal 4212 of the local clock circuit 4200 may be connected to the output terminal (first port) of the circuit element 1220 of the clock driving circuit 1200, and the input terminal 4213 of the local clock circuit 4200 may be connected to the output terminal (second port) of the circuit element 1310 of the adjacent clock driving circuit 1300.
  • FIG. 4C shows an exemplary configuration of an improved clock circuit system 1000C.
  • the input terminal 4212 of the local clock circuit 4200 may be connected to the output terminal (first port) of the clock driving circuit 1100, and the input terminal 4213 of the local clock circuit 4200 may be connected to the circuit element 1310 of the clock driving circuit 1300 The output terminal (the second port).
  • the entire clock driving circuit 1200 (one full stage) lies between the first port and the second port. This configuration can be advantageous because it can use the existing output port of a clock driving circuit instead of drawing the clock signal from inside the clock driving circuit, and therefore does not affect the load inside each stage of clock driving circuit.
  • the positions of the first port and the second port can be determined based on the properties of the required pulse signal.
  • the properties of pulse signals can include pulse width and signal type, and so on.
  • the number of circuit elements of the clock driving circuit that should exist between the first port and the second port can be determined based on the required pulse width of the pulse signal.
  • if the required pulse width is relatively wide, the first port and the second port can be spaced farther apart, so that more delay-causing circuit elements lie between the two ports.
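One way to sketch this port-selection step in Python, assuming a uniform hypothetical 30 ps per master-clock element: walk downstream from a fixed first port and return the nearest element output whose tap yields at least the required skew. `nearest_second_port` is an illustrative name, not something defined in the disclosure.

```python
from itertools import accumulate

# Hypothetical master clock circuit of FIG. 1 with an assumed 30 ps per element.
CHAIN = [(name, 30) for name in
         ("1110", "1120", "1130", "1210", "1220", "1230", "1310", "1320", "1330")]


def nearest_second_port(chain, first_port, min_skew_ps, local_delay_ps=0):
    """Return the closest element output downstream of first_port whose tap
    gives at least min_skew_ps between the logic-gate inputs (including any
    local delay on the second signal path), or None if the chain is too short."""
    names = [name for name, _ in chain]
    times = dict(zip(names, accumulate(delay for _, delay in chain)))
    for name in names[names.index(first_port) + 1:]:
        if times[name] - times[first_port] + local_delay_ps >= min_skew_ps:
            return name
    return None


print(nearest_second_port(CHAIN, "1220", 60))   # -> 1310
print(nearest_second_port(CHAIN, "1220", 300))  # -> None (chain too short)
```

With these assumed delays, requiring 60 ps from the output of element 1220 happens to select the output of element 1310, the pairing shown in FIG. 4B.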
  • if the required pulse signal is a high-level pulse signal, the positions of the first port and the second port can be selected so that the number of inverters between the first port and the second port, plus the number of inverters (if any) on the signal path of the local clock circuit, is odd. The two clock signals input to the logic gate element are then inverted relative to each other (for example, the situation shown in FIG. 3).
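The parity rule can be expressed as a short Python check. It assumes, as in the configurations above, that the first signal path contains no inverters; the helper names are hypothetical.

```python
VALID_KINDS = {"inverter", "buffer"}


def count_inverters(kinds):
    """Count inverters in a sequence of element kinds ("inverter" or "buffer")."""
    unknown = set(kinds) - VALID_KINDS
    if unknown:
        raise ValueError(f"unknown element kinds: {sorted(unknown)}")
    return sum(1 for k in kinds if k == "inverter")


def gate_inputs_inverted(between_ports, local_second_path=()):
    """True if the two logic-gate inputs are inverted relative to each other.
    Assumes the first signal path contains no inverters, so only the chain
    elements between the ports and the local elements on the second signal
    path matter: an odd total number of inverters means inversion."""
    total = count_inverters(between_ports) + count_inverters(local_second_path)
    return total % 2 == 1


# FIG. 4D-style tap, assuming elements 1210, 1220, 1230 are all inverters:
print(gate_inputs_inverted(["inverter"] * 3))            # odd (3)  -> True
# One chain inverter plus one local inverter: even total, inputs in phase
print(gate_inputs_inverted(["inverter"], ["inverter"]))  # even (2) -> False
```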
  • the delay elements 4221, 4222, 4223 are drawn with dashed boxes, which means that one or more of them may not be necessary. Since the delay of one or more circuit elements in the master clock circuit has already been introduced, one or more of the delay elements 4221, 4222, 4223 inside the local clock circuit 4200 can be removed. For example, in FIG. 4A, the elements 1230, 4221, 4222 can provide the delay originally provided by the elements 4221, 4222, 4223, so that the element 4223 can be removed while still satisfying the delay requirement between the two clock signals input to the logic gate element 4230.
  • similarly, the elements 1220, 1230, and 4221 may provide the delay originally provided by the elements 4221, 4222, 4223, so that the elements 4222 and 4223 can be removed.
  • the delay elements inside the local clock circuit 4200 may be completely removed, as discussed below with respect to FIG. 4D.
  • FIG. 4D shows an exemplary configuration of an improved clock circuit system 1000D according to an embodiment of the present disclosure.
  • the input terminal 4212 of the local clock circuit 4200 may be coupled to the input terminal (first port) of the clock driving circuit 1200 to draw the first clock signal
  • the input terminal 4213 of the local clock circuit 4200 may be coupled to the output terminal (second port) of the third circuit element 1230 in the clock driving circuit 1200 to draw the second clock signal.
  • the delay between the two clock signals received at the two input terminals of the logic gate element 4230 can then be provided entirely by the elements 1210, 1220, and 1230 in the clock driving circuit 1200 of the master clock circuit. No delay element (for example, 4221, 4222, 4223) needs to be provided in the local clock circuit 4200. If the total delay of the circuit elements between the first port and the second port is long enough to produce a pulse signal of sufficient pulse width, the configuration shown in FIG. 4D may be preferred. It eliminates the delay elements in the local clock circuit and thereby minimizes the area and power of the local clock circuit.
  • the advantages of the clock circuit systems 1000A-1000D lie in at least two aspects.
  • first, a larger delay can be provided, which yields a pulse signal with a wider pulse width.
  • second, if the required delay stays the same, the number of delay elements in the local clock circuit can be reduced, or the delay elements can be removed entirely. This significantly reduces power consumption, component cost, and chip area.
  • FIG. 5 shows a schematic diagram of an improved clock circuit system 1000E according to an embodiment of the present disclosure.
  • the configuration of FIG. 5 is similar to that of FIG. 4D. The difference is that the output of the logic gate element 4230 is not directly provided to the pipeline circuit, but can be provided to the additional circuit elements 4241 and 4242 first.
  • the circuit elements 4241 and 4242 may be inverters or buffers. As active elements, the circuit elements 4241 and 4242 drive the signal and thereby maintain the amplitude of the output signal.
  • the circuit elements 4241 and 4242 may provide respective output signals to a corresponding set of elements (for example, registers). In the case where there are a large number of registers in the pipeline circuit, the configuration of FIG. 5 is advantageous.
  • although FIG. 5 shows two circuit elements 4241 and 4242 of the local clock circuit 4200, the local clock circuit 4200 may include more such circuit elements without limitation.
  • the local clock circuits 4100 and 4300 may each have similar circuit elements (not shown).
  • one or more local clock circuits of each clock circuit system in FIGS. 4A-4D may also have similar circuit elements.
  • the local clock circuit 4200 is used as an example for discussion. The local clock circuits 4100 and 4300 adopt the same configuration as the local clock circuit 4200, so their specific details are not described.
  • each of the local clock circuits 4100, 4200, 4300 may adopt any of the various exemplary configurations described above without limitation.
  • the local clock circuit 4100 may adopt the configuration described with respect to FIG. 4A
  • the local clock circuit 4200 may adopt the configuration described with respect to FIG. 4B
  • the local clock circuit 4300 may adopt the configuration described with respect to FIG. 4C.
  • Other hybrid configurations are also possible.
  • the logic gate element used by each local clock circuit may be one selected from an AND gate, a NAND gate, an OR gate, and a NOR gate.
  • the type of the selected logic gate element can be determined based on multiple factors, including but not limited to the following:
    • the type (inverter or buffer), number, and delay of the circuit elements between the first port and the second port, which are connected to the two input terminals of the local clock circuit;
    • the type (inverter or buffer), number, and delay of the delay elements on the second signal path of the logic gate element;
    • the type of pulse signal required (high-level pulse trigger or low-level pulse trigger).
  For example, add the number of inverters between the first port and the second port to the number of inverters on the second signal path. If the sum is odd, an AND gate or a NAND gate can be selected. If the sum is even, an OR gate or a NOR gate can be selected.
  • a logic gate element can also be realized by a combination of several logic gate elements. For example, an AND gate and a NAND gate differ by one inverter, and an OR gate and a NOR gate likewise differ by one inverter.
  • Various clock circuit systems 1000 may be used in combination with the pipeline structure 3000. Driven by various clock signals provided by the clock circuit system 1000, the pipeline circuits at various levels of the pipeline structure 3000 can perform various data processing tasks.
  • the data processing tasks here include, but are not limited to, data storage, data operations, and so on.
  • the data processing tasks performed by the pipeline structure 3000 may include various computationally intensive tasks.
  • Computing-intensive tasks require computing hardware to run for a long time, and a large number of pipeline circuits need to be implemented on computing chips to perform parallel computing, so they are sensitive to clock signal performance, power consumption, and chip area.
  • Data processing tasks that can take advantage of the present disclosure include, but are not limited to, hash algorithm calculations and artificial intelligence (AI) calculations.
  • a hash algorithm takes variable-length data as input and produces a fixed-length hash value as output.
  • input data of any length is padded so that the length of the padded data is an integer multiple of a fixed length (for example, 512 bits). In other words, the padded data can be divided into a plurality of fixed-length data blocks.
  • the padding bits include the bit-length information of the original data.
  • the hash algorithm performs calculations on each fixed-length data block, for example, multiple rounds of calculations including data expansion and/or compression. When all data blocks have been processed, the final fixed-length hash value is obtained.
  • the hash algorithm executed by the pipeline structure 3000 may be the SHA-256 algorithm. Since 1993, the U.S. National Institute of Standards and Technology (NIST) has published multiple versions of the Secure Hash Algorithm (SHA). SHA-256 is a secure hash algorithm with a 256-bit hash length. It is one of the hash algorithms commonly used in calculations associated with cryptocurrencies (for example, Bitcoin). For example, Bitcoin uses a proof of work (POW) based on the SHA-256 algorithm. In Bitcoin mining with a data processing device (such as a mining machine), the core task is to obtain Bitcoin rewards through the SHA-256 computing power of the device.
  • in a data processing device such as a mining machine, a pipeline structure with multiple operation stages can be used to implement high-speed operations.
  • a 64-stage pipeline structure can be used to operate 64 groups of data in parallel.
  • Figure 6 shows a schematic diagram of a pipeline structure 6000 that can be used to implement the SHA-256 algorithm.
  • the pipeline structure 6000 may be a specific use case of the pipeline structure 3000 described above.
  • the pipeline structure 6000 can be a 32-stage, 64-stage, or 128-stage pipeline.
  • the t-th, (t+1)-th, and (t+2)-th operation stages in the pipeline structure 6000 are separated by dashed lines.
  • Each operation stage can be implemented by a corresponding stage of pipeline circuit.
  • Each operation stage can also include operation logic.
  • Each operation stage may also include a plurality of registers A to H for storing intermediate values and a plurality of registers R0 to R15 for storing expanded data.
  • One or more of these registers may be latch type registers.
  • the latch-type register in each pipeline circuit in the pipeline structure 6000 can be triggered by the corresponding pulse signal provided by the clock circuit system described above, thereby updating the data stored therein.
  • the pulse signal provided by the clock circuit system may be a high-level pulse signal or a low-level pulse signal.
  • these registers can be divided into one or more groups. Each group can be triggered by the output signal of a corresponding one of the circuit elements shown in FIG. 5 (i.e., the circuit elements 4241 and 4242).
  • the clock circuit system according to the embodiments of the present disclosure may be included in various devices, including but not limited to computing chips, computing power boards, and data processing devices (such as digital currency mining machines). Because these devices adopt this clock circuit system, they can obtain multiple clock signals with stable duty cycles at low cost and with a simple circuit structure. This helps ensure their performance when executing specific computing tasks.
  • FIG. 7 shows a schematic block diagram of a computing chip 7000 according to an embodiment of the present disclosure.
  • the computing chip 7000 may include a clock circuit system 7100, a clock source 7200, and a pipeline structure 7300.
  • the clock circuit system 7100 may be a specific embodiment of the clock circuit system described above (for example, any one of 1000, 1000A, 1000B, 1000C, 1000D, and 1000E).
  • the clock source 7200 may be a specific embodiment of the clock source 2000 described above.
  • the pipeline structure 7300 may be a specific embodiment of the pipeline structure 3000 or 6000 described above.
  • the clock circuitry 7100 can be coupled to the clock source 7200 and the pipeline structure 7300.
  • the clock circuitry 7100 can receive the initial clock signal from the clock source 7200 and generate multiple clock signals accordingly.
  • the multiple clock signals can be provided to the pipeline structure 7300 to perform specific computing tasks.
  • the specific calculation task may be to execute the SHA-256 algorithm, for example.
  • the computing chip 7000 may be configured as a Bitcoin chip.
  • the clock source 7200 is shown as a dashed frame, indicating that the clock source 7200 may also be located outside the computing chip 7000.
  • FIG. 8 shows a schematic block diagram of a computing power board 8000 according to an embodiment of the present disclosure.
  • the computing power board 8000 may include one or more computing chips 8100.
  • the computing chip 8100 may be a specific embodiment of the computing chip 7000. Multiple computing chips 8100 can perform computing tasks in parallel.
  • FIG. 9 shows a schematic block diagram of a digital currency mining machine 9000 according to an embodiment of the present disclosure.
  • the digital currency mining machine 9000 is an example of a data processing device according to an embodiment of the present disclosure.
  • the digital currency mining machine 9000 can be configured to execute the SHA-256 algorithm to obtain proof of work (POW), and further obtain digital currency based on the proof of work.
  • the digital currency can be Bitcoin.
  • the digital currency mining machine 9000 may include one or more computing power boards 9100.
  • the computing power board 9100 may be a specific embodiment of the computing power board 8000. Multiple computing power boards 9100 can perform computing tasks in parallel, for example, execute the SHA-256 algorithm.
  • the word “exemplary” means “serving as an example, instance, or illustration” and not as a “model” to be exactly reproduced. Any implementation described herein as exemplary is not necessarily to be construed as preferred or advantageous over other implementations. Moreover, the present disclosure is not limited by any expressed or implied theory given in the above technical field, background, summary, or detailed description.
  • the word “substantially” means to include any small changes caused by design or manufacturing defects, device or component tolerances, environmental influences, and/or other factors.
  • the word “substantially” also allows for deviations from the perfect or ideal case caused by parasitic effects, noise, and other practical considerations that may be present in an actual implementation.
  • “connected” means that one element/node/feature is directly connected to (or directly communicates with) another element/node/feature electrically, mechanically, logically, or in other ways.
  • coupled means that one element/node/feature can be directly or indirectly connected to another element/node/feature mechanically, electrically, logically, or in other ways. Interaction is allowed, even if the two features may not be directly connected. In other words, “coupled” intends to include direct connection and indirect connection of elements or other features, including the connection of one or more intermediate elements.

Abstract

A clock circuit system, a computing chip, a hash board, and a data processing device. The clock circuit system comprises a primary clock circuit and a local clock circuit. The primary clock circuit comprises multiple cascaded clock drive circuits, and each clock drive circuit comprises one or more delay elements that delay clock signals. The local clock circuit comprises a first input end coupled to a first port of the primary clock circuit; a second input end coupled to a second port of the primary clock circuit; and a logic gate element. At least one delay element of the primary clock circuit is present between the first port and the second port. In the present disclosure, pulse signals having good performance can be generated by using a simplified clock circuit system.

Description

Clock circuit system, computing chip, computing power board and data processing device
Cross-references to related applications
This application is based on, and claims priority to, CN application No. 202010572765.8, filed on June 22, 2020. The disclosure of that CN application is hereby incorporated into this application in its entirety.
Technical field
The present disclosure relates to the field of electronic circuits, and more specifically to a clock circuit system and to a computing chip, a computing power board, and a data processing device that use the clock circuit system.
Background
Using a pipeline structure is a common method in chip design, and it can effectively improve the efficiency/throughput of data processing tasks. In general-purpose CPUs, the pipeline usually relates to instruction execution, so the individual pipeline stages do not all take exactly the same processing time. However, many fields that rely on pure hardware computing (such as virtual digital currency computing and artificial intelligence (AI) computing) usually impose strict timing requirements. For example, the time of every pipeline stage needs to be precisely controlled to be the same. Therefore, in these fields, the clock circuit system that provides clock signals to a pipeline structure often has a specific structure and function.
Summary
The embodiments of the present disclosure aim to use a simplified clock circuit system to generate a pulse signal with good performance. The pulse signal can be used in a pipeline structure that performs computationally intensive data processing tasks.
According to a first aspect of the present disclosure, there is provided a clock circuit system including a master clock circuit and one or more local clock circuits. The master clock circuit includes a plurality of cascaded clock driving circuits, each of which includes one or more delay elements that delay a clock signal. The master clock circuit is configured to drive a clock signal to propagate along the plurality of clock driving circuits. Each of the one or more local clock circuits is associated with a corresponding clock driving circuit in the master clock circuit and includes:
  • a first input terminal, coupled to a first port of the master clock circuit to draw a first clock signal from the master clock circuit;
  • a second input terminal, coupled to a second port of the master clock circuit to draw a second clock signal from the master clock circuit;
  • a logic gate element, coupled to the first input terminal and the second input terminal and configured to generate a pulse signal based on the first clock signal and the second clock signal.
The second port is located downstream of the first port in the master clock circuit, and at least one delay element of the corresponding clock driving circuit of the master clock circuit lies between the first port and the second port.
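The pulse generation described above can be illustrated with a small behavioral simulation. This sketch illustrates the idea under stated assumptions. It is not the circuit of the disclosure. Signals are treated as ideal 0/1 samples taken once per time step. The element delay between the first port and the second port is a whole number of steps, and the values of `PERIOD` and `PORT_DELAY` are made up. The path is also assumed to contain an odd number of inverters, as in the FIG. 3 case, so the second clock signal is an inverted, delayed copy of the first. Under these assumptions an AND gate outputs one high-level pulse per clock period, with a width equal to the port-to-port delay. A NAND gate (an AND gate followed by an inverter) outputs the corresponding low-level pulse.

```python
# Behavioral sketch only: ideal 0/1 signals, one sample per time step.
# PERIOD and PORT_DELAY are hypothetical values chosen for illustration.
PERIOD, PORT_DELAY = 20, 3


def square_clock(period: int, cycles: int) -> list[int]:
    """Ideal clock with a 50% duty cycle: high for half a period, then low."""
    half = period // 2
    return ([1] * half + [0] * (period - half)) * cycles


def delay(signal: list[int], steps: int, initial: int = 0) -> list[int]:
    """Model the element chain between two ports as a pure shift by `steps` samples."""
    return [initial] * steps + signal[: len(signal) - steps]


def invert(signal: list[int]) -> list[int]:
    return [1 - s for s in signal]


def pulse_widths(signal: list[int], level: int) -> list[int]:
    """Widths of runs at `level`, ignoring runs that touch either end of the trace."""
    widths, run, seen_other = [], 0, False
    for s in signal:
        if s == level:
            if seen_other:
                run += 1
        else:
            if run:
                widths.append(run)
            run, seen_other = 0, True
    return widths


clk1 = square_clock(PERIOD, cycles=4)            # drawn at the first port
clk2 = invert(delay(clk1, PORT_DELAY))           # second port: odd number of inverters assumed
and_out = [a & b for a, b in zip(clk1, clk2)]    # AND gate: high-level pulses
nand_out = invert(and_out)                       # NAND gate: low-level pulses

print(pulse_widths(and_out, 1))   # [3, 3, 3]
print(pulse_widths(nand_out, 0))  # [3, 3, 3]
```

Increasing `PORT_DELAY` widens every pulse, which matches the statement that a wider pulse calls for a first port and second port spaced farther apart.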
According to the first aspect of the present disclosure, the local clock circuit further includes one or more additional delay elements that delay the second clock signal. The one or more additional delay elements are provided between the logic gate element and the second input terminal.
According to the first aspect of the present disclosure, the local clock circuit has one of the following configurations:
  • a first configuration, in which the first port and the second port associated with the local clock circuit are located in the same stage of clock driving circuit of the master clock circuit;
  • a second configuration, in which the first port and the second port associated with the local clock circuit are located in two adjacent stages of clock driving circuits of the master clock circuit;
  • a third configuration, in which at least one stage of clock driving circuit of the master clock circuit lies between the first port and the second port associated with the local clock circuit.
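To see why these configurations differ, one can model the master clock circuit as a chain of elements whose delays add up. The sketch below assumes four identical clock driving circuits of three elements each, mirroring elements such as 1210, 1220, 1230. The element delays in picoseconds are made up. Port `p` is a hypothetical index for the node after the first `p` elements. The farther downstream the second port sits relative to the first, the more master-clock delay is available to the logic gate element, so fewer local delay elements (or none) are needed.

```python
from itertools import accumulate

# Hypothetical delays (ps) of the three series elements in each clock driving circuit.
STAGE_DELAYS_PS = [12, 12, 15]
NUM_STAGES = 4
ELEMENT_DELAYS = STAGE_DELAYS_PS * NUM_STAGES

# ARRIVAL[p] = time the clock edge reaches port p, the node after the first p elements.
ARRIVAL = [0] + list(accumulate(ELEMENT_DELAYS))


def port_delay(first: int, second: int) -> int:
    """Delay contributed by the master-clock elements between two ports."""
    if not 0 <= first < second < len(ARRIVAL):
        raise ValueError("the second port must lie downstream of the first port")
    return ARRIVAL[second] - ARRIVAL[first]


def pulse_width(first: int, second: int, local_delays_ps=()) -> int:
    """Approximate pulse width: master-clock delay plus any local delay elements."""
    return port_delay(first, second) + sum(local_delays_ps)


k = len(STAGE_DELAYS_PS)  # elements per stage; stage 1 spans ports k .. 2k
examples = {
    "both ports inside stage 1": (k + 1, 2 * k),
    "ports in adjacent stages 1 and 2": (2 * k - 1, 2 * k + 2),
    "whole stage 1 between the ports": (k - 1, 2 * k + 1),
}
for name, (p1, p2) in examples.items():
    print(f"{name}: {port_delay(p1, p2)} ps")
```

With these made-up numbers, the three port pairs give 27 ps, 39 ps, and 66 ps. `pulse_width` shows how any remaining local delay elements (such as 4221-4223) add on top of the master-clock delay.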
According to the first aspect of the present disclosure, the one or more local clock circuits include a first local clock circuit and a second local clock circuit. These two local clock circuits have different ones of the first configuration, the second configuration, and the third configuration.
According to the first aspect of the present disclosure, no delay element is provided between the logic gate element and either of the first input terminal and the second input terminal of the local clock circuit.
According to the first aspect of the present disclosure, the logic gate element is one of an AND gate, a NAND gate, an OR gate, and a NOR gate. The logic gate element is selected based on at least:
  • the type and number of the at least one delay element between the first port and the second port;
  • the type and number of the delay elements between the logic gate element and the second input terminal;
  • and/or the type of pulse signal required.
According to the first aspect of the present disclosure, the one or more delay elements include at least one of a buffer and an inverter.
According to the first aspect of the present disclosure, the local clock circuit is coupled to a corresponding stage of pipeline circuit in a pipeline structure for performing data processing tasks, in order to provide the pulse signal to that pipeline stage.
According to the first aspect of the present disclosure, the pulse signal is provided to one or more groups of registers in the corresponding pipeline stage. An additional buffer or inverter is provided between the output terminal of the local clock circuit and each group of registers.
According to the first aspect of the present disclosure, the registers are latch-type registers that can be triggered by a high-level pulse or a low-level pulse of the pulse signal.
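The following behavioral sketch shows how a latch-type register driven by a narrow pulse behaves. It is not a transistor-level model. The latch is transparent only while the pulse is at its active level. It therefore samples its data input during that brief window and holds the value at all other times. The `active_high` flag is a hypothetical parameter that selects between triggering on a high-level pulse and triggering on a low-level pulse. The data and pulse traces are made up.

```python
def latch_trace(data: list[int], pulse: list[int], active_high: bool = True, init: int = 0) -> list[int]:
    """Level-sensitive latch: output follows `data` while the pulse is active, else holds."""
    q, out = init, []
    for d, p in zip(data, pulse):
        if (p == 1) == active_high:  # pulse at its active level: latch is transparent
            q = d
        out.append(q)
    return out


data = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
high_pulses = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]   # two narrow high-level pulses
low_pulses = [1 - p for p in high_pulses]      # the same pulses as low-level pulses

print(latch_trace(data, high_pulses))
print(latch_trace(data, low_pulses, active_high=False))
```

Both calls return the same trace, `[0, 0, 0, 1, 1, 1, 1, 1, 0, 0]`. Changes of `data` between pulses (for example at index 4) do not reach the output until the next pulse.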
According to the first aspect of the present disclosure, the data processing task includes executing a hash algorithm or performing AI calculations.
According to the first aspect of the present disclosure, the hash algorithm includes the SHA-256 algorithm.
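SHA-256 pads its input and then processes it in 512-bit blocks. The sketch below implements only this padding and block-splitting step, following the padding rule in the SHA-256 specification (FIPS 180-4):
  • append a single 1 bit;
  • append zero bits until the length is 448 bits modulo 512;
  • append the original message length as a 64-bit big-endian integer.
The sketch does not implement the compression rounds that a pipeline structure would compute.

```python
import struct


def sha256_pad(message: bytes) -> bytes:
    """Pad a byte message to a multiple of 64 bytes (512 bits) per the SHA-256 rule."""
    bit_length = 8 * len(message)
    padded = message + b"\x80"                           # the single '1' bit, then zeros
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)   # fill to 56 bytes mod 64
    padded += struct.pack(">Q", bit_length)              # original length in bits, big-endian
    return padded


def split_blocks(padded: bytes) -> list[bytes]:
    """Split padded data into the fixed-length 64-byte blocks the rounds consume."""
    if len(padded) % 64:
        raise ValueError("input is not a multiple of 64 bytes")
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


print(len(split_blocks(sha256_pad(b"abc"))))     # 1
print(len(split_blocks(sha256_pad(bytes(80)))))  # 2
```

An 80-byte input, which is the size of a Bitcoin block header, does not fit in one block after the 9 bytes of mandatory padding are added. It therefore occupies two 512-bit blocks.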
According to a second aspect of the present disclosure, a computing chip is disclosed. The computing chip includes any of the clock circuit systems described herein.
According to a third aspect of the present disclosure, a computing power board is disclosed. The computing power board includes the computing chip described herein.
According to a fourth aspect of the present disclosure, a data processing device is disclosed. The data processing device includes the computing power board described herein.
The various aspects of the present disclosure make it possible to generate a pulse signal with good performance using a simplified clock circuit system with lower power consumption. Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Description of the drawings
The drawings, which constitute a part of the specification, describe embodiments of the present disclosure. Together with the specification, they serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
Fig. 1 shows a block diagram of a system according to an embodiment of the present disclosure;
Fig. 2 shows an exemplary configuration of a clock circuit system;
Fig. 3 shows an example of generating a pulse signal based on a first clock signal and a second clock signal that are delayed relative to each other;
Figs. 4A-4D show exemplary configurations of an improved clock circuit system;
Fig. 5 shows an exemplary configuration of a further improved clock circuit system;
Fig. 6 shows a schematic diagram of a pipeline structure that can be used to implement the SHA-256 algorithm;
Fig. 7 shows a schematic block diagram of a computing chip according to an embodiment of the present disclosure;
Fig. 8 shows a schematic block diagram of a computing power board according to an embodiment of the present disclosure; and
Fig. 9 shows a schematic block diagram of a digital currency mining machine according to an embodiment of the present disclosure.
Note that in the embodiments described below, the same reference numeral is sometimes used in different drawings to denote the same part or parts having the same function, and repeated descriptions of them are omitted. In this specification, similar reference numerals and letters denote similar items. Therefore, once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
For ease of understanding, the position, size, range, and so on of each structure shown in the drawings may not represent the actual position, size, range, and so on. Therefore, the present disclosure is not limited to the positions, sizes, ranges, and so on disclosed in the drawings. In addition, the drawings are not necessarily drawn to scale, and some features may be exaggerated to show details of specific components.
Detailed description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
The following description of at least one exemplary embodiment is merely illustrative. It in no way limits the present disclosure or its application or use. That is, the circuits and methods for implementing the hash algorithm herein are shown by way of example to illustrate different embodiments of the circuits or methods of the present disclosure, and are not intended to be limiting. Those skilled in the art will understand that they illustrate exemplary, not exhaustive, ways in which the present disclosure can be implemented.
Technologies, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail. Where appropriate, however, such technologies, methods, and devices should be regarded as part of the granted specification.
When performing computationally intensive data processing tasks such as virtual digital currency calculations and artificial intelligence (AI) calculations, computing hardware often needs to run for long periods. For example, to obtain digital currency efficiently, a data processing device such as a digital currency mining machine needs to perform a large number of hash operations without interruption. Such computing hardware consumes significant power and incurs corresponding costs, such as electricity costs. The power consumption ratio is defined as the power consumed per unit of computing power of the computing hardware. It is one of the important performance indicators of computing hardware. When the computing hardware includes or is implemented as a computing chip, the power consumption ratio can be reduced by reducing the number of components used by the computing chip. Advantageously, the chip area of the computing chip can also be reduced.
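As a worked illustration of the power consumption ratio defined above, take power in watts and computing power (hash rate) in TH/s. The ratio W / (TH/s) then has units of joules per terahash (J/TH). The figures below are hypothetical. They only show how a lower power draw at the same computing power lowers the ratio, and are not measurements of any device in the disclosure.

```python
def power_consumption_ratio(power_w: float, hashrate_ths: float) -> float:
    """Power per unit of computing power: W / (TH/s) = J/TH."""
    if hashrate_ths <= 0:
        raise ValueError("computing power must be positive")
    return power_w / hashrate_ths


# Hypothetical machine: 3000 W at 100 TH/s, then 2850 W at the same computing power
baseline = power_consumption_ratio(3000.0, 100.0)
reduced = power_consumption_ratio(2850.0, 100.0)
print(f"{baseline:.1f} J/TH -> {reduced:.1f} J/TH")
```

In this made-up case, a 5% reduction in power at unchanged computing power lowers the ratio from 30.0 J/TH to 28.5 J/TH.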
图1示出了根据本公开的实施例的系统100的框图。系统100可以包括时钟电路系统1000、时钟源2000以及流水线结构3000。时钟电路系统1000可以与时钟源2000和流水线结构3000耦接。FIG. 1 shows a block diagram of a system 100 according to an embodiment of the present disclosure. The system 100 may include a clock circuit system 1000, a clock source 2000, and a pipeline structure 3000. The clock circuit system 1000 may be coupled with the clock source 2000 and the pipeline structure 3000.
在系统100中,时钟源2000不是分别为流水线结构3000的每一级流水线电路提供单独的时钟信号,而是将初始时钟信号提供给时钟电路系统1000,并且由时钟电路系统1000为流水线结构3000的每一级流水线电路提供相应的时钟信号。为此,时钟电路系统1000可以被设计为包括多级时钟驱动电路,每一级时钟驱动电路可以提供 用于相关联的一级流水线电路的时钟信号。时钟电路系统1000的这种多级时钟驱动电路可以被称为主时钟电路,或者又称为“主时钟树”。主时钟电路可以随着流水线结构的各级流水线电路的延伸而延伸。In the system 100, the clock source 2000 does not separately provide a separate clock signal for each stage of the pipeline circuit of the pipeline structure 3000, but provides the initial clock signal to the clock circuit system 1000, and the clock circuit system 1000 is used for the pipeline structure 3000. Each stage of pipeline circuit provides a corresponding clock signal. To this end, the clock circuit system 1000 may be designed to include multiple stages of clock driving circuits, and each stage of the clock driving circuit may provide a clock signal for an associated one-stage pipeline circuit. Such a multi-level clock driving circuit of the clock circuit system 1000 may be called a master clock circuit, or also called a "master clock tree". The main clock circuit can be extended with the extension of the pipeline circuits at all levels of the pipeline structure.
具体地,在图1的示例中,时钟电路系统1000的主时钟电路可以包括串联的多个时钟驱动电路1100、1200、1300。由时钟源2000提供的初始时钟信号可以被提供给第一级时钟驱动电路1100。第一级时钟驱动电路1100的输出时钟信号可以被提供给第二级时钟驱动电路1200。第二级时钟驱动电路1200的输出时钟信号可以被提供给第三级时钟驱动电路1300。后一级时钟驱动电路响应于接收到前一级时钟驱动电路的时钟信号而生成新的时钟信号。每一级时钟驱动电路生成的时钟信号可以被提供给相关联的一级流水线电路。以这种方式,主时钟电路使得源自同一初始时钟信号的时钟信号能够沿着各级时钟驱动电路传播数十或上百级,从而为包含数十级甚至上百级流水线电路的流水线结构的每一级流水线电路提供相应的时钟信号。如图1所示,时钟驱动电路1100、1200、1300可以分别为流水线结构3000中的流水线电路3100、3200、3300提供相应的时钟信号。Specifically, in the example of FIG. 1, the main clock circuit of the clock circuit system 1000 may include a plurality of clock driving circuits 1100, 1200, and 1300 connected in series. The initial clock signal provided by the clock source 2000 may be provided to the first-stage clock driving circuit 1100. The output clock signal of the first-stage clock driving circuit 1100 may be provided to the second-stage clock driving circuit 1200. The output clock signal of the second stage clock driving circuit 1200 may be provided to the third stage clock driving circuit 1300. The clock drive circuit of the subsequent stage generates a new clock signal in response to receiving the clock signal of the clock drive circuit of the previous stage. The clock signal generated by each stage of the clock driving circuit can be provided to the associated stage of pipeline circuit. In this way, the master clock circuit enables the clock signal derived from the same initial clock signal to propagate tens or hundreds of stages along the clock driving circuit of each stage, thus being a pipeline structure containing tens or even hundreds of pipeline circuits. Each stage of pipeline circuit provides a corresponding clock signal. As shown in FIG. 1, the clock driving circuits 1100, 1200, and 1300 can respectively provide corresponding clock signals for the pipeline circuits 3100, 3200, and 3300 in the pipeline structure 3000.
To drive and propagate the clock signal, each clock driving circuit of the main clock circuit may be configured to include one or more circuit elements connected in series. Within each clock driving circuit, the clock signal propagates through these circuit elements in turn. As shown in FIG. 1, the clock driving circuit 1100 may include circuit elements 1110, 1120, and 1130 connected in series in sequence; the clock driving circuit 1200 may include circuit elements 1210, 1220, and 1230 connected in series in sequence; and the clock driving circuit 1300 may include circuit elements 1310, 1320, and 1330 connected in series in sequence. These circuit elements may be active elements. Active elements can compensate for the power of the input signal, so that the amplitude of the clock signal propagating through the clock driving circuit is maintained.
Typical active elements used in clock driving circuits include inverters and buffers. As is known to those skilled in the art, the output signal of an inverter has the opposite level (phase) to its input signal: in response to a high-level input, the inverter outputs a low level, and in response to a low-level input, it outputs a high level. Unlike an inverter, a buffer outputs a signal with the same level (phase) as its input signal. As required, the circuit elements 1110, 1120, and 1130 in the clock driving circuit 1100 may all be inverters, may all be buffers, or may be any combination of inverters and buffers. Preferably, the clock driving circuits 1200 and 1300 have the same configuration as the clock driving circuit 1100. This keeps the clock driving circuits of all stages consistent, which helps ensure precise timing of the clock signals they provide.
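The inverter and buffer behaviour described above can be sketched as a zero-delay logic model. This is a hypothetical illustration, not taken from the disclosure: the labels "INV" and "BUF" and the function names are assumptions, and timing is deliberately ignored here.

```python
def element_output(kind: str, level: int) -> int:
    """Ideal logic response of one clock-driving element (timing ignored)."""
    if kind == "INV":   # inverter: output has the opposite level
        return 1 - level
    if kind == "BUF":   # buffer: output has the same level
        return level
    raise ValueError(f"unknown element kind: {kind!r}")


def tap_levels(input_level: int, elements: list[str]) -> list[int]:
    """Level at the output of each element of a series chain, in order."""
    levels = []
    level = input_level
    for kind in elements:
        level = element_output(kind, level)
        levels.append(level)
    return levels


# One clock driving circuit built from an arbitrary mix of elements.
stage = ["INV", "BUF", "INV"]
print(tap_levels(1, stage))          # [0, 0, 1]

# Three identically configured stages (as preferred for 1100, 1200, 1300):
# every stage output has the same level relationship to its input.
chain = tap_levels(1, stage * 3)
print(chain[2::3])                   # [1, 1, 1]
```

Because this particular stage contains an even number of inverters, each stage output is in phase with its input; a stage with an odd number of inverters would make the stage outputs alternate in phase from one stage to the next.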
Note that the input-output response of real circuit elements (for example, inverters and buffers) is not ideal. For example, the response (output signal) of each circuit element lags the stimulus (input signal) by a certain delay. Therefore, each of the circuit elements 1110, 1120, 1130, 1210, 1220, 1230, 1310, 1320, and 1330 in the clock driving circuits delays the clock signal passing through it, and these circuit elements can also be referred to as delay elements. The delay characteristics of circuit elements can be used to generate specific signals, as described further below.
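Because every element adds delay, a clock edge reaches each tap of the main clock circuit progressively later. The sketch below accumulates assumed per-element delays along three identical clock driving circuits; the delay values and the function name are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def tap_arrival_times(stage_delays_ps: list[float], num_stages: int) -> list[float]:
    """Arrival time of a source clock edge at the output of every element
    of `num_stages` identical clock driving circuits connected in series."""
    chain = stage_delays_ps * num_stages   # elements in propagation order
    return list(accumulate(chain))          # running sum of element delays


# Hypothetical delays (ps) of the three elements in each clock driving circuit.
STAGE_DELAYS_PS = [30.0, 25.0, 30.0]
arrivals = tap_arrival_times(STAGE_DELAYS_PS, num_stages=3)

# The output of each clock driving circuit is the output of its last element.
per_stage = len(STAGE_DELAYS_PS)
for stage in range(3):
    print(f"stage {stage + 1}: {arrivals[(stage + 1) * per_stage - 1]:.0f} ps")
# prints three lines: "stage 1: 85 ps", "stage 2: 170 ps", "stage 3: 255 ps"
```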
In many scenarios, the clock signal output directly by a clock driving circuit of the main clock circuit cannot be used directly by the pipeline circuit. For example, the clock signal propagating along the main clock circuit is usually a square wave (for example, a square wave with a 50% duty cycle), whereas the pipeline circuit may use latch-type registers. A latch-type register needs to be triggered by a pulse signal, that is, a signal that is at a high level (or a low level) only briefly in each clock cycle. The square wave output by the main clock circuit is therefore not suitable for directly triggering the latch-type registers in the pipeline circuit. In this case, the clock signal output by each clock driving circuit of the main clock circuit needs to be preprocessed before it is provided to the pipeline circuit.
To preprocess the clock signals output by the clock driving circuits of the main clock circuit, the clock circuit system 1000 may further include local clock circuits 4100, 4200, and 4300. Each local clock circuit can be associated with a corresponding clock driving circuit and with a corresponding pipeline circuit. Unlike the clock driving circuits of the main clock circuit, which are connected in series with one another, each local clock circuit can be coupled between its corresponding clock driving circuit and its corresponding pipeline circuit. For example, the local clock circuit 4100 can be coupled between the clock driving circuit 1100 and the pipeline circuit 3100, the local clock circuit 4200 between the clock driving circuit 1200 and the pipeline circuit 3200, and the local clock circuit 4300 between the clock driving circuit 1300 and the pipeline circuit 3300. A local clock circuit may draw the clock signal from the corresponding clock driving circuit, preprocess it to generate an appropriate signal (for example, a pulse signal), and provide the generated signal to the corresponding pipeline circuit. A specific example of the configuration of the local clock circuit is described in detail below with reference to FIG. 2, and improved embodiments of the local clock circuit configuration are further described with reference to FIGS. 4A-4D and FIG. 5.
It should be noted that the structure of the system 100 shown in FIG. 1 is merely exemplary. For example, although the pipeline structure 3000 in FIG. 1 includes three stages of pipeline circuits, a pipeline structure according to embodiments of the present disclosure may include more or fewer stages of pipeline circuits, for example 2, 10, 50, or more than 100 stages. Accordingly, the main clock circuit according to embodiments of the present disclosure is not limited to three clock driving circuits and may include more or fewer clock driving circuits, for example 2, 10, 50, or more than 100. In FIG. 1, the box with an ellipsis denotes an additional module 1400 that receives the output clock signal of the clock driving circuit 1300. The additional module 1400 may represent a plurality of clock driving circuits that are not specifically shown, or may represent a tail-end load element. If the additional module 1400 represents a plurality of clock driving circuits not specifically shown, each of them will also include a plurality of circuit elements connected in series and may also have an associated local clock circuit.
In addition, the boundary of the clock circuit system 1000 in FIG. 1 is drawn with a dashed line, meaning that the boundary shown is merely exemplary. For example, in an alternative embodiment, the clock source 2000 may be part of the clock circuit system 1000. In another alternative embodiment, the local clock circuits 4100, 4200, and 4300 may be located inside the corresponding stages of pipeline circuits; functionally, however, such local clock circuits can still be regarded as part of the clock circuit system 1000.
FIG. 2 shows an exemplary configuration of the clock circuit system 1000. Compared with FIG. 1, FIG. 2 enlarges the local clock circuit 4200 to show its configuration in detail. As shown in FIG. 2, the local clock circuit 4200 may include one or more delay elements 4221, 4222, and 4223 and a logic gate element 4230. Each of the delay elements 4221, 4222, and 4223 may be an inverter or a buffer. The input terminal 4211 of the local clock circuit 4200 may be coupled to the clock driving circuit 1200 of the main clock circuit that is associated with the local clock circuit 4200, so as to receive the clock signal output by the clock driving circuit 1200. The logic gate element 4230 may be a two-input logic gate. A first signal path and a second signal path may exist between the input terminal 4211 and the two inputs of the logic gate element 4230, and the delay elements 4221, 4222, and 4223 may be arranged on the second signal path. The clock signal received at the input terminal 4211 can be fed directly to one input of the logic gate element 4230 via the first signal path as a first clock signal, and can be fed to the other input of the logic gate element 4230 through the delay elements 4221, 4222, and 4223 on the second signal path as a second clock signal. Because of the delay elements 4221, 4222, and 4223, the second clock signal lags the first clock signal by a certain delay, the amount of which is determined by the delay elements 4221, 4222, and 4223. The logic gate element 4230 can perform a logic operation on the first and second clock signals, which are delayed relative to each other, thereby generating a pulse signal. The generated pulse signal may be provided to the corresponding pipeline circuit (for example, the pipeline circuit 3200 of FIG. 1).
FIG. 3 shows an example of generating a pulse signal PLS from a first clock signal CLK1 and a second clock signal CLK2 that are delayed relative to each other. As shown in FIG. 3, both CLK1 and CLK2 may be square waves. Although both CLK1 and CLK2 come from the input terminal 4211, because of the delay elements 4221, 4222, and 4223, CLK2 may be inverted relative to CLK1, and CLK2 lags CLK1 by a delay d. The two signals CLK1 and CLK2 can be input to the logic gate element 4230, which may be an AND gate (AND2). The AND gate performs a logical AND on CLK1 and CLK2 to obtain the signal PLS, that is, PLS = AND2(CLK1, CLK2). The resulting PLS is a high-level pulse signal with a pulse width of d. A high-level pulse signal is a signal that is at a high level only briefly.
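The waveform relationship of FIG. 3 can be reproduced with a small discrete-time simulation. This is a sketch under assumptions: time is quantised into integer steps, the period and delay values are hypothetical, and CLK2 is modelled as CLK1 passed through an odd number of inverters with a total delay d.

```python
PERIOD = 20   # clock period in time steps (hypothetical)
DELAY = 3     # total delay d of the second signal path, in time steps (hypothetical)


def clk1(t: int) -> int:
    """50% duty-cycle square wave: high during the first half of each period."""
    return 1 if t % PERIOD < PERIOD // 2 else 0


def clk2(t: int) -> int:
    """CLK1 inverted (odd number of inverters) and delayed by DELAY steps."""
    return 1 - clk1(t - DELAY)


def pls(t: int) -> int:
    """Two-input AND gate: PLS = AND2(CLK1, CLK2)."""
    return clk1(t) & clk2(t)


def run_lengths(samples: list[int], value: int = 1) -> list[int]:
    """Lengths of consecutive runs of `value` in a sampled waveform."""
    runs, count = [], 0
    for s in samples:
        if s == value:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs


wave = [pls(t) for t in range(3 * PERIOD)]
print(run_lengths(wave))   # [3, 3, 3]: one high pulse of width d per period
```

Each pulse starts at a rising edge of CLK1 and lasts exactly DELAY steps, matching the pulse width d of FIG. 3; changing DELAY changes the pulse width one-for-one.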
The phase difference and the delay between CLK1 and CLK2 depend on the type and number of delay elements on the second signal path. If an odd number of inverters is provided on the second signal path, the resulting CLK2 has the opposite phase to CLK1. In addition, the delay of CLK2 relative to CLK1 is the sum of the delays of all delay elements on the second signal path. It should be understood that although FIG. 2 shows the local clock circuit 4200 with three delay elements, more or fewer delay elements may be used. The pulse width of the resulting pulse signal is determined by the delay between the two input signals, and therefore by the sum of the delays of all delay elements on the second signal path.
It should be noted that although the PLS obtained in FIG. 3 is a high-level pulse signal, a different logic gate element (for example, a NAND gate, NAND2) can be used to obtain a low-level pulse signal. A high-level pulse signal may be provided to latch-type registers that are triggered by high-level pulses, and a low-level pulse signal may be provided to latch-type registers that are triggered by low-level pulses.
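Replacing the AND2 with a NAND2 simply inverts the gate output, turning the short high-level pulse into a short low-level pulse of the same width. A minimal sketch with hypothetical period and delay values, using the same inverted-and-delayed CLK2 as in FIG. 3:

```python
PERIOD, DELAY = 20, 3   # hypothetical values, in time steps
N = 2 * PERIOD          # simulate two whole periods (wraps around cleanly)

clk1 = [1 if t % PERIOD < PERIOD // 2 else 0 for t in range(N)]
# CLK2: CLK1 inverted and delayed; indexing modulo N models a periodic steady state.
clk2 = [1 - clk1[(t - DELAY) % N] for t in range(N)]

and2 = [a & b for a, b in zip(clk1, clk2)]          # high-level pulses
nand2 = [1 - (a & b) for a, b in zip(clk1, clk2)]   # low-level pulses


def pulse_samples(wave: list[int], active_level: int) -> list[int]:
    """Time steps at which the waveform sits at its (short) active level."""
    return [t for t, v in enumerate(wave) if v == active_level]


print(pulse_samples(and2, 1))    # [0, 1, 2, 20, 21, 22]
print(pulse_samples(nand2, 0))   # [0, 1, 2, 20, 21, 22]
```

The choice between the two gates then follows the trigger polarity of the latch-type registers being driven.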
It should also be noted that although FIG. 3 shows a case in which CLK1 and CLK2 are inverted relative to each other, in other embodiments CLK1 and CLK2 may be in phase. The type of logic gate element can be selected accordingly, for example an OR gate (OR2) or a NOR gate (NOR2).
In addition, although the delay and pulse width shown in FIG. 3 are large relative to the clock period, this is only for clarity. In an actual circuit, the delay caused by the delay elements and the pulse width of the generated pulse signal may be much smaller relative to the clock period. For example, the delay of each delay element may be on the order of tens of picoseconds, while one clock period may be on the order of several nanoseconds.
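To put these orders of magnitude together, assume (hypothetically) 30 ps per delay element, the three delay elements of FIG. 2, and a 2 ns clock period:

```python
ELEMENT_DELAY_PS = 30.0    # "tens of picoseconds" (assumed value)
NUM_DELAY_ELEMENTS = 3     # delay elements 4221, 4222, 4223 of FIG. 2
CLOCK_PERIOD_PS = 2000.0   # "several nanoseconds": 2 ns (assumed value)

pulse_width_ps = ELEMENT_DELAY_PS * NUM_DELAY_ELEMENTS
duty = pulse_width_ps / CLOCK_PERIOD_PS
print(f"pulse width {pulse_width_ps:.0f} ps, {duty:.1%} of the clock period")
# pulse width 90 ps, 4.5% of the clock period
```

Under these assumptions the real pulse occupies only a few percent of each cycle, far narrower than the proportions drawn in FIG. 3.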
Although the local clock circuit 4200 shown in FIG. 2 can generate the pulse signal required by the pipeline circuit, such a local clock circuit still has room for improvement. The local clock circuit 4200 requires the necessary delay elements to be provided within the local clock circuit itself. These delay elements occupy chip area and increase the power consumption of the chip. For pipelines with tens or hundreds of stages (and correspondingly tens or hundreds of local clock circuits), the number of delay elements used is not negligible. Moreover, when the available chip area or the total power is limited, it may not be possible to provide enough delay elements in the local clock circuit. In this case, the delay d of CLK2 relative to CLK1 may be too small, making the pulse width of the generated pulse signal too narrow. Triggering a register requires a minimum pulse width, and a wider pulse helps trigger the register reliably. If the pulse generated by the local clock circuit is too narrow, it may fail to trigger the registers in the pipeline circuit, which may prevent the pipeline circuit from performing data processing tasks correctly.
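The scaling concern can be quantified with a short calculation. The minimum pulse width, element delay, and stage count below are hypothetical. The sketch assumes that all of the delay must come from local delay elements, as in FIG. 2, and it ignores the inverter-parity requirement on the second signal path.

```python
import math


def local_elements_needed(min_pulse_ps: float, element_delay_ps: float) -> int:
    """Fewest local delay elements whose summed delay reaches the minimum
    pulse width required to trigger the latch-type registers reliably."""
    if element_delay_ps <= 0:
        raise ValueError("element delay must be positive")
    return math.ceil(min_pulse_ps / element_delay_ps)


MIN_PULSE_PS = 100.0     # assumed minimum trigger pulse width
ELEMENT_DELAY_PS = 30.0  # assumed delay of one local delay element
NUM_STAGES = 100         # order of magnitude of "hundreds of stages"

per_stage = local_elements_needed(MIN_PULSE_PS, ELEMENT_DELAY_PS)
print(per_stage, per_stage * NUM_STAGES)   # 4 400
```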
FIG. 4A shows an exemplary configuration of an improved clock circuit system 1000A according to an embodiment of the present disclosure. Like the clock circuit system 1000 described above, the clock circuit system 1000A may include a main clock circuit and one or more local clock circuits 4100, 4200, and 4300. The main clock circuit may include a plurality of cascaded clock driving circuits 1100, 1200, and 1300, and is configured to drive a clock signal to propagate along them. Each clock driving circuit may include one or more circuit elements (1110, 1120, 1130; 1210, 1220, 1230; 1310, 1320, 1330), which drive the propagation of the clock signal and, at the same time, delay it. Each of the local clock circuits 4100, 4200, and 4300 is associated with a corresponding clock driving circuit of the main clock circuit. The local clock circuits 4100, 4200, and 4300 of the clock circuit system 1000A may be configured differently from the example of FIG. 2. The local clock circuit 4200 is described below as an example.
According to an embodiment of the present disclosure, the local clock circuit 4200 may have two different input terminals 4212 and 4213 that draw clock signals from the main clock circuit. The input terminals 4212 and 4213 may be coupled to a first port and a second port of the main clock circuit, respectively. The second port may be located downstream of the first port in the main clock circuit, and at least one circuit element of a clock driving circuit of the main clock circuit that can delay the clock signal lies between the first port and the second port. In this configuration, the second clock signal drawn by the input terminal 4213 from the second port lags the first clock signal drawn by the input terminal 4212 from the first port. This delay is caused by one or more circuit elements in the clock driving circuits of the main clock circuit and does not depend on any delay element in the local clock circuit 4200.
As shown in FIG. 4A, the input terminal 4212 of the local clock circuit 4200 may be coupled to the output of the second circuit element 1220 in the clock driving circuit 1200 (the first port) to draw the first clock signal, and the input terminal 4213 of the local clock circuit 4200 may be coupled to the output of the third circuit element 1230 in the clock driving circuit 1200 (the second port) to draw the second clock signal. The circuit element 1230 lies between the first port and the second port. Because the circuit element 1230 is itself a delay element (an inverter or a buffer), the clock signal at its output (that is, the second clock signal drawn by the input terminal 4213) lags the clock signal at the output of the circuit element 1220 (that is, the first clock signal drawn by the input terminal 4212) by a certain delay.
According to an embodiment of the present disclosure, the local clock circuit 4200 may have a logic gate element 4230 that performs a logic operation on its input signals. The first and second clock signals drawn by the input terminals 4212 and 4213 may be provided to the logic gate element 4230: one input of the logic gate element 4230 may be connected to the input terminal 4212 via a first signal path to receive the first clock signal, and the other input may be connected to the input terminal 4213 via a second signal path to receive the second clock signal. The logic gate element 4230 may be configured to perform a logic operation on the two input clock signals to generate a pulse signal, as discussed with reference to FIG. 3. One or more delay elements 4221, 4222, and 4223 may be provided on the second signal path to further delay the second clock signal. In this way, the delay between the two clock signals received at the inputs of the logic gate element 4230 includes not only the delay caused by the delay elements 4221, 4222, and 4223 in the local clock circuit 4200, but also the delay caused by the circuit element 1230 in the clock driving circuit 1200. This increases the delay between the two clock signals received at the inputs of the logic gate element 4230 without adding any delay element. Accordingly, the pulse width of the pulse signal generated by the logic gate element 4230 increases, so that a better pulse signal can be provided to the pipeline circuit.
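The gain can be expressed as a simple sum: the delay between the two gate inputs (and hence the pulse width) is the delay of the main-clock elements between the first and second ports plus the delay of any local delay elements on the second signal path. The per-element delays below are assumed values for illustration, and the function name is hypothetical.

```python
def gate_input_skew_ps(main_path_ps: list[float], local_path_ps: list[float]) -> float:
    """Delay of the second clock signal relative to the first at the gate
    inputs, which sets the pulse width."""
    return sum(main_path_ps) + sum(local_path_ps)


LOCAL_PATH_PS = [30.0, 30.0, 30.0]   # 4221, 4222, 4223 (assumed delays)
ELEMENT_1230_PS = 30.0               # element 1230 between the ports (assumed)

fig2 = gate_input_skew_ps([], LOCAL_PATH_PS)                  # single tap 4211
fig4a = gate_input_skew_ps([ELEMENT_1230_PS], LOCAL_PATH_PS)  # taps 4212, 4213
print(fig2, fig4a)   # 90.0 120.0
```

Under these assumptions the pulse widens by one element delay without any extra local element.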
According to an embodiment of the present disclosure, the two input terminals 4212 and 4213 of the local clock circuit 4200 can be connected to any other pair of first and second ports on the clock driving circuits of the main clock circuit, as long as at least one circuit element of the clock driving circuits that can delay the clock signal lies between the first port and the second port. The first port and the second port can be arranged in various configurations.
In one exemplary configuration, the first and second ports coupled to the input terminals 4212 and 4213 of the local clock circuit 4200, respectively, may be located in the same clock driving circuit of the main clock circuit. FIG. 4A shows one embodiment of this configuration. In an alternative embodiment, the input terminal 4212 of the local clock circuit 4200 may be connected to the output of the circuit element 1210 rather than to the output of the circuit element 1220. In this case, the delay between the two clock signals received at the inputs of the logic gate element 4230 additionally includes the delay caused by the circuit element 1220, which further increases the pulse width of the generated pulse signal.
In another exemplary configuration, the first and second ports coupled to the input terminals 4212 and 4213 of the local clock circuit 4200, respectively, may be located in two adjacent clock driving circuits of the main clock circuit. FIG. 4B shows an exemplary configuration of an improved clock circuit system 1000B. As shown in FIG. 4B, the input terminal 4212 of the local clock circuit 4200 may be connected to the output of the circuit element 1220 of the clock driving circuit 1200 (the first port), and the input terminal 4213 of the local clock circuit 4200 may be connected to the output of the circuit element 1310 of the adjacent clock driving circuit 1300 (the second port).
In yet another exemplary configuration, at least one entire clock driving circuit of the main clock circuit may lie between the first and second ports coupled to the input terminals 4212 and 4213 of the local clock circuit 4200, respectively. FIG. 4C shows an exemplary configuration of an improved clock circuit system 1000C. As shown in FIG. 4C, the input terminal 4212 of the local clock circuit 4200 may be connected to the output of the clock driving circuit 1100 (the first port), and the input terminal 4213 of the local clock circuit 4200 may be connected to the output of the circuit element 1310 of the clock driving circuit 1300 (the second port). The entire clock driving circuit 1200 lies between the first port and the second port. This arrangement can be advantageous because it uses the existing output port of a clock driving circuit without tapping the clock signal from inside that clock driving circuit, and it does not affect the load inside each clock driving circuit.
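The port placements of FIGS. 4A-4C can be compared by numbering the taps of the main clock circuit. In this hypothetical model (not part of the disclosure), port k is the output of the k-th element in series order, port 0 is the input of the first clock driving circuit, and each clock driving circuit has three elements as drawn; the function names are assumptions. The last function lists which main-clock elements lie between a chosen first and second port.

```python
ELEMENTS_PER_STAGE = 3   # three elements per clock driving circuit, as drawn


def port(stage: int, element: int) -> int:
    """Port at the output of `element` (1-based) of clock driving circuit
    `stage` (1-based), numbered in propagation order."""
    return (stage - 1) * ELEMENTS_PER_STAGE + element


def stage_input(stage: int) -> int:
    """A circuit's input is the output of the previous circuit (port 0 = source)."""
    return (stage - 1) * ELEMENTS_PER_STAGE


def label(k: int) -> int:
    """Reference numeral of the element driving port k, e.g. 1230."""
    stage, element = divmod(k - 1, ELEMENTS_PER_STAGE)
    return 1000 + 100 * (stage + 1) + 10 * (element + 1)


def elements_between(first: int, second: int) -> list[int]:
    """Main-clock elements that delay the second port relative to the first."""
    if second <= first:
        raise ValueError("the second port must be downstream of the first port")
    return [label(k) for k in range(first + 1, second + 1)]


print(elements_between(port(2, 2), port(2, 3)))      # FIG. 4A: [1230]
print(elements_between(port(2, 1), port(2, 3)))      # 4A variant: [1220, 1230]
print(elements_between(port(2, 2), port(3, 1)))      # FIG. 4B: [1230, 1310]
print(elements_between(stage_input(2), port(3, 1)))  # FIG. 4C: [1210, 1220, 1230, 1310]
```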
According to an embodiment of the present disclosure, the positions of the first and second ports can be determined based on the required properties of the pulse signal, such as its pulse width and signal type. For example, the number of clock-driving-circuit elements that should lie between the first port and the second port can be determined from the required pulse width. When a wider pulse is required, the first and second ports can be placed farther apart, so that more delay-causing circuit elements lie between them. When a high-level pulse signal is required, the first and second ports can be positioned so that the number of inverters between them plus the number of inverters (if any) on the second signal path of the local clock circuit is odd, which makes the two clock signals input to the logic gate element inverted relative to each other (for example, as in FIG. 3).
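One way to apply these two rules is a forward search from a fixed first port: walk downstream, accumulating delay and inverter count, and stop at the first port that satisfies both the pulse-width condition and the polarity condition. This greedy search is an illustrative assumption rather than a procedure stated in the disclosure; the class, function, element types, and delays below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    name: str         # reference numeral, e.g. "1230"
    inverting: bool   # True for an inverter, False for a buffer
    delay_ps: float   # assumed propagation delay


def choose_second_port(chain: list[Element], first: int, min_width_ps: float,
                       local_inverters: int = 0,
                       local_delay_ps: float = 0.0) -> Optional[int]:
    """Nearest port downstream of `first` (port k = output of chain[k-1],
    port 0 = chain input) giving at least `min_width_ps` of total delay and an
    odd total inverter count, so the two gate inputs are inverted and an
    AND2 gate yields a high-level pulse. Returns None if no port qualifies."""
    delay, inverters = local_delay_ps, local_inverters
    for k in range(first + 1, len(chain) + 1):
        element = chain[k - 1]
        delay += element.delay_ps
        inverters += int(element.inverting)
        if delay >= min_width_ps and inverters % 2 == 1:
            return k
    return None


# Clock driving circuits 1200 and 1300 built from inverters (assumed 30 ps each);
# port 0 is the input of clock driving circuit 1200.
chain = [Element(n, True, 30.0) for n in ("1210", "1220", "1230", "1310", "1320", "1330")]

k = choose_second_port(chain, first=0, min_width_ps=80.0)
print(chain[k - 1].name)   # 1230 (90 ps, 3 inverters)
k = choose_second_port(chain, first=0, min_width_ps=100.0)
print(chain[k - 1].name)   # 1320: 1310 gives 120 ps but an even count of 4
```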
In FIGS. 4A-4C, the delay elements 4221, 4222, and 4223 are drawn with dashed boxes, indicating that one or more of them may be unnecessary. Because the delay of one or more circuit elements of the main clock circuit has been introduced, one or more of the delay elements 4221, 4222, and 4223 inside the local clock circuit 4200 can be removed. For example, in FIG. 4A, the delay originally provided by the elements 4221, 4222, and 4223 can instead be provided by the elements 1230, 4221, and 4222, so that the element 4223 can be removed while still meeting the delay requirement between the two clock signals input to the logic gate element 4230. As another example, the elements 1220, 1230, and 4221 can provide the delay originally provided by the elements 4221, 4222, and 4223, so that the elements 4222 and 4223 can be removed. In some exemplary configurations, all delay elements inside the local clock circuit 4200 can be removed, as discussed below with reference to FIG. 4D.
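The trade-off can be sketched as a count of how many local delay elements must remain once the main-clock elements between the ports contribute their delay. This assumes, hypothetically, equal 30 ps delays for all elements and a 90 ps target equal to the original three-element delay; the function name is illustrative. It checks only the delay requirement, so the inverter-parity condition still has to be verified separately.

```python
import math


def local_elements_remaining(required_ps: float, main_path_ps: float,
                             local_element_ps: float) -> int:
    """Local delay elements still needed after the main-clock elements
    between the first and second ports supply `main_path_ps` of delay."""
    shortfall = required_ps - main_path_ps
    if shortfall <= 0:
        return 0   # the main clock circuit alone provides enough delay
    return math.ceil(shortfall / local_element_ps)


REQUIRED_PS = 90.0   # delay originally provided by 4221, 4222, 4223 (assumed)
ELEMENT_PS = 30.0    # assumed delay of every element

print(local_elements_remaining(REQUIRED_PS, 30.0, ELEMENT_PS))  # 2: only 1230 between ports
print(local_elements_remaining(REQUIRED_PS, 60.0, ELEMENT_PS))  # 1: 1220 and 1230
print(local_elements_remaining(REQUIRED_PS, 90.0, ELEMENT_PS))  # 0: all local elements removed
```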
FIG. 4D shows an exemplary configuration of an improved clock circuit system 1000D according to an embodiment of the present disclosure. In FIG. 4D, the input terminal 4212 of the local clock circuit 4200 may be coupled to the input of the clock driving circuit 1200 (the first port) to draw the first clock signal, and the input terminal 4213 of the local clock circuit 4200 may be coupled to the output of the third circuit element 1230 in the clock driving circuit 1200 (the second port) to draw the second clock signal. In addition, no delay element is provided between the logic gate element 4230 of the local clock circuit 4200 and either of the input terminals 4212 and 4213. In this example, the delay between the two clock signals received at the inputs of the logic gate element 4230 can be provided entirely by the elements 1210, 1220, and 1230 in the clock driving circuit 1200 of the main clock circuit, without any delay elements (for example, 4221, 4222, 4223) in the local clock circuit 4200. If the total delay of the circuit elements between the first port and the second port yields a pulse of sufficient width, the configuration of FIG. 4D may be preferred, because it eliminates the delay elements in the local clock circuit and thereby minimizes the area and power of the local clock circuit.
Compared with the example in FIG. 2, the clock circuit systems 1000A-1000D offer advantages in at least two respects. First, a larger delay can be provided without changing the arrangement of the delay elements in the local clock circuit, yielding a pulse signal with a wider pulse width. Second, for the same required delay, the number of delay elements in the local clock circuit can be reduced, or the delay elements can be removed entirely, which significantly reduces power consumption, component cost and chip area.
FIG. 5 shows a schematic diagram of an improved clock circuit system 1000E according to an embodiment of the present disclosure. The configuration of FIG. 5 is similar to that of FIG. 4D, except that the output of the logic gate element 4230 is not provided directly to the pipeline circuit but may first be provided to the additional circuit elements 4241 and 4242. The circuit elements 4241 and 4242 may be inverters or buffers. As described above, being active elements, the circuit elements 4241 and 4242 drive the signal and thereby maintain the amplitude of the output signal. The circuit elements 4241 and 4242 may each provide their output signal to a corresponding group of elements (for example, registers). The configuration of FIG. 5 is advantageous when the pipeline circuit contains a large number of registers: the output signal of a single logic gate element 4230 may be insufficient to drive all of them, so the active circuit elements 4241 and 4242 are used to convert the single output signal of the logic gate element 4230 into multiple output signals.
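The fan-out split performed by 4241 and 4242 can be sketched as a simple grouping problem. The per-driver load limit (384 latch bits) is a hypothetical figure, and the register list borrows the 24 registers A-H and R0-R15 of the SHA-256 stage in FIG. 6 (each 32 bits wide in SHA-256) purely as an illustration; `assign_drivers` is a hypothetical helper, not part of the disclosure.

```python
def assign_drivers(registers, bits_per_register, max_loads_per_driver):
    """Greedily pack registers into groups, one group per output driver,
    without exceeding a per-driver load (counted in latch bits)."""
    groups, current, load = [], [], 0
    for name in registers:
        if current and load + bits_per_register > max_loads_per_driver:
            groups.append(current)
            current, load = [], 0
        current.append(name)
        load += bits_per_register
    if current:
        groups.append(current)
    return groups


stage_registers = list("ABCDEFGH") + [f"R{i}" for i in range(16)]
groups = assign_drivers(stage_registers, bits_per_register=32,
                        max_loads_per_driver=384)
for i, group in enumerate(groups):
    print(f"driver {i}: {len(group)} registers, {32 * len(group)} latch bits")
```

With these assumed numbers the 768 latch bits split into two groups of 12 registers, i.e. two drivers such as 4241 and 4242; a tighter load limit would simply call for more drivers.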
It should be noted that although FIG. 5 shows two circuit elements 4241 and 4242 in the local clock circuit 4200, the local clock circuit 4200 may include more such circuit elements without limitation. The local clock circuits 4100 and 4300 may likewise each have similar circuit elements (not shown). Further, one or more local clock circuits of each clock circuit system in FIGS. 4A-4D may also have similar circuit elements.
In FIGS. 4A-4D and FIG. 5, the local clock circuit 4200 is used as the example for discussion; the local clock circuits 4100 and 4300 adopt the same configuration as the local clock circuit 4200, and their detailed description is omitted. However, it should be understood that each of the local clock circuits 4100, 4200 and 4300 may adopt any of the exemplary configurations described above without limitation. For example, the local clock circuit 4100 may adopt the configuration described with respect to FIG. 4A, the local clock circuit 4200 may adopt the configuration described with respect to FIG. 4B, and the local clock circuit 4300 may adopt the configuration described with respect to FIG. 4C. Other hybrid configurations are also possible.
According to an embodiment of the present disclosure, the logic gate element used by each local clock circuit may be one selected from an AND gate, a NAND gate, an OR gate and a NOR gate. Those skilled in the art will appreciate that logic gate elements can be implemented using various devices and technologies without limitation.
According to embodiments of the present disclosure, the type of logic gate element may be selected based on multiple factors, including but not limited to: the type (inverter or buffer), number and delay of the circuit elements between the first port and the second port to which the two input terminals of the local clock circuit are connected; the type (inverter or buffer), number and delay of the delay elements on the second signal path of the logic gate element; and the type of pulse signal required (triggering by a high-level pulse or by a low-level pulse). For example, if the sum of the number of inverters between the first port and the second port and the number of inverters on the second signal path is odd, an AND gate or a NAND gate may be selected. If the sum is even, an OR gate or a NOR gate may be selected. One type of logic gate element can also be realized by combining several logic gate elements; for example, an AND gate and a NAND gate differ by one inverter, as do an OR gate and a NOR gate.
The various clock circuit systems 1000 according to embodiments of the present disclosure may be used in combination with a pipeline structure 3000. Driven by the clock signals provided by the clock circuit system 1000, the pipeline circuits at each stage of the pipeline structure 3000 can perform various data processing tasks, including but not limited to data storage and data operations.
According to an embodiment of the present disclosure, the data processing tasks performed by the pipeline structure 3000 may include various compute-intensive tasks. Such tasks require the computing hardware to run for long periods and require a large number of pipeline circuits on the computing chip to perform parallel computation, so they are sensitive to clock signal performance, power consumption and chip area. Data processing tasks that can benefit from the present disclosure include, but are not limited to, hash algorithm calculations and artificial intelligence (AI) calculations.
A hash algorithm takes variable-length data as input and produces a fixed-length hash value as output. In a hash algorithm, input data of arbitrary length is padded so that the padded length is an integer multiple of a fixed length (for example, 512 bits), i.e., so that the padded data can be divided into a plurality of data blocks of that fixed length. The padding bits include information about the bit length of the original data. The hash algorithm then processes each fixed-length data block in turn, for example through multiple rounds of operations including data expansion and/or compression. When all data blocks have been processed, the final fixed-length hash value is obtained.
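For SHA-256, the padding described above is specified in FIPS 180-4: append a single 1 bit, then zero bits, then the original message length in bits as a 64-bit big-endian integer, so that the result is a multiple of 512 bits. A minimal byte-oriented sketch, assuming the message is a whole number of bytes (`pad_message` is an illustrative name):

```python
import struct

BLOCK_BYTES = 64  # one 512-bit SHA-256 data block


def pad_message(message: bytes) -> bytes:
    """SHA-256 style padding: 0x80, zeros, then the 64-bit bit length."""
    bit_length = 8 * len(message)
    padded = message + b"\x80"                        # the single 1 bit
    padded += b"\x00" * ((56 - len(padded)) % 64)     # zeros up to 448 mod 512 bits
    padded += struct.pack(">Q", bit_length)           # original length in bits
    return padded


blocks = pad_message(b"abc")
print(len(blocks) // BLOCK_BYTES, blocks[-8:].hex())
```

A 3-byte message fits in one block; a 56-byte message already leaves no room for the 8-byte length field and therefore spills into a second block.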
The hash algorithm executed by the pipeline structure 3000 may be the SHA-256 algorithm. Since 1993, the U.S. National Institute of Standards and Technology (NIST) has published multiple versions of the Secure Hash Algorithm (SHA); SHA-256 is one of them, with a hash length of 256 bits. SHA-256 is one of the hash algorithms commonly used in computations associated with virtual encrypted digital currencies (for example, Bitcoin). For example, Bitcoin uses a proof of work (POW) based on the SHA-256 algorithm. In Bitcoin mining with a data processing device (such as a mining machine), bitcoin rewards are obtained according to the device's capability to compute SHA-256.
For hash algorithms that involve multiple rounds of operations (such as SHA-256), a pipeline structure with multiple operation stages can be used to achieve high-speed computation. For example, because SHA-256 performs 64 rounds of repeated operations on each 512-bit data block, a 64-stage pipeline structure can be used to process 64 groups of data in parallel.
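The benefit of a 64-stage pipeline can be illustrated with an idealized cycle-level model in which each stage performs one round per clock and every data block advances one stage per cycle. This is a sketch under simplifying assumptions (no stalls, one new block accepted per cycle), not a model of an actual chip; `run_pipeline` is a hypothetical name.

```python
from collections import deque


def run_pipeline(num_items, num_stages):
    """Idealized pipeline: returns (cycles until all items finish,
    peak number of items in flight)."""
    stages = deque([None] * num_stages, maxlen=num_stages)
    next_item, done, cycles, peak = 0, 0, 0, 0
    while done < num_items:
        cycles += 1
        # Every occupant moves one stage downstream (the block that finished
        # last cycle drops off the end) and a new block enters stage 0.
        if next_item < num_items:
            stages.appendleft(next_item)
            next_item += 1
        else:
            stages.appendleft(None)
        peak = max(peak, sum(s is not None for s in stages))
        if stages[-1] is not None:  # the block in the last stage completes now
            done += 1
    return cycles, peak


print(run_pipeline(100, 64))
```

In this model 100 blocks finish in 100 + 64 - 1 = 163 cycles with 64 blocks in flight at the peak, compared with 64 x 100 = 6400 cycles if a single round unit processed the blocks one after another.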
FIG. 6 shows a schematic diagram of a pipeline structure 6000 that can be used to implement the SHA-256 algorithm. The pipeline structure 6000 may be a specific use case of the pipeline structure 3000 described above. To implement the SHA-256 algorithm, the pipeline structure 6000 may be a 32-stage, 64-stage or 128-stage pipeline. As shown in FIG. 6, the t-th, (t+1)-th and (t+2)-th operation stages of the pipeline structure 6000 are separated by dashed lines. Each operation stage can be implemented by a corresponding stage of pipeline circuitry. Each operation stage may include operation logic, a plurality of registers A to H for storing intermediate values, and a plurality of registers R0 to R15 for storing expanded data. One or more of these registers may be latch-type registers. While the SHA-256 algorithm is executed, the latch-type registers in each pipeline circuit of the pipeline structure 6000 can be triggered by the corresponding pulse signal provided by the clock circuit system described above, thereby updating the data stored in them. Depending on the type of latch-type register, the pulse signal provided by the clock circuit system may be a high-level pulse signal or a low-level pulse signal. Preferably, in order to trigger the multiple registers in each operation stage, these registers may be divided into one or more groups, each of which may be triggered by the output signal of a corresponding one of the circuit elements shown in FIG. 5 (i.e., the circuit elements 4241 and 4242).
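To connect FIG. 6 to the algorithm, the sketch below is a plain software SHA-256 (following FIPS 180-4) written so that its state mirrors the stage registers: eight working variables a-h (cf. registers A-H) and a rolling 16-word message-schedule window (one plausible reading of R0-R15, assumed here rather than stated in the disclosure). Each iteration of the 64-round loop corresponds to what one stage of a 64-stage pipeline would compute. The constants are derived from prime roots instead of being typed out, and the result is checked against Python's `hashlib`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def first_primes(n):
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(x):
    """Largest integer r with r**3 <= x (binary search)."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# FIPS 180-4 constants: first 32 fractional bits of prime cube roots (K)
# and prime square roots (initial hash value H0).
K = [icbrt(p << 96) & MASK for p in first_primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in first_primes(8)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    r = list(struct.unpack(">16I", block))  # rolling window, cf. R0..R15
    a, b, c, d, e, f, g, h = state           # working registers, cf. A..H
    for t in range(64):                       # one round per pipeline stage
        if t >= 16:  # expand: r[t % 16] currently holds W[t-16]
            w2, w15 = r[(t - 2) % 16], r[(t - 15) % 16]
            s0 = rotr(w15, 7) ^ rotr(w15, 18) ^ (w15 >> 3)
            s1 = rotr(w2, 17) ^ rotr(w2, 19) ^ (w2 >> 10)
            r[t % 16] = (s1 + r[(t - 7) % 16] + s0 + r[t % 16]) & MASK
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + r[t % 16]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(message: bytes) -> bytes:
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    state = list(H0)
    for i in range(0, len(padded), 64):  # one 512-bit block at a time
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)


assert sha256(b"abc") == hashlib.sha256(b"abc").digest()
print(sha256(b"abc").hex())
```

In hardware the loop body would be unrolled into 64 physical stages, each with its own copy of the registers, so that 64 different blocks can be in flight at once.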
The clock circuit system according to embodiments of the present disclosure may be included in various devices, including but not limited to computing chips, computing power boards (hash boards) and data processing devices (such as digital currency mining machines). By adopting the clock circuit system according to embodiments of the present disclosure, these devices can obtain multiple clock signals with stable duty cycles at low cost and with a simple circuit structure, thereby ensuring their performance when executing specific computing tasks.
FIG. 7 shows a schematic block diagram of a computing chip 7000 according to an embodiment of the present disclosure. The computing chip 7000 may include a clock circuit system 7100, a clock source 7200 and a pipeline structure 7300. The clock circuit system 7100 may be a specific embodiment of any of the clock circuit systems described above (for example, 1000, 1000A, 1000B, 1000C, 1000D or 1000E). The clock source 7200 may be a specific embodiment of the clock source 2000 described above. The pipeline structure 7300 may be a specific embodiment of the pipeline structure 3000 or 6000 described above. The clock circuit system 7100 may be coupled to the clock source 7200 and to the pipeline structure 7300. The clock circuit system 7100 may receive an initial clock signal from the clock source 7200 and generate a plurality of clock signals accordingly. These clock signals may be provided to the pipeline structure 7300 to perform a specific computing task, for example executing the SHA-256 algorithm. Accordingly, the computing chip 7000 may be configured as a Bitcoin chip. In FIG. 7, the clock source 7200 is drawn with a dashed box, indicating that it may also be located outside the computing chip 7000.
FIG. 8 shows a schematic block diagram of a computing power board 8000 according to an embodiment of the present disclosure. The computing power board 8000 may include one or more computing chips 8100, each of which may be a specific embodiment of the computing chip 7000. Multiple computing chips 8100 can perform computing tasks in parallel.
FIG. 9 shows a schematic block diagram of a digital currency mining machine 9000 according to an embodiment of the present disclosure. The digital currency mining machine 9000 is an example of a data processing device according to an embodiment of the present disclosure. It may be configured to execute the SHA-256 algorithm to obtain a proof of work (POW) and further to obtain digital currency, such as Bitcoin, based on that proof of work. The digital currency mining machine 9000 may include one or more computing power boards 9100, each of which may be a specific embodiment of the computing power board 8000. Multiple computing power boards 9100 can perform computing tasks, such as executing the SHA-256 algorithm, in parallel.
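As a toy illustration of proof of work: Bitcoin applies SHA-256 twice to an 80-byte block header and accepts the header if the resulting hash, read as a little-endian 256-bit integer, does not exceed a target. The sketch below uses a hypothetical all-zero 76-byte header prefix and a deliberately easy target; real headers carry specific fields and real targets are vastly harder.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as in Bitcoin block-header hashing."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def search_nonce(header_prefix: bytes, target: int, max_nonce: int = 1 << 20):
    """Try 32-bit little-endian nonces appended to the prefix until the
    double-SHA-256 digest, read as a little-endian integer, is <= target."""
    for nonce in range(max_nonce):
        digest = double_sha256(header_prefix + struct.pack("<I", nonce))
        if int.from_bytes(digest, "little") <= target:
            return nonce, digest
    return None  # no solution in the searched range


prefix = bytes(76)        # hypothetical: 76 zero bytes + 4-byte nonce = 80 bytes
easy_target = 1 << 244    # roughly 1 in 4096 attempts qualifies
result = search_nonce(prefix, easy_target)
if result is not None:
    nonce, digest = result
    print(nonce, digest[::-1].hex())
```

Each nonce attempt costs one full double-SHA-256 evaluation, which is why the hash rate of the computing chips, and hence the efficiency of their pipelines and clocking, determines a mining machine's throughput.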
In all the examples shown and discussed herein, any specific value should be interpreted as merely exemplary rather than limiting. Therefore, other examples of the exemplary embodiments may have different values.
The words "front", "rear", "top", "bottom", "above", "below" and the like in the specification and claims, if present, are used for descriptive purposes and not necessarily to describe permanent relative positions. It should be understood that such terms are interchangeable under appropriate circumstances, so that the embodiments of the present disclosure described herein can, for example, operate in orientations other than those shown or otherwise described herein.
As used herein, the word "exemplary" means "serving as an example, instance, or illustration" and not as a "model" to be exactly reproduced. Any implementation described herein as exemplary is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the present disclosure is not bound by any expressed or implied theory presented in the technical field, background, summary or detailed description above.
As used herein, the word "substantially" encompasses any minor variations caused by design or manufacturing defects, device or component tolerances, environmental effects and/or other factors. The word "substantially" also allows for deviations from a perfect or ideal case caused by parasitic effects, noise and other practical considerations that may be present in an actual implementation.
The foregoing description may refer to elements, nodes or features that are "connected" or "coupled" together. As used herein, unless expressly stated otherwise, "connected" means that one element/node/feature is directly joined to (or directly communicates with) another element/node/feature electrically, mechanically, logically or in another manner. Similarly, unless expressly stated otherwise, "coupled" means that one element/node/feature may be joined to another element/node/feature, directly or indirectly, mechanically, electrically, logically or in another manner to allow interaction, even if the two features are not directly connected. That is, "coupled" is intended to include both direct and indirect joining of elements or other features, including connection via one or more intermediate elements.
It should also be understood that the term "comprising/including", when used herein, specifies the presence of the stated features, integers, steps, operations, units and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, units and/or components and/or combinations thereof.
Those skilled in the art will recognize that the boundaries between the operations described above are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be executed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be changed in various other embodiments. Other modifications, variations and alternatives are also possible. Accordingly, the specification and drawings are to be regarded as illustrative rather than restrictive.
Although some specific embodiments of the present disclosure have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. The embodiments disclosed herein may be combined arbitrarily without departing from the spirit and scope of the present disclosure. Those skilled in the art should also understand that various modifications may be made to the embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (15)

  1. A clock circuit system, comprising:
    a master clock circuit comprising a plurality of cascaded clock driving circuits, each clock driving circuit comprising one or more delay elements that delay a clock signal, the master clock circuit being configured to drive a clock signal to propagate along the plurality of clock driving circuits; and
    one or more local clock circuits, each of which is associated with a corresponding clock driving circuit in the master clock circuit and comprises:
    a first input terminal coupled to a first port of the master clock circuit to draw a first clock signal from the master clock circuit;
    a second input terminal coupled to a second port of the master clock circuit to draw a second clock signal from the master clock circuit; and
    a logic gate element coupled to the first input terminal and the second input terminal and configured to generate a pulse signal based on the first clock signal and the second clock signal;
    wherein the second port is located downstream of the first port in the master clock circuit, and at least one delay element of the corresponding clock driving circuit of the master clock circuit is present between the first port and the second port.
  2. The clock circuit system of claim 1, wherein the local clock circuit further comprises one or more additional delay elements that delay the second clock signal, the one or more additional delay elements being disposed between the logic gate element and the second input terminal.
  3. The clock circuit system of claim 1, wherein the local clock circuit has one of the following configurations:
    a first configuration, in which the first port and the second port associated with the local clock circuit are located in the same stage of clock driving circuit of the master clock circuit;
    a second configuration, in which the first port and the second port associated with the local clock circuit are located in two adjacent stages of clock driving circuits of the master clock circuit; or
    a third configuration, in which at least one stage of clock driving circuit of the master clock circuit is present between the first port and the second port associated with the local clock circuit.
  4. The clock circuit system of claim 3, wherein the one or more local clock circuits comprise a first local clock circuit and a second local clock circuit, the first local clock circuit and the second local clock circuit each having a different one of the first configuration, the second configuration and the third configuration.
  5. The clock circuit system of claim 1, wherein no delay element is provided between the logic gate element and either of the first input terminal and the second input terminal of the local clock circuit.
  6. The clock circuit system of claim 1, wherein the logic gate element is one selected from an AND gate, a NAND gate, an OR gate and a NOR gate; and
    the selection of the logic gate element is determined based at least on:
    the type and number of the at least one delay element between the first port and the second port;
    the type and number of delay elements between the logic gate element and the second input terminal; and/or
    the type of pulse signal required.
  7. The clock circuit system of claim 1, wherein the one or more delay elements comprise at least one of a buffer and an inverter.
  8. The clock circuit system of claim 1, wherein the local clock circuit is coupled to a corresponding stage of pipeline circuit in a pipeline structure for performing data processing tasks, so as to provide the pulse signal to the corresponding stage of pipeline circuit.
  9. The clock circuit system of claim 8, wherein the pulse signal is provided to one or more groups of registers in the corresponding stage of pipeline circuit, and an additional buffer or inverter is disposed between an output terminal of the local clock circuit and each of the one or more groups of registers.
  10. The clock circuit system of claim 9, wherein the registers are latch-type registers that can be triggered by a high-level pulse or a low-level pulse of the pulse signal.
  11. The clock circuit system of claim 8, wherein the data processing tasks comprise executing a hash algorithm or performing artificial intelligence calculations.
  12. The clock circuit system of claim 11, wherein the hash algorithm comprises the SHA-256 algorithm.
  13. A computing chip, comprising the clock circuit system of any one of claims 1-12.
  14. A computing power board, comprising the computing chip of claim 13.
  15. A data processing device, comprising the computing power board of claim 14.
PCT/CN2021/083764 2020-06-22 2021-03-30 Clock circuit system, computing chip, hash board, and data processing device WO2021258801A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010572765.8A CN111562808A (en) 2020-06-22 2020-06-22 Clock circuit system, computing chip, computing board and digital currency mining machine
CN202010572765.8 2020-06-22

Publications (1)

Publication Number Publication Date
WO2021258801A1 true WO2021258801A1 (en) 2021-12-30

Family

ID=72072797

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083764 WO2021258801A1 (en) 2020-06-22 2021-03-30 Clock circuit system, computing chip, hash board, and data processing device

Country Status (3)

Country Link
CN (1) CN111562808A (en)
TW (1) TWI784457B (en)
WO (1) WO2021258801A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111510137A (en) * 2020-06-04 2020-08-07 深圳比特微电子科技有限公司 Clock circuit, computing chip, computing board and digital currency mining machine
CN111562808A (en) * 2020-06-22 2020-08-21 深圳比特微电子科技有限公司 Clock circuit system, computing chip, computing board and digital currency mining machine
CN114442996A (en) * 2020-10-30 2022-05-06 深圳比特微电子科技有限公司 Computing chip, computing force plate and digital currency mining machine
CN114648318A (en) * 2020-12-18 2022-06-21 深圳比特微电子科技有限公司 Circuit for executing hash algorithm, computing chip, encrypted currency mining machine and method
CN114765455A (en) * 2021-01-14 2022-07-19 深圳比特微电子科技有限公司 Processor and computing system
CN113608575B (en) * 2021-10-09 2022-02-08 深圳比特微电子科技有限公司 Assembly line clock drive circuit, calculating chip, force calculating board and calculating equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101488738A (en) * 2008-01-15 2009-07-22 北京芯慧同用微电子技术有限责任公司 Time clock generating circuit and design method
CN104113304A (en) * 2014-06-26 2014-10-22 上海无线电设备研究所 Two-phase mutually non-overlap clock circuit and method thereof
CN108052156A (en) * 2017-11-27 2018-05-18 中国电子科技集团公司第三十八研究所 A kind of processor clock tree framework and construction method based on gating technology
US10659058B1 (en) * 2015-06-26 2020-05-19 Gsi Technology, Inc. Systems and methods involving lock loop circuits, distributed duty cycle correction loop circuitry
CN111562808A (en) * 2020-06-22 2020-08-21 深圳比特微电子科技有限公司 Clock circuit system, computing chip, computing board and digital currency mining machine
CN212160484U (en) * 2020-06-22 2020-12-15 深圳比特微电子科技有限公司 Clock circuit system, computing chip, computing board and digital currency mining machine

Also Published As

Publication number Publication date
TWI784457B (en) 2022-11-21
TW202131632A (en) 2021-08-16
CN111562808A (en) 2020-08-21

Similar Documents

Publication Publication Date Title
WO2021258801A1 (en) Clock circuit system, computing chip, hash board, and data processing device
CN212160484U (en) Clock circuit system, computing chip, computing board and digital currency mining machine
US10979214B2 (en) Secure hash algorithm implementation
US11522546B2 (en) Clock tree, hash engine, computing chip, hash board and data processing device
US7668022B2 (en) Integrated circuit for clock generation for memory devices
WO2021258824A1 (en) Inverting output dynamic d flip-flop
KR100660639B1 (en) Data output circuit of ddr semiconductor device and semiconductor device including the same
TW202143076A (en) Circuit and method for implementing hash algorithm
CN111930682A (en) Clock tree, hash engine, computing chip, hash board and digital currency mining machine
CN111984058A (en) Microprocessor system based on superconducting SFQ circuit and arithmetic device thereof
WO2021244113A1 (en) Clock circuit, computation chip, hash board, and data processing device
KR100498473B1 (en) Control signal generation circuit and data transmission circuit having the same
US20070028058A1 (en) System for determining the position of an element in memory
WO2021135102A1 (en) Clock generation circuit and latch using same, and computing device
TWI790088B (en) Processors and Computing Systems
US6377071B1 (en) Composite flag generation for DDR FIFOs
CN212515801U (en) Clock tree, hash engine, computing chip, hash board and encrypted currency mining machine
CN212515800U (en) Clock tree, hash engine, computing chip, hash board and encrypted currency mining machine
CN113726335B (en) Clock control circuit, clock circuit and electronic device
US20090150709A1 (en) Reducing Inefficiencies of Multi-Clock-Domain Interfaces Using a Modified Latch Bank
CN111651403A (en) Clock tree, hash engine, computing chip, hash board and digital currency mining machine
Huemer et al. Timing domain crossing using Muller pipelines
CN212515799U (en) Clock tree, hash engine, computing chip, hash board and encrypted currency mining machine
CN112580278A (en) Optimization method and optimization device for logic circuit and storage medium
US7047392B2 (en) Data processing apparatus and method for controlling staged multi-pipeline processing

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 21828696
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

32PN EP: Public notification in the EP Bulletin as address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2023)

122 EP: PCT application non-entry in European phase
Ref document number: 21828696
Country of ref document: EP
Kind code of ref document: A1