WO2021258801A1 - Clock circuit system, computing chip, hash board, and data processing device - Google Patents

Info

Publication number
WO2021258801A1
Authority
WO
WIPO (PCT)
Prior art keywords
clock
circuit
clock circuit
port
signal
Prior art date
Application number
PCT/CN2021/083764
Other languages
English (en)
Chinese (zh)
Inventor
范志军
刘建波
杨作兴
Original Assignee
深圳比特微电子科技有限公司
Application filed by 深圳比特微电子科技有限公司
Publication of WO2021258801A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/04 Generating or distributing clock signals or signals derived directly therefrom
    • G06F1/06 Clock generators producing several clock signals

Definitions

  • the present disclosure relates to the field of electronic circuits, and more specifically to a clock circuit system, and a computing chip, a computing power board, and a data processing device using the clock circuit system.
  • pipeline structure is a common method of chip design. Using the pipeline structure can effectively improve the efficiency/throughput rate of performing data processing tasks.
  • Usually, the pipeline is one related to instruction execution, so the processing times of the individual stages in the pipeline structure are not exactly the same.
  • In pure hardware computing, however, such as virtual digital currency computing and artificial intelligence (AI) computing, strict timing requirements are usually imposed. For example, the time of each stage of the pipeline needs to be precisely controlled to be consistent. Therefore, in these fields, a clock circuit system used to provide a clock signal for a pipeline structure often has a specific structure and function.
  • the embodiments of the present disclosure aim to use a simplified clock circuit system to generate a pulse signal with good performance, and the pulse signal can be used in a pipeline structure for performing computationally intensive data processing tasks.
  • a clock circuit system including a main clock circuit and one or more local clock circuits.
  • The master clock circuit includes a plurality of cascaded clock driving circuits, each clock driving circuit includes one or more delay elements that delay a clock signal, and the master clock circuit is configured to drive a clock signal to propagate along the plurality of clock driving circuits.
  • Each of the one or more local clock circuits is associated with a corresponding clock driving circuit in the master clock circuit, and includes: a first input terminal, coupled to a first port of the main clock circuit to draw a first clock signal from the main clock circuit; a second input terminal, coupled to a second port of the main clock circuit to draw a second clock signal from the main clock circuit; and a logic gate element, coupled to the first input terminal and the second input terminal, and configured to generate a pulse signal based on the first clock signal and the second clock signal.
  • The second port is located downstream of the first port in the main clock circuit, and at least one delay element of the corresponding clock driving circuit of the main clock circuit exists between the first port and the second port.
  • The local clock circuit further includes one or more additional delay elements that delay the second clock signal, and the one or more additional delay elements are provided between the logic gate element and the second input terminal.
  • The local clock circuit has one of the following configurations: a first configuration, in which the first port and the second port associated with the local clock circuit are located in the same stage of clock driving circuit of the main clock circuit; a second configuration, in which the first port and the second port associated with the local clock circuit are located in two adjacent stages of clock driving circuits of the main clock circuit; or a third configuration, in which at least one stage of clock driving circuit of the master clock circuit exists between the first port and the second port associated with the local clock circuit.
  • The one or more local clock circuits include a first local clock circuit and a second local clock circuit, and the first local clock circuit and the second local clock circuit each have a different configuration among the first configuration, the second configuration, and the third configuration.
  • no delay element is provided between the logic gate element and the first input terminal and the second input terminal of the local clock circuit.
  • The logic gate element is selected from one of an AND gate, a NAND gate, an OR gate, and a NOR gate; and the selection of the logic gate element is determined based on at least one of the following: the type and number of the at least one delay element between the first port and the second port; the type and number of the delay elements between the logic gate element and the second input terminal; and/or the type of pulse signal required.
  • the one or more delay elements include at least one of a buffer and an inverter.
  • the local clock circuit is coupled to a corresponding one-stage pipeline circuit in a pipeline structure for performing data processing tasks to provide the pulse signal to the corresponding one-stage pipeline circuit.
  • The pulse signal is provided to one or more sets of registers in the corresponding one-stage pipeline circuit, and additional buffers or inverters are present between the output terminal of the local clock circuit and each set of registers in the one or more sets of registers.
  • the register is a latch-type register that can be triggered by a high-level pulse or a low-level pulse of the pulse signal.
  • the data processing task includes executing a hash algorithm or executing an AI calculation.
  • the hash algorithm includes the SHA-256 algorithm.
  • a computing chip includes any clock circuit system as described herein.
  • a computing power board which includes the computing chip as described herein.
  • a data processing device including the computing power board as described herein.
  • a pulse signal with good performance can be generated with a simplified clock circuit system and lower power consumption.
  • FIG. 1 shows a block diagram of a system according to an embodiment of the present disclosure;
  • FIG. 2 shows an exemplary configuration of a clock circuit system;
  • FIG. 3 shows an example of generating a pulse signal based on a first clock signal and a second clock signal that have a delay between each other;
  • FIG. 5 shows an exemplary configuration of a further improved clock circuit system;
  • FIG. 6 shows a schematic diagram of a pipeline structure that can be used to implement the SHA-256 algorithm;
  • FIG. 7 shows a schematic block diagram of a computing chip according to an embodiment of the present disclosure;
  • FIG. 8 shows a schematic block diagram of a computing power board according to an embodiment of the present disclosure;
  • FIG. 9 shows a schematic block diagram of a digital currency mining machine according to an embodiment of the present disclosure.
  • When performing computationally intensive data processing tasks such as virtual digital currency calculations and artificial intelligence (AI) calculations, computing hardware often needs to run for a long time. For example, in order to obtain digital currency efficiently, a data processing device such as a digital currency mining machine needs to perform a large number of hash operations without interruption. Such computing hardware consumes significant power and incurs corresponding costs, such as electricity costs.
  • The power consumption ratio is defined as the power consumed per unit of computing power of the computing hardware, and it is one of the important performance indicators for evaluating computing hardware.
  • Where the computing hardware includes or is implemented as a computing chip, the power consumption ratio can be reduced by reducing the number of components used by the computing chip.
  • In this way, the chip area of the computing chip can also be reduced.
  • FIG. 1 shows a block diagram of a system 100 according to an embodiment of the present disclosure.
  • the system 100 may include a clock circuit system 1000, a clock source 2000, and a pipeline structure 3000.
  • the clock circuit system 1000 may be coupled with the clock source 2000 and the pipeline structure 3000.
  • The clock source 2000 does not provide a separate clock signal to each stage of pipeline circuit of the pipeline structure 3000. Instead, it provides the initial clock signal to the clock circuit system 1000, and the clock circuit system 1000 provides a corresponding clock signal to each stage of pipeline circuit of the pipeline structure 3000.
  • the clock circuit system 1000 may be designed to include multiple stages of clock driving circuits, and each stage of the clock driving circuit may provide a clock signal for an associated one-stage pipeline circuit.
  • Such a multi-level clock driving circuit of the clock circuit system 1000 may be called a master clock circuit, or also called a "master clock tree".
  • the main clock circuit can be extended with the extension of the pipeline circuits at all levels of the pipeline structure.
  • the main clock circuit of the clock circuit system 1000 may include a plurality of clock driving circuits 1100, 1200, and 1300 connected in series.
  • the initial clock signal provided by the clock source 2000 may be provided to the first-stage clock driving circuit 1100.
  • the output clock signal of the first-stage clock driving circuit 1100 may be provided to the second-stage clock driving circuit 1200.
  • the output clock signal of the second stage clock driving circuit 1200 may be provided to the third stage clock driving circuit 1300.
  • the clock drive circuit of the subsequent stage generates a new clock signal in response to receiving the clock signal of the clock drive circuit of the previous stage.
  • the clock signal generated by each stage of the clock driving circuit can be provided to the associated stage of pipeline circuit.
  • The master clock circuit enables the clock signal derived from the same initial clock signal to propagate through tens or hundreds of stages of clock driving circuits, thus providing a corresponding clock signal to each stage of pipeline circuit in a pipeline structure containing tens or even hundreds of stages of pipeline circuits.
  • the clock driving circuits 1100, 1200, and 1300 can respectively provide corresponding clock signals for the pipeline circuits 3100, 3200, and 3300 in the pipeline structure 3000.
  • each stage of the clock driving circuit of the main clock circuit may be configured to include one or more circuit elements connected in series.
  • the clock signal can propagate through these circuit elements in turn.
  • the clock driving circuit 1100 may include circuit elements 1110, 1120, and 1130 connected in series in sequence
  • the clock driving circuit 1200 may include circuit elements 1210, 1220, and 1230 connected in series in sequence
  • the clock driving circuit 1300 may include circuit elements 1310, 1320, and 1330 connected in series in sequence.
  • These circuit elements may be active elements. Active elements can compensate for the power of the input signal, so that the amplitude of the clock signal propagating through the clock driving circuit can be maintained.
  • Typical active components used in clock drive circuits may include inverters and buffers.
  • the output signal of the inverter has an opposite level and phase relative to the input signal of the inverter. That is, in response to an input signal at a high level, the output signal of the inverter will be at a low level; and in response to an input signal at a low level, the output signal of the inverter will be at a high level.
  • the output signal of the buffer has the same level and phase relative to the input signal of the buffer.
  • the circuit elements 1110, 1120, and 1130 in the clock driving circuit 1100 may all be inverters, all buffers, or any combination of inverters and buffers.
  • The clock driving circuit 1200 and the clock driving circuit 1300 should have the same configuration as the clock driving circuit 1100. This keeps the clock driving circuits of all stages consistent, thereby helping to ensure precise timing of the clock signals provided by the clock driving circuits of all stages.
  • each circuit element will not be ideal.
  • the response (output signal) of each circuit element will have a certain delay relative to the stimulus (input signal). Therefore, each of the circuit elements 1110, 1120, 1130, 1210, 1220, 1230, 1310, 1320, and 1330 in the clock driving circuit delays the clock signal passing through the circuit element.
  • These circuit elements can also be referred to as delay elements. The delay characteristics of circuit elements can be used to generate specific signals, which will be described further below.
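  • The way delay and polarity accumulate along the cascaded clock driving circuits can be sketched as follows. This is a minimal illustration only: the `Element` class, the `tap_table` helper, the choice of inverters for every element, and the 30 ps per-element delay are all assumptions made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str        # reference numeral from FIG. 1, e.g. "1110"
    inverting: bool  # True models an inverter, False models a buffer
    delay_ps: int    # assumed propagation delay, illustrative value only


def tap_table(elements):
    """Return (name, arrival_ps, inverted) at the output of each element."""
    taps = []
    arrival_ps = 0
    inverted = False
    for element in elements:
        arrival_ps += element.delay_ps   # delays accumulate along the chain
        inverted ^= element.inverting    # each inverter flips the polarity
        taps.append((element.name, arrival_ps, inverted))
    return taps


# Three stages (1100, 1200, 1300) with three elements each.
# All elements are modeled as inverters with a 30 ps delay (assumed).
NAMES = ("1110", "1120", "1130", "1210", "1220", "1230", "1310", "1320", "1330")
chain = [Element(name, inverting=True, delay_ps=30) for name in NAMES]

for name, arrival_ps, inverted in tap_table(chain):
    polarity = "inverted" if inverted else "in phase"
    print(f"output of {name}: +{arrival_ps} ps, {polarity} relative to the source")
```

  • Any two outputs in this table differ by the summed delay of the elements between them, which is the property the local clock circuits described below rely on.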
  • the clock signal directly output by the clock driving circuit of the master clock circuit cannot be directly used in the pipeline circuit.
  • the clock signal propagating along the main clock circuit is usually a square wave signal (for example, a square wave with a 50% duty cycle), and the pipeline circuit may use a latch type (Latch) register.
  • the latch type register needs to be triggered by a pulse signal.
  • the pulse signal is a signal that only has a short-term high-level state (or a short-term low-level state) in each clock cycle.
  • the square wave signal output by the main clock circuit is not suitable for being directly used to trigger the latch-type register in the pipeline circuit.
  • the clock signal output by the clock driving circuit of each stage of the main clock circuit needs to be preprocessed before being provided to the pipeline circuit.
  • the clock circuit system 1000 may further include local clock circuits 4100, 4200, and 4300.
  • Each local clock circuit can be associated with a corresponding clock driving circuit, and with a corresponding pipeline circuit. Unlike the clock driving circuits in the master clock circuit that are connected in series, each local clock circuit can be coupled between the corresponding clock driving circuit and the corresponding pipeline circuit.
  • the local clock circuit 4100 can be coupled between the clock driving circuit 1100 and the pipeline circuit 3100
  • the local clock circuit 4200 can be coupled between the clock driving circuit 1200 and the pipeline circuit 3200
  • the local clock circuit 4300 can be coupled between the clock driving circuit 1300 and the pipeline circuit 3300.
  • the local clock circuit may draw the clock signal from the corresponding clock driving circuit, preprocess the clock signal to generate an appropriate signal (for example, a pulse signal), and provide the generated appropriate signal to the corresponding pipeline circuit.
  • A specific example of the configuration of the local clock circuit is described in detail below in conjunction with FIG. 2, and improved embodiments of the configuration of the local clock circuit are further described in conjunction with FIGS. 4A-4D and FIG. 5.
  • the structure of the system 100 shown in FIG. 1 is only exemplary.
  • Although the pipeline structure 3000 in FIG. 1 includes three stages of pipeline circuits, the pipeline structure according to an embodiment of the present disclosure may include more or fewer stages of pipeline circuits, such as 2 stages, 10 stages, 50 stages, or more than 100 stages.
  • Accordingly, the master clock circuit according to the embodiment of the present disclosure is not limited to including 3 clock driving circuits, but may include more or fewer clock driving circuits, such as 2, 10, 50, or more than 100 clock driving circuits.
  • a box with an ellipsis indicates an additional module 1400 that receives the output clock signal of the clock driving circuit 1300.
  • The additional module 1400 may represent a plurality of clock driving circuits not specifically shown, or may represent a tail load element. If the additional module 1400 represents multiple clock driving circuits not specifically shown, each of them will also include multiple circuit elements connected in series, and may also have an associated local clock circuit.
  • the border of the clock circuit system 1000 in FIG. 1 is shown with a dashed line, which means that the border shown in FIG. 1 is only exemplary.
  • the clock source 2000 may be part of the clock circuit system 1000.
  • the local clock circuits 4100, 4200, 4300 may be located inside the corresponding one-stage pipeline circuit. However, from a functional point of view, such a local clock circuit can still be regarded as a part of the clock circuit system 1000.
  • FIG. 2 shows an exemplary configuration of the clock circuit system 1000. Compared with FIG. 1, FIG. 2 enlarges the size of the local clock circuit 4200 to specifically show the configuration of the local clock circuit 4200.
  • the local clock circuit 4200 may include one or more delay elements 4221, 4222, 4223 and a logic gate element 4230. Each of the delay elements 4221, 4222, 4223 may be an inverter or a buffer.
  • the input 4211 of the local clock circuit 4200 may be coupled to the clock driving circuit 1200 associated with the local clock circuit 4200 in the main clock circuit, so as to receive the clock signal output by the clock driving circuit 1200.
  • the logic gate element 4230 may be a logic gate element having two input terminals.
  • a first signal path and a second signal path may exist between the input terminal 4211 and the two input terminals of the logic gate element 4230.
  • the delay elements 4221, 4222, 4223 may be provided on the second signal path.
  • The clock signal received by the input terminal 4211 can be input directly to one input terminal of the logic gate element 4230 via the first signal path as the first clock signal, and can pass through the delay elements 4221, 4222, 4223 on the second signal path to be input to the other input terminal of the logic gate element 4230 as the second clock signal. Due to the existence of the delay elements 4221, 4222, 4223, the second clock signal will have a certain delay relative to the first clock signal. The amount of this delay is associated with the delay elements 4221, 4222, 4223.
  • the logic gate element 4230 can perform logic operations on the first clock signal and the second clock signal that are delayed between each other, thereby generating a pulse signal.
  • the generated pulse signal may be provided to a corresponding pipeline circuit (for example, the pipeline circuit 3200 of FIG. 1).
  • FIG. 3 shows an example of generating the pulse signal PLS based on the first clock signal CLK1 and the second clock signal CLK2 that are delayed from each other.
  • both the first clock signal CLK1 and the second clock signal CLK2 may be square wave signals.
  • Although both the first clock signal CLK1 and the second clock signal CLK2 come from the input terminal 4211, due to the existence of the delay elements 4221, 4222, 4223, the second clock signal CLK2 can be inverted relative to the first clock signal CLK1, and the delay of the second clock signal CLK2 relative to the first clock signal CLK1 is d.
  • Two signals, CLK1 and CLK2 can be input to the logic gate element 4230.
  • the logic gate element 4230 may be an AND gate (AND2).
  • the obtained PLS is a high-level pulse signal with a pulse width of d.
  • a high-level pulse signal refers to a signal with a short-term high-level state.
  • the phase difference and delay between the first clock signal CLK1 and the second clock signal CLK2 are related to the type and number of delay elements on the second signal path. If an odd number of inverters are provided on the second signal path, the obtained second clock signal CLK2 will have the opposite phase to the first clock signal CLK1. In addition, the delay of the second clock signal CLK2 relative to the first clock signal CLK1 depends on the sum of the delays of all delay elements provided on the second signal path. It should be understood that although FIG. 2 shows that the local clock circuit 4200 includes three delay elements, more or fewer delay elements may be used.
  • the pulse width of the obtained pulse signal is associated with the delay between the two input signals, and therefore is also associated with the sum of the delays of all the delay elements provided on the second signal path.
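  • The relationship between the delay d and the pulse width can be checked with a small discrete-time sketch of the FIG. 3 situation. It assumes ideal 50% duty-cycle square waves sampled in 1 ps steps, a 2000 ps period, and d = 90 ps; the function names and all numeric values are illustrative assumptions rather than figures from the disclosure.

```python
def square_wave(t_ps, period_ps, delay_ps=0):
    """Ideal 50% duty-cycle square wave, high in the first half of each period."""
    return ((t_ps - delay_ps) % period_ps) < period_ps // 2


def pulse_and(t_ps, period_ps, d_ps):
    """Output of an AND gate fed with CLK1 and an inverted, delayed CLK2."""
    clk1 = square_wave(t_ps, period_ps)
    clk2 = not square_wave(t_ps, period_ps, delay_ps=d_ps)  # inverted, delayed by d
    return clk1 and clk2


def high_time_per_period(period_ps, d_ps):
    """Number of 1 ps samples per period during which the gate output is high."""
    return sum(pulse_and(t, period_ps, d_ps) for t in range(period_ps))


PERIOD_PS = 2000  # assumed clock period
D_PS = 90         # assumed total delay on the second signal path

print("pulse width in ps:", high_time_per_period(PERIOD_PS, D_PS))
```

  • In this idealized model the output is high only in the window right after the rising edge of CLK1, before the inverted copy has caught up, so the high time per period equals d.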
  • a high-level pulse signal may be provided to a latch-type register that can be triggered by a high-level pulse
  • a low-level pulse signal may be provided to a latch-type register that can be triggered by a low-level pulse.
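  • How the gate type relates to the pulse polarity can be explored with the same kind of idealized model. The sketch below covers only the inverted-input case of FIG. 3 (CLK2 is an inverted copy of CLK1 delayed by d) and classifies the output of each two-input gate; the `GATES` table, the `pulse_kind` helper, and the numeric values are assumptions made for illustration.

```python
GATES = {
    "AND2": lambda a, b: a and b,
    "NAND2": lambda a, b: not (a and b),
    "OR2": lambda a, b: a or b,
    "NOR2": lambda a, b: not (a or b),
}


def waveform(gate, period_ps, d_ps):
    """Sample one period of the gate output in 1 ps steps (inverted-input case)."""
    samples = []
    for t in range(period_ps):
        clk1 = (t % period_ps) < period_ps // 2
        clk2 = not (((t - d_ps) % period_ps) < period_ps // 2)  # inverted, delayed
        samples.append(bool(GATES[gate](clk1, clk2)))
    return samples


def pulse_kind(gate, period_ps=2000, d_ps=90):
    """Classify the output as a high-level or low-level pulse of width d."""
    high_time = sum(waveform(gate, period_ps, d_ps))
    if high_time == d_ps:
        return "high-level pulse"
    if high_time == period_ps - d_ps:
        return "low-level pulse"
    return "not a pulse of width d"


for gate in GATES:
    print(gate, "->", pulse_kind(gate))
```

  • Under these assumptions, AND and NOR yield a short high-level pulse (at the rising and falling edge of CLK1 respectively), while NAND and OR yield a short low-level pulse, so the gate can be matched to the trigger polarity of the latch-type register.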
  • Although FIG. 3 shows a situation where the first clock signal CLK1 and the second clock signal CLK2 are inverted, in other embodiments the first clock signal CLK1 and the second clock signal CLK2 may also be in phase.
  • The type of logic gate element can be selected accordingly, such as an OR gate (OR2) or a NOR gate (NOR2).
  • Although the delay and pulse width shown in FIG. 3 are significant relative to the period of the clock signal, this is only for the purpose of clarity. The delay caused by the delay elements and the pulse width of the generated pulse signal may be smaller relative to the period of the clock signal.
  • the delay caused by each delay element may be on the order of tens of picoseconds, and one clock cycle of the clock signal may be on the order of several nanoseconds.
  • Although the local clock circuit 4200 shown in FIG. 2 can generate the pulse signal required by the pipeline circuit, such a local clock circuit still has room for improvement.
  • the local clock circuit 4200 requires necessary delay elements to be provided in the local clock circuit itself. These delay elements will occupy the chip area and increase the power consumption of the chip. In the case of pipeline circuits containing tens or hundreds of stages (correspondingly including tens or hundreds of local clock circuits), the number of delay elements used cannot be ignored. Moreover, when the available chip area is limited or the total power is limited, it may not be possible to provide enough delay elements in the local clock circuit.
  • In that case, the delay d of the second clock signal CLK2 relative to the first clock signal CLK1 may be too small, which may cause the pulse width of the generated pulse signal to be too narrow.
  • Triggering a register requires a minimum pulse width, and a wider pulse helps to trigger the register reliably. If the pulse width of the pulse signal generated by the local clock circuit is too narrow, it may not be able to effectively trigger the register in the pipeline circuit, which may cause the pipeline circuit to fail to perform data processing tasks correctly.
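  • The trade-off can be made concrete with a short calculation. The sketch below approximates the pulse width as the summed delay on the second signal path and compares it with a minimum width; the 30 ps element delay, the 80 ps minimum width, and the 2000 ps period are hypothetical values chosen only to be consistent with the orders of magnitude mentioned above.

```python
def pulse_width_ps(path_delays_ps):
    """Pulse width approximated as the summed delay on the second signal path."""
    return sum(path_delays_ps)


def triggers_reliably(path_delays_ps, min_width_ps):
    """True if the approximated pulse width meets the register's minimum width."""
    return pulse_width_ps(path_delays_ps) >= min_width_ps


MIN_WIDTH_PS = 80  # hypothetical minimum pulse width of the register
PERIOD_PS = 2000   # hypothetical clock period

for delays in ([30, 30], [30, 30, 30]):
    width = pulse_width_ps(delays)
    share = 100 * width / PERIOD_PS
    ok = triggers_reliably(delays, MIN_WIDTH_PS)
    print(f"{len(delays)} delay elements: {width} ps ({share:.1f}% of the period), reliable: {ok}")
```

  • With these assumed numbers, two local delay elements fall short of the minimum width while three meet it, which illustrates why removing local delay elements without another source of delay can be a problem.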
  • FIG. 4A shows an exemplary configuration of an improved clock circuit system 1000A according to an embodiment of the present disclosure.
  • the clock circuit system 1000A may include a main clock circuit and one or more local clock circuits 4100, 4200, 4300.
  • the master clock circuit may include multiple clock driving circuits 1100, 1200, and 1300 cascaded.
  • the main clock circuit is configured to drive a clock signal to propagate along the plurality of clock drive circuits 1100, 1200, and 1300.
  • Each clock driving circuit may include one or more circuit elements 1110, 1120, 1130, 1210, 1220, 1230, 1310, 1320, 1330. On the one hand, these circuit elements drive the propagation of the clock signal; on the other hand, they also delay the clock signal.
  • Each of the local clock circuits 4100, 4200, and 4300 is respectively associated with a corresponding clock driving circuit in the main clock circuit.
  • the local clock circuits 4100, 4200, 4300 of the clock circuit system 1000A may have a different configuration from the example of FIG. 2. The following description takes the local clock circuit 4200 as an example.
  • the local clock circuit 4200 may have two different input terminals 4212 and 4213 that draw clock signals from the main clock circuit.
  • the input terminal 4212 and the input terminal 4213 may be respectively coupled to the first port and the second port in the main clock circuit.
  • The second port may be located downstream of the first port in the main clock circuit, and at least one circuit element of the clock driving circuit of the main clock circuit that can delay the clock signal exists between the first port and the second port.
  • the second clock signal drawn by the input 4213 of the local clock circuit 4200 from the second port will have a delay relative to the first clock signal drawn by the input 4212 from the first port. This delay is caused by one or more circuit elements in the clock driving circuit in the master clock circuit, and does not depend on the delay element in the local clock circuit 4200.
  • The input terminal 4212 of the local clock circuit 4200 may be coupled to the output terminal (first port) of the second circuit element 1220 in the clock driving circuit 1200 to draw the first clock signal, and the input terminal 4213 of the local clock circuit 4200 may be coupled to the output terminal (second port) of the third circuit element 1230 in the clock driving circuit 1200 to draw the second clock signal.
  • Thus, the output clock signal of the circuit element 1220 (i.e., the first clock signal drawn by the input terminal 4212) is delayed by the circuit element 1230 before being drawn by the input terminal 4213 as the second clock signal.
  • the local clock circuit 4200 may have a logic gate element 4230.
  • the logic gate element 4230 can perform logic operations on each input signal.
  • the first clock signal and the second clock signal drawn by the input terminals 4212 and 4213 may be provided to the logic gate element 4230.
  • An input terminal of the logic gate element 4230 may be connected to the input terminal 4212 through a first signal path, thereby receiving the first clock signal.
  • the other input terminal of the logic gate element 4230 may be connected to the input terminal 4213 through a second signal path, thereby receiving the second clock signal.
  • the logic gate element 4230 may be configured to perform logic operations on the two input clock signals, thereby generating pulse signals, as discussed in relation to FIG. 3.
  • One or more delay elements 4221, 4222, 4223 may be provided on the second signal path, so as to further delay the second clock signal on the second signal path.
  • The delay between the two clock signals received by the two input terminals of the logic gate element 4230 includes not only the delay caused by the delay elements 4221, 4222, 4223 in the local clock circuit 4200, but also the delay caused by the circuit element 1230 in the clock driving circuit 1200.
  • This increases the delay between the two clock signals received by the two input ends of the logic gate element 4230 without adding a delay element. Accordingly, the pulse width of the pulse signal generated by the logic gate element 4230 is increased, so that a better pulse signal can be provided to the pipeline circuit.
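  • The added delay can be tallied in a few lines. The sketch below sums the delay of the main clock circuit elements between the two ports and the delay of the elements on the local second signal path; the uniform 30 ps element delay and the `total_delay_ps` helper are assumptions for illustration only.

```python
def total_delay_ps(main_between_ports_ps, local_path_ps):
    """Delay between the two gate inputs: main-circuit part plus local part."""
    return sum(main_between_ports_ps) + sum(local_path_ps)


DELAY_PS = 30  # assumed delay of every element, illustrative only

local_path = [DELAY_PS] * 3                      # elements 4221, 4222, 4223
fig2 = total_delay_ps([], local_path)            # single input, no main-circuit delay
fig4a = total_delay_ps([DELAY_PS], local_path)   # element 1230 lies between the ports

print("FIG. 2 style delay:", fig2, "ps")
print("FIG. 4A style delay:", fig4a, "ps")
```

  • With equal element delays, tapping the main clock circuit on either side of element 1230 widens the pulse by one element delay without adding any component to the local clock circuit.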
  • The two input terminals 4212 and 4213 of the local clock circuit 4200 can be connected to any other first port and second port on the clock driving circuits of the master clock circuit, as long as at least one circuit element of the clock driving circuits that can delay the clock signal exists between the first port and the second port.
  • the first port and the second port can have various configurations.
  • the first port and the second port respectively coupled to the two input terminals 4212 and 4213 of the local clock circuit 4200 may be located in the same level of clock driving circuit of the main clock circuit.
  • Figure 4A shows an embodiment of this exemplary configuration.
  • the input terminal 4212 of the local clock circuit 4200 may not be connected to the output terminal of the circuit element 1220, but to the output terminal of the circuit element 1210.
  • the delay between the two clock signals received by the two input terminals of the logic gate element 4230 will further include the delay caused by the circuit element 1220, thereby further increasing the pulse width of the generated pulse signal.
  • FIG. 4B shows an exemplary configuration of an improved clock circuit system 1000B.
  • The input terminal 4212 of the local clock circuit 4200 may be connected to the output terminal (first port) of the circuit element 1220 of the clock driving circuit 1200, and the input terminal 4213 of the local clock circuit 4200 may be connected to the output terminal (second port) of the circuit element 1310 of the adjacent clock driving circuit 1300.
  • FIG. 4C shows an exemplary configuration of an improved clock circuit system 1000C.
  • The input terminal 4212 of the local clock circuit 4200 may be connected to the output terminal (first port) of the clock driving circuit 1100, and the input terminal 4213 of the local clock circuit 4200 may be connected to the output terminal (second port) of the circuit element 1310 of the clock driving circuit 1300.
  • In this case, the entire one-stage clock driving circuit 1200 exists between the first port and the second port. This can be advantageous because it can use the existing output port of a clock driving circuit without drawing the clock signal from inside the clock driving circuit, and so does not affect the load inside the clock driving circuit of each stage.
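  • The three port arrangements of FIGS. 4A-4C can be told apart mechanically. In the sketch below a port is written as a hypothetical `(stage_index, position)` pair, where stage 0, 1, 2 stand for the clock driving circuits 1100, 1200, 1300 and position k means the output of the k-th element in that stage. This encoding, the helper names, and the fixed three elements per stage are assumptions of the example.

```python
ELEMENTS_PER_STAGE = 3  # as drawn in FIG. 1; an assumption of this sketch


def elements_between(first, second):
    """Count main-circuit elements between two ports given as (stage, position)."""
    (stage1, pos1), (stage2, pos2) = first, second
    return (stage2 - stage1) * ELEMENTS_PER_STAGE + (pos2 - pos1)


def classify(first, second):
    """Name the configuration formed by a first port and a downstream second port."""
    if elements_between(first, second) < 1:
        raise ValueError("second port must be downstream of the first port")
    gap = second[0] - first[0]
    if gap == 0:
        return "first configuration (same stage)"
    if gap == 1:
        return "second configuration (adjacent stages)"
    return "third configuration (at least one whole stage in between)"


examples = {
    "FIG. 4A": ((1, 2), (1, 3)),  # outputs of 1220 and 1230
    "FIG. 4B": ((1, 2), (2, 1)),  # outputs of 1220 and 1310
    "FIG. 4C": ((0, 3), (2, 1)),  # output of stage 1100 and output of 1310
}
for figure, (first, second) in examples.items():
    count = elements_between(first, second)
    print(f"{figure}: {classify(first, second)}, {count} element(s) between the ports")
```

  • The element count returned by `elements_between` is what sets the delay contributed by the main clock circuit, so a configuration with ports spaced farther apart yields a wider pulse for the same local circuit.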
  • the positions of the first port and the second port can be determined based on the properties of the required pulse signal.
  • the properties of pulse signals can include pulse width and signal type, and so on.
  • the number of circuit elements of the clock driving circuit that should exist between the first port and the second port can be determined based on the required pulse width of the pulse signal.
  • If the required pulse width is relatively wide, the first port and the second port can be spaced far apart, so that there are more circuit elements that can cause delay between the two ports.
  • If the type of pulse signal required is a high-level pulse signal, the positions of the first port and the second port can be selected so that the sum of the number of inverters between the first port and the second port and the number of inverters (if any) on the signal path of the local clock circuit is an odd number, so that the two clock signals input to the logic gate element are inverted (for example, the situation shown in FIG. 3).
  • The delay elements 4221, 4222, 4223 are drawn with dashed boxes, which means that one or more of them may not be necessary. Since the delay of one or more circuit elements in the master clock circuit has been introduced, one or more of the delay elements 4221, 4222, 4223 inside the local clock circuit 4200 can be removed. For example, in FIG. 4A, the elements 1230, 4221, 4222 can provide the delay originally provided by the elements 4221, 4222, 4223, so that the element 4223 can be removed while still satisfying the delay requirement between the two clock signals input to the logic gate element 4230.
  • As another example, the elements 1220, 1230, and 4221 may provide the delay originally provided by the elements 4221, 4222, 4223, so that the elements 4222 and 4223 can be removed.
  • the delay elements inside the local clock circuit 4200 may be completely removed, as discussed below with respect to FIG. 4D.
  • FIG. 4D shows an exemplary configuration of an improved clock circuit system 1000D according to an embodiment of the present disclosure.
  • The input terminal 4212 of the local clock circuit 4200 may be coupled to the input terminal (first port) of the clock driving circuit 1200 to draw the first clock signal, and the input terminal 4213 of the local clock circuit 4200 may be coupled to the output terminal (second port) of the third circuit element 1230 in the clock driving circuit 1200 to draw the second clock signal.
  • The delay between the two clock signals received at the two input terminals of the logic gate element 4230 can be provided entirely by the elements 1210, 1220, and 1230 in the clock driving circuit 1200 of the master clock circuit, without having to provide a delay element (for example, 4221, 4222, 4223) in the local clock circuit 4200. If the total delay of the circuit elements between the first port and the second port can provide a pulse signal of sufficient pulse width, the configuration shown in FIG. 4D can preferably be adopted, which eliminates the delay elements in the local clock circuit and thereby minimizes the area and power of the local clock circuit.
  • The advantages of the clock circuit systems 1000A-1000D lie in at least two aspects.
  • First, a larger delay can be provided, thereby obtaining a pulse signal with a wider pulse width.
  • Second, if the required delay remains the same, the number of delay elements in the local clock circuit can be reduced, or the delay elements can be removed completely, which significantly reduces power consumption, component cost, and chip area.
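The trade-off between delay borrowed from the master clock circuit and delay elements kept in the local clock circuit can be sketched as simple arithmetic. The following is an illustrative sketch only: it assumes every element has the same unit delay, ignores the inverter-parity constraint, and uses a hypothetical helper name `local_delay_elements_needed`.

```python
def local_delay_elements_needed(required_delay, master_delays, local_element_delay):
    """Return how many local delay elements are still needed.

    required_delay      -- total delay wanted between the two gate inputs
    master_delays       -- delays of the master-clock elements that lie
                           between the first port and the second port
    local_element_delay -- delay of one local delay element (assumed uniform)
    """
    remaining = required_delay - sum(master_delays)
    if remaining <= 0:
        return 0  # the master clock circuit alone provides enough delay
    # Ceiling division without floating point.
    return -(-remaining // local_element_delay)


if __name__ == "__main__":
    # Target: the delay of three elements (originally 4221, 4222, 4223).
    print(local_delay_elements_needed(3, [], 1))         # no borrowed delay
    print(local_delay_elements_needed(3, [1], 1))        # like FIG. 4A: 1230 borrowed
    print(local_delay_elements_needed(3, [1, 1], 1))     # 1220 and 1230 borrowed
    print(local_delay_elements_needed(3, [1, 1, 1], 1))  # like FIG. 4D: none left
```

Under these assumptions, each master-clock element placed between the two ports removes one local delay element, down to zero in the FIG. 4D configuration.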
  • FIG. 5 shows a schematic diagram of an improved clock circuit system 1000E according to an embodiment of the present disclosure.
  • The configuration of FIG. 5 is similar to that of FIG. 4D. The difference is that the output of the logic gate element 4230 is not provided directly to the pipeline circuit, but can first be provided to the additional circuit elements 4241 and 4242.
  • The circuit elements 4241 and 4242 may be inverters or buffers. As described above, the circuit elements 4241 and 4242, being active elements, can drive the signal and thereby maintain the amplitude of the output signal.
  • The circuit elements 4241 and 4242 may each provide an output signal to a corresponding set of elements (for example, registers). The configuration of FIG. 5 is advantageous when the pipeline circuit contains a large number of registers.
  • Although FIG. 5 shows two circuit elements 4241 and 4242 of the local clock circuit 4200, the local clock circuit 4200 may include more such circuit elements without limitation.
  • The local clock circuits 4100 and 4300 may each have similar circuit elements (not shown).
  • One or more local clock circuits of each clock circuit system in FIGS. 4A-4D may also have similar circuit elements.
  • The local clock circuit 4200 is used as an example for discussion; the local clock circuits 4100 and 4300 adopt the same configuration as the local clock circuit 4200, and their detailed description is omitted.
  • Each of the local clock circuits 4100, 4200, 4300 may adopt any of the various exemplary configurations described above without limitation.
  • For example, the local clock circuit 4100 may adopt the configuration described with respect to FIG. 4A, the local clock circuit 4200 may adopt the configuration described with respect to FIG. 4B, and the local clock circuit 4300 may adopt the configuration described with respect to FIG. 4C.
  • Other hybrid configurations are also possible.
  • The logic gate element used by each local clock circuit may be one selected from an AND gate, a NAND gate, an OR gate, and a NOR gate.
  • The type of the selected logic gate element can be determined based on multiple factors, including but not limited to: the type (inverter or buffer), number, and delay of the circuit elements between the first port and the second port to which the two input terminals of the local clock circuit are connected; the type (inverter or buffer), number, and delay of the delay elements on the second signal path of the logic gate element; and the type of pulse signal required (high-level pulse trigger or low-level pulse trigger). For example, if the sum of the number of inverters between the first port and the second port and the number of inverters on the second signal path is odd, an AND gate or a NAND gate can be selected.
  • If the sum is even, an OR gate or a NOR gate can be selected.
  • A logic gate element can be realized by a combination of several logic gate elements. For example, an AND gate and a NAND gate differ by one inverter, and an OR gate and a NOR gate likewise differ by one inverter.
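The selection rule above can be written down as a small lookup. This sketch simply encodes the parity rule as stated in the text (odd total: AND or NAND; even total: OR or NOR) and checks the one-inverter relationship between the gate pairs with truth tables. The names `candidate_gates`, `GATES`, and `differs_by_inverter` are hypothetical, and the other selection factors listed above (delays, required pulse type) are deliberately left out.

```python
def candidate_gates(inverters_between_ports, inverters_on_second_path):
    """Return the gate family suggested by the inverter-count parity rule."""
    total = inverters_between_ports + inverters_on_second_path
    return ("AND", "NAND") if total % 2 == 1 else ("OR", "NOR")


# Two-input truth functions on 0/1 values.
GATES = {
    "AND": lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: 1 - (a | b),
}


def differs_by_inverter(first, second):
    """True if `second` equals `first` followed by one inverter."""
    return all(GATES[second](a, b) == 1 - GATES[first](a, b)
               for a in (0, 1) for b in (0, 1))


if __name__ == "__main__":
    print(candidate_gates(3, 0))   # odd total
    print(candidate_gates(2, 2))   # even total
    print(differs_by_inverter("AND", "NAND"), differs_by_inverter("OR", "NOR"))
```

Within the chosen family, the gate with or without the trailing inverter would then be picked according to whether a high-level or low-level pulse is required.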
  • Various clock circuit systems 1000 may be used in combination with the pipeline structure 3000. Driven by various clock signals provided by the clock circuit system 1000, the pipeline circuits at various levels of the pipeline structure 3000 can perform various data processing tasks.
  • The data processing tasks here include, but are not limited to, data storage, data operations, and so on.
  • The data processing tasks performed by the pipeline structure 3000 may include various computationally intensive tasks.
  • Computationally intensive tasks require computing hardware to run for a long time, and a large number of pipeline circuits need to be implemented on computing chips to perform parallel computing, so such tasks are sensitive to clock signal performance, power consumption, and chip area.
  • Data processing tasks that can benefit from the present disclosure include, but are not limited to, performing hash algorithm calculations or performing artificial intelligence (AI) calculations.
  • A hash algorithm is an algorithm that takes variable-length data as input and produces a fixed-length hash value as output.
  • Input data of any length is padded so that the length of the padded data is an integer multiple of a certain fixed length (for example, 512 bits); that is, the padded data can be divided into a plurality of fixed-length data blocks.
  • The content of the padding bits includes the bit length information of the original data.
  • The hash algorithm performs calculations on each fixed-length data block, for example, multiple rounds of calculations including data expansion and/or compression. When all data blocks have been processed, the final fixed-length hash value is obtained.
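The padding step can be made concrete by taking SHA-256 as the example hash algorithm. The sketch below follows the SHA-256 padding scheme (a single 1 bit, zero bits, then the original bit length as a 64-bit big-endian integer) and splits the result into 512-bit blocks. The function names `pad_message` and `split_blocks` are hypothetical; other hash algorithms may pad differently.

```python
BLOCK_BYTES = 64  # 512 bits


def pad_message(message: bytes) -> bytes:
    """Pad a message to a multiple of 512 bits using SHA-256 style padding."""
    bit_length = len(message) * 8
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Zero bytes until 8 bytes remain in the final block for the length field.
    padded += b"\x00" * ((56 - len(padded) % BLOCK_BYTES) % BLOCK_BYTES)
    # The padding ends with the original bit length (64-bit big-endian).
    padded += bit_length.to_bytes(8, "big")
    return padded


def split_blocks(padded: bytes) -> list:
    """Divide padded data into fixed-length 512-bit data blocks."""
    return [padded[i:i + BLOCK_BYTES] for i in range(0, len(padded), BLOCK_BYTES)]


if __name__ == "__main__":
    blocks = split_blocks(pad_message(b"abc"))
    print(len(blocks), len(blocks[0]))
```

A 3-byte message fits in one block, while a 56-byte message needs two, because the length field no longer fits after the mandatory 1 bit.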
  • The hash algorithm executed by the pipeline structure 3000 may be the SHA-256 algorithm. Since 1993, the National Institute of Standards and Technology (NIST) has designed and released multiple versions of the Secure Hash Algorithm (SHA). SHA-256 is one of the secure hash algorithms, with a hash length of 256 bits. The SHA-256 algorithm is one of the hash algorithms commonly used in calculations associated with virtual encrypted digital currencies (for example, Bitcoin). For example, Bitcoin uses proof of work (POW) based on the SHA-256 algorithm. The core of Bitcoin mining with a data processing device (such as a mining machine) is to obtain Bitcoin rewards according to the device's computing power in calculating SHA-256.
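As an illustration of SHA-256 based proof of work, the following sketch searches for a nonce whose double SHA-256 digest, read as a little-endian integer, falls below a target, which is the form of check used by Bitcoin. It is a toy model, not a miner: the 76-byte all-zero header prefix, the very easy target, and the names `double_sha256` and `mine` are assumptions made for the example.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as done for Bitcoin block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header_prefix: bytes, target: int, max_nonce: int = 1 << 20):
    """Try nonces in order; return (nonce, digest) for the first hit, else None."""
    for nonce in range(max_nonce):
        header = header_prefix + nonce.to_bytes(4, "little")
        digest = double_sha256(header)
        # The digest is compared to the target as a little-endian integer.
        if int.from_bytes(digest, "little") < target:
            return nonce, digest
    return None


if __name__ == "__main__":
    toy_prefix = bytes(76)   # placeholder for the first 76 header bytes
    toy_target = 1 << 248    # far easier than any real network target
    found = mine(toy_prefix, toy_target)
    if found is not None:
        nonce, digest = found
        print(nonce, digest[::-1].hex())
```

Every candidate nonce costs two SHA-256 evaluations, so the number of candidates a device can test per second is bounded by how many SHA-256 operations it can complete, which is what the pipelined hardware accelerates.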
  • In a data processing device such as a mining machine, a pipeline structure with multiple operation stages can be used to implement high-speed operations.
  • For example, a 64-stage pipeline structure can be used to operate on 64 groups of data in parallel.
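The way a multi-stage pipeline keeps several groups of data in flight can be sketched with a register-per-stage model in which every register updates on the same clock pulse. This is a generic illustration under assumed names (`run_pipeline`, `stage_fns`), using four trivial stages instead of 64 SHA-256 rounds.

```python
def run_pipeline(stage_fns, inputs):
    """Push inputs through a pipeline, one new input per clock pulse.

    Each stage owns one output register. On every pulse all registers
    update together from the values held before the pulse.
    Returns (results in order, number of pulses used).
    """
    n = len(stage_fns)
    regs = [None] * n                      # None marks an empty stage
    results = []
    pulses = 0
    stream = list(inputs) + [None] * n     # extra pulses drain the pipeline
    for item in stream:
        new_regs = [None] * n
        new_regs[0] = stage_fns[0](item) if item is not None else None
        for i in range(1, n):
            prev = regs[i - 1]             # value latched before this pulse
            new_regs[i] = stage_fns[i](prev) if prev is not None else None
        regs = new_regs
        pulses += 1
        if regs[-1] is not None:
            results.append(regs[-1])
    return results, pulses


if __name__ == "__main__":
    stages = [lambda x: x + 1] * 4
    print(run_pipeline(stages, [0, 10, 20]))
```

Once the pipeline is full, one result leaves per pulse even though each individual item needs as many pulses as there are stages, so an N-stage pipeline works on up to N groups of data at once.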
  • FIG. 6 shows a schematic diagram of a pipeline structure 6000 that can be used to implement the SHA-256 algorithm.
  • The pipeline structure 6000 may be a specific use case of the pipeline structure 3000 described above.
  • The pipeline structure 6000 can be a 32-stage, 64-stage, or 128-stage pipeline.
  • The t-th operation stage, the (t+1)-th operation stage, and the (t+2)-th operation stage in the pipeline structure 6000 are separated by dashed lines.
  • Each operation stage can be realized by a corresponding one-stage pipeline circuit.
  • Each operation stage can also include operation logic.
  • Each operation stage may also include a plurality of registers A to H for storing intermediate values and a plurality of registers R0 to R15 for storing extended data.
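What one operation stage computes can be sketched in software by treating registers A to H as the SHA-256 working variables and registers R0 to R15 as a sliding 16-word window of the message schedule. That mapping is an assumption made for illustration; the disclosure does not spell out the register contents. The sketch runs one 512-bit block through 64 stages one after another (a hardware pipeline would hold 64 different blocks at once), and the names `stage`, `compress`, `icbrt`, and `first_primes` are hypothetical. The constants are derived from the first 64 primes as the SHA-256 standard defines them.

```python
import hashlib
import math

MASK = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def icbrt(n):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


PRIMES = first_primes(64)
# Round constants: first 32 fractional bits of the cube roots of the primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
# Initial hash value: first 32 fractional bits of the square roots.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]


def stage(t, regs, window):
    """Stage t: update registers A..H (compression) and R0..R15 (expansion)."""
    a, b, c, d, e, f, g, h = regs
    s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    t1 = (h + s1 + ch + K[t] + window[0]) & MASK
    s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (s0 + maj) & MASK
    # Data expansion: derive the next schedule word and shift the window.
    sig0 = rotr(window[1], 7) ^ rotr(window[1], 18) ^ (window[1] >> 3)
    sig1 = rotr(window[14], 17) ^ rotr(window[14], 19) ^ (window[14] >> 10)
    new_word = (window[0] + sig0 + window[9] + sig1) & MASK
    new_regs = ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)
    return new_regs, window[1:] + [new_word]


def compress(block):
    """Run one 64-byte block through 64 stages, starting from H0."""
    window = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    regs = tuple(H0)
    for t in range(64):
        regs, window = stage(t, regs, window)
    return b"".join(((x + y) & MASK).to_bytes(4, "big") for x, y in zip(H0, regs))


if __name__ == "__main__":
    block = b"abc" + b"\x80" + bytes(52) + (24).to_bytes(8, "big")
    print(compress(block) == hashlib.sha256(b"abc").digest())
```

Each call to `stage` depends only on the register values handed over by the previous stage, which is the property that lets every stage be built as its own pipeline circuit with its own registers.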
  • One or more of these registers may be latch-type registers.
  • The latch-type registers in each pipeline circuit in the pipeline structure 6000 can be triggered based on the corresponding pulse signal provided by the clock circuit system described above, thereby updating the data stored in them.
  • The pulse signal provided by the clock circuit system may be a high-level pulse signal or a low-level pulse signal.
  • These registers can be divided into one or more groups, each of which can be triggered by the output signal of a corresponding one of the multiple circuit elements shown in FIG. 5 (i.e., circuit elements 4241 and 4242).
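The grouping of registers under separate driving elements can be sketched as a simple assignment. The round-robin policy, the function name `group_registers`, and the use of the reference numerals as dictionary keys are assumptions for illustration; the disclosure does not state how registers are distributed among the groups.

```python
def group_registers(registers, drivers):
    """Assign registers to driving elements in round-robin order."""
    groups = {driver: [] for driver in drivers}
    for index, register in enumerate(registers):
        groups[drivers[index % len(drivers)]].append(register)
    return groups


if __name__ == "__main__":
    # Registers A..H and R0..R15 of one operation stage.
    registers = list("ABCDEFGH") + [f"R{i}" for i in range(16)]
    for driver, members in group_registers(registers, ["4241", "4242"]).items():
        print(driver, len(members), members)
```

Splitting the 24 registers of a stage across two driving elements halves the load each element has to drive, which is the motivation given for the FIG. 5 configuration.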
  • The clock circuit system according to the embodiments of the present disclosure may be included in various devices, including but not limited to computing chips, computing power boards, data processing devices (such as digital currency mining machines), and the like. Because these devices adopt the clock circuit system according to the embodiments of the present disclosure, they can obtain multiple clock signals with stable duty ratios at low cost and with a simple circuit structure, thereby ensuring their performance when executing specific computing tasks.
  • FIG. 7 shows a schematic block diagram of a computing chip 7000 according to an embodiment of the present disclosure.
  • The computing chip 7000 may include a clock circuit system 7100, a clock source 7200, and a pipeline structure 7300.
  • The clock circuit system 7100 may be a specific embodiment of the clock circuit system described above (for example, any one of 1000, 1000A, 1000B, 1000C, 1000D, and 1000E).
  • The clock source 7200 may be a specific embodiment of the clock source 2000 described above.
  • The pipeline structure 7300 may be a specific embodiment of the pipeline structure 3000 or 6000 described above.
  • The clock circuit system 7100 can be coupled to the clock source 7200 and the pipeline structure 7300.
  • The clock circuit system 7100 can receive the initial clock signal from the clock source 7200 and generate multiple clock signals accordingly.
  • The multiple clock signals can be provided to the pipeline structure 7300 to perform specific computing tasks.
  • The specific computing task may be, for example, executing the SHA-256 algorithm.
  • The computing chip 7000 may be configured as a Bitcoin chip.
  • The clock source 7200 is shown as a dashed frame, indicating that the clock source 7200 may also be located outside the computing chip 7000.
  • FIG. 8 shows a schematic block diagram of a computing power board 8000 according to an embodiment of the present disclosure.
  • The computing power board 8000 may include one or more computing chips 8100.
  • The computing chip 8100 may be a specific embodiment of the computing chip 7000. Multiple computing chips 8100 can perform computing tasks in parallel.
  • FIG. 9 shows a schematic block diagram of a digital currency mining machine 9000 according to an embodiment of the present disclosure.
  • The digital currency mining machine 9000 is an example of a data processing device according to an embodiment of the present disclosure.
  • The digital currency mining machine 9000 can be configured to execute the SHA-256 algorithm to obtain proof of work (POW), and further to obtain digital currency based on the proof of work.
  • The digital currency can be Bitcoin.
  • The digital currency mining machine 9000 may include one or more computing power boards 9100.
  • The computing power board 9100 may be a specific embodiment of the computing power board 8000. Multiple computing power boards 9100 can perform computing tasks in parallel, for example, executing the SHA-256 algorithm.
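The containment hierarchy (mining machine, computing power boards, computing chips) and the idea of parallel work can be sketched with plain data classes. Everything beyond the hierarchy itself is an assumption for illustration: the class and function names are hypothetical, and splitting a 32-bit nonce range evenly across chips is just one conceivable way to divide work, not a scheme described in the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ComputingChip:
    nonce_range: Tuple[int, int] = (0, 0)  # half-open range [start, stop)


@dataclass
class ComputingPowerBoard:
    chips: List[ComputingChip] = field(default_factory=list)


@dataclass
class MiningMachine:
    boards: List[ComputingPowerBoard] = field(default_factory=list)

    def all_chips(self) -> List[ComputingChip]:
        return [chip for board in self.boards for chip in board.chips]


def split_range(start: int, stop: int, parts: int) -> List[Tuple[int, int]]:
    """Split [start, stop) into `parts` contiguous, near-equal slices."""
    size, extra = divmod(stop - start, parts)
    slices, lo = [], start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)
        slices.append((lo, hi))
        lo = hi
    return slices


def build_machine(board_count: int, chips_per_board: int) -> MiningMachine:
    return MiningMachine([
        ComputingPowerBoard([ComputingChip() for _ in range(chips_per_board)])
        for _ in range(board_count)
    ])


def assign_work(machine: MiningMachine, start: int = 0, stop: int = 1 << 32) -> None:
    """Give every chip its own slice of the nonce space."""
    chips = machine.all_chips()
    for chip, nonce_range in zip(chips, split_range(start, stop, len(chips))):
        chip.nonce_range = nonce_range


if __name__ == "__main__":
    machine = build_machine(board_count=3, chips_per_board=4)
    assign_work(machine)
    for chip in machine.all_chips():
        print(chip.nonce_range)
```

Because the slices do not overlap, the chips can search independently, which matches the statement that multiple chips and boards perform computing tasks in parallel.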
  • The word "exemplary" means "serving as an example, instance, or illustration" and not as a "model" to be accurately reproduced. Any implementation described exemplarily herein is not necessarily to be construed as preferred or advantageous over other implementations. Moreover, the present disclosure is not limited by any expressed or implied theory given in the above technical field, background art, summary of the invention, or specific embodiments.
  • The word "substantially" means to include any small changes caused by design or manufacturing defects, device or component tolerances, environmental influences, and/or other factors.
  • The word "substantially" also allows for differences from the perfect or ideal situation caused by parasitic effects, noise, and other practical considerations that may be present in an actual implementation.
  • The term "connection" means that one element/node/feature is electrically, mechanically, logically, or otherwise directly connected to (or directly communicates with) another element/node/feature.
  • The term "coupled" means that one element/node/feature can be directly or indirectly connected to another element/node/feature mechanically, electrically, logically, or in other ways that allow interaction, even if the two features may not be directly connected. In other words, "coupled" is intended to include both direct and indirect connection of elements or other features, including connection through one or more intermediate elements.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Communication Control (AREA)
  • Semiconductor Integrated Circuits (AREA)
  • Manipulation Of Pulses (AREA)
  • Pulse Circuits (AREA)

Abstract

A clock circuit system, a computing chip, a hash board, and a data processing device. The clock circuit system comprises a master clock circuit and a local clock circuit. The master clock circuit comprises multiple cascaded clock driving circuits, and each clock driving circuit comprises one or more delay elements that delay clock signals. The local clock circuit comprises a first input terminal coupled to a first port of the master clock circuit; a second input terminal coupled to a second port of the master clock circuit; and a logic gate element. At least one delay element of the master clock circuit is present between the first port and the second port. In the present invention, pulse signals with good performance can be generated using a simplified clock circuit system.
PCT/CN2021/083764 2020-06-22 2021-03-30 Clock circuit system, computing chip, hash board, and data processing device WO2021258801A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010572765.8A CN111562808A (zh) 2020-06-22 2020-06-22 时钟电路系统、计算芯片、算力板和数字货币挖矿机
CN202010572765.8 2020-06-22

Publications (1)

Publication Number Publication Date
WO2021258801A1 true WO2021258801A1 (fr) 2021-12-30

Family

ID=72072797

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083764 WO2021258801A1 (fr) 2020-06-22 2021-03-30 Clock circuit system, computing chip, hash board, and data processing device

Country Status (3)

Country Link
CN (1) CN111562808A (fr)
TW (1) TWI784457B (fr)
WO (1) WO2021258801A1 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111510137A (zh) * 2020-06-04 2020-08-07 深圳比特微电子科技有限公司 时钟电路、计算芯片、算力板和数字货币挖矿机
CN111562808A (zh) * 2020-06-22 2020-08-21 深圳比特微电子科技有限公司 时钟电路系统、计算芯片、算力板和数字货币挖矿机
CN114442996A (zh) 2020-10-30 2022-05-06 深圳比特微电子科技有限公司 计算芯片、算力板和数字货币挖矿机
CN114648318A (zh) * 2020-12-18 2022-06-21 深圳比特微电子科技有限公司 执行哈希算法的电路、计算芯片、加密货币矿机和方法
CN114765455A (zh) * 2021-01-14 2022-07-19 深圳比特微电子科技有限公司 处理器和计算系统
CN113608575B (zh) * 2021-10-09 2022-02-08 深圳比特微电子科技有限公司 流水线时钟驱动电路、计算芯片、算力板和计算设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101488738A (zh) * 2008-01-15 2009-07-22 北京芯慧同用微电子技术有限责任公司 一种时钟产生电路及设计方法
CN104113304A (zh) * 2014-06-26 2014-10-22 上海无线电设备研究所 两相互不交叠时钟电路及其方法
CN108052156A (zh) * 2017-11-27 2018-05-18 中国电子科技集团公司第三十八研究所 一种基于门控技术的处理器时钟树架构及构建方法
US10659058B1 (en) * 2015-06-26 2020-05-19 Gsi Technology, Inc. Systems and methods involving lock loop circuits, distributed duty cycle correction loop circuitry
CN111562808A (zh) * 2020-06-22 2020-08-21 深圳比特微电子科技有限公司 时钟电路系统、计算芯片、算力板和数字货币挖矿机
CN212160484U (zh) * 2020-06-22 2020-12-15 深圳比特微电子科技有限公司 时钟电路系统、计算芯片、算力板和数字货币挖矿机

Also Published As

Publication number Publication date
TWI784457B (zh) 2022-11-21
TW202131632A (zh) 2021-08-16
CN111562808A (zh) 2020-08-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21828696

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21828696

Country of ref document: EP

Kind code of ref document: A1