CN114528246A - Operation core, calculation chip and encrypted currency mining machine - Google Patents


Info

Publication number
CN114528246A
Authority
CN
China
Prior art keywords
clock
arithmetic
clock signal
stage
operational
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011320665.2A
Other languages
Chinese (zh)
Inventor
范志军
薛可
许超
杨作兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen MicroBT Electronics Technology Co Ltd
Original Assignee
Shenzhen MicroBT Electronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen MicroBT Electronics Technology Co Ltd filed Critical Shenzhen MicroBT Electronics Technology Co Ltd
Priority to CN202011320665.2A priority Critical patent/CN114528246A/en
Priority to PCT/CN2021/104624 priority patent/WO2022105252A1/en
Priority to TW110124791A priority patent/TWI775514B/en
Publication of CN114528246A publication Critical patent/CN114528246A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/04 Generating or distributing clock signals or signals derived directly therefrom
    • G06F1/06 Clock generators producing several clock signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06 Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Advance Control (AREA)
  • Semiconductor Integrated Circuits (AREA)
  • Multi Processors (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to an arithmetic core, a computing chip, and a cryptocurrency mining machine. The arithmetic core includes an input module configured to receive a data block, an operation module configured to perform a hash operation on the received data block, and a clock module. The operation module comprises: a first hash engine comprising a first plurality of operation stages arranged in a pipeline structure such that a data signal based on the data block is passed along the first plurality of operation stages in sequence; and a second hash engine comprising a second plurality of operation stages arranged in a pipeline structure such that the data signal received from the first hash engine is passed along the second plurality of operation stages in sequence. The clock module is configured to provide a clock signal to the first hash engine and the second hash engine, wherein the transfer direction of the clock signal within the first hash engine is opposite to the transfer direction of the clock signal within the second hash engine.

Description

Operation core, calculation chip and encrypted currency mining machine
Technical Field
The present disclosure relates to an arithmetic core for performing a hash operation, and more particularly, to an arithmetic core, a calculation chip, and a cryptocurrency mining machine.
Background
The bitcoin system was the first blockchain system to be proposed and is currently the most widely recognized. One of the primary roles of the bitcoin system is to act as a decentralized public ledger that can record a variety of financial transactions. It is called "decentralized" because bitcoins are not issued by a single centralized monetary institution, but are generated by operations based on a particular algorithm. The bitcoin system uses a distributed database maintained by the nodes of a computer network to validate and record all transactions, and uses cryptographic design to ensure security.
The current bitcoin protocol uses the Secure Hash Algorithm SHA-256. The SHA series of algorithms is published by the U.S. National Institute of Standards and Technology (NIST), and the SHA-256 algorithm is a secure hash algorithm with a hash length of 256 bits.
According to the bitcoin protocol, the first node that succeeds in determining a proof of work (PoW) for a candidate block has the right to add the block to the blockchain and to generate a new cryptocurrency unit as a reward. This process is referred to as "mining", and the nodes that execute the bitcoin algorithm are referred to as mining machines.
The design of ASIC mining machines focuses on improving chip size, chip operating speed, and chip power consumption.
Disclosure of Invention
According to a first aspect of the present disclosure, there is provided an arithmetic core comprising: an input module configured to receive a block of data; an operation module configured to perform a hash operation on a received data block, the operation module including a first hash engine and a second hash engine, the first hash engine including a first plurality of operation stages arranged in a pipeline structure such that a data signal based on the data block is sequentially transferred along the first plurality of operation stages, the second hash engine including a second plurality of operation stages arranged in a pipeline structure such that a data signal received from the first hash engine is sequentially transferred along the second plurality of operation stages, wherein each of the first and second plurality of operation stages operates on a data signal received from a previous operation stage and provides a data signal operated on by the operation stage to a subsequent operation stage; and a clock module configured to provide clock signals to the first hash engine and the second hash engine, wherein a transfer direction of the clock signal within the first hash engine is opposite to a transfer direction of the clock signal within the second hash engine.
According to a second aspect of the present disclosure, there is provided a computing chip comprising one or more arithmetic cores as described above.
According to a third aspect of the present disclosure, there is provided a computing chip comprising a plurality of the above-described arithmetic cores arranged in a plurality of columns, the clock module of each column of arithmetic cores receiving a clock signal via a common clock channel.
According to a fourth aspect of the present disclosure there is provided a cryptocurrency mining machine comprising one or more computing chips as described above.
Other features of the present disclosure and advantages thereof will become more apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The drawings are included for illustrative purposes and serve only to provide examples of possible structures and arrangements of the inventive apparatus disclosed herein and methods of applying the same to computing devices. These drawings in no way limit any changes in form and detail that may be made to the embodiments by one skilled in the art without departing from the spirit and scope of the embodiments. The embodiments will be more readily understood from the following detailed description taken in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements.
Figs. 1-4 are schematic diagrams of an arithmetic core, according to some embodiments of the present disclosure.
Figs. 5A-5C are schematic diagrams of an arithmetic core according to some further embodiments of the present disclosure.
Figs. 6A-6D are schematic diagrams of an arithmetic core having a vertical structure, according to some embodiments of the present disclosure.
FIG. 7 is a schematic diagram of a computing chip, according to some embodiments of the present disclosure.
FIG. 8A is a schematic diagram of a computing chip according to some further embodiments of the present disclosure.
Fig. 8B is a schematic diagram of a compute chip including an arithmetic core having a vertical structure in accordance with some further embodiments of the present disclosure.
Fig. 8C is a schematic layout diagram for distributing clock signals to the operation cores having a vertical structure in the calculation chip shown in fig. 8B.
Figs. 9A-9C are schematic diagrams of computing chips according to some further embodiments of the present disclosure.
Fig. 9D is a schematic layout diagram for distributing clock signals to the arithmetic cores in the computing chip shown in fig. 9C.
Fig. 9E is a schematic diagram of a computing chip including an arithmetic core having a vertical structure in accordance with some further embodiments of the present disclosure.
Fig. 9F is a schematic layout diagram for distributing clock signals to the operation cores having a vertical structure in the calculation chip shown in fig. 9E.
Fig. 9G is a schematic layout diagram for distributing clock signals to an arithmetic core in a compute chip according to some further embodiments of the present disclosure.
FIG. 10 is a schematic diagram of an exemplary pipeline structure for performing the SHA-256 algorithm.
Note that in the embodiments described below, the same reference numerals are used in common between different drawings to denote the same portions or portions having the same functions, and a repetitive description thereof will be omitted. In this specification, like reference numerals and letters are used to designate like items, and therefore, once an item is defined in one drawing, further discussion thereof is not required in subsequent drawings.
For ease of understanding, the positions, sizes, ranges, and the like of the respective structures shown in the drawings sometimes do not indicate the actual positions, sizes, ranges, and the like. Therefore, the disclosed invention is not limited to the positions, dimensions, ranges, etc. disclosed in the drawings. Furthermore, the figures are not necessarily to scale, and some features may be exaggerated to show details of particular components.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. That is, the hash engine herein is shown by way of example to illustrate different embodiments of the circuit in the present disclosure and is not intended to be limiting. Those skilled in the art will appreciate that they are merely illustrative of ways that the invention may be practiced, not exhaustive.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
The computing chip of a mining machine typically includes a top module and an arithmetic core. The top module performs functions such as communication, control, input/output (IO), and clock generation with a phase-locked loop (PLL). The arithmetic core performs the core computation: it obtains an operation task from the top module and feeds the operation result back to the top module. For bitcoin mining, a complete computation usually requires two rounds of 64 cycles (the SHA-256 algorithm is performed twice, which is usually called double hashing), i.e., 128 beats. Some optimization methods can save several beats (e.g., 6 beats) of operation. In the embodiments according to the present disclosure, an arithmetic core that performs the SHA-256 algorithm twice (i.e., a 128-beat operation) is mainly taken as an example for explanation, but those skilled in the art will understand that the present disclosure is not limited thereto and can be applied to operations with any number of beats. The SHA-256 algorithm referred to herein includes any version of the SHA-256 algorithm known in the art, as well as variations and modifications thereof.
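The double hashing and beat count described above can be illustrated with a minimal software sketch. It uses Python's standard `hashlib`; the names `double_sha256`, `total_beats`, and `ROUNDS_PER_SHA256` are hypothetical and chosen only for illustration, and the sketch says nothing about how the hardware is actually organized.

```python
import hashlib

# One pass of SHA-256 over a block takes 64 rounds ("beats" in the text).
ROUNDS_PER_SHA256 = 64


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice: the second pass hashes the digest of the first."""
    first_digest = hashlib.sha256(data).digest()
    return hashlib.sha256(first_digest).digest()


def total_beats(hash_passes: int = 2, saved_beats: int = 0) -> int:
    """Beats for a complete computation, minus any beats saved by optimization."""
    return hash_passes * ROUNDS_PER_SHA256 - saved_beats


if __name__ == "__main__":
    print(double_sha256(b"example block").hex())
    print(total_beats())               # two passes of 64 beats each
    print(total_beats(saved_beats=6))  # the "6 beats saved" example from the text
```

The 6-beat saving is taken from the example in the text; which beats are saved is not stated there, so the sketch only subtracts the count.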
In general, the number of operation beats required for bitcoin mining is still large. In the present disclosure, in order to improve operation throughput, the arithmetic core may be configured to have a plurality of operation stages arranged in a pipeline structure. FIG. 10 schematically illustrates an exemplary pipeline structure for executing the SHA-256 algorithm, which includes 64 operation stages, each having 8 compression registers A-H and 16 expansion registers 0-15. The 1st operation stage may receive an input data block, divide it into eight 32-bit data items, store them in the compression registers A-H, perform operation processing on them, and supply the result to the 2nd operation stage. Each subsequent operation stage then operates on the operation result received from the previous operation stage and provides its own operation result to the next operation stage. Finally, after 64 operation stages, the arithmetic core may output the hash operation result of performing the SHA-256 algorithm once on the input data block. Therefore, when all the operation stages in the pipeline structure are fully loaded (that is, every operation stage has received data and is performing operation processing), the arithmetic core can output one operation result per beat, which greatly improves the operation throughput.
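The throughput claim (one result per beat once the pipeline is full) can be checked with a small behavioral model. This is only a sketch under simplifying assumptions: each stage is an arbitrary Python function rather than a SHA-256 round, and `run_pipeline` is a hypothetical name.

```python
from typing import Callable, List, Optional


def run_pipeline(stages: List[Callable[[int], int]],
                 inputs: List[int]) -> List[Optional[int]]:
    """Clock a register pipeline and return the last stage's output after each beat.

    regs[i] models the output register of stage i; None means "no valid data yet".
    """
    regs: List[Optional[int]] = [None] * len(stages)
    pending = list(inputs)
    outputs: List[Optional[int]] = []
    total_beats = len(inputs) + len(stages) - 1
    for _ in range(total_beats):
        # Update from the last stage backwards so that every stage consumes the
        # value its predecessor held BEFORE this beat (all registers clock at once).
        for i in range(len(stages) - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = stages[i](prev) if prev is not None else None
        new_item = pending.pop(0) if pending else None
        regs[0] = stages[0](new_item) if new_item is not None else None
        outputs.append(regs[-1])
    return outputs


if __name__ == "__main__":
    # 64 placeholder stages that each add 1 (stand-ins for the real round logic).
    stages = [lambda x: x + 1] * 64
    outs = run_pipeline(stages, [0, 100, 200])
    # The first result appears after 64 beats; afterwards one result per beat.
    print(outs[63:])
```

With three inputs and 64 stages the model runs for 66 beats: the first 63 beats produce no output while the pipeline fills, and the remaining three beats each deliver one result.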
By way of non-limiting example, the arithmetic core of a mining-machine computing chip that needs to execute the SHA-256 algorithm twice requires a total of 128 operation stages. The arithmetic core may include two hash engines, each of which may include 64 operation stages and be configured to execute the SHA-256 algorithm. Each hash engine may have, for example, the configuration shown in FIG. 10. It is to be understood that the present disclosure does not particularly limit the hash algorithm performed by the hash engine: the hash engine of the arithmetic core may be used to perform virtually any hash algorithm now known or later developed that is suitable for use in mining machines (not limited to the SHA family of algorithms), and accordingly may include a corresponding number of operation stages.
When the operations of the arithmetic core are designed according to a pipeline structure, a clock signal needs to be supplied to each operation stage in the pipeline structure. In one case, the transfer direction of the clock signal in the arithmetic core may be the same as the transfer direction of the data signal in the pipeline structure (forward clock structure), i.e., from the foremost operation stage to the last operation stage. In this case the clock period can be shorter, so the chip frequency can be higher and higher performance can be achieved; however, the hold time of the register at each operation stage is not easily satisfied, and the chip may fail to work normally. In another case, the transfer direction of the clock signal may be opposite to the transfer direction of the data signal (reverse clock structure), i.e., from the last operation stage to the foremost operation stage. In this case the hold time of the register at each operation stage is more easily satisfied, so that data can be stably latched into the register, but chip frequency is sacrificed and chip performance is reduced. In addition, in both cases the clock signal needs to traverse every operation stage of the pipeline structure in the arithmetic core, typically up to 128 stages in the propagation order of the clock signal. The farther the clock signal propagates, the more the rising edge and/or the falling edge of the clock signal is distorted, so that the shape of the clock signal deteriorates and its duty cycle degrades.
When the clock signal reaches an operation stage located downstream in the pipeline structure along the transfer direction of the data signal (e.g., the 128th operation stage) or an operation stage located upstream in the pipeline structure along the direction opposite to the transfer direction of the data signal (e.g., the 1st operation stage), the clock signal may no longer satisfy the minimum pulse width requirement of the register of that operation stage, thereby seriously degrading performance.
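The trade-off between the forward and reverse clock structures can be made concrete with the usual textbook setup/hold constraints for two adjacent registers. The model below is not taken from the disclosure: the class `StageTiming`, the function names, and all delay values are illustrative assumptions. Here `skew` is the arrival time of the clock at the capturing register minus its arrival time at the launching register, so it is positive for a forward clock and negative for a reverse clock.

```python
from dataclasses import dataclass


@dataclass
class StageTiming:
    """Delays between two adjacent pipeline registers (arbitrary time units)."""
    t_cq: float         # clock-to-Q delay of the launching register
    t_logic_min: float  # shortest combinational path between the registers
    t_logic_max: float  # longest combinational path between the registers
    t_setup: float      # setup time of the capturing register
    t_hold: float       # hold time of the capturing register


def min_clock_period(t: StageTiming, skew: float) -> float:
    """Smallest period meeting setup: T >= t_cq + t_logic_max + t_setup - skew."""
    return t.t_cq + t.t_logic_max + t.t_setup - skew


def hold_slack(t: StageTiming, skew: float) -> float:
    """Hold margin: t_cq + t_logic_min - t_hold - skew (negative means a violation)."""
    return t.t_cq + t.t_logic_min - t.t_hold - skew


if __name__ == "__main__":
    timing = StageTiming(t_cq=0.05, t_logic_min=0.10, t_logic_max=0.50,
                         t_setup=0.03, t_hold=0.04)
    per_stage_clock_delay = 0.02  # assumed clock delay from one stage to the next
    for name, skew in (("forward", +per_stage_clock_delay),
                       ("reverse", -per_stage_clock_delay)):
        print(name,
              "min period:", round(min_clock_period(timing, skew), 3),
              "hold slack:", round(hold_slack(timing, skew), 3))
```

Under these assumptions the forward clock allows the shorter period but leaves less hold margin, while the reverse clock gains hold margin at the cost of a longer period, which matches the qualitative description above.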
In the arithmetic core according to the embodiments of the present disclosure, one hash engine is configured with a forward clock structure and the other hash engine is configured with a reverse clock structure, so that the arithmetic core as a whole has a hybrid clock structure, that is, the transfer directions of the clock signals within the two hash engines of the arithmetic core are opposite to each other. Such a configuration not only combines the advantages of the forward clock structure and the reverse clock structure to strike a balance between hold time and chip frequency, but also greatly reduces the number of operation stages through which the clock signal needs to be transferred. It can therefore significantly improve the shape of the clock signal at each operation stage, and thus advantageously improve the performance of the arithmetic core and of the entire computing chip.
An operation core according to an embodiment of the present disclosure is described in detail below with reference to the accompanying drawings. In these drawings, a dotted arrow is used to indicate a transfer direction of a data signal, and a solid arrow is used to indicate a transfer direction of a clock signal. It should be noted that an actual computational core may also include additional components that are not shown in the figures and are not discussed in this disclosure to avoid obscuring the points of the present disclosure.
Fig. 1 schematically illustrates an arithmetic core 100A according to an embodiment of the disclosure. The arithmetic core 100A may include an input module 110, an arithmetic module 120, and a clock module 140. The input module 110 may be configured to receive a block of data. The operation module 120 may be configured to perform a hash operation on the received data block. The clock module 140 may be configured to provide the required clock signal to the operation module 120.
As shown in FIG. 1, the operation module 120 includes a first hash engine 121 and a second hash engine 122. The first hash engine 121 includes a first plurality of operation stages 121-1, ..., 121-i, ..., 121-64. The operation stages 121-1, ..., 121-i, ..., 121-64 are arranged in a pipeline structure such that a data signal based on the data block is passed along the operation stages 121-1, ..., 121-i, ..., 121-64 in sequence. The second hash engine 122 includes a second plurality of operation stages 122-1, ..., 122-i, ..., 122-64. The operation stages 122-1, ..., 122-i, ..., 122-64 are arranged in a pipeline structure such that the data signal received from the first hash engine 121 is passed along the operation stages 122-1, ..., 122-i, ..., 122-64 in sequence. Each of the operation stages 121-1, ..., 121-64 and 122-1, ..., 122-64 operates on the data signal received from the previous operation stage and provides the data signal it has operated on to the subsequent operation stage. In some examples, the operation stages in the hash engines may be configured with reference to the exemplary pipeline structure of FIG. 10, as well as other pipeline structures known in the art or later developed. It should be understood that although the first hash engine 121 and the second hash engine 122 are each depicted in the figures as comprising 64 operation stages, the hash engine for performing the SHA-256 algorithm is, as already mentioned above, used herein only as a non-limiting example. The arithmetic core and its hash engines according to the present disclosure may be adapted to perform virtually any hash algorithm now known or later developed that is suitable for use in a mining machine (not limited to the SHA series of algorithms), and accordingly may comprise a suitable number of operation stages.
The clock module 140 is configured to provide the clock signals to the first hash engine 121 and the second hash engine 122 such that the clock signal within the first hash engine 121 is transferred in a direction opposite to the clock signal within the second hash engine 122. In this way, the clock signal does not need to traverse the first hash engine 121 and then the second hash engine 122 (forward clock structure) or traverse the second hash engine 122 and then the first hash engine 121 (reverse clock structure), but only needs to traverse the first hash engine 121 and the second hash engine 122, respectively, so that the number of operation stages that the clock signal needs to go through is halved with respect to the forward clock structure or the reverse clock structure. Since the transfer directions of the data signals within the first and second hash engines 121 and 122 are the same, and the transfer directions of the clock signals within the first and second hash engines 121 and 122 are opposite, the clock structure of the operation core according to the present disclosure may be referred to as a hybrid clock structure.
In some embodiments, the transfer direction of the clock signal and the transfer direction of the data signal within the first hash engine 121 may be opposite, and the transfer direction of the clock signal and the transfer direction of the data signal within the second hash engine 122 may be the same. In some embodiments, the direction of transfer of the clock signal and the direction of transfer of the data signal within the first hash engine 121 may be the same, and the direction of transfer of the clock signal and the direction of transfer of the data signal within the second hash engine 122 may be opposite.
For example, as shown in FIG. 1, in the arithmetic core 100A, the clock signal provided by the clock module 140 to the first hash engine 121 is passed from the operation stage 121-64 to the operation stage 121-1, and the clock signal provided by the clock module 140 to the second hash engine 122 is passed from the operation stage 122-1 to the operation stage 122-64. The clock signal therefore propagates from the middle of the arithmetic core toward both sides and passes through at most 64 operation stages, so that the worst-case shape of the clock signal is greatly improved.
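The halving of the clock path can be checked with a few lines of arithmetic. The sketch below counts, for each operation stage in data order, how many stages the clock has reached by the time it arrives there; the function name `clock_hops` and the structure labels are hypothetical, and the "hybrid" case follows the FIG. 1 arrangement in which the clock enters between stages 121-64 and 122-1.

```python
from typing import List


def clock_hops(structure: str, stages_per_engine: int = 64) -> List[int]:
    """For each stage (in data order), the number of stages the clock has
    reached, counting that stage itself, under the given clock structure."""
    n = stages_per_engine
    total = 2 * n
    if structure == "forward":   # clock enters at the first stage, follows the data
        return [i + 1 for i in range(total)]
    if structure == "reverse":   # clock enters at the last stage, opposes the data
        return [total - i for i in range(total)]
    if structure == "hybrid":    # clock enters in the middle and spreads both ways
        first_engine = [n - j for j in range(n)]   # 121-64 is reached first
        second_engine = [j + 1 for j in range(n)]  # 122-1 is reached first
        return first_engine + second_engine
    raise ValueError(f"unknown clock structure: {structure!r}")


if __name__ == "__main__":
    for name in ("forward", "reverse", "hybrid"):
        print(name, "worst-case clock path:", max(clock_hops(name)), "stages")
```

In this model the worst-case clock path is 128 stages for the forward and reverse structures and 64 stages for the hybrid structure.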
In some embodiments, the arithmetic core according to the present disclosure may further be provided with a synchronous first-in first-out (FIFO) module between the first hash engine 121 and the second hash engine 122 for passing data signals between the operation stages 121-64 and 122-1. A synchronous FIFO is a FIFO design in which the read and write operations of the FIFO buffer are performed in the same clock domain, i.e., data values are written into the FIFO buffer from one clock domain and read from the same FIFO buffer from the same clock domain.
FIG. 2 illustrates an arithmetic core 100B having a synchronous FIFO module according to the present disclosure. Compared with the arithmetic core 100A, the arithmetic core 100B further includes a synchronization FIFO module 150 disposed between the last operation stage 121-64 of the first hash engine 121 and the foremost operation stage 122-1 of the second hash engine 122. The synchronization FIFO module 150 is configured to receive the data signal output from the last operation stage 121-64 of the first hash engine 121 using the clock signal provided by the clock module 140, and to output the received data signal to the foremost operation stage 122-1 of the second hash engine 122 using the clock signal provided by the clock module 140. The clock module 140 is also configured to provide a clock signal to the synchronization FIFO module 150. In the arithmetic core 100B, the clock signal provided by the clock module 140 to the first hash engine 121 is passed from the operation stage 121-64 to the operation stage 121-1, and the clock signal provided by the clock module 140 to the second hash engine 122 is passed from the operation stage 122-1 to the operation stage 122-64. The clock signal passes through at most 64 operation stages. The introduction of the synchronization FIFO module 150 does not affect the processing speed or throughput of the arithmetic core as a whole, because the time the data signal takes to pass between the synchronization FIFO module 150 and an operation stage is no longer than the time it takes to pass between two operation stages.
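As a rough behavioral illustration of a synchronous FIFO, in which reads and writes are driven by the same clock, consider the sketch below. It is a software model only; the class `SyncFifo`, its `tick` method, and the depth parameter are hypothetical and do not describe the internal design of the synchronization FIFO module 150.

```python
from collections import deque
from typing import Any, Optional


class SyncFifo:
    """Behavioral model of a synchronous FIFO: one shared clock for read and write."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self._buf: deque = deque()

    def tick(self, write_data: Optional[Any] = None, read: bool = False) -> Optional[Any]:
        """Advance one edge of the shared clock.

        The oldest entry is read first (if requested and available); then the
        new entry, if any, is written. Returns the value read, or None.
        """
        out = None
        if read and self._buf:
            out = self._buf.popleft()
        if write_data is not None:
            if len(self._buf) >= self.depth:
                raise OverflowError("write to a full FIFO")
            self._buf.append(write_data)
        return out


if __name__ == "__main__":
    fifo = SyncFifo(depth=2)
    fifo.tick(write_data="result of stage 121-64, beat 1")
    # On the next beat the first engine writes again while the second engine reads.
    print(fifo.tick(write_data="result of stage 121-64, beat 2", read=True))
```

Because one entry can be written and one read on every beat, a FIFO used this way sits in the data path without lowering the one-result-per-beat rate of the pipeline.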
The introduction of the synchronization FIFO module also brings an additional benefit. In general, the arithmetic core may be implemented on a semiconductor chip (e.g., a silicon chip). All the operation stages of the pipeline structure are typically arranged in the same row, with the first and second hash engines adjacent to each other in the horizontal direction. The horizontal direction referred to herein is the extending direction of the pipeline structure, i.e., the transfer direction of the data signal. In some embodiments, the first and second hash engines may instead be arranged in two different rows along the surface of the semiconductor chip so as to be adjacent to each other in a vertical direction perpendicular to the horizontal direction. An arithmetic core having the first hash engine and the second hash engine arranged in this way is referred to herein as an arithmetic core having a vertical structure. An arithmetic core having a vertical structure may have a more suitable (e.g., more nearly square) aspect ratio, which facilitates flexible placement of such arithmetic cores on a computing chip. This in turn makes it easier to cut a larger number of generally rectangular chips from a generally circular silicon wafer. For an arithmetic core having a vertical structure, however, the distance that a data signal needs to travel between the last operation stage of the first hash engine and the foremost operation stage of the second hash engine is longer than the distance between two adjacent operation stages inside a hash engine. The data signal therefore takes longer to travel between these two operation stages than between two adjacent operation stages inside a hash engine, which may limit the processing speed and throughput of the arithmetic core.
The synchronous FIFO module, however, has relaxed timing, which helps to shorten the transfer time of the data signal from the last operation stage of the first hash engine to the foremost operation stage of the second hash engine in the vertical structure. This improves the performance of the arithmetic core having the vertical structure, so that the vertical structure brings its advantages without degrading the processing speed or throughput of the arithmetic core.
Fig. 6A and 6B are schematic diagrams of an arithmetic core including a synchronization FIFO module and first and second hash engines vertically adjacent to each other according to an embodiment of the present disclosure. As shown in fig. 6A, the first hash engine 221 and the second hash engine 222 of the arithmetic core 200A are adjacent to each other in the vertical direction, and the data signal reaches the second hash engine 222 from the first hash engine 221 via the synchronization FIFO module 250, wherein the clock signal in the first hash engine 221 is opposite to the transfer direction of the data signal, and the clock signal in the second hash engine 222 is the same as the transfer direction of the data signal. As shown in fig. 6B, the first hash engine 221 and the second hash engine 222 of the operation core 200B are adjacent to each other in the vertical direction, and the data signal arrives at the second hash engine 222 from the first hash engine 221 via the synchronization FIFO module 250, wherein the clock signal in the second hash engine 222 is opposite to the transfer direction of the data signal, and the clock signal in the first hash engine 221 is the same as the transfer direction of the data signal. The relative position relationship between the first hash engine 221 and the second hash engine 222 in the vertical direction depicted in the drawings is only exemplary and not limiting, and the relative position relationship between the first hash engine 221 and the second hash engine 222 in the vertical direction may also be reversed according to actual needs.
In still other embodiments, the arithmetic core according to the present disclosure may further be provided with an asynchronous first-in first-out (FIFO) module between the first hash engine 121 and the second hash engine 122 for passing data signals between the arithmetic stages 121-64 and 122-1. An asynchronous FIFO is a FIFO design in which data values are written into a FIFO buffer from one clock domain and read from the same FIFO buffer from another clock domain, the two clock domains being asynchronous with respect to each other. An asynchronous FIFO may be used to transfer data safely from one clock domain to another.
FIG. 3 illustrates an arithmetic core 100C having an asynchronous FIFO module according to the present disclosure. Compared to the arithmetic core 100A, the arithmetic core 100C further includes an asynchronous FIFO module 130 disposed between the last arithmetic stage 121-64 of the first hash engine 121 and the first arithmetic stage 122-1 of the second hash engine 122. The asynchronous FIFO module 130 is configured to receive a data signal output from the last arithmetic stage 121-64 of the first hash engine 121 using a first clock signal and to output the received data signal to the first arithmetic stage 122-1 of the second hash engine 122 using a second clock signal different from the first clock signal. The arithmetic core 100C includes a first clock module 141 and a second clock module 142. The first clock module 141 is configured to provide the first clock signal to the first hash engine 121 and the asynchronous FIFO module 130, and the second clock module 142 is configured to provide the second clock signal to the second hash engine 122 and the asynchronous FIFO module 130, wherein the first clock signal and the second clock signal have the same frequency. The introduction of the asynchronous FIFO module 130 does not affect the processing speed and throughput of the entire arithmetic core, because the transfer time of the data signal between the asynchronous FIFO module 130 and the adjacent arithmetic stages does not exceed the transfer time between arithmetic stages.
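The role of the asynchronous FIFO module 130 can be pictured with a purely behavioural model. The sketch below is an illustration under assumptions, not the circuit of the disclosure: it models two clocks of equal period with a fixed phase offset, writes one word per edge of the first clock and reads one word per edge of the second clock, and omits the pointer synchronisers and metastability handling that a real asynchronous FIFO needs. The names `AsyncFifoModel`, `simulate`, `period`, `read_phase` and `depth` are hypothetical.

```python
from collections import deque


class AsyncFifoModel:
    """Behavioural stand-in for an asynchronous FIFO (no synchroniser latency)."""

    def __init__(self, depth):
        self.depth = depth
        self.buf = deque()

    def write(self, word):
        """Store a word on a write-clock edge; return False if the FIFO is full."""
        if len(self.buf) >= self.depth:
            return False
        self.buf.append(word)
        return True

    def read(self):
        """Return the oldest word on a read-clock edge, or None if empty."""
        return self.buf.popleft() if self.buf else None


def simulate(words, period=10, read_phase=3, depth=4):
    """Drive the FIFO from two same-frequency clocks offset by `read_phase`."""
    fifo = AsyncFifoModel(depth)
    pending = deque(words)
    out = []
    n_cycles = len(words) + depth + 2
    # kind 0 = edge of the first (write) clock, kind 1 = edge of the second (read) clock
    events = sorted(
        [(k * period, 0) for k in range(n_cycles)]
        + [(k * period + read_phase, 1) for k in range(n_cycles)]
    )
    for t, kind in events:
        if kind == 0:
            if pending and fifo.write(pending[0]):
                pending.popleft()
        else:
            word = fifo.read()
            if word is not None:
                out.append((t, word))
    return out


result = simulate(list(range(8)))
print(result)
```

In this model the words leave in the order they entered and one word is delivered per clock period, which is consistent with the statement that the two clock signals have the same frequency and that the FIFO does not reduce throughput.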
In some embodiments, the first clock signal and the data signal may be transferred in the same direction, and the second clock signal and the data signal may be transferred in opposite directions. As shown in FIG. 3, the data signal propagates through all the arithmetic stages of the arithmetic module 120 in a left-to-right direction, while the first clock signal propagates within the first hash engine 121 (from the arithmetic stages 121-1 to 121-64) in the left-to-right direction, and the second clock signal propagates within the second hash engine 122 (from the arithmetic stages 122-64 to 122-1) in the right-to-left direction.
In some other embodiments, the first clock signal and the data signal may be transferred in opposite directions, and the second clock signal and the data signal may be transferred in the same direction. As shown in fig. 4, in the arithmetic core 100D, as in the arithmetic core 100C, the data signal still propagates through all the arithmetic stages of the arithmetic module 120 in the left-to-right direction, but the first clock signal propagates in the right-to-left direction within the first hash engine 121 (from the arithmetic stages 121-64 to 121-1) and the second clock signal propagates in the left-to-right direction within the second hash engine 122 (from the arithmetic stages 122-1 to 122-64).
In some embodiments, the first clock module 141 and the second clock module 142 may be configured to receive clock signals from the same clock source located outside of the arithmetic core. The clock source may be used to provide a basic clock signal. That is, the first clock signal and the second clock signal may originate from the same source but travel along different paths from the clock source to the respective clock modules.
The introduction of asynchronous FIFO modules also brings additional effects. As in the case of the synchronous FIFO module, the asynchronous FIFO module may also help to shorten the transfer time of the data signal from the last operation stage of the first hash engine to the first operation stage of the second hash engine in the vertical structure, thereby improving the performance of the operation core having the vertical structure. Fig. 6C and 6D are schematic diagrams of an arithmetic core including an asynchronous FIFO module and first and second hash engines vertically adjacent to each other, according to an embodiment of the disclosure. As shown in fig. 6C, the first hash engine 221 and the second hash engine 222 of the operation core 200C are adjacent to each other in the vertical direction, and the data signal reaches the second hash engine 222 from the first hash engine 221 via the asynchronous FIFO module 230, wherein the first clock signal and the data signal are transferred in the same direction, and the second clock signal and the data signal are transferred in opposite directions. As shown in fig. 6D, the first hash engine 221 and the second hash engine 222 of the operation core 200D are adjacent to each other in the vertical direction, and the data signal reaches the second hash engine 222 from the first hash engine 221 via the asynchronous FIFO module 230, wherein the second clock signal and the data signal are transferred in the same direction, and the first clock signal and the data signal are transferred in opposite directions. The relative positional relationship of the first hash engine 221 and the second hash engine 222 in the vertical direction depicted in the figures is likewise merely exemplary and not limiting.
In some embodiments, the first hash engine 121 or the second hash engine 122 may further include one or more asynchronous FIFO modules internally, which may be interposed between the operation stages. In this way, the number of operational stages that each clock signal needs to pass through can be further reduced. The insertion of these asynchronous FIFO modules may cause the operation stages in each hash engine to be divided into groups, and in some embodiments, the number of operation stages contained in each group may be the same. Fig. 5A to 5C show examples of operation cores in which additional asynchronous FIFO modules are provided in the first hash engine 121 and the second hash engine 122.
As shown in FIG. 5A, the arithmetic core 100A' further includes a second asynchronous FIFO module 132, as compared to the arithmetic core 100A. The second asynchronous FIFO module 132 is disposed between adjacent first 121-a and second 121-b operational stages of the first plurality of operational stages of the first hash engine 121, the first operational stage 121-a preceding the second operational stage 121-b. The second asynchronous FIFO module 132 is configured to receive the data signal output from the first operational stage 121-a using a third clock signal different from the clock signal provided by the clock module 140 and to output the received data signal to the second operational stage 121-b using the clock signal provided by the clock module 140. The arithmetic core 100A' further comprises a third clock module 143, the third clock module 143 being configured to provide a third clock signal to the second asynchronous FIFO module 132 and to the first arithmetic stage 121-a and to an arithmetic stage of the first plurality of arithmetic stages of the first hash engine 121 preceding the first arithmetic stage 121-a. The clock module 140 is further configured to provide a clock signal to the second asynchronous FIFO module 132 and to the second arithmetic stage 121-b and to the arithmetic stages of the first plurality of arithmetic stages of the first hash engine 121 that follow the second arithmetic stage 121-b.
Additionally or alternatively, the arithmetic core 100A' further comprises a third asynchronous FIFO module 133. The third asynchronous FIFO module 133 is disposed between adjacent third and fourth arithmetic stages 122-c and 122-d of the second plurality of arithmetic stages of the second hash engine 122, the third arithmetic stage 122-c preceding the fourth arithmetic stage 122-d. The third asynchronous FIFO module 133 is configured to receive the data signal output from the third arithmetic stage 122-c using the clock signal provided by the clock module 140 and to output the received data signal to the fourth arithmetic stage 122-d using a fourth clock signal that is different from the clock signal provided by the clock module 140. The arithmetic core 100A' further includes a fourth clock module 144, the fourth clock module 144 being configured to provide the fourth clock signal to the third asynchronous FIFO module 133, to the fourth arithmetic stage 122-d, and to the arithmetic stages of the second plurality of arithmetic stages of the second hash engine 122 subsequent to the fourth arithmetic stage 122-d. The clock module 140 is further configured to provide its clock signal to the third asynchronous FIFO module 133, to the third arithmetic stage 122-c, and to the arithmetic stages of the second plurality of arithmetic stages of the second hash engine 122 that precede the third arithmetic stage 122-c.
Similarly, as shown in fig. 5B, the computing core 100B' may also include a second asynchronous FIFO module 132, a third clock module 143, and/or a third asynchronous FIFO module 133, a fourth clock module 144, as compared to the computing core 100B.
As shown in fig. 5C, the arithmetic core 100C' may also include a second asynchronous FIFO module 132, as compared to the arithmetic core 100C. The second asynchronous FIFO module 132 is disposed between adjacent first 121-a and second 121-b operational stages of the first plurality of operational stages of the first hash engine 121, the first operational stage 121-a preceding the second operational stage 121-b. The second asynchronous FIFO module 132 is configured to receive the data signal output from the first operational stage 121-a using a third clock signal different from the first clock signal provided by the first clock module 141 and to output the received data signal to the second operational stage 121-b using the first clock signal provided by the first clock module 141. The arithmetic core 100C' further comprises a third clock module 143, the third clock module 143 being configured to provide a third clock signal to the second asynchronous FIFO module 132 and to the first arithmetic stage 121-a and to an arithmetic stage of the first plurality of arithmetic stages of the first hash engine 121 preceding the first arithmetic stage 121-a. The first clock module 141 is configured to provide the first clock signal to the first and second asynchronous FIFO modules 130 and 132 and to the second arithmetic stage 121-b and the arithmetic stages of the first plurality of arithmetic stages of the first hash engine 121 that follow the second arithmetic stage 121-b.
Additionally or alternatively, the arithmetic core 100C' may also include a third asynchronous FIFO module 133. The third asynchronous FIFO module 133 is disposed between adjacent third and fourth operational stages 122-c and 122-d of the second plurality of operational stages of the second hash engine 122, the third operational stage 122-c preceding the fourth operational stage 122-d. The third asynchronous FIFO module 133 is configured to receive the data signal output from the third operational stage 122-c using a second clock signal provided by the second clock module 142 and to output the received data signal to the fourth operational stage 122-d using a fourth clock signal different from the second clock signal provided by the second clock module 142. The arithmetic core 100C' may further include a fourth clock module 144, the fourth clock module 144 configured to provide a fourth clock signal to the third asynchronous FIFO module 133 and to the fourth arithmetic stage 122-d and arithmetic stages of the second plurality of arithmetic stages of the second hash engine 122 subsequent to the fourth arithmetic stage 122-d. The second clock module 142 is configured to provide a second clock signal to the first and third asynchronous FIFO modules 130 and 133 and to the third arithmetic stage 122-c and an arithmetic stage of the second plurality of arithmetic stages of the second hash engine 122 that precedes the third arithmetic stage 122-c.
For example, the second asynchronous FIFO module 132 may be disposed between the 32nd and 33rd operation stages of the first hash engine 121, and the third asynchronous FIFO module 133 may be disposed between the 32nd and 33rd operation stages of the second hash engine 122, so that each clock signal passes through at most 32 operation stages, thereby further improving the shape of the clock signal at each operation stage.
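The stage count in this example follows from simple arithmetic, sketched below. The helper `clock_segments` is a hypothetical name introduced for illustration; it assumes a hash engine with stages numbered 1 to `num_stages` and an asynchronous FIFO module inserted after each listed stage, with each resulting group driven by its own clock signal.

```python
def clock_segments(num_stages, fifo_after):
    """Return the number of stages in each clock group of one hash engine.

    fifo_after lists the stage indices (1-based) after which an asynchronous
    FIFO module is inserted; each resulting group has its own clock signal.
    """
    positions = sorted(fifo_after)
    if len(set(positions)) != len(positions):
        raise ValueError("FIFO positions must be distinct")
    if any(p <= 0 or p >= num_stages for p in positions):
        raise ValueError("a FIFO must sit between two stages of the engine")
    bounds = [0] + positions + [num_stages]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


no_fifo = clock_segments(64, [])        # one clock signal spans the whole engine
one_fifo = clock_segments(64, [32])     # FIFO between the 32nd and 33rd stages
three_fifos = clock_segments(64, [16, 32, 48])
print(max(no_fifo), max(one_fifo), max(three_fifos))
```

With equal-sized groups, adding further FIFO modules shortens the longest chain of stages that any single clock signal has to traverse.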
It will be appreciated by those skilled in the art that although fig. 5A to 5C show one asynchronous FIFO module in each hash engine, this is merely a non-limiting example, and the number and location of the asynchronous FIFO modules in a hash engine can be set appropriately according to actual needs. It should also be understood that the respective clock modules may be arranged appropriately to provide clock signals for the respective operation stages and FIFO modules in the hash engine, according to the number and positions of the synchronous FIFO modules and the asynchronous FIFO modules, as long as the following are ensured: the clock signals within each hash engine are in the same direction, the clock signals of different hash engines are in opposite directions, the same clock signal is provided to each synchronous FIFO module, and different clock signals are provided to each asynchronous FIFO module. Fig. 5A to 5C merely show a few example arrangements and are not intended to limit the present disclosure.
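The clocking rule for FIFO modules stated above can be expressed as a small consistency check. This is a sketch under assumptions: the `Fifo` record, the `check_fifo_clocks` function and the clock labels are hypothetical, and the example data merely restates the clock assignments described for the arithmetic core 100C'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fifo:
    name: str
    kind: str         # "sync" or "async"
    write_clock: str  # clock used to receive data from the preceding stage
    read_clock: str   # clock used to output data to the following stage


def check_fifo_clocks(fifos):
    """Return a list of rule violations; an empty list means the plan is consistent."""
    errors = []
    for f in fifos:
        same = f.write_clock == f.read_clock
        if f.kind not in ("sync", "async"):
            errors.append(f"{f.name}: unknown kind {f.kind!r}")
        elif f.kind == "sync" and not same:
            errors.append(f"{f.name}: synchronous FIFO needs one clock signal")
        elif f.kind == "async" and same:
            errors.append(f"{f.name}: asynchronous FIFO needs two different clock signals")
    return errors


# Clock assignments as described for the arithmetic core 100C'.
core_100c_prime = [
    Fifo("132", "async", write_clock="third", read_clock="first"),
    Fifo("130", "async", write_clock="first", read_clock="second"),
    Fifo("133", "async", write_clock="second", read_clock="fourth"),
]
ok = check_fifo_clocks(core_100c_prime)
bad = check_fifo_clocks([Fifo("x", "async", write_clock="first", read_clock="first")])
print(ok, bad)
```

The check covers only the FIFO clocking rule; the direction of each clock signal relative to the data signal would need a separate model.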
The present disclosure also provides a computing chip comprising one or more arithmetic cores as described in any of the above embodiments.
A computing chip 700 according to some embodiments of the present disclosure is described below in conjunction with fig. 7. The computing chip 700 may include a top-level module 710 and a plurality of arithmetic cores 720 having a hybrid clock architecture as described above. By way of non-limiting example, in the embodiment depicted in FIG. 7, the arithmetic core 720 is shown having the structure shown in FIG. 1.
As shown in fig. 7, the top-level module 710 includes a clock source 711. The clock source 711 is configured to provide a clock signal to the arithmetic cores 720 of the computing chip 700. The arithmetic cores 720 are arranged in a plurality of columns 720-1, 720-2, 720-3, 720-4, with the clock modules of each column of arithmetic cores receiving a clock signal via a common clock channel. For example, the clock modules of the arithmetic cores in columns 720-1, 720-2, 720-3, 720-4 receive clock signals via clock channels 731, 732, 733, 734, respectively. While the two hash engines are depicted as being arranged in the same row in the arithmetic core shown in fig. 7, the clock channel may also be shared between arithmetic cores of adjacent columns when the arithmetic cores have a vertical structure.
Although not specifically illustrated, it is understood that when the hash engines of the operation core 720 contain additional asynchronous FIFO modules as described above, additional clock channels may also be provided, so that the clock modules added along with those asynchronous FIFO modules in the operation cores 720 of the same column may likewise receive a clock signal via a common clock channel.
It should be appreciated that although in the illustrated example, the computing chip includes four columns and four rows of computational cores, this is merely exemplary and not limiting, and any suitable number of computational cores may be arranged into any suitable number of columns as is practical.
The computing chip 800A of fig. 8A differs from the computing chip 700 of fig. 7 in that the arithmetic core 820 of the computing chip 800A also has a synchronization FIFO module (schematically represented by the rectangle filled with left-slanted lines in the figure) and may, for example, have a structure as shown in fig. 2. In such an embodiment, the clock modules of the arithmetic cores in each column 820-1, 820-2, 820-3, 820-4 may still receive a clock signal via clock channels 731, 732, 733, 734, respectively.
When the arithmetic cores have a vertical structure, the clock channel may also be shared between the arithmetic cores of adjacent columns. As shown in fig. 8B, the computing chip 800B differs from the computing chip 800A in that the arithmetic core 820' of the computing chip 800B has a first hash engine and a second hash engine adjacent to each other in the vertical direction. In the figures of the present disclosure, SF represents a synchronous FIFO module, ASF represents an asynchronous FIFO module, H1 represents a first hash engine, and H2 represents a second hash engine.
The plurality of columns of arithmetic cores may include a first column of arithmetic cores and a second column of arithmetic cores (e.g., 820-1' and 820-2') adjacent to one another and arranged in the recited order. In some embodiments, the clock modules of the first column of arithmetic cores 820-1' receive a clock signal via a common clock channel 831 with the clock modules of the second column of arithmetic cores 820-2'. The computing chip may also include a plurality of pairs of such first and second columns of arithmetic cores, each pair capable of receiving a clock signal via a common clock channel. For example, columns 820-3' and 820-4' of the computing chip 800B also receive a clock signal via a common clock channel 832.
It should be understood that although the synchronization FIFO modules are disposed to the left of the hash engines H1 and H2 in fig. 8B, this is merely exemplary and not limiting, e.g., as shown in fig. 8C, the synchronization FIFO modules of the compute cores may be disposed on either side of the hash engines H1 and H2, and the hybrid clock architecture of the present disclosure can be implemented whether the corresponding compute cores in the two adjacent columns of compute cores are disposed identically or inversely with respect to the synchronization FIFO modules. In fact, the position of the synchronous FIFO module in each operation core relative to the hash engine can be set reasonably according to practical situations, and the arrangement of each operation core or each column of operation cores in the computing chip is not necessarily required to be the same.
It should also be understood that although H1 is depicted above H2 in fig. 8B, the relative positional relationship of the hash engines H1 and H2 in the vertical direction is not particularly limited, and the hybrid clock structure of the present disclosure can be implemented whether H1 is above H2 or H2 is above H1. In fact, the relative positional relationship of the hash engines H1 and H2 in the vertical direction in each arithmetic core can be set appropriately according to actual conditions, and the arrangement of each arithmetic core or each column of arithmetic cores in the computing chip is not necessarily required to be the same.
The computing chip 900A of fig. 9A differs from the computing chip 700 of fig. 7 in that the arithmetic core 920A of the computing chip 900A also has an asynchronous FIFO module (schematically represented by the rectangle filled with right-slanted lines in the figure) and may, for example, have a structure as shown in fig. 4. In such an embodiment, the first clock modules of the arithmetic cores in each column 920A-1, 920A-2, 920A-3, 920A-4 receive a first clock signal via a common clock channel, and the second clock modules receive a second clock signal via another common clock channel. For example, one of the first and second clock modules of the arithmetic cores in column 920A-1 receives a first clock signal via a common clock channel 931 and the other receives a second clock signal via a common clock channel 932.
The computing chip 900B of fig. 9B differs from the computing chip 900A in that the arithmetic core 920B of the computing chip 900B may have a structure as shown in fig. 3, for example. In such an embodiment, the first clock modules of the arithmetic cores in each column 920B-1, 920B-2, 920B-3, 920B-4 receive a first clock signal via a common clock channel, and the second clock modules receive a second clock signal via another common clock channel. For example, one of the first and second clock modules of the arithmetic cores in column 920B-1 receives a first clock signal via a common clock channel 931' and the other receives a second clock signal via a common clock channel 932'.
For the embodiment shown in fig. 9B, it may be further modified to share the clock channel between adjacent columns. The plurality of columns 920B-1, 920B-2, 920B-3, 920B-4 includes first and second columns of operational cores (e.g., 920B-1, 920B-2) adjacent to one another and arranged in the stated order, and in some embodiments, one of the first and second clock modules of the first column of operational cores and one of the first and second clock modules of the second column of operational cores may receive a clock signal via a common clock channel. In some embodiments, the plurality of columns includes, in addition to the first column of computational cores and the second column of computational cores, a third column of computational cores (e.g., 920B-3) adjacent to the second column of computational cores opposite the first column of computational cores, and the other of the first clock module and the second clock module of the second column of computational cores and the one of the first clock module and the second clock module of the third column of computational cores may receive the clock signal via a common clock channel.
For example, the computational chip 900B' of fig. 9C differs from the computational chip 900B in that the computational cores of adjacent columns share a clock channel. As shown in FIG. 9C, column 920B-1 shares clock channel 934 with column 920B-2, column 920B-2 shares clock channel 935 with column 920B-3, column 920B-3 shares clock channel 936 with column 920B-4, and furthermore column 920B-1 has a separate clock channel 933 and column 920B-4 has a separate clock channel 937. How the operation core 920B is arranged in the computing chip is not specifically shown in fig. 9C. In fact, as can be appreciated with reference to fig. 9D, regardless of the arrangement of the arithmetic cores 920B (the first hash engine is on the left and the second hash engine is on the right, or the second hash engine is on the left and the first hash engine is on the right), clock channel sharing between arithmetic cores of adjacent columns can be achieved. Therefore, each operation core in the computing chip 900B' can be reasonably arranged according to actual conditions. It should be noted that, as shown in fig. 9D, the data signal is always transferred from the first hash engine to the second hash engine, that is, whether the first hash engine is on the left side or the second hash engine is on the left side in the figure is only relative to the view point of the figure, and the data transfer upstream-downstream relationship is not changed.
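The saving in clock channels from sharing between adjacent columns can be tabulated as follows. This is an illustrative sketch only: the function names are hypothetical, the channel labels 933 to 937 are taken from the description of fig. 9C, and the unshared count assumes two dedicated channels per column as in the example given for fig. 9B. Which of a column's two channels carries the first or the second clock signal depends on the orientation of the cores and is not modelled.

```python
def shared_channel_plan(num_columns, first_label=933):
    """Map each column (1-based) to the pair of clock channels on its two sides.

    Adjacent columns share the channel lying between them; the two outermost
    channels each serve a single column.
    """
    if num_columns < 1:
        raise ValueError("need at least one column")
    return {
        col: (first_label + col - 1, first_label + col)
        for col in range(1, num_columns + 1)
    }


def distinct_channels(plan):
    """Sorted list of all channel labels used by the plan."""
    return sorted({ch for pair in plan.values() for ch in pair})


plan = shared_channel_plan(4)
shared = distinct_channels(plan)
unshared_count = 2 * len(plan)  # two dedicated channels per column, no sharing
print(plan)
print(len(shared), unshared_count)
```

For four columns the shared arrangement uses five channels instead of eight; in general, N columns need N + 1 channels rather than 2N under these assumptions.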
In addition, when the operation cores have a vertical structure, the clock channel may also be shared between the operation cores of adjacent columns. This is described in detail below with reference to fig. 9E to 9G. In fig. 9E to 9G, the chain line is also used to represent the clock signal for clarity of illustration. The plurality of columns of operational cores may include a first column of operational cores and a second column of operational cores adjacent to each other and arranged in the recited order. In some embodiments, one of the first clock module and the second clock module of the first column of computational cores and one of the first clock module and the second clock module of the second column of computational cores receive a clock signal via a common clock channel; additionally or alternatively, the other of the first clock module and the second clock module of the first column of computational cores and the other of the first clock module and the second clock module of the second column of computational cores receive a clock signal via a common clock channel.
As shown in fig. 9E, the computing chip 900C differs from the computing chip 900B' in that the operation core 920' of the computing chip 900C has a first hash engine and a second hash engine adjacent to each other in the vertical direction. The plurality of columns 920-1', 920-2', 920-3', 920-4' includes a first column of operation cores and a second column of operation cores (e.g., 920-1', 920-2') that are adjacent to each other and arranged in the stated order. In some embodiments, the first clock modules of the first column of operation cores (e.g., 920-1') and the first clock modules of the second column of operation cores (e.g., 920-2') each receive a clock signal as a respective first clock signal via a common clock channel (e.g., 991). In some embodiments, additionally or alternatively, the second clock modules of the first column of operation cores (e.g., 920-1') and the second clock modules of the second column of operation cores (e.g., 920-2') receive a clock signal as respective second clock signals via a common clock channel (e.g., 993). The computing chip 900C may include a plurality of pairs of such first and second columns of operation cores. For example, the first clock modules of the operation cores of columns 920-3' and 920-4' may receive clock signals as respective first clock signals via a common clock channel 992, and the second clock modules may receive clock signals as respective second clock signals via a common clock channel 994.
It should be understood that the arrangement of the asynchronous FIFO blocks in fig. 9E with respect to the hash engines H1 and H2 is merely exemplary and not limiting, for example, as shown in fig. 9F, asynchronous FIFO blocks of the compute cores may be arranged on either side of the hash engines H1 and H2, and the hybrid clock architecture of the present disclosure can be implemented whether the arrangement of the corresponding compute cores in the two adjacent columns of compute cores with respect to the asynchronous FIFO blocks is the same or opposite. In fact, the position of the asynchronous FIFO module in each operation core relative to the hash engine can be set reasonably according to practical situations, and the arrangement of each operation core or each column of operation cores in the computing chip is not necessarily required to be the same.
It should also be understood that the arrangement of the hash engine H1 above the hash engine H2 in fig. 9E is merely exemplary and not limiting. The relative positional relationship of the hash engines H1 and H2 in the vertical direction is not particularly limited, and the hybrid clock structure of the present disclosure can be implemented whether H1 is above H2 or H2 is above H1. For example, as shown in fig. 9G, the hash engine H2 is above the hash engine H1 in the second column of operation cores. In the embodiment shown in fig. 9G, the first clock modules of the first column of operation cores and the second clock modules of the second column of operation cores receive clock signals as respective first and second clock signals via a common clock channel. Additionally or alternatively, the second clock modules of the first column of operation cores and the first clock modules of the second column of operation cores receive clock signals as respective second and first clock signals via a common clock channel. In fact, the relative positional relationship of the hash engines H1 and H2 in the vertical direction in each operation core can be set appropriately according to practical situations, and the arrangement of each operation core or each column of operation cores in the computing chip is not necessarily required to be the same.
Since the aspect ratio of the arithmetic cores of conventional mining computer chips is typically large (because up to 128 arithmetic stages are provided), the arrangement of the arithmetic cores on the computer chip (typically based on a silicon wafer) is very limited. The arithmetic core with the vertical structure provided by the disclosure can have a remarkably reduced length-width ratio and can be reasonably arranged on a computing chip more flexibly and freely. The inclusion of synchronous or asynchronous FIFO modules may also help to improve the performance of an arithmetic core having a vertical architecture. In addition, by sharing the clock channel between adjacent columns of the operation cores, the chip area can be further saved, and a larger number of operation cores can be arranged on the same-size chip to efficiently bear complex operation tasks.
It will be appreciated that although in the above embodiments it has been described that the clock channel is shared between adjacent columns of the arithmetic cores, it is also possible and within the scope of the present disclosure to share the clock channel between adjacent rows of the arithmetic cores in a similar manner.
The present disclosure may also provide a cryptocurrency mining machine including one or more computing chips as described above. The cryptocurrency mining machine according to the present disclosure may have a lower cost and perform the mining process more efficiently.
The terms "left", "right", "front", "back", "top", "bottom", "upper", "lower", "high", "low", and the like in the description and in the claims, if any, are used for descriptive purposes and not necessarily for describing permanent relative positions. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are, for example, capable of operation in other orientations than those illustrated or otherwise described herein. For example, features described originally as "above" other features may be described as "below" other features when the device in the drawings is inverted. The device may also be otherwise oriented (rotated 90 degrees or at other orientations) and the relative spatial relationships may be interpreted accordingly.
In the description and claims, an element being "on," "attached" to, "connected" to, "coupled" to, or "contacting" another element, etc., may be directly on, attached to, connected to, coupled to, or contacting the other element, or one or more intervening elements may be present. In contrast, when an element is referred to as being "directly on," "directly attached to," "directly connected to," "directly coupled to," or "directly contacting" another element, there are no intervening elements present. In the description and claims, a feature described as being "adjacent" another feature may have portions that overlap, or lie above or below, the adjacent feature.
As used herein, the word "exemplary" means "serving as an example, instance, or illustration," and not as a "model" that is to be reproduced exactly. Any implementation exemplarily described herein is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, there is no intention to be bound by any expressed or implied theory presented in the preceding technical field, background, brief summary or the following detailed description.
As used herein, the term "substantially" is intended to encompass any minor variation resulting from design or manufacturing imperfections, device or component tolerances, environmental influences, and/or other factors. The word "substantially" also allows for differences from a perfect or ideal situation due to parasitics, noise, and other practical considerations that may exist in a practical implementation.
In addition, "first," "second," and like terms may also be used herein for reference purposes only and are thus not intended to be limiting. For example, the terms "first", "second", and other such numerical terms referring to structures or elements do not imply a sequence or order unless clearly indicated by the context.
It will be further understood that the terms "comprises/comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components, and/or groups thereof.
In the present disclosure, the term "providing" is used in a broad sense to encompass all ways of obtaining an object, and thus "providing an object" includes, but is not limited to, "purchasing," "preparing/manufacturing," "arranging/setting," "installing/assembling," and/or "ordering" the object, and the like.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Those skilled in the art will appreciate that the boundaries between the above-described operations are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments. Other modifications, variations, and alternatives are also possible. The aspects and elements of all embodiments disclosed above may be combined in any manner and/or in combination with aspects or elements of other embodiments to provide multiple additional embodiments. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. The various embodiments disclosed herein may be combined in any combination without departing from the spirit and scope of the present disclosure. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (19)

1. An arithmetic core, comprising:
an input module configured to receive a block of data;
an operation module configured to perform a hash operation on a received data block, the operation module comprising:
a first hash engine comprising a first plurality of operational stages arranged in a pipeline structure such that data signals based on the data block pass sequentially along the first plurality of operational stages; and
a second hash engine comprising a second plurality of operational stages arranged in a pipeline structure such that data signals received from the first hash engine pass along the second plurality of operational stages in sequence,
wherein each of the first and second plurality of operational stages operates on a data signal received from a previous operational stage and provides the data signal operated on by the operational stage to a subsequent operational stage; and
a clock module configured to provide a clock signal to the first hash engine and the second hash engine,
wherein a transfer direction of the clock signal within the first hash engine is opposite to a transfer direction of the clock signal within the second hash engine.
2. The arithmetic core of claim 1, wherein a direction of propagation of the clock signal within the first hash engine is opposite to a direction of propagation of the data signal, and a direction of propagation of the clock signal within the second hash engine is the same as the direction of propagation of the data signal.
3. The arithmetic core of claim 1, further comprising:
a synchronous first-in first-out (FIFO) module disposed between a last operational stage of the first plurality of operational stages of the first hash engine and a foremost operational stage of the second plurality of operational stages of the second hash engine, the synchronous FIFO module configured to receive a data signal output from the last operational stage of the first hash engine using the clock signal and output the received data signal to the foremost operational stage of the second hash engine using the clock signal,
wherein the clock module is further configured to provide the clock signal to the synchronous FIFO module.
4. The arithmetic core of claim 1, further comprising:
an asynchronous FIFO module disposed between a last operational stage of the first plurality of operational stages of the first hash engine and a foremost operational stage of the second plurality of operational stages of the second hash engine, the asynchronous FIFO module configured to receive a data signal output from the last operational stage of the first hash engine with a first clock signal and output the received data signal to the foremost operational stage of the second hash engine with a second clock signal different from the first clock signal,
wherein the clock module comprises a first clock module configured to provide the first clock signal to the first hash engine and the asynchronous FIFO module and a second clock module configured to provide the second clock signal to the second hash engine and the asynchronous FIFO module, and wherein the first clock signal and the second clock signal are at the same frequency.
5. The arithmetic core of claim 4, wherein the first clock signal is transferred in the same direction as the data signal, and the second clock signal is transferred in a direction opposite to that of the data signal.
6. The arithmetic core of claim 4, wherein the first clock signal is transferred in a direction opposite to that of the data signal, and the second clock signal is transferred in the same direction as the data signal.
7. The arithmetic core according to any of claims 1-6, wherein the arithmetic core is implemented on a semiconductor chip, and the first and second hash engines are arranged adjacent to each other along a surface of the semiconductor chip in a vertical direction perpendicular to a transfer direction of the data signal.
8. The arithmetic core of any of claims 1-3, further comprising:
a second asynchronous FIFO module disposed between adjacent first and second operational stages of the first plurality of operational stages, the first operational stage preceding the second operational stage, the second asynchronous FIFO module configured to receive a data signal output from the first operational stage with a third clock signal different from the clock signal and output the received data signal to the second operational stage with the clock signal; and
a third clock module configured to provide the third clock signal to the second asynchronous FIFO module and to the first operational stage and an operational stage of the first plurality of operational stages that precedes the first operational stage,
wherein the clock module is further configured to provide the clock signal to the second asynchronous FIFO module and to the second operational stage and operational stages of the first plurality of operational stages subsequent to the second operational stage.
9. The arithmetic core of any of claims 1-3, further comprising:
a third asynchronous FIFO module disposed between adjacent third and fourth operational stages of the second plurality of operational stages, the third operational stage preceding the fourth operational stage, the third asynchronous FIFO module configured to receive a data signal output from the third operational stage using the clock signal and output the received data signal to the fourth operational stage using a fourth clock signal different from the clock signal; and
a fourth clock module configured to provide the fourth clock signal to the third asynchronous FIFO module and to the fourth operational stage and an operational stage of the second plurality of operational stages subsequent to the fourth operational stage,
wherein the clock module is further configured to provide the clock signal to the third asynchronous FIFO module and to the third operational stage and an operational stage of the second plurality of operational stages preceding the third operational stage.
10. The arithmetic core of any of claims 4-6, wherein the asynchronous FIFO module is a first asynchronous FIFO module, and the arithmetic core further comprises:
a second asynchronous FIFO module disposed between adjacent first and second operational stages of the first plurality of operational stages, the first operational stage preceding the second operational stage, the second asynchronous FIFO module configured to receive a data signal output from the first operational stage with a third clock signal different from the first clock signal and output the received data signal to the second operational stage with the first clock signal; and
a third clock module configured to provide the third clock signal to the second asynchronous FIFO module and to the first operational stage and an operational stage of the first plurality of operational stages that precedes the first operational stage,
wherein the first clock module is configured to provide the first clock signal to the first and second asynchronous FIFO modules and to the second operational stage and operational stages of the first plurality of operational stages subsequent to the second operational stage.
11. The arithmetic core of any of claims 4-6, wherein the asynchronous FIFO module is a first asynchronous FIFO module, and the arithmetic core further comprises:
a third asynchronous FIFO module disposed between adjacent third and fourth operational stages of the second plurality of operational stages, the third operational stage preceding the fourth operational stage, the third asynchronous FIFO module configured to receive a data signal output from the third operational stage with the second clock signal and output the received data signal to the fourth operational stage with a fourth clock signal different from the second clock signal; and
a fourth clock module configured to provide the fourth clock signal to the third asynchronous FIFO module and to the fourth operational stage and an operational stage of the second plurality of operational stages subsequent to the fourth operational stage,
wherein the second clock module is configured to provide the second clock signal to the first and third asynchronous FIFO modules and to the third operational stage and an operational stage of the second plurality of operational stages that precedes the third operational stage.
12. A computing chip comprising one or more arithmetic cores as claimed in any one of claims 1 to 11.
13. A computing chip comprising a plurality of the arithmetic cores of any of claims 1-3, 8-9, the plurality of the arithmetic cores arranged in a plurality of columns, the clock module of each column of arithmetic cores receiving a clock signal via a common clock channel.
14. The computing chip of claim 13,
wherein the arithmetic core is the arithmetic core according to claim 3, the arithmetic core is implemented on a semiconductor chip, and the first hash engine and the second hash engine are arranged adjacent to each other along a surface of the semiconductor chip in a vertical direction perpendicular to a transfer direction of the data signal,
and wherein the plurality of columns includes a first column of arithmetic cores and a second column of arithmetic cores that are adjacent to each other and arranged in the recited order, and wherein the clock modules of the first column of arithmetic cores and the clock modules of the second column of arithmetic cores receive a clock signal via a common clock channel.
15. A computing chip comprising a plurality of the arithmetic cores of any of claims 4-6, 10-11, the plurality of the arithmetic cores being arranged in a plurality of columns, a first clock module of each column of arithmetic cores receiving a first clock signal via a common clock channel, and a second clock module of each column of arithmetic cores receiving a second clock signal via a common clock channel.
16. The computing chip of claim 15,
wherein the arithmetic core is the arithmetic core according to claim 5,
wherein the plurality of columns includes a first column of arithmetic cores and a second column of arithmetic cores that are adjacent to each other and arranged in the recited order,
and wherein one of the first and second clock modules of the first column of arithmetic cores and one of the first and second clock modules of the second column of arithmetic cores receive a clock signal via a common clock channel.
17. The computing chip as set forth in claim 16,
wherein the plurality of columns includes the first column of arithmetic cores, the second column of arithmetic cores, and a third column of arithmetic cores that are adjacent to one another and arranged in the recited order,
and wherein the other of the first and second clock modules of the second column of arithmetic cores and one of the first and second clock modules of the third column of arithmetic cores receive a clock signal via a common clock channel.
18. The computing chip of claim 15,
wherein the arithmetic core is implemented on a semiconductor chip, and the first hash engine and the second hash engine are arranged adjacent to each other along a surface of the semiconductor chip in a vertical direction perpendicular to a transfer direction of the data signal,
wherein the plurality of columns includes a first column of arithmetic cores and a second column of arithmetic cores that are adjacent to each other and arranged in the recited order,
and wherein:
one of the first clock module and the second clock module of the first column of arithmetic cores and one of the first clock module and the second clock module of the second column of arithmetic cores receive a clock signal via a common clock channel; and/or
the other of the first and second clock modules of the first column of arithmetic cores and the other of the first and second clock modules of the second column of arithmetic cores receive a clock signal via a common clock channel.
19. A cryptocurrency mining machine comprising one or more computing chips as claimed in any one of claims 12 to 18.
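The clocking arrangement recited in claims 1 and 2 can be illustrated with a small timing sketch. It lists the clock arrival time at each operational stage, indexed in data-flow order, for a hash engine whose clock travels against the data and for one whose clock travels with the data. The fixed per-hop clock delay, the four-stage engines, and the function names are assumptions for illustration; they do not describe the actual circuit.

```python
def clock_arrival_times(num_stages, hop_delay, same_as_data):
    """Clock arrival time at each stage, with stages indexed in data-flow order.

    Assumption: the clock takes `hop_delay` time units to move between
    neighboring stages and enters at one end of the engine.
    """
    if same_as_data:
        # Clock enters at the foremost stage and moves with the data.
        return [i * hop_delay for i in range(num_stages)]
    # Clock enters at the last stage and moves against the data.
    return [(num_stages - 1 - i) * hop_delay for i in range(num_stages)]


def adjacent_skews(arrivals):
    """Capture-minus-launch clock skew for each pair of adjacent stages."""
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


first_engine = clock_arrival_times(4, hop_delay=1, same_as_data=False)
second_engine = clock_arrival_times(4, hop_delay=1, same_as_data=True)
print(first_engine, adjacent_skews(first_engine))
print(second_engine, adjacent_skews(second_engine))
```

In this model, each capturing stage of the first hash engine is clocked one hop earlier than the stage that feeds it (negative skew), while each capturing stage of the second hash engine is clocked one hop later (positive skew). In general timing analysis, negative skew relaxes hold constraints and positive skew relaxes setup constraints.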
CN202011320665.2A 2020-11-23 2020-11-23 Operation core, calculation chip and encrypted currency mining machine Pending CN114528246A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202011320665.2A CN114528246A (en) 2020-11-23 2020-11-23 Operation core, calculation chip and encrypted currency mining machine
PCT/CN2021/104624 WO2022105252A1 (en) 2020-11-23 2021-07-06 Computing core, computing chip, and data processing device
TW110124791A TWI775514B (en) 2020-11-23 2021-07-06 Computing cores, computing chips and data processing equipment

Publications (1)

Publication Number Publication Date
CN114528246A true CN114528246A (en) 2022-05-24

Family

ID=79601102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011320665.2A Pending CN114528246A (en) 2020-11-23 2020-11-23 Operation core, calculation chip and encrypted currency mining machine

Country Status (3)

Country Link
CN (1) CN114528246A (en)
TW (1) TWI775514B (en)
WO (1) WO2022105252A1 (en)

Also Published As

Publication number Publication date
WO2022105252A1 (en) 2022-05-27
TW202138998A (en) 2021-10-16
TWI775514B (en) 2022-08-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination