WO2021233198A1 - Circuit and method for executing a hash algorithm - Google Patents
- Publication number
- WO2021233198A1 (international application PCT/CN2021/093612)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- registers
- stage
- extended
- operation stage
- data
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03K—PULSE TECHNIQUE
- H03K5/00—Manipulating of pulses not covered by one of the other main groups of this subclass
- H03K5/13—Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals
- H03K5/135—Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of time reference signals, e.g. clock signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30047—Prefetch instructions; cache control instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30101—Special purpose registers
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03K—PULSE TECHNIQUE
- H03K3/00—Circuits for generating electric pulses; Monostable, bistable or multistable circuits
- H03K3/02—Generators characterised by the type of circuit or by the means used for producing pulses
- H03K3/027—Generators characterised by the type of circuit or by the means used for producing pulses by the use of logic circuits, with internal or external positive feedback
- H03K3/037—Bistable circuits
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/06—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
- H04L9/0643—Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3236—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
- H04L9/3239—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/50—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/12—Details relating to cryptographic hardware or logic circuitry
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/12—Details relating to cryptographic hardware or logic circuitry
- H04L2209/122—Hardware reduction or efficient architectures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/12—Details relating to cryptographic hardware or logic circuitry
- H04L2209/125—Parallelization or pipelining, e.g. for accelerating processing of cryptographic operations
Definitions
- The present disclosure generally relates to circuits and methods for executing hash algorithms (also called hashing algorithms), and more specifically to circuits and methods for implementing data processing such as Bitcoin mining.
- Bitcoin is a peer-to-peer (P2P) virtual cryptocurrency. Its concept was first proposed by Satoshi Nakamoto on November 1, 2008, and it officially came into existence on January 3, 2009. Bitcoin is distinctive because it is not issued by any particular monetary institution. Instead, it is generated through a large amount of computation according to a specific algorithm. A distributed database made up of the many nodes across the whole P2P network confirms and records all Bitcoin transactions, and cryptographic design ensures their security.
- Bitcoin uses a proof-of-work (PoW) scheme based on the SHA-256 hash algorithm, and the integrity of its transactions depends on the collision resistance and preimage resistance of SHA-256.
- A hash algorithm takes variable-length data as input and produces a fixed-length hash value as output. In essence, it condenses information. Since 1993, the U.S. National Institute of Standards and Technology (NIST) has published several versions of the Secure Hash Algorithm (SHA).
- SHA-256 is one of the Secure Hash Algorithms and has a hash length of 256 bits.
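Python's standard `hashlib` module gives a quick illustration of the fixed-length property. This is not part of the disclosure. Inputs of any length yield a 256-bit digest. The helper name `sha256_hex` is our own.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as 64 hex characters (256 bits)."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different lengths all map to a 256-bit (32-byte) digest.
for message in (b"", b"abc", b"x" * 10_000):
    digest = hashlib.sha256(message).digest()
    print(f"{len(message):>6} input bytes -> {len(digest) * 8} digest bits: "
          f"{sha256_hex(message)[:16]}...")
```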
- According to one aspect of the present disclosure, a circuit for executing a hash algorithm is provided. It includes an input module for receiving data and an arithmetic module for calculating a hash value based on the received data.
- The arithmetic module includes a plurality of operation stages arranged in a pipeline structure: the 0th operation stage, the 1st operation stage, and so on up to the P-th operation stage.
- Each operation stage from the 1st to the P-th includes two kinds of registers. A plurality of cache registers store the intermediate values of the current operation stage and run at a first frequency. A plurality of extension registers store the extended data of the current operation stage. The extension registers comprise a first group running at the first frequency and a second group running at a second frequency. The second frequency is 1/N of the first frequency, where N is a fixed positive integer greater than 1 and not greater than the number of extension registers in the second group.
- According to another aspect, an apparatus for executing a data processing algorithm is provided, including the circuit for executing a hash algorithm described above.
- According to a further aspect, a method for executing a hash algorithm is provided, which uses the circuit described above.
- Figure 1 shows the operation process of the hash algorithm
- Figure 2 shows the overall process of SHA-256 processing data and outputting a data summary
- Figure 3 shows the operation process of SHA-256 round operation
- Figure 4 shows the mapping structure used to generate Wt
- Figure 5 shows a schematic diagram of a pipeline structure for round operations in a circuit for implementing SHA-256
- Fig. 6 exemplarily shows a circuit for executing a hash algorithm according to an embodiment of the present disclosure
- FIG. 7A exemplarily shows a schematic diagram of a partial structure of a circuit for executing SHA-256 according to an embodiment of the present disclosure
- FIG. 7B exemplarily shows the structure of the circuit for executing SHA-256 in FIG. 7A
- FIG. 8A exemplarily shows a schematic diagram of a partial structure of a circuit for executing SHA-256 according to an embodiment of the present disclosure
- FIG. 8B exemplarily shows the clock signals used by the circuit for executing SHA-256 in FIG. 8A.
- Bitcoin mining with data processing equipment (such as mining machines) centres on computing SHA-256: rewards are obtained according to the computing power devoted to calculating SHA-256.
- Chip size, chip operating speed, and chip power consumption are the three most important factors that determine the performance of a mining machine. Chip size determines the cost of the chip, and chip operating speed determines the operating speed of the mining machine.
- Chip power consumption determines how much energy the machine consumes, that is, the cost of mining. In practice, the most important performance indicator of a mining machine is its power consumption per unit of computing power, i.e. the ratio of power consumption to computing power.
- To improve security, the Bitcoin protocol requires SHA-256 to be performed twice. For Bitcoin mining machines, therefore, the most important goal is to implement the SHA-256 hash algorithm with a lower ratio of power consumption to computing power.
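The double application can be sketched with `hashlib`. `sha256d` is a hypothetical helper name, and the sketch ignores how Bitcoin serializes block headers.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used by Bitcoin: SHA-256 applied to a SHA-256 digest."""
    first = hashlib.sha256(data).digest()     # first pass: arbitrary-length input
    return hashlib.sha256(first).digest()     # second pass: always a 32-byte input

digest = sha256d(b"example payload")
print(len(digest), digest.hex())
```

The second pass always hashes exactly 32 bytes, so after padding it fits in a single 512-bit block.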
- A hash algorithm takes variable-length data as input and produces a fixed-length hash value as output.
- When a hash algorithm is applied to each item of a large input set, the resulting hash values are evenly distributed and appear random.
- The primary goal of a hash algorithm is to ensure data integrity. A change to any one or more bits of the input data will, with very high probability, change the resulting hash value.
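A short experiment with `hashlib` makes this property visible. Flipping a single input bit typically changes about half of the 256 output bits, though the exact count varies with the input. The helper names `differing_bits` and `flip_bit` are hypothetical.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with one bit inverted (bit 0 = MSB of the first byte)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(buf)

original = b"example block header"
modified = flip_bit(original, 5)
d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(modified).digest()
print(differing_bits(original, modified), "input bit differs")
print(differing_bits(d1, d2), "of 256 digest bits differ")
```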
- Figure 1 schematically shows the operation process of the hash algorithm.
- Input data of any length is padded so that the length of the padded data is an integer multiple of a fixed length (for example, 512 bits). The padded data can then be divided into multiple data blocks of that fixed length.
- The padding bits include the bit length of the original data.
- The hash algorithm then operates on each fixed-length data block, for example through multiple rounds of operations that include data expansion and/or compression. After all data blocks have been processed, the final fixed-length hash value is obtained.
- High-speed operation can be achieved with a circuit whose pipeline structure has multiple operation stages. Each operation stage uses registers to store the large amount of data that changes in real time during the operation.
- the register updates the data stored in it based on the clock signal. The higher the clock signal frequency, the higher the flip frequency of the register and the higher the power consumption.
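The standard first-order model of CMOS dynamic (switching) power makes the link between clock frequency and power concrete. This is general background, not something stated in the disclosure. In the formula, $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency:

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
```

Under this model, and with everything else held equal, clocking a group of registers at $f/N$ instead of $f$ cuts the clock-driven part of their dynamic power by roughly a factor of N.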
- the inventor of the present application believes that the structure and operation mode of the existing circuit for implementing the hash algorithm still need to be optimized, especially the arrangement and operation mode of a large number of registers in its pipeline structure.
- In existing circuits, the registers of every arithmetic stage flip at a uniform clock frequency to ensure that updated data can be stored in them.
- This includes data shifts between the registers of adjacent operation stages. For example, in the first clock cycle, data D stored in the registers of the first operation stage is shifted into the registers of the second operation stage. In the second clock cycle, data D is shifted from the registers of the second operation stage into those of the third.
- The inventor of this application observes the following. Suppose data D does not participate in any operation in the second operation stage and is used only after it has been shifted to the third operation stage. Then the flip of the second operation stage's registers is in fact redundant. Suppose instead that the registers of the first operation stage do not flip in the first clock cycle and still hold data D, and that data D is then shifted directly from the first operation stage to the third in the second clock cycle. This eliminates the redundant flip of the second operation stage's registers while still ensuring that data D participates correctly in the operation, which reduces the required power consumption.
- the inventor of the present application proposes an improved circuit and method for implementing a hash algorithm, so as to realize the above-mentioned optimization idea.
- To present the inventive concept of the present disclosure more clearly and intuitively, SHA-256 is briefly introduced below. It then serves as a representative hash algorithm for describing the circuit and method according to embodiments of the present disclosure. Those skilled in the art will understand that this circuit and method are not limited to implementing SHA-256. They apply to any hash algorithm, and even to any circuit and method that can adopt a pipeline structure with data shifts.
- The input of SHA-256 is data shorter than $2^{64}$ bits, and the output is a 256-bit data digest, i.e. a hash value.
- Input data is processed in units of 512-bit data blocks.
- Figure 2 shows the overall process by which SHA-256 processes data and outputs a data digest. The process consists of Steps 1 to 5, described in detail below.
- Step 1 Padding. The data is padded with a single 1 bit followed by as many 0 bits as needed to make its length congruent to 448 modulo 512.
- Step 2 Append length.
- A 64-bit unsigned integer is appended after the padded data. This integer gives the length L (in bits) of the original data before padding.
- Steps 1 and 2 together produce padded data whose length is an integer multiple of 512 bits.
- The length of the padded data can be written as $Q \times 512$ bits, where Q is a positive integer.
- The padded data is divided into Q data blocks $M_1, M_2, \ldots, M_Q$, each 512 bits long.
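Steps 1 and 2 can be sketched in Python for byte-aligned input. The big-endian encoding of the 64-bit length follows FIPS 180-4. The function names `sha256_pad` and `split_blocks` are hypothetical.

```python
import struct
from typing import List

def sha256_pad(message: bytes) -> bytes:
    """Steps 1 and 2 for byte-aligned input: append a 1 bit and 0 bits until the
    length is 448 mod 512 bits, then the original bit length L as a 64-bit integer."""
    bit_length = 8 * len(message)
    padded = message + b"\x80"                           # 0x80 = the single 1 bit, then seven 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)   # zero-fill to 56 bytes (448 bits) mod 64
    padded += struct.pack(">Q", bit_length)              # Step 2: 64-bit big-endian length L
    return padded

def split_blocks(padded: bytes) -> List[bytes]:
    """Divide the padded data into Q blocks M_1..M_Q of 512 bits (64 bytes) each."""
    if len(padded) % 64:
        raise ValueError("padded length must be a multiple of 512 bits")
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]

for size in (3, 55, 56, 80):
    print(size, "bytes ->", len(split_blocks(sha256_pad(b"x" * size))), "block(s)")
```

An 80-byte input, the size of a Bitcoin block header, pads to Q = 2 blocks.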
- Step 3 Initialize the hash buffer.
- The initial value $H_0$ of the hash algorithm, the intermediate values $H_1, H_2, \ldots, H_{Q-1}$, and the final result $H_Q$ are stored in turn in a 256-bit hash buffer. This buffer can consist of eight 32-bit registers A, B, C, D, E, F, G, and H.
- The hash buffer is first initialized to the initial value $H_0$. Registers A to H are set to the following integers (hexadecimal): A = 6a09e667, B = bb67ae85, C = 3c6ef372, D = a54ff53a, E = 510e527f, F = 9b05688c, G = 1f83d9ab, H = 5be0cd19.
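FIPS 180-4 specifies that the initial value $H_0$ consists of the first 32 bits of the fractional parts of the square roots of the first eight primes. The sketch below regenerates these values with exact integer arithmetic as a cross-check. The helper names are hypothetical.

```python
from math import isqrt
from typing import List

def first_primes(count: int) -> List[int]:
    """Return the first `count` primes by trial division (fine for small counts)."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def initial_hash_value() -> List[int]:
    """H_0 for registers A..H: first 32 bits of the fractional parts of the
    square roots of the first eight primes, computed with exact integer math."""
    # isqrt(p << 64) == floor(sqrt(p) * 2**32); the low 32 bits are the fraction bits.
    return [isqrt(p << 64) & 0xFFFFFFFF for p in first_primes(8)]

for register, value in zip("ABCDEFGH", initial_hash_value()):
    print(register, f"{value:08x}")
```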
- Step 4 Process data in units of 512-bit data blocks.
- The core of SHA-256 is to perform 64 rounds of the round operation, in sequence, on each of the 512-bit data blocks $M_1, M_2, \ldots, M_Q$.
- the round operations are marked as f in FIG. 2.
- Figure 3 shows the round calculation process of SHA-256.
- In each round, the data in registers A to H of the hash buffer serves as the input, and the round updates the contents of registers A to H.
- When processing of block $M_i$ begins, the buffer holds the intermediate hash value $H_{i-1}$, where $i$ is a positive integer with $i \le Q$.
- Each round $t$ of the processing of block $M_i$ ($t$ an integer with $0 \le t \le 63$) uses a 32-bit value $W_t$. This value is derived from the current 512-bit data block $M_i$ by the data expansion algorithm discussed below.
- Each round also uses an additional constant $K_t$, so that the calculation differs from round to round.
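FIPS 180-4 defines the 64 round constants $K_0, \ldots, K_{63}$ as the first 32 bits of the fractional parts of the cube roots of the first 64 primes. A minimal sketch that derives them with an integer cube root follows. The helper names are hypothetical.

```python
from typing import List

def first_primes(count: int) -> List[int]:
    """Return the first `count` primes by trial division."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n: int) -> int:
    """Integer cube root: the largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def round_constants() -> List[int]:
    """K_0..K_63; icbrt(p << 96) == floor(cbrt(p) * 2**32), low 32 bits = fraction."""
    return [icbrt(p << 96) & 0xFFFFFFFF for p in first_primes(64)]

K = round_constants()
print([f"{k:08x}" for k in K[:4]])
```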
- The output of round 63 is added to the input of round 0, $H_{i-1}$, to produce $H_i$. The 32-bit word in each of registers A to H of the hash buffer is added modulo $2^{32}$ to the corresponding 32-bit word of $H_{i-1}$.
- Step 5 Output. After all Q 512-bit data blocks have been processed, the output of the Q-th stage is the 256-bit data digest $H_Q$, which is the hash value.
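- $T_1 = H + \Sigma_1(E) + \mathrm{Ch}(E,F,G) + K_t + W_t$
- $T_2 = \Sigma_0(A) + \mathrm{Maj}(A,B,C)$
- The functions in $T_1$ and $T_2$ are defined as follows:
  - $\mathrm{Ch}(E,F,G) = (E \text{ AND } F) \text{ XOR } ((\text{NOT } E) \text{ AND } G)$
  - $\mathrm{Maj}(A,B,C) = (A \text{ AND } B) \text{ XOR } (A \text{ AND } C) \text{ XOR } (B \text{ AND } C)$
  - $\Sigma_0(A) = \mathrm{ROTR}^{2}(A) \text{ XOR } \mathrm{ROTR}^{13}(A) \text{ XOR } \mathrm{ROTR}^{22}(A)$
  - $\Sigma_1(E) = \mathrm{ROTR}^{6}(E) \text{ XOR } \mathrm{ROTR}^{11}(E) \text{ XOR } \mathrm{ROTR}^{25}(E)$
- The registers are then updated as $H \leftarrow G$, $G \leftarrow F$, $F \leftarrow E$, $E \leftarrow D + T_1$, $D \leftarrow C$, $C \leftarrow B$, $B \leftarrow A$, $A \leftarrow T_1 + T_2$.
- In these expressions:
  - $\mathrm{ROTR}^n(x)$ means that the 32-bit variable $x$ is cyclically shifted right by $n$ bits.
  - $W_t$ is a 32-bit word derived from the current 512-bit input data block.
  - $K_t$ is a 32-bit additional constant.
  - $+$ is addition modulo $2^{32}$.
  - AND is the 32-bit bitwise AND operation, NOT is the bitwise inversion operation, and XOR is the bitwise exclusive-OR operation.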
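One round can be written directly from these definitions. The function names are hypothetical, and the sketch follows FIPS 180-4 rather than any circuit detail of the disclosure. The usage line applies round 0 to the initial value $H_0$ for the message "abc". For that message, $K_0$ = 428a2f98 and $W_0$ = 61626380, i.e. the bytes "abc" followed by the padding byte 0x80.

```python
MASK32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    """ROTR^n(x): rotate the 32-bit word x right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def sha256_round(state, k_t: int, w_t: int):
    """One SHA-256 round on registers (A, ..., H); returns the updated registers."""
    a, b, c, d, e, f, g, h = state
    big_sigma1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)      # Σ1(E)
    ch = (e & f) ^ (~e & g)                                  # Ch(E,F,G); masking with g keeps 32 bits
    t1 = (h + big_sigma1 + ch + k_t + w_t) & MASK32          # T1, addition modulo 2^32
    big_sigma0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)      # Σ0(A)
    maj = (a & b) ^ (a & c) ^ (b & c)                        # Maj(A,B,C)
    t2 = (big_sigma0 + maj) & MASK32                         # T2
    # Only A and E receive newly computed values; the other registers shift by one.
    return ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)

H0 = (0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19)
state = sha256_round(H0, 0x428a2f98, 0x61626380)
print([f"{v:08x}" for v in state])
```

For this input, the sketch gives A = 5d6aebcd and E = fa2a4622 after round 0. The remaining registers are the old A, B, C and E, F, G shifted down by one.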
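- The following describes how $W_t$ is derived from the 512-bit data block $M_i$.
- Fig. 4 illustrates the mapping structure used to generate $W_t$. As shown in Figure 4, $W_t$ is obtained according to the following formula:
- For $0 \le t \le 15$, $W_t$ is taken directly from the data block $M_i$ as its $t$-th 32-bit word.
- For $16 \le t \le 63$, $W_t = \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16}$, where:
  - $\sigma_0(x) = \mathrm{ROTR}^{7}(x) \text{ XOR } \mathrm{ROTR}^{18}(x) \text{ XOR } \mathrm{SHR}^{3}(x)$
  - $\sigma_1(x) = \mathrm{ROTR}^{17}(x) \text{ XOR } \mathrm{ROTR}^{19}(x) \text{ XOR } \mathrm{SHR}^{10}(x)$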
- $\mathrm{ROTR}^n(x)$ means that the 32-bit variable $x$ is cyclically shifted right by $n$ bits;
- $\mathrm{SHR}^n(x)$ means that the 32-bit variable $x$ is shifted right by $n$ bits, with 0s filled in on the left;
- $+$ is addition modulo $2^{32}$.
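Combining padding, the message schedule, and the round function gives a compact, unoptimized software reference. It can be checked against `hashlib` and used to cross-check a hardware model. It is a sketch, not the circuit of the disclosure, and all names are our own.

```python
import hashlib
import struct
from typing import List

MASK32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK32

def first_primes(count: int) -> List[int]:
    primes: List[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n: int) -> int:
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

K = [icbrt(p << 96) & MASK32 for p in first_primes(64)]   # round constants K_0..K_63
H0 = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]

def message_schedule(block: bytes) -> List[int]:
    """Expand one 64-byte block M_i into W_0..W_63."""
    w = list(struct.unpack(">16I", block))          # W_0..W_15: taken directly from M_i
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)   # σ0(W_{t-15})
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)    # σ1(W_{t-2})
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK32)
    return w

def compress(h: List[int], block: bytes) -> List[int]:
    """64 rounds on one block, then add to H_{i-1} word by word modulo 2^32."""
    w = message_schedule(block)
    a, b, c, d, e, f, g, hh = h
    for t in range(64):
        t1 = (hh + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK32
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK32
        hh, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK32, c, b, a, (t1 + t2) & MASK32
    return [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e, f, g, hh))]

def sha256(message: bytes) -> bytes:
    """Steps 1 to 5 for byte-aligned input."""
    padded = (message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
              + struct.pack(">Q", 8 * len(message)))
    h = H0
    for i in range(0, len(padded), 64):
        h = compress(h, padded[i:i + 64])
    return struct.pack(">8I", *h)

print(sha256(b"abc").hex() == hashlib.sha256(b"abc").hexdigest())
```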
- each bit of the generated hash code is a function of all input bits.
- The many complex, repeated steps of the round operation f mix the result thoroughly. For two randomly chosen inputs, even ones with similar characteristics, a repeated hash code is therefore very unlikely.
- The description of SHA-256 above serves only to present the inventive concept of the present application more clearly and is not intended to be limiting.
- the SHA-256 mentioned in this article includes any known version of SHA-256 and its variants and modifications.
- a pipeline structure can be used to perform parallel operations on multiple sets of different data to improve computing efficiency.
- a 64-stage pipeline structure can be used to operate 64 groups of data in parallel.
- FIG. 5 shows a schematic diagram of a pipeline structure for round operations in a circuit for implementing SHA-256.
- The t-th, (t+1)-th, and (t+2)-th operation stages in the pipeline structure are separated by dashed lines.
- Each operation stage includes eight 32-bit registers A to H for storing intermediate values and sixteen 32-bit registers $R_0$ to $R_{15}$ for storing the extended data $W_t$ to $W_{t+15}$, respectively.
- Each operation stage of the pipeline includes 16 registers $R_0$ to $R_{15}$. They store the extended data of 16 consecutive rounds, $W_t$ to $W_{t+15}$, from which the next extended data $W_{t+16}$ can be calculated.
- In this pipeline structure, the registers of the data shift path $R_{13}$-$R_{12}$-$R_{11}$-$R_{10}$ do not participate in any logic operation. The same holds for a second data shift path, $R_8$-$R_7$-$R_6$-$R_5$-$R_4$-$R_3$-$R_2$.
- The inventor of the present application believes that some pipeline structures leave room for further optimization. These are pipeline structures that include registers used only for data shifting and not for logic operations (for example, registers $R_2$ to $R_8$ and $R_{10}$ to $R_{13}$), and especially those that include data shift paths.
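The shift-only registers can be derived mechanically from the message-schedule formula. The model makes two assumptions. First, register $R_k$ of stage $t$ holds $W_{t+k}$, as stated above. Second, the expansion logic computes $W_{t+16} = \sigma_1(W_{t+14}) + W_{t+9} + \sigma_0(W_{t+1}) + W_t$, so it reads $R_{14}$, $R_9$, $R_1$, $R_0$ and writes $R_{15}$ of the next stage. Register indices here stand in for hardware nets, and all names are hypothetical.

```python
from typing import List

TAP_REGISTERS = (14, 9, 1, 0)   # R_k read by the expansion logic (R_0 also feeds round t)
WRITTEN_BY_LOGIC = 15           # R_15 receives the newly computed W_{t+16}

def shift_only_registers(num_regs: int = 16) -> List[int]:
    """Registers that are neither read nor written by logic: they only pass data on."""
    return [k for k in range(num_regs)
            if k not in TAP_REGISTERS and k != WRITTEN_BY_LOGIC]

def shift_paths(regs: List[int]) -> List[List[int]]:
    """Group shift-only registers into maximal runs of consecutive indices,
    ordered the way data moves (R_k of stage t -> R_{k-1} of stage t+1)."""
    paths: List[List[int]] = []
    run: List[int] = []
    for k in sorted(regs, reverse=True):
        if run and run[-1] != k + 1:
            paths.append(run)
            run = []
        run.append(k)
    if run:
        paths.append(run)
    return paths

print(shift_paths(shift_only_registers()))
```

The two runs it finds match the data shift paths named in the text.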
- all registers are controlled by the same clock signal, so that in each clock cycle, all registers must be flipped to store new data.
- For registers used only for data shifting, this flip operation is actually unnecessary and wastes power.
- The inventor of the present application proposes a reduced clock frequency for the registers in each operation stage that are used only for data shifting and do not participate in logic operations (for example, extension registers $R_2$ to $R_8$ and $R_{10}$ to $R_{13}$).
- In the pipeline structure for hash operations, the critical path usually lies in the logic hardware that calculates the intermediate values. The logic hardware that calculates the extended data therefore has some timing slack. As a result, certain changes to the extended-data logic do not create a new critical path (that is, they do not lower the maximum operating frequency), which makes it easier to improve the pipeline structure.
- the circuit 100 includes: an input module 110 for receiving data; and an arithmetic module 120 for calculating a hash based on the received data.
- The arithmetic module 120 includes a plurality of operation stages arranged in a pipeline structure: the 0th operation stage, the 1st operation stage, and so on up to the P-th operation stage. P is a fixed positive integer greater than 1 and less than the number of operation stages in the pipeline structure. For clarity, only two operation stages are shown schematically in FIG. 6.
- Each operation stage from the 1st to the P-th may include two kinds of registers. A plurality of cache registers store the intermediate values of the current operation stage and run at the first frequency. A plurality of extension registers store the extended data of the current operation stage. These comprise a first group of extension registers running at the first frequency and a second group running at a second frequency.
- the second frequency is 1/N times the first frequency
- N is a fixed positive integer greater than 1 and not greater than the number of extension registers in the second group of extension registers.
- The second group of extension registers may be the registers in each operation stage that are used only for data shifting and do not participate in logic operations. The value of N may depend on the length of the data shift paths in the pipeline structure.
- When the circuit 100 is used to implement SHA-256, the plurality of cache registers may include registers A to H for storing intermediate values. The plurality of extension registers may include registers $R_0$ to $R_{15}$ for storing extended data. The data shift path can be $R_{13}$-$R_{12}$-$R_{11}$-$R_{10}$ or $R_8$-$R_7$-$R_6$-$R_5$-$R_4$-$R_3$-$R_2$.
- The 0th to P-th operation stages are (P+1) consecutive operation stages in the pipeline structure. The pipeline structure may also include other operation stages, for example one or more operation stages connected before the 0th operation stage and/or one or more connected after the P-th operation stage.
- other operation stages other than the 0th operation stage to the Pth operation stage in the pipeline structure may include structures similar to the 0th operation stage to the Pth operation stage.
- the pipeline structure may include a total of 64 operation stages, of which the first 12 operation stages adopt the structure of the 0th operation stage to the Pth operation stage as described above (in this case, the value of P is 11, and the value of N may be, for example, 3), and the 13th to 18th arithmetic stages also adopt the structure of the 0th arithmetic stage to the Pth arithmetic stage as described above (in this case, the value of P is 5, and the value of N may be 3, for example).
- the plurality of cache registers and the plurality of extension registers may include edge-triggered registers, such as rising-edge-triggered registers and/or falling-edge-triggered registers.
- the plurality of cache registers and the plurality of extension registers may include D flip-flops (DFF) and/or latches (Latch), and the latches may be, for example, latches using a pulse-type clock signal.
- the circuit 100 for executing the hash algorithm further includes a clock module 130 that can be used to provide a reference clock signal CLK.
- the reference clock signal CLK has a first frequency and a reference clock period corresponding to the first frequency.
- The cache registers and the first group of extension registers of each operation stage from the 1st to the P-th run on the reference clock signal.
- Each operation stage from the 1st to the P-th is configured to generate, in each reference clock cycle, intermediate values for storage in its own cache registers. It does so based on the extended data in at least one extension register of the first group of extension registers of the adjacent preceding operation stage.
- Each operation stage from the N-th to the P-th may be configured to generate, in each reference clock cycle, extended data for storage in its own first group of extension registers. It does so based on the extended data in at least one extension register of the N operation stages immediately preceding it.
- The $(i + j_1 N)$-th operation stage can be configured to generate, in the $(C_1 + i + kN)$-th reference clock cycle, extended data for storage in its own second group of extension registers. It does so based on the extended data in at least one extension register of the N operation stages immediately preceding it.
- Here N is as described above, i.e. a fixed positive integer greater than 1, and the second frequency is 1/N of the first frequency;
- $C_1$ is a fixed positive integer whose value depends on the number of clock cycles required for data initialization when the circuit 100 starts up. $i$ is 0 or any positive integer less than N, $j_1$ is any positive integer less than P/N, and $k$ is 0 or any positive integer.
- For example, take N = 2. Each operation stage below generates the extended data stored in its own second group of extension registers, based on the extended data in at least one extension register of the two preceding stages:
  - the 2nd operation stage, in reference clock cycles $C_1$, $C_1+2$, $C_1+4$, $C_1+6$, and so on, using the 0th and 1st operation stages;
  - the 3rd operation stage, in reference clock cycles $C_1+1$, $C_1+3$, $C_1+5$, $C_1+7$, and so on, using the 1st and 2nd operation stages;
  - the 4th operation stage, in reference clock cycles $C_1$, $C_1+2$, $C_1+4$, $C_1+6$, and so on, using the 2nd and 3rd operation stages;
  - and so on for later stages.
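The update rule for the second group of extension registers can be tabulated with a few lines of Python. `C1 = 4` is an arbitrary placeholder, because the actual value depends on the start-up initialization of circuit 100. The function name is hypothetical, and the constraint $j_1 < P/N$ is not enforced. With N = 2, the output shows the alternating pattern of the example for stages 2, 3 and 4.

```python
from typing import List

def second_group_update_cycles(stage: int, n: int, c1: int, count: int) -> List[int]:
    """Reference clock cycles in which stage (i + j1*N) writes its second group of
    extension registers, i.e. cycles C1 + i + k*N for k = 0, 1, 2, ...
    `count` (how many cycles to list) is only an illustration parameter."""
    if n < 2:
        raise ValueError("N must be an integer greater than 1")
    if stage < n:
        raise ValueError("the rule covers stages i + j1*N with j1 >= 1")
    i = stage % n                                   # stage = i + j1*N with 0 <= i < N
    return [c1 + i + k * n for k in range(count)]

C1 = 4  # hypothetical start-up latency in reference cycles
for stage in (2, 3, 4):
    print(stage, second_group_update_cycles(stage, n=2, c1=C1, count=4))
```

Each stage writes its slow registers once every N reference cycles, and any N consecutive stages together cover every cycle.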
- the 0th arithmetic stage may be configured to determine the extended data in a plurality of extension registers in the 0th arithmetic stage based on the data received by the input module 110.
- the clock module 130 can be used to generate a plurality of different clock signals to realize the control of each operation stage as described above.
- the clock module 130 may also be configured to generate the first clock signal CLK1 to the Nth clock signal CLKN having the second frequency.
- the rising edges of the first clock signal CLK1 to the N-th clock signal CLKN are aligned with rising edges of the reference clock signal. The rising edge of each clock signal from the second to the N-th is one reference clock period later than the rising edge of the preceding clock signal. For example, the rising edge of the second clock signal is one reference clock period later than that of the first clock signal, and the rising edge of the third clock signal is one reference clock period later than that of the second clock signal, and so on.
- the second set of extension registers in the $(p+qN)$-th operation stage operates based on the p-th clock signal. Here p is any positive integer not greater than N, and q is 0 or any positive integer such that $p+qN$ is not greater than P. In other words, the clock signals used by the second sets of extension registers of any two adjacent operation stages have the same frequency, and their rising edges differ by one reference clock period.
- for example, when N = 2, the second set of extension registers in the first operation stage operates based on the first clock signal CLK1
- the second set of extension registers in the second operation stage operates based on the second clock signal CLK2
- the second set of extension registers in the third operation stage operates based on the first clock signal CLK1
- the second set of extension registers in the fourth operation stage operates based on the second clock signal CLK2, and so on.
- the output terminal of one extension register in the first set of extension registers of each of the 1st to $(P-N)$-th operation stages may be coupled to the input terminal of one extension register in the second set of extension registers of each of the N adjacent subsequent operation stages.
- the input terminal of one extension register in the first set of extension registers of each of the $(N+1)$-th to P-th operation stages can be coupled, through an N-to-1 multiplexer, to the output terminals of one extension register in the second set of extension registers of each of the N adjacent preceding operation stages.
- the data throughput of a register running at the first frequency is N times that of a register running at the second frequency. Therefore, if the output of a register running at the first frequency must drive registers running at the second frequency, it can be connected to N registers running at the second frequency. Conversely, if the outputs of registers running at the second frequency must drive a register running at the first frequency, the outputs of N registers running at the second frequency can be connected to that register through an N-to-1 multiplexer. When the output of a register running at the second frequency must drive another register running at the second frequency, a one-to-one connection can be used because the frequencies match. That connection, however, must skip (N-1) operation stages.
- the extension registers of each of the 1st to P-th operation stages may further include a third set of extension registers running at a third frequency. The third frequency is 1/M times the first frequency, where M is a fixed positive integer that is greater than 1, less than the number of extension registers in the third set, and not equal to N.
- the $(r+j_2 M)$-th operation stage can be configured to, in the $(C_2+r+kM)$-th reference clock cycle, generate the extended data to be stored in the third set of extension registers of the current operation stage, based on the extended data in at least one extension register of the M operation stages immediately preceding the current operation stage.
- $C_2$ is a fixed positive integer whose value depends on the number of clock cycles required for data initialization when the circuit starts up; r is 0 or any positive integer less than M, $j_2$ is any positive integer less than P/M, and k is 0 or any positive integer.
- the clock module may be configured accordingly to generate M clock signals for controlling the third set of extension registers.
- the circuit 100 for executing a hash algorithm according to an embodiment of the present disclosure may be used to implement the SHA-256 algorithm, and the SHA-256 algorithm may be implemented in a variety of different configurations.
- the circuit and method for implementing a hash algorithm according to the embodiments of the present disclosure are applicable to any hash algorithm. They can even be applied to any circuit and method that can adopt a pipeline structure and involves data shifts, and they are not limited to implementing SHA-256.
- the extension registers of each operation stage may include sixteen 32-bit registers $R_0$ to $R_{15}$.
- the registers $R_0$ to $R_{15}$ store the extended data $W_t$ to $W_{t+15}$, respectively, so the operations they participate in are those shown in Expression 2.
- the second set of extension registers includes registers $R_2$ to $R_8$ and $R_{10}$ to $R_{13}$. The shorter data shift path in the second set, $R_{13}$-$R_{12}$-$R_{11}$-$R_{10}$, has a length of 4 (i.e., it consists of 4 serially shifting registers). The maximum value of N is therefore 4, so N can be 2, 3, or 4.
- the operating frequency of the second set of extension registers $R_2$ to $R_8$ and $R_{10}$ to $R_{13}$ may be 1/N of the operating frequency of the first set of extension registers $R_0$, $R_1$, $R_9$, $R_{14}$, and $R_{15}$. The power consumption of the second set of extension registers $R_2$ to $R_8$ and $R_{10}$ to $R_{13}$ can therefore be reduced by $(N-1)/N$ accordingly.
- different frequencies may be used to control the two sets of registers $R_2$ to $R_8$ and $R_{10}$ to $R_{13}$, respectively.
- the second set of extension registers may include registers $R_2$ to $R_8$
- the third set of extension registers may include registers $R_{10}$ to $R_{13}$.
- the second set of extension registers $R_2$ to $R_8$ is controlled at the second frequency
- the third set of extension registers $R_{10}$ to $R_{13}$ is controlled at the third frequency, and the two sets are controlled independently without affecting each other.
- the operating frequency of the second set of extension registers $R_2$ to $R_8$ can be 1/N of the operating frequency of the extension registers $R_0$, $R_1$, $R_9$, $R_{14}$, and $R_{15}$. The power consumption of the second set $R_2$ to $R_8$ can therefore be reduced by $(N-1)/N$ accordingly.
- the operating frequency of the third set of extension registers $R_{10}$ to $R_{13}$ can be 1/M of the operating frequency of the extension registers $R_0$, $R_1$, $R_9$, $R_{14}$, and $R_{15}$. The power consumption of the third set $R_{10}$ to $R_{13}$ can therefore be reduced by $(M-1)/M$ accordingly.
- the power-consumption-to-computing-power ratio of the circuit for implementing the hash algorithm according to the embodiment of the present disclosure is significantly improved.
- an extension register that participates in logic operations may also be placed under reduced-frequency control.
- the register $R_9$ is used both for data shifting and for the logic operation of each operation stage. However, the registers $R_8$ and $R_{10}$, which have a data-shift relationship with $R_9$, are both used only for data shifting. Therefore, $R_9$ can also be placed under reduced-frequency control.
- the two data shift paths $R_{13}$-$R_{12}$-$R_{11}$-$R_{10}$ and $R_8$-$R_7$-$R_6$-$R_5$-$R_4$-$R_3$-$R_2$ are joined (via $R_9$) into one extra-long data shift path from $R_{13}$ to $R_2$.
- additional modifications to the rest of the circuit 100 may include, for example, modifying the hardware that uses the output of register $R_9$. For example, suppose that before the modification the output of $R_9$ is hard-wired to the logic-operation hardware. After the modification, it may be necessary to connect the outputs of $R_9$ and of another register to that hardware through a 2-to-1 multiplexer.
- the first set of extension registers includes registers $R_0$, $R_1$, $R_{14}$, and $R_{15}$
- the second set of extension registers includes registers $R_2$ to $R_{13}$
- the maximum value of N can be 12; that is, N can be any integer from 2 to 12.
- the operating frequency of the second set of extension registers $R_2$ to $R_{13}$ can be 1/N of the operating frequency of the extension registers $R_0$, $R_1$, $R_{14}$, and $R_{15}$. The power consumption of the second set $R_2$ to $R_{13}$ can therefore be reduced by $(N-1)/N$ accordingly.
- FIG. 7A exemplarily shows a schematic diagram of a partial structure of a circuit 200 for executing SHA-256 according to an embodiment of the present disclosure
- FIG. 7B exemplarily shows the clock signals used by the circuit for executing SHA-256 in FIG. 7A
- the circuit 200 shown in FIG. 7A is a specific example in which the circuit 100 shown in FIG. 6 is used to execute SHA-256, so all the foregoing descriptions about the circuit 100 for executing a hash algorithm are applicable to this.
- FIG. 7A shows only part of the connections within part of the structure of the circuit 200 for performing SHA-256. For example, some registers in the figure are not connected to any arrow representing a data shift. This does not mean that those registers take no part in the operation; the connection is simply not shown in the figure.
- the extension registers of each operation stage may include sixteen 32-bit registers $R_0$ to $R_{15}$.
- the registers $R_0$ to $R_{15}$ store the extended data $W_t$ to $W_{t+15}$, respectively, so the operations they participate in are those shown in Expression 2.
- the arrows in FIG. 7A indicate the shift relationship of data between registers.
- the line type of the arrow is consistent with the line type of the clock signal it represents.
- the three different arrows correspond to the reference clock signal CLK, the first clock signal CLK1, and the second clock signal CLK2, respectively.
- the line type of each arrow indicates whether the clock signal used by the register it points to is CLK, CLK1, or CLK2.
- FIG. 7A also draws the registers in different styles to distinguish the clock signals they use. As shown in FIG. 7B, the legend places the register style for each of the clock signals CLK, CLK1, and CLK2 next to the corresponding clock signal.
- a register framed by a dotted line in FIG. 7A indicates that its clock signal can be chosen flexibly according to specific needs.
- the plurality of buffer registers and the plurality of extension registers may adopt rising edge triggered registers or falling edge triggered registers.
- FIG. 7B shows the clock signals required when rising-edge-triggered registers are used. Inverting these clock signals (a 180° phase shift) yields the clock signals required by falling-edge-triggered registers.
- the registers $R_9$ and $R_{15}$ in the first set of extension registers in each operation stage operate according to the reference clock signal CLK.
- the $(i+2j_1)$-th operation stage is configured to, in the $(C_1+i+2k)$-th reference clock cycle, generate the extended data to be stored in the second set of extension registers $R_{10}$ to $R_{13}$ of the current operation stage. It does so based on the extended data in at least one extension register of the two operation stages immediately preceding the current operation stage.
- $C_1$ is a fixed positive integer whose value depends on the number of clock cycles required for data initialization when the circuit 100 starts up; i is 0 or 1, $j_1$ is any positive integer less than P/2, and k is 0 or any positive integer.
- the second set of extension registers $R_{10}$ to $R_{13}$ in the $(p+2q)$-th operation stage operates based on the clock signal CLKp, where p is 1 or 2, and q is 0 or any positive integer such that $p+2q$ is not greater than P.
- the second set of extension registers $R_{10}$ to $R_{13}$ in the $(1+2q)$-th operation stages (such as the first, third, and fifth operation stages) operates based on the first clock signal CLK1. The second set of extension registers $R_{10}$ to $R_{13}$ in the $(2+2q)$-th operation stages (such as the second and fourth operation stages) operates based on the second clock signal CLK2.
- the rising edges of the first clock signal CLK1 and the second clock signal CLK2 are aligned with the rising edge of the reference clock signal CLK, and the rising edge of the second clock signal CLK2 is one reference clock cycle later than the rising edge of the first clock signal CLK1 .
- FIG. 8A exemplarily shows a schematic diagram of a partial structure of a circuit 300 for performing SHA-256 according to an embodiment of the present disclosure
- FIG. 8B exemplarily shows the clock signals used by the circuit for performing SHA-256 in FIG. 8A
- the circuit 300 shown in FIG. 8A is a specific example in which the circuit 100 shown in FIG. 6 is used to execute SHA-256, so all of the foregoing description of the circuit 100 for executing a hash algorithm applies to it.
- FIG. 8A shows only part of the connections within part of the structure of the circuit 300 for performing SHA-256. For example, some registers in the figure are not connected to any arrow representing a data shift. This does not mean that those registers take no part in the operation; the connection is simply not shown in the figure.
- the extension registers of each operation stage may include sixteen 32-bit registers $R_0$ to $R_{15}$.
- the registers $R_0$ to $R_{15}$ store the extended data $W_t$ to $W_{t+15}$, respectively, so the operations they participate in are those shown in Expression 2.
- the arrows in FIG. 8A indicate the shift relationship of data between registers.
- the line type of the arrow is consistent with the line type of the clock signal it represents.
- the four different arrow types correspond to the reference clock signal CLK, the first clock signal CLK1, the second clock signal CLK2, and the third clock signal CLK3, respectively.
- the line type of each arrow indicates whether the register it points to uses CLK, CLK1, CLK2, or CLK3.
- FIG. 8A also draws the registers in different styles to distinguish the clock signals they use. As shown in FIG. 8B, the legend places the register style for each of the clock signals CLK, CLK1, CLK2, and CLK3 next to the corresponding clock signal.
- a register framed by a dotted line in FIG. 8A indicates that its clock signal can be chosen flexibly according to specific needs.
- the plurality of buffer registers and the plurality of extension registers may adopt rising edge triggered registers or falling edge triggered registers.
- FIG. 8B shows the clock signals required when rising-edge-triggered registers are used. Inverting these clock signals (a 180° phase shift) yields the clock signals required by falling-edge-triggered registers.
- the registers $R_9$ and $R_{15}$ in the first set of extension registers in each operation stage operate according to the reference clock signal CLK.
- the $(i+3j_1)$-th operation stage is configured to, in the $(C_1+i+3k)$-th reference clock cycle, generate the extended data to be stored in the second set of extension registers $R_{10}$ to $R_{13}$ of the current operation stage. It does so based on the extended data in at least one extension register of the three operation stages immediately preceding the current operation stage.
- $C_1$ is a fixed positive integer whose value depends on the number of clock cycles required for data initialization when the circuit 100 starts up; i is 0, 1, or 2, $j_1$ is any positive integer less than P/3, and k is 0 or any positive integer.
- the second set of extension registers $R_{10}$ to $R_{13}$ in the $(p+3q)$-th operation stage operates based on the clock signal CLKp, where p is 1, 2, or 3, and q is 0 or any positive integer such that $p+3q$ is not greater than P.
- the second set of extension registers $R_{10}$ to $R_{13}$ in the $(1+3q)$-th operation stages (such as the first and fourth operation stages) operates based on the first clock signal CLK1
- the second set of extension registers $R_{10}$ to $R_{13}$ in the $(2+3q)$-th operation stages (such as the second and fifth operation stages) operates based on the second clock signal CLK2
- the second set of extension registers $R_{10}$ to $R_{13}$ in the $(3+3q)$-th operation stages (such as the third and sixth operation stages) operates based on the third clock signal CLK3.
- the rising edges of the first clock signal CLK1, the second clock signal CLK2, and the third clock signal CLK3 are aligned with rising edges of the reference clock signal CLK. The rising edge of CLK2 is one reference clock cycle later than that of CLK1, and the rising edge of CLK3 is one reference clock cycle later than that of CLK2.
- a device for executing a data processing algorithm includes the circuit for executing a hash algorithm described above, such as circuit 100, circuit 200, or circuit 300.
- the circuit for executing a hash algorithm proposed in the present disclosure is well suited to implementing the SHA-256 algorithm with a reduced power-consumption-to-computing-power ratio. It is therefore well suited to data processing devices (for example, Bitcoin mining machines) with a reduced power-consumption-to-computing-power ratio.
- the device for executing a data processing algorithm according to an embodiment of the present disclosure has a significant advantage in power-consumption-to-computing-power ratio.
- the method may include: receiving data using an input module; and calculating a hash value based on the received data using an operation module.
- the operation module may include a plurality of operation stages arranged in a pipeline structure, for example the 0th operation stage, the 1st operation stage, and so on up to the P-th operation stage.
- P is a fixed positive integer greater than 1 and less than the number of operation stages in the pipeline structure.
- each operation stage from the 1st to the P-th operation stage may include the following registers:
  - a plurality of buffer registers, which store the intermediate values of the current operation stage and run at the first frequency;
  - a plurality of extension registers, which store the extended data of the current operation stage.
- the extension registers may include a first set of extension registers running at the first frequency and a second set of extension registers running at a second frequency. The second frequency is 1/N times the first frequency, where N is a fixed positive integer greater than 1 and not greater than the number of extension registers in the second set.
- the buffer registers and the extension registers may include edge-triggered registers, such as rising-edge-triggered registers and/or falling-edge-triggered registers.
- the buffer registers and the extension registers may include D flip-flops and/or latches, and the latches may be, for example, latches driven by a pulse-type clock signal.
- the method for executing the hash algorithm may further include using a clock module to provide a reference clock signal.
- the reference clock signal has a first frequency and a reference clock period corresponding to the first frequency
- the plurality of buffer registers and the first set of extension registers of each operation stage from the first operation stage to the Pth operation stage may be based on the reference clock signal run.
- each operation stage from the 1st to the P-th operation stage can be configured to generate, in each reference clock cycle, the intermediate values to be stored in the buffer registers of the current operation stage. It does so based on the extended data in at least one extension register of the first set of extension registers in the adjacent previous operation stage.
- each of the N-th to P-th operation stages can be configured to generate, in each reference clock cycle, the extended data to be stored in the first set of extension registers of the current operation stage. It does so based on the extended data in at least one extension register of the N operation stages immediately preceding the current operation stage.
- the $(i+j_1 N)$-th operation stage can be configured to, in the $(C_1+i+kN)$-th reference clock cycle, generate the extended data to be stored in the second set of extension registers of the current operation stage, based on the extended data in at least one extension register of the N operation stages immediately preceding the current operation stage.
- $C_1$ is a fixed positive integer
- i is 0 or any positive integer less than N
- $j_1$ is any positive integer less than P/N
- k is 0 or any positive integer.
- the clock module may be further configured to generate the first to N-th clock signals having the second frequency. The rising edges of the first to N-th clock signals are aligned with rising edges of the reference clock signal, and the rising edge of each clock signal from the second to the N-th is one reference clock cycle later than that of the preceding clock signal.
- the second set of extension registers in the $(p+qN)$-th operation stage can run based on the p-th clock signal, where p is any positive integer not greater than N, and q is 0 or any positive integer such that $p+qN$ is not greater than P.
- the output terminal of one extension register in the first set of extension registers of each of the 1st to $(P-N)$-th operation stages may be coupled to the input terminal of one extension register in the second set of extension registers of each of the N adjacent subsequent operation stages.
- the input terminal of one extension register in the first set of extension registers of each of the $(N+1)$-th to P-th operation stages can be coupled, through an N-to-1 multiplexer, to the output terminals of one extension register in the second set of extension registers of each of the N adjacent preceding operation stages.
- the extension registers of each of the 1st to P-th operation stages may further include a third set of extension registers running at a third frequency.
- the third frequency is 1/M times the first frequency, where M is a fixed positive integer that is greater than 1, less than the number of extension registers in the third set, and not equal to N.
- the $(r+j_2 M)$-th operation stage can be configured to, in the $(C_2+r+kM)$-th reference clock cycle, generate the extended data to be stored in the third set of extension registers of the current operation stage, based on the extended data in at least one extension register of the M operation stages immediately preceding the current operation stage. Here $C_2$ is a fixed positive integer, r is 0 or any positive integer less than M, $j_2$ is any positive integer less than P/M, and k is 0 or any positive integer.
- the method may be used to execute SHA-256.
- a method for executing a data processing algorithm (for example, a Bitcoin mining algorithm).
- the word “exemplary” means “serving as an example, instance, or illustration” and not a “model” to be exactly reproduced. Any implementation described herein as exemplary is not necessarily to be construed as preferred or advantageous over other implementations. Moreover, the present disclosure is not bound by any expressed or implied theory presented in the preceding technical field, background, summary, or detailed description.
- the word “substantially” means to include any minor changes caused by design or manufacturing defects, device or component tolerances, environmental influences, and/or other factors.
- the word “substantially” also allows for differences between the perfect or ideal situation due to parasitic effects, noise, and other practical considerations that may be present in the actual implementation.
- the term “connected” means that one element/node/feature is directly connected to (or directly communicates with) another element/node/feature electrically, mechanically, logically, or in another manner.
- the term “coupled” means that one element/node/feature may be directly or indirectly connected to another element/node/feature mechanically, electrically, logically, or in another manner. This allows the two to interact even if they are not directly connected. In other words, “coupled” covers both direct and indirect connection of elements or other features, including connection through one or more intermediate elements.
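The indexing rules in the list above can be checked with a small behavioural model in Python. This is a sketch under stated assumptions, not RTL and not the patent's implementation. It restates three of the rules:

- which reference cycles reload the second set of extension registers of stage $i + j_1 N$;
- which divided clock CLKp drives stage $p + qN$, with CLK1 to CLKN staggered by one reference cycle;
- how N phase-staggered $f/N$ registers feeding an N-to-1 multiplexer still deliver one value per reference cycle.

All function names are hypothetical, and cycles are counted from 0. The model does not decide how $C_1$ aligns the update cycles with the clock phases; the text leaves that to start-up initialization.

```python
"""Behavioural sketch (not RTL) of the multi-rate clocking rules.

Function names are hypothetical. n stands for N (clock divider) and
c1 for C_1 (start-up offset); reference clock cycles are counted from 0.
"""


def second_group_update_cycles(stage, n, c1, num_cycles):
    """Cycles C_1 + i + k*N (k = 0, 1, ...) in which stage i + j_1*N
    reloads its second set of extension registers."""
    i = stage % n  # 0 <= i < N, so stage = i + j_1*N
    return list(range(c1 + i, num_cycles, n))


def clock_index(stage, n):
    """Index p of the divided clock CLKp used by stage p + q*N (1 <= p <= N)."""
    return (stage - 1) % n + 1


def divided_clock_edges(n, num_cycles):
    """Rising edges of CLK1..CLKN in reference cycles: each has period N,
    and for p >= 2 CLKp's edge is one reference cycle after CLK(p-1)'s."""
    return {p: list(range(p - 1, num_cycles, n)) for p in range(1, n + 1)}


def mux_fan_in(stream, n):
    """Carry one value per reference cycle through N phase-staggered f/N
    registers and reassemble the stream with an N-to-1 multiplexer.
    Returns the reassembled stream and the number of loads per register."""
    slow, loads, out = [None] * n, [0] * n, []
    for cycle in range(len(stream) + 1):
        if cycle > 0:
            # The mux selects the register loaded one cycle earlier; it is
            # still stable because each f/N register reloads every N cycles.
            out.append(slow[(cycle - 1) % n])
        if cycle < len(stream):
            slow[cycle % n] = stream[cycle]  # only one f/N register loads
            loads[cycle % n] += 1
    return out, loads


if __name__ == "__main__":
    print(second_group_update_cycles(3, n=2, c1=0, num_cycles=8))
    print([clock_index(s, 2) for s in range(1, 7)])
    print(divided_clock_edges(3, 9))
    print(mux_fan_in(list(range(9)), 3))
```

With N = 2, `clock_index` reproduces the alternating assignment of the example list (odd stages on CLK1, even stages on CLK2). `mux_fan_in` returns its input stream unchanged while each $f/N$ register loads only once every N cycles, which is the throughput argument given for the N-to-1 multiplexer.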
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Engineering & Computer Science (AREA)
- Nonlinear Science (AREA)
- Power Engineering (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
- Power Sources (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
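A circuit for executing a hash algorithm comprises an input module (110) for receiving data and an operation module (120) for calculating a hash value based on the received data. The operation module (120) comprises a plurality of operation stages arranged in a pipeline structure: a 0th operation stage, a 1st operation stage, and so on up to a P-th operation stage, P being a fixed positive integer greater than 1 and less than the number of operation stages in the pipeline structure. Each operation stage from the 1st to the P-th comprises a plurality of buffer registers, which store the intermediate values of the current operation stage and run at a first frequency, and a plurality of extension registers, which store the extended data of the current operation stage. The extension registers include a first set running at the first frequency and a second set running at a second frequency, the second frequency being 1/N times the first frequency.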
Description
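Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese application No. 202010432370.8 filed on May 20, 2020, the disclosure of which is hereby incorporated into this application in its entirety.
The present disclosure relates generally to circuits and methods for executing a hash algorithm, and more specifically to circuits and methods for implementing data processing (for example, Bitcoin mining).
Bitcoin is a peer-to-peer (P2P) virtual cryptocurrency. Its concept was first proposed by Satoshi Nakamoto on November 1, 2008, and it officially came into being on January 3, 2009. Bitcoin is unusual in that it is not issued by any particular monetary authority; instead, it is generated through a large amount of computation according to a specific algorithm. Bitcoin transactions are confirmed and recorded by a distributed database made up of the many nodes of the whole P2P network, and cryptographic design is used to ensure security.
From a cryptographic point of view, Bitcoin is a proof of work (PoW) based on the SHA-256 hash algorithm, and its transaction integrity depends on the collision resistance and preimage resistance of SHA-256. A hash algorithm takes variable-length data as input and produces a fixed-length hash value as output; in essence, it distills the information. Since 1993, the U.S. National Institute of Standards and Technology has designed and published several versions of the Secure Hash Algorithm (SHA). SHA-256 is one of them, with a hash length of 256 bits.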
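Summary
According to a first aspect of the present disclosure, there is provided a circuit for executing a hash algorithm, comprising an input module for receiving data and an operation module for calculating a hash value based on the received data. The operation module comprises a plurality of operation stages arranged in a pipeline structure: a 0th operation stage, a 1st operation stage, and so on up to a P-th operation stage, P being a fixed positive integer greater than 1 and less than the number of operation stages in the pipeline structure. Each operation stage from the 1st to the P-th comprises a plurality of buffer registers, which store the intermediate values of the current operation stage and run at a first frequency, and a plurality of extension registers, which store the extended data of the current operation stage. The extension registers include a first set running at the first frequency and a second set running at a second frequency. The second frequency is 1/N times the first frequency, and N is a fixed positive integer greater than 1 and not greater than the number of extension registers in the second set.
According to a second aspect of the present disclosure, there is provided a device for executing a data processing algorithm (for example, a Bitcoin mining algorithm), comprising the circuit for executing a hash algorithm described above.
According to a third aspect of the present disclosure, there is provided a method for executing an algorithm, the method using the circuit described above to execute the algorithm.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
The accompanying drawings, which constitute part of the specification, illustrate embodiments of the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which: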
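FIG. 1 shows the operation process of a hash algorithm;
FIG. 2 shows the overall process by which SHA-256 processes data and outputs a message digest;
FIG. 3 shows the operation process of the SHA-256 round function;
FIG. 4 shows the mapping structure used to generate $W_t$;
FIG. 5 is a schematic diagram of the pipeline structure that performs the round operations in a circuit implementing SHA-256;
FIG. 6 exemplarily shows a circuit for executing a hash algorithm according to an embodiment of the present disclosure;
FIG. 7A exemplarily shows a schematic diagram of part of the structure of a circuit for executing SHA-256 according to an embodiment of the present disclosure, and FIG. 7B exemplarily shows the clock signals used by the circuit for executing SHA-256 in FIG. 7A;
FIG. 8A exemplarily shows a schematic diagram of part of the structure of a circuit for executing SHA-256 according to an embodiment of the present disclosure, and FIG. 8B exemplarily shows the clock signals used by the circuit for executing SHA-256 in FIG. 8A.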
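Note that in the embodiments described below, the same reference numeral is sometimes used across different drawings to denote the same part or parts having the same function, and repeated description of them is omitted. In this specification, similar reference numerals and letters denote similar items; therefore, once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
For ease of understanding, the position, size, range, and so on of each structure shown in the drawings may not represent the actual position, size, range, and so on. Therefore, the disclosed invention is not limited to the position, size, range, and so on disclosed in the drawings. Furthermore, the drawings are not necessarily drawn to scale, and some features may be enlarged to show details of particular components.
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or uses. The circuits and methods for implementing a hash algorithm herein are shown by way of example to illustrate different embodiments of the circuits or methods of the present disclosure, and they are not intended to be limiting. Those skilled in the art will understand that they merely illustrate exemplary ways in which the present disclosure may be implemented, not an exhaustive set.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the granted specification.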
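In Bitcoin mining with a data processing device (for example, a mining machine), rewards are earned according to the device's computing power for calculating SHA-256. For a mining machine, three factors are crucial to performance: chip size, chip operating speed, and chip power consumption. Chip size determines chip cost. Chip operating speed determines the speed of the mining machine, i.e., its computing power. Chip power consumption determines how much electricity is consumed, i.e., the cost of mining. In practice, the most important performance metric of a mining machine is the power consumed per unit of computing power, i.e., the power-consumption-to-computing-power ratio.
To improve security, the Bitcoin protocol performs SHA-256 twice. What matters most for a Bitcoin mining machine is therefore implementing SHA-256 with a low power-consumption-to-computing-power ratio.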
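As a software reference point only, and not part of the disclosed circuit, the double application of SHA-256 can be sketched with Python's standard `hashlib` module. The helper name `double_sha256` is hypothetical. How Bitcoin serializes the data being hashed is outside this sketch.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice: SHA-256(SHA-256(data)). Helper name is hypothetical."""
    first = hashlib.sha256(data).digest()   # 32-byte digest of the input
    return hashlib.sha256(first).digest()   # hash that digest a second time


print(double_sha256(b"abc").hex())
```

Every candidate a miner evaluates therefore costs two full SHA-256 computations. This is why the energy spent per SHA-256 evaluation dominates the power-consumption-to-computing-power ratio.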
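There is therefore a need for circuits and methods for implementing a hash algorithm with a lower power-consumption-to-computing-power ratio. More specifically, there is a need for circuits and methods for implementing Bitcoin mining with a lower power-consumption-to-computing-power ratio.
As described above, a hash algorithm takes variable-length data as input and produces a fixed-length hash value as output. When a hash algorithm is applied to each item of a large input set, the resulting hash values are uniformly distributed and appear random. In short, the primary goal of a hash algorithm is to guarantee data integrity: changing any one or several bits of the input will, with very high probability, change the resulting hash value.
FIG. 1 schematically shows the operation process of a hash algorithm. First, input data of arbitrary length is padded so that its length becomes an integer multiple of some fixed length (for example, 512 bits). The padded data can then be divided into multiple data blocks of that fixed length, and the padding bits include information about the bit length of the original data. The hash algorithm then processes each fixed-length data block in turn, for example with multiple rounds of operations that include data expansion and/or compression. Once all data blocks have been processed, the final fixed-length hash value is obtained.
For a hash algorithm with multiple rounds of operations (for example, SHA-256), a circuit with a multi-stage pipeline structure can be used to achieve high-speed computation. Each operation stage can use registers to store the large amount of data that changes in real time during the computation. A register updates its stored data based on a clock signal. The higher the clock frequency, the more often the register toggles, and usually the higher its power consumption.
The inventors of the present application believe that the structure and operating mode of existing circuits for implementing hash algorithms can still be optimized, in particular the arrangement and operating mode of the many registers in the pipeline structure. In the pipeline structure, the registers of every operation stage toggle at a single, uniform clock frequency to ensure that updated data can be stored in them. These updates include data shifts between registers of adjacent operation stages. For example, in a first clock cycle, data D stored in a register of the 1st operation stage is shifted into a register of the 2nd operation stage; in a second clock cycle, data D is shifted from the 2nd operation stage into a register of the 3rd operation stage. The inventors realized that the toggling of the 2nd operation stage's register is redundant if data D takes part in no operation while in the 2nd operation stage and is used only after reaching the 3rd operation stage. Suppose instead that the register of the 1st operation stage did not toggle during the first clock cycle and still held data D, and that D were then shifted directly from the 1st to the 3rd operation stage in the second clock cycle. The redundant toggling of the 2nd operation stage's register would then be eliminated, reducing power consumption while data D still takes part correctly in the operation.
However, this idea cannot be realized in existing circuit structures controlled by a single uniform clock signal. The inventors therefore propose improved circuits and methods for implementing a hash algorithm that realize this optimization.
To present the inventive concept of the present disclosure clearly and intuitively, SHA-256 is briefly introduced below. It then serves as a representative hash algorithm for describing circuits and methods according to embodiments of the present disclosure. Those skilled in the art will understand that these circuits and methods are applicable to any hash algorithm. They can even be applied to any circuit and method that can adopt a pipeline structure and involves data shifts, and they are not limited to implementing SHA-256.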
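The input of SHA-256 is data of length less than $2^{64}$ bits, and the output is a 256-bit message digest, i.e., the hash value. The input data is processed in units of 512-bit data blocks. FIG. 2 shows the overall process by which SHA-256 processes data and outputs the message digest; this process comprises steps 1 to 5, described in detail below.
Step 1: Append padding bits. The data, whose original length is L bits, is padded so that its length is congruent to 448 modulo 512, i.e., $\text{length} \equiv 448 \pmod{512}$. Padding is performed even if the original data already satisfies this requirement, so between 1 and 512 padding bits are added. The padding consists of a single 1 followed by 0s.
Step 2: Append the length. A 64-bit unsigned integer representing the length L of the data before padding is appended after the padded data.
Steps 1 and 2 produce padded data whose length is an integer multiple of 512 bits, written $Q \times 512$ bits, where Q is a positive integer. As shown in FIG. 2, the padded data is divided into Q data blocks $M_1, M_2, \ldots, M_Q$, each 512 bits long.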
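Steps 1 and 2 can be sketched in Python for byte-aligned input. Byte alignment is an assumption of this sketch: the text allows any bit length L, but software inputs are usually whole bytes. The function names `sha256_pad` and `split_blocks` are illustrative and do not come from the source.

```python
def sha256_pad(message: bytes) -> bytes:
    """Steps 1 and 2 for byte-aligned input (hypothetical helper name):
    a single 1 bit, 0 bits up to 448 mod 512, then L as 64-bit big-endian."""
    bit_length = len(message) * 8          # L, in bits
    padded = message + b"\x80"             # the '1' bit plus seven '0' bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # reach 448 mod 512 bits
    padded += bit_length.to_bytes(8, "big")        # raises if L >= 2**64
    return padded


def split_blocks(padded: bytes) -> list[bytes]:
    """Divide the padded data into Q blocks M_1 .. M_Q of 512 bits (64 bytes)."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


blocks = split_blocks(sha256_pad(b"abc"))
print(len(blocks), blocks[0].hex())
```

A 3-byte input yields a single block (Q = 1). A 56-byte input already sits at 448 bits, so a whole extra block is produced. This is consistent with the rule that between 1 and 512 padding bits are always added.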
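Step 3: Initialize the hash buffer. The initial value $H_0$, the intermediate values $H_1, H_2, \ldots, H_{Q-1}$, and the final result $H_Q$ are held in turn in a 256-bit hash buffer. This buffer may consist of eight 32-bit registers A, B, C, D, E, F, G, and H. At the start of the computation, the hash buffer is initialized to the initial value $H_0$, i.e., registers A to H are set to the (hexadecimal) integers shown in the following table.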
| Register | Initial value | Register | Initial value |
|---|---|---|---|
| A | 0x6A09E667 | E | 0x510E527F |
| B | 0xBB67AE85 | F | 0x9B05688C |
| C | 0x3C6EF372 | G | 0x1F83D9AB |
| D | 0xA54FF53A | H | 0x5BE0CD19 |
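The table values are not arbitrary. NIST FIPS 180-4 defines them as the first 32 bits of the fractional parts of the square roots of the first eight primes. The short Python check below reproduces them exactly. It uses integer arithmetic (`math.isqrt`, Python 3.8+) to avoid floating-point rounding, and its helper names are hypothetical.

```python
from math import isqrt


def first_primes(count: int) -> list[int]:
    """The first `count` primes by trial division (hypothetical helper)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def sha256_initial_values() -> list[int]:
    """First 32 bits of the fractional parts of sqrt(2), sqrt(3), ..., sqrt(19).

    isqrt(p << 64) == floor(sqrt(p) * 2**32); masking with 2**32 - 1 drops the
    integer part and keeps the first 32 fractional bits, with no float rounding.
    """
    return [isqrt(p << 64) & 0xFFFFFFFF for p in first_primes(8)]


for name, value in zip("ABCDEFGH", sha256_initial_values()):
    print(f"{name}=0x{value:08X}")
```

The loop prints the eight register values of the table, from A=0x6A09E667 to H=0x5BE0CD19.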
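Step 4: Process the data in 512-bit blocks. The core of SHA-256 applies a 64-round round function to each of the 512-bit data blocks $M_1, M_2, \ldots, M_Q$ in turn; the round function is labeled f in FIG. 2.
FIG. 3 shows the operation process of the SHA-256 round function. Each of the 64 rounds takes the data in registers A to H of the hash buffer as input and updates those registers. In round 0 of the round function for data block $M_i$, the hash buffer holds the intermediate value $H_{i-1}$, where i is a positive integer and $i \le Q$. Each round t of the round function for $M_i$ (t an integer with $0 \le t \le 63$) uses a 32-bit value $W_t$. This value is derived from the current 512-bit data block $M_i$ by the data expansion algorithm discussed below. Each round also uses an additional constant $K_t$, which makes each round's operation different. The output of round 63 is added to the input of round 0, $H_{i-1}$, to produce $H_i$: the 32-bit word in each of registers A to H is added modulo $2^{32}$ to the corresponding 32-bit word of $H_{i-1}$.
Step 5: Output. After all Q 512-bit data blocks have been processed, the output of the Q-th stage is the 256-bit message digest $H_Q$, i.e., the hash value.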
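The internal logic of each of the 64 rounds of the SHA-256 round function is discussed in detail below. Round t (t an integer with $0 \le t \le 63$) is defined by the following expressions, where every right-hand side uses the register values from before the round:
$$
\begin{aligned}
T_1 &= H + \Sigma_1(E) + \mathrm{Ch}(E, F, G) + K_t + W_t\\
T_2 &= \Sigma_0(A) + \mathrm{Maj}(A, B, C)\\
H &= G\\
G &= F\\
F &= E\\
E &= D + T_1\\
D &= C\\
C &= B\\
B &= A\\
A &= T_1 + T_2
\end{aligned}
\tag{Expression 1}
$$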
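where:
$$
\begin{aligned}
\mathrm{Ch}(E, F, G) &= (E \wedge F) \oplus (\neg E \wedge G)\\
\mathrm{Maj}(A, B, C) &= (A \wedge B) \oplus (A \wedge C) \oplus (B \wedge C)\\
\Sigma_0(A) &= \mathrm{ROTR}^{2}(A) \oplus \mathrm{ROTR}^{13}(A) \oplus \mathrm{ROTR}^{22}(A)\\
\Sigma_1(E) &= \mathrm{ROTR}^{6}(E) \oplus \mathrm{ROTR}^{11}(E) \oplus \mathrm{ROTR}^{25}(E)
\end{aligned}
$$
Here $\mathrm{ROTR}^n(x)$ denotes a circular right rotation of the 32-bit variable x by n bits. $W_t$ is a 32-bit word derived from the current 512-bit input data block, and $K_t$ is a 32-bit additive constant. The operator + denotes addition modulo $2^{32}$, AND ($\wedge$) is the 32-bit bitwise AND, NOT ($\neg$) is bitwise complement, and $\oplus$ is exclusive OR (XOR).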
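Expression 1 and the functions defined above can be written as a bit-accurate Python sketch of a single round. The names `sha256_round`, `ch`, `maj`, `big_sigma0`, and `big_sigma1` are illustrative. The sketch models only the arithmetic of one round, not the pipelined registers of the circuit.

```python
MASK = 0xFFFFFFFF  # 32-bit words; every '+' below is reduced mod 2**32
# Illustrative sketch; all function names are hypothetical.


def rotr(x: int, n: int) -> int:
    """ROTR^n(x): rotate the 32-bit word x right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK


def ch(e: int, f: int, g: int) -> int:
    return (e & f) ^ (~e & MASK & g)    # bits of f where e=1, of g where e=0


def maj(a: int, b: int, c: int) -> int:
    return (a & b) ^ (a & c) ^ (b & c)  # bitwise majority vote


def big_sigma0(a: int) -> int:
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def big_sigma1(e: int) -> int:
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


def sha256_round(state, k_t: int, w_t: int):
    """One round of Expression 1; state is (A, B, C, D, E, F, G, H)."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k_t + w_t) & MASK
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    # All right-hand sides use the values from before the round.
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)
```

To obtain $H_i$ as in Step 4, apply `sha256_round` 64 times with $K_0, \ldots, K_{63}$ and $W_0, \ldots, W_{63}$, then add $H_{i-1}$ word by word modulo $2^{32}$. The 64 constants $K_t$ are not reproduced in this sketch.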
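Next, we describe how the 32-bit words $W_t$ are derived from the 512-bit data block $M_i$. FIG. 4 illustrates the mapping structure used to generate $W_t$. As shown in FIG. 4, $W_t$ is obtained as follows:
For $0 \le t \le 15$: $W_t$ is taken directly from data block $M_i$ (it is the t-th 32-bit word of the block);
For $16 \le t \le 63$:
$$
W_t = \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16}
\tag{Expression 2}
$$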
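where:
$$
\begin{aligned}
\sigma_0(x) &= \mathrm{ROTR}^{7}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{SHR}^{3}(x)\\
\sigma_1(x) &= \mathrm{ROTR}^{17}(x) \oplus \mathrm{ROTR}^{19}(x) \oplus \mathrm{SHR}^{10}(x)
\end{aligned}
$$
and $\mathrm{SHR}^n(x)$ denotes a logical right shift of the 32-bit variable x by n bits.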
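Expression 2, together with $\sigma_0$ and $\sigma_1$, can be sketched in Python as follows. The name `message_schedule` is hypothetical. The block is assumed to be 64 bytes holding big-endian 32-bit words, as SHA-256 specifies.

```python
MASK = 0xFFFFFFFF
# Illustrative sketch; function names are hypothetical.


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK


def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)    # x >> 3 is SHR^3


def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)  # x >> 10 is SHR^10


def message_schedule(block: bytes) -> list[int]:
    """W_0 .. W_63 for one 512-bit block M_i (Expression 2)."""
    if len(block) != 64:
        raise ValueError("a SHA-256 block is 64 bytes")
    # 0 <= t <= 15: W_t is the t-th big-endian 32-bit word of the block
    w = [int.from_bytes(block[4 * t:4 * t + 4], "big") for t in range(16)]
    for t in range(16, 64):
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK)
    return w


# The padded one-block message "abc": W_0 = 0x61626380, W_15 = 0x18 (L = 24)
abc_block = b"abc\x80" + bytes(52) + (24).to_bytes(8, "big")
w = message_schedule(abc_block)
print(hex(w[16]), hex(w[17]))  # 0x61626380 0xf0000
```

For this block, $W_1$ to $W_{14}$ are zero, so $W_{16}$ reduces to $W_0$ and $W_{17}$ reduces to $\sigma_1(W_{15})$. This makes the recurrence easy to check by hand.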
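SHA-256 has the following property: every bit of the generated hash code is a function of all input bits. The many complex, repeated operations of the round function f thoroughly scramble the result. It is therefore very unlikely that two randomly chosen pieces of data produce the same hash code, even if they have similar characteristics.
Those skilled in the art will understand that the above detailed introduction to SHA-256 is intended to present the inventive concept of the present application more clearly and is not intended to be limiting in any way. SHA-256 as referred to herein includes any publicly known version of SHA-256 and its variants and modifications.
Because a hash algorithm repeats many rounds, a pipeline structure can process several different sets of data in parallel to improve computational efficiency. In SHA-256, for example, each 512-bit data block undergoes 64 repeated rounds, so a 64-stage pipeline can process 64 sets of data in parallel.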
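FIG. 5 is a schematic diagram of the pipeline structure that performs the round computation in a circuit implementing SHA-256. In FIG. 5, dashed lines delimit the t-th, (t+1)-th and (t+2)-th operation stages of the pipeline. Each operation stage includes eight 32-bit registers A to H, which store the intermediate values. It also includes sixteen 32-bit registers R0 to R15, which store the extended data $W_t$ to $W_{t+15}$ respectively. Equation 2 shows that computing the extended data $W_{t+16}$ requires $W_{t+14}$, $W_{t+9}$, $W_{t+1}$ and $W_t$, i.e., data from rounds up to 16 apart. Each pipeline stage therefore includes 16 registers R0 to R15 to hold the extended data of 16 consecutive rounds, $W_t$ to $W_{t+15}$, so that the next extended word $W_{t+16}$ can be computed.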
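In the pipeline of FIG. 5, the schedule is not stored as a 64-entry array. Instead, each stage holds a 16-word window. The sketch below models one stage-to-stage transfer of R0 to R15, assuming register $R_k$ of a stage holds $W_{t+k}$. Only $W_{t+16}$ involves logic; every other register of the next stage just receives the previous stage's $R_{k+1}$. The function names are illustrative.

```python
MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def next_stage_registers(r):
    """R0..R15 of stage t (holding W_t..W_{t+15}) -> R0..R15 of stage t+1."""
    if len(r) != 16:
        raise ValueError("expected 16 extension registers R0..R15")
    # Only R0, R1, R9 and R14 feed logic:
    # W_{t+16} = sigma1(W_{t+14}) + W_{t+9} + sigma0(W_{t+1}) + W_t
    w_next = (sigma1(r[14]) + r[9] + sigma0(r[1]) + r[0]) & MASK
    # Next stage: R_k <- R_{k+1} for k = 0..14 (hard-wired shift), R15 <- W_{t+16}
    return r[1:] + [w_next]


def schedule_through_pipeline(first_16_words):
    """Push W_0..W_15 through 48 stage transfers and collect W_0..W_63."""
    r = list(first_16_words)
    words = list(r)
    for _ in range(48):
        r = next_stage_registers(r)
        words.append(r[15])
    return words
```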
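The round computation of a hash algorithm involves a large number of data shift operations. Take SHA-256 as an example and compare FIG. 5 with Equation 2. In each operation stage, the data in registers R0, R1, R9 and R14 are used to compute the data to be stored in register R15 of the next stage. The data in the remaining registers, R2 to R8 and R10 to R13, undergo no logic operation; they are simply shifted, via hard wiring, into the corresponding registers of the next stage. Moreover, the data in register R13 is shifted successively into registers R12, R11 and R10 without taking part in any logic operation other than the shift. The pipeline therefore contains a data shift path R13-R12-R11-R10 that performs nothing but data shifting. A second such data shift path is R8-R7-R6-R5-R4-R3-R2.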
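Which registers can be down-clocked follows mechanically from which registers touch logic. The sketch below assumes the register roles just described: R0, R1, R9 and R14 are read by the logic, and R15 is written by it. Under that assumption, it finds the maximal chains of shift-only registers. The function name and its parameters are illustrative.

```python
def pure_shift_paths(num_regs=16, logic_inputs=(0, 1, 9, 14), logic_output=15):
    """Maximal chains of registers that only shift data (R_k -> R_{k-1} of the next stage)."""
    shift_only = [k for k in range(num_regs)
                  if k not in logic_inputs and k != logic_output]
    paths, current = [], []
    for k in sorted(shift_only, reverse=True):   # data moves from high to low index
        if current and current[-1] != k + 1:     # gap: a logic register breaks the chain
            paths.append(current)
            current = []
        current.append(k)
    if current:
        paths.append(current)
    return paths


for path in pure_shift_paths():
    print("-".join(f"R{k}" for k in path), "length", len(path))
# R13-R12-R11-R10 length 4
# R8-R7-R6-R5-R4-R3-R2 length 7
```

R9 can also be modeled as shift-only (the variant discussed later, in which a 2-to-1 multiplexer serves R9's use in logic). In that case, `pure_shift_paths(logic_inputs=(0, 1, 14))` returns the single chain from R13 to R2, of length 12.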
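The inventors of this application believe that pipeline structures containing registers used only for data shifting and not for logic operations (e.g., registers R2 to R8 and R10 to R13) leave room for further optimization. This is especially true of pipeline structures that contain data shift paths. In existing pipeline structures for hash computation, all registers are controlled by the same clock signal, so every register toggles in every clock cycle to store new data. For registers used only for data shifting, however, this toggling is not actually necessary and wastes power.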
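On this basis, the inventors conceived of clocking, at a reduced frequency, those registers of an operation stage that are used only for data shifting and not for logic operations (e.g., extension registers R2 to R8 and R10 to R13). This reduces redundant register toggling and thereby reduces power consumption. The logic that generates the intermediate values (see, e.g., Equation 1) is more complex than the logic that generates the extended data (see, e.g., Equation 2). As a result, the critical path of a hash pipeline usually lies in the hardware that computes the intermediate values, and the hardware that computes the extended data has some timing slack. Even a certain amount of modification to that hardware therefore creates no new critical path (i.e., it does not lower the maximum operating frequency), which makes it convenient to improve the pipeline structure.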
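FIG. 6 shows a circuit 100 for performing a hash algorithm according to an embodiment of the present disclosure. The circuit 100 includes an input module 110 for receiving data and an operation module 120 for computing a hash value based on the received data. The operation module 120 includes a plurality of operation stages arranged in a pipeline structure: a 0th operation stage, a 1st operation stage, and so on up to a P-th operation stage. P is a fixed positive integer greater than 1 and less than the number of operation stages in the pipeline structure. For clarity, FIG. 6 shows only two operation stages schematically.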
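Each of the 1st to P-th operation stages may include:
- a plurality of buffer registers, which store the intermediate values of the current stage and run at a first frequency; and
- a plurality of extension registers, which store the extended data of the current stage and include a first group of extension registers running at the first frequency and a second group of extension registers running at a second frequency.
The second frequency is 1/N of the first frequency, where N is a fixed positive integer greater than 1 and not greater than the number of extension registers in the second group. In embodiments of the present disclosure, the second group of extension registers may be those registers of each stage that are used only for data shifting and do not take part in logic operations. The value of N may depend on the length of the data shift path in the pipeline structure. In some embodiments, the circuit 100 implements SHA-256. In that case, the buffer registers may include registers A to H, which store the intermediate values, and the extension registers may include registers R0 to R15, which store the extended data. The data shift path may be R13-R12-R11-R10 or R8-R7-R6-R5-R4-R3-R2.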
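In embodiments of the present disclosure, the 0th to P-th operation stages are (P+1) consecutive stages of the pipeline structure. The pipeline structure may also include stages other than the 0th to P-th stages, e.g., one or more stages connected before the 0th stage and/or one or more stages connected after the P-th stage. In some embodiments, stages other than the 0th to P-th stages may have a structure similar to that of the 0th to P-th stages. For example, the pipeline structure may include 64 stages in total. The first 12 stages may use the structure of the 0th to P-th stages described above (with P = 11 and, e.g., N = 3). The 13th to 18th stages may also use that structure (with P = 5 and, e.g., N = 3).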
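In embodiments of the present disclosure, the buffer registers and the extension registers may include edge-triggered registers, such as rising-edge-triggered and/or falling-edge-triggered registers. They may include D flip-flops (DFFs) and/or latches. A latch may, for example, be a latch driven by a pulse-type clock signal.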
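Continuing with FIG. 6, the circuit 100 further includes a clock module 130, which may provide a reference clock signal CLK. The reference clock signal CLK has the first frequency and a reference clock period corresponding to the first frequency. The buffer registers and the first group of extension registers in each of the 1st to P-th stages run on the reference clock signal. Each of the 1st to P-th stages of the operation module 120 is configured as follows: in every reference clock cycle, it generates the intermediate values to be stored in its buffer registers, based on the extended data in at least one extension register of the first group in the immediately preceding stage.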
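In some embodiments, each of the N-th to P-th stages of the operation module 120 may be configured as follows: in every reference clock cycle, it generates the extended data to be stored in its first group of extension registers, based on the extended data in at least one extension register of the N stages immediately preceding it. The $(i + j_1 N)$-th stage may be configured as follows: in the $(C_1 + i + kN)$-th reference clock cycles, it generates the extended data to be stored in its second group of extension registers, based on the extended data in at least one extension register of the N stages immediately preceding it. The symbols are defined as follows:
- N is as defined above: a fixed positive integer greater than 1, with the second frequency equal to 1/N of the first frequency.
- $C_1$ is a fixed positive integer whose value depends on the number of clock cycles needed for data initialization in the initial phase after the circuit 100 starts.
- i is 0 or any positive integer less than N.
- $j_1$ is any positive integer less than P/N.
- k is 0 or any positive integer.
For example, if N = 2, then i is 0 or 1, and the stages behave as follows. The 2nd stage, in the $C_1$-th, $(C_1+2)$-th, $(C_1+4)$-th, $(C_1+6)$-th, ... reference clock cycles, generates the extended data for its second group of extension registers, based on the extended data in at least one extension register of the 0th and 1st stages. The 3rd stage does the same in the $(C_1+1)$-th, $(C_1+3)$-th, $(C_1+5)$-th, $(C_1+7)$-th, ... cycles, based on the 1st and 2nd stages. The 4th stage does the same in the $C_1$-th, $(C_1+2)$-th, $(C_1+4)$-th, $(C_1+6)$-th, ... cycles, based on the 2nd and 3rd stages. The pattern continues in this way for the remaining stages.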
In some embodiments, the 0th stage may be configured to determine the extended data in its extension registers based on the data received by the input module 110.
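In some embodiments, the per-stage control described above may be implemented by having the clock module 130 generate several different clock signals. Specifically, in addition to the reference clock signal CLK, the clock module 130 may be configured to generate a 1st clock signal CLK1 through an N-th clock signal CLKN, all having the second frequency. The rising edges of CLK1 to CLKN are aligned with rising edges of the reference clock signal. The rising edge of each of the 2nd to N-th clock signals is one reference clock period later than that of the preceding clock signal. For example, the rising edge of the 2nd clock signal is one reference clock period later than that of the 1st, the rising edge of the 3rd is one reference clock period later than that of the 2nd, and so on.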
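The phase relationship of CLK1 to CLKN can be visualized by listing which clocks rise in each reference cycle. This sketch models only rising edges, since the text does not specify a duty cycle. It assumes cycle 0 is the cycle in which CLK1 rises. The names are hypothetical.

```python
def rising_edges(p: int, n: int, num_cycles: int) -> list:
    """Reference-clock cycles in which CLKp (1 <= p <= N) rises."""
    if not 1 <= p <= n:
        raise ValueError("p must be between 1 and N")
    # CLKp rises every N reference cycles, (p - 1) reference periods after CLK1.
    return [c for c in range(num_cycles) if c % n == p - 1]


def edge_chart(n: int, num_cycles: int) -> str:
    """Text chart: '^' marks a rising edge, '.' marks no edge in that reference cycle."""
    rows = ["CLK  " + "^" * num_cycles]
    for p in range(1, n + 1):
        edges = set(rising_edges(p, n, num_cycles))
        rows.append(f"CLK{p} " + "".join("^" if c in edges else "." for c in range(num_cycles)))
    return "\n".join(rows)


print(edge_chart(3, 9))
# CLK  ^^^^^^^^^
# CLK1 ^..^..^..
# CLK2 .^..^..^.
# CLK3 ..^..^..^
```

In every reference cycle, exactly one of CLK1 to CLKN rises, so together the N slow clocks cover the full reference rate.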
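Note that, as used herein, a reference to a given clock signal does not denote one particular pulse signal physically present in the circuit. Rather, it may denote one or more pulse signals physically present in the circuit that have the specified frequency and phase. Take FIG. 6 as an example. The buffer registers of every stage are described as operating on the reference clock signal CLK. However, the CLK used by the 0th stage and the CLK used by the $(p + qN)$-th stage may be two separate pulse signals generated by the clock tree in the clock module 130, each having the frequency and phase required of CLK.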
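Accordingly, the second group of extension registers in the $(p + qN)$-th stage runs on the p-th clock signal. Here p is any positive integer not greater than N, and q is 0 or any positive integer such that $p + qN \le P$. In other words, the clock signals used by the second groups of any two adjacent stages have the same frequency, and their rising edges are one reference clock period apart. For example, with N = 2, the second group of extension registers runs on CLK1 in the 1st stage, on CLK2 in the 2nd stage, on CLK1 in the 3rd stage, on CLK2 in the 4th stage, and so on.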
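The stage-to-clock rule reduces to a modulo operation. The following is a minimal sketch in which `p_max` stands for P; the function name is hypothetical.

```python
def second_group_clocks(p_max: int, n: int) -> dict:
    """Map stage s (1..P) to the clock of its second group: stage p + q*N uses CLKp."""
    if n < 2 or p_max < 1:
        raise ValueError("need N > 1 and P >= 1")
    return {s: f"CLK{(s - 1) % n + 1}" for s in range(1, p_max + 1)}


print(second_group_clocks(5, 2))
# {1: 'CLK1', 2: 'CLK2', 3: 'CLK1', 4: 'CLK2', 5: 'CLK1'}
```

With N = 3, the sketch reproduces the assignment later described for FIG. 8A: stages 1 and 4 run on CLK1, stages 2 and 5 on CLK2, and stage 3 on CLK3.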
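In some embodiments, the stages are wired as follows. In each of the 1st to (P-N)-th stages, the output of one extension register of the first group may be coupled to the input of one extension register of the second group in each of the N immediately following stages. In each of the (N+1)-th to P-th stages, the input of one extension register of the first group may be coupled, through an N-to-1 multiplexer, to the output of one extension register of the second group in each of the N immediately preceding stages. The reason is that the first frequency is N times the second frequency, so a register running at the first frequency has N times the data throughput of a register running at the second frequency. Three cases follow:
- If the output of a register running at the first frequency must feed registers running at the second frequency, it can be connected to N such registers.
- Conversely, if registers running at the second frequency must feed a register running at the first frequency, the outputs of N such registers can be connected to one register running at the first frequency through an N-to-1 multiplexer.
- If the output of a register running at the second frequency must feed another register running at the second frequency, the two can be connected one-to-one, since their frequencies match, but (N-1) stages must be skipped.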
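The throughput argument can be checked with a small behavioral model. The model reflects one reading of the connection scheme, not the patent's netlist. A value produced every reference cycle is fanned out to N slow registers. Each register loads only once every N cycles, with staggered phases. An N-to-1 multiplexer later picks the register that holds the value needed. All names are hypothetical.

```python
def interleave(stream, n):
    """Fan a full-rate stream out to N slow registers; read it back through an N:1 mux.

    Slow register j loads in cycles c with c % n == j, i.e. at 1/N of the
    reference rate.  Returns (reads, loads): reads[c][d] is the mux output in
    cycle c when it selects register (c - d) % n; loads counts register updates.
    """
    regs = [None] * n
    reads, loads = [], 0
    for c, value in enumerate(stream):
        regs[c % n] = value          # exactly one slow register is written per cycle
        loads += 1
        reads.append([regs[(c - d) % n] for d in range(n)])
    return reads, loads


stream = list(range(100, 120))
reads, loads = interleave(stream, 3)
# For every delay d < N, the mux recovers the value produced d cycles earlier.
assert all(reads[c][d] == stream[c - d] for c in range(2, len(stream)) for d in range(3))
print(loads, "loads vs", 3 * len(stream), "for three full-rate registers")
# 20 loads vs 60 for three full-rate registers
```

Data in a shift-only chain of length L is only ever needed with delays of up to L - 1 cycles. The model's requirement d < N is therefore consistent with the bound N ≤ L used later in the text. The N slow registers perform one load per cycle in total instead of N, i.e., (N-1)/N fewer loads.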
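In some embodiments, the extension registers of each of the 1st to P-th stages may further include a third group of extension registers running at a third frequency. The third frequency is 1/M of the first frequency, where M is a fixed positive integer that is greater than 1, not greater than the number of extension registers in the third group, and not equal to N. The $(r + j_2 M)$-th stage may be configured as follows: in the $(C_2 + r + kM)$-th reference clock cycles, it generates the extended data to be stored in its third group of extension registers, based on the extended data in at least one extension register of the M stages immediately preceding it. The symbols are defined as follows:
- $C_2$ is a fixed positive integer whose value depends on the number of clock cycles needed for data initialization in the initial phase after the circuit starts.
- r is 0 or any positive integer less than M.
- $j_2$ is any positive integer less than P/M.
- k is 0 or any positive integer.
In some embodiments, the clock module may correspondingly be configured to generate M clock signals for controlling the third group of extension registers.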
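The circuit 100 for performing a hash algorithm according to embodiments of the present disclosure can be used to implement the SHA-256 algorithm, in a variety of configurations. Those skilled in the art will understand that the circuits and methods for implementing a hash algorithm according to embodiments of the present disclosure are applicable to any hash algorithm, not only SHA-256. They can further be applied to any circuit or method that can use a pipeline structure and involves data shifting.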
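In some embodiments in which the circuit 100 implements the SHA-256 algorithm, the extension registers of each stage may include sixteen 32-bit registers R0 to R15. Registers R0 to R15 store the extended data $W_t$ to $W_{t+15}$ respectively, so the operations they take part in are those of Equation 2. The SHA-256 round computation contains two data shift paths: R13-R12-R11-R10 and R8-R7-R6-R5-R4-R3-R2. The registers in these paths may serve as the second or third group of extension registers of a stage. The remaining registers, R0, R1, R9, R14 and R15, may serve as the first group of extension registers.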
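In some preferred embodiments in which the circuit 100 implements the SHA-256 algorithm, the second group of extension registers includes registers R2 to R8 and R10 to R13. The shorter data shift path in the second group, R13-R12-R11-R10, has length 4 (i.e., it contains 4 registers shifting in series). The maximum value of N is therefore 4, so N may be 2, 3 or 4. Accordingly, the second group (R2 to R8 and R10 to R13) may run at 1/N of the frequency of the first group (R0, R1, R9, R14 and R15). The power consumption of the second group can correspondingly be reduced by (N-1)/N, e.g., by 3/4 for N = 4.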
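The (N-1)/N figure is simple arithmetic on register update rates. The sketch below counts only register loads and assumes that each load costs the same energy. It ignores buffer registers A to H, the logic, and clock-tree overhead. It therefore illustrates the stated ratio rather than a measured power figure. The function name is illustrative.

```python
from fractions import Fraction


def relative_load_rate(groups):
    """Register loads per reference cycle relative to clocking every register at CLK.

    groups: (register_count, divider) pairs; divider 1 is the first frequency,
    divider N is 1/N of it.  Assumes one load per rising edge of a register's clock.
    """
    total = sum(count for count, _ in groups)
    loads = sum(Fraction(count, divider) for count, divider in groups)
    return loads / total


for n in (2, 3, 4):
    second_only = relative_load_rate([(11, n)])               # R2..R8, R10..R13
    all_16 = relative_load_rate([(5, 1), (11, n)])            # plus R0, R1, R9, R14, R15
    print(n, "second group saves", 1 - second_only, "| R0..R15 load rate", all_16)
# 2 second group saves 1/2 | R0..R15 load rate 21/32
# 3 second group saves 2/3 | R0..R15 load rate 13/24
# 4 second group saves 3/4 | R0..R15 load rate 31/64
```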
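In other preferred embodiments in which the circuit 100 implements the SHA-256 algorithm, the two register sets R2 to R8 and R10 to R13 may be controlled by different frequencies. For example, the second group of extension registers may include registers R2 to R8, and the third group may include registers R10 to R13. The second group is then controlled by the second frequency and the third group by the third frequency, independently of each other.
- The data shift path in the second group, R8-R7-R6-R5-R4-R3-R2, has length 7, so the maximum value of N is 7, i.e., N = 2, 3, 4, 5, 6 or 7. The second group R2 to R8 may therefore run at 1/N of the frequency of the first group (R0, R1, R9, R14 and R15), and its power consumption can correspondingly be reduced by (N-1)/N.
- The data shift path in the third group, R13-R12-R11-R10, has length 4, so the maximum value of M is 4, i.e., M = 2, 3 or 4. The third group R10 to R13 may therefore run at 1/M of the frequency of the first group, and its power consumption can correspondingly be reduced by (M-1)/M.
Advantageously, this significantly improves the power-to-hashrate ratio of the circuit for implementing a hash algorithm according to embodiments of the present disclosure.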
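With two independent dividers, the same load-counting arithmetic applies to each group separately. This sketch uses the simple model of one equal-cost load per clock edge and illustrative names. It enumerates the (N, M) pairs allowed by the path lengths together with the requirement that M differ from N.

```python
from fractions import Fraction
from itertools import product

FIRST = 5    # R0, R1, R9, R14, R15 at the first frequency
SECOND = 7   # R2..R8: path length 7, so N in 2..7
THIRD = 4    # R10..R13: path length 4, so M in 2..4


def extension_load_rate(n: int, m: int) -> Fraction:
    """Loads per reference cycle of R0..R15, relative to clocking all 16 at CLK."""
    return (FIRST + Fraction(SECOND, n) + Fraction(THIRD, m)) / 16


# M must differ from N, per the definition of the third frequency.
options = {(n, m): extension_load_rate(n, m)
           for n, m in product(range(2, SECOND + 1), range(2, THIRD + 1)) if n != m}
best = min(options, key=options.get)
print(best, options[best])   # (7, 4) 7/16
```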
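In still other preferred embodiments in which the circuit 100 implements the SHA-256 algorithm, extension registers that take part in logic operations may also be down-clocked. Consider register R9, which is used both for data shifting and for the logic operation of each stage. Registers R8 and R10, which have a data-shift relationship with R9, are both used only for data shifting, so R9 may also be down-clocked. This requires additional modifications to other parts of the circuit 100. In return, the two data shift paths R13-R12-R11-R10 and R8-R7-R6-R5-R4-R3-R2 can be joined into a single long data shift path running from R13 to R2. The additional modifications may include, for example, changes to the hardware that uses the output of register R9. Suppose that, before the modification, the output of R9 is hardwired to the hardware that performs the logic operation. After the modification, the output of R9 and the output of another register may need to be connected to that hardware through a 2-to-1 multiplexer. In some embodiments, the first group of extension registers includes registers R0, R1, R14 and R15, and the second group includes registers R2 to R13. The data shift path is then extended to length 12, so the maximum value of N may be 12, i.e., N may be 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12. Accordingly, the second group R2 to R13 may run at 1/N of the frequency of the first group (R0, R1, R14 and R15), and its power consumption can correspondingly be reduced by (N-1)/N.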
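FIG. 7A schematically shows part of a circuit 200 for performing SHA-256 according to an embodiment of the present disclosure, and FIG. 7B shows the clock signals used by the circuit of FIG. 7A. The circuit 200 of FIG. 7A is a specific example of using the circuit 100 of FIG. 6 to perform SHA-256, so everything described above about the circuit 100 applies to it as well. Note that, for clarity, FIG. 7A shows only some of the connections of part of the circuit 200. For example, some registers in the figure are not connected to any arrow representing a data shift. This does not mean that those registers take no part in the operation; the connections are simply not shown.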
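In the SHA-256 circuit shown in FIG. 7A, the extension registers of each stage may include sixteen 32-bit registers R0 to R15. Registers R0 to R15 store the extended data $W_t$ to $W_{t+15}$ respectively, so the operations they take part in are those of Equation 2. The second group of extension registers includes registers R10 to R13, and the second frequency is 1/2 of the first frequency, i.e., N = 2.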
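The arrows in FIG. 7A indicate how data shifts between registers. Each arrow uses the same line style as the clock signal it represents. There are three arrow styles, corresponding to the reference clock signal CLK, the 1st clock signal CLK1 and the 2nd clock signal CLK2. The line style of each arrow therefore indicates whether the register it points to uses CLK, CLK1 or CLK2. FIG. 7A also indicates the clock signal used by each register through the register's drawing style. For reference, FIG. 7B places the register style for each of CLK, CLK1 and CLK2 after the corresponding clock signal. A register drawn with a dashed box in FIG. 7A may use whichever clock signal the specific design requires.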
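In embodiments of the present disclosure, the buffer registers and the extension registers may be either rising-edge-triggered or falling-edge-triggered. Those skilled in the art will understand that FIG. 7B shows the clock signals required with rising-edge-triggered registers. Inverting these clock signals (a 180° phase shift) yields the clock signals required with falling-edge-triggered registers.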
As shown in FIGS. 7A and 7B, registers R9 and R15 of the first group of extension registers in each stage run on the reference clock signal CLK.
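With further reference to FIGS. 7A and 7B, the $(i + 2j_1)$-th stage is configured as follows: in the $(C_1 + i + 2k)$-th reference clock cycles, it generates the extended data to be stored in its second group of extension registers R10 to R13, based on the extended data in at least one extension register of the 2 stages immediately preceding it. Here $C_1$ is a fixed positive integer whose value depends on the number of clock cycles needed for data initialization in the initial phase after the circuit 100 starts. The index i is 0 or 1, $j_1$ is any positive integer less than P/2, and k is 0 or any positive integer.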
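Continuing with FIGS. 7A and 7B, the second group of extension registers R10 to R13 in the $(p + 2q)$-th stage runs on the p-th clock signal CLKp. Here p is 1 or 2, and q is 0 or any positive integer such that $p + 2q \le P$. Specifically:
- In the $(1 + 2q)$-th stages, such as the 1st, 3rd and 5th stages, the second group R10 to R13 runs on the 1st clock signal CLK1.
- In the $(2 + 2q)$-th stages, such as the 2nd and 4th stages, the second group R10 to R13 runs on the 2nd clock signal CLK2.
The rising edges of CLK1 and CLK2 are aligned with rising edges of the reference clock signal CLK, and the rising edge of CLK2 is one reference clock period later than that of CLK1.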
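FIG. 8A schematically shows part of a circuit 300 for performing SHA-256 according to an embodiment of the present disclosure, and FIG. 8B shows the clock signals used by the circuit of FIG. 8A. The circuit 300 of FIG. 8A is a specific example of using the circuit 100 of FIG. 6 to perform SHA-256, so everything described above about the circuit 100 applies to it as well. Note that, for clarity, FIG. 8A shows only some of the connections of part of the circuit 300. For example, some registers in the figure are not connected to any arrow representing a data shift. This does not mean that those registers take no part in the operation; the connections are simply not shown.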
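In the SHA-256 circuit shown in FIG. 8A, the extension registers of each stage may include sixteen 32-bit registers R0 to R15. Registers R0 to R15 store the extended data $W_t$ to $W_{t+15}$ respectively, so the operations they take part in are those of Equation 2. The second group of extension registers includes registers R10 to R13, and the second frequency is 1/3 of the first frequency, i.e., N = 3.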
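The arrows in FIG. 8A indicate how data shifts between registers. Each arrow uses the same line style as the clock signal it represents. There are four arrow styles, corresponding to the reference clock signal CLK, the 1st clock signal CLK1, the 2nd clock signal CLK2 and the 3rd clock signal CLK3. The line style of each arrow therefore indicates whether the register it points to uses CLK, CLK1, CLK2 or CLK3. FIG. 8A also indicates the clock signal used by each register through the register's drawing style. For reference, FIG. 8B places the register style for each of CLK, CLK1, CLK2 and CLK3 after the corresponding clock signal. A register drawn with a dashed box in FIG. 8A may use whichever clock signal the specific design requires.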
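In embodiments of the present disclosure, the buffer registers and the extension registers may be either rising-edge-triggered or falling-edge-triggered. Those skilled in the art will understand that FIG. 8B shows the clock signals required with rising-edge-triggered registers. Inverting these clock signals (a 180° phase shift) yields the clock signals required with falling-edge-triggered registers.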
As shown in FIGS. 8A and 8B, registers R9 and R15 of the first group of extension registers in each stage run on the reference clock signal CLK.
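With further reference to FIGS. 8A and 8B, the $(i + 3j_1)$-th stage is configured as follows: in the $(C_1 + i + 3k)$-th reference clock cycles, it generates the extended data to be stored in its second group of extension registers R10 to R13, based on the extended data in at least one extension register of the 3 stages immediately preceding it. Here $C_1$ is a fixed positive integer whose value depends on the number of clock cycles needed for data initialization in the initial phase after the circuit 100 starts. The index i is 0, 1 or 2, $j_1$ is any positive integer less than P/3, and k is 0 or any positive integer.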
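Continuing with FIGS. 8A and 8B, the second group of extension registers R10 to R13 in the $(p + 3q)$-th stage runs on the p-th clock signal CLKp. Here p is 1, 2 or 3, and q is 0 or any positive integer such that $p + 3q \le P$. Specifically:
- In the $(1 + 3q)$-th stages, such as the 1st and 4th stages, the second group R10 to R13 runs on the 1st clock signal CLK1.
- In the $(2 + 3q)$-th stages, such as the 2nd and 5th stages, it runs on the 2nd clock signal CLK2.
- In the $(3 + 3q)$-th stages, such as the 3rd stage, it runs on the 3rd clock signal CLK3.
The rising edges of CLK1, CLK2 and CLK3 are aligned with rising edges of the reference clock signal CLK. The rising edge of CLK2 is one reference clock period later than that of CLK1, and the rising edge of CLK3 is one reference clock period later than that of CLK2.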
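Embodiments of the present disclosure further provide an apparatus for performing a data processing algorithm (e.g., a Bitcoin mining algorithm). The apparatus includes a circuit for performing a hash algorithm as described above, such as circuit 100, 200 or 300. The proposed circuit is well suited to implementing the SHA-256 algorithm with a reduced power-to-hashrate ratio. It is therefore also well suited to implementing data processing equipment (e.g., Bitcoin miners) with a reduced power-to-hashrate ratio. Advantageously, an apparatus for performing a data processing algorithm according to embodiments of the present disclosure has a significant power-to-hashrate advantage.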
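Embodiments of the present disclosure further provide a method for performing an algorithm, which uses a circuit according to the present disclosure to perform the algorithm. Specifically, the method may include: receiving data using an input module; and computing a hash value based on the received data using an operation module. The operation module may include a plurality of operation stages arranged in a pipeline structure, e.g., a 0th stage, a 1st stage, and so on up to a P-th stage, where P is a fixed positive integer greater than 1 and less than the number of stages in the pipeline structure. Each of the 1st to P-th stages may include:
- a plurality of buffer registers, which store the intermediate values of the current stage and run at a first frequency; and
- a plurality of extension registers, which store the extended data of the current stage.
The extension registers may include a first group running at the first frequency and a second group running at a second frequency. The second frequency is 1/N of the first frequency, where N is a fixed positive integer greater than 1 and not greater than the number of extension registers in the second group.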
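In some embodiments, the buffer registers and the extension registers may include edge-triggered registers, such as rising-edge-triggered and/or falling-edge-triggered registers. They may include D flip-flops and/or latches. A latch may, for example, be a latch driven by a pulse-type clock signal.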
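In some embodiments, the method for performing a hash algorithm may further include providing a reference clock signal using a clock module. The reference clock signal has the first frequency and a reference clock period corresponding to the first frequency. The buffer registers and the first group of extension registers of each of the 1st to P-th stages may run on the reference clock signal. The stages may be configured as follows:
- Each of the 1st to P-th stages, in every reference clock cycle, generates the intermediate values to be stored in its buffer registers, based on the extended data in at least one extension register of the first group in the immediately preceding stage.
- Each of the N-th to P-th stages, in every reference clock cycle, generates the extended data to be stored in its first group of extension registers, based on the extended data in at least one extension register of the N immediately preceding stages.
- The $(i + j_1 N)$-th stage, in the $(C_1 + i + kN)$-th reference clock cycles, generates the extended data to be stored in its second group of extension registers, based on the extended data in at least one extension register of the N immediately preceding stages.
Here $C_1$ is a fixed positive integer, i is 0 or any positive integer less than N, $j_1$ is any positive integer less than P/N, and k is 0 or any positive integer.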
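In some embodiments of the method for performing a hash algorithm, the clock module may further be configured to generate a 1st to an N-th clock signal having the second frequency. The rising edges of the 1st to N-th clock signals are aligned with rising edges of the reference clock signal. The rising edge of each of the 2nd to N-th clock signals is one reference clock period later than that of the preceding clock signal. The second group of extension registers in the $(p + qN)$-th stage may run on the p-th clock signal. Here p is any positive integer not greater than N, and q is 0 or any positive integer such that $p + qN \le P$.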
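In some embodiments of the method for performing a hash algorithm, the stages may be wired as follows. In each of the 1st to (P-N)-th stages, the output of one extension register of the first group may be coupled to the input of one extension register of the second group in each of the N immediately following stages. In each of the (N+1)-th to P-th stages, the input of one extension register of the first group may be coupled, through an N-to-1 multiplexer, to the output of one extension register of the second group in each of the N immediately preceding stages.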
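In some embodiments of the method for performing a hash algorithm, the extension registers of each of the 1st to P-th stages may further include a third group of extension registers running at a third frequency. The third frequency is 1/M of the first frequency, where M is a fixed positive integer that is greater than 1, not greater than the number of extension registers in the third group, and not equal to N. Correspondingly, the $(r + j_2 M)$-th stage may be configured as follows: in the $(C_2 + r + kM)$-th reference clock cycles, it generates the extended data to be stored in its third group of extension registers, based on the extended data in at least one extension register of the M immediately preceding stages. Here $C_2$ is a fixed positive integer, r is 0 or any positive integer less than M, $j_2$ is any positive integer less than P/M, and k is 0 or any positive integer.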
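In some embodiments, the method for performing a hash algorithm may be used to perform SHA-256. In this case, the extension registers include sixteen 32-bit registers R0 to R15. The first group of extension registers includes registers R0, R1, R9, R14 and R15, the second group includes registers R2 to R8 and R10 to R13, and N = 2, 3 or 4.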
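In some embodiments, the method for performing a hash algorithm may be used to perform SHA-256. In this case, the extension registers include sixteen 32-bit registers R0 to R15. The first group of extension registers includes registers R0, R1, R14 and R15, the second group includes registers R2 to R13, and N = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12.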
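In some embodiments, the method for performing a hash algorithm may be used to perform SHA-256. In this case, the extension registers include sixteen 32-bit registers R0 to R15. The first group of extension registers includes registers R0, R1, R9, R14 and R15, the second group includes registers R10 to R13, and the third group includes registers R2 to R8. Further, N = 2, 3 or 4, and M = 2, 3, 4, 5, 6 or 7.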
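Embodiments of the present disclosure further provide a method for performing a data processing algorithm (e.g., a Bitcoin mining algorithm), which includes the steps of the method for performing a hash algorithm described above.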
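In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary and not as limiting. Other examples of the exemplary embodiments may therefore have different values.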
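Words such as "front", "back", "top", "bottom", "above" and "below" in the description and claims, if present, are used for descriptive purposes and not necessarily to describe permanent relative positions. Such words are interchangeable under appropriate circumstances, so that the embodiments of the present disclosure described herein can, for example, operate in orientations other than those shown or otherwise described herein.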
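As used herein, the word "exemplary" means "serving as an example, instance or illustration", not as a "model" to be exactly replicated. Any implementation described herein as exemplary is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the present disclosure is not bound by any expressed or implied theory presented in the preceding technical field, background, summary or detailed description.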
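As used herein, the word "substantially" encompasses any minor variation caused by design or manufacturing defects, device or component tolerances, environmental effects and/or other factors. The word "substantially" also allows for departures from the perfect or ideal case caused by parasitic effects, noise and other practical considerations that may be present in an actual implementation.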
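The foregoing description may refer to elements, nodes or features that are "connected" or "coupled" together. As used herein, unless expressly stated otherwise, "connected" means that one element/node/feature is directly joined to (or directly communicates with) another element/node/feature, electrically, mechanically, logically or otherwise. Similarly, unless expressly stated otherwise, "coupled" means that one element/node/feature may be joined, directly or indirectly, to another element/node/feature, mechanically, electrically, logically or otherwise, to allow interaction, even though the two features may not be directly connected. That is, "coupled" is intended to encompass both direct and indirect joining of elements or other features, including connection through one or more intermediate elements.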
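It should also be understood that the term "comprise/include", when used herein, specifies the presence of the stated features, integers, steps, operations, units and/or components. It does not preclude the presence or addition of one or more other features, integers, steps, operations, units and/or components, and/or combinations thereof.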
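Those skilled in the art will appreciate that the boundaries between the operations described above are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be changed in various other embodiments. Other modifications, variations and substitutions are likewise possible. The specification and drawings are accordingly to be regarded as illustrative rather than restrictive.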
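Although some specific embodiments of the present disclosure have been described in detail by way of example, those skilled in the art will understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. The embodiments disclosed herein may be combined arbitrarily without departing from the spirit and scope of the present disclosure. Those skilled in the art will also understand that various modifications may be made to the embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.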
Claims (13)
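- A circuit for performing a hash algorithm, comprising: an input module for receiving data; and an operation module for computing a hash value based on the received data, the operation module comprising a plurality of operation stages arranged in a pipeline structure, the plurality of operation stages comprising a 0th operation stage, a 1st operation stage, and so on up to a P-th operation stage, P being a fixed positive integer greater than 1 and less than the number of operation stages in the pipeline structure, wherein each of the 1st to P-th operation stages comprises: a plurality of buffer registers for storing intermediate values of the current operation stage and running at a first frequency; and a plurality of extension registers for storing extended data of the current operation stage, comprising a first group of extension registers running at the first frequency and a second group of extension registers running at a second frequency, wherein the second frequency is 1/N of the first frequency, N being a fixed positive integer greater than 1 and not greater than the number of extension registers in the second group.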
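- The circuit according to claim 1, further comprising: a clock module for providing a reference clock signal having the first frequency and a reference clock period corresponding to the first frequency, the plurality of buffer registers and the first group of extension registers of each of the 1st to P-th operation stages running on the reference clock signal; wherein each of the 1st to P-th operation stages is configured to generate, in each reference clock cycle, the intermediate values to be stored in the plurality of buffer registers of the current operation stage based on the extended data in at least one extension register of the first group in the immediately preceding operation stage.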
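- The circuit according to claim 2, wherein each of the N-th to P-th operation stages is configured to generate, in each reference clock cycle, the extended data to be stored in the first group of extension registers of the current operation stage based on the extended data in at least one extension register of the N operation stages immediately preceding the current operation stage; wherein the $(i + j_1 N)$-th operation stage is configured to generate, in the $(C_1 + i + kN)$-th reference clock cycles, the extended data to be stored in the second group of extension registers of the current operation stage based on the extended data in at least one extension register of the N operation stages immediately preceding the current operation stage; and wherein $C_1$ is a fixed positive integer, i is 0 or any positive integer less than N, $j_1$ is any positive integer less than P/N, and k is 0 or any positive integer.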
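- The circuit according to claim 2, wherein the clock module is further configured to generate a 1st clock signal to an N-th clock signal having the second frequency, the rising edges of the 1st to N-th clock signals being aligned with rising edges of the reference clock signal, and the rising edge of each of the 2nd to N-th clock signals being one reference clock period later than the rising edge of the preceding clock signal; and wherein the second group of extension registers in the $(p + qN)$-th operation stage runs on the p-th clock signal, p being any positive integer not greater than N, and q being 0 or any positive integer such that $p + qN$ is not greater than P.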
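- The circuit according to claim 3, wherein the output of one extension register of the first group in each of the 1st to (P-N)-th operation stages is coupled to the input of one extension register of the second group in each of the N immediately following operation stages; and wherein the input of one extension register of the first group in each of the (N+1)-th to P-th operation stages is coupled, through an N-to-1 multiplexer, to the output of one extension register of the second group in each of the N immediately preceding operation stages.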
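- The circuit according to claim 2, wherein the plurality of extension registers of each of the 1st to P-th operation stages further comprises a third group of extension registers running at a third frequency, the third frequency being 1/M of the first frequency, and M being a fixed positive integer greater than 1, not greater than the number of extension registers in the third group, and not equal to N.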
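- The circuit according to claim 6, wherein the $(r + j_2 M)$-th operation stage is configured to generate, in the $(C_2 + r + kM)$-th reference clock cycles, the extended data to be stored in the third group of extension registers of the current operation stage based on the extended data in at least one extension register of the M operation stages immediately preceding the current operation stage; and wherein $C_2$ is a fixed positive integer, r is 0 or any positive integer less than M, $j_2$ is any positive integer less than P/M, and k is 0 or any positive integer.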
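- The circuit according to any one of claims 1 to 5, wherein the circuit is used to perform SHA-256, the plurality of extension registers comprises sixteen 32-bit registers R0 to R15, the first group of extension registers comprises registers R0, R1, R9, R14 and R15, the second group of extension registers comprises registers R2 to R8 and registers R10 to R13, and N = 2, 3 or 4.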
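- The circuit according to any one of claims 1 to 5, wherein the circuit is used to perform SHA-256, the plurality of extension registers comprises sixteen 32-bit registers R0 to R15, the first group of extension registers comprises registers R0, R1, R14 and R15, the second group of extension registers comprises registers R2 to R13, and N = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12.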
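- The circuit according to claim 6 or 7, wherein the circuit is used to perform SHA-256, the plurality of extension registers comprises sixteen 32-bit registers R0 to R15, the first group of extension registers comprises registers R0, R1, R9, R14 and R15, the second group of extension registers comprises registers R10 to R13, the third group of extension registers comprises registers R2 to R8, N = 2, 3 or 4, and M = 2, 3, 4, 5, 6 or 7.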
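- The circuit according to claim 1, wherein the plurality of buffer registers and the plurality of extension registers comprise at least one of D flip-flops and latches.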
- An apparatus for performing a data processing algorithm, comprising the circuit according to any one of claims 1 to 11.
- A method for performing an algorithm, wherein the algorithm is performed using the circuit according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/602,166 US11716076B2 (en) | 2020-05-20 | 2021-05-13 | Circuits and methods for performing hash algorithm |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010432370.8A CN111612622B (zh) | 2020-05-20 | 2020-05-20 | Circuit and method for performing a hash algorithm |
CN202010432370.8 | 2020-05-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021233198A1 true WO2021233198A1 (zh) | 2021-11-25 |
Family ID=72203478
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/093612 WO2021233198A1 (zh) | 2020-05-20 | 2021-05-13 | Circuit and method for performing a hash algorithm |
Country Status (4)
Country | Link |
---|---|
US (1) | US11716076B2 (zh) |
CN (1) | CN111612622B (zh) |
TW (1) | TWI779606B (zh) |
WO (1) | WO2021233198A1 (zh) |
2020
- 2020-05-20 CN CN202010432370.8A patent/CN111612622B/zh active Active
2021
- 2021-05-13 TW TW110117281A patent/TWI779606B/zh active
- 2021-05-13 WO PCT/CN2021/093612 patent/WO2021233198A1/zh active Application Filing
- 2021-05-13 US US17/602,166 patent/US11716076B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US20220149827A1 (en) | 2022-05-12 |
TW202143076A (zh) | 2021-11-16 |
US11716076B2 (en) | 2023-08-01 |
CN111612622B (zh) | 2021-03-23 |
CN111612622A (zh) | 2020-09-01 |
TWI779606B (zh) | 2022-10-01 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21808450; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17/04/2023) |
122 | Ep: pct application non-entry in european phase | Ref document number: 21808450; Country of ref document: EP; Kind code of ref document: A1 |