WO2022158549A1

WO2022158549A1 - Processing element, control method and control program therefor, and processing device

Info

Publication number: WO2022158549A1
Application number: PCT/JP2022/002090
Authority: WO
Inventors: テイホントラン; 康彦中島
Original assignee: 国立大学法人奈良先端科学技術大学院大学
Priority date: 2021-01-22
Filing date: 2022-01-21
Publication date: 2022-07-28
Also published as: JPWO2022158549A1

Abstract

The present invention improves processing for repeatedly calculating a hash function. This processing element comprises: an ME computation unit (30) that repeatedly performs loop processing on a message block to expand the message block into a plurality of words; and an MC computation unit (31) that repeatedly performs loop processing on the expanded plurality of words to compress the words into an intermediate hash value. The ME computation unit (30) and the MC computation unit (31) execute one loop processing of the ME computation unit (30) and a part of one loop processing of the MC computation unit (31) in parallel. The MC computation unit (31) then executes the remaining part of that loop processing of the MC computation unit (31), which is not executed in parallel, using the word computed by the one loop processing of the ME computation unit (30).

Description

Processing element, its control method and control program, and processing device

The present invention relates to a processing element that calculates an intermediate hash value from a message block of a predetermined bit length, its control method and control program, and a processing device.

Blockchain technology is used to ensure the security of decentralized cryptocurrencies such as Bitcoin, Ripple, and Ethereum. Furthermore, recently, blockchain has been extensively researched for use in various fields such as autonomous driving, smart healthcare systems, robotics, and supply chains.

The current problem with blockchain is its high power consumption. This is because the hash function (SHA-256, SHA-512, etc.) needs to be calculated repeatedly.

To address this, the Bitcoin mining accelerator described in Patent Document 1 has a hardware circuit optimized for loop processing in SHA-256 message compression (MC) processing. In addition, the processing system described in Patent Document 2 uses clock gating and hardwired logic to optimize SHA-256 loop processing in Bitcoin mining.

Patent Document 1: U.S. Patent No. 10755242
Patent Document 2: U.S. Patent No. 10142098

However, there is still room for improvement in the process of repeatedly calculating hash functions.

An object of one aspect of the present invention is to provide a processing element or the like in which the above processing is improved.

In order to solve the above problems, a processing element according to one aspect of the present invention is a processing element that calculates an intermediate hash value from a message block of a predetermined bit length, and comprises: an expansion unit that repeats loop processing on the message block to expand it into a bit sequence longer than the bit length; and a compression unit that repeats loop processing on the expanded bit sequence to compress it into the intermediate hash value. The expansion unit and the compression unit execute one loop processing of the expansion unit and a part of one loop processing of the compression unit in parallel, and the compression unit executes the remainder of the one loop processing of the compression unit, which is not executed in parallel, using the word calculated by the one loop processing of the expansion unit.

A processing apparatus according to another aspect of the present invention includes a plurality of processing elements configured as described above, and further includes a control section for controlling the plurality of processing elements.

A control method according to another aspect of the present invention is a control method for a processing element that calculates an intermediate hash value from a message block of a predetermined bit length, the processing element comprising: an expansion unit that repeats loop processing on the message block to expand it into a bit sequence longer than the bit length; and a compression unit that repeats loop processing on the expanded bit sequence to compress it into the intermediate hash value. The control method includes: a step of causing the expansion unit and the compression unit to execute one loop processing of the expansion unit and a part of one loop processing of the compression unit in parallel; and a step of causing the compression unit to execute the remainder of the one loop processing of the compression unit, which is not executed in parallel, using the word calculated by the one loop processing of the expansion unit.

According to one aspect of the present invention, processing for repeatedly calculating hash functions can be improved.

FIG. 1 is a block diagram showing a schematic configuration of an ALU in a PE of the hash value calculation device according to one embodiment of the present invention.
FIG. 2 is a diagram for explaining an outline of a method for calculating a SHA-256 value.
FIG. 3 is a block diagram showing a schematic configuration of the hash value calculation device.
FIG. 4 is a block diagram showing a schematic configuration of the PE.
FIG. 5 is a timing chart showing an example of the flow of processing in the ME calculation unit and the MC calculation unit of the ALU.
FIG. 6 is a block diagram showing a schematic configuration of an input-side shift buffer in the PE.
FIG. 7 is a timing chart showing an example of the flow of processing in the input-side shift buffer.
FIG. 8 is a block diagram showing a schematic configuration of an output-side shift buffer in the PE.
FIG. 9 is a timing chart showing an example of the flow of processing in the output-side shift buffer.
FIG. 10 is a block diagram showing an overview of double hashing in Bitcoin mining.
FIG. 11 is a block diagram showing a schematic configuration of a PE in a hash value calculation device according to another embodiment of the present invention.
FIG. 12 is a block diagram showing details of the update unit in the PE.
FIG. 13 is a block diagram showing a schematic configuration of an embedded system according to another embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail. For convenience of explanation, members having the same functions as members shown in each embodiment are denoted by the same reference numerals, and descriptions thereof are omitted as appropriate.

[Embodiment 1]
An embodiment of the present invention will be described with reference to FIGS. 1-9. The hash value calculation device of this embodiment performs calculation of SHA (Secure Hash Algorithm)-256, which is a cryptographic hash function, on input data.

(SHA-256)
FIG. 2 is a diagram for explaining an outline of a method for calculating a SHA-256 value (hash value). First, the message is padded to convert it into a bit sequence whose length is a multiple of 512, and the bit sequence is divided into 512-bit message blocks M. Next, each message block M is divided into 32-bit words W₀ to W₁₅. Next, the ME (message expansion) operation is repeatedly performed on the message block M to calculate W₁₆ to W₆₃. The ME operation is represented by the following equation (1), where + denotes addition modulo 2³², as specified in the SHA-256 standard.

$$W_j = \sigma_1(W_{j-2}) + W_{j-7} + \sigma_0(W_{j-15}) + W_{j-16} \quad (16 \le j \le 63) \tag{1}$$

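Here, the logic functions σ₀ and σ₁ are represented by the following equation (2). In this equation, Sⁿ(x) is a function that cyclically shifts (rotates) x to the right by n bits, and Rⁿ(x) is a function that shifts x to the right by n bits.

$$\sigma_0(x) = S^{7}(x) \oplus S^{18}(x) \oplus R^{3}(x), \qquad \sigma_1(x) = S^{17}(x) \oplus S^{19}(x) \oplus R^{10}(x) \tag{2}$$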

Next, the MC (message compression) operation is repeatedly performed on the words W₀ to W₆₃ to compute a 256-bit hash value. Specifically, in the MC operation, the variables a to h are first initialized as shown in the following equations.

a = H₀; b = H₁; c = H₂; d = H₃; e = H₄; f = H₅; g = H₆; h = H₇

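Next, the calculation of the following equation (3) is repeated for j = 0 to 63. All right-hand sides are evaluated with the values of a to h from the previous loop.

$$T_1 = h + \Sigma_1(e) + \mathrm{Ch}(e, f, g) + K_j + W_j, \qquad T_2 = \Sigma_0(a) + \mathrm{Maj}(a, b, c)$$
$$(a, b, c, d, e, f, g, h) \leftarrow (T_1 + T_2,\ a,\ b,\ c,\ d + T_1,\ e,\ f,\ g) \tag{3}$$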

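Here, the word K_j is a constant defined in the standard. The logic functions Σ₀, Σ₁, Ch, and Maj are represented by the following equations (4), where Sⁿ(x) rotates x to the right by n bits.

$$\Sigma_0(x) = S^{2}(x) \oplus S^{13}(x) \oplus S^{22}(x), \qquad \Sigma_1(x) = S^{6}(x) \oplus S^{11}(x) \oplus S^{25}(x)$$
$$\mathrm{Ch}(x, y, z) = (x \wedge y) \oplus (\neg x \wedge z), \qquad \mathrm{Maj}(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z) \tag{4}$$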

Finally, the 256-bit intermediate hash value HO₀|HO₁|...|HO₇ is calculated by the following equations.

HO₀ = a + H₀; HO₁ = b + H₁; HO₂ = c + H₂; HO₃ = d + H₃; HO₄ = e + H₄; HO₅ = f + H₅; HO₆ = g + H₆; HO₇ = h + H₇

The intermediate hash value HO₀|HO₁|...|HO₇ is used as the initial hash values H₀ to H₇ for the next message block M. Then, the above processing is repeated for all message blocks M, and the intermediate hash value calculated for the last message block M becomes the final hash value corresponding to the message.
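The SHA-256 procedure outlined above (padding, ME operation, MC operation, and the final addition that yields the intermediate hash value) can be sketched in software as follows. This is a minimal reference model for checking the arithmetic, not the circuit of the embodiment. The function names (`pad`, `compress`, `sha256`) are illustrative assumptions. The constants K_j and the first-stage initial values are derived here from the cube roots and square roots of the first primes, as the SHA-256 standard defines them, instead of being listed as literals.

```python
import hashlib
from math import isqrt

MASK32 = 0xFFFFFFFF

def primes(n):
    """Return the first n prime numbers by trial division."""
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found

def icbrt(n):
    """Integer cube root: largest r with r**3 <= n."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# First 32 fractional bits of the cube roots / square roots of the first primes.
K = [icbrt(p << 96) & MASK32 for p in primes(64)]
H_INIT = [isqrt(p << 64) & MASK32 for p in primes(8)]

def rotr(x, n):
    """S^n(x): rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def compress(h_in, block):
    """Intermediate hash HO_0..HO_7 from initial values H_0..H_7 and one 512-bit block."""
    # ME operation, equation (1): expand W_0..W_15 to W_0..W_63.
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for j in range(16, 64):
        s0 = rotr(w[j - 15], 7) ^ rotr(w[j - 15], 18) ^ (w[j - 15] >> 3)
        s1 = rotr(w[j - 2], 17) ^ rotr(w[j - 2], 19) ^ (w[j - 2] >> 10)
        w.append((s1 + w[j - 7] + s0 + w[j - 16]) & MASK32)
    # MC operation, equation (3): 64 loops over the variables a..h.
    a, b, c, d, e, f, g, h = h_in
    for j in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[j] + w[j]) & MASK32
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK32
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g
    return [(x + y) & MASK32 for x, y in zip(h_in, (a, b, c, d, e, f, g, h))]

def pad(message):
    """Pad a byte string to a multiple of 512 bits."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % 64)
    return padded + (8 * len(message)).to_bytes(8, "big")

def sha256(message):
    """Chain the intermediate hash values over all message blocks."""
    h = H_INIT
    data = pad(message)
    for i in range(0, len(data), 64):
        h = compress(h, data[i:i + 64])
    return b"".join(x.to_bytes(4, "big") for x in h)

# Cross-check against the standard library implementation.
print(sha256(b"abc") == hashlib.sha256(b"abc").digest())
```

The `compress` function corresponds to the work assigned to one PE for one message block; `sha256` shows how each intermediate hash value becomes the initial value for the next block.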

(Overview of Hash Value Calculator)
FIG. 3 is a block diagram showing a schematic configuration of the hash value calculation device of this embodiment. As shown in FIG. 3, the hash value calculation device 10 (processing device) includes a controller 11 (control unit), a GRAM (Global Random Access Memory) 12, a bus 13, a plurality of processing elements (PEs) 14, and a bus IF (interface) 15. That is, the hash value calculation device 10 is a multi-core processor having multiple PEs 14.

The controller 11 comprehensively controls the operations of various components of the hash value calculation device 10, and is composed of a computer including a CPU (Central Processing Unit) and memory, for example. Operation control of various configurations is performed by causing a computer to execute a control program.

The GRAM 12 stores information that is widely used by the hash value calculation device 10, and is composed of a storage device such as a flash memory. The bus 13 is for transferring data between the GRAM 12, a plurality of PEs 14, and the like. PE 14 is the processor core. Details of the PE 14 will be described later. A bus IF 15 is for transmitting and receiving data to and from an external device.

(PE)
FIG. 4 is a block diagram showing a schematic configuration of the PE 14. As shown in FIG. 4, the PE 14 comprises an ALU (Arithmetic Logic Unit) 20, two RAMs 21 and 22, four shift buffers 23 to 26, and an adder 27. In this embodiment, each PE 14 has the function of calculating an intermediate hash value.

The ALU 20 performs arithmetic operations and logic operations, and in this embodiment, performs the ME operation and the MC operation. Details of the ALU 20 will be described later.

The RAM 21 (expansion memory) stores the data (words W₀ to W₆₃) used in the ME operation, and is hereinafter referred to as "WRAM 21". Of the words W₀ to W₆₃, the words W₀ to W₁₅ (message block M) are written from the GRAM 12 to the WRAM 21 via the bus 13. The words W₁₆ to W₆₃ calculated by the ALU 20 are written from the ALU 20 to the WRAM 21 via the shift buffer 25.

The RAM 22 (compression memory) stores the initial hash values H₀ to H₇ used in the MC operation and updates them with the calculated intermediate hash values HO₀ to HO₇. It is hereinafter referred to as "HRAM 22". The initial hash values H₀ to H₇ are written from the GRAM 12 to the HRAM 22 via the bus 13. The intermediate hash values HO₀ to HO₇ are written from the adder 27.

The intermediate hash values HO₀ to HO₇ are the sums of the initial hash values H₀ to H₇ and the variables a to h calculated in the final loop of the MC operation, respectively. For the first message block M, the initial hash values H₀ to H₇ are constants defined by the standard; for every other message block M, they are the intermediate hash values HO₀ to HO₇ calculated from the preceding message block M.

The shift buffer 23 (input buffer) temporarily stores data read from the WRAM 21 . The shift buffer 23 sends the temporarily stored data to the ALU 20 . In addition, in FIG. 4, the shift buffer 23 is described as "SBi1".

The shift buffer 24 (input buffer) temporarily stores initial hash values H ₀ to H ₇ read from the HRAM 22 . The shift buffer 24 sends the temporarily stored initial hash values H ₀ to H ₇ to the ALU 20 and the adder 27 . In addition, in FIG. 4, the shift buffer 24 is described as "SBi2".

The shift buffer 25 (output buffer) temporarily stores the word calculated by the ALU 20 . The shift buffer 25 writes the temporarily stored word to the WRAM 21 . In addition, in FIG. 4, the shift buffer 25 is described as "SBo1".

The shift buffer 26 (output buffer) temporarily stores variables a to h calculated by the ALU 20 . The shift buffer 26 sends the temporarily stored variables a to h to the adder 27 . Note that the shift buffer 26 is described as "SBo2" in FIG.

The adder 27 adds the initial hash values H₀ to H₇ from the shift buffer 24 and the variables a to h from the shift buffer 26 to calculate the intermediate hash values HO₀ to HO₇. The adder 27 writes the calculated intermediate hash values HO₀ to HO₇ to the HRAM 22 and transmits them to the GRAM 12 via the bus 13. Although not shown, the adder 27 has four adder sections.

(ALU)
As shown in FIG. 4 , the ALU 20 has a configuration including an ME operation section 30 (expansion section), an MC operation section 31 (compression section), a multiplexer 32 and a demultiplexer 33 .

The ME calculation unit 30 uses data input from the WRAM 21 via the shift buffer 23 to execute the ME operation. The ME calculation unit 30 outputs the data calculated by the ME operation to the WRAM 21 via the shift buffer 25 and to the MC calculation unit 31.

The MC calculation unit 31 uses the data from the ME calculation unit 30 and the data from the multiplexer 32 to execute the MC calculation. The MC calculation unit 31 sends the data calculated by the MC calculation to the demultiplexer 33 .

The multiplexer 32 switches the source of the variables a to h input to the MC calculation unit 31: in loop #0 they are acquired from the HRAM 22 via the shift buffer 24, and in the other loops #1 to #63 they are acquired from the demultiplexer 33. The demultiplexer 33 switches the destination of the variables a to h output by the MC calculation unit 31: in loops #0 to #62 they are output (fed back) to the multiplexer 32 for the next loops #1 to #63, and in the final loop #63 they are output to the HRAM 22 via the shift buffer 26 and the adder 27.

FIG. 1 is a block diagram showing a schematic configuration of the ALU 20. FIG. 1 shows the j-th loop processing in the ALU 20. Note that in FIG. 1, W_j is shown as W(j).

As shown in FIG. 1, in the ME calculation unit 30, when j=0 to 15, the word W(j) is sent to the MC calculation unit 31 via the multiplexer MUX. Next, when j=16 to 63, the first arithmetic processing ME-1 and the second arithmetic processing ME-2 are performed in order.

In the first arithmetic processing ME-1, the logic operation unit SIG0 applies the logic function σ₀ to W(j-15), and the result is added to W(j-16) by the arithmetic operation unit Adder. In addition, the logic operation unit SIG1 applies the logic function σ₁ to W(j-2), and the result is added to W(j-7) by the arithmetic operation unit Adder.

Next, in the second arithmetic processing ME-2, the two words calculated in the first arithmetic processing ME-1 are added by the arithmetic operation unit Adder. The calculation result is output to the WRAM 21 via the shift buffer 25 as a word W(j), and is also sent to the MC calculation unit 31 .

On the other hand, in the MC calculation unit 31, the first to fourth calculation processes MC-1 to MC-4 are performed in order.

In the first arithmetic processing MC-1, the constant K(j) and the variable h are added by the arithmetic operation unit Adder. In addition, the logic operation unit EP1 applies the logic function Σ₁ to the variable e, the logic operation unit CH applies the logic function Ch to the variables e, f, and g, and the two resulting words are added by the arithmetic operation unit Adder. Next, in the second arithmetic processing MC-2, the two words calculated in the first arithmetic processing MC-1 are added by the arithmetic operation unit Adder. Next, in the third arithmetic processing MC-3, the word calculated in the second arithmetic processing MC-2 and the word W(j) from the ME calculation unit 30 are added by the arithmetic operation unit Adder. The result of this operation is the word T1.

Also in the first arithmetic processing MC-1, the logic operation unit EP0 applies the logic function Σ₀ to the variable a, the logic operation unit MAJ applies the logic function Maj to the variables a, b, and c, and the two resulting words are added by the arithmetic operation unit Adder. The result of this operation is the word T2.

In the fourth arithmetic processing MC-4, the word T1 and the variable d are added by the arithmetic operation unit Adder and output as the variable e of the next loop. The word T1 and the word T2 are also added by the arithmetic operation unit Adder and output as the variable a of the next loop. In addition, the variables a to c and e to g are output as the variables b to d and f to h of the next loop via the wire assignment unit WireAssign.

As shown in FIG. 1, the first arithmetic processing ME-1 of the ME calculation unit 30 and the first arithmetic processing MC-1 of the MC calculation unit 31 are executed in parallel, and the second arithmetic processing ME-2 of the ME calculation unit 30 and the second arithmetic processing MC-2 of the MC calculation unit 31 are executed in parallel. Then, using the word W(j) calculated by the ME calculation unit 30, the third arithmetic processing MC-3 and the fourth arithmetic processing MC-4 of the MC calculation unit 31 are executed.

Therefore, the number of clocks required to execute each loop process can be reduced to 4 clocks compared to the case where the ME arithmetic unit 30 and the MC arithmetic unit 31 are executed separately. Also, an intermediate hash value for message block M can be calculated with only one PE 14 . Therefore, the number of PEs 14 required to determine the final hash value from the message can be reduced.
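The four-clock split described above can be modeled in software to check that deferring the addition of W(j) to MC-3 does not change the result. The sketch below is an illustrative model, not the circuit itself. The function names `staged_loop` and `reference_loop` are assumptions, and the rotation amounts are those of the SHA-256 standard. It covers one loop with j of 16 or more.

```python
MASK32 = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32

def sig0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def sig1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def ep0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)

def ep1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)

def ch(x, y, z):
    return (x & y) ^ (~x & z)

def maj(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)

def staged_loop(window, state, k_j):
    """One loop (j >= 16); window = (W[j-16], W[j-15], W[j-7], W[j-2])."""
    w16, w15, w7, w2 = window
    a, b, c, d, e, f, g, h = state
    # Clock 1: ME-1 and MC-1 run in parallel; neither needs W[j].
    me1_x = (sig0(w15) + w16) & MASK32
    me1_y = (sig1(w2) + w7) & MASK32
    mc1_x = (k_j + h) & MASK32
    mc1_y = (ep1(e) + ch(e, f, g)) & MASK32
    t2 = (ep0(a) + maj(a, b, c)) & MASK32
    # Clock 2: ME-2 produces W[j] while MC-2 merges the MC-1 partial sums.
    w_j = (me1_x + me1_y) & MASK32
    mc2 = (mc1_x + mc1_y) & MASK32
    # Clock 3: MC-3 is the first step that consumes W[j].
    t1 = (mc2 + w_j) & MASK32
    # Clock 4: MC-4 forms the variables of the next loop.
    return w_j, ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)

def reference_loop(window, state, k_j):
    """Same loop written directly from the ME and MC equations."""
    w16, w15, w7, w2 = window
    a, b, c, d, e, f, g, h = state
    w_j = (sig1(w2) + w7 + sig0(w15) + w16) & MASK32
    t1 = (h + ep1(e) + ch(e, f, g) + k_j + w_j) & MASK32
    t2 = (ep0(a) + maj(a, b, c)) & MASK32
    return w_j, ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)
```

Because every partial sum in clocks 1 and 2 depends only on values available at the start of the loop, the ME operation is hidden behind the first half of the MC operation.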

WRAM 21 stores message block M (words W ₀ to W ₁₅ ) and words W ₁₆ to W ₆₃ , and HRAM 22 stores initial hash values. This eliminates the need to write the words W ₁₆ to W ₆₃ to the external memory and to read the message block M, the initial hash values, and the words W ₁₆ to W ₆₃ from the external memory. Therefore, the processing efficiency of the PE 14 can be improved.

Also, the WRAM 21 stores the message block M (words W ₀ to W ₁₅ ), and the intermediate hash value can be calculated from the data stored in the HRAM 22 . Therefore, if the WRAM 21 stores the next message block M, the next intermediate hash value corresponding to the next message block M can be calculated using the intermediate hash value. By repeating these steps, only one PE 14 can obtain the final hash value from the message.

(WRAM/HRAM)
As shown in FIG. 4, the WRAM 21 has four memory units WM1 to WM4, and the data of the first to fourth message blocks M are stored in the memory units WM1 to WM4, respectively. In this embodiment, the data of the first to fourth message blocks M are different, but part or all of them may be the same. Each of the memory units WM1 to WM4 needs at least a memory size (512 bits) capable of storing the 16 words W(j-16) to W(j-1), and may have a memory size (2048 bits) capable of storing the 64 words W₀ to W₆₃.
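The 512-bit minimum follows from the fact that the ME operation for W(j) only reads words from the preceding 16. A minimal sketch, assuming a 16-entry ring buffer indexed by j mod 16 (the names `expand_ring` and `expand_full` are hypothetical, and the addressing scheme is an assumption, not taken from the embodiment):

```python
MASK32 = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32

def sig0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def sig1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def expand_full(words16):
    """ME operation with all 64 words kept (2048-bit memory)."""
    w = list(words16)
    for j in range(16, 64):
        w.append((sig1(w[j - 2]) + w[j - 7] + sig0(w[j - 15]) + w[j - 16]) & MASK32)
    return w

def expand_ring(words16):
    """ME operation with only a 16-word window kept (512-bit memory)."""
    ring = list(words16)          # slot j % 16 holds W[j-16] before loop j
    out = list(words16)           # collected only to compare the two variants
    for j in range(16, 64):
        w_j = (sig1(ring[(j - 2) % 16]) + ring[(j - 7) % 16]
               + sig0(ring[(j - 15) % 16]) + ring[j % 16]) & MASK32
        ring[j % 16] = w_j        # W[j] overwrites W[j-16], which is no longer needed
        out.append(w_j)
    return out
```

Both variants produce the same word sequence; the ring version trades the ability to re-read old words for a quarter of the storage.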

The HRAM 22 is configured to include four memory units HM1 to HM4. The memory units HM1 to HM4 store first to fourth initial values H ₀ to H ₇ of hash values corresponding to the first to fourth message blocks M stored in the memory units WM1 to WM4 of the WRAM 21, respectively. The first through fourth initial hash values H ₀ through H ₇ are updated by the first through fourth intermediate hash values HO ₀ through HO ₇ corresponding to the first through fourth message blocks M, respectively. Each of the memory units HM1 to HM4 may have at least a memory size (256 bits) capable of storing initial hash values H ₀ to H ₇ .

FIG. 5 is a timing chart showing an example of the flow of processing in the ME calculation unit 30 and the MC calculation unit 31. In FIG. 5, F1 to F4 respectively indicate the processing for the first to fourth message blocks M stored in WM1 to WM4. Lj (j = 0 to 63) indicates a loop number.

The process F1 for the first message block M is performed at the following timing. As shown in FIG. 5, first, at clock CK0, the 0th loop is started, data is read from the memory units WM1 and HM1, and first arithmetic processes ME-1 and MC-1 are executed.

Next, at clock CK1, second arithmetic processing ME-2 and MC-2 are executed. The word W(16) calculated in the second arithmetic processing ME-2 is stored in the memory section WM1 and sent to the MC arithmetic section 31 as well.

Next, at the clocks CK2 and CK3, the third arithmetic processing MC-3 and the fourth arithmetic processing MC-4 of the MC calculation unit 31 are executed, respectively, and the 0th loop ends. Therefore, each loop is executed in four clock cycles.

Next, at clock CK4, the first loop is started, data is read from the memory unit WM1, and the first arithmetic processing ME-1 is executed. Further, the first arithmetic processing MC-1 is executed using the variables a to h calculated in the fourth arithmetic processing MC-4 at the previous clock CK3. Thereafter, the above operation is repeated.

In the processing F2 regarding the second message block M, the 0th loop is started at the clock CK1, and the same operations as the above operations are repeated. Similarly, the processes F3 and F4 relating to the third and fourth message blocks M start the 0th loops at the clocks CK2 and CK3, respectively, and the same operations as described above are repeated. Then, at clock CK4, the first loop of process F1 relating to the first message block M is started.

Therefore, the PE 14 of this embodiment can calculate four loop processes corresponding to four message blocks M in four clocks. That is, one loop process can be executed per clock. As a result, efficiency of processing in the ALU 20 can be improved.
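The interleaving of the four processes F1 to F4 can be tabulated with a short model. The sketch below is an assumption-laden illustration of the timing chart: clocks are numbered from 0, block k starts its 0th loop k clocks after F1, and the name `schedule` and the stage labels are chosen here for readability.

```python
STAGES = ("ME-1/MC-1", "ME-2/MC-2", "MC-3", "MC-4")
LOOPS = 64

def schedule(num_clocks, num_blocks=4):
    """Map each clock to the (process, loop number, stage) entries active in it."""
    table = {}
    for ck in range(num_clocks):
        row = []
        for blk in range(num_blocks):
            t = ck - blk                      # clocks elapsed since this block started
            if 0 <= t < LOOPS * len(STAGES):
                row.append((f"F{blk + 1}", t // len(STAGES), STAGES[t % len(STAGES)]))
        table[ck] = row
    return table

if __name__ == "__main__":
    for ck, row in schedule(6).items():
        print(f"CK{ck}:", row)
```

With four blocks and four stages, every stage is occupied by exactly one block from CK3 onward, so one loop completes per clock and all four intermediate hash values are ready after 4 × 64 + 3 = 259 clocks in this model.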

Although the number of memory units provided in each of the WRAM 21 and the HRAM 22 is four in this embodiment, it is not limited to this. Each of the WRAM 21 and the HRAM 22 should be provided with as many memory units as the number of clocks required for one loop.

(shift buffer)
FIG. 6 is a block diagram showing a schematic configuration of the shift buffers 23 and 24.

As shown in FIG. 6, the shift buffer 23 has a configuration including four shift registers 40 to 43 and a multiplexer 44. The shift register 40 stores four words (W(j-16), W(j-15), W(j-7), and W(j-2)) from the memory unit WM1 of the WRAM 21. Similarly, the shift registers 41 to 43 store the four words from the memory units WM2 to WM4 of the WRAM 21, respectively. The multiplexer 44 selects one of the shift registers 40 to 43 and inputs the four words of the selected shift register to the ME calculation unit 30 of the ALU 20.

The shift buffer 24 has a configuration including four shift registers 45 to 48 and a multiplexer 49. The shift registers 45 to 48 store the initial hash values H₀ to H₇ from the memory units HM1 to HM4 of the HRAM 22, respectively. The multiplexer 49 selects one of the shift registers 45 to 48, and inputs the initial hash values H₀ to H₇ of the selected shift register to the MC calculation unit 31 of the ALU 20 via the multiplexer 32 as the variables a to h. The multiplexer 49 also sends the initial hash values H₀ to H₇ of the selected shift register to the adder 27.

FIG. 7 is a timing chart showing an example of the flow of processing in the shift buffer 23. As shown in FIG. 7, in the shift register 40, the four words used by the ME calculation unit 30 in a certain loop are sequentially read out from the memory unit WM1 at the clocks CK10 to CK13. Then, at clock CK13, the stored four words are input to the ME calculation unit 30 of the ALU 20 via the multiplexer 44. This initiates a certain loop of the process F1 for the first message block M described above.

Next, in the shift register 40, the four words to be used by the ME calculation unit 30 in the next loop are sequentially read out at the clocks CK14 to CK17. Then, at clock CK17, the stored four words are input to the ME calculation unit 30 of the ALU 20 via the multiplexer 44. Thereafter, the above operation is repeated.

On the other hand, in the shift register 41, the four words used by the ME calculation unit 30 in a certain loop are sequentially read from the memory unit WM2 at the clocks CK11 to CK14. Then, at clock CK14, the stored four words are input to the ME calculation unit 30 of the ALU 20 via the multiplexer 44. This initiates a certain loop of the process F2 for the second message block M described above.

Similarly, in the shift register 42, the four words sequentially stored from the memory unit WM3 are input to the ME calculation unit 30 of the ALU 20 via the multiplexer 44 at the clock CK15. This initiates a certain loop of the process F3 for the third message block M described above. In the shift register 43, the four words sequentially stored from the memory unit WM4 are input to the ME calculation unit 30 of the ALU 20 via the multiplexer 44 at the clock CK16. This initiates a certain loop of the process F4 for the fourth message block M described above. Then, at clock CK17, the four words sequentially stored from the memory unit WM1 are input to the ME calculation unit 30 of the ALU 20 via the multiplexer 44. This starts the next loop of the process F1 for the first message block M.

Therefore, the shift buffer 23 of the present embodiment can sequentially read data required by the ME calculation unit 30 from the WRAM 21 and output them all at once to the ME calculation unit 30 at a predetermined timing. This eliminates the need for the ME calculation unit 30 to wait to read data from the WRAM 21 . As a result, processing efficiency in the processing elements can be further improved. Further, since the shift buffer 23 includes the shift registers 40 to 43 corresponding to the memory units WM1 to WM4 of the WRAM 21, it is possible to avoid the delay in processing speed due to the provision of only one shift register. Since the shift buffer 24 is similar to the shift buffer 23, the description thereof will be omitted.

FIG. 8 is a block diagram showing a schematic configuration of the shift buffers 25 and 26.

As shown in FIG. 8, the shift buffer 25 has a configuration including four shift registers 50 to 53 and a demultiplexer 54. The demultiplexer 54 outputs the word W(j) calculated by the ME calculation unit 30 to one of the shift registers 50 to 53. The shift registers 50 to 53 store the words W(j) for the first to fourth message blocks M via the demultiplexer 54, respectively. The shift registers 50 to 53 write the stored word W(j) to the memory units WM1 to WM4, respectively.

The shift buffer 26 has a configuration including four shift registers 55 to 58 and a demultiplexer 59. The demultiplexer 59 outputs the variables a to h of the final loop calculated by the MC calculation unit 31 to one of the shift registers 55 to 58. The shift registers 55 to 58 store the final-loop variables a to h for the first to fourth message blocks M via the demultiplexer 59, respectively. The variables a to h stored in the shift registers 55 to 58 are added in the adder 27 to the initial hash values H₀ to H₇ from the shift registers 45 to 48, respectively, and the resulting intermediate hash values HO₀ to HO₇ are written to the memory units HM1 to HM4.

FIG. 9 is a timing chart showing an example of the flow of processing in the shift buffer 26. As shown in FIG. 9, in the shift register 55, the variables a to h of the final loop for the first message block, calculated by the MC calculation unit 31 of the ALU 20, are stored via the demultiplexer 33 at the clock CK20. Then, the variables a to h are sequentially sent to the first adder section of the adder 27 at the clocks CK21 to CK28. In the first adder section, the variables a to h are sequentially added to the initial hash values H₀ to H₇ of the first message block obtained from the shift register 45, and the resulting intermediate hash values HO₀ to HO₇ of the first message block are sequentially written to the memory unit HM1 of the HRAM 22.

On the other hand, in the shift register 56, the variables a to h of the final loop for the second message block, calculated by the MC calculation unit 31 of the ALU 20, are stored via the demultiplexer 33 at the clock CK21. Then, the variables a to h are sequentially sent to the second adder section of the adder 27 at the clocks CK22 to CK29. In the second adder section, the variables a to h are sequentially added to the initial hash values H₀ to H₇ of the second message block obtained from the shift register 46, and the resulting intermediate hash values HO₀ to HO₇ of the second message block are sequentially written to the memory unit HM2 of the HRAM 22. The same applies to the shift registers 57 and 58.

Therefore, the shift buffer 25 of this embodiment can sequentially write the data output by the ME calculation unit 30 to the WRAM 21 . This eliminates the need for the ME calculation unit 30 to wait to write the data to the WRAM 21 . As a result, processing efficiency in the PE 14 can be further improved. Further, since the shift buffer 25 includes the shift registers 50 to 53 corresponding to the memory units WM1 to WM4 of the WRAM 21, it is possible to avoid the delay in processing speed due to the provision of only one shift register. Since the shift buffer 26 is similar to the shift buffer 25, the description thereof will be omitted.

(Modification 1)
In this modification, the PE 14 calculates 4×L intermediate hash values HO ₀ to HO ₇ corresponding to 4×L (L is an integer of 2 or more) message blocks M, respectively. In this case, the WRAM 21 may be provided with L groups of four memory units WM1 to WM4. At this time, the WRAM 21 may have a storage capacity of L×4×64×32 bits=L kilobytes. Also, the HRAM 22 may be provided with L groups of four memory units HM1 to HM4. At this time, the HRAM 22 should have a storage capacity of L*4*8*32 bits=L*128 bytes.
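The capacities stated above follow directly from the word counts. A small check, with the function names `wram_bytes` and `hram_bytes` chosen here for illustration:

```python
WORD_BITS = 32
BLOCKS_PER_GROUP = 4

def wram_bytes(groups):
    """WRAM capacity: `groups` groups x 4 blocks x 64 words x 32 bits, in bytes."""
    return groups * BLOCKS_PER_GROUP * 64 * WORD_BITS // 8

def hram_bytes(groups):
    """HRAM capacity: `groups` groups x 4 blocks x 8 hash words x 32 bits, in bytes."""
    return groups * BLOCKS_PER_GROUP * 8 * WORD_BITS // 8

if __name__ == "__main__":
    for groups in (1, 2, 8):
        print(groups, wram_bytes(groups), hram_bytes(groups))
```

For one group this gives 1024 bytes of WRAM and 128 bytes of HRAM, which scale linearly with L.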

Then, the ALU 20 should execute as follows. That is, first, four intermediate hash values HO ₀ to HO ₇ corresponding to the four message blocks M of the first group are calculated. Next, four intermediate hash values HO ₀ to HO ₇ corresponding to the four message blocks M of the second group are calculated. Thereafter, the processing is repeatedly executed up to the L-th group.

In the case of this modification, a large number of intermediate hash values HO ₀ to HO ₇ corresponding to a large number of message blocks M can be calculated by one PE.

(Modification 2)
By the way, when obtaining the final hash value from the message, as described above, the intermediate hash value corresponding to a certain message block becomes the initial hash value for the next message block.

Therefore, in this modification, the PE 14 uses the four intermediate hash values corresponding to the four message blocks of a certain group as the four initial hash values for the four message blocks of the next group. In this case, the HRAM 22 only needs the four memory units HM1 to HM4, and therefore a storage capacity of 4 × 8 × 32 bits = 128 bytes.

[Embodiment 2]
Another embodiment of the invention is described with reference to FIGS. 10-12. The hash value calculation device of this embodiment performs the above-described SHA-256 calculation twice on input data (double hash). The hash value calculation device of this embodiment is suitable for, for example, bitcoin mining.

FIG. 10 is a block diagram showing an overview of double hashing in Bitcoin mining. The upper part of FIG. 10 shows the structure of the input data for each block in the Bitcoin blockchain. The lower part of FIG. 10 shows an overview of the processing for mining such a block.

(Bitcoin mining)
The Bitcoin mining process uses 1024-bit input data DAT. As shown in the upper part of FIG. 10, the input data DAT consists of a 32-bit version value, a 256-bit hash value of the previous block (the hash value obtained when the blockchain was last extended), a 256-bit Merkle root hash value, a 32-bit timestamp, a 32-bit target, a 32-bit nonce, and 384-bit padding.

First, the input data DAT is divided into two 512-bit message blocks M1 and M2. Next, the hash function SHA-256(1) is calculated using the first message block M1 and the initial hash value to obtain the first hash value Ha-1. This process may be executed by the PE 14 of the hash value calculation device in FIGS. 1 to 9, or may be executed by another calculation device. Note that the initial hash value is a constant.

Next, the hash function SHA-256(2) is calculated using the 256-bit first hash value Ha-1 and the second message block M2 to obtain the second hash value Ha-2. Next, the 256-bit second hash value Ha-2 is padded to create a 512-bit third message block M3. Next, the hash function SHA-256(3) is calculated using the third message block M3 and the initial hash value to obtain the final hash value Ha-F. In other words, double hashing is performed using the first hash value Ha-1 and the second message block M2 to obtain the final hash value Ha-F.
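The three calculations SHA-256(1) to SHA-256(3) can be reproduced in software as three calls to a single block-compression function. The following is a minimal sketch, not the hardware of this embodiment: the names `compress`, `double_hash`, `K`, and `IV` are chosen here for illustration, and the round constants are derived from prime roots as defined in FIPS 180-4 rather than typed in. The final assertion compares the result against Python's `hashlib`.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF


def _primes(n):
    """First n primes by trial division."""
    ps, c = [], 2
    while len(ps) < n:
        if all(c % p for p in ps):
            ps.append(c)
        c += 1
    return ps


def _icbrt(x):
    """Floor of the integer cube root, by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# FIPS 180-4: fractional bits of the cube / square roots of the first primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
IV = [isqrt(p << 64) & MASK for p in _primes(8)]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """One SHA-256 block: message expansion (ME), then compression (MC)."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def double_hash(dat):
    """dat: 128 bytes holding message blocks M1 and M2 (padding included)."""
    m1, m2 = dat[:64], dat[64:]
    ha1 = compress(IV, m1)                       # SHA-256(1)
    ha2 = compress(ha1, m2)                      # SHA-256(2)
    digest2 = b"".join(x.to_bytes(4, "big") for x in ha2)
    # M3 = Ha-2 as W0..W7, followed by padding for a 256-bit message as W8..W15.
    m3 = digest2 + b"\x80" + b"\x00" * 23 + (256).to_bytes(8, "big")
    ha_f = compress(IV, m3)                      # SHA-256(3)
    return b"".join(x.to_bytes(4, "big") for x in ha_f)


header = bytes(range(80))  # arbitrary 80-byte test header
dat = header + b"\x80" + bytes(39) + (640).to_bytes(8, "big")
assert double_hash(dat) == hashlib.sha256(hashlib.sha256(header).digest()).digest()
```

Because Ha-1 depends only on M1, it stays the same while the nonce in M2 changes, which is why only SHA-256(2) and SHA-256(3) need to be repeated for each candidate nonce.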

If the final hash value Ha-F is equal to or greater than a predetermined threshold, the nonce is determined to be incorrect, the nonce is changed (specifically, incremented by 1), and the above processing is repeated. On the other hand, if the final hash value Ha-F is smaller than the predetermined threshold, the nonce is determined to be correct and a new block is generated for the blockchain.
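The nonce search described above can be sketched in software as follows. The function names `final_hash` and `search_nonce`, the 76-byte header prefix, and the interpretation of the digest as a little-endian integer are assumptions for illustration (the last follows the usual Bitcoin convention; the text only says that Ha-F is compared with a threshold). The library `hashlib` stands in for the PE.

```python
import hashlib
import struct


def final_hash(header76, nonce):
    """Double SHA-256 of a 76-byte header prefix followed by a 32-bit nonce."""
    header = header76 + struct.pack("<I", nonce)
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def search_nonce(header76, threshold, start=0, limit=1 << 32):
    """Return (nonce, Ha-F) for the first nonce whose hash is below threshold.

    Returns None if no nonce in [start, limit) qualifies.
    """
    for nonce in range(start, limit):
        ha_f = final_hash(header76, nonce)
        # Assumption: the digest is compared as a little-endian integer.
        if int.from_bytes(ha_f, "little") < threshold:
            return nonce, ha_f      # nonce is correct: a new block can be generated
        # Otherwise the nonce is incorrect: increment by 1 and repeat.
    return None


# Toy threshold: roughly one hash in 256 qualifies.
result = search_nonce(bytes(76), 1 << 248, limit=1 << 20)
```

A real target is far smaller than this toy threshold, so the loop body runs a very large number of times, which is the motivation for executing it in dedicated hardware.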

(Processing element)
The hash value calculation device 10 of this embodiment differs from the hash value calculation devices shown in FIGS. 1 to 9 in the configuration of the PE, and the other configurations are the same.

FIG. 11 is a block diagram showing a schematic configuration of the PE 16 in the hash value calculation device 10 of this embodiment. The PE 16 of this embodiment computes the hash functions SHA-256(2) and SHA-256(3) described above. The first hash value Ha-1 and the second message block M2 used in the hash function SHA-256(2) are pre-stored in the GRAM 12 (see FIG. 3).

In addition, the PE 16 of this embodiment schematically includes two sets of the ALU 20, WRAM 21, HRAM 22, shift buffers 23 to 26, and adder 27 of the PE 14 described above.

Specifically, the PE 16 of this embodiment includes a first set of ALU 60, WRAM 61, HRAM 62, shift buffers 63-66, and adder 67, and a second set of ALU 70, WRAM 71, HRAM 72, shift buffers 73-76, and an adder 77 .

The ALUs 60 and 70, WRAMs 61 and 71, HRAMs 62 and 72, shift buffers 63-66 and 73-76, and adders 67 and 77 have the same functions as the ALU 20, WRAM 21, HRAM 22, shift buffers 23-26, and adder 27 of the PE 14, respectively, so their description is omitted.

In the first set, the second message block M2 from the GRAM 12 is stored in the WRAM 61, and the first hash value Ha-1 from the GRAM 12 is stored in the HRAM 62. Thereby, the adder 67 calculates the second hash value Ha-2.

In the second set, the second hash value Ha-2 from the first set is stored in the WRAM 71 as words W₀ to W₇ of the third message block M3, and the padding from the GRAM 12 is stored in the WRAM 71 as words W₈ to W₁₅ of the third message block M3. Also, the initial hash value from the GRAM 12 is stored in the HRAM 72. Thereby, the adder 77 calculates the final hash value Ha-F.

Therefore, the above double hashing can be realized by the PE 16 of this embodiment, and the PE 16 of this embodiment can be incorporated into a system that uses double hashing.

(Updating unit)
The PE 16 of this embodiment further comprises an updating unit 68 in the first set. The updating unit 68 updates the nonce stored in the WRAM 61.

FIG. 12 is a block diagram showing details of the updating unit 68. As shown in FIG. 12, the updating unit 68 comprises an incrementer 68a and a multiplexer 68b.

The nonce contained in the second message block M2 is stored in the WRAM 61 as word W3. Therefore, in the 19th loop, the incrementer 68a acquires the word W3 input from the shift buffer 63 to the ME operation unit 30 of the ALU 60 and increments it by one. The incrementer 68a sends the incremented word W3 to the multiplexer 68b.

The 19th loop is the last time the word W3 is used. That is, the word W3 is updated once its use in the ME operation unit 30 has been completed by the above input. Note that the updating unit 68 can update the word W3 at any time between the end of the use of the word W3 in the ME operation unit 30 of the ALU 60 and the end of all loop processing of the ME operation unit 30.

The multiplexer 68b writes the words W16 to W63 obtained from the ALU 60 through the shift buffer 65 to the WRAM 61, and writes the word W3 incremented by the incrementer 68a to the WRAM 61 as a new word W3.

Thus, when all loops are completed, the WRAM 61 holds a new word W3. As a result, immediately after that completion, the PE 16 can again run the loop processing on the second message block M2 including the new nonce and calculate a new final hash value Ha-F. Since there is no need to acquire the second message block M2 containing a new nonce from the GRAM 12, the above calculation can be performed quickly. Also, the PE 16 can compute multiple final hash values Ha-F from multiple second message blocks M2 each containing a different nonce, and there is no limit to the number of nonces that the PE 16 can use.
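Why the 19th loop is the last use of W3 follows from the SHA-256 message expansion, in which loop t reads the words at indices t-2, t-7, t-15, and t-16. The sketch below models the WRAM as a Python list and performs the write-back of W3 + 1 after loop 19. The names `last_use` and `expand_and_update` are hypothetical, and the wrap-around of the nonce at 32 bits is an assumption not stated in the text.

```python
MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def last_use(index):
    """Last loop t (16..63) in which word W[index] is read by the expansion."""
    return max(t for t in range(16, 64)
               if index in (t - 2, t - 7, t - 15, t - 16))


def expand_and_update(wram, nonce_index=3):
    """Expand W16..W63 in place and bump the nonce word after its last use.

    wram: list of 64 words whose first 16 entries hold message block M2.
    """
    update_at = last_use(nonce_index)
    for t in range(16, 64):
        s0 = rotr(wram[t - 15], 7) ^ rotr(wram[t - 15], 18) ^ (wram[t - 15] >> 3)
        s1 = rotr(wram[t - 2], 17) ^ rotr(wram[t - 2], 19) ^ (wram[t - 2] >> 10)
        wram[t] = (wram[t - 16] + s0 + wram[t - 7] + s1) & MASK
        if t == update_at:
            # Role of incrementer 68a and multiplexer 68b: write back W3 + 1.
            # Assumption: the nonce wraps around at 32 bits.
            wram[nonce_index] = (wram[nonce_index] + 1) & MASK
    return wram


print(last_use(3))  # 19
```

Since W3 is not read after loop 19, overwriting it at that point leaves W16 to W63 of the current pass unchanged while preparing the message block for the next pass.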

Further, even when calculating L second hash values Ha-2 from L second message blocks M2 each containing L different nonces, there is no need to increase the storage capacity of the WRAM 61 by L times. That is, the WRAM 61 should have a storage capacity of 4*64*32 bits=1 kilobyte.

The WRAM 71 of the second set may store the second hash value Ha-2 while sequentially updating it. Therefore, like the WRAM 61, the WRAM 71 only needs a storage capacity of 4*64*32 bits = 1 kilobyte. Also, the HRAMs 62 and 72 do not need to change the stored first hash value Ha-1 and initial hash value, respectively. Therefore, the HRAMs 62 and 72 only need a storage capacity of 4*8*32 bits = 128 bytes.
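The capacity figures above follow from simple arithmetic, sketched here with four memory units, 32-bit words, 64 words per WRAM unit, and 8 words per HRAM unit. The constant and function names are chosen for illustration.

```python
WORD_BITS = 32
MEMORY_UNITS = 4  # four memory units per memory, as in the description


def capacity_bytes(words_per_unit):
    """Storage capacity in bytes for MEMORY_UNITS units of 32-bit words."""
    return MEMORY_UNITS * words_per_unit * WORD_BITS // 8


wram_bytes = capacity_bytes(64)  # words W0..W63 per unit
hram_bytes = capacity_bytes(8)   # eight 32-bit hash words per unit
print(wram_bytes, hram_bytes)    # 1024 128
```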

(Determination unit)
As shown in FIG. 11, the PE 16 of this embodiment further includes a determination unit 78 in the second set. The determination unit 78 determines whether the final hash value Ha-F obtained from the ALU 70 via the shift buffer 76 is smaller than the target value.

If the final hash value Ha-F is equal to or greater than the target value, the determination unit 78 determines that the final hash value Ha-F is not a valid hash value and discards the final hash value Ha-F. On the other hand, when the final hash value Ha-F is smaller than the target value, the determination unit 78 determines that the final hash value Ha-F is a valid hash value. At this time, the determination unit 78 outputs the final hash value Ha-F and the nonce corresponding to the final hash value Ha-F to the GRAM 12. Then, the determination unit 78 instructs all the PEs 16 to end their operations.

Therefore, the PE 16 of this embodiment successively updates the nonce included in the second message block M2, successively calculates the second hash value Ha-2, successively calculates the final hash value Ha-F, and successively determines whether the final hash value Ha-F is a valid hash value. As a result, bitcoin mining can be efficiently executed.

Note that when the determination unit 78 determines that the final hash value Ha-F is a valid hash value, the determination unit 78 may instruct the controller 11 to end the operation of all the PEs 16, or may instruct all the PEs 16 directly. As a result, the subsequent unnecessary mining process can be avoided.

(Additional notes)
Although the hash value calculation device 10 of the present embodiment is applied to mining of bitcoin, it can also be applied to mining of other blockchains. Further, although the hash value calculation device 10 of the present embodiment is applied to the double hash calculation, it can also be applied to the case of executing the calculation of the hash function three times or more.

[Embodiment 3]
Another embodiment of the invention will be described with reference to FIG.

FIG. 13 is a block diagram showing the schematic configuration of the embedded system of this embodiment. As shown in FIG. 13, an embedded system 80 includes a processor 81, a DDR (Double Data Rate) memory 82, an AMBA (Advanced Microcontroller Bus Architecture) bus 83, the hash value calculation device 10, and an AXI bus 84.

The processor 81 includes a CPU, cache, and memory management unit (MMU). Processor 81 and DDR memory 82 are connected via AMBA bus 83 . The processor 81 and hash value calculation device 10 are also connected via an AXI bus 84 . As shown in FIG. 13, the hash value calculation device 10 shown in FIGS. 1 to 12 can be incorporated into an embedded system.

[Example of realization by software]
The control blocks (especially the ALUs 20, 60, and 70) of the hash value calculation device 10 may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software.

In the latter case, the hash value calculation device 10 is equipped with a computer that executes the instructions of a program, which is software that implements each function. This computer includes, for example, one or more processors and a computer-readable recording medium storing the program. In the computer, the processor reads the program from the recording medium and executes it, thereby achieving the object of the present invention. As the processor, for example, a CPU (Central Processing Unit) can be used. As the recording medium, a "non-transitory tangible medium" such as a ROM (Read Only Memory), a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. In addition, a RAM (Random Access Memory) or the like into which the program is loaded may be further provided. Also, the program may be supplied to the computer via any transmission medium (communication network, broadcast wave, etc.) capable of transmitting the program. Note that one aspect of the present invention can also be implemented in the form of a data signal embedded in a carrier wave in which the program is embodied by electronic transmission.

The present invention is not limited to the above-described embodiments and can be modified in various ways within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

For example, in the above embodiment, the value of SHA-256 is obtained. The present invention can be applied to any hash value calculation device that compresses .

[Summary]
A processing element according to one aspect of the present invention is a processing element that calculates an intermediate hash value from a message block of a predetermined bit length, and comprises: an expansion unit that repeats loop processing on the message block to expand it into a bit sequence longer than the bit length; and a compression unit that repeats loop processing on the expanded bit sequence to compress it into the intermediate hash value. The expansion unit and the compression unit execute one loop process of the expansion unit and part of one loop process of the compression unit in parallel. The compression unit then uses the word calculated by that loop process of the expansion unit to execute the remaining part of its loop process, which is not executed in parallel.

According to the above configuration, in each loop process, the arithmetic processing of the decompression unit and part of the arithmetic processing of the compression unit can be executed in parallel. As a result, the number of clocks required to execute each loop process can be reduced.
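To illustrate why part of a compression loop can overlap with the expansion loop, the sketch below splits one SHA-256 round into the work that does not depend on the word W_t and the work that does. This particular split, and the names `round_reference`, `round_part_without_w`, and `round_finish`, are assumptions made for illustration; the text does not specify which operations the compression unit executes in parallel.

```python
import random

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def big_sigma0(a):
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def big_sigma1(e):
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


def ch(e, f, g):
    return (e & f) ^ (~e & g & MASK)


def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def round_reference(v, k, w):
    """One standard SHA-256 compression round."""
    a, b, c, d, e, f, g, h = v
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def round_part_without_w(v, k):
    """Work that needs no W_t; it could overlap with computing W_t."""
    a, b, c, d, e, f, g, h = v
    partial_t1 = (h + big_sigma1(e) + ch(e, f, g) + k) & MASK
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    return partial_t1, t2


def round_finish(v, partial, w):
    """Remaining work, executed once the expansion has produced W_t."""
    a, b, c, d, e, f, g, h = v
    partial_t1, t2 = partial
    t1 = (partial_t1 + w) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


rng = random.Random(0)
v = tuple(rng.getrandbits(32) for _ in range(8))
k, w = rng.getrandbits(32), rng.getrandbits(32)
assert round_finish(v, round_part_without_w(v, k), w) == round_reference(v, k, w)
```

In this sketch only two additions remain on the path that waits for W_t, which gives an idea of how overlapping the two units can shorten each loop.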

The expansion unit uses words calculated by one loop process in another loop process, and the compression unit uses variables calculated by one loop process in the next loop process. Then, the intermediate hash value is calculated by completing all the loop processes.

Therefore, it is possible to calculate the intermediate hash value from the message block with only one processing element. As a result, the number of processing elements required to determine the final hash value from the message can be reduced.

The processing element according to this aspect may further comprise an expansion memory that stores the message block and the words, and a compression memory that stores an initial hash value used in the first loop processing of the compression unit. According to the above configuration, it is not necessary to write the words to an external memory or to read the message block, the initial hash value, and the words from the external memory. Therefore, the processing efficiency of the processing element can be improved.

In the processing element according to this aspect, the compression unit may update the initial value of the hash value stored in the compression memory with the intermediate hash value. According to the above configuration, by updating the message block stored in the expansion memory with the next message block, the processing element can calculate the next intermediate hash value. By repeating this, only one processing element can obtain the final hash value from the message.

In the processing element according to this aspect, each of the expansion memory and the compression memory may include as many memory units as the number of clocks required for the compression unit to execute one loop process. According to the above configuration, a plurality of message blocks are stored in the plurality of memory units of the expansion memory, and the initial hash values corresponding to the plurality of message blocks are stored in the plurality of memory units of the compression memory. As a result, the loop processes for the plurality of message blocks can be executed within the above number of clocks. That is, one loop process can be executed per clock. As a result, processing efficiency in the processing element can be further improved.

The processing element according to this aspect may further comprise an input buffer for temporarily storing data input from the expansion memory to the expansion unit, and an input buffer for temporarily storing data input from the compression memory to the compression unit.

According to the above configuration, the input buffer can sequentially read out the data required by the expansion section from the expansion memory and simultaneously output the data to the expansion section at a predetermined timing. This eliminates the need for the expansion unit to wait for reading the data from the expansion memory. Further, the input buffer can sequentially read the data required by the compression section from the compression memory and output them all at once to the compression section at a predetermined timing. This eliminates the need for the compression unit to wait to read the data from the compression memory. As a result, processing efficiency in the processing elements can be further improved.

It should be noted that when each of the expansion memory and the compression memory includes a plurality of memory units, the input buffer preferably includes buffer units corresponding to the plurality of memory units.

The processing element according to this aspect may further comprise an output buffer for temporarily storing data to be output from the expansion unit to the expansion memory, and an output buffer for temporarily storing data to be output from the compression unit to the compression memory.

In this case, the output buffer can sequentially write the data output by the expansion unit to the expansion memory. This eliminates the need for the expansion unit to wait for writing the data in the expansion memory. Further, the output buffer can sequentially write the data output from the compression unit to the compression memory. This eliminates the need for the compression unit to wait for writing the data in the compression memory. As a result, processing efficiency in the processing elements can be further improved.

When the expansion memory and the compression memory each include a plurality of memory units, it is preferable that the output buffer includes buffer units corresponding to the plurality of memory units.

The processing element according to this aspect may include a plurality of sets of the expansion unit, the compression unit, the expansion memory, and the compression memory, and the compression unit of one set may write the compressed intermediate hash value to the expansion memory of another set as part of the message block of that other set.

In this case, a single processing element can be used to execute a hash function on the message block multiple times, such as double hashing. As a result, the processing element can be applied to systems that utilize double hashes, such as Bitcoin.

In the processing element according to this aspect, the number of sets may be two, and the processing element may further comprise an updating unit that updates a predetermined range of words in the message block stored in the expansion memory of the first set when the use of those words in the expansion unit of the first set has ended. According to the above configuration, it is possible to successively update the predetermined range of words in the message block and successively calculate intermediate hash values. As a result, bitcoin mining can be efficiently executed. Note that the updating unit can update the words at any time between the end of their use in the expansion unit of the first set and the end of all loop processing of the first set.

The processing element according to this aspect may further comprise a determination unit that treats the intermediate hash value compressed by the compression unit of the second set as a final hash value and determines whether the final hash value is a valid hash value. According to the above configuration, the compression unit of the second set can successively calculate final hash values using the intermediate hash values successively calculated by the compression unit of the first set. Accordingly, the determination unit can determine whether the successively calculated final hash values are valid hash values. As a result, bitcoin mining can be performed more efficiently.

When determining that the final hash value is a valid hash value, the determination unit may notify an external device of the final hash value and the predetermined range of words corresponding to the final hash value, and may terminate the operation of the processing element. This eliminates the need to execute the subsequent unnecessary mining process.

The processing elements according to each aspect of the present invention may be realized by a computer. In this case, the processing elements are realized by the computer by operating the computer as each part (software element) included in the processing elements. A control program for a processing element and a computer-readable recording medium recording it are also included in the scope of the present invention.

According to the above configuration, the above effects can be achieved. Further, in bitcoin mining, the control unit sets the word change range for each processing element, so that the mining can be executed in parallel by a plurality of the processing elements and can therefore be executed more efficiently.

In this case, when a final hash value determined to be a valid hash value and the predetermined range of words corresponding to that final hash value are acquired from a certain processing element, the control unit should instruct all processing elements to end their operation. This eliminates the need to execute the subsequent unnecessary mining process.
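The division of the nonce space among processing elements and the early termination described above can be simulated in software as follows. This is a sequential round-robin model, not a description of the actual control unit: the names `nonce_ranges` and `mine`, the contiguous partitioning, the per-PE budget, and the little-endian comparison are assumptions for illustration, and `hashlib` stands in for each PE.

```python
import hashlib
import struct


def nonce_ranges(num_pes, space=1 << 32):
    """Split the nonce space into contiguous, non-overlapping ranges, one per PE."""
    step = -(-space // num_pes)  # ceiling division
    return [range(min(i * step, space), min((i + 1) * step, space))
            for i in range(num_pes)]


def mine(header76, threshold, num_pes, budget_per_pe):
    """Round-robin simulation of the PEs; stops everything at the first valid hash.

    Returns (pe_index, nonce, final_hash), or None if the budget is exhausted.
    """
    iters = [iter(r) for r in nonce_ranges(num_pes)]
    for _ in range(budget_per_pe):
        for pe, it in enumerate(iters):
            nonce = next(it, None)
            if nonce is None:
                continue  # this PE has exhausted its range
            header = header76 + struct.pack("<I", nonce)
            ha_f = hashlib.sha256(hashlib.sha256(header).digest()).digest()
            # Assumption: the digest is compared as a little-endian integer.
            if int.from_bytes(ha_f, "little") < threshold:
                # The control unit would now instruct every PE to stop.
                return pe, nonce, ha_f
    return None


found = mine(bytes(76), 1 << 250, num_pes=4, budget_per_pe=1 << 12)
```

Returning from `mine` as soon as one simulated PE succeeds plays the role of the instruction to end the operation of all processing elements.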

According to the above method, the same effects as those of the above processing elements can be achieved.

10 Hash value calculation device (processing device)
11 Controller (control unit)
12 GRAM
13 Bus
14, 16 PE
15 Bus interface
20, 60, 70 ALU
21, 61, 71 WRAM (expansion memory)
22, 62, 72 HRAM (compression memory)
23, 24, 63, 64, 73, 74 Shift buffers (input buffers)
25, 26, 65, 66, 75, 76 Shift buffers (output buffers)
27, 67, 77 Adder
30 ME operation unit (expansion unit)
31 MC operation unit (compression unit)
32, 44, 49, 68b Multiplexers
33, 54, 59 Demultiplexers
40-43, 45-48 Shift registers (input buffers)
50-53, 55-58 Shift registers (output buffers)
68 Updating unit
68a Incrementer
78 Determination unit
80 Embedded system
81 Processor
82 DDR memory
83 AMBA bus
84 AXI bus

Claims

1. A processing element that calculates an intermediate hash value from a message block of a predetermined bit length, comprising:
an expansion unit that repeats loop processing on the message block and expands it into a bit sequence longer than the bit length; and
a compression unit that repeats loop processing on the expanded bit sequence and compresses it to the intermediate hash value,
wherein the expansion unit and the compression unit execute in parallel one loop processing of the expansion unit and part of one loop processing of the compression unit, and
the compression unit uses the word calculated by the one loop processing of the expansion unit to execute the remaining processing, not executed in parallel, of the one loop processing of the compression unit.
2. The processing element according to claim 1, further comprising:
an expansion memory for storing the message block and the words; and
a compression memory for storing an initial hash value used in the first loop processing of the compression unit.
3. The processing element according to claim 2, wherein the compression unit updates the initial hash value stored in the compression memory with the intermediate hash value.
4. The processing element according to claim 2 or 3, wherein each of the expansion memory and the compression memory comprises as many memory units as the number of clocks necessary for the compression unit to execute one loop processing.
5. The processing element according to any one of claims 1 to 4, further comprising an input buffer for temporarily storing data input from the expansion memory to the expansion unit, and an input buffer for temporarily storing data input from the compression memory to the compression unit.
6. The processing element according to any one of the preceding claims, further comprising an output buffer for temporarily storing data to be output from the expansion unit to the expansion memory, and an output buffer for temporarily storing data to be output from the compression unit to the compression memory.
7. The processing element according to any one of claims 2 to 6, comprising a plurality of sets of the expansion unit, the compression unit, the expansion memory, and the compression memory,
wherein the compression unit of one set writes the compressed intermediate hash value into the expansion memory of another set as part of the message block of the other set.
8. The processing element according to claim 7, wherein the number of the sets is two,
the processing element further comprising an updating unit that updates a predetermined range of words in the message block stored in the expansion memory of the first set when use of the words in the expansion unit of the first set ends.
9. The processing element according to claim 8, further comprising a determination unit that treats the intermediate hash value compressed by the compression unit of the second set as a final hash value and determines whether the final hash value is a valid hash value.
10. A processing device comprising:
a plurality of processing elements according to any one of claims 1 to 9; and
a control unit that controls the plurality of processing elements.
11. A control method for a processing element that calculates an intermediate hash value from a message block of a predetermined bit length, the processing element comprising an expansion unit that repeats loop processing on the message block and expands it into a bit sequence longer than the bit length, and a compression unit that repeats loop processing on the expanded bit sequence and compresses it to the intermediate hash value, the control method comprising:
causing the expansion unit and the compression unit to execute in parallel one loop processing of the expansion unit and part of one loop processing of the compression unit; and
causing the compression unit to use the word calculated by the one loop processing of the expansion unit to execute the remaining processing, not executed in parallel, of the one loop processing of the compression unit.
12. A control program for causing a computer to function as the processing element according to claim 1, the control program causing the computer to function as the expansion unit and the compression unit.