WO2004063842A2 - Flexible hardware implementation of hash functions - Google Patents

Flexible hardware implementation of hash functions

Info

Publication number
WO2004063842A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
hash function
word
memory
logical
Prior art date
Application number
PCT/IL2004/000050
Other languages
French (fr)
Other versions
WO2004063842A3 (en
Inventor
Isaac Hadad
Original Assignee
Discretix Technologies Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Discretix Technologies Ltd. filed Critical Discretix Technologies Ltd.
Publication of WO2004063842A2 publication Critical patent/WO2004063842A2/en
Publication of WO2004063842A3 publication Critical patent/WO2004063842A3/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/12Details relating to cryptographic hardware or logic circuitry

Definitions

  • the present invention relates to hardware implementations of hash functions. More particularly, the invention relates to a flexible hardware implementation of various hash function algorithms in a single module.
  • the present invention is directed to hardware implementation of various types of hash functions derived from the MD-5 hash function algorithm, such as SHA-1 and SHA-2, which are commonly used in digital signature applications.
  • Digital Signatures (DS) are commonly used for the authentication of electronic data, which is a key component in almost any secure data communication. The DS is particularly important in electronic commerce, where it is used to guarantee the identification of the participating entities and the authentication of the transmitted data.
  • a DS is a unique binary sequence which is used to identify information (message), and the secret key of the source from which the information originated.
  • a hash function is utilized to produce a unique identifier (also known as message digest) based on the message content. This identifier is encrypted utilizing the private key owned by the message originator, in this way providing for both the source identity and the integrity of the message.
  • Hardware implementations are less common, but they substantially improve the efficiency and security of hash function implementations. Hardware implementations are particularly attractive due to their high-speed operation and improved power-saving features, as well as for being compact for packaging, particularly when implemented in a single chip. Such implementations are of particular importance in applications where precious CPU processing time is required to perform other tasks which should not be interrupted (e.g., cellular phones). In such applications it is preferable to use hardware modules instead of software implementations, whenever possible, to reduce the CPU processing load.
  • a hash function tester is described in US 5,623,545, wherein a Hash Algorithm Accelerator Module (also referred to as "SHA Accelerator") is utilized to implement the SHA-1 algorithm.
  • the SHA Accelerator is capable of digesting a 512-bit block and outputting the digested result. It therefore requires sequentially loading the message blocks via an external data bus, and it is not capable of hashing a complete message in a single module.
  • Hardware implementations of hash functions are advantageous over their software and software/hardware hybrid implementations in many ways, as was discussed hereinabove. There are, however, no known hardware implementations capable of hashing a message utilizing a single hardware module, wherein CPU intervention is required only for the zero padding (not always required) of the last message block.
  • the prior art also fails to provide hash function hardware modules capable of carrying out the computation of more than one hash algorithm in a single hardware module.
  • Hash function algorithms usually involve sequences of arithmetic and logical iterations, as defined hereinbelow:
  • Bitwise logical operations: AND - bitwise logical "and" of (X AND Y) is designated herein by $X \wedge Y$; OR - bitwise logical "inclusive-or" of (X OR Y) is designated herein by $X \vee Y$; XOR - bitwise logical "exclusive-or" of (X XOR Y) is designated herein by $X \oplus Y$; and NOT - logical "complement" of X is designated herein by $\overline{X}$.
  • the modular addition designated by $X + Y$ represents the modular addition of the corresponding integer values modulus $2^{32}$, i.e., $(X + Y) \bmod 2^{32}$.
  • byte and word are used herein to refer to 8 bits and 32 bits integer values, respectively (i.e., one word consists of 4 bytes). Words and bytes values are also represented in a hexadecimal form for convenience.
  • the string "0x" preceding a hexadecimal sequence is used to designate hexadecimal values, e.g., the decimal value 1518500249 corresponds to the hexadecimal value 0x5A827999.
  • by the term permutation it is meant to refer to manipulating the bits of one or more data words, e.g., cyclic bit rotation, XORing, etc.
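The notation above can be tried out with a minimal Python sketch. The helper names `add32`, `not32`, and `rol` are illustrative and do not appear in the patent; the only assumption is that a word is a 32-bit unsigned integer, as defined above.

```python
MASK32 = 0xFFFFFFFF  # a word is 32 bits


def add32(x: int, y: int) -> int:
    """Modular addition of two words, modulus 2**32."""
    return (x + y) & MASK32


def not32(x: int) -> int:
    """Bitwise complement restricted to one 32-bit word."""
    return ~x & MASK32


def rol(x: int, n: int) -> int:
    """Cyclic left rotation of a 32-bit word by n bit positions."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


x, y = 0xFFFFFFFF, 0x00000002
print(hex(add32(x, y)))                                # the sum wraps around modulo 2**32
print(hex(x & y), hex(x | y), hex(x ^ y), hex(not32(y)))
print(hex(1518500249))                                 # the decimal/hexadecimal pair quoted above
```

Masking with `0xFFFFFFFF` after every operation is what stands in for the fixed 32-bit width of a hardware register.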
  • the present invention is directed to a hash function module for carrying out hash function computations of at least two different hash function algorithms.
  • the hash function module comprises: a read-write memory for storing blocks of data $M_u$ of a padded message M, and at least intermediate results; an accumulating device for storing at least a word of data and outputting the same; an adder capable of producing modular addition of at least two data words, one of which is output from the accumulating device; an exclusive-or (XOR) circuitry capable of producing the logical XOR result of at least two words of data, one of which is output from the accumulating device; one or more cyclic bit rotation device(s), each of which is capable of carrying out one or more cyclic bit rotation(s) of a word of data that is input from the accumulating device or from the read-write memory; a first arbitration device for selecting a value which can be retrieved from the read-write memory, XOR circuitry,
  • the logical function circuitries can be implemented by any combination of logical gates selected from the following group:
  • the one or more cyclic bit rotation device(s) preferably include circuitry for carrying out a single cyclic bit rotation of a word of data obtained from the accumulating device, circuitry for carrying out four cyclic bit rotations of a word of data obtained from the accumulating device, circuitry for carrying out five cyclic bit rotations of a word of data obtained from the read-write memory, and circuitry for carrying out thirty cyclic bit rotations of a word of data obtained from the read-write memory.
  • the logical functions q1 to q4 can be carried out by the one or more logical function circuitries, or by a set of logical gates, wherein X, Y, and Z are words of data obtained from data registers, or alternatively from a read-write memory.
  • the hash function module may further comprise a ROM memory for storing and outputting hash function constants of one or more hash function algorithms.
  • the value output from the ROM memory can be provided as an input to the second arbitration device.
  • the hash function module may also comprise an additional arbitration device for selecting the source of data being used as input to the read-write memory, the source of data being a word of data obtained from the accumulating device, or from an external data source.
  • the hash function module is capable of carrying out hash function computations of the MD5 and SHA-1 hash function algorithms.
  • the hash function module comprises: a first set of data registers for storing words of data W[i] of a message block $M_u$ of a padded message M; a second set of data registers for storing hash function variables; a third set of data registers for storing hash function intermediate results; an accumulating device for storing at least a word of data and outputting the same; a memory device for storing hash function constants; an adder capable of producing modular addition of at least two data words; an exclusive-or (XOR) circuitry capable of producing the logical XOR result of at least two words of data, one of which is output from the accumulating device; one or more cyclic bit rotation device(s), each of which is capable of carrying out one or more cyclic bit rotation(s) of a word of data that is input from the accumulating device or from the third set of data registers; a first arbitration device for selecting a value which can be retrieved from
  • the fifth arbitration device is used for selecting a value retrieved from the second set of data registers
  • the third arbitration device is used for selecting a value retrieved from the first set of data registers, the value is being provided as input to the exclusive-or circuitry and the second arbitration device;
  • the fourth arbitration device is used for selecting a value retrieved from the accumulating device, from the fifth arbitration device, or from the memory device, the value is provided as input to the adder;
  • the second arbitration device is used for selecting a value retrieved from the sixth arbitration device, from the encoder, from the third set of data registers, or from the third arbitration device, the value being provided as input to the adder; and a control circuit for controlling the operation of the arbitration devices and the data flow in the module, thereby allowing the accumulating device to iteratively input intermediate results into the registers and generate, in the last iteration, a final result consisting of the intermediate result values obtained in the last iteration.
  • the one or more cyclic bit rotation device(s) may include circuitry for carrying out a single cyclic bit rotation of a word of data obtained from the accumulating device, circuitry for carrying out five cyclic bit rotations of a word of data obtained from the accumulating device, and circuitry for carrying out thirty cyclic bit rotations of a word of data obtained from the read-write memory.
  • the hash function module may further comprise an arbitration device for selecting the source of data being used as input to the first set of data registers, the source of data is a word of data retrieved from the accumulating device, or from an external data source.
  • An additional arbitration device may also be used for selecting the source of data being used as input to the second set of data registers, the source of data being the modular addition obtained by the adder, or a word of data obtained from an external data source.
  • the intermediate results are obtained from the second set of data registers, or from the third set of data registers, or are a permutation of the same.
  • the word of data used for carrying out one or more cyclic bit rotations is optionally obtained from the accumulating device or is the content of one of the third set of data registers.
  • Fig. 1 is a block diagram illustrating in general a hardware implementation of a hash function algorithm according to a preferred embodiment of the invention;
  • Fig. 2 is a block diagram illustrating a preferred embodiment of hash function module capable of performing the SHA-1 and MD5 hash algorithms
  • Fig. 3 illustrates another preferred embodiment of the invention for performing various types of hash functions
  • Figs. 4A-4B are flow charts illustrating the operation of the hash function module of Fig. 2;
  • Fig. 5 is a block diagram illustrating an implementation of the hash function module according to another preferred embodiment of the invention.
  • Detailed Description of Preferred Embodiments
  • MD5 hash function algorithms including the MD-5 algorithm
  • Hardware implementations of those hash algorithms allow compactly embedding them into systems in which the security and integrity of data are required.
  • Such implementations also benefit from a fast and power-saving performance in comparison to the software implementations of the same algorithms, and they are particularly attractive in view of the vast increase in electronic commerce in recent years, and the broad acceptance of mobile telecommunication.
  • A general hardware implementation of a hash function, according to a preferred embodiment of the invention, is shown in the block diagram illustrated in Fig. 1.
  • This implementation comprises a CPU 100, and a hash function module 107 which comprises a Control Block 101, a Memory Block 102, ROM 103, and an Operation Block 104.
  • the CPU 100 is not an integral part of the hash function module.
  • the data bus 108 is therefore used to transfer data between the CPU 100 and hash function module 107.
  • the Control Block 101 manages the digest operation, which is performed by providing the Operation Block 104 with a sequence of 512-bit blocks of the message M, which are fetched from the Memory Block 102.
  • the intervention of the CPU 100 is required only if zero padding of the last block is needed.
  • the communication between the CPU 100 and the hash function module 107 is performed over the data bus 108.
  • the Memory Block 102 receives data and parameters via the data bus 108, and provides the same to the Operation Block 104 for the hash function calculation.
  • the Memory Block 102 may be implemented utilizing any type of RW memory (read-write memory); preferably, it is a memory of the RAM type.
  • the digest result is stored in the Memory Block 102 and, whenever required, may be provided via the data bus 108.
  • the hash function operation is initiated and monitored by the Control Block 101, by transferring 512-bit blocks of the message M and algorithm variables (e.g., $H_j$) to the Operation Block 104, and retrieving the hash function computation results (also termed herein the digest) for storage in the Memory Block 102.
  • a ROM (read-only memory) 103 is used for storing hash function algorithm constants (e.g., $K$). Other types of memories can be used as well to implement the memory block 103.
  • SHA-1 Secure Hash Algorithm
  • DSA Digital Signature Algorithm
  • DSS Digital Signature Standard
  • NIST National Institute of Standards and Technology
  • the SHA-1 algorithm sequentially processes blocks of 512 bits when computing the message digest. Therefore, the message M is usually padded to obtain a message having a bit length which is a multiple of 512.
  • the padding of a message M is carried out by appending a "1" bit value at the end of the message, followed by "0" bit values.
  • the last 64 bits (two words) of the padded message M are reserved for indicating the original length (before padding) of the message.
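The padding rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a byte-oriented message and a big-endian 64-bit length field as in the published SHA-1 specification; the names `pad_message` and `split_blocks` are hypothetical.

```python
def pad_message(message: bytes) -> bytes:
    """Pad a byte string to a multiple of 512 bits (64 bytes)."""
    bit_length = 8 * len(message)
    padded = message + b"\x80"                          # the "1" bit followed by seven "0" bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # "0" bits up to the length field
    padded += bit_length.to_bytes(8, "big")             # last 64 bits: original length
    return padded


def split_blocks(padded: bytes) -> list:
    """Split the padded message into 512-bit blocks."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


blocks = split_blocks(pad_message(b"abc"))
print(len(blocks), blocks[0].hex())
```

A message whose length already leaves fewer than 65 free bits in its last block spills into one additional block, which is why a 56-byte message pads to two blocks.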
  • the MD5 algorithm is an extension of the MD4 algorithm, which was exceptionally fast and rapidly became popular as a message digest in many applications.
  • the MD5 algorithm is slower than its predecessor, but it is better secured against cryptanalytic attacks.
  • the message M is padded by appending a "1" value to its end, and "0" values thereafter, to obtain 512-bit blocks $M_u$ ($0 \le u < n$). The last two words are also reserved for indicating the original message length.
  • the $H_j$ variables are continuously updated (line 2.5) for each block $M_u$.
  • Table 1 shows the values substituted in each iteration for s, which designates the index of a word to be processed, and r, which designates the number of bit rotation operations that should be performed.
  • Table 1: MD5 operation list.
  • One preferred embodiment of the invention is illustrated in Fig. 2, wherein the SHA-1 or MD5 hash function algorithms can be calculated utilizing a single hardware module.
  • the control block 101 (not shown in Fig. 2) manages the operation of the system according to the hash function algorithm to be carried out.
  • the message blocks are retrieved on the data bus 108.
  • Data to be stored in the Memory Block 102 may be also retrieved from the Accumulator 220 (ACC), and thus an arbitration device MUX3 (e.g., a multiplexer) is used for selecting the active input which should be used as data input for the Memory Block 102.
  • the data stored in the Memory Block 102 is provided on the data bus 250, from which it is available to various components of the system.
  • address locations 0-15 of the Memory Block 102 are used for respectively storing the 32-bit words W[0]-W[15] of the message block $M_u$, and address locations 16-25 for respectively storing the $H_0$-$H_4$ and A-E variables.
  • the accumulator (ACC) 220 is a 32 bit register, preferably a parallel-in parallel-out register.
  • the content of ACC 220 may be processed in various ways: it may be "xored" (exclusive-or) with data provided on the data bus 250, by the XOR circuitry; it may be rotated left by 1 and/or 4 bits by the ROL(1) and/or ROL(4) circuitries, respectively; it may be subjected to additions (modulus $2^{32}$) performed by the 32-bit Adder 202; and it may also be used to perform other operations that will be discussed hereinafter.
  • the Adder 202 performs the modular addition of the content of the ACC 220, provided on one of its inputs, in1, with the value obtained from the arbitration device MUX2, which is used for selecting the value on the other input of Adder 202, in2.
  • MUX2 selects the value provided on the in2 input to be one of the outputs from the function blocks q1-q4, a value obtained from the ROM memory 103, or a value obtained on data bus 250.
  • the content of the ACC 220 is set via another arbitration device MUX1, which selects a value to be introduced on the ACC 220 input.
  • This value may be the output of the Adder 202, ROL(1) circuitry, ROL(4) circuitry, ROL(5) circuitry, ROL(30) circuitry, XOR circuitry, or a value obtained from the data bus 250.
  • the ROL(1), ROL(4), ROL(5), and ROL(30) circuitries perform left bit rotations on the value obtained on their inputs, namely 1, 4, 5, and 30 left bit rotations, respectively.
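The accumulator datapath described above can be modelled behaviourally as follows. This is only a sketch: the function `next_acc` and the selector strings are hypothetical, and the assignment of ROL(1)/ROL(4) to the ACC output and ROL(5)/ROL(30) to the memory data bus follows the summary of the rotation devices given earlier.

```python
MASK32 = 0xFFFFFFFF


def rol(x: int, n: int) -> int:
    """Cyclic left rotation of a 32-bit word."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def next_acc(select: str, acc: int, bus: int, in2: int = 0) -> int:
    """Return the value MUX1 would route into ACC for the chosen source."""
    sources = {
        "ADD": (acc + in2) & MASK32,  # Adder: ACC plus the value chosen by MUX2
        "XOR": acc ^ bus,             # XOR of ACC with the data bus
        "ROL1": rol(acc, 1),          # rotations fed from ACC
        "ROL4": rol(acc, 4),
        "ROL5": rol(bus, 5),          # rotations fed from the memory data bus
        "ROL30": rol(bus, 30),
        "BUS": bus,                   # plain load from the data bus
    }
    return sources[select]


acc = next_acc("BUS", 0, 0x80000001)   # load a word from memory
acc = next_acc("ROL1", acc, 0)         # rotate it left by one bit
print(hex(acc))
```

Computing every candidate and then picking one mirrors how a multiplexer sees all of its inputs at once and forwards only the selected one.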
  • the data stored in the ACC 220 may be any one of the following:
  • Control Block 101 is capable of carrying out the MD5 or SHA-1 hash function algorithms by performing a sequence of operations, as will be discussed in detail with reference to Figs. 4A-4B.
  • the respective locations of the A-E variables in the Memory Block 102 are set with the $H_j$ parameter values, as shown in Fig. 4A.
  • the process begins in step 420 after a message block is loaded to the respective memory locations W[i] in the Memory Block 102.
  • the condition set up in step 422 permits only steps 1.7 and 1.8 of the SHA-1 algorithm to be performed during the first 16 iterations ($0 \le i < 16$) of the process.
  • the implementation of steps 1.7 and 1.8 of the SHA-1 algorithm is illustrated in steps 423 to 428 in Fig. 4A.
  • in step 423, the X, Y, and Z registers are loaded with the content of memory locations B, C, and D, respectively.
  • the modular addition performed in step 1.8 is carried out in step 424, by providing the content of memory location A on data bus 250, rotating it 5 times to the left by the ROL(5) circuitry, and storing the result in ACC; adding to the content of the ACC the following values:
  • the Control Block 101 determines the respective value of I according to the number of the iteration that is being performed, and accordingly instructs the arbitration devices MUX1-3, the Memory Block 102, and the ROM 103 to output the required values.
  • the result of the modular additions performed in step 424 is obtained in the ACC 220, and then stored in the TEMP memory location in the Memory Block 102, via MUX3.
  • step 1.7 is carried out by steps 425 to 428, wherein the content of memory locations E, D, C, and B is set by loading the ACC 220 with the required value (D, C, A, and TEMP via data bus 250 and, whenever required, ROL(30)(B) via data bus 250 and the ROL(30) circuitry), and writing the content of the ACC 220 via MUX3 into the respective memory location in the Memory Block 102 (E, D, C, B, and A).
  • in step 429, the operation of step 1.7 is completed after the content of memory location TEMP is stored in the memory location A.
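One iteration of the flow in steps 423 to 429 can be sketched as below. The list of values added in step 424 is not reproduced in the text above, so the sketch assumes the terms of the published SHA-1 specification (the logical function of B, C, and D, then E, the message word, and the round constant). The dictionary `mem` standing in for the Memory Block and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)  # SHA-1 round constants


def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def f(i, x, y, z):
    """SHA-1 logical function for iteration i (assumed from the SHA-1 specification)."""
    if i < 20:
        return (x & y) | (~x & MASK32 & z)
    if i < 40 or i >= 60:
        return x ^ y ^ z
    return (x & y) | (x & z) | (y & z)


def sha1_iteration(mem, i, w):
    """One SHA-1 iteration on a dict standing in for the Memory Block."""
    x, y, z = mem["B"], mem["C"], mem["D"]   # step 423: load X, Y, Z
    acc = rol(mem["A"], 5)                   # step 424: A through ROL(5) into ACC
    for term in (f(i, x, y, z), mem["E"], w, K[i // 20]):
        acc = (acc + term) & MASK32          # successive modular additions
    mem["TEMP"] = acc                        # the sum is stored in TEMP
    mem["E"] = mem["D"]                      # steps 425-428: shift the variables
    mem["D"] = mem["C"]
    mem["C"] = rol(mem["B"], 30)
    mem["B"] = mem["A"]
    mem["A"] = mem["TEMP"]                   # step 429: TEMP into A


mem = dict(A=0x67452301, B=0xEFCDAB89, C=0x98BADCFE, D=0x10325476, E=0xC3D2E1F0)
sha1_iteration(mem, 0, 0x61626380)           # first word of the padded message "abc"
print({name: hex(value) for name, value in mem.items()})
```

Writing E, D, C, and B before A matters in a single-port memory model: each location is overwritten only after its old value has been copied onward.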
  • step 1.5 is also performed, as illustrated in steps 422, 433, and 432.
  • Block 101 determines the word indexing (s) by a simple mask operation (step
  • Step 432 begins by loading a word from memory location W[s] into ACC 220 via data bus 250.
  • the words in memory locations W[(s+13) $\wedge$ 0xF], W[(s+8) $\wedge$ 0xF], and W[(s+2) $\wedge$ 0xF] are added to the content of the ACC by a sequence of addition operations performed by the Adder 202.
  • the Control Block locates the respective memory locations by using a simple mask operation (e.g., (s + 13) $\wedge$ 0xF).
  • the result of this sequence of additions is obtained in the ACC 220, which is then rotated a single position to the left by the ROL(1) circuitry, the output of which is then stored via MUX1 in ACC 220.
  • the final result of the computation of step 1.5 is stored in memory location W[s] via MUX3.
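The 16-word circular message buffer addressed by a mask can be sketched and checked against Python's `hashlib`. One assumption must be stated loudly: the sketch combines the four words with XOR, as the published SHA-1 specification does, whereas the text above describes the combination as a sequence of additions. The function names are hypothetical.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF


def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def schedule_word(w, i):
    """Return W for iteration i, updating the 16-word buffer in place."""
    s = i & 0xF                                # word index by a simple mask
    if i >= 16:
        acc = w[s]                             # load W[s] into the accumulator
        for offset in (13, 8, 2):
            acc ^= w[(s + offset) & 0xF]       # XOR is an assumption, see the lead-in
        w[s] = rol(acc, 1)                     # single left rotation, stored back
    return w[s]


def sha1_block(h, block):
    w = list(struct.unpack(">16I", block))
    a, b, c, d, e = h
    for i in range(80):
        if i < 20:
            f, k = (b & c) | (~b & MASK32 & d), 0x5A827999
        elif i < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif i < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (rol(a, 5) + f + e + schedule_word(w, i) + k) & MASK32
        a, b, c, d, e = temp, a, rol(b, 30), c, d
    return [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]


def sha1(message):
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64) + struct.pack(">Q", 8 * len(message))
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    for pos in range(0, len(data), 64):
        h = sha1_block(h, data[pos:pos + 64])
    return struct.pack(">5I", *h).hex()


print(sha1(b"abc") == hashlib.sha1(b"abc").hexdigest())
```

Keeping only 16 words instead of the full 80-word schedule is what lets the message block live in memory addresses 0-15 for the whole computation.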
  • the process is completed after the 80 iterations of the process (steps 430 and 431) are performed.
  • memory locations are allocated for the $H_j$ values, and for the A-
  • Step 2.1 of the MD5 algorithm, wherein the A-D memory locations in the Memory Block 102 are set with the $H_j$ parameters, is performed in step 451 (Fig. 4B).
  • the process begins in step 450 after a message block is loaded to the respective memory location in the Memory Block 102.
  • step 2.3 of the MD5 algorithm is performed in steps 453 and
  • in step 453, the content of registers X, Y, and Z is set with the values b, c, and d from memory 102, respectively. These values are determined for each iteration by the Control Block 101, according to the respective pattern of the values stored in the A-D memory locations, as shown in Table 1. This may be implemented by the Control Block 101 by utilizing a memory device and a simple look-up process.
  • in step 454, the ACC 220 is loaded with the value a from the memory 102, and in a sequence of additions performed by the Adder 202, the following values are added to the content of the ACC 220:
  • the result obtained in ACC 220 is then stored in the respective memory location from which the value substituted for a was obtained (e.g., in the iteration i=17 this would be the memory location of the variable D).
  • step 2.5 is carried out in steps 457 to 460.
  • in steps 457 to 460, the ACC 220 is loaded with the content of memory locations A, B, C, and D; the content of memory locations $H_0$, $H_1$, $H_2$, and $H_3$ is added to the ACC 220 by the Adder 202; and the result of each addition is then stored in memory locations $H_0$ and A, $H_1$ and B, $H_2$ and C, and $H_3$ and D, respectively.
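The MD5 flow described above, with the operand roles a, b, c, and d rotating over the A-D memory locations, can be sketched as follows. The values added in step 454 and the contents of Table 1 are not reproduced in the text, so the sketch takes the logical functions, constants, word indices, and rotation amounts from the published MD5 specification (RFC 1321). The dictionary `mem` and all function names are hypothetical.

```python
import math
import struct

MASK32 = 0xFFFFFFFF
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK32 for i in range(64)]   # MD5 constants
R = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
H_NAMES = ("H0", "H1", "H2", "H3")


def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def q(i, x, y, z):
    """MD5 logical function of iteration i (functions F, G, H, I of RFC 1321)."""
    if i < 16:
        return (x & y) | (~x & MASK32 & z)
    if i < 32:
        return (x & z) | (y & ~z & MASK32)
    if i < 48:
        return x ^ y ^ z
    return y ^ (x | (~z & MASK32))


def word_index(i):
    """Index s of the message word used in iteration i."""
    return [i, 5 * i + 1, 3 * i + 5, 7 * i][i // 16] % 16


def md5_block(mem, block):
    w = struct.unpack("<16I", block)
    for name, h in zip("ABCD", H_NAMES):
        mem[name] = mem[h]                                    # step 2.1
    for i in range(64):
        a, b, c, d = ("ABCD"[(j - i) % 4] for j in range(4))  # roles rotate each iteration
        acc = mem[a]                                          # load the value playing role a
        for term in (q(i, mem[b], mem[c], mem[d]), w[word_index(i)], T[i]):
            acc = (acc + term) & MASK32                       # sequence of modular additions
        mem[a] = (mem[b] + rol(acc, R[i])) & MASK32           # stored back where a came from
    for name, h in zip("ABCD", H_NAMES):
        mem[h] = mem[name] = (mem[name] + mem[h]) & MASK32    # step 2.5


def md5(message):
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64) + struct.pack("<Q", 8 * len(message))
    mem = dict(H0=0x67452301, H1=0xEFCDAB89, H2=0x98BADCFE, H3=0x10325476)
    for pos in range(0, len(data), 64):
        md5_block(mem, data[pos:pos + 64])
    return struct.pack("<4I", *(mem[h] for h in H_NAMES)).hex()


print(md5(b"abc"))
```

With this role mapping, iteration i=17 writes its result to location D, which agrees with the example given above. Note that MD5 stores words and the length field in little-endian order, unlike SHA-1.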
  • Block 101 according to the iteration number (i), preferably by utilizing a lookup process (as shown in Table 1).
  • the values of q and T are provided to the Adder 202 via MUX2, and the left bit rotations are performed by the ROL(1) and ROL(4) circuitries.
  • For example, a required bit rotation is obtained by passing the value through the ROL(1) circuitry and through the ROL(4) circuitry as many times as needed.
  • the manner in which these bit rotations are performed is preferably obtained from a memory via a look-up process.
  • the performance of the hash function module can be improved by adding more ROL(x) circuitries (e.g., x = 5-7, 9-12, 14-17, and 20-23) to minimize the number of operations required to obtain the required left bit rotations in each iteration of the MD5 process.
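The trade-off described above can be made concrete with a small sketch that counts how many passes through the rotation circuitries a given MD5 rotation needs. The greedy decomposition and the names `rotation_plan` and `rotate_with_plan` are illustrative assumptions, not the control logic of the patent; the rotation amounts are those of the published MD5 specification.

```python
MASK32 = 0xFFFFFFFF


def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotation_plan(r, available=(4, 1)):
    """Greedy list of single-circuit passes whose amounts sum to r."""
    plan = []
    for amount in sorted(available, reverse=True):
        while r >= amount:
            plan.append(amount)
            r -= amount
    if r:
        raise ValueError("rotation cannot be composed from the available circuits")
    return plan


def rotate_with_plan(x, plan):
    """Apply the planned passes one after another."""
    for amount in plan:
        x = rol(x, amount)
    return x


MD5_ROTATIONS = (7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21)
for r in MD5_ROTATIONS:
    with_two = len(rotation_plan(r))                           # only ROL(1) and ROL(4)
    with_all = len(rotation_plan(r, available=range(1, 24)))   # a dedicated circuit per amount
    print(r, with_two, with_all)
```

With only the 1-bit and 4-bit circuits, a 23-bit rotation takes eight passes, while a dedicated circuit brings it down to one, which is the saving the paragraph above refers to.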
  • the invention may be implemented more efficiently utilizing a set of registers for storing word data and $H_j$ parameters, and with the addition of several arbitration devices to obtain a faster operation of the hash function module, which eliminates the need for data bus 250 and Memory Block 102, as illustrated in Fig. 5.
  • the operation speed of this implementation is substantially improved, since the settings of the different registers during the operation do not require a sequence of operations involving the intermediate steps of setting the ACC 220.
  • a set of registers W[0], W[1], W[2], ..., W[15] 500 is utilized to store the message block $M_u$, which may be set via the W data in line or via the ACC 220.
  • a set of registers $H_0$, $H_1$, $H_2$, $H_3$, $H_4$ 501 is utilized to store the $H_j$ parameters, which may be set via the H data in line or via the output of Adder 202.
  • the arbitration device MX8 is used by the Control Block for selecting the active source of data input to the $H_j$ (j = 0, 1, ..., 4) registers.
  • the set of registers A, B, C, D, and E is utilized instead of the respective memory locations that were used for the same purpose in the previous embodiment (in Fig. 2).
  • the content of each of these registers may be set via the respective arbitration device MX-A, MX-B, MX-C, MX-D, and MX-E.
  • the arbitration devices MX-A, MX-B, MX-C, MX-D, and MX-E are used to select the value that should be stored in the respective registers A, B, C, D, and E, as follows:
  • MX-A selects a value to be stored in register A; the value may be obtained from register H 0 or from ACC 220;
  • MX-B selects a value to be stored in register B; the value may be obtained from registers $H_1$ or A (aa), or from ACC 220;
  • MX-C selects a value to be stored in register C; the value may be obtained from register $H_2$, the output of the ROL(30) circuitry (ROL(30)(bb)), or from ACC 220;
  • MX-D selects a value to be stored in register D; the value may be obtained from registers $H_3$ or C (cc), or from ACC 220;
  • MX-E selects a value to be stored in register E; the value may be obtained from registers $H_4$ or D (dd).
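The selections listed above can be modelled as two register-update patterns: loading A-E from the H registers, and shifting the variables along the aa, bb, cc, and dd lines. That all five registers update together in one step is an assumption drawn from the stated speed advantage of this embodiment; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def load_from_h(h):
    """Every MX selects its H input: A..E take H0..H4."""
    return dict(zip("ABCDE", h))


def shift_registers(regs, acc):
    """Every MX selects its neighbour input; all registers update together."""
    return {
        "A": acc,                   # MX-A: from ACC
        "B": regs["A"],             # MX-B: line aa
        "C": rol(regs["B"], 30),    # MX-C: ROL(30) applied to line bb
        "D": regs["C"],             # MX-D: line cc
        "E": regs["D"],             # MX-E: line dd
    }


regs = load_from_h([0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0])
regs = shift_registers(regs, 0x0116FC33)
print({name: hex(value) for name, value in regs.items()})
```

Building a new dictionary from the old one models edge-triggered registers: every new value is computed from the contents held before the update, so no TEMP location is needed.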
  • the arbitration device MX3 is used for selecting a single value from the W[i] registers. This selected W[i] value is introduced as input into the XOR circuitry and arbitration device MX2.
  • the arbitration device MX2 selects the value to be introduced on the in2 input of Adder 202.
  • the value on the in2 input is selected from the following inputs of MX2: the output of MX3, the value on the e line obtained from register E, the value on the a, b, c, or d lines obtained from ENCODER 502, or the output of the arbitration device MX6.
  • the ENCODER 502 may be implemented utilizing any conventional methods known in the art.
  • the function blocks q1, q2, q3, and q4 are fed with the values obtained on the b, c, and d lines, and their outputs are introduced into the inputs of arbitration device MX6.
  • $H_j$ values are selected for processing by the arbitration device MX5, which introduces the selected $H_j$ value on one of the inputs of arbitration device MX4.
  • the arbitration device MX4 is used to select the value on the in1 input of the Adder 202. This input may be selected from the following inputs of MX4: a value obtained from the ROM 103; an $H_j$ value obtained from MX5; and the content of the ACC 220.
  • the content of the ACC 220 is set via arbitration device MX1.
  • This value may be selected from any of the following values: the output of the XOR circuitry, the output of Adder 202, the output of the ROL(1) circuitry, or the output of the ROL(5) circuitry.
  • the performance of this embodiment can also be improved by the addition of ROL(x) circuitries for minimizing the number of operations needed to obtain the left bit rotations required in each iteration.
  • Another preferred embodiment of the hash function module according to the invention is illustrated in Fig. 3.
  • the control block 101 (not shown in Fig. 3) manages the operation of the system according to the hash function algorithm which should be performed.
  • a set of logical gates is used instead of the function blocks q1-q4. This is obtained by utilizing the following logical gates: converter 301, OR 302, AND 303, and XOR 304.
  • any logical function can be implemented over a number of cycles, wherein a single logical operation is performed in each cycle by a logical gate selected by the Control Block, and by storing intermediate results in the ACC 220 or in the Memory Block 102.
  • While this embodiment expands the number of hash function algorithms which may be implemented by a single hardware module, its performance is relatively slower than that of the previous embodiments discussed hereinbefore. The reduction in performance speed is due to the increase in the number of cycles required to perform any logical function.
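The idea of building a logical function from one gate operation per cycle can be sketched with a tiny interpreter. The instruction names (`LOAD`, `STORE`, and the gate names), the scratch location `T`, and the program encoding are hypothetical; the example function is the selection function $(X \wedge Y) \vee (\overline{X} \wedge Z)$ used by both SHA-1 and MD5.

```python
MASK32 = 0xFFFFFFFF

GATES = {
    "NOT": lambda acc, operand: ~acc & MASK32,   # operand is ignored
    "OR": lambda acc, operand: acc | operand,
    "AND": lambda acc, operand: acc & operand,
    "XOR": lambda acc, operand: acc ^ operand,
}


def run(program, memory):
    """Execute one gate (or a load/store) per cycle; return ACC and the cycle count."""
    acc, cycles = 0, 0
    for op, name in program:
        if op == "LOAD":
            acc = memory[name]                   # memory word into ACC
        elif op == "STORE":
            memory[name] = acc                   # intermediate result back to memory
        else:
            acc = GATES[op](acc, memory.get(name, 0))
        cycles += 1
    return acc, cycles


# (X AND Y) OR (NOT X AND Z), evaluated one gate at a time
CHOOSE = [
    ("LOAD", "X"), ("AND", "Y"), ("STORE", "T"),
    ("LOAD", "X"), ("NOT", None), ("AND", "Z"), ("OR", "T"),
]

memory = {"X": 0xF0F0F0F0, "Y": 0xAAAAAAAA, "Z": 0x55555555}
result, cycles = run(CHOOSE, memory)
print(hex(result), cycles)
```

A dedicated function block would produce the same word in a single step; the sequenced version needs several cycles but can express any function of the available gates, which is the flexibility-for-speed trade described above.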

Abstract

Embodiments of the invention provide a hash function module for carrying out hash function computations of at least two different hash function algorithms. According to some exemplary embodiments of the invention, the hash function module includes a read-write memory (102), an accumulating device (220), an adder (202), exclusive-or circuitry, one or more cyclic bit rotation devices, two arbitration devices, at least three data registers, one or more logical function circuitries, and a control circuit.

Description

FLEXIBLE HARDWARE IMPLEMENTATION OF HASH FUNCTIONS
Field of the Invention
The present invention relates to hardware implementations of hash functions. More particularly, the invention relates to a flexible hardware implementation of various hash function algorithms in a single module.
Background of the Invention
The present invention is directed to hardware implementation of various types of hash functions derived from the MD-5 hash function algorithm, such as SHA-1 and SHA-2, which are commonly used in digital signature applications. Digital Signatures (DS) are commonly used for the authentication of electronic data, which is a key component in almost any secure data communication. The DS is particularly important in electronic commerce, where it is used to guarantee the identification of the participating entities and the authentication of the transmitted data.
In general, a DS is a unique binary sequence which is used to identify information (message), and the secret key of the source from which the information originated. In common DS algorithms, a hash function is utilized to produce a unique identifier (also known as message digest) based on the message content. This identifier is encrypted utilizing the private key owned by the message originator, in this way providing for both the source identity and the integrity of the message.
The efficiency and security of electronic transactions greatly depend on the methods and implementations of security modules. Common implementations are often based solely on software, which constitutes an expensive implementation in terms of processing time and resources (memory, CPU, etc.), and which is seldom less secure. More reliable implementations combine software operations with the addition of dedicated hardware modules, which usually provide improved performance in terms of operation speed and flexibility, but in general are more expensive and not sufficiently secure. Such implementations are often attractive in applications in which several hash functions are being used, and wherein a computer program is used to select the respective hardware modules that are required for the specific hash function computation.
Hardware implementations are less common, but substantially improve the efficiency and security of hash function implementations. Hardware implementations are particularly attractive due to their high speed operation and improved power saving features, as well as for being compact for packaging, particularly when implemented in a single chip. Such implementations are of particular importance in applications where precious CPU processing time is required to perform other tasks, which should not be interrupted (e.g., cellular phones). In such applications it is preferable to use hardware modules, instead of software implementations, whenever possible, to alleviate the CPU processing.
A hash function tester is described in US 5,623,545, wherein a Hash Algorithm Accelerator Module is utilized to implement the SHA-1 algorithm (also referred to as "SHA Accelerator"). The SHA Accelerator is capable of digesting a 512-bit block and outputting the digested result. It therefore requires sequentially loading the message blocks via an external data bus, and is not capable of hashing a complete message in a single module.
Hardware implementations of hash functions are advantageous over their software and software/hardware hybrid implementations in many ways, as was discussed hereinabove. There are, however, no known hardware implementations capable of hashing a message utilizing a single hardware module, wherein CPU intervention is required only for the zero padding (not always required) of the last message block. The prior art also fails to provide hash function hardware modules capable of carrying out the computation of more than one hash algorithm in a single hardware module.
Hash function algorithms usually involve sequences of arithmetic and logical operations, as defined hereinbelow:
Bitwise logical operations: AND - bitwise logical "and" of (X AND Y) is designated herein by X ∧ Y; OR - bitwise logical "inclusive-or" of (X OR Y) is designated herein by X ∨ Y; XOR - bitwise logical "exclusive-or" of (X XOR Y) is designated herein by X ⊕ Y; and NOT - logical "complement" of X is designated herein by X̄.
Modular addition:
The modular addition designated by X + Y represents the modular addition of the corresponding integer values modulo 2^32, i.e., (X + Y) mod 2^32.
Circular left shift:
The circular left shift operation of a value X is designated herein by ROL^(i)(X), where i designates the number of shifts to perform. During this operation the bits of the value X are shifted to the left, while the leftmost bit in each shift becomes the rightmost bit, e.g., ROL^(4)(1001 1110 1100) = (1110 1100 1001).
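The two arithmetic primitives defined above can be sketched in a few lines. This is only an illustration: the function names rol and add32 and the width parameter are assumptions introduced here, not names used in the specification.

```python
MASK32 = 0xFFFFFFFF  # a word is 32 bits wide


def rol(x: int, i: int, width: int = 32) -> int:
    """Circular left shift of a width-bit value x by i positions."""
    mask = (1 << width) - 1
    i %= width
    x &= mask
    # Bits shifted out on the left re-enter on the right.
    return ((x << i) | (x >> (width - i))) & mask


def add32(x: int, y: int) -> int:
    """Modular addition of two words, modulo 2^32."""
    return (x + y) & MASK32


# The 12-bit example from the text: ROL^(4)(1001 1110 1100) = 1110 1100 1001
print(format(rol(0b100111101100, 4, width=12), "012b"))
```

The width parameter exists only so that the 12-bit example can be reproduced; every rotation used by the hash algorithms themselves operates on 32-bit words.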
The terms byte and word are used herein to refer to 8 bits and 32 bits integer values, respectively (i.e., one word consists of 4 bytes). Word and byte values are also represented in a hexadecimal form for convenience. The string "0x" preceding a hexadecimal sequence is used to designate hexadecimal values, e.g., the decimal value 1518500249 corresponds to the hexadecimal value 0x5A827999. Message data blocks of 512 bits are designated herein as Mu (u = 1, 2, ..., n), and the 16 words of such blocks are designated herein as W[i] (i = 0, 1, 2, ..., 15), such that a 512-bit block may also be written as Mu = (W[0], W[1], W[2], ..., W[15]).
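As a small illustration of this notation, the sketch below splits one 512-bit block into its 16 words. The helper name block_to_words is hypothetical, and the big-endian byte order is an assumption (it is the order SHA-1 uses; the text above does not fix a byte order).

```python
def block_to_words(block: bytes) -> list[int]:
    """Split a 64-byte (512-bit) block Mu into the 16 words W[0..15]."""
    if len(block) != 64:
        raise ValueError("a message block must be exactly 64 bytes")
    # Assumption: each word is read most-significant byte first.
    return [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]


# The hexadecimal example from the text.
print(hex(1518500249))

words = block_to_words(bytes(range(64)))
print(len(words), hex(words[0]), hex(words[15]))
```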
It is an object of the present invention to provide a hardware implementation for carrying out various types of hash function algorithms utilizing a single hardware module.
It is another object of the present invention to provide a hardware implementation of hash function algorithms with substantially improved speed performance and security.
It is a further object of the present invention to provide a hardware implementation for carrying out various types of hash function algorithms which significantly minimizes the amount of CPU intervention required.
Other objects and advantages of the invention will become apparent as the description proceeds.
Summary of the Invention
By using the term permutation herein it is meant to refer to manipulating the bits of one or more data words, e.g., cyclic bit rotation, XORing, etc.
The present invention is directed to a hash function module for carrying out hash function computations of at least two different hash function algorithms. According to one preferred embodiment of the invention the hash function module comprises: a read-write memory for storing blocks of data Mu of a padded message M, and at least intermediate results; an accumulating device for storing at least a word of data and outputting the same; an adder being capable of producing modular addition of at least two data words, one of which is being output from the accumulating device; an exclusive-or (XOR) circuitry being capable of producing the logical XOR result of at least two words of data, one of which is being output from the accumulating device; one or more cyclic bit rotation device(s) each of which being capable of carrying out one or more cyclic bit rotation(s) of a word of data that are input from the accumulating device or from the read-write memory; a first arbitration device for selecting a value which can be retrieved from the read-write memory, XOR circuitry, cyclic bit rotation device(s), or from the adder, to be stored in the accumulating device; at least three data registers, each of which being capable of storing a word of data obtained from the memory; one or more logical function circuitries for performing logical operations between words currently stored in the data registers; a second arbitration device for selecting a value retrieved from the output of the one or more logical function circuitries or from the read-write memory, to be input to the adder; and a control circuit for controlling the operation of the arbitration devices and the data flow in the module, thereby allowing the accumulating device to iteratively input intermediate results into the read-write memory and generate, in the last iteration, a final result consisting of the intermediate result values obtained in the last iteration.
Optionally, the logical function circuitries can be implemented by any combination of logical gates selected from the following group:
- bit wise logical AND of at least two data words, one being the value stored in the accumulating device, and the other one obtained from the memory;
- bit wise logical OR of at least two data words, one being the value stored in the accumulating device, and another one which is obtained from the memory;
- bit wise logical XOR of at least two data words, one being the value stored in the accumulating device, and another one which is obtained from the memory; and
- bit wise logical NOT of at least one data word obtained from the memory.
The one or more cyclic bit rotation device(s) preferably includes circuitry for carrying out a single cyclic bit rotation of a word of data obtained from the accumulating device, circuitry for carrying out four cyclic bit rotations of a word of data obtained from the accumulating device, circuitry for carrying out five cyclic bit rotations of a word of data obtained from the read-write memory, and circuitry for carrying out thirty cyclic bit rotations of a word of data obtained from the read-write memory.
In a preferred embodiment of the invention the logical functions: q1 = (X ∧ Y) ∨ (X̄ ∧ Z); q2 = X ⊕ Y ⊕ Z; q3 = (X ∧ Y) ∨ (X ∧ Z) ∨ (Y ∧ Z); and q4 = Y ⊕ (X ∨ Z̄) can be implemented by the one or more logical function circuitries, or by a set of logical gates, wherein X, Y, and Z are words of data obtained from data registers, or alternatively from a read-write memory.
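A minimal software sketch of the four function blocks follows, operating on 32-bit words. The Python function names simply mirror the labels q1 to q4; masking with MASK32 is an implementation detail of this sketch, needed because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF


def q1(x: int, y: int, z: int) -> int:
    """Bitwise select: where a bit of x is 1 take y, otherwise take z."""
    return ((x & y) | (~x & z)) & MASK32


def q2(x: int, y: int, z: int) -> int:
    """Bitwise parity of the three words."""
    return (x ^ y ^ z) & MASK32


def q3(x: int, y: int, z: int) -> int:
    """Bitwise majority of the three words."""
    return ((x & y) | (x & z) | (y & z)) & MASK32


def q4(x: int, y: int, z: int) -> int:
    """y XOR (x OR NOT z)."""
    return (y ^ (x | (~z & MASK32))) & MASK32


print(hex(q1(0xFFFF0000, 0x12345678, 0x9ABCDEF0)))
```

With X = 0xFFFF0000 the select function q1 takes the upper half-word from Y and the lower half-word from Z, which is a quick way to check the wiring of the operands.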
The hash function module may further comprise a ROM memory for storing and outputting hash function constants of one or more hash function algorithms. In addition, the value output from the ROM memory can be provided as an input to the second arbitration device.
The hash function module may also comprise an additional arbitration device for selecting the source of data being used as input to the read-write memory, the source of data being a word of data obtained from the accumulating device, or from an external data source.
According to a preferred embodiment of the invention the hash function module is capable of carrying out hash function computations of the MD5 and SHA-1 hash function algorithms.
According to another preferred embodiment of the invention the hash function module comprises: a first set of data registers for storing words of data W[i] of a message block Mu of a padded message M; a second set of data registers for storing hash function variables; a third set of data registers for storing hash function intermediate results; an accumulating device for storing at least a word of data and outputting the same; a memory device for storing hash function constants; an adder being capable of producing modular addition of at least two data words; an exclusive-or (XOR) circuitry being capable of producing the logical XOR result of at least two words of data, one of which is being output from the accumulating device; one or more cyclic bit rotation device(s) each of which being capable of carrying out one or more cyclic bit rotation(s) of a word of data that are input from the accumulating device or from the third set of data registers; a first arbitration device for selecting a value which can be retrieved from the XOR circuitry, cyclic bit rotation device(s), or from the adder, to be stored in the accumulating device; an encoder for receiving words of data from the third set of data registers and outputting different patterns of the same according to the hash function algorithm that is performed; one or more logical function circuitries for performing logical operations between the words output from the encoder; a second, third, fourth, fifth, and sixth arbitration devices, wherein:
- the fifth arbitration device is used for selecting a value retrieved from the second set of data registers;
- the third arbitration device is used for selecting a value retrieved from the first set of data registers, the value is being provided as input to the exclusive-or circuitry and the second arbitration device;
- the fourth arbitration device is used for selecting a value retrieved from the accumulating device, from the fifth arbitration device, or from the memory device, the value is provided as input to the adder;
- the sixth arbitration device is used for selecting a value produced by the logical function circuitries;
- the second arbitration device is used for selecting a value retrieved from the sixth arbitration device, from the encoder, from the third set of data registers, or from the third arbitration device, the value is provided as input to the adder; and a control circuit for controlling the operation of the arbitration devices and the data flow in the module, thereby allowing the accumulating device to iteratively input intermediate results into the registers and generate, in the last iteration, a final result consisting of the intermediate result values obtained in the last iteration.
The one or more cyclic bit rotation device(s) may include circuitry for carrying out a single cyclic bit rotation of a word of data obtained from the accumulating device, circuitry for carrying out five cyclic bit rotations of a word of data obtained from the accumulating device, and circuitry for carrying out thirty cyclic bit rotations of a word of data obtained from the read-write memory.
In addition, the hash function module may further comprise an arbitration device for selecting the source of data being used as input to the first set of data registers, the source of data being a word of data retrieved from the accumulating device, or from an external data source. An additional arbitration device may also be used for selecting the source of data being used as input to the second set of data registers, the source of data being the modular addition result obtained by the adder, or a word of data obtained from an external data source.
Optionally, the intermediate results are obtained from the second set of data registers, or from the third set of data registers, or are a permutation of the same. The word of data used for carrying out one or more cyclic bit rotations is optionally obtained from the accumulating device or is the content of one of the third set of data registers.
Brief Description of the Drawings
In the drawings:
Fig. 1 is a block diagram illustrating in general a hardware implementation of a hash function algorithm according to a preferred embodiment of the invention;
Fig. 2 is a block diagram illustrating a preferred embodiment of hash function module capable of performing the SHA-1 and MD5 hash algorithms;
Fig. 3 illustrates another preferred embodiment of the invention for performing various types of hash functions;
Figs. 4A-4B are flow charts illustrating the operation of the hash function module of Fig. 2; and
Fig. 5 is a block diagram illustrating an implementation of the hash function module according to another preferred embodiment of the invention.
Detailed Description of Preferred Embodiments
Various MD5 hash function algorithms, including the MD-5 algorithm, are commonly used today in digital signature applications. Hardware implementations of those hash algorithms allow compactly embedding them into systems in which the security and integrity of data are required. Such implementations also benefit from a fast and power-saving performance in comparison to the software implementations of the same algorithms, and they are particularly attractive in view of the vast increase in electronic commerce in recent years, and the broad acceptance of mobile telecommunication.
A general hardware implementation of a hash function, according to a preferred embodiment of the invention, is shown in the block diagram illustrated in Fig. 1. This implementation comprises a CPU 100, and a hash function module 107 which comprises a Control Block 101, a Memory Block 102, ROM 103, and an Operation Block 104. As shown in Fig. 1, the CPU 100 is not an integral part of the hash function module. The data bus 108 is therefore used to transfer data between the CPU 100 and hash function module 107.
In general, whenever a message digest is required, the Control Block 101 manages the digest operation, which is performed by providing the Operation Block 104 with a sequence of 512-bit blocks of the message M, which are fetched from the Memory Block 102. The intervention of the CPU 100 is required only if zero padding of the last block is needed. The communication between the CPU 100 and the hash function module 107 is performed over the data bus 108. The Memory Block 102 receives data and parameters via the data bus 108, and provides the same to the Operation Block 104 for the hash function calculation. The Memory Block 102 may be implemented utilizing any type of RW-memory (Read-Write memory); preferably, it is a memory of the RAM type. The digest result is stored in the Memory Block 102, and whenever required may be provided via the data bus 108. The hash function operation is initiated and monitored by the Control Block 101, by transferring 512-bit blocks of the message M, and algorithm variables (e.g., Ht), to the Operation Block 104, and retrieving the hash function computation results (also termed herein as digest) for storage in the Memory Block 102. A ROM memory (Read Only memory) 103 is used for storing hash function algorithm constants (e.g., Kp). Other types of memories can be used as well to implement the memory block 103.
In the following discussion the implementation of hash function modules capable of performing various types of hash algorithms is explained in detail. In particular, the preferred embodiment of the invention is illustrated and exemplified for various types of MD5 hash function algorithms (e.g., SHA-1), which are relatively common in present applications. Hence, the SHA-1 and MD5 algorithms are briefly discussed hereinbelow.
SHA-1
This is the Secure Hash Algorithm (SHA-1) utilized in the Digital Signature Algorithm (DSA) according to the Digital Signature Standard (DSS) of the National Institute of Standards and Technology (NIST) (FIPS PUB 180-1, Secure Hash Standard). The SHA-1 algorithm produces a 160 bit representation (digest) of a message M of length |M| < 2^64 bits.
The SHA-1 algorithm sequentially processes blocks of 512 bits when computing the message digest. Therefore, the message M is usually padded to obtain a message having a bit length which is a multiple of 512. The padding of a message M is carried out by appending a "1" bit value at the end of the message, followed by "0" bit values. The last 64 bits (two words) of the padded message M are reserved for indicating the original length (before padding) of the message. The padded message obtained therefore consists of a sequence of 512-bit blocks M = M1, M2, ..., Mn (0 < n < 2^55), wherein each block Mu contains a sequence of 16 words.
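The padding rule just described can be sketched as follows for byte-aligned messages. The function name pad_message is hypothetical, and storing the 64-bit length most-significant byte first is an assumption taken from the SHA-1 standard rather than from the text above.

```python
def pad_message(message: bytes) -> bytes:
    """Pad a byte-aligned message to a multiple of 512 bits (64 bytes)."""
    bit_len = 8 * len(message)
    # A single "1" bit followed by seven "0" bits is the byte 0x80.
    padded = message + b"\x80"
    # "0" bits until exactly 64 bits (8 bytes) remain in the last block.
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # The last two words hold the original length in bits
    # (assumption: big-endian, as in SHA-1).
    padded += bit_len.to_bytes(8, "big")
    return padded


blocks = pad_message(b"abc")
print(len(blocks), blocks[:4].hex(), blocks[-8:].hex())
```

A 56-byte message no longer leaves room for the length field in its own block, so the padding spills into a second block; this is the case in which the last block consists almost entirely of padding.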
In the computation of the SHA-1 digest the message blocks Mu are sequentially processed, and the computation carried out for each block Mu involves an 80 iterations process in which the words of the block Mu are "digested" using a set of logical and arithmetic operations utilizing a set of constant values K1 = 0x5A827999, K2 = 0x6ED9EBA1, K3 = 0x8F1BBCDC, and K4 = 0xCA62C1D6, and a set of logical functions f1(B,C,D) = (B ∧ C) ∨ (B̄ ∧ D), f4 = f2(B,C,D) = B ⊕ C ⊕ D, and f3(B,C,D) = (B ∧ C) ∨ (B ∧ D) ∨ (C ∧ D).
SHA-1 Algorithm:
The process is initiated with the following values,
H0 = 0x67452301, H1 = 0xEFCDAB89, H2 = 0x98BADCFE, H3 = 0x10325476, H4 = 0xC3D2E1F0
1.1 Let A = H0, B = H1, C = H2, D = H3, E = H4
1.2 For t = 0 to 79 do
1.3 s = t ∧ 0xF
1.4 If (t ≥ 16) Then
1.5 W[s] = ROL^(1)(W[(s+13)∧0xF] ⊕ W[(s+8)∧0xF] ⊕ W[(s+2)∧0xF] ⊕ W[s]);
1.6 End If
1.7 A = ROL^(5)(A) + fp(B,C,D) + E + W[s] + Kp;
Figure imgf000013_0001
1.9 End For
1.10 H0 = H0 + A, H1 = H1 + B, H2 = H2 + C, H3 = H3 + D, H4 = H4 + E
The Hj (j = 0, 1, ..., 4) variables are initiated with the following values 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, and 0xC3D2E1F0, which are used for the digest process of the first 512-bit block M1 of the message. For the rest of the message blocks Mu (1 < u ≤ n) the Hj variables are continuously updated, as shown in line 1.10 above. The message digest is the 160 bits obtained in the Hj (j = 0, 1, ..., 4) variables after the last message block Mn was processed.
The logical function fp(B, C, D) (p = 1, 2, 3 or 4) utilized in each iteration is chosen according to the following rule: fp(B, C, D) =
Figure imgf000014_0001
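The SHA-1 steps listed above can be exercised with the following sketch, which uses the same 16-word circular buffer W[s] with s = t ∧ 0xF. Two parts appear only as figures in this text and are therefore filled in from FIPS PUB 180-1 as an assumption: the rotation of the working variables after the modular addition (written here with an explicit temp variable), and the rule selecting fp and Kp by iteration range (20 iterations each). The big-endian byte order is likewise taken from the standard.

```python
MASK32 = 0xFFFFFFFF
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)  # K1..K4


def rol(x: int, i: int) -> int:
    return ((x << i) | (x >> (32 - i))) & MASK32


def f(t: int, b: int, c: int, d: int) -> int:
    # Assumption (FIPS PUB 180-1): f1 for t in 0..19, f2 for 20..39,
    # f3 for 40..59 and f4 = f2 for 60..79.
    if t < 20:
        return (b & c) | (~b & d)
    if t < 40 or t >= 60:
        return b ^ c ^ d
    return (b & c) | (b & d) | (c & d)


def sha1(message: bytes) -> bytes:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Padding: a "1" bit, "0" bits, then the 64-bit original length.
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += (8 * len(message)).to_bytes(8, "big")
    for off in range(0, len(data), 64):
        w = [int.from_bytes(data[off + 4 * i:off + 4 * i + 4], "big")
             for i in range(16)]
        a, b, c, d, e = h                                   # step 1.1
        for t in range(80):                                 # step 1.2
            s = t & 0xF                                     # step 1.3
            if t >= 16:                                     # steps 1.4-1.6
                w[s] = rol(w[(s + 13) & 0xF] ^ w[(s + 8) & 0xF]
                           ^ w[(s + 2) & 0xF] ^ w[s], 1)
            temp = (rol(a, 5) + f(t, b, c, d) + e + w[s] + K[t // 20]) & MASK32
            # Assumption (FIPS PUB 180-1): rotation of the working variables.
            e, d, c, b, a = d, c, rol(b, 30), a, temp
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]  # step 1.10
    return b"".join(x.to_bytes(4, "big") for x in h)


print(sha1(b"abc").hex())
```

Keeping only 16 words of W and overwriting them in place is what allows the message schedule to live in memory locations 0-15 (or in 16 registers) instead of an 80-word table.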
MD5
The MD5 algorithm is an extension of the MD4 algorithm, which was exceptionally fast, and rapidly became popular as a message digest in many applications. The MD5 algorithm is slower than its predecessor, but it is better secured against cryptanalytic attacks. As in the SHA-1 algorithm, the message M is padded by appending a "1" value to its end, and "0" values thereafter, to obtain 512-bit blocks Mu (0 < u ≤ n). The last two words are also reserved for indicating the original message length.
Each message block Mu is digested in a process of 64 iterations, in which a set of Boolean functions g1(b,c,d) = (b ∧ c) ∨ (b̄ ∧ d), g2(b,c,d) = (b ∧ d) ∨ (d̄ ∧ c), g3(b,c,d) = b ⊕ c ⊕ d, and g4(b,c,d) = c ⊕ (b ∨ d̄), are utilized, and a set of 64 constants T[t] (0 ≤ t ≤ 63) which are obtained by the pre-calculation of T[t]
Figure imgf000014_0002
(where SIN is the trigonometric sine function, ABS returns the absolute result of the SIN function, and TRUNC returns the integer part of the multiplication result). Since the logical operations for realizing Boolean function g1 are the same for the Boolean function g2, a single circuitry is utilized in the preferred embodiment of the invention to obtain these Boolean functions (g2(b, c, d) = g1(d, b, c)).
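The sketch below pre-calculates the 64 constants and checks the operand permutation that lets one circuit serve both g1 and g2. The exact formula is an assumption taken from RFC 1321 (the integer part of 2^32 times the absolute sine of t + 1, with the angle in radians), since the expression itself is not legible in this text; the names T, g1, and g2 mirror the notation above.

```python
import math

MASK32 = 0xFFFFFFFF

# Assumption (RFC 1321): T[t] = TRUNC(2^32 * ABS(SIN(t + 1))), t = 0..63.
T = [int(abs(math.sin(t + 1)) * 2**32) & MASK32 for t in range(64)]


def g1(b: int, c: int, d: int) -> int:
    return ((b & c) | (~b & d)) & MASK32


def g2(b: int, c: int, d: int) -> int:
    return ((b & d) | (~d & c)) & MASK32


print(hex(T[0]), hex(T[63]))

# g2 is g1 with its operands permuted, so one circuit can realize both.
b, c, d = 0x01234567, 0x89ABCDEF, 0xFEDCBA98
print(g2(b, c, d) == g1(d, b, c))
```

In hardware terms the permutation means that only the routing of the operands into the function block changes between the first 16 and the next 16 iterations, not the logic itself.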
MD5 Algorithm:
The process is initiated with the following values,
H0 = 0x67452301; H1 = 0xEFCDAB89; H2 = 0x98BADCFE; and H3 = 0x10325476.
2.1 Let A = H0, B = H1, C = H2, D = H3
2.2 For t = 0 to 63 do
Figure imgf000015_0001
2.4 End For
2.5 H0 = H0 + A, H1 = H1 + B, H2 = H2 + C, H3 = H3 + D
The Hr (r = 0, 1, ..., 3) variables are initiated with the following values 0x67452301, 0xEFCDAB89, 0x98BADCFE, and 0x10325476, which are used in the digest process of the first 512-bit block M1 of the message. For the rest of the message blocks Mu (1 < u ≤ n) the Hr variables are continuously updated (line 2.5) for each block Mu. The message digest is the 128 bits obtained in the Hr (r = 0, 1, 2, 3) variables after the last message block Mn was processed.
The values A, B, C, and D are substituted in different patterns in the variables a, b, c, and d, in each iteration, as shown in Table 1. Table 1 also shows the values substituted in each iteration for s, which designates the index of a word to be processed, and r, which designates the number of bit rotation operations that should be performed.
Table 1: MD5 operation list.
Figure imgf000016_0002
Figure imgf000016_0003
The logical function gp(b,c,d) (p = 1, 2, 3, 4) utilized in each iteration is chosen according to the following rule:
Figure imgf000016_0001
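The MD5 iteration can be sketched end to end as shown below. Table 1 and the selection rule are reproduced only as images in this text, so the per-iteration values used here (the word index s, the rotation count r, the rotating a, b, c, d pattern, 16 iterations per function gp, and the little-endian byte order) are assumptions taken from RFC 1321. Each iteration computes a = b + ROL^(r)(a + gp(b,c,d) + W[s] + T[t]).

```python
import math

MASK32 = 0xFFFFFFFF
T = [int(abs(math.sin(t + 1)) * 2**32) & MASK32 for t in range(64)]
# Assumption (RFC 1321): rotation counts r for the 64 iterations.
R = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4


def rol(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def g(t: int, b: int, c: int, d: int) -> int:
    if t < 16:
        return (b & c) | (~b & d)          # g1
    if t < 32:
        return (b & d) | (c & ~d)          # g2
    if t < 48:
        return b ^ c ^ d                   # g3
    return c ^ (b | (~d & MASK32))         # g4


def word_index(t: int) -> int:
    # Assumption (RFC 1321): index s of the word used in iteration t.
    if t < 16:
        return t
    if t < 32:
        return (5 * t + 1) % 16
    if t < 48:
        return (3 * t + 5) % 16
    return (7 * t) % 16


def md5(message: bytes) -> bytes:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += ((8 * len(message)) & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")
    for off in range(0, len(data), 64):
        w = [int.from_bytes(data[off + 4 * i:off + 4 * i + 4], "little")
             for i in range(16)]
        v = list(h)                        # v[0..3] hold A, B, C, D (step 2.1)
        for t in range(64):
            ia = (-t) % 4                  # which of A..D plays the role of a
            a, b, c, d = (v[(ia + k) % 4] for k in range(4))
            acc = (a + g(t, b, c, d) + w[word_index(t)] + T[t]) & MASK32
            v[ia] = (b + rol(acc, R[t])) & MASK32
        h = [(x + y) & MASK32 for x, y in zip(h, v)]        # step 2.5
    return b"".join(x.to_bytes(4, "little") for x in h)


print(md5(b"abc").hex())
```

The index ia reproduces the rotating substitution pattern: at t = 0 the result is written back to A, at t = 1 to D, at t = 2 to C, and at t = 3 to B, after which the pattern repeats.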
One preferred embodiment of the invention is illustrated in Fig. 2, wherein the SHA-1 or MD5 hash function algorithms can be calculated utilizing a single hardware module. The control block 101 (not shown in Fig. 2) manages the operation of the system according to the hash function algorithm to be carried out. The message blocks are retrieved on the data bus 108. Data to be stored in the Memory Block 102 may be also retrieved from the Accumulator 220 (ACC), and thus an arbitration device MUX3 (e.g., a multiplexer) is used for selecting the active input which should be used as data input for the Memory Block 102. The data stored in the Memory Block 102 is provided on the data bus 250, from which it is available to various components of the system.
In a preferred embodiment of the invention address locations 0-15 of the Memory Block 102 are used for respectively storing the 32 bit words W[0] - W[15] of the message block Mu, and address locations 16-25 for respectively storing the H0 - H4 and A - E variables. The accumulator (ACC) 220 is a 32 bit register, preferably a parallel-in parallel-out register. The content of ACC 220 may be processed in various ways: it may be "xored" (exclusive-or) with data provided on the data bus 250, by the XOR circuitry; it may be subjected to 1 and/or 4 left bit rotations by the ROL^(1) and/or ROL^(4) circuitries, respectively; it may be subjected to additions (modulo 2^32) performed by the 32 bit Adder 202, and may also be used to perform other operations that will be discussed hereinafter.
Additional 32 bit registers X 210, Y 211, and Z 212, which are preferably parallel-in parallel-out registers, are used for storing the B, C, and D parameters. These registers are loaded with data provided on data bus 250, and their content is provided as inputs to the q1, q2, q3, and q4 function blocks. Those function blocks (q1-q4) can be used for both the MD5 and the SHA-1 computations utilizing the following set of functions: q1 = (X ∧ Y) ∨ (X̄ ∧ Z),
Figure imgf000017_0001
The Adder 202 performs the modular addition of the content of the ACC 220, provided on one of its inputs, in1, with the value obtained from the arbitration device MUX2, which is used for selecting the value on the other input, in2, of the Adder 202. MUX2 selects the value provided on the in2 input to be one of the outputs from the function blocks q1-q4, a value obtained from the ROM memory 103, or a value obtained on data bus 250. The content of the ACC 220 is set via another arbitration device, MUX1, which selects a value to be introduced on the ACC 220 input. This value may be the output of the Adder 202, ROL^(1) circuitry, ROL^(4) circuitry, ROL^(5) circuitry, ROL^(30) circuitry, XOR circuitry, or a value obtained from the data bus 250. The ROL^(1), ROL^(4), ROL^(5), and ROL^(30) circuitries perform left bit rotations on the value obtained on their inputs, namely 1, 4, 5, and 30 left bit rotations, respectively.
Hence, the data stored in the ACC 220 may be any one of the following:
• the addition result of the value previously stored in the ACC 220 and the output from MUX2, said addition result is obtained from the Adder 202;
• the exclusive-or operation of the value previously stored in the ACC 220 and a value obtained from the Memory Block 102;
• one (ROL^(1)) or four (ROL^(4)) left bit rotations of the value previously stored in the ACC 220;
• five (ROL^(5)) or thirty (ROL^(30)) left bit rotations of a value obtained on the data bus 250 from the Memory Block 102; and
• a value obtained from the Memory Block 102 via the data bus 250.
With these means the Control Block 101 is capable of carrying out the MD5 or SHA-1 hash function algorithms by performing a sequence of operations, as will be discussed in detail with reference to Figs. 4A-4B.
SHA-1 process
In this process memory locations are allocated for the Hj values, and for the A-E and TEMP variables. The respective locations of the A-E variables in the Memory Block 102 are set with the Hj parameter values, as shown in Fig. 4A. The process begins in step 420 after a message block is loaded to the respective memory locations W[i] in the Memory Block 102.
The condition set up in step 422 permits only steps 1.7 and 1.8 of the SHA-1 algorithm to be performed during the first 16 iterations (0 ≤ i < 16) of the process.
1.7 A = ROL^(5)(A) + fp(B,C,D) + E + W[s] + Kp;
Figure imgf000019_0001
The implementation of steps 1.7 and 1.8 of the SHA-1 algorithm is illustrated in steps 423 to 428 in Fig. 4A. In step 423 the X, Y, and Z registers are loaded with the content of memory locations B, C, and D, respectively. The modular addition performed in step 1.8 is carried out in step 424, by providing the content of memory location A on data bus 250, rotating it 5 times to the left by the ROL^(5) circuitry, and storing the result in ACC; adding to the content of the ACC the following values:
• the output of the function block ql (l = 1, 2, 3; the respective function block is chosen to obtain ql ≡ fp) provided via MUX2;
• the content of memory location E provided on the data bus 250 via MUX2;
• the content of memory location W[s] provided on data bus 250 via MUX2; and
• a constant value Kp from ROM 103, provided via MUX2.
The Control Block 101 determines the respective value of l according to the number of the iteration that is being performed, and accordingly instructs the arbitration devices MUX1-3, the Memory Block 102, and the ROM 103 to output the required values. The result of the modular additions performed in step 424 is obtained in the ACC 220, and then stored in the TEMP memory location in the Memory Block 102, via MUX3.
The operation of step 1.7 is carried out by steps 425 to 428, wherein the content of memory locations E, D, C, and B, are set by loading the ACC 220 with the required value (D, C, A, and TEMP via data bus 250, and whenever required ROL^(30)(B) via data bus 250 and the ROL^(30) circuitry), and writing the content of the ACC 220 via MUX3 into the respective memory location in the Memory Block 102 (E, D, C, B, and A). In step 429, the operation of step 1.7 is completed after the content of memory location TEMP is stored in the memory location A.
For those iterations of the process numbered 16 and above (i ≥ 16), the operation of step 1.5 is also performed, as illustrated in steps 422, 433, and 432. The Control Block 101 determines the word indexing (s) by a simple mask operation (step 433).
1.5 W[s] = ROL^(1)(W[(s+13)∧0xF] ⊕ W[(s+8)∧0xF] ⊕ W[(s+2)∧0xF] ⊕ W[s]);
Step 432 begins by loading a word from memory location W[s] into ACC 220 via data bus 250. The words in memory locations W[(s+13)∧0xF], W[(s+8)∧0xF], and W[(s+2)∧0xF] are added to the content of the ACC by a sequence of addition operations performed by the Adder 202. As before, the Control Block locates the respective memory locations by using a simple mask operation (e.g., (s + 13) ∧ 0xF). The result of this sequence of additions is obtained in the ACC 220, which is then rotated by the ROL^(1) circuitry, a single left rotation, the output of which is then stored via MUX1 in ACC 220. The final result of the computation of step 1.5 is stored in memory location W[s] via MUX3. The process is completed after the 80 iterations of the process (steps 430 and 431) are performed.
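A minimal sketch of step 1.5 as an accumulator sequence is given below. It combines the four words with the exclusive-or operator, as the equation of step 1.5 is written, and checks the masked 16-word addressing against a straightforward 80-word schedule. The names update_w and full_schedule are hypothetical, and the 80-word reference form W[t] = ROL^(1)(W[t-3] ⊕ W[t-8] ⊕ W[t-14] ⊕ W[t-16]) is taken from FIPS PUB 180-1.

```python
MASK32 = 0xFFFFFFFF


def rol1(x: int) -> int:
    return ((x << 1) | (x >> 31)) & MASK32


def update_w(w: list[int], t: int) -> int:
    """Step 1.5 for iteration t >= 16, using only 16 word locations."""
    s = t & 0xF                        # word index by a simple mask
    acc = w[s]                         # load W[s] into the accumulator
    for offset in (13, 8, 2):
        acc ^= w[(s + offset) & 0xF]   # combine with the three other words
    acc = rol1(acc)                    # single left rotation
    w[s] = acc                         # write the result back to W[s]
    return acc


def full_schedule(w16: list[int]) -> list[int]:
    """Reference: the 80-word schedule as defined in FIPS PUB 180-1."""
    w = list(w16)
    for t in range(16, 80):
        w.append(rol1(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16]))
    return w


start = [(i * 0x01010101 + 0x12345678) & MASK32 for i in range(16)]
reference = full_schedule(start)
buffer = list(start)
print(all(update_w(buffer, t) == reference[t] for t in range(16, 80)))
```

The masked offsets 13, 8, and 2 address the words W[t-3], W[t-8], and W[t-14] of the full schedule, and W[s] itself holds W[t-16], which is why 16 locations are sufficient.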
MD5 process
In this process memory locations are allocated for the Hr values, and for the A-D variables. Step 2.1 of the MD5 algorithm, wherein the respective A-D memory locations in the Memory Block 102 are set with the Hr parameters, is performed in step 451 (Fig. 4B). The process begins in step 450 after a message block is loaded to the respective memory location in the Memory Block 102.
The processing of step 2.3 of the MD5 algorithm is performed in steps 453 and 454, in a process of 64 iterations (steps 455 and 456).
2.3 a = FF(a, b, c, d, gp, W[s], r, T[t]) = b + ROL^(r)(a + gp(b,c,d) + W[s] + T[t])
To perform the processing of step 2.3, the content of registers X, Y, and Z is set in step 453 with the values b, c, and d, respectively, from the memory 102. These values are determined for each iteration by the Control Block 101, with the respective pattern of the values stored in the A-D memory locations, as shown in Table 1. This may be implemented by the Control Block 101 by utilizing a memory device and a simple look-up process. Then in step 454 the ACC 220 is loaded with the value a from the memory 102, and in a sequence of additions performed by the Adder 202, the following values are added to the content of the ACC 220:
• the output of function block qv (v = 1, 2, 3, 4; the respective function block is chosen to obtain qv ≡ gp) obtained via MUX2;
• a word from memory location W[s] obtained on data bus 250 via MUX2; and
• a constant value T[t] from ROM 103, obtained via MUX2.
The addition result obtained in the ACC 220 is then subject to r-times left bit rotation (the value of r is given in Table 1). The bit rotation result is stored in the ACC 220, and then used to calculate another addition, in which the value b from the Memory Block 102, obtained on data bus 250 via MUX2, is added to the content of the ACC 220.
The result obtained in ACC 220 is then stored in the respective memory location from which the value substituted for a was obtained (e.g., in the iteration i = 17 this would be the memory location of the variable D).
After completion of the 64 iterations, the processing of step 2.5 is carried out in steps 457 to 460.
2.5 H0 = H0 + A, A = H0, H1 = H1 + B, B = H1, H2 = H2 + C, C = H2, H3 = H3 + D, D = H3
In steps 457 to 460 the ACC 220 is loaded with the content of memory location A, B, C, and D, the content of memory location H0, H1, H2, and H3, is added to the ACC 220 by the Adder 202, and the result of each addition is then stored in memory locations H0 and A, H1 and B, H2 and C, and H3 and D, respectively.
The respective function block (qv) added to the ACC in step 454, the number of left bit rotations ROL^(r) of the ACC 220 content to be performed, and the respective constant T[t] to be added to the ACC 220, are selected by the Control Block 101 according to the iteration number (i), preferably by utilizing a look-up process (as shown in Table 1). The values of qv and T[t] are provided to the Adder 202 via MUX2, and the left bit rotations are performed by the ROL^(1) and ROL^(4) circuitries. For example, to obtain the bit rotation for iteration
Figure imgf000022_0001
ROL^ circuitry and once via the ROLW circuitry. The manner in which these bit rotations are performed is preferably obtained from a memory via a look-up process.
It should be noted that the performance of the hash function module can be improved by adding more ROL^(x) circuitries (e.g., x = 5-7, 9-12, 14-17, and 20-23) to minimize the number of operations required to obtain the required left bit rotations in each iteration of the MD5 process.
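The cost of building an r-bit rotation out of the two available circuitries can be sketched as follows. The greedy decomposition used here (as many four-bit passes as possible, then single-bit passes) is an assumption made for illustration; the text only states that the actual manner is obtained from a memory via a look-up process. The names rotation_plan and rotate_with_rol1_rol4 are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol(x: int, i: int) -> int:
    i %= 32
    return ((x << i) | (x >> (32 - i))) & MASK32 if i else x & MASK32


def rotation_plan(r: int) -> tuple[int, int]:
    """Return (passes through ROL^(4), passes through ROL^(1)) for r rotations."""
    return divmod(r % 32, 4)


def rotate_with_rol1_rol4(x: int, r: int) -> tuple[int, int]:
    """Rotate x left by r using only 4-bit and 1-bit passes; also count passes."""
    fours, ones = rotation_plan(r)
    for _ in range(fours):
        x = rol(x, 4)
    for _ in range(ones):
        x = rol(x, 1)
    return x, fours + ones


# Passes needed for a few rotation counts under this assumed decomposition.
for r in (4, 5, 7, 12, 23):
    print(r, rotation_plan(r), rotate_with_rol1_rol4(0x12345678, r)[1])
```

Under this decomposition a rotation by 23 takes eight accumulator passes, which illustrates why dedicated circuitries for the larger rotation counts reduce the number of operations per iteration.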
The invention may be implemented more efficiently utilizing a set of registers for storing word data and H, parameters, and with the addition of several arbitration devices to obtain a faster operation of the hash function module, which ehminates the need of data bus 250 and Memory Block 102, as illustrated in Fig. 5. As will be described in detail, the operation speed of this implementation is substantially improved, since the settings of the different registers during the operation do not require a sequence of operations involving the intermediate steps of setting the ACC 220.
In the system illustrated in Fig. 5, a set of registers W0, W1, W2, ..., W15 500 are utilized to store the message block Mu, which may be set via the W data-in line or via the ACC 220. An arbitration device MX7 is used by the Control Block to select the active source of data input to the Wt (t = 1, 2, ..., 15) registers.
Similarly, a set of registers H0, H1, H2, H3, H4 501 are utilized to store the Hj parameters, which may be set via the H data-in line or via the output of Adder 202. The arbitration device MX8 is used by the Control Block for selecting the active source of data input to the Hj (j = 1, 2, ..., 4) registers. The set of registers A, B, C, D, and E are utilized instead of the respective memory locations that were used for the same purpose in the previous embodiment (in Fig. 2). The content of each of these registers may be set via the respective arbitration device MX-A, MX-B, MX-C, MX-D, and MX-E. This arrangement eliminates the need for the TEMP memory location, as will be shown hereinafter. The arbitration devices MX-A, MX-B, MX-C, MX-D, and MX-E are used to select the value that should be stored in the respective registers A, B, C, D, and E, as follows:
MX-A: selects a value to be stored in register A; the value may be obtained from register H0 or from ACC 220;
MX-B: selects a value to be stored in register B; the value may be obtained from registers H1 or A (aa), or from ACC 220;
MX-C: selects a value to be stored in register C; the value may be obtained from register H2, the output of the ROL(30) circuitry (ROL(30)(bb)), or from ACC 220;
MX-D: selects a value to be stored in register D; the value may be obtained from registers H3 or C (cc), or from ACC 220;
MX-E: selects a value to be stored in register E; the value may be obtained from registers H4 or D (dd).
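The register selections listed above may be illustrated, for the SHA-1 case, by the following Python sketch. It assumes that all five registers are loaded at the same time, so that each register receives the previous content of its neighbour and no temporary location is needed. The function name `sha1_register_update` and the dictionary representation of the registers are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit words


def rol(word, s):
    """Cyclic left rotation of a 32-bit word by s bit positions."""
    s %= 32
    if s == 0:
        return word & MASK32
    return ((word << s) | (word >> (32 - s))) & MASK32


def sha1_register_update(regs, acc):
    """One assumed SHA-1 register update:
    MX-A selects ACC, MX-B selects aa, MX-C selects ROL(30)(bb),
    MX-D selects cc, and MX-E selects dd."""
    aa, bb, cc, dd = regs["A"], regs["B"], regs["C"], regs["D"]
    return {
        "A": acc & MASK32,
        "B": aa,
        "C": rol(bb, 30),
        "D": cc,
        "E": dd,
    }


before = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
after = sha1_register_update(before, acc=0xDEADBEEF)
```

Because every new value is taken from the old register contents, the previous content of register E is simply overwritten, which is consistent with the statement that the TEMP memory location is no longer required.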
The arbitration device MX3 is used for selecting a single value from the Wt registers. This selected Wt value is introduced as input into the XOR circuitry and arbitration device MX2. The arbitration device MX2 selects the value to be introduced on the in2 input of Adder 202. The value on the in2 input is selected from the following inputs of MX2: the output of MX3, the value on the e line obtained from register E, the value on the a, b, c, or d lines obtained from ENCODER 502, or the output of the arbitration device MX6.
The ENCODER 502 is used for setting the appropriate values required for each hash function digest. Namely, in the case of the SHA-1 hash function, these values are set according to the respective register output (a=aa, b=bb, ..., d=dd), and in the case of the MD5 hash function, these values are set according to Table 1. The ENCODER 502 may be implemented utilizing any conventional method known in the art. The function blocks q1, q2, q3, and q4 are fed with the values obtained on the b, c, and d lines, and their outputs are introduced into the inputs of arbitration device MX6. During the hash function process, Hj values are selected for processing by the MX5 arbitration device, which introduces the selected Hj value on one of the MX4 arbitration device inputs.
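The relation between the function blocks and the arbitration device MX6 may be sketched in Python as follows. Several points are assumptions made for the example and are not confirmed by this text: the expression for q1 is not reproduced here, so the sketch uses the standard selection function shared by MD5 and SHA-1; q4 is written with the negated third operand of the standard MD5 definition; and the mapping of the b, c, and d lines onto the operands X, Y, and Z is assumed. The name `mx6` is used only as a label for the selection step.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit words


def q1(x, y, z):
    # ASSUMED: standard MD5/SHA-1 selection function (not given in this text)
    return ((x & y) | (~x & z)) & MASK32


def q2(x, y, z):
    return (x ^ y ^ z) & MASK32


def q3(x, y, z):
    return ((x & y) | (x & z) | (y & z)) & MASK32


def q4(x, y, z):
    # ASSUMED: negation of z, as in the standard MD5 fourth-round function
    return (y ^ (x | (~z & MASK32))) & MASK32


def mx6(select, b, c, d):
    """All four blocks see the same b, c, d lines; MX6 passes one output on."""
    outputs = [q(b, c, d) for q in (q1, q2, q3, q4)]
    return outputs[select]


b, c, d = 0xFFFF0000, 0x0F0F0F0F, 0x00FF00FF
results = [mx6(k, b, c, d) for k in range(4)]
```

Computing all four outputs and then selecting one mirrors the description, in which the outputs of the function blocks are all introduced into the inputs of MX6 and the Control Block chooses among them.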
The arbitration device MX4 is used to select the value on the in1 input of the Adder 202. This input may be selected from the following inputs of MX4: a value obtained from the ROM 103; an Hj value obtained from MX5; and the content of the ACC 220.
The content of the ACC 220 is set via arbitration device MX1. This value may be selected from any of the following values: the output of the XOR circuitry, the output of Adder 202, the output of the ROL(1) circuitry, or the output of the ROL(5) circuitry. Of course, the performance of this embodiment can also be improved by the addition of ROL(x) circuitries for minimizing the number of operations needed to obtain the left bit rotations required in each iteration.
Another preferred embodiment of the hash function module according to the invention is illustrated in Fig. 3. As will be explained now, with this embodiment various types of MD5-based hash function algorithms can be realized. The control block 101 (not shown in Fig. 3) manages the operation of the system according to the hash function algorithm which should be performed. To enable the realization of a plurality of hash function algorithms, a set of logical gates is used instead of the function blocks q1-q4. This is obtained by utilizing the following logical gates: converter 301, OR 302, AND 303, and XOR 304. With this set of logical gates any logical function can be implemented over a number of cycles, wherein a single logical operation is performed in each cycle by a logical gate selected by the Control Block, and by storing intermediate results in the ACC 220 or in the Memory Block 102. It should be noted that although this embodiment expands the number of hash function algorithms which may be implemented by a single hardware module, its performance is relatively slower than that of the previous embodiments discussed hereinbefore. The reduction in performance speed is of course due to the increase in the number of cycles required to perform any logical function.
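The multi-cycle evaluation described above may be illustrated by the following Python sketch, in which one micro-operation is executed per cycle on an accumulator and a small memory. The micro-operation names (LOAD, STORE, NOT, AND, OR, XOR), the memory layout, and the particular operation sequence are hypothetical and serve only to show how a three-input function is built from single gate operations.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit words


def run_microprogram(program, memory):
    """Execute one micro-operation per cycle; return (ACC, cycle count)."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "NOT":
            acc = ~memory[addr] & MASK32  # NOT of a word taken from memory
        elif op == "AND":
            acc &= memory[addr]
        elif op == "OR":
            acc |= memory[addr]
        elif op == "XOR":
            acc ^= memory[addr]
        else:
            raise ValueError(f"unknown micro-operation: {op}")
    return acc, len(program)


# (X AND Y) OR (X AND Z) OR (Y AND Z), one gate operation per cycle.
MAJORITY = [
    ("LOAD", "X"), ("AND", "Y"), ("STORE", "T"),
    ("LOAD", "X"), ("AND", "Z"), ("OR", "T"), ("STORE", "T"),
    ("LOAD", "Y"), ("AND", "Z"), ("OR", "T"),
]

memory = {"X": 0xFFFF0000, "Y": 0x0F0F0F0F, "Z": 0x00FF00FF, "T": 0}
value, cycles = run_microprogram(MAJORITY, memory)
```

In this sketch the function takes ten cycles, whereas a dedicated function block would produce the same word combinationally; this is the trade-off between flexibility and speed noted in the description.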
The above examples and description have of course been provided only for the purpose of illustration, and are not intended to limit the invention in any way. As will be appreciated by the skilled person, the invention can be carried out in a great variety of ways, employing techniques different from those described above, all without exceeding the scope of the invention.

Claims

1. A hash function module for carrying out hash function computations of at least two different hash function algorithms, comprising:
a) a read-write memory for storing blocks of data Mu of a padded message M, and at least intermediate results;
b) an accumulating device for storing at least a word of data and outputting the same;
c) an adder being capable of producing modular addition of at least two data words, one of which is being output from said accumulating device;
d) an exclusive-or (XOR) circuitry being capable of producing the logical XOR result of at least two words of data, one of which is being output from said accumulating device;
e) one or more cyclic bit rotation device(s) each of which being capable of carrying out one or more cyclic bit rotation(s) of a word of data that are input from said accumulating device or from said read-write memory;
f) a first arbitration device for selecting a value which can be retrieved from said read-write memory, XOR circuitry, cyclic bit rotation device(s), or from said adder, to be stored in said accumulating device;
g) at least three data registers, each of which being capable of storing a word of data obtained from said memory;
h) one or more logical function circuitries for performing logical operations between words currently stored in said data registers;
i) a second arbitration device for selecting a value retrieved from the output of said one or more logical function circuitries or from said read-write memory, to be input to said adder; and
j) a control circuit for controlling the operation of said arbitration devices and the data flow in said module, thereby allowing said accumulating device to iteratively input intermediate results into said read-write memory and generate, in the last iteration, a final result consisting of the intermediate result values obtained in said last iteration.
2. A hash function module according to claim 1, in which the logical function circuitries are implemented by any combination of logical gates selected from the following group:
- bit wise logical AND of at least two data words, one being the value stored in said accumulating device, and the other one obtained from said memory;
- bit wise logical OR of at least two data words, one being the value stored in said accumulating device, and another one which is obtained from said memory;
- bit wise logical XOR of at least two data words, one being the value stored in said accumulating device, and another one which is obtained from said memory; and
- bit wise logical NOT of at least one data word obtained from said memory.
3. A hash function module for carrying out hash function computations of at least two different hash function algorithms, comprising:
a) a first set of data registers for storing words of data
Figure imgf000028_0001
of a message block Mu of a padded message M;
b) a second set of data registers for storing hash function variables;
c) a third set of data registers for storing hash function intermediate results;
d) an accumulating device for storing at least a word of data and outputting the same;
e) a memory device for storing hash function constants;
f) an adder being capable of producing modular addition of at least two data words;
g) an exclusive-or (XOR) circuitry being capable of producing the logical XOR result of at least two words of data, one of which is being output from said accumulating device;
h) one or more cyclic bit rotation device(s) each of which being capable of carrying out one or more cyclic bit rotation(s) of a word of data that are input from said accumulating device or from said third set of data registers;
i) a first arbitration device for selecting a value which can be retrieved from said XOR circuitry, cyclic bit rotation device(s), or from said adder, to be stored in said accumulating device;
j) an encoder for receiving words of data from said third set of data registers and outputting different patterns of the same according to the hash function algorithm that is performed;
k) one or more logical function circuitries for performing logical operations between the words output from said encoder;
l) a second, third, fourth, fifth, and sixth arbitration devices, wherein:
- said fifth arbitration device is used for selecting a value retrieved from said second set of data registers;
- said third arbitration device is used for selecting a value retrieved from said first set of data registers, said value is being provided as input to said exclusive-or circuitry and said second arbitration device;
- said fourth arbitration device is used for selecting a value retrieved from said accumulating device, from said fifth arbitration device, or from said memory device, said value is provided as input to said adder;
- said sixth arbitration device is used for selecting a value produced by said logical function circuitries;
- said second arbitration device is used for selecting a value retrieved from said sixth arbitration device, from said encoder, from said third set of data registers, or from said third arbitration device, said value is provided as input to said adder; and
m) a control circuit for controlling the operation of said arbitration devices and the data flow in said module, thereby allowing said accumulating device to iteratively input intermediate results into said registers and generate, in the last iteration, a final result consisting of the intermediate result values obtained in said last iteration.
4. A hash function module according to claim 1 or 2, wherein the one or more cyclic bit rotation device(s) include circuitry for carrying out a single cyclic bit rotation of a word of data obtained from the accumulating device, circuitry for carrying out four cyclic bit rotations of a word of data obtained from the accumulating device, circuitry for carrying out five cyclic bit rotations of a word of data obtained from the read-write memory, and circuitry for carrying out thirty cyclic bit rotations of a word of data obtained from the read-write memory.
5. A hash function module according to claim 1 or 3, wherein the one or more logical function circuitries are capable of producing the following logical functions:
Figure imgf000030_0001
$q_2 = X \oplus Y \oplus Z$;
$q_3 = (X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z)$; and
$q_4 = Y \oplus (X \vee Z)$,
wherein X, Y, and Z are words of data obtained from the at least three data registers.
6. A hash function module according to claim 2, wherein the logical gates are utilized for producing the following logical functions:
Figure imgf000030_0002
$q_2 = X \oplus Y \oplus Z$;
$q_3 = (X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z)$; and
$q_4 = Y \oplus (X \vee Z)$,
wherein X, Y, and Z are words of data obtained from the read-write memory.
7. A hash function module according to claim 1 or 2, further comprising a ROM memory for storing and outputting hash function constants of one or more hash function algorithms.
8. A hash function module according to claim 7, wherein the value output from the ROM memory is provided as an input to the second arbitration device.
9. A hash function module according to claim 1 or 2, further comprising a third arbitration device for selecting the source of data being used as input to the read-write memory, said source of data being a word of data obtained from the accumulating device, or from an external data source.
10. A hash function module according to claim 1, 2, or 3 for carrying out hash function computations of one or more hash function algorithms from the following list:
- MD5;
- SHA-1.
11. A hash function module according to claim 3, wherein the one or more cyclic bit rotation device(s) include circuitry for carrying out a single cyclic bit rotation of a word of data obtained from the accumulating device, circuitry for carrying out five cyclic bit rotations of a word of data obtained from the accumulating device, and circuitry for carrying out thirty cyclic bit rotations of a word of data obtained from the read-write memory.
12. A hash function module according to claim 3, wherein the memory device is a ROM memory.
13. A hash function module according to claim 3, further comprising a seventh arbitration device for selecting the source of data being used as input to the first set of data registers, said source of data being a word of data retrieved from the accumulating device, or from an external data source.
14. A hash function module according to claim 3, further comprising an eighth arbitration device for selecting the source of data being used as input to the second set of data registers, said source of data being a word of data that is the modular addition obtained by the adder, or a word of data obtained from an external data source.
15. A hash function module according to claim 3, wherein the intermediate results are obtained from the second set of data registers, or from the third set of data registers, or is a permutation of the same.
16. A hash function module according to claim 3, wherein the word of data used for carrying out one or more cyclic bit rotations is obtained from the accumulating device or is the content of one of the third set of data registers.
PCT/IL2004/000050 2003-01-16 2004-01-18 Flexible hardware implementation of hash functions WO2004063842A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IL154010 2003-01-16
IL15401003A IL154010A0 (en) 2003-01-16 2003-01-16 Flexible hardware implementation of hash functions

Publications (2)

Publication Number Publication Date
WO2004063842A2 true WO2004063842A2 (en) 2004-07-29
WO2004063842A3 WO2004063842A3 (en) 2004-12-02

Family

ID=29798452

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2004/000050 WO2004063842A2 (en) 2003-01-16 2004-01-18 Flexible hardware implementation of hash functions

Country Status (2)

Country Link
IL (1) IL154010A0 (en)
WO (1) WO2004063842A2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5155835A (en) * 1990-11-19 1992-10-13 Storage Technology Corporation Multilevel, hierarchical, dynamically mapped data storage subsystem
US5883901A (en) * 1995-09-22 1999-03-16 Hewlett-Packard Company Communications system including synchronization information for timing upstream transmission of data and ability to vary slot duration
US6307857B1 (en) * 1997-06-26 2001-10-23 Hitachi, Ltd. Asynchronous transfer mode controller and ATM control method thereof and ATM communication control apparatus

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112787799A (en) * 2020-12-30 2021-05-11 浙江萤火虫区块链科技有限公司 Poseidon Hash algorithm implementation circuit and implementation method thereof
CN113946313A (en) * 2021-10-12 2022-01-18 哲库科技(北京)有限公司 Processing circuit, chip and terminal of LOOKUP3 hash algorithm
CN113946313B (en) * 2021-10-12 2023-05-05 哲库科技(北京)有限公司 Processing circuit, chip and terminal of LOOKUP3 hash algorithm

Also Published As

Publication number Publication date
IL154010A0 (en) 2003-07-31
WO2004063842A3 (en) 2004-12-02

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase