WO2023143696A1

WO2023143696A1 - Apparatus and method for memory integrity verification

Info

Publication number: WO2023143696A1
Application number: PCT/EP2022/051602
Authority: WO
Inventors: Qiming Li; Sampo Sovio
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2022-01-25
Filing date: 2022-01-25
Publication date: 2023-08-03

Abstract

A computing apparatus configured to detect memory corruption by comparing predetermined hash values to hash values generated at run time. Generation of the hash values is based on a random non-singular square binary key matrix, and data blocks having a block size equal to a word size of the computing apparatus. The hash value is generated by initializing a hash state based on a predetermined initial state, iteratively incorporating one or more data blocks into the hash state, then using the final hash state as the hash value. Each data block is incorporated into the hash state by adding the hash state to the data block to form a vector sum, then left multiplying the vector sum by the key matrix.

Description

APPARATUS AND METHOD FOR MEMORY INTEGRITY VERIFICATION

TECHNICAL FIELD

[0001] The aspects of the disclosed embodiments relate generally to computer security and more particularly to memory integrity protection in computing devices.

BACKGROUND

[0002] Ensuring memory integrity during execution of a computer application program is an important part of computer security. Validating memory integrity and detecting memory corruption during application execution play an important role in securing information processed by software applications. Unintentional memory corruption, which may be caused by software defects or hardware faults, is typically handled using error detection and correction techniques, such as cyclic redundancy checks or Reed-Solomon error correction codes. Intentional memory corruption may indicate an attack by a malicious actor and typically requires stronger detection techniques, such as cryptographic hash functions, message authentication codes (MAC), and cryptographic signatures.

[0003] Malicious actors may employ powerful offline tools to compromise data stored at rest. Conventional MAC design strives to bound an attacker’s success probability by a very small number, such as $2^{-80}$. Unfortunately, conventional cryptographic hash functions designed to achieve these small probabilities, such as secure hash algorithm one (SHA-1) and the nested MACs based on them, are computationally expensive. These more secure corruption detection methods also rely on high iteration counts, which consume scarce processing resources, and use large block sizes that lead to wasteful data padding.

[0004] These secure approaches to memory integrity are often too computationally expensive to be used with transient in-memory information. It would therefore be beneficial to have efficient, less resource-intensive approaches to corruption detection that provide appropriate levels of security while reducing processing costs.

[0005] Thus, there is a need for improved methods and apparatus capable of providing secure memory integrity protection with improved computational efficiency. Accordingly, it would be desirable to provide methods and apparatus that address at least some of the problems described above.

SUMMARY

[0006] The aspects of the disclosed embodiments are directed to improved methods and apparatus for providing memory integrity protection in computing apparatus based on generation of fast keyed hash values. The aspects of the disclosed embodiments provide appropriately secure corruption detection while reducing processing costs typically associated with memory integrity protections.

[0007] According to a first aspect, the above and further advantages are obtained by an apparatus. In one embodiment, the apparatus includes a processing device and a memory, where the memory comprises one or more data blocks and each data block in the one or more data blocks comprises a block size number of data bits. The processing device is configured to generate a fast keyed hash value based on the one or more data blocks, a key matrix, and the block size. The processing device then determines a compare result based on the fast keyed hash value and a predetermined value, and selects an execution flow based at least in part on the compare result. Generation of the fast keyed hash value includes initializing a hash state based on a predetermined initial state, iteratively incorporating each data block in the one or more data blocks into the hash state, and when all data blocks in the one or more data blocks are incorporated into the hash state, setting the fast keyed hash value to the hash state. Incorporating a data block into the hash state is accomplished by adding the hash state to the data block to form a vector sum, then left multiplying the vector sum by the key matrix. The aspects of the disclosed embodiments provide appropriately secure corruption detection while reducing processing costs typically associated with memory integrity protections.

[0008] In a possible implementation form, the processing device includes a hardware logic stage configured to perform the step of incorporating a data block into the hash state by adding a first hash state to the data block to form a first vector sum, then left multiplying the first vector sum by the key matrix to form a next hash state. The logic stage comprises hardware XOR gates adapted to perform addition operations and hardware AND gates adapted to perform multiplication operations. Using hardware logic to incorporate a data block into the hash state improves overall performance of the apparatus by offloading work from the central processing unit.
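Over GF(2), adding two bits is an XOR and multiplying them is an AND, so each output bit of $K(s + b)$ is the XOR of the ANDs between one key-matrix row and the vector sum. The Python sketch below emulates one such logic stage in software to make that gate mapping concrete. The encoding is an assumption made for illustration: an n-bit vector is stored as an integer whose bit i is element i, and the key matrix is a list of n row bitmasks. The function names `and_xor_row` and `logic_stage` are hypothetical.

```python
from typing import List


def and_xor_row(row: int, x: int, n: int) -> int:
    """Compute one output bit: AND each matrix entry with the matching
    vector bit (GF(2) multiplication), then XOR the products (GF(2) addition)."""
    acc = 0
    for j in range(n):
        acc ^= ((row >> j) & 1) & ((x >> j) & 1)
    return acc


def logic_stage(K: List[int], s: int, b: int) -> int:
    """Emulate one stage: next_state = K (s + b) over GF(2).

    K is a hypothetical encoding of the n x n key matrix as n row bitmasks,
    where bit j of K[i] is the entry in row i, column j.
    """
    n = len(K)
    x = s ^ b  # vector addition over GF(2): one XOR gate per bit
    out = 0
    for i in range(n):  # in hardware, all n rows are evaluated concurrently
        out |= and_xor_row(K[i], x, n) << i
    return out
```

In hardware the n rows would be evaluated at the same time, so one stage amounts to a layer of XOR gates for the vector sum followed by n AND-XOR trees.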

[0009] In a possible implementation form, the processing device comprises a processor, the one or more data blocks comprises an instruction queue, and the processing device is configured to iteratively incorporate one or more instructions from a head of the instruction queue into the hash state. When a pre-determined number of instructions have been incorporated, the processing device is configured to determine a compare result based on the hash state and a predetermined value, and select an execution flow of the processor based at least in part on the compare result. Applying the memory corruption detection directly to the instruction queue protects the processor from executing instructions that may have been modified by a malicious actor.

[0010] In a possible implementation form, the one or more data blocks include a plurality of data blocks, and the processing device includes a cascaded logic array configured to receive the plurality of data blocks and generate the fast keyed hash value. The cascaded logic array includes a first logic stage adapted to generate a first hash state based on a prior hash state and a first data block, and a second logic stage configured to generate a next hash state based on the first hash state and a second data block. The cascaded logic array provides a hardware implementation that generates the desired fast keyed hash value within a single instruction cycle without using any processing resources.

[0011] In a possible implementation form, the plurality of data blocks include a plurality of instructions, and the cascaded logic array is configured to generate a plurality of hash states based on the plurality of instructions and the key matrix. When a pre-determined number of instructions have been processed, the processing device is configured to determine the compare result based on the plurality of hash states and a plurality of reference values. By determining a fast keyed hash value for each instruction in the instruction queue, corruption can be detected early, such as several instructions prior to execution of the corrupted instruction, in a look ahead manner.
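The look-ahead check can be sketched as follows. The hash state after each instruction is recorded during a known-good run, and the states produced at run time are then compared one by one, reporting the first instruction whose state diverges. This is a minimal software sketch. It assumes the integer bit-vector and row-bitmask encoding (bit j of `K[i]` is entry (i, j)). The names `hash_states` and `first_mismatch`, and the use of a plain list of reference states, are illustrative assumptions rather than details taken from the text.

```python
from typing import List, Optional


def gf2_matvec(K: List[int], v: int) -> int:
    """Left-multiply bit vector v by K over GF(2); bit j of K[i] is entry (i, j)."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:  # parity of AND = XOR-sum of products
            out |= 1 << i
    return out


def hash_states(K: List[int], s0: int, instructions: List[int]) -> List[int]:
    """Return the hash state after each instruction is incorporated."""
    states = []
    s = s0
    for instr in instructions:
        s = gf2_matvec(K, s ^ instr)
        states.append(s)
    return states


def first_mismatch(K: List[int], s0: int, instructions: List[int],
                   reference: List[int]) -> Optional[int]:
    """Index of the first instruction whose hash state differs from the
    reference value, or None if all compared states match."""
    for idx, (got, want) in enumerate(zip(hash_states(K, s0, instructions), reference)):
        if got != want:
            return idx
    return None
```

Because the key matrix is invertible, changing a single instruction changes the hash state at that position. The mismatch is therefore reported at the corrupted instruction itself rather than only at the end of the sequence.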

[0012] In a possible implementation form, the memory includes a plurality of chunks and each chunk in the plurality of chunks includes one or more data blocks. The processing device is further configured to: generate a plurality of fast keyed hash values, where each fast keyed hash value corresponds to a different chunk in the plurality of chunks; generate a derived key matrix based on the key matrix and a number of chunks in the plurality of chunks; generate a final hash value based on the derived key matrix and the plurality of fast keyed hash values; and determine the compare result based on the final hash value and the predetermined value. Partitioning hash value generation in this way allows the fast keyed hash values to be generated in parallel, which improves performance.

[0013] In a possible implementation form, the processing device is configured to generate each fast keyed hash value in the plurality of fast keyed hash values in parallel. Parallel processing significantly reduces the processing time required to generate the hash values.

[0014] In a possible implementation form, the processing device comprises a plurality of cascaded logic arrays and a final cascaded logic array, where the plurality of cascaded logic arrays are configured to generate the plurality of fast keyed hash values, and the final cascaded logic array is configured to generate the final hash value. Generating the final hash value with a dedicated hardware implementation frees up processor resources to perform other tasks.

[0015] In a possible implementation form, one chunk in the plurality of chunks includes a different number of data blocks than the other chunks in the plurality of chunks. An advantage of the disclosed fast keyed hash method is its ability to operate on chunks of different sizes without increasing processing consumption or data padding.

[0016] In a possible implementation form, the memory includes one or more of file data, a software application, an operating system, and a secure channel. An advantage of the disclosed embodiments is the ability to detect memory corruption in a wide range of applications.

[0017] In a possible implementation form, the processing device is further configured to generate a message authentication code by applying a pseudo random function to one or more of the fast keyed hash values and the final hash value, and determine the compare result based on the message authentication code and the predetermined value. The disclosed embodiments are equally applicable to both hash value based and MAC based memory corruption detection.

[0018] In a possible implementation form, the key matrix includes a non-singular square binary matrix having dimensions of the block size by the block size, and the data block comprises a binary vector. Using a key matrix of this type reduces an attacker’s success probability to an appropriately low value.

[0019] In a possible implementation form, the one or more data blocks and the key matrix comprise elements lying in the Galois field GF(2^q), where q is a positive integer. In addition to a binary implementation, the disclosed embodiments are also applicable to higher-order finite fields.
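A random non-singular n-by-n binary key matrix can be obtained by rejection sampling: draw uniformly random bit rows and keep the matrix only if its rank over GF(2) is n. This is one straightforward approach, offered as an assumption rather than as the method of the text. The rank is computed with Gaussian elimination, where row subtraction is XOR. Matrices are encoded as lists of row bitmasks, and `gf2_rank` and `random_key_matrix` are hypothetical names.

```python
import secrets
from typing import List


def gf2_rank(rows: List[int], n: int) -> int:
    """Rank over GF(2) of a matrix given as row bitmasks (bit j = column j)."""
    rows = list(rows)
    rank = 0
    for col in range(n):
        pivot = next((r for r in range(rank, len(rows)) if (rows[r] >> col) & 1), None)
        if pivot is None:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and (rows[r] >> col) & 1:
                rows[r] ^= rows[rank]  # row subtraction over GF(2) is XOR
        rank += 1
    return rank


def random_key_matrix(n: int) -> List[int]:
    """Draw uniformly random n x n binary matrices until one is non-singular.

    Roughly 29% of random binary matrices are invertible for large n,
    so only a few draws are expected.
    """
    while True:
        K = [secrets.randbits(n) for _ in range(n)]
        if gf2_rank(K, n) == n:
            return K
```

Every candidate is drawn uniformly and invalid ones are discarded, so the accepted key is uniform over all invertible n-by-n binary matrices.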

[0020] In a possible implementation form, the block size corresponds to a word size of the processing device. Setting the block size to a word size of the apparatus significantly reduces processing requirements for memory corruption detection.

[0021] In a possible implementation form, the key matrix comprises one of a unit upper triangular Boolean matrix, a unit lower triangular Boolean matrix, the product of a random permuted lower unit triangular Boolean matrix and a random upper unit triangular Boolean matrix, and the product of a random lower unit triangular Boolean matrix and a random permuted upper unit triangular Boolean matrix. Use of the above described types of key matrices significantly reduces processing requirements while maintaining acceptable security.
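Unit triangular matrices avoid the rejection loop entirely. A unit lower or unit upper triangular Boolean matrix has determinant 1, and so does the product of two of them, so every such product is non-singular. The sketch below builds a key of the form $K = LU$ from a random unit lower triangular L and a random unit upper triangular U. It uses the row-bitmask encoding (bit j of row i is entry (i, j)). It omits the random permutation used in the permuted variants, and all helper names are hypothetical.

```python
import secrets
from typing import List


def random_unit_lower(n: int) -> List[int]:
    """Unit lower triangular: 1 on the diagonal, random bits in columns < i."""
    return [(1 << i) | (secrets.randbits(i) if i else 0) for i in range(n)]


def random_unit_upper(n: int) -> List[int]:
    """Unit upper triangular: 1 on the diagonal, random bits in columns > i."""
    rows = []
    for i in range(n):
        width = n - 1 - i
        upper = (secrets.randbits(width) << (i + 1)) if width else 0
        rows.append((1 << i) | upper)
    return rows


def gf2_matmul(A: List[int], B: List[int]) -> List[int]:
    """Row i of A*B is the XOR of the rows B[j] selected by set bits j of A[i]."""
    product = []
    for a in A:
        acc, j = 0, 0
        while a:
            if a & 1:
                acc ^= B[j]
            a >>= 1
            j += 1
        product.append(acc)
    return product


def gf2_rank(rows: List[int], n: int) -> int:
    """Rank over GF(2) via Gaussian elimination (used here only as a check)."""
    rows, rank = list(rows), 0
    for col in range(n):
        pivot = next((r for r in range(rank, n) if (rows[r] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(n):
            if r != rank and (rows[r] >> col) & 1:
                rows[r] ^= rows[rank]
        rank += 1
    return rank


def random_lu_key(n: int) -> List[int]:
    """K = L * U; det(K) = det(L) det(U) = 1, so K is always non-singular."""
    return gf2_matmul(random_unit_lower(n), random_unit_upper(n))
```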

[0022] In a possible implementation form, the processing device is configured to divide the key matrix into a plurality of square sub-keys, generate a vector product by left multiplying each square sub-key by a sub-vector value, and store the vector product in a lookup table. The processing device then performs left multiplication of the vector sum by a key matrix by looking up one or more vector products in the lookup table. Creation of a lookup table reduces processing time required to generate the hash values.
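A table-driven version of the left multiplication can be sketched as follows. Each w-bit slice of the input vector multiplies an n-by-w column slice of K, which is a stack of square w-by-w sub-keys. Precomputing that product for all $2^w$ slice values turns one matrix-vector product into $n/w$ table lookups combined with XOR. The slice width w = 8, this exact table layout, and the function names are assumptions for illustration. `matvec_direct` is included only as a reference for checking the result.

```python
from typing import List


def matvec_direct(K: List[int], x: int) -> int:
    """Reference GF(2) product K x via per-row parity (bit j of K[i] = entry (i, j))."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & x).count("1") & 1:
            out |= 1 << i
    return out


def build_tables(K: List[int], w: int = 8) -> List[List[int]]:
    """Precompute, for every w-bit slice t of the input, the product of the
    corresponding n x w column slice of K with each of the 2**w slice values."""
    n = len(K)
    assert n % w == 0, "block size must be a multiple of the slice width"
    # column j of K as a bitmask over rows
    cols = [sum(((K[i] >> j) & 1) << i for i in range(n)) for j in range(n)]
    tables = []
    for t in range(n // w):
        table = [0] * (1 << w)
        for v in range(1, 1 << w):
            low = v & -v                  # lowest set bit of v
            j = low.bit_length() - 1      # its column index within the slice
            table[v] = table[v ^ low] ^ cols[t * w + j]
        tables.append(table)
    return tables


def matvec_lookup(tables: List[List[int]], x: int, w: int = 8) -> int:
    """K x as the XOR of one table lookup per w-bit slice of x."""
    mask = (1 << w) - 1
    out = 0
    for t, table in enumerate(tables):
        out ^= table[(x >> (t * w)) & mask]
    return out
```

For n = 64 and w = 8, this layout uses 8 tables of 256 entries each, about 16 KiB when entries are stored as 64-bit words, in exchange for eight lookups per data block.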

[0023] According to a second aspect, the above and further advantages are obtained by a method that includes generating a fast keyed hash value based on one or more data blocks, a key matrix, and a block size. Each data block in the one or more data blocks includes a block size number of data bits. The method determines a compare result based on the fast keyed hash value and a predetermined value and selects an execution flow based at least in part on the compare result. Generating the fast keyed hash value comprises initializing a hash state based on a predetermined initial state and iteratively incorporating each data block in the one or more data blocks into the hash state. Incorporating a data block into the hash state comprises adding the hash state to the data block to form a vector sum, then left multiplying the vector sum by a key matrix. When all data blocks in the one or more data blocks are incorporated into the hash state, the fast keyed hash value is set to the hash state. The key matrix comprises a random non-singular square binary matrix having dimensions of the block size by the block size, and the block size corresponds to a memory word size of the memory. The aspects of the disclosed embodiments provide appropriately secure corruption detection while reducing processing costs typically associated with memory integrity protections.

[0024] In a possible implementation form, the method further includes generating a plurality of fast keyed hash values, wherein each fast keyed hash value corresponds to a different chunk in a plurality of chunks; generating a derived key matrix based on the key matrix and a number of chunks in the plurality of chunks; generating a final hash value based on the derived key matrix and the plurality of fast keyed hash values; and determining the compare result based on the final hash value and the predetermined value. Dividing the memory region into a plurality of chunks facilitates parallel or out-of-order generation of the fast keyed hash values.

[0025] In a possible implementation form, the processing device is further configured to generate a message authentication code by applying a pseudo random function to one or more fast keyed hash values and the final hash value, and determine the compare result based on the message authentication code and the predetermined value. Use of a MAC provides a higher level of security than is provided by the hash value alone.

[0026] In a possible implementation form, the method further includes generating a message authentication code by applying a pseudo random function to one or more fast keyed hash values and the final hash value; and determining the compare result based on the message authentication code and the predetermined value.

[0027] These and other aspects, implementation forms, and advantages of the exemplary embodiments will become apparent from the embodiments described herein considered in conjunction with the accompanying drawings. It is to be understood, however, that the description and drawings are designed solely for purposes of illustration and not as a definition of the limits of the disclosed invention, for which reference should be made to the appended claims. Additional aspects and advantages of the invention will be set forth in the description that follows, and in part will be obvious from the description, or may be learned by practice of the invention. Moreover, the aspects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] In the following detailed portion of the present disclosure, the invention will be explained in more detail with reference to the example embodiments shown in the drawings, in which like references indicate like elements and:

[0029] Figure 1 is a schematic block diagram of an exemplary apparatus configured to provide improved memory corruption detection according to the aspects of the disclosed embodiments;

[0030] Figure 2 illustrates a schematic block diagram of a logic circuit and corresponding compact notation thereof appropriate for use in memory integrity protection of a computing apparatus incorporating aspects of the disclosed embodiments;

[0031] Figure 3 illustrates a schematic block diagram of an exemplary system configured to ensure memory integrity of an instruction queue in accordance with the aspects of the disclosed embodiments;

[0032] Figure 4 illustrates a schematic block diagram of a cascaded logic array incorporating aspects of the disclosed embodiments;

[0033] Figure 5 illustrates a schematic block diagram of an exemplary apparatus employing a cascaded logic array for instruction integrity protection in accordance with aspects of the disclosed embodiments;

[0034] Figure 6 illustrates a schematic block diagram of an exemplary apparatus configured to generate a final hash value based on cascaded logic arrays in accordance with aspects of the disclosed embodiments;

[0035] Figure 7 illustrates a pictorial diagram of an approach for accelerating matrix operations in accordance with aspects of the disclosed embodiments;

[0036] Figure 8 illustrates a flow diagram of an exemplary method for ensuring memory integrity in a computing apparatus incorporating aspects of the disclosed embodiments; and

[0037] Figure 9 illustrates a flow diagram of an exemplary method for generating a fast keyed hash value in accordance with aspects of the disclosed embodiments.

DETAILED DESCRIPTION OF THE DISCLOSED EMBODIMENTS

[0038] Referring to Figure 1, a diagram of an exemplary apparatus 100 configured to provide memory integrity protection in accordance with the aspects of the disclosed embodiments is illustrated. The exemplary apparatus 100 is a computing apparatus with improved detection of memory corruption. The apparatus 100 provides protection against accidental and intentional memory corruption while reducing the computing resources consumed during corruption detection. The disclosed memory integrity protection is based on a fast keyed hash value that can be efficiently generated in either software or hardware and, when desired, used to generate a hash-based message authentication code (MAC).

[0039] In the illustrated embodiment, the apparatus 100 includes a processing device 102 and a memory 104. Any suitable processing device 102 may be advantageously employed in the apparatus 100. The processing device 102 may include, for example, a high-performance multicore computer processing device, such as those used in large cloud computing data centers, a multi-core or single core microprocessor such as those used in workstations and laptop computers, or a specialized processing device such as those used in mobile communications devices and telecommunications equipment.

[0040] The processing device 102 is communicatively coupled over a suitable communication link 108 to the memory 104 thereby allowing the processing device 102 to read and write data or information to/from the memory 104. The memory 104 may be any suitable form of computer memory capable of storing digital data. For example, the memory may include random-access memory (RAM), read-only memory (ROM), volatile and non-volatile storage, or other appropriate form of computer memory capable of storing program instructions and program data during execution of a software application.

[0041] The processing device 102 may be configured to access program instructions and data stored in the memory 104 based on a pre-determined word size, wherein the word size may be determined by a hardware or firmware configuration of the processing device 102 and its associated memory 104 or supporting memory subsystem. Appropriate word sizes include, for example, 16 bits, 32 bits, or 64 bits. In alternate embodiments, any desired word size corresponding to the computing apparatus being protected can be implemented.

[0042] The following notational conventions are used throughout this document: bold capital letters (K) indicate matrices; bold lower-case letters ($b_i$) indicate vectors; lower-case letters in italics ($n$) indicate scalar values; and double vertical bars ($\|$) denote concatenation of vectors.

[0043] The exemplary embodiments disclosed herein are configured to generate hash values and/or MAC values over regions 150 of the memory 104 and to provide memory integrity detection based on these generated values. As will be discussed further below, the memory region(s) 150 being protected may be processed as a plurality of chunks 152, 154, 156, where each chunk 152, 154, 156 includes an array of one or more data blocks. For example, a first chunk 152 includes an array of one or more data blocks $b_1, b_2, \ldots, b_p$. Each data block $b_1, b_2, \ldots, b_p$ has the same block size n, where the block size n corresponds to the number of data elements or data bits in each data block.

[0044] The processing device 102 is configured to detect memory corruption within a region 150 of the memory 104. The apparatus shown in Figure 1 illustrates an exemplary method 130 for detecting memory corruption within the memory region 150. As shown in this example, in one embodiment, the method 130 includes generating 140 a plurality of fast keyed hash values, where each fast keyed hash value corresponds to a different chunk in the plurality of chunks 152, 154, 156. Generation of each fast keyed hash value is based on a hash method 114 and a key matrix K.

[0045] A derived key matrix K’ is then generated 142, where the generation 142 is based on the key matrix K used to generate the fast keyed hash values and the number of chunks in the plurality of chunks. The derived key matrix K’ is generated by raising the key matrix K to a power equal to the number of chunks, that is, $K' = K^{s}$, where $s$ is the number of chunks in the plurality of chunks.

[0046] A final fast keyed hash value is generated 144 based on the derived key matrix K’ and the plurality of fast keyed hash values. Mathematically, generation 144 of the final fast keyed hash value may be expressed as $h_f = H(K', h_1 \| h_2 \| \cdots \| h_{t-1}) + h_t$, where $t$ is the number of fast keyed hash values in the plurality of fast keyed hash values, $h_1 \| h_2 \| \cdots \| h_{t-1}$ is the concatenation of all except the last fast keyed hash value, and $h_t$ is the last fast keyed hash value.

[0047] Parallel processing of hash function computations can be problematic because, with conventional techniques, processing of a next data block requires that processing of the prior data block be completed first. This limitation applies to most conventional hash methods in use today.

[0048] It will be appreciated that generation of each fast keyed hash value in the plurality of fast keyed hash values is order independent. For example, all the fast keyed hash values in the plurality of fast keyed hash values may be generated in parallel. Alternatively, the fast keyed hash values may be generated in any desired order. In one embodiment, the number of chunks in the plurality of chunks is selected based on the number of CPU cores available in the processing device 102. This provides a significant computational advantage, because the fast keyed hash values can then be generated in parallel or out of order.
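The chunked computation of [0044] to [0046] can be checked against a single sequential pass. Each step $s_{i+1} = K(s_i + b_i)$ is linear over GF(2), so processing q further blocks from a state x yields $K^q x$ plus the hash of those blocks started from the zero vector. The combination $h_f = H(K', h_1 \| \cdots \| h_{t-1}) + h_t$ therefore reproduces the sequential hash when every chunk after the first holds the same number p of blocks and $K' = K^p$. The text states the exponent in terms of the number of chunks s. This sketch instead assumes equal-size chunks after the first and uses p, because that is the value for which the identity can be verified below. Chunk hashes are computed one after another here for brevity, although they are independent and could be dispatched to separate cores. All function names and the row-bitmask encoding are hypothetical.

```python
from typing import List


def gf2_matvec(K: List[int], v: int) -> int:
    """K v over GF(2); bit j of K[i] is entry (i, j), bit i of v is element i."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def gf2_matmul(A: List[int], B: List[int]) -> List[int]:
    """Row i of A*B is the XOR of the rows B[j] selected by set bits j of A[i]."""
    product = []
    for a in A:
        acc, j = 0, 0
        while a:
            if a & 1:
                acc ^= B[j]
            a >>= 1
            j += 1
        product.append(acc)
    return product


def gf2_matpow(K: List[int], e: int) -> List[int]:
    """K**e over GF(2) by square-and-multiply."""
    result = [1 << i for i in range(len(K))]  # identity matrix
    base = list(K)
    while e:
        if e & 1:
            result = gf2_matmul(result, base)
        base = gf2_matmul(base, base)
        e >>= 1
    return result


def fast_keyed_hash(K: List[int], blocks: List[int], s0: int = 0) -> int:
    s = s0
    for b in blocks:
        s = gf2_matvec(K, s ^ b)  # s = K(s + b)
    return s


def chunked_hash(K: List[int], chunks: List[List[int]], s0: int = 0) -> int:
    """Hash each chunk independently, then combine with the derived key.

    Assumes every chunk after the first holds the same number p of blocks.
    """
    # The first chunk starts from s0, the others from the zero vector; these
    # calls are independent and could run in parallel or in any order.
    hashes = [fast_keyed_hash(K, chunks[0], s0)]
    hashes += [fast_keyed_hash(K, c) for c in chunks[1:]]
    p = len(chunks[-1])
    derived = gf2_matpow(K, p)
    # h_f = H(K', h_1 || ... || h_{t-1}) + h_t, with H started from zero
    return fast_keyed_hash(derived, hashes[:-1]) ^ hashes[-1]
```

In this sketch only the first chunk may differ in length, which is one possible reading of the variable-size chunk in [0015].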

[0049] The processing device 102 is configured to determine 116 a compare result by comparing the final fast keyed hash value to a predetermined value. The predetermined value may, for example, be a final hash value generated during a reference or known-good execution of the application program. If the memory region 150 has not been corrupted, the generated 144 final fast keyed hash value will match the predetermined value. If the final fast keyed hash value is different from the predetermined value, it is determined that memory corruption has occurred, and the processing device 102 is configured to take appropriate action.

[0050] In certain embodiments it may be beneficial to process the memory region 150 in a single chunk. When a single memory chunk is desired, the corresponding fast keyed hash value may be used directly as the final hash value when determining 116 the compare result. In this example, the additional steps of generating 142 a derived key and generating 144 the final hash value may be skipped.

[0051] The processing device 102 may then select a suitable execution flow 118, 120 based at least in part on the determined 116 compare result. When the compare result indicates that the final fast keyed hash value is the same as the predetermined value, a first execution flow 120 allows program execution to continue in accordance with the currently executing application program. When the compare result indicates that the final fast keyed hash value is different from the predetermined value, memory corruption is indicated. In this case, a second execution flow 118, which is a recovery execution path, can be selected.

[0052] The recovery execution path may, for example, attempt to correct the corruption and recover execution of the application program. Alternatively, the recovery execution path can be configured to terminate execution of the currently executing application program and, when desired, log the event for later analysis or auditing.
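The comparison and branch can be sketched as below. The sketch passes the normal and recovery flows in as callbacks. It compares the two values with `hmac.compare_digest` from the Python standard library, so the comparison time does not depend on where the values first differ. The text does not require a constant-time comparison, so treat that as a design choice of this sketch. The function name and callback parameters are hypothetical.

```python
import hmac
from typing import Callable, TypeVar

T = TypeVar("T")


def select_execution_flow(hash_value: int, predetermined: int, n: int,
                          on_match: Callable[[], T],
                          on_mismatch: Callable[[], T]) -> T:
    """Compare an n-bit hash value with the predetermined value and run the
    normal flow on a match or the recovery flow on a mismatch."""
    nbytes = (n + 7) // 8
    # compare_digest avoids an early-exit comparison whose timing depends on
    # where the two values first differ
    same = hmac.compare_digest(hash_value.to_bytes(nbytes, "little"),
                               predetermined.to_bytes(nbytes, "little"))
    return on_match() if same else on_mismatch()
```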

[0053] The following general information about hash values and MACs may aid understanding of the disclosed hash methods. A MAC is a function $T \leftarrow \mathrm{MAC}(K_s, m)$ such that, given a message m and a key $K_s$, the MAC function computes a tag T of a fixed size regardless of the length of the message. The message m may be any desired data, such as the data blocks $b_1, b_2, \ldots, b_p$ described above. To protect the integrity of the message m, a secret key $K_s$ is chosen and a tag T is computed over m and $K_s$. The resulting tag T is stored or transmitted together with the message m. At a later time, some data m' and tag T' are retrieved. To validate the integrity of the data m', a check is made to verify whether $T' = \mathrm{MAC}(K_s, m')$ holds.
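As a concrete instance of $T \leftarrow \mathrm{MAC}(K_s, m)$ and the later check $T' = \mathrm{MAC}(K_s, m')$, the sketch below uses HMAC-SHA256 from the Python standard library (`hmac`, `hashlib`, `secrets`). HMAC-SHA256 is chosen here only as a familiar MAC. It is not the fast keyed hash of the disclosed embodiments, and the `mac`/`verify` helper names are illustrative.

```python
import hashlib
import hmac
import secrets


def mac(key: bytes, message: bytes) -> bytes:
    """T <- MAC(K_s, m), instantiated here with HMAC-SHA256 (an assumption)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check whether T' = MAC(K_s, m') holds, using a constant-time comparison."""
    return hmac.compare_digest(mac(key, message), tag)


key = secrets.token_bytes(32)  # secret key K_s chosen at random
```

The tag length stays at 32 bytes regardless of message length, which is the fixed-size property described above.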

[0054] In cryptography, an attacker is said to be passive if the attacker is allowed to observe a number of pairs of m and T generated from the MAC function. An attacker is said to be active if the attacker is allowed to ask a legitimate party to compute tags on a number of messages chosen by the attacker. A MAC is said to be secure against a passive or an active attacker if, without knowledge of the key, it is computationally infeasible for the attacker to find a pair m' and T' such that $T' = \mathrm{MAC}(K_s, m')$, where m' is different from all the data m that the attacker has seen or chosen.

[0055] Quantitatively, the security of a MAC scheme is typically specified as the success probability of such an attacker given a specific amount of computation resources. In practice, most existing message authentication codes process a message m in blocks of $\ell$ bits, where $\ell$ is a relatively large number determined by the design of the MAC. Messages whose length is not a multiple of $\ell$ bits have to be padded with dummy data to a multiple of $\ell$ bits.

[0056] A conventional method of constructing a secure MAC is known as a nested MAC (NMAC), which is a theoretical construction defined as $\mathrm{NMAC}(K_1, K_2, m) = f(K_1, h(K_2, m))$. It may be proven that if the inner function h is an ε-almost-universal (ε-AU) hash family, and the outer function f is a pseudo random function (PRF), then the constructed NMAC is secure within certain bounds determined by ε and the security bound of the PRF. A hash family h is essentially a keyed hash function $D \leftarrow h(K_s, m)$ that computes a digest D of a fixed size (for example, n bits) given an arbitrary message m. Choosing a hash function from such a hash family is equivalent to choosing a key $K_s$ from a predefined key space. The form of the function looks the same as that of a MAC, but the security requirement is different.
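The nested structure $f(K_1, h(K_2, m))$ can be sketched with the fast keyed hash of the disclosed embodiments as the inner keyed hash h and HMAC-SHA256 as the outer PRF f. This mirrors applying a pseudo random function to a fast keyed hash value, as in [0017]. Several choices are assumptions of this sketch: HMAC-SHA256 as f, the zero initial state, the little-endian byte encoding of the digest, and messages of a fixed length (a given protected memory region). It does not establish the ε-AU bound of the inner hash.

```python
import hashlib
import hmac
from typing import List


def gf2_matvec(K: List[int], v: int) -> int:
    """K v over GF(2); bit j of K[i] is entry (i, j), bit i of v is element i."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def inner_hash(K2: List[int], blocks: List[int]) -> int:
    """h(K2, m): fast keyed hash with key matrix K2 and zero initial state."""
    s = 0
    for b in blocks:
        s = gf2_matvec(K2, s ^ b)
    return s


def nmac(K1: bytes, K2: List[int], blocks: List[int]) -> bytes:
    """NMAC(K1, K2, m) = f(K1, h(K2, m)) with f = HMAC-SHA256 (assumed PRF)."""
    n = len(K2)
    digest = inner_hash(K2, blocks).to_bytes((n + 7) // 8, "little")
    return hmac.new(K1, digest, hashlib.sha256).digest()
```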

[0057] A hash family is said to be universal if, when the key $K_s$ is chosen uniformly at random, the probability that two distinct messages x and y produce the same hash is no more than $2^{-n}$, where n is the digest bit size. A hash family is said to be ε-almost-universal if this probability is bounded by ε, for some value of ε larger than $2^{-n}$.

[0058] A concrete construction using the NMAC method is the hash-based message authentication code (HMAC), where both functions f and h are based on a cryptographic hash function, such as SHA-1. More specifically, to compute an HMAC using SHA-1 over a message m given key $K_s$, two keys $K_1$ and $K_2$ are first derived from the key $K_s$. A hash value is then computed as $h = h(K_2, m) = \text{SHA-1}(K_2 \| m)$. Finally, the HMAC is computed as $f(K_1, h) = \text{SHA-1}(K_1 \| h)$. It may be proven that the function h is computationally ε-almost-universal, and that the HMAC constructed from h is secure when the function f is a PRF. In the above example, f is based on the SHA-1 hash algorithm, which is assumed to behave as a PRF, and the resulting HMAC is therefore secure under that assumption.

[0059] Conventional MAC schemes introduce significant drawbacks when applied to memory integrity applications. Although the theoretical framework around conventional NMACs allows for rigorous security proofs, the actual HMAC scheme suffers from performance penalties when applied to memory integrity protection. For example, when a standard secure hash algorithm such as SHA-1 is used, a block size of 512 bits or larger is required thereby making buffering and data padding unavoidable. These hash schemes also require initial setup costs to initialize constant values used during computations. Insertion of the key before each message adds additional computational costs.

[0060] Another drawback of HMAC is the serial nature of its computations, which results from the need to process the data blocks in sequence. This serial nature introduces difficulties and extra overhead in memory integrity applications, which often require parallel and out-of-order processing to achieve their goals.

[0061] Conventional cipher block chaining message authentication codes (CBC-MAC) are also not well suited to memory integrity applications. The main problem with using CBC-MAC for memory integrity is that block cipher encryption is typically much more computationally expensive than hash computation, which makes CBC-MAC slower than most hash-based approaches.

[0062] In the embodiment illustrated in Figure 1, the fast keyed hash values are generated by the method 114, which creates a hash family H specified by a key K. The hash family created by the method 114 is further defined by an additional parameter, the block size n, where n is beneficially chosen to match a word size of the memory 104. For example, many conventional CPU and memory architectures operate using a fixed word size n, such as 16 bits, 32 bits, or 64 bits. Any block size n corresponding to a word size of the underlying computing apparatus 100 may be advantageously employed for generation of the fast keyed hash value.

[0063] The key K used in the exemplary method 114 is a randomly chosen n-by-n invertible Boolean matrix, where n is the block size and each of the n² entries in the key matrix is either 0 or 1. Note that an invertible matrix is, equivalently, a non-singular matrix. A message m being hashed by the method 114 is considered as a vector of bits, and the length of the message m is always a multiple of the block size n.

[0064] Generation of the fast keyed hash value is performed by the exemplary processing device 102 according to the method 114, which begins by initializing 122 a hash state based on a predetermined initial state. The predetermined initial state may be any desired binary vector of dimension n by 1, where n is the block size. For example, in one embodiment the initial vector is the zero vector, that is, a vector in which all elements have a value of zero. When generating a plurality of fast keyed hash values based on a plurality of memory chunks 152, 154, 156, the initial state for the first of the fast keyed hash values may be set to any desired vector value, and the initial state for each of the remaining fast keyed hash values may be the zero vector.

[0065] Once the hash state is initialized 122, each data block in the one or more data blocks being processed, b_1, b_2, ..., b_p, is iteratively incorporated 124 into the hash state. Note that the one or more data blocks b_1, b_2, ..., b_p may include any desired number p of data blocks.

[0066] Each data block b_i is incorporated 124 into the hash state by adding the current hash state s_i to the data block b_i to form a vector sum. The vector sum is then left multiplied by the key matrix K. This step 124 may be represented mathematically as s_{i+1} = K(s_i + b_i), where s_{i+1} is the next hash state, s_i is the current hash state, b_i is the data block being incorporated, and K is the key matrix. Once all data blocks b_1, b_2, ..., b_p have been incorporated into the hash state, the resulting fast keyed hash value is set 126 to the final hash state.

[0067] The method 114 of generating a fast keyed hash value based on the one or more data blocks b_1, b_2, ..., b_p, a key matrix K, and the block size n may be represented by the following pseudocode:

    s = s_0
    for each b_i in b_1, b_2, ..., b_p:
        s = K(s + b_i)
    end for
    h_f = s

where s is the current hash state, s_0 is the predetermined initial state, b_i is the data block currently being incorporated into the hash state, b_1, b_2, ..., b_p are the one or more data blocks being processed, K is the key matrix, and h_f is the resulting fast keyed hash value.
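A minimal Python sketch of the pseudocode above follows. It is an illustration under stated assumptions, not the patent's implementation: an n-bit vector is packed into a Python int with bit j holding component j, the key K is a list of n row ints with bit j of row i holding entry K[i][j], and the function names are hypothetical. Addition in GF(2) is XOR, and each output bit of the matrix-vector product is the parity of an AND.

```python
from typing import Iterable, List


def gf2_matvec(K: List[int], v: int) -> int:
    """Left-multiply the bit-vector v by the GF(2) matrix K (one int per row).

    Output bit i is the parity (XOR-sum) of the bitwise AND of row i with v.
    """
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def fast_keyed_hash(K: List[int], blocks: Iterable[int], s0: int = 0) -> int:
    """Apply the update s <- K(s + b) for every block, starting from s0."""
    s = s0
    for b in blocks:
        s = gf2_matvec(K, s ^ b)  # vector addition in GF(2)^n is XOR
    return s
```

For example, with K = [0b11, 0b01] (the invertible matrix [[1, 1], [1, 0]]), hashing the blocks [0b01, 0b10] from the zero initial state returns 3.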

[0068] The initial hash state s_0 can be any arbitrary value and, when desired, may be used to specify an extra parameter when computing the fast keyed hash value. In certain embodiments, setting the initial value s_0 to the zero vector provides appropriate results.
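Unrolling the update s ← K(s + b_i) over all p blocks shows how the initial state s_0 and each data block enter the result. Writing h for the final hash state and K^j for the j-th power of the key matrix, with all arithmetic in GF(2):

```latex
h = K^{p} s_0 + \sum_{i=1}^{p} K^{\,p-i+1}\, b_i
```

For p = 2 this gives h = K(K(s_0 + b_1) + b_2) = K^2 s_0 + K^2 b_1 + K b_2. For two messages of the same length p, the difference of their hash values is the sum of K^{p-i+1}(b_i + b_i') over i, which does not depend on s_0.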

[0069] The data elements in each data block are binary values, and all multiplication and addition operations are performed in the finite field, or Galois field, GF(2). Thus, addition of two elements (or element-wise addition of two vectors) corresponds to a logical exclusive OR (XOR) operation, and multiplication of two elements corresponds to a logical AND operation.

[0070] A hash is said to be α-almost-universal when, given two distinct binary vectors x and y and a key matrix K chosen uniformly at random, the probability that the hash value of x equals the hash value of y is no more than α = 1/(2^n - 1), where n is the data block size, i.e., the dimension of the binary vectors x and y. The above-described fast keyed hash method 114 can be shown to be α-almost-universal.
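For two-block messages the bound can be checked exhaustively on a toy block size. Unrolling the update gives H(x) + H(y) = K(K d_1 + d_2), where d_i = x_i + y_i, so a collision occurs exactly when K d_1 = d_2. A uniformly random invertible K maps a fixed nonzero d_1 to a uniformly random nonzero vector, so the collision probability is exactly 1/(2^n - 1) when d_1 and d_2 are both nonzero, and 0 otherwise. The sketch below assumes n = 3, a bit-packed representation (bit j of an int is component j; K is a list of row ints), and hypothetical helper names; it enumerates all 168 invertible 3-by-3 keys but does not prove the general multi-block case.

```python
from fractions import Fraction
from itertools import product
from typing import List

N = 3  # tiny block size so that every key matrix can be enumerated


def gf2_matvec(K: List[int], v: int) -> int:
    """Left-multiply bit-vector v by the GF(2) matrix K stored as row bitmasks."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def gf2_rank(rows: List[int], n: int) -> int:
    """Rank over GF(2) via Gaussian elimination on row bitmasks."""
    rows = list(rows)
    rank = 0
    for col in range(n):
        pivot = next((r for r in range(rank, len(rows)) if (rows[r] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and (rows[r] >> col) & 1:
                rows[r] ^= rows[rank]
        rank += 1
    return rank


def fast_keyed_hash(K: List[int], blocks: List[int], s0: int = 0) -> int:
    s = s0
    for b in blocks:
        s = gf2_matvec(K, s ^ b)
    return s


# All invertible N-by-N keys: (2^3 - 1)(2^3 - 2)(2^3 - 4) = 168 of them for N = 3.
KEYS = [list(rows) for rows in product(range(1 << N), repeat=N) if gf2_rank(list(rows), N) == N]


def collision_probability(x: List[int], y: List[int]) -> Fraction:
    """Fraction of invertible keys for which the two messages collide."""
    hits = sum(fast_keyed_hash(K, x) == fast_keyed_hash(K, y) for K in KEYS)
    return Fraction(hits, len(KEYS))
```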

[0071] A secure MAC can be obtained by applying a PRF to the fast keyed hash value generated by the method 114. For example, if the fast keyed hash method 114 is represented by the functional notation H(K, m), a secure MAC may be defined as MAC(K_1, K, m) = f(K_1, H(K, m)). This MAC is secure when the outer function f(·) is a PRF. Any appropriate PRF may be chosen for the outer function f(·), such as the standard cryptographic hash function SHA-1 or another suitable cryptographic hash function. In the disclosed embodiments, the input to the outer function f(·) is very small, on the order of the block size n. In contrast, conventional approaches often process millions or billions of bytes as input.
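A sketch of the MAC composition, assuming HMAC-SHA-256 from Python's standard `hmac` and `hashlib` modules as the outer PRF f (the text names SHA-1 only as one possible choice). Vectors and keys use a bit-packed int representation (bit j of an int is component j; K is a list of row ints), and the function names are hypothetical. Only the n-bit inner hash value, encoded in ceil(n/8) bytes, is passed to the PRF, so the PRF cost does not grow with the message length.

```python
import hashlib
import hmac
from typing import Iterable, List


def gf2_matvec(K: List[int], v: int) -> int:
    """Left-multiply bit-vector v by the GF(2) matrix K stored as row bitmasks."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def fast_keyed_hash(K: List[int], blocks: Iterable[int], s0: int = 0) -> int:
    s = s0
    for b in blocks:
        s = gf2_matvec(K, s ^ b)
    return s


def mac(k1: bytes, K: List[int], blocks: Iterable[int], n: int) -> bytes:
    """MAC(K1, K, m) = f(K1, H(K, m)), with HMAC-SHA-256 standing in for the PRF f."""
    h = fast_keyed_hash(K, blocks)
    return hmac.new(k1, h.to_bytes((n + 7) // 8, "little"), hashlib.sha256).digest()


def verify(k1: bytes, K: List[int], blocks: Iterable[int], n: int, tag: bytes) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(mac(k1, K, blocks, n), tag)
```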

[0072] A MAC constructed from the fast keyed hash method 114 of the disclosed embodiments provides a higher level of security than an HMAC. Conventional HMACs are also based on an NMAC construction but use a cryptographic hash function as the inner hash. The higher level of security is obtained because the security arguments for the fast keyed hash method 114 are information theoretic and do not rely on the computational difficulty of a mathematical problem, as the HMAC does.

[0073] As discussed further below, the fast keyed hash method 114 may be implemented in either hardware or software. A straightforward software implementation of the above-described pseudocode is significantly faster than both the conventional SHA-1 and SHA-256 cryptographic hash functions for short data and provides comparable performance for long data. In embodiments employing parallel processing techniques, which the fast keyed hash method 114 facilitates, significant performance improvements may be achieved for long data.

[0074] To ensure memory integrity detection remains secure, it is important to carefully manage or protect cryptographic keys within the target computing system. Secure management of cryptographic keys is typically performed by trusted software and/or hardware components 106 incorporated into the computer system. These trusted components 106 may be communicatively coupled via a communication channel 110 with the processing device 102. The communication channel 110 can be based on any appropriate secure communication channel methodology. These secure components 106 may be included in an operating system, a trusted execution environment (TEE) or other appropriately secure component or execution environment included in the apparatus 100.

[0075] Figure 2 illustrates a schematic diagram of an exemplary logic circuit 200 appropriate for use in memory integrity protection of a computing apparatus incorporating aspects of the disclosed embodiments. The logic circuit 200 is configured to iteratively incorporate a data block 230 into the hash state 232 and provides a hardware implementation for incorporating a data block into the hash state. In the schematic diagram illustrated in Figure 2, logical XOR operations are depicted as a circle surrounding a plus sign 204, and logical AND operations are depicted using an AND gate symbol 202. As an aid to understanding, the logic circuit 200 is illustrated with a block size of two (2), where the data block 230 has two bits b_1, b_2, the hash state 232 has two bits s_1, s_2, and the key is a two-by-two matrix represented by the four scalar elements k_11, k_12, k_21, k_22. Those skilled in the art will readily recognize that the illustrated circuit 200 can be expanded to any desired number of data bits to generate a fast keyed hash value with any desired block size n, such as 16 bits, 32 bits, 64 bits, or another number of data bits or elements.

[0076] In the illustrated circuit 200, a clock signal (not shown) may be used to control loading of the data block 230 and the hash state 232 into their respective data registers 230 and 232. The data block 230 is updated with values taken from a group of one or more data blocks, such as the data blocks in the memory chunk 152 described above. At the start of a clock cycle, new values are loaded into the data block register 230 and the next hash state 240 is loaded into the hash state register 232. The logic circuit 200 then adds 234 the data block from register 230 to the hash state from register 232 to form a vector sum 236. The vector sum 236 is left multiplied 238 by the key matrix (k_11, k_12, k_21, k_22) to produce the next hash state 240. The process repeats at the next clock cycle until all desired data blocks have been incorporated into the hash state held in register 232.
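As an illustration only, the 2-bit datapath can be modelled gate by gate in Python, with each bit an int 0 or 1 and the key given as the four elements k_11, k_12, k_21, k_22. The function names are hypothetical; each call to `hash_step_2bit` corresponds to one clock cycle.

```python
from typing import Iterable, Tuple

Bits2 = Tuple[int, int]
Key2 = Tuple[Tuple[int, int], Tuple[int, int]]


def hash_step_2bit(s: Bits2, b: Bits2, k: Key2) -> Bits2:
    """One clock cycle of the 2-bit circuit: next state = K(s + b) over GF(2)."""
    # XOR gates form the vector sum (s_1 ^ b_1, s_2 ^ b_2).
    v1, v2 = s[0] ^ b[0], s[1] ^ b[1]
    # AND gates form the products k_ij & v_j; one XOR gate sums each row.
    n1 = (k[0][0] & v1) ^ (k[0][1] & v2)
    n2 = (k[1][0] & v1) ^ (k[1][1] & v2)
    return n1, n2


def run_circuit(blocks: Iterable[Bits2], k: Key2, s0: Bits2 = (0, 0)) -> Bits2:
    """Clock one data block per cycle through the hash state register."""
    s = s0
    for b in blocks:
        s = hash_step_2bit(s, b, k)
    return s
```

This sketch uses four two-input AND gates and four two-input XOR gates per step. For a general block size n the same pattern needs n² AND gates for the products, n(n - 1) XOR gates to sum them, and n XOR gates for the vector sum.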

[0077] The logic circuit 200 depicts an abstract level of design for an actual logic circuit. As an aid to understanding, many details of an actual logic circuit, such as clock synchronization and register configuration, are omitted. The logic circuit 200 illustrates how Boolean matrix multiplication and vector addition in the finite field GF(2) can be efficiently implemented in hardware using basic logic gates.

[0078] For clarity of presentation, the logic circuit 200 may be depicted using the compact representation illustrated by the hashing circuit 206. The compact circuit representation 206, also referred to as a hashing circuit, depicts the same logic circuit topology as the logic circuit 200 described above and may be configured to process data blocks of any desired block size n.

[0079] In the hashing circuit 206, the box labelled K represents left-multiplication by the key matrix, and the circle with a plus sign 218 represents addition of vectors in GF(2), which is a logical XOR operation. The box labelled b represents the input data blocks, and the box labelled s represents the hash state vector. The exemplary hashing circuit 206 is configured to add a data block 210 to the current hash state 208 to produce a vector sum 212, then left multiply the vector sum 212 by a key matrix K to produce a new or next hash state 214.

[0080] The next hash state 214 may then be loaded into the current hash state 216 to prepare for incorporation of additional data blocks. As an aid to understanding, the logic circuit 200 illustrates a circuit with a block size of only 2 bits. The data block 210, key matrix K, and hash state 216 of the hashing circuit 206 may have any desired block size, such as 16 bits, 32 bits, 64 bits, or another desired block size.

[0081] Figure 3 illustrates a diagram of a system 300 configured to ensure memory integrity of an instruction queue in accordance with the aspects of the disclosed embodiments. In many computing systems it is desirable to ensure that the sequence of instructions executed by a processing device has not been tampered with by an attacker or accidentally modified due to a software or hardware fault. Integrity of the instruction sequence may be ensured by employing hash values or corresponding MAC values. Hash values, or MAC values, generated during program execution may be compared with corresponding predetermined values generated from an expected sequence of instructions, or generated during a prior known good execution of the program. When the values are different, memory corruption is indicated, and program execution may be interrupted allowing the error to be properly handled.

[0082] A hash value or MAC value for instructions stored in memory may readily be computed using the foregoing embodiments. The exemplary embodiment 300 illustrated in Figure 3 shows an efficient hardware-enhanced approach for dynamically generating a hash value based on a sequence of instructions being executed, such as instructions moving through an instruction queue 302.

[0083] During program execution, a processing device such as the processing device 102 described above fetches instructions from a memory, such as the memory 104. Each fetched instruction is then decoded, its input registers are prepared, the instruction is executed, and, when desired, its results are written back to a register. In modern processing devices, an instruction queue, also referred to as an instruction pipeline, may be used to improve throughput by fetching additional instructions from memory and loading them into the queue. For example, certain standardized processor architectures include an instruction queue that holds thirteen (13) instructions.

[0084] In the illustrated embodiment 300, a hash value is generated by coupling the head 320 of an instruction queue 302 to a hashing circuit 310, where the hashing circuit 310 may be an implementation of the hashing circuit 206 described above and illustrated in Figure 2. The head 320 of the instruction queue 302 contains the instruction to be executed next by the processor 318. When a desired starting instruction is reached, the hash state 306 may be set to a desired initial state as described above. Each time an instruction is moved to the head 320 of the queue, the instruction is also input to the hashing circuit 310.

[0085] The hashing circuit 310 adds the next instruction from the head 320 of the queue to the current hash state 306 to produce a vector sum 322. The vector sum 322 is left multiplied by a key matrix K to produce a new hash state 326. Each time a new instruction moves to the head 320 of the queue 302, the hash state 306 is updated with the new hash state 326 and the cycle repeats. At any given time, the new hash state 326 incorporates all instructions from the instruction queue 302 since some initial starting position.

[0086] The hashing circuit 310 provides a hardware implementation of the method step 124 described above with reference to the method 114, where the sequence of instructions 320 taken from the head of the queue 302 correspond to the one or more data blocks bi, b2, ... b_p.

[0087] The fast keyed hash value 314, which represents the hash state 306 after a desired number 308 of instructions from the instruction queue 302 have been incorporated, is input to an interrupt trigger 316. A counter 308 is used to control the number of instructions incorporated into the hash state 306. Once the desired number of instructions have been incorporated, the fast keyed hash value 314 is set to the hash state 306 and used by the interrupt trigger 316 to generate an interrupt signal 324. The hash state 306 may then be reset to a desired initial value. The reference value 312 is set to a predetermined value and used by the interrupt trigger 316 to detect corruption.

[0088] Memory corruption is detected by comparing the reference value 312 to the fast keyed hash value 314 and generating a compare result based on the counter 308. The interrupt trigger 316 then generates an interrupt or signal 324 based at least in part on the compare result. The processor 318 selects a desired execution path based on the signal 324. In certain embodiments, it may be desirable to configure the hashing circuit 310 and the interrupt trigger 316 to remain dormant until the counter 308 is set to a non-zero value.
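The interaction between the hashing circuit, the counter, and the interrupt trigger can be modelled in software. The sketch below is hypothetical (the class and method names are not from the source) and represents instructions as n-bit ints, with bit j of an int as component j and K as a list of row ints. `arm` loads the counter and the reference value, and `on_head` is called each time an instruction reaches the head of the queue, returning False where the interrupt trigger 316 would signal corruption.

```python
from typing import Iterable, List, Optional


def gf2_matvec(K: List[int], v: int) -> int:
    """Left-multiply bit-vector v by the GF(2) matrix K stored as row bitmasks."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def reference_hash(K: List[int], instructions: Iterable[int], s0: int = 0) -> int:
    """Reference value computed from a known-good instruction sequence."""
    s = s0
    for ins in instructions:
        s = gf2_matvec(K, s ^ ins)
    return s


class QueueHashMonitor:
    """Hypothetical model of hashing circuit 310, counter 308 and interrupt trigger 316."""

    def __init__(self, K: List[int], s0: int = 0) -> None:
        self.K = K
        self.s0 = s0
        self.state = s0
        self.remaining = 0  # counter value; the monitor is dormant while it is zero
        self.reference = 0

    def arm(self, count: int, reference: int) -> None:
        """Load the counter and the predetermined reference value."""
        self.state = self.s0
        self.remaining = count
        self.reference = reference

    def on_head(self, instruction: int) -> Optional[bool]:
        """Incorporate the instruction that just reached the head of the queue.

        Returns None while dormant or still counting; when the counter reaches
        zero, returns True on a match and False when corruption is detected.
        """
        if self.remaining == 0:
            return None
        self.state = gf2_matvec(self.K, self.state ^ instruction)
        self.remaining -= 1
        if self.remaining > 0:
            return None
        ok = self.state == self.reference
        self.state = self.s0  # reset the hash state for the next window
        return ok
```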

[0089] Figure 4 illustrates a diagram of an apparatus 400 including a cascaded logic array 450 incorporating aspects of the disclosed embodiments. The cascaded logic array 450 speeds up memory integrity computations by allowing multiple data blocks b_1, b_2, ..., b_p to be incorporated into multiple hash states s_1, s_2, ..., s_p in parallel. The iterative memory integrity scheme employed in the exemplary embodiment 300 described above incorporates data blocks into the hash state sequentially, one block at a time, and therefore requires a series of clock cycles. In contrast, the cascaded logic array 450 propagates the hash state from one logic stage 404 to the next logic stage 406 directly, without waiting for a clock cycle. This allows the cascaded logic array 450 to produce a fast keyed hash value 408 based on a plurality of data blocks 402 almost immediately, limited only by hardware signal propagation delays.

[0090] Each data block in the plurality of data blocks 402 is input to a corresponding logic stage in the cascaded logic array 450 to produce a fast keyed hash value 408 based on the plurality of data blocks 402. For example, consider the operation of two successive cascaded logic stages 404, 406. A first logic stage 404 receives a hash state 412 from a prior logic stage and adds it to a corresponding data block 414 from the plurality of data blocks 402. The resulting vector sum 410 is left multiplied by a key matrix K to produce a first hash state 416, which is propagated directly to a second logic stage 406. The second logic stage 406 receives the first hash state 416, adds it to a corresponding data block 418, and produces a next hash state 420.

[0091] For illustrative purposes, the cascaded logic array 450 is shown with multiple separate key matrix boxes, each labelled with the letter K. It will be appreciated that an actual hardware implementation may include a single key register shared among all the logic stages.

[0092] The cascaded logic array 450 significantly reduces processing time necessary for memory integrity validation by incorporating all the data blocks into the fast keyed hash value in parallel.
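The cascade can be modelled in software as a chain of stage functions whose outputs are all exposed. This is a dataflow sketch only: in hardware the stages are combinational and settle within signal propagation delays, whereas Python still evaluates them one after another. It assumes a bit-packed representation (bit j of an int is component j; K is a list of row ints), hypothetical names, and Python 3.8 or later for the `initial` argument of `itertools.accumulate`.

```python
from itertools import accumulate
from typing import List


def gf2_matvec(K: List[int], v: int) -> int:
    """Left-multiply bit-vector v by the GF(2) matrix K stored as row bitmasks."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def cascade_states(K: List[int], blocks: List[int], s0: int = 0) -> List[int]:
    """Outputs of every stage of an unrolled cascade, one per data block.

    Each stage adds its data block to the previous stage's output and
    left-multiplies by K; its output feeds the next stage directly.
    The last element equals the fast keyed hash of all the blocks.
    """
    return list(accumulate(blocks, lambda s, b: gf2_matvec(K, s ^ b), initial=s0))[1:]
```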

[0093] Referring to Figure 5, there can be seen an exemplary apparatus 500 employing a cascaded logic array 550 incorporating aspects of the disclosed embodiments. The exemplary apparatus 500 is configured to protect the integrity of an instruction queue and to provide instruction look-ahead for early detection of corruption.

[0094] The exemplary apparatus 500 provides instruction queue integrity similar to the exemplary apparatus 300 described above, where a hash state 510 is generated based on instructions at the head i_1 of an instruction queue 502, and an interrupt trigger 508 generates an interrupt signal 518 based on a counter 506 and a predetermined reference value 504. In the exemplary embodiment 500, the cascaded logic array 550 is configured to add the current hash state to the instruction 522 at the head of the queue 502 and left multiply the resulting vector sum by the key matrix K. The resulting first hash state 510 incorporates all instructions that have reached the head of the queue since the counter 506 was set.

[0095] In addition to the hash state 510, the exemplary apparatus 500 also generates a second hash state 512 that incorporates the next instruction 524 from the instruction queue 502 into the first hash state 510. This produces a fast keyed hash value 512 that allows the interrupt trigger 508 to look ahead and detect corruption before the next instruction 524 reaches the head 522 of the queue 502. Configuring the cascaded logic array 550 to generate hash states 510, 512, 514, ..., 516 for every instruction in the instruction queue 502 allows the interrupt trigger to detect instruction corruption several instruction cycles before a corrupted instruction reaches the head 522 of the instruction queue 502.

[0096] For example, consider a processing device that has an instruction queue holding thirteen instructions and predetermined reference values 504 for the next ten instructions. When the counter 506 is set to ten, the exemplary apparatus 500 can immediately detect corruption in any of the next ten instructions in the queue 502 without waiting for the corrupted instruction to reach the head 522 of the queue. This provides, in essence, a look-ahead capability.
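A sketch of the look-ahead comparison, assuming per-instruction reference values taken from a known-good run, a bit-packed representation (bit j of an int is component j; K is a list of row ints), hypothetical names, and Python 3.8 or later for `itertools.accumulate(..., initial=...)`. Because K is invertible, K(s + b) = K(s + b') implies b = b'. So if the queue agrees with the known-good sequence before position j and differs at j, the hash state first diverges exactly at j, and the first mismatching index locates the first corrupted instruction.

```python
from itertools import accumulate
from typing import List, Optional


def gf2_matvec(K: List[int], v: int) -> int:
    """Left-multiply bit-vector v by the GF(2) matrix K stored as row bitmasks."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def first_corrupt_index(K: List[int], state: int, queue: List[int],
                        references: List[int]) -> Optional[int]:
    """Compare look-ahead hash states with per-instruction reference values.

    Returns the index of the first queued instruction whose hash state differs
    from its reference, or None if every checked state matches.
    """
    window = queue[: len(references)]
    states = accumulate(window, lambda s, b: gf2_matvec(K, s ^ b), initial=state)
    next(states)  # skip the incoming hash state itself
    for idx, (got, want) in enumerate(zip(states, references)):
        if got != want:
            return idx
    return None
```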

[0097] By employing a cascaded logic array 550, the exemplary apparatus 500 is able to detect errors earlier than the serial construction used in the exemplary apparatus 300. Both the serial construction of the exemplary apparatus 300 and the parallel construction of the exemplary apparatus 500 allow fast keyed hash values to be generated without involving the processor, which is beneficial for control-flow protection.

[0098] Figure 6 illustrates a block diagram of an exemplary apparatus 600 configured to generate a final hash value based on cascaded logic arrays 610, ..., 612, 614 in accordance with aspects of the disclosed embodiments. The exemplary apparatus 600 employs a plurality of cascaded logic arrays 610, ..., 612, 614, similar to the cascaded logic array 450 described above with reference to Figure 4. The apparatus 600 is configured to provide a hardware-generated final hash value 616 similar to the final hash value generated 144 by the exemplary method 130 described above with reference to Figure 1.

[0099] With reference to Figure 1, generation of a final hash value to support memory integrity protection of a region 150 of the memory 104 is achieved by logically dividing the memory region 150 into a plurality of chunks 152, 154, ..., 156, where each chunk includes one or more data blocks b_1, b_2, ..., b_p. As described above, the method 114 may be used for generating a final hash value for the region 150 of memory 104. However, executing the method 114 on a processor can consume valuable processing resources. The exemplary apparatus 600 illustrates a hardware-based approach for generating the final hash value 616 of a plurality of memory chunks 152, ..., 156.

[00100] In this example, the plurality of cascaded logic arrays 610, ..., 612 are configured to generate a plurality of fast keyed hash values 606, where each fast keyed hash value h_1, ..., h_q in the plurality of fast keyed hash values 606 corresponds to a different chunk in the plurality of chunks 152, ..., 156. Generation of the plurality of fast keyed hash values is based on a binary non-singular key matrix K. Each cascaded logic array 610, ..., 612 can independently and simultaneously process a different part of a message or memory region, such as the chunks 152, ..., 156, allowing greater implementation flexibility and greater potential performance improvements.

[00101] A final cascaded logic array 614 generates the final hash value 616 based on the plurality of fast keyed hash values 606 and a derived key matrix K’, where the derived key matrix K’ is generated by multiplying the key matrix K by itself s times, where s is the number of chunks in the plurality of chunks 152, ..., 156.

[00102] It will be appreciated that a single key register can be used to supply the key matrix K to all logic stages in the plurality of cascaded logic arrays 610, ..., 612, thereby avoiding duplicating the key register in every logic stage. Similarly, a single derived key register can be used to supply the derived key K’ to all logic stages in the final cascaded logic array 614.

[00103] As discussed above, keys used in the disclosed embodiments are invertible Boolean matrices chosen uniformly at random. Suitable keys, or key matrices, may be chosen at random through a trial-and-error process, for example by generating a random Boolean matrix and then testing it for invertibility. This process may be repeated until a suitable key matrix is found. In practice, a valid key matrix is usually found within about five attempts. However, the number of attempts is not bounded, so key generation time can occasionally become excessively large.
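The trial-and-error procedure can be sketched as rejection sampling with a GF(2) rank test. In this sketch (hypothetical names) each matrix row is an n-bit int with bit j holding column j, random bits come from Python's `secrets.randbits`, and invertibility is checked by Gaussian elimination over GF(2).

```python
import secrets
from typing import List, Tuple


def gf2_rank(rows: List[int], n: int) -> int:
    """Rank over GF(2) via Gaussian elimination on row bitmasks."""
    rows = list(rows)
    rank = 0
    for col in range(n):
        pivot = next((r for r in range(rank, len(rows)) if (rows[r] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and (rows[r] >> col) & 1:
                rows[r] ^= rows[rank]
        rank += 1
    return rank


def random_invertible_key(n: int) -> Tuple[List[int], int]:
    """Draw random n-by-n Boolean matrices until one has full rank.

    Returns the key and the number of attempts it took.
    """
    attempts = 0
    while True:
        attempts += 1
        K = [secrets.randbits(n) for _ in range(n)]
        if gf2_rank(K, n) == n:
            return K, attempts
```

For large n, a uniformly random Boolean matrix is invertible with probability close to 0.289 (the infinite product of 1 - 2^-i over i ≥ 1), so the expected number of attempts is about 3.5, consistent with the figure above, but no fixed upper bound exists.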

[00104] In embodiments where fast key generation is required, it may be beneficial to create special classes of keys without generating full random n-by-n key matrices. In one embodiment, a unit lower triangular Boolean matrix, that is, a lower triangular matrix with ones on the main diagonal, may be generated. A matrix of this form is guaranteed to be invertible, which avoids the expense of testing for invertibility. Similarly, a unit upper triangular matrix may also be advantageously employed. While these unit triangular key matrices are less expensive to generate, they provide weaker security than a fully random invertible key matrix.

[00105] The weaker security provided by a unit triangular matrix can be compensated to a certain degree by generating a randomly permuted unit lower triangular matrix and a random unit upper triangular matrix and using the product of these two matrices as the key matrix. Alternatively, the random unit upper triangular matrix could be permuted instead of the random unit lower triangular matrix.
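A sketch of these faster key classes, using a row-bitmask representation (bit j of row i is entry (i, j)) and hypothetical names. Unit triangular matrices have determinant 1 over GF(2) and a row permutation is invertible, so the product (P·L)·U is invertible by construction and needs no rank test; the rank check in the sketch is only for testing. The sketch makes no claim that the resulting keys are uniformly distributed over all invertible matrices.

```python
import secrets
from typing import List


def gf2_rank(rows: List[int], n: int) -> int:
    """Rank over GF(2) via Gaussian elimination on row bitmasks."""
    rows = list(rows)
    rank = 0
    for col in range(n):
        pivot = next((r for r in range(rank, len(rows)) if (rows[r] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and (rows[r] >> col) & 1:
                rows[r] ^= rows[rank]
        rank += 1
    return rank


def gf2_matmul(A: List[int], B: List[int]) -> List[int]:
    """A·B over GF(2): row i is the XOR of the rows of B selected by row i of A."""
    out = []
    for a in A:
        row = 0
        for j, b_row in enumerate(B):
            if (a >> j) & 1:
                row ^= b_row
        out.append(row)
    return out


def unit_lower(n: int) -> List[int]:
    """Ones on the diagonal, random bits below it (randbits is never called with 0)."""
    return [(1 << i) | (secrets.randbits(i) if i else 0) for i in range(n)]


def unit_upper(n: int) -> List[int]:
    """Ones on the diagonal, random bits above it."""
    rows = []
    for i in range(n):
        width = n - 1 - i
        above = secrets.randbits(width) << (i + 1) if width else 0
        rows.append((1 << i) | above)
    return rows


def permuted_lu_key(n: int) -> List[int]:
    """Key = (P·L)·U, where P is a random row permutation of L."""
    L = unit_lower(n)
    secrets.SystemRandom().shuffle(L)  # P·L: permute the rows of L
    return gf2_matmul(L, unit_upper(n))
```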

[00106] Figure 7 illustrates a pictorial diagram of an operation 700 for accelerating matrix operations in accordance with aspects of the disclosed embodiments. As described above, data blocks may be incorporated into the fast keyed hash value by left multiplying a vector sum by a key matrix. The required matrix operations may be accelerated with the illustrated operation 700.

[00107] As shown in Figure 7, the n-by-n key matrix is divided into a set 702 of smaller sub-keys, also referred to as sub-key matrices, where the number of sub-keys s² is equal to the square of a divisor s, and each sub-key (K_11, K_12, ...) is a square matrix of order n/s, the block size n divided by the divisor s. Similarly, the vector sum is divided into a set 704 of smaller vectors (m_1, m_2, ...).

[00108] For example, consider a processing device with a word size of 32 bits. The corresponding fast keyed hash value may be based on a 32-element block size, i.e., n = 32, and a 32-by-32 element key matrix. Using a divisor of s = 4 yields s² = 16 sub-keys, each a square sub-key matrix of order n/s = 8, as illustrated by the set 702 of 16 sub-key matrices (K_11, K_12, ..., K_44). Similarly, the 32-bit vector sum may be divided into 4 sub-vectors 704 of 8 bits each.

[00109] A lookup table is precomputed by multiplying each sub-key matrix by every possible 8-bit sub-vector value and recording the products in the lookup table. Using the lookup table, left multiplying a 32-bit vector sum by the 32-by-32 key matrix is reduced to sixteen table lookups and sixteen XOR operations.
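A software sketch of the table-driven product for n = 32 and s = 4. It assumes an n-bit vector is packed into an int with bit j holding component j and that K is a list of n row ints; the names are hypothetical. Sub-key K_ij is the 8-by-8 block at rows 8i to 8i + 7 and columns 8j to 8j + 7, and output sub-vector i is the XOR over j of K_ij times sub-vector m_j.

```python
import secrets
from typing import List

N, S = 32, 4          # block size n and divisor s
W = N // S            # order of each sub-key matrix (8)
MASK = (1 << W) - 1


def gf2_matvec(K: List[int], v: int) -> int:
    """Left-multiply bit-vector v by the GF(2) matrix K stored as row bitmasks."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def build_tables(K: List[int]) -> List[List[List[int]]]:
    """tables[i][j][x] = K_ij · x for every W-bit sub-vector x (s² tables of 2^W entries)."""
    tables = []
    for i in range(S):
        row_tables = []
        for j in range(S):
            sub_key = [(K[W * i + r] >> (W * j)) & MASK for r in range(W)]
            row_tables.append([gf2_matvec(sub_key, x) for x in range(1 << W)])
        tables.append(row_tables)
    return tables


def table_matvec(tables: List[List[List[int]]], v: int) -> int:
    """K·v using s² table lookups, XORed into one accumulator per output sub-vector."""
    out = 0
    for i in range(S):
        acc = 0
        for j in range(S):
            acc ^= tables[i][j][(v >> (W * j)) & MASK]
        out |= acc << (W * i)
    return out
```

The tables are built once per key (16 × 256 small products) and hold 4096 entries of 8 bits each, i.e. 4 KiB if stored as bytes.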

[00110] Figure 8 illustrates a flow diagram of an exemplary method 800 for ensuring memory integrity in a computing apparatus incorporating aspects of the disclosed embodiments. The exemplary method 800 is directed to efficient detection of memory corruption based on generation of an improved fast keyed hash value. The method 800 is appropriate for detecting memory corruption in a region of memory within a computing apparatus, such as the memory region 150 of the apparatus 100 described above with respect to Figure 1. To improve processing, the exemplary method 800 logically divides the memory region 150 into a plurality of chunks 152, 154, ..., 156, where each chunk in the plurality of chunks includes one or more data blocks b_1, b_2, ..., b_p.

[00111] The method 800 begins by receiving 802 data corresponding to a portion of the memory. The portion of the memory may for example be read from a memory coupled to a processor, such as the region 150 of the memory 104 described above. The exemplary method 800 obtains certain computational efficiencies by setting a block size n of each data block in the one or more data blocks equal to a word size of the memory.

[00112] A plurality of fast keyed hash values is generated 804, one for each chunk in the plurality of chunks, based on a key matrix K and the block size n. Generation of the fast keyed hash values is described in more detail below with reference to Figure 9. Each chunk in the plurality of chunks includes one or more data blocks, and each data block includes a block size n number of data bits. The resulting plurality of fast keyed hash values includes one fast keyed hash value corresponding to each chunk in the plurality of chunks.

[00113] A derived key matrix K’ is generated or computed 806 by multiplying the key matrix K by itself s times, where s is the number of chunks in the plurality of chunks 152, ..., 156. The derived key matrix K’ is then used along with the plurality of fast keyed hash values to generate or compute 808 a final hash value.

[00114] A compare result is then determined 810 based on the final hash value and a predetermined value, where the predetermined value represents a final hash value generated from a known good, uncorrupted portion of the memory. When the final hash value equals the predetermined value, no memory corruption is indicated and a normal execution path 814 is selected. When the final hash value differs from the predetermined value, memory corruption is indicated and an alternate or recovery execution path 812 is selected.
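A minimal end-to-end sketch of steps 804 to 810, under stated assumptions: every chunk holds the same number p of data blocks, vectors are packed into ints with bit j as component j, K is a list of row ints, and all names are hypothetical. Only the first chunk starts from the chosen initial state; the others start from the zero vector. By linearity, hashing a chunk from starting state t gives K^p t plus that chunk's hash from the zero state, so in this sketch the derived key is taken as K' = K^p, with p the number of blocks per chunk, and the chunk hashes are recombined with f ← K'f + h_j. With those choices, which belong to this sketch, the result equals the serial hash of the whole region.

```python
import secrets
from typing import List


def gf2_matvec(K: List[int], v: int) -> int:
    """Left-multiply bit-vector v by the GF(2) matrix K stored as row bitmasks."""
    out = 0
    for i, row in enumerate(K):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def gf2_matmul(A: List[int], B: List[int]) -> List[int]:
    """A·B over GF(2): row i is the XOR of the rows of B selected by row i of A."""
    return [_row_combo(a, B) for a in A]


def _row_combo(a: int, B: List[int]) -> int:
    row = 0
    for j, b_row in enumerate(B):
        if (a >> j) & 1:
            row ^= b_row
    return row


def gf2_matpow(K: List[int], e: int) -> List[int]:
    """K^e by square-and-multiply; K^0 is the identity."""
    result = [1 << i for i in range(len(K))]
    base = K
    while e:
        if e & 1:
            result = gf2_matmul(result, base)
        base = gf2_matmul(base, base)
        e >>= 1
    return result


def fast_keyed_hash(K: List[int], blocks: List[int], s0: int = 0) -> int:
    s = s0
    for b in blocks:
        s = gf2_matvec(K, s ^ b)
    return s


def final_hash(K: List[int], blocks: List[int], p: int, s0: int = 0) -> int:
    """Hash equal-size chunks of p blocks independently, then recombine."""
    if p <= 0 or len(blocks) % p:
        raise ValueError("blocks must split into chunks of p blocks")
    if not blocks:
        return s0
    chunks = [blocks[i:i + p] for i in range(0, len(blocks), p)]
    # Step 804: per-chunk hashes (independent, so parallelizable).
    hashes = [fast_keyed_hash(K, c, s0 if idx == 0 else 0) for idx, c in enumerate(chunks)]
    k_prime = gf2_matpow(K, p)  # step 806: derived key (sketch assumption K' = K^p)
    f = 0
    for h in hashes:            # step 808: Horner-style recombination
        f = gf2_matvec(k_prime, f) ^ h
    return f


def select_path(final: int, predetermined: int) -> str:
    """Step 810: normal path 814 on a match, recovery path 812 otherwise."""
    return "normal" if final == predetermined else "recovery"
```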

[00115] Figure 9 illustrates a flow diagram of an exemplary method 900 for generating a fast keyed hash value in accordance with aspects of the disclosed embodiments. The exemplary method 900 is configured to generate a fast keyed hash value based on a set of one or more data blocks and is appropriate for generating the fast keyed hash values employed in the exemplary memory integrity method 800 described above.

[00116] The exemplary method 900 begins by setting 904 a hash state to a predetermined initial state. The predetermined initial state may be any desired vector value, and when desired may be used as an additional parameter for the fast keyed hash value. In certain embodiments the zero vector may be advantageously employed as the predetermined initial state.

[00117] A loop 906 is used to iteratively incorporate each data block in a set of one or more data blocks into a hash state. Each data block is incorporated 908 into the hash state by adding the data block to the current hash state and left multiplying the resulting vector by a key matrix to generate the next hash state s_{i+1}. For clarity, this step can be written mathematically as s_{i+1} = K(s_i + b_i), where s_i is the current hash state, b_i is the data block being incorporated, s_{i+1} is the next hash state, and K is a key matrix. The key matrix K is a randomly chosen n-by-n invertible Boolean matrix, where n is the block size and each of the n² entries in the key matrix K has a Boolean value of either 0 or 1.

[00118] Once all data blocks in the set of one or more data blocks have been incorporated into the hash state, the fast keyed hash value is set 910 to the final hash state.

[00119] From a mathematical perspective, the hash values employed in the methods described above are formed by multiplying the message bits by a special key matrix K' constructed from powers of a Boolean invertible key matrix K, with the message bits treated as elements of the finite field, or Galois field, GF(2). In certain embodiments it may be advantageous to construct the fast keyed hash value over a different finite field, such as a Galois field GF(2^q) where q is a positive integer.

[00120] The aspects of the disclosed embodiments provide protection against accidental and intentional memory corruption while reducing the computing resources consumed during corruption detection. Memory integrity protection is based on a fast keyed hash value that can be efficiently generated in either software or hardware and, when desired, may be used to generate a hash-based message authentication code (MAC).

[00121] Thus, while there have been shown, described and pointed out, fundamental novel features of the invention as applied to the exemplary embodiments thereof, it will be understood that various omissions, substitutions and changes in the form and details of devices and methods illustrated, and in their operation, may be made by those skilled in the art without departing from the spirit and scope of the presently disclosed invention. Further, it is expressly intended that all combinations of those elements, which perform substantially the same function in substantially the same way to achieve the same results, are within the scope of the invention. Moreover, it should be recognized that structures and/or elements shown and/or described in connection with any disclosed form or embodiment of the invention may be incorporated in any other disclosed or described or suggested form or embodiment as a general matter of design choice. It is the intention, therefore, to be limited only as indicated by the scope of the claims appended hereto.

Claims

What is claimed is:

1. An apparatus (100) comprising a processing device (102) and a memory (104), the memory (104) comprising one or more data blocks (b_1, b_2, ..., b_p), wherein each data block in the one or more data blocks (b_1, b_2, ..., b_p) comprises a block size (n) number of data bits, wherein the processing device (102) is configured to: generate a fast keyed hash value based on the one or more data blocks (b_1, b_2, ..., b_p), a key matrix (K), and the block size (n); determine a compare result based on the fast keyed hash value and a predetermined value; and select an execution flow based at least in part on the compare result; wherein generating the fast keyed hash value comprises: initializing a hash state based on a predetermined initial state; iteratively incorporating each data block in the one or more data blocks (b_1, b_2, ..., b_p) into the hash state, wherein incorporating a data block into the hash state comprises adding the hash state to the data block to form a vector sum, then left multiplying the vector sum by the key matrix (K); and when all data blocks in the one or more data blocks (b_1, b_2, ..., b_p) are incorporated into the hash state, setting the fast keyed hash value to the hash state.

2. The apparatus (100) according to claim 1, wherein the processing device (102) comprises a hardware logic stage (206) configured to incorporate a data block into the hash state by: adding a first hash state (208) to the data block (210) to form a first vector sum (212); and left multiplying the first vector sum (212) by the key matrix (K) to form a next hash state (214); wherein the logic stage (206) comprises hardware XOR gates (204) configured to perform addition operations, and hardware AND gates (202) configured to perform multiplication operations.
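Claim 2 maps the two arithmetic operations onto gates: XOR for GF(2) addition and AND for GF(2) multiplication. The following gate-level model is an assumption-based sketch of one logic stage using explicit bit lists. The function names and the naive gate count are illustrative only; a synthesis tool would typically restructure the XOR trees.

```python
def logic_stage(K_bits, s_bits, b_bits):
    """Gate-level model of one logic stage.

    n XOR gates form the vector sum v = s XOR b; each output bit i is then an
    XOR tree over the n AND terms K[i][j] AND v[j].
    """
    n = len(s_bits)
    v = [s_bits[j] ^ b_bits[j] for j in range(n)]  # XOR gates: vector sum
    next_s = []
    for i in range(n):
        acc = 0
        for j in range(n):
            acc ^= K_bits[i][j] & v[j]  # AND gate feeding the row's XOR tree
        next_s.append(acc)
    return next_s


def gate_count(n):
    """Gates in this naive netlist: n XORs for the vector sum, n*n ANDs,
    and n*(n-1) two-input XORs to reduce each row's n AND terms."""
    return {"xor": n + n * (n - 1), "and": n * n}
```

For a 64-bit word this naive netlist needs 4096 AND gates and 4096 two-input XOR gates per stage.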

3. The apparatus (100) according to any one of the preceding claims, wherein the processing device (102) comprises a processor (318), and the one or more data blocks comprises an instruction queue (302), and wherein the processing device (102) is configured to: iteratively incorporate one or more instructions from a head (320) of the instruction queue (302) into the hash state (306), and when a pre-determined number (308) of instructions (320) have been incorporated: determine a compare result (116) based on the hash state (306) and a predetermined value; and select (324) an execution flow of the processor (318) based at least in part on the compare result (116).
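The instruction-queue monitoring of claim 3 can be modelled in software as follows. This is a minimal sketch under stated assumptions. The hash state is assumed to carry over from one checkpoint to the next rather than being reset. The reference values are assumed to be produced offline by the hypothetical helper checkpoint_states. The strings "normal" and "fault-handler" stand in for the two execution flows.

```python
from collections import deque


def mat_vec_gf2(K, x):
    """GF(2) matrix-vector product; bit j of row K[i] is entry K[i][j]."""
    y = 0
    for i, row in enumerate(K):
        y |= (bin(row & x).count("1") & 1) << i
    return y


def checkpoint_states(program, K, window, s0=0):
    """Assumed offline step: hash states expected after every `window` instructions."""
    s, refs = s0, []
    for idx, instr in enumerate(program, start=1):
        s = mat_vec_gf2(K, s ^ instr)
        if idx % window == 0:
            refs.append(s)
    return refs


def monitor_queue(queue, K, window, refs, s0=0):
    """Fold instructions from the head of the queue into the hash state.

    At each checkpoint, compare the state with the next reference value and
    select the execution flow. `refs` must cover every checkpoint reached.
    """
    s, seen, k = s0, 0, 0
    while queue:
        s = mat_vec_gf2(K, s ^ queue.popleft())
        seen += 1
        if seen % window == 0:
            if s != refs[k]:
                return "fault-handler"  # alternate execution flow
            k += 1
    return "normal"
```

Whether instructions may issue before their checkpoint is verified is left open by the claim. This model only reports the selected flow.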

4. The apparatus (100) according to claim 1 or 2, wherein the one or more data blocks comprises a plurality of data blocks (402), and the processing device (102) comprises a cascaded logic array (450) configured to receive the plurality of data blocks (402) and generate the fast keyed hash value (408); and wherein the cascaded logic array (450) comprises: a first logic stage (404) configured to generate a first hash state (416) based on a prior hash state (412) and a first data block (414), and a second logic stage (406) configured to generate a next hash state (420) based on the first hash state (416) and a second data block (418).

5. The apparatus (100) according to claim 4, wherein the plurality of data blocks comprise a plurality of instructions (502), and the cascaded logic array (550) is configured to generate a plurality of hash states (s_1, s_2, ..., s_p) based on the plurality of instructions (502) and the key matrix (K), wherein when a pre-determined number (506) of instructions have been processed, the processing device (102) is further configured to determine the compare result (116) based on the plurality of hash states (s_1, s_2, ..., s_p) and a plurality of reference values (504).
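A cascaded logic array (claims 4 and 5) exposes every intermediate hash state in a single pass, which allows each state to be checked against its own reference value. The software model below is a sketch that assumes one reference value per stage. The hypothetical compare_states helper also reports which stage first disagrees. With a non-singular K, that stage is exactly the first modified instruction, because states before it are unaffected and every later state carries a nonzero difference.

```python
def mat_vec_gf2(K, x):
    """GF(2) matrix-vector product; bit j of row K[i] is entry K[i][j]."""
    y = 0
    for i, row in enumerate(K):
        y |= (bin(row & x).count("1") & 1) << i
    return y


def cascaded_array(K, s_prev, blocks):
    """Stage k combines the state from stage k-1 with block k, so one pass
    through the array yields every intermediate hash state s_1, ..., s_p."""
    states, s = [], s_prev
    for b in blocks:
        s = mat_vec_gf2(K, s ^ b)
        states.append(s)
    return states


def compare_states(states, refs):
    """Return None if every state matches its reference value, otherwise
    the index of the first mismatching stage."""
    if len(states) != len(refs):
        raise ValueError("one reference value per stage is expected")
    for idx, (s, r) in enumerate(zip(states, refs)):
        if s != r:
            return idx
    return None
```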

6. The apparatus (100) according to claim 1, wherein the memory (104) comprises a plurality of chunks (152, 154, ..., 156) and each chunk in the plurality of chunks (152, 154, ..., 156) comprises one or more data blocks (b_1, b_2, ..., b_p), wherein the processing device (102) is further configured to: generate a plurality of fast keyed hash values, wherein each fast keyed hash value corresponds to a different one of the chunks in the plurality of chunks (152, 154, ..., 156); generate a derived key matrix (K’) based on the key matrix (K) and a number of chunks in the plurality of chunks; generate a final hash value based on the derived key matrix (K’) and the plurality of fast keyed hash values; and determine the compare result (116) based on the final hash value and the predetermined value.

7. The apparatus (100) according to claim 6, wherein the processing device (102) is further configured to generate each fast keyed hash value in the plurality of fast keyed hash values in parallel.
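Claims 6 and 7 do not spell out how the derived key matrix K' is formed. Because every update is linear over GF(2), hashing a chunk c of L blocks from a start state t gives H(c, t) = K^L t XOR H(c, 0). The sketch below therefore assumes K' = K^L, which reproduces the sequential hash exactly. It also assumes that each chunk is hashed from the zero state and that the combining rule is S <- K' S XOR h. A per-length cache handles a chunk of different size (claim 9). All names are hypothetical. In CPython the thread pool shows the structure rather than a speed-up; real concurrency would need processes or parallel logic arrays such as those of claim 8.

```python
from concurrent.futures import ThreadPoolExecutor


def mat_vec_gf2(K, x):
    """GF(2) matrix-vector product; bit j of row K[i] is entry K[i][j]."""
    y = 0
    for i, row in enumerate(K):
        y |= (bin(row & x).count("1") & 1) << i
    return y


def fast_keyed_hash(blocks, K, s0=0):
    s = s0
    for b in blocks:
        s = mat_vec_gf2(K, s ^ b)
    return s


def mat_mul_gf2(A, B):
    """Row i of A*B is the XOR of the rows of B selected by the set bits of A[i]."""
    C = []
    for row in A:
        acc, j = 0, 0
        while row:
            if row & 1:
                acc ^= B[j]
            row >>= 1
            j += 1
        C.append(acc)
    return C


def mat_pow_gf2(K, e):
    """K^e by square-and-multiply; used here as the assumed derived key K' = K^L."""
    R = [1 << i for i in range(len(K))]  # identity matrix
    P = K
    while e:
        if e & 1:
            R = mat_mul_gf2(R, P)
        P = mat_mul_gf2(P, P)
        e >>= 1
    return R


def chunked_hash(chunks, K, s0=0, workers=4):
    """Hash every chunk from the zero state concurrently, then fold the partial
    hashes in order with S <- K^L * S XOR h, where L is that chunk's block count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda c: fast_keyed_hash(c, K, 0), chunks))
    derived = {}  # derived key matrices, cached per chunk length
    S = s0
    for chunk, h in zip(chunks, partial):
        L = len(chunk)
        if L not in derived:
            derived[L] = mat_pow_gf2(K, L)
        S = mat_vec_gf2(derived[L], S) ^ h
    return S
```

With equal-length chunks only one derived matrix is ever computed, and the serial combining work drops to one matrix-vector product per chunk.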

8. The apparatus (100) according to claim 6 or 7, wherein the processing device (102) comprises a plurality of cascaded logic arrays (610, ..., 612) and a final cascaded logic array (614), wherein the plurality of cascaded logic arrays (610, ..., 612) is configured to generate (140) the plurality of fast keyed hash values (h_1, ..., h_q), and the final cascaded logic array (614) is configured to generate (144) the final hash value (h_f).

9. The apparatus (100) according to claim 6 or 7, wherein one chunk (156) in the plurality of chunks (152, 154, ..., 156) comprises a different number of data blocks than the other chunks in the plurality of chunks (152, 154, ..., 156).

10. The apparatus (100) according to any one of the preceding claims, wherein the memory (104) comprises one or more of file data, a software application, an operating system, and a secure channel.

11. The apparatus (100) according to any one of the preceding claims, wherein the processing device (102) is further configured to generate a message authentication code by: applying a pseudorandom function to one or more of the fast keyed hash values and the final hash value; and determining the compare result (116) based on the message authentication code and the predetermined value.
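Claim 11 applies a pseudorandom function to the hash value to form a message authentication code. The claim does not name a PRF, so HMAC-SHA256 from Python's standard library is used below purely as a stand-in. The byte encoding (n // 8 little-endian bytes), the separate PRF key and the helper names are all assumptions of this sketch.

```python
import hashlib
import hmac


def mat_vec_gf2(K, x):
    """GF(2) matrix-vector product; bit j of row K[i] is entry K[i][j]."""
    y = 0
    for i, row in enumerate(K):
        y |= (bin(row & x).count("1") & 1) << i
    return y


def fast_keyed_hash(blocks, K, s0=0):
    s = s0
    for b in blocks:
        s = mat_vec_gf2(K, s ^ b)
    return s


def mac_tag(blocks, K, prf_key, n=64, s0=0):
    """Hash-then-PRF: apply HMAC-SHA256 (an assumed PRF) to the n-bit
    fast keyed hash value encoded as n // 8 little-endian bytes."""
    h = fast_keyed_hash(blocks, K, s0)
    return hmac.new(prf_key, h.to_bytes(n // 8, "little"), hashlib.sha256).digest()


def verify_tag(blocks, K, prf_key, expected, n=64, s0=0):
    """Compare result; compare_digest keeps the comparison itself constant-time."""
    return hmac.compare_digest(mac_tag(blocks, K, prf_key, n, s0), expected)
```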

12. The apparatus (100) according to any one of the preceding claims, wherein the key matrix (K) comprises a non-singular square binary matrix having dimensions of the block size by the block size (n × n), and the data block comprises a binary vector.
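Non-singularity in claim 12 matters because a singular K has a nonzero vector d with K d = 0. XORing d into the last block would then leave the hash unchanged. The sketch below checks non-singularity by Gaussian elimination over GF(2) and draws keys by rejection sampling. Rejection sampling is one simple option assumed here, not necessarily the applicant's procedure. Roughly 29% of large uniformly random binary matrices are non-singular, so fewer than four draws are needed on average.

```python
import random


def is_nonsingular_gf2(rows, n):
    """Gaussian elimination over GF(2) on n row bitmasks; True iff rank == n."""
    if len(rows) != n:
        return False
    rows = list(rows)
    for col in range(n):
        pivot = next((r for r in range(col, n) if (rows[r] >> col) & 1), None)
        if pivot is None:
            return False  # no pivot in this column: the matrix is singular
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and (rows[r] >> col) & 1:
                rows[r] ^= rows[col]  # clear this column in every other row
    return True


def random_nonsingular_key(n, rng=random.SystemRandom()):
    """Hypothetical key generator: draw uniform binary matrices until one is invertible."""
    while True:
        rows = [rng.getrandbits(n) for _ in range(n)]
        if is_nonsingular_gf2(rows, n):
            return rows
```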

13. The apparatus (100) according to claim 1, wherein the one or more data blocks (b_1, b_2, ..., b_p) and the key matrix (K) comprise elements lying in the Galois field GF(2^q), wherein q is a positive integer.
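Claim 13 generalises the entries from bits to elements of GF(2^q). The sketch below takes q = 8 and uses the AES reduction polynomial x^8 + x^4 + x^3 + x + 1. Both choices are assumptions, since the claim fixes neither. Addition remains XOR, and the hash update keeps the same form, s <- K (s + b), over vectors of field elements.

```python
def gf256_mul(a, b, poly=0x11B):
    """Multiply two GF(2^8) elements, reducing by the assumed polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly  # reduce back to 8 bits
    return r


def mat_vec_gf256(K, v):
    """y_i = sum_j K[i][j] * v[j], with field addition as XOR."""
    out = []
    for row in K:
        acc = 0
        for k, x in zip(row, v):
            acc ^= gf256_mul(k, x)
        out.append(acc)
    return out


def fast_keyed_hash_gf256(blocks, K, s0):
    """Same update as the binary case, over vectors of GF(2^8) elements."""
    s = list(s0)
    for b in blocks:
        s = mat_vec_gf256(K, [x ^ y for x, y in zip(s, b)])
    return s
```

As a check, `gf256_mul(0x57, 0x83)` gives `0xC1`, matching the worked multiplication example in FIPS-197.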

14. The apparatus (100) according to any one of the preceding claims, wherein the block size (n) corresponds to a word size of the processing device (102).

15. The apparatus (100) according to any one of the preceding claims, wherein the key matrix (K) comprises one of a unit upper triangular Boolean matrix, a unit lower triangular Boolean matrix, the product of a random permuted lower unit triangular Boolean matrix and a random upper unit triangular Boolean matrix, and the product of a random lower unit triangular Boolean matrix and a random permuted upper unit triangular Boolean matrix.
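The last two key forms in claim 15 yield a non-singular K by construction. Each triangular factor has a unit diagonal and is therefore invertible, and a row permutation preserves invertibility, so no rejection loop is needed. The sketch below builds the "permuted lower times upper" form, (P L) U. The helper names and the use of Python's random module are assumptions. Every non-singular binary matrix can be written as P L U, although sampling the factors uniformly need not make K uniformly distributed.

```python
import random


def unit_lower(n, rng):
    """Ones on the diagonal, random bits strictly below it (bit j of row i, j < i)."""
    return [(rng.getrandbits(n) & ((1 << i) - 1)) | (1 << i) for i in range(n)]


def unit_upper(n, rng):
    """Ones on the diagonal, random bits strictly above it (j > i)."""
    full = (1 << n) - 1
    return [(rng.getrandbits(n) & (full ^ ((1 << (i + 1)) - 1))) | (1 << i) for i in range(n)]


def mat_mul_gf2(A, B):
    """Row i of A*B is the XOR of the rows of B selected by the set bits of A[i]."""
    C = []
    for row in A:
        acc, j = 0, 0
        while row:
            if row & 1:
                acc ^= B[j]
            row >>= 1
            j += 1
        C.append(acc)
    return C


def permuted_lower_times_upper(n, rng=random.SystemRandom()):
    """K = (P L) U: shuffle the rows of a random unit lower triangular matrix,
    then right-multiply by a random unit upper triangular matrix."""
    L = unit_lower(n, rng)
    rng.shuffle(L)  # row permutation, i.e. P L
    U = unit_upper(n, rng)
    return mat_mul_gf2(L, U)
```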

16. The apparatus (100) according to any one of the preceding claims, wherein the processing device (102) is further configured to: divide the key matrix (K) into a plurality of square sub-keys (702); generate a vector product by left multiplying each square sub-key (K_ij) by a sub-vector value (m_j); and store the vector product in a lookup table, wherein left multiplying the vector sum by the key matrix (K) comprises looking up one or more vector products in the lookup table.
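The lookup-table variant of claim 16 can be sketched as follows. The n × n key is cut into (n/k)^2 square k × k sub-keys K_ij. For every possible k-bit sub-vector m, the product K_ij m is stored. Output band i of the product is then the XOR over j of the stored products for the input sub-vectors m_j. The band layout (bits i*k to i*k+k-1) and the function names are assumptions of this illustration.

```python
def build_tables(K, n, k):
    """T[i][j][m] = K_ij * m for every k-bit sub-vector m, where the sub-key
    K_ij is the k x k block in row band i and column band j of K."""
    mask = (1 << k) - 1
    bands = n // k
    T = []
    for i in range(bands):
        row_tables = []
        for j in range(bands):
            sub = [(K[i * k + r] >> (j * k)) & mask for r in range(k)]
            table = []
            for m in range(1 << k):
                y = 0
                for r, srow in enumerate(sub):
                    y |= (bin(srow & m).count("1") & 1) << r
                table.append(y)
            row_tables.append(table)
        T.append(row_tables)
    return T


def mat_vec_lookup(T, x, k):
    """Compute K x using only table lookups and XORs."""
    mask = (1 << k) - 1
    bands = len(T)
    subs = [(x >> (j * k)) & mask for j in range(bands)]  # sub-vectors m_j
    y = 0
    for i in range(bands):
        acc = 0
        for j in range(bands):
            acc ^= T[i][j][subs[j]]
        y |= acc << (i * k)
    return y
```

As a worked size estimate, n = 64 and k = 8 give 64 sub-keys with 256 entries each, which is 16,384 one-byte entries (16 KiB). Each matrix-vector product then costs 64 lookups and 56 XORs.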

17. A method (800) comprising: generating (804) a fast keyed hash value based on one or more data blocks, a key matrix (K), and a block size, wherein each data block in the one or more data blocks comprises a block size (n) number of data bits; determining a compare result (810) based on the fast keyed hash value and a predetermined value; and selecting a first execution flow (812) or a second execution flow (814) based at least in part on the compare result; wherein generating (804) the fast keyed hash value comprises: initializing (904) a hash state based on a predetermined initial state; iteratively incorporating (906, 908) each data block in the one or more data blocks into the hash state, wherein incorporating a data block into the hash state comprises adding the hash state (s_j) to the data block (b_j) to form a vector sum, then left multiplying the vector sum by the key matrix (K); and when all data blocks in the one or more data blocks are incorporated into the hash state, setting (910) the fast keyed hash value to the hash state, wherein the key matrix comprises a random non-singular square binary matrix having dimensions of the block size by the block size, and the block size corresponds to a memory word size of a processing device.

18. The method (800) according to claim 17, wherein generating (804) the fast keyed hash value further comprises generating a plurality of fast keyed hash values, wherein each fast keyed hash value corresponds to a different one of the chunks in a plurality of chunks, and the method further comprises: generating (806) a derived key matrix (K’) based on the key matrix (K) and a number of chunks in the plurality of chunks; generating (808) a final hash value based on the derived key matrix (K’) and the plurality of fast keyed hash values; and determining the compare result (810) based on the final hash value and the predetermined value.