WO2022242841A1

WO2022242841A1 - Method and device for executing code

Info

Publication number: WO2022242841A1
Application number: PCT/EP2021/063238
Authority: WO
Inventors: Santeri SALKO; Jan-Erik Ekberg; Sampo Sovio; Igor STOPPA
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2021-05-19
Filing date: 2021-05-19
Publication date: 2022-11-24

Abstract

Provided is a computer-implemented method for executing code in a processor (102, 600). The method includes fetching, by the processor (102, 600), a block of code (200A-N, 300A-N) to be executed, reading a reference checksum, and recording the reference checksum in a configuration register (606) in the processor (102, 600). The reference checksum is calculated for the block of code (200A-N, 300A-N) after compiling and is added to the beginning of the block of code before execution. The method includes executing the block of code (200A-N, 300A-N) using the processor (102, 600). The method includes generating, by the processor (102, 600), a validation checksum based on the executed code using a key (204). The method includes comparing, by the processor (102, 600), the validation checksum against the recorded reference checksum. The processor (102, 600) generates an interrupt signal if the validation checksum does not match the reference checksum.

Description

METHOD AND DEVICE FOR EXECUTING CODE

TECHNICAL FIELD

The disclosure relates to a computer-implemented method for executing code in a processor, and more particularly, the disclosure relates to a processing device including a processor for executing a code in the processor.

BACKGROUND

Control flow integrity (CFI) is a method to prevent a wide variety of memory attacks from redirecting the flow of execution (control flow) of a program. One such powerful memory attack is the return-oriented programming (ROP) attack. ROP attacks allow an attacker to execute code in the presence of security defenses. The attacker gains control of the call stack (i.e. run-time stack) to hijack program control flow and then executes carefully chosen machine instruction sequences that are already present in the memory of the device. ROP attacks are responsible for around 70% of exploitable bugs in contemporary software, which makes them a serious attack vector. ROP attacks allow the attacker to exploit software vulnerabilities such as buffer overflows and use-after-free to hijack control flow and execute gadgets chosen by the attacker. These gadgets are subsets of instructions found in the code. By careful selection of gadgets, the attacker may execute malicious code or misuse protected data such as cryptographic keys that are available to fulfil the policy that is required to access the keys. Traditionally, code location randomization is used to protect control flow integrity from ROP attacks by making it harder for the attacker to find gadgets. However, because code location randomization relocates the code only at every program launch, it does not provide full protection.

Existing solutions to protect running code against ROP attacks include different variations of Control Flow Integrity (CFI) solutions. These protections include a shadow stack, which is employed to protect the return address of a function, which is often stored in memory and is therefore vulnerable to a memory attack. The shadow stack is a second, separate stack that shadows the program stack. The return address of the function is stored in both the program stack and the shadow stack. To detect an attack, the return addresses in the program stack and the shadow stack are compared, and if the addresses differ from each other, an attack is confirmed. A typical action is to terminate the function or alert the system administrator about the intrusion attempt. However, the shadow stack fails to protect stack data other than return addresses and offers incomplete protection against security vulnerabilities that result from memory safety errors. Shadow stacks also require additional memory.

The CFI protections need to consider a call flow in a direction where functions, methods, or other indirect jumps in programs are conducted. Further, a large body of work exists for this protection, and some software-based known solutions have already reached inclusion in contemporary compilers. These mechanisms can be type-based, where the forward jump is labeled with “the type” of e.g. the function being called, or more specific, where either the call site code or the target includes code that resolves/validates against the memory address of the other node to identify violations in the call flow.

Another known approach employs limiting network services to limit the usage of a licensed model of computer code in unauthorized devices as per Digital Rights Management (DRM). Nevertheless, the problem still exists even though license dongles and license managers are used. In certain scenarios, where there is a scope of the usage of hardware and software like the internet of things (IoT) devices, the computer code needs protection on the hardware and the software by running the software version on the device. This feature may also have a copying restriction or regional enforcement.

Another known approach employs a method that uses a software component that checks the integrity of a software object before running it. Further, the software component provides a run-time type checking to prevent software errors. However, the software component fails to enforce integrity validation of the software object using hardware too. There is no implementation of hardware in the above approach. Some advanced machines may not be compatible in employing the software component to validate the integrity of the computer code.

Therefore, there arises a need to address the aforementioned technical drawbacks in known techniques or technologies in executing code in a processor.

SUMMARY

It is an object of the disclosure to provide a computer-implemented method for executing code in a processor, and a processing device for executing code while avoiding one or more disadvantages of prior art approaches. This object is achieved by the features of the independent claims. Further, implementation forms are apparent from the dependent claims, the description, and the figures.

The disclosure provides a computer-implemented method for executing code in a processor, and a processing device for executing code.

According to a first aspect, there is provided a computer-implemented method for executing code in a processor. The computer-implemented method includes fetching, by the processor, a block of code to be executed, reading a reference checksum, and recording the reference checksum in a configuration register in the processor. The computer-implemented method includes executing the block of code using the processor. The computer-implemented method includes generating, by the processor, a validation checksum based on the executed code. The computer-implemented method includes comparing, by the processor, the validation checksum against the recorded reference checksum.

The method uses a checksum mechanism on a unit block of code, such as a basic block, to provide a secure way of execution. This mechanism protects the running code against resourceful and determined attackers, as the method uses a hardware-sealed key and allows execution only of instructions that are defined by a pre-programmed keyed checksum. The method prevents an attacker from getting hold of the pre-programmed key in the CPU by inserting their own code. Therefore, a remote server is not essential for validation after the run-time of each basic block of the code. If any malicious software runs, the method detects the malicious software during the validation at the end of the execution of the current basic block of code.

According to a second aspect, there is provided a processing device for executing code, and the processing device is configured to perform the above method.

The processing device uses a checksum mechanism on a basic block of code to provide a secure way of execution. This mechanism protects the running code against resourceful and determined attackers, as the processing device uses a hardware-sealed key and can report a violation upon execution of instructions that are not defined by a pre-programmed keyed checksum. The processing device prevents an attacker from getting hold of the pre-programmed key in the CPU by inserting their own code. Therefore, a remote server is not essential for validation after the run-time of each basic block of the code. If any malicious software runs, the processing device detects the malicious software during the validation at the end of the execution of the current basic block of code. In some examples, if a violation is detected, an interrupt can be triggered.

A technical problem in the prior art is resolved, where the technical problem is preventing attacks while executing code in the processor.

Therefore, in contradistinction to the prior art, according to the computer-implemented method for executing code in a processor and the processing device for executing code in the processor, the efficiency of control-flow integrity is improved by providing a lightweight checksum. Due to the light-weight nature of the checksum, the checksum computation is very fast and the use of RAM is not essential. The processor streams the checksum encryption or decryption in parallel while the checksum is computed. The parallel processing can complete the checksum computation before the execution of code is finalised, so the checksum computation does not increase the code execution time. The method eliminates the chance of an attacker changing the code, due to the validation of the checksum. The method reduces the time required to identify malicious attacks when compared to the existing solutions. The reliability of the method is very high for third-party scenarios. The method works easily for RISC-V devices that lack ARM control-flow integrity (CFI) features.

These and other aspects of the disclosure will be apparent from the implementation(s) described below.

BRIEF DESCRIPTION OF DRAWINGS

Implementations of the disclosure will now be described, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of a processing device for executing a code in accordance with an implementation of the disclosure;

FIG. 2 is an exemplary diagram that illustrates a process of generating a reference checksum in accordance with an implementation of the disclosure;

FIG. 3 is an exemplary diagram that illustrates a process of generating a reference checksum by replacing padding with the reference checksum in accordance with an implementation of the disclosure;

FIG. 4 is an exemplary diagram that illustrates a process of loading a wrapped key to a crypto module in accordance with an implementation of the disclosure;

FIG. 5 is an exemplary diagram that illustrates a process of replacing padding with a reference checksum in accordance with an implementation of the disclosure;

FIG. 6 is an exemplary diagram that illustrates a process of executing a block of code using a processor in accordance with an implementation of the disclosure;

FIG. 7 is an exemplary process of creating validation checksum in accordance with an implementation of the disclosure;

FIG. 8 is a flow diagram that illustrates a computer-implemented method for executing code in a processor in accordance with an implementation of the disclosure; and

FIG. 9 is an illustration of a computing arrangement that is used in accordance with implementations of the disclosure.

DETAILED DESCRIPTION OF THE DRAWINGS

Implementations of the disclosure provide a computer-implemented method for executing code in a processor and a processing device for executing code.

To make solutions of the disclosure more comprehensible for a person skilled in the art, the following implementations of the disclosure are described with reference to the accompanying drawings.

Terms such as "a first", "a second", "a third", and "a fourth" (if any) in the summary, claims, and foregoing accompanying drawings of the disclosure are used to distinguish between similar objects and are not necessarily used to describe a specific sequence or order. It should be understood that the terms so used are interchangeable under appropriate circumstances, so that the implementations of the disclosure described herein are, for example, capable of being implemented in sequences other than the sequences illustrated or described herein. Furthermore, the terms "include" and "have" and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, a method, a system, a product, or a device that includes a series of steps or units, is not necessarily limited to expressly listed steps or units but may include other steps or units that are not expressly listed or that are inherent to such process, method, product, or device.

FIG. 1 is a block diagram of a processing device 100 for executing a code in accordance with an implementation of the disclosure. The processing device 100 includes a processor 102. The processing device 100 is configured to execute a method for executing code in the processor 102. The processor 102 is configured to fetch a block of code to be executed. The processor 102 is configured to read a reference checksum and record the reference checksum in a configuration register in the processor 102. The processor 102 is configured to execute the block of code. The processor 102 is configured to generate a validation checksum based on the executed code. The processor 102 is configured to compare the validation checksum against the recorded reference checksum.

The processor 102 is configured to use a checksum mechanism on a basic block of code to provide a secure way of execution. This mechanism protects the running code against resourceful and determined attackers, as the processor 102 is configured to use a hardware-sealed key and can report a violation upon execution of instructions that are not defined by a pre-programmed keyed checksum. The processor 102 prevents an attacker from getting hold of the pre-programmed key in the CPU by inserting their own code. The processor 102 does not require any remote servers to validate after the run-time of each basic block of the code, i.e. a remote server is not essential. If any malicious software runs, the processor 102 detects the malicious software during the validation at the end of the current basic block of code. In some examples, if a violation is detected, an interrupt can be triggered.

The processor 102 is configured to execute the code so as to improve the efficiency of control flow integrity by providing a lightweight checksum. Due to the light-weight nature of the checksum, the checksum computation is very fast and the consumption of RAM is not required. The processor 102 is configured to stream the checksum encryption or decryption in parallel while the checksum is computed. The parallel processing can complete the checksum computation before the execution of code is finalised, so the checksum computation does not increase the code execution time. The processor 102 eliminates the chance of an attacker changing the code, due to the validation of the checksum. The processor 102 reduces the time required to identify malicious attacks when compared to the existing solutions. The reliability of the processor 102 is very high for third-party scenarios. The processor 102 is configured to work easily for RISC-V devices that lack ARM control-flow integrity (CFI) features.

Optionally, the processor 102 is configured to generate the validation checksum by updating the validation checksum for each instruction in the block of code as the instruction is executed. Optionally, the processor 102 is configured to generate the validation checksum using a key, and the reference checksum is generated using the same key. The key is stored in the processor 102. The key is recorded in a write-only register of the processor 102 during a booting operation of the processor 102.

The key may be received via software and provided to the processor 102 using the software. The provided key may be encrypted with an asymmetric public key, and an associated decryption private key is configured into the processor 102. Optionally, the processor 102 is configured to fetch the block of code by reading a block size for the block of code, and to generate the validation checksum by determining that the execution is complete based on the block size.

The reference checksum may be encrypted and the validation checksum is generated by encrypting the validation checksum. Optionally, the processor 102 is configured to calculate the reference checksum for the block of code after compiling and to add to the beginning of the block of code before providing the block of code to the processor 102 for execution.

Optionally, generating the validation checksum is implemented in the processor logic. The processor logic may include instructions to create contents of the configuration register and to recognize and store the reference checksum. The reference checksum may be embedded with instructions to record the configuration register value. The processor 102 may be configured to generate an interrupt if the validation checksum does not match the reference checksum. The processor 102 may be configured to generate the validation checksum by generating the validation checksum using a seed value. The block of code is configured to provide a seed value to each other block of code that can validly follow the present block of code.

FIG. 2 is an exemplary diagram that illustrates a process of generating a reference checksum in accordance with an implementation of the disclosure. Optionally, a processor generates the reference checksum for blocks of code 200A-N, using a key K 204. The processor is configured to compile a block of code (e.g. 200A or 200N) to calculate the reference checksum and a length of the block of code (e.g. 200A or 200N). Optionally, a Software, SW, creation tool, or an enhanced compiler identifies and patches all blocks from a code. Optionally, the Software, SW, creation tool, or the enhanced compiler has the key K 204 and may compute the reference checksum. The reference checksum, the key K 204, and the length of the block of code (e.g. 200A or 200N) are added at the beginning of each block of code (e.g. 200A or 200N) to obtain patched blocks of code 202A-N before providing the patched blocks of code 202A-N to the processor for execution.
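By way of a non-limiting illustration, the patching step performed by the creation tool may be sketched as follows. The sketch assumes a header of a 4-byte checksum followed by a 2-byte length, omits the wrapped key for brevity, and uses a keyed BLAKE2s digest from the Python standard library as a stand-in for the keyed checksum. The names `HEADER`, `keyed_checksum`, `patch_block` and `parse_block` are hypothetical; the disclosure does not specify the header layout or the checksum function at this point.

```python
import hashlib
import struct

# Hypothetical header layout: 4-byte checksum, then 2-byte block length.
HEADER = struct.Struct("<IH")


def keyed_checksum(key: bytes, code: bytes) -> int:
    """Stand-in keyed checksum: a 4-byte keyed BLAKE2s digest."""
    digest = hashlib.blake2s(code, key=key, digest_size=4).digest()
    return int.from_bytes(digest, "little")


def patch_block(key: bytes, code: bytes) -> bytes:
    """Prepend the reference checksum and the block length to the block."""
    return HEADER.pack(keyed_checksum(key, code), len(code)) + code


def parse_block(patched: bytes):
    """Split a patched block back into (checksum, length, code)."""
    checksum, length = HEADER.unpack_from(patched)
    code = patched[HEADER.size:HEADER.size + length]
    return checksum, length, code


key = b"demo-key-K"  # illustrative key only
block = bytes.fromhex("13050000 93850000 67800000")  # arbitrary example bytes
patched = patch_block(key, block)
```

In this sketch the hardware would read the header first, so both the expected checksum and the number of bytes to cover are known before the first instruction of the block executes.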

Calculating the reference checksum and injecting it at the beginning of the blocks of code 200A-N enables the hardware to identify the length of the block of code (e.g. 200A or 200N) easily. Thereby, the values of the initialized registers are used during validation of the checksum values.

The processor may be configured to store the key K 204. The key K 204 may be received via software and provided to the processor using the software. The key K 204 may be encrypted with an asymmetric public key to obtain a wrapped key K 206, and an associated decryption private key is configured into the processor.

The key K 204 for the block of code (e.g. 200A or 200N) and the key K 204 of the validation checksum are matched for validation, thereby eliminating the need for any extra validation or other validating requirements. Thus, malicious attacks can be identified in less time during the execution of the code. By encrypting the key K 204, the security of the key K 204 may be ensured, and the same key is decrypted at a target processor, which enables the validation of the key K 204 for the block of code (e.g. 200A or 200N). Instead of encrypting the entire block of code, only the key K 204 that is added on top of the block of code (e.g. 200A or 200N) may be encrypted, thereby enabling the booting operation for the block of code (e.g. 200A or 200N) to be light-weight. Similarly, the validation process also becomes light-weight.

For a non-encrypted checksum, the processor handles an n-bit input with a k-bit key using (n-k) shift and XOR operations. Such a method can be performed very quickly. Thereby, in some embodiments the processor validates the block of code using the non-encrypted checksum. Alternatively, as described elsewhere, the checksum is encrypted and it is compared against an encrypted reference checksum.
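By way of a non-limiting illustration, the shift and XOR operations may be understood as polynomial division over GF(2), in which the key is treated as the divisor polynomial. The sketch below is an assumption about how such a computation could look; the function name `gf2_remainder` and the small example polynomial are hypothetical, and the loop runs on the order of (n-k) steps, one per bit position above the key width.

```python
def gf2_remainder(message: int, n_bits: int, key_poly: int) -> int:
    """Remainder of message(x) divided by key_poly(x) over GF(2).

    message  -- the n-bit input, packed into an integer
    n_bits   -- number of bits in the input
    key_poly -- the key, interpreted as a polynomial with its top bit set
    """
    k_bits = key_poly.bit_length()
    remainder = message
    # Walk from the top bit down; each step is one aligned shift and XOR.
    for shift in range(n_bits - k_bits, -1, -1):
        if remainder & (1 << (shift + k_bits - 1)):
            remainder ^= key_poly << shift
    return remainder


# x^3 reduced modulo x^3 + x + 1 leaves x + 1.
small = gf2_remainder(0b1000, 4, 0b1011)
# x^5 + x^4 + x^2 + 1 reduced modulo x^3 + x + 1 leaves x^2.
larger = gf2_remainder(0b110101, 6, 0b1011)
```

Only shifts and XORs are needed, with no multiplications or table lookups, which is consistent with the statement that the non-encrypted checksum can be computed very quickly.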

FIG. 3 is an exemplary diagram that illustrates a process of generating a reference checksum by replacing padding with the reference checksum in accordance with an implementation of the disclosure. Optionally, a Software, SW, creation tool, or an enhanced compiler identifies and patches all blocks from a code. Optionally, the Software, SW, creation tool or the enhanced compiler does not include a key K. Optionally, the Software, SW, creation tool, or the enhanced compiler adds the padding for one or more blocks of code 300A-N. The padding may be zeroes or magic value string. The padding may be replaced with real reference checksum during loading phase to obtain patched blocks of code 302A-N before providing the patched blocks of code 302A-N to a processor for execution.
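By way of a non-limiting illustration, the replacement of padding at load time may be sketched as follows. The sketch assumes a 4-byte padding of zeroes at the start of each block and uses the unkeyed `zlib.crc32` purely as a stand-in for the real reference checksum; `PADDING` and `load_block` are hypothetical names.

```python
import zlib

# Assumed placeholder written by the build tool: four zero bytes.
PADDING = b"\x00" * 4


def load_block(padded_block: bytes) -> bytes:
    """Replace the leading padding with a checksum computed at load time."""
    if padded_block[:len(PADDING)] != PADDING:
        raise ValueError("block does not start with the expected padding")
    code = padded_block[len(PADDING):]
    # Stand-in checksum; a keyed checksum would be used on a real target.
    checksum = zlib.crc32(code)
    return checksum.to_bytes(4, "little") + code


patched = load_block(PADDING + b"123456789")
```

Because the build tool never sees the key in this variant, only the loader on the target needs access to the key material.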

FIG. 4 is an exemplary diagram that illustrates a process of loading a wrapped key K 402 to a crypto module 404 in accordance with an implementation of the disclosure. The exemplary diagram includes a patched block of code 400 with the wrapped key K 402. The wrapped key K 402 is the key that is recorded in a write-only register of a processor during booting/loading a software, SW, of a target system. Optionally, the wrapped key K 402 is provisioned to boot the SW if a feature for the whole software stack of the system is used. Optionally, the wrapped key K 402 is provisioned to load the SW if a feature is used only for a certain program. The crypto module 404 may use an unwrapping key to unwrap. The crypto module 404 may generate a lookup table to calculate the reference checksum during runtime of the block of code. The lookup table may be stored in two configuration registers of the processor.
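By way of a non-limiting illustration, the loading of a wrapped key into a write-only register may be sketched as follows. The sketch uses textbook RSA with very small numbers, which is insecure and serves only to show the wrap and unwrap relationship, and a keyed BLAKE2s digest as a stand-in checksum. The class names, the 2-byte key width, and the numeric parameters are all assumptions made for the example.

```python
import hashlib


class WriteOnlyRegister:
    """Register that software can write but never read back."""

    def __init__(self):
        self._value = None

    def write(self, value: bytes) -> None:
        self._value = value

    def read(self) -> bytes:
        raise PermissionError("register is write-only")


class CryptoModule:
    """Holds the private unwrapping key and the unwrapped key K."""

    def __init__(self, modulus: int, private_exponent: int):
        self._n = modulus
        self._d = private_exponent
        self.key_register = WriteOnlyRegister()

    def load_wrapped_key(self, wrapped: int) -> None:
        # Unwrap inside the module; the plain key never leaves it.
        key_int = pow(wrapped, self._d, self._n)
        self.key_register.write(key_int.to_bytes(2, "big"))

    def checksum(self, code: bytes) -> bytes:
        # Only internal logic may use the stored key.
        key = self.key_register._value
        return hashlib.blake2s(code, key=key, digest_size=4).digest()


# Toy RSA parameters (61 * 53 = 3233); far too small for real use.
N, E, D = 3233, 17, 2753
key_k = 1234                      # illustrative key value, must be below N
wrapped_key = pow(key_k, E, N)    # wrapping done by the software provider

module = CryptoModule(N, D)
module.load_wrapped_key(wrapped_key)
```

Software that only sees `wrapped_key` cannot recover the key without the private exponent, and cannot read it back from the register once it has been loaded.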

Optionally, a cyclic redundancy check 32 (CRC32) algorithm and a slicing-by-4 optimization technique may be used to calculate the reference checksum during the run-time. The CRC32 lookup tables make the checksum (CRC32) computation very fast. In particular, CRC32 can be useful in examples where the cryptographic module cannot support a secret K. The reference checksum during the run-time may be stored in the look-up table. With a larger lookup table, more data can be processed at the same time. For example, when a lookup table of 4 kilobytes is used, 4 bytes can be processed at the same time.
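By way of a non-limiting illustration, a software version of CRC32 with the slicing-by-4 technique may be sketched as follows. The sketch assumes the common reflected CRC32 polynomial 0xEDB88320, which the disclosure does not name, and uses four tables of 256 entries of 32 bits each, i.e. 4 kilobytes in total. The function names are hypothetical, and `zlib.crc32` is imported only as an independent reference for comparison.

```python
import zlib

POLY = 0xEDB88320  # assumed: the widely used reflected CRC32 polynomial


def make_tables():
    """Build four 256-entry tables (4 x 256 x 4 bytes = 4 KB)."""
    tables = [[0] * 256 for _ in range(4)]
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
        tables[0][i] = crc
    for i in range(256):
        for k in range(1, 4):
            prev = tables[k - 1][i]
            tables[k][i] = (prev >> 8) ^ tables[0][prev & 0xFF]
    return tables


TABLES = make_tables()


def crc32_slicing_by_4(data: bytes) -> int:
    """CRC32 that consumes four input bytes per main-loop iteration."""
    crc = 0xFFFFFFFF
    aligned = len(data) - len(data) % 4
    for offset in range(0, aligned, 4):
        crc ^= int.from_bytes(data[offset:offset + 4], "little")
        crc = (TABLES[3][crc & 0xFF]
               ^ TABLES[2][(crc >> 8) & 0xFF]
               ^ TABLES[1][(crc >> 16) & 0xFF]
               ^ TABLES[0][crc >> 24])
    for byte in data[aligned:]:  # leftover bytes, one at a time
        crc = (crc >> 8) ^ TABLES[0][(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


sample = bytes(range(256)) * 3 + b"xy"
matches_reference = crc32_slicing_by_4(sample) == zlib.crc32(sample)
```

The 1024 table entries in this sketch match a 10-bit table index with 32-bit entries.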

The following are the exemplary two configuration registers:

pccslutd = lookup table data, 32-bit part of the lookup table, write only
pccsluti = lookup table index, 10-bit index of the lookup table, read/write

The following exemplary lookup table shows an index register that is written to indicate which item of the lookup table is written, followed by a write to a data register.
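By way of a non-limiting illustration, the write sequence (index register first, then data register) may be modelled in software as follows. This is a simulation only: the class `LookupTableRegisters`, its methods, and the choice to rewrite the index before every data write are assumptions, and the example table contents are arbitrary values rather than a real CRC table.

```python
class LookupTableRegisters:
    """Software model of the pccsluti / pccslutd register pair."""

    def __init__(self):
        self.pccsluti = 0            # 10-bit index, readable and writable
        self._table = [0] * 1024     # storage behind the write-only data register

    def write_index(self, index: int) -> None:
        self.pccsluti = index & 0x3FF  # keep only 10 bits

    def write_data(self, value: int) -> None:
        # A data write lands in the entry selected by the index register.
        self._table[self.pccsluti] = value & 0xFFFFFFFF


def program_table(registers: LookupTableRegisters, entries) -> None:
    """Write every entry as an index write followed by a data write."""
    for index, value in enumerate(entries):
        registers.write_index(index)
        registers.write_data(value)


registers = LookupTableRegisters()
entries = [(i * 2654435761) & 0xFFFFFFFF for i in range(1024)]  # arbitrary data
program_table(registers, entries)
```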

FIG. 5 is an exemplary diagram that illustrates a process of replacing padding with a reference checksum in accordance with an implementation of the disclosure. The exemplary diagram includes a patched block of code 500 and a crypto module 504 that includes an unwrapped key K 502. The patched block of code 500 includes a reference checksum, a wrapped key K, and a length of the block of code. The crypto module 504 includes the unwrapped key K 502 to unwrap an associated decryption private key for the unwrapped key K 502. The crypto module 504 may generate a lookup table. The crypto module 504 may generate the reference checksum during runtime of the block of code. For each block of code, the crypto module 504 may replace padding with a calculated reference checksum.

FIG. 6 is an exemplary diagram that illustrates a process of executing a block of code using a processor 600 in accordance with an implementation of the disclosure. The exemplary diagram includes a patched block of code 602. The patched block of code 602 includes a reference checksum, a key, and a length of the block of code added at the beginning of the patched block of code 602. The length of the code may be stored in a configuration register (e.g. PCCREF) 606, or may be in a counter. During the execution of each instruction in each block of the code, a run-time reference checksum is updated and a value of the counter is decreased by one. When the value of the counter becomes zero, a crypto module 604 may compare the validation checksum with an expected reference checksum. If the validation checksum does not match the expected reference checksum, the crypto module 604 may provide an appropriate interrupt to the processor 600 and end execution of the current program. If the validation checksum matches the expected reference checksum, then the next three instructions are control free, and during that time a next recorded reference checksum can be programmed.
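By way of a non-limiting illustration, the per-instruction update and the comparison when the counter reaches zero may be sketched as follows. The sketch uses the unkeyed running `zlib.crc32` as a stand-in for the keyed checksum and a Python exception as a stand-in for the interrupt; `ChecksumViolation` and `run_block` are hypothetical names, and actual instruction execution is represented only by a comment.

```python
import zlib


class ChecksumViolation(Exception):
    """Stand-in for the interrupt raised on a checksum mismatch."""


def run_block(reference: int, instructions: list) -> int:
    """Execute a block while accumulating the validation checksum."""
    counter = len(instructions)   # block length taken from the header
    validation = 0                # run-time checksum, updated per instruction
    for instruction in instructions:
        # The instruction itself would execute here.
        validation = zlib.crc32(instruction, validation)
        counter -= 1
    # The counter has reached zero: compare against the recorded reference.
    assert counter == 0
    if validation != reference:
        raise ChecksumViolation("validation checksum does not match reference")
    return validation


block = [b"\x13\x05\x00\x00", b"\x93\x85\x00\x00", b"\x67\x80\x00\x00"]
reference = zlib.crc32(b"".join(block))   # computed ahead of execution
run_block(reference, block)               # passes silently

tampered = [block[0], b"\xff\xff\xff\xff", block[2]]
```

Running `run_block(reference, tampered)` raises `ChecksumViolation` in this sketch, which corresponds to the interrupt path described above.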

The run-time reference checksum may be the validation checksum. The validation checksum is also encrypted. The generation of validation checksum is implemented in the processor logic. The processor logic includes instructions to create the contents of configuration register 606 and to recognize and store the reference checksum.

Calculating the reference checksum and injecting it at the beginning of the blocks of code enables the hardware to identify the length of the basic block of code easily. Thereby, the values of the initialized registers are used during validation of the checksum values.

The processor logic provides an extended hardware that belongs to the same system to enforce validation of the validation checksum generated over the basic blocks of code and this provides an internal validation. In IoT applications, there is no necessity to validate the hardware involved in such applications separately before executing software code. The internal validation provides the required security for an intrusion attempt.

Optionally, generating the validation checksum includes generating the validation checksum using a seed value, where the block of code is configured to provide a seed value to each other block of code that can validly follow the present block of code. An additional configuration register (e.g. pccsseed) may be written before starting the execution of the block of code. Each block of code includes the seed value setup before its label. Optionally, if the block of code starts by falling through from a previous block of code, a seed value gets written by that instruction. Optionally, if the block of code starts by jumping elsewhere, a calling block of code may write the seed value. Optionally, if the block of code returns to a caller, the block of code may land after the jump instruction, where a next basic block starts.
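By way of a non-limiting illustration, the effect of the seed value may be sketched as follows. The sketch assumes that the seed is used as the starting value of the checksum and uses `zlib.crc32` with an initial value as a stand-in; the seed constants and the function names are hypothetical. A block entered from a valid predecessor sees the seed its reference checksum was built with, while a block entered from elsewhere sees a different seed and fails validation.

```python
import zlib


def block_checksum(seed: int, code: bytes) -> int:
    """Checksum of a block, started from the seed left by the predecessor."""
    return zlib.crc32(code, seed)


def enter_block(seed_register: int, code: bytes, reference: int) -> bool:
    """Return True when the block validates under the current seed."""
    return block_checksum(seed_register, code) == reference


# Seeds chosen at build time (illustrative constants).
SEED_FOR_B = 0x1A2B3C4D   # written by block A before it jumps to B
SEED_FOR_D = 0x0BADF00D   # written by block C before it jumps to D

code_b = b"\x13\x05\x00\x00\x67\x80\x00\x00"
reference_b = block_checksum(SEED_FOR_B, code_b)

valid_entry = enter_block(SEED_FOR_B, code_b, reference_b)    # A -> B
hijacked_entry = enter_block(SEED_FOR_D, code_b, reference_b)  # C -> B
```

Here `valid_entry` is true and `hijacked_entry` is false, so a jump into block B from a block that is not a valid predecessor is detected even though the code of B is unmodified.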

FIG. 7 is an exemplary process of creating a validation checksum in accordance with an implementation of the disclosure. At a step 702, an address of a message, M, is provided to generate a stream. At a step 704, a key K is converted to a polynomial k(x), such that the degree of k(x) is equal to n and k(x) is irreducible over GF(2), to generate the stream. At a step 706, the stream is generated based on the address of the message, M, and the key K that is converted to the polynomial k(x). At a step 708, the stream is obtained. At a step 710, the message, M, is converted to a polynomial m(x), such that mac(x) = m(x)*x^n (mod k(x)), where mac(x) may be a polynomial of the reference checksum. At a step 712, a reference checksum is provided along with the key K. At a step 714, the reference checksum along with the key K is received. At a step 716, the polynomial of the reference checksum is encrypted by providing the reference checksum and the generated stream. At a step 718, a validation checksum is generated.
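By way of a non-limiting illustration, the computation of mac(x) = m(x)*x^n (mod k(x)) over GF(2) may be sketched as follows. Polynomials are packed into Python integers with bit i holding the coefficient of x^i. The example key polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is a well-known irreducible polynomial of degree 8 chosen only for the demonstration; the function names are hypothetical.

```python
def poly_mod(a: int, k_poly: int) -> int:
    """Remainder of a(x) divided by k_poly(x) over GF(2)."""
    degree_k = k_poly.bit_length() - 1
    while a.bit_length() - 1 >= degree_k:
        a ^= k_poly << (a.bit_length() - 1 - degree_k)
    return a


def mac_polynomial(message: bytes, k_poly: int) -> int:
    """Compute mac(x) = m(x) * x^n mod k(x), with n the degree of k(x)."""
    n = k_poly.bit_length() - 1
    m = int.from_bytes(message, "big")
    return poly_mod(m << n, k_poly)


K_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

mac_one = mac_polynomial(b"\x01", K_POLY)   # x^8 mod k(x) = x^4 + x^3 + x + 1
mac_two = mac_polynomial(b"\x02", K_POLY)
mac_three = mac_polynomial(b"\x03", K_POLY)
```

The result always has fewer than n bits, so for a 32-bit key polynomial the checksum would fit in 4 bytes. The function is linear over GF(2), for example `mac_three == mac_one ^ mac_two`, which is one reason an unencrypted value of this form should not be treated as secret-dependent protection on its own.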

To prevent an attack by an attacker, the reference checksum mac(x) may be encrypted using the polynomial of the reference checksum and a fast stream cipher such as RC4, Grain, Salsa20 or similar. The encryption of the polynomial of the reference checksum mac(x) may require only 4 bytes which does not cause any performance penalty. While programming the next reference checksum, the encryption may be done simultaneously. Thereby, the time taken for encryption is less.
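By way of a non-limiting illustration, encrypting a 4-byte checksum with a stream cipher may be sketched as follows, using a plain implementation of RC4, one of the ciphers named above. Deriving the keystream from the key concatenated with the block address is an assumption made for this example, as are the function names; RC4 appears here only because it is short to write out, not as a recommendation.

```python
def rc4_keystream(key: bytes, length: int) -> bytes:
    """Generate `length` bytes of RC4 keystream for `key`."""
    state = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    for _ in range(length):  # output generation
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(state[(state[i] + state[j]) % 256])
    return bytes(out)


def encrypt_checksum(key: bytes, address: int, checksum: int) -> int:
    """XOR a 32-bit checksum with 4 keystream bytes tied to the block address."""
    # Assumption: the stream depends on both the key and the block address.
    stream = rc4_keystream(key + address.to_bytes(4, "big"), 4)
    return checksum ^ int.from_bytes(stream, "big")


key = b"demo-key-K"
plain = 0xCBF43926
cipher = encrypt_checksum(key, 0x8000_0000, plain)
restored = encrypt_checksum(key, 0x8000_0000, cipher)  # XOR is its own inverse
```

Only four keystream bytes are consumed per block, so the encryption step adds very little work compared with the checksum computation itself.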

FIG. 8 is a flow diagram that illustrates a computer-implemented method for executing code in a processor in accordance with an implementation of the disclosure. At a step 802, a block of code to be executed is fetched by the processor and a reference checksum is read and the reference checksum is recorded in a configuration register in the processor. At a step 804, the block of code is executed using the processor. At a step 806, a validation checksum is generated based on the executed code, by the processor. At a step 808, the validation checksum is compared against the recorded reference checksum by the processor.

The method uses a checksum mechanism on a basic block of code to provide a secure way of execution. This mechanism protects the running code against resourceful and determined attackers, as the method uses a hardware-sealed key and allows execution only of instructions that are defined by a pre-programmed keyed checksum. The method prevents an attacker from getting hold of the pre-programmed key in the CPU by inserting their own code. The method does not require any remote servers to validate after the run-time of each basic block of the code. If any malicious software runs, the method detects the malicious software during the validation at the end of the execution of the current basic block of code.

Optionally, generating the validation checksum includes updating the validation checksum for each instruction in the block of code as the instruction is executed. A new reference checksum cannot be configured if the calculation of a previous reference checksum is ongoing. The processor has an internal counter such that the new reference checksum is not programmed without verifying the previous reference checksum. Optionally, generating the validation checksum includes generating the validation checksum using a key, where the reference checksum is generated using the same key. As the validation checksum is computed inside the processor using the same key as for the reference checksum, attestation of the checksum by external sources is not required. Thereby, the validation checksum is small and light-weight, as it uses the same key as the reference checksum.
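By way of a non-limiting illustration, the rule that a new reference checksum cannot be programmed while a previous one is still being verified may be modelled as follows. The class `ReferenceRegister` and its methods are hypothetical, and the model tracks only the internal counter, not the checksum computation itself.

```python
class ReferenceRegister:
    """Model of a reference register guarded by an internal counter."""

    def __init__(self):
        self.reference = None
        self.remaining = 0   # instructions left in the block being verified

    def program(self, reference: int, length: int) -> None:
        """Record a new reference checksum and block length."""
        if self.remaining > 0:
            # The previous block has not been fully verified yet.
            raise RuntimeError("previous reference checksum still in use")
        self.reference = reference
        self.remaining = length

    def retire_instruction(self) -> None:
        """Called once per executed instruction of the current block."""
        if self.remaining > 0:
            self.remaining -= 1


register = ReferenceRegister()
register.program(0x11111111, 2)

try:
    register.program(0x22222222, 5)   # too early: block not finished
    early_write_blocked = False
except RuntimeError:
    early_write_blocked = True

register.retire_instruction()
register.retire_instruction()
register.program(0x22222222, 5)       # allowed once the counter is zero
```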

Optionally, the key is stored in the processor. Optionally, the key is recorded in a write-only register of the processor during a booting operation of the processor. The key for the block of code and the key of the validation checksum are matched for validation, thereby eliminating the need for extra validation or other validating requirements. Thus, malicious attacks can be identified in less time during the execution of the code.

Optionally, the key is received via software and provided to the processor using the software. Optionally, the provided key is encrypted with an asymmetric public key, and an associated decryption private key is configured into the processor. By encrypting the key, the security of the key is ensured, and the same key is decrypted at the target processor, which enables validation of the key for the block of code. Instead of encrypting the entire block of code, only the key added on top of the block of code is encrypted, thereby enabling the booting operation for the block of code to be light-weight. Similarly, the validation process also becomes light-weight.

Optionally, fetching the block of code includes reading a block size for the block of code, and generating the validation checksum includes determining that the execution is complete based on the block size. The addition of the length of the code and the key at the beginning of the blocks of code ensures that the length of the basic block of code and expected checksum value are configured by the processor. Thereby, the blocks of code may be validated easily using the additions.

Optionally, the reference checksum is encrypted, and generating the validation checksum includes encrypting the validation checksum. The encryption of the validation checksum wraps the key, and only the authorized processor can unwrap the key for further use, thereby providing secured validation using a light-weight key.

Optionally, the reference checksum is calculated for the block of code after compiling and is added to the beginning of the block of code before providing the block of code to the processor for execution. Calculating the reference checksum and injecting it at the beginning of the basic blocks allows the hardware to identify the length of the basic block of code easily. Thereby, the values of the initialized registers are used during validation of the checksum values.

Optionally, generating the validation checksum is implemented in the processor logic. Optionally, the processor logic includes instructions to create the configuration register and to recognize and store the reference checksum. The processor logic provides extended hardware that belongs to the same system to enforce validation of the validation checksum generated over basic blocks of code, and this provides an internal validation.

Optionally, the reference checksum is embedded with instructions to record the configuration register value. The processor also provides n-bit input with k-bit key that requires (n-k) shift and XOR operations for non-encrypted checksum. Thereby, in some embodiments the processor validates the block of code using the non-encrypted checksum. Alternatively, as described elsewhere, the checksum is encrypted and it is compared against an encrypted reference checksum.
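
By way of a non-limiting illustration, one possible reading of the statement about an n-bit input and a k-bit key is a bit-serial polynomial division, in which the key is aligned with the top bit of the input and then shifted right (n-k) times, with an XOR applied whenever the current top bit is set. The following sketch is an assumption made for illustration only; the function name gf2_remainder and its parameters are hypothetical and do not appear in the disclosure.

```c
#include <stdint.h>

/* Remainder of an n-bit input divided, over GF(2), by a k-bit key.
 * Requires 1 <= k <= n <= 32 and bit (k - 1) of the key to be set.
 * The number of shifts performed is reported through *shifts. */
static uint32_t gf2_remainder(uint32_t input, unsigned n,
                              uint32_t key, unsigned k, unsigned *shifts)
{
    uint32_t divisor = key << (n - k);       /* align key with top input bit */
    uint32_t top = (uint32_t)1 << (n - 1);   /* bit currently being cleared */
    unsigned count = 0;

    for (;;) {
        if (input & top)
            input ^= divisor;                /* XOR step */
        if (count == n - k)
            break;                           /* key has reached the low bits */
        divisor >>= 1;                       /* shift step */
        top >>= 1;
        count++;
    }
    *shifts = count;
    return input;
}
```

For example, with the 17-bit input 0x1A760 and the 4-bit key 0xB, the function performs 13 shifts, which equals (n-k), and returns the remainder 0x4.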

Optionally, the method includes generating an interrupt by the processor if the validation checksum does not match the reference checksum. The interrupt signal indicates a malicious attack and thereby safeguards the other blocks of code before they are executed.

Optionally, generating the validation checksum includes generating the validation checksum using a seed value, where the block of code is configured to provide a seed value to each other block of code that can validly follow the present block of code. The execution order of consecutive blocks of code is controlled using the seed value, thereby improving the management of the execution of the blocks of code.

In an example implementation, a small Internet of Things (IoT) device includes a real-time operating system (RTOS) and applications. The IoT device may be security critical, as it involves transactions, monitoring of critical sensors, locking, etc., and the integrity of the system is much more important than in normal computers. In such IoT devices, when the software is programmed into the device, a key is provisioned. During a booting operation of the device, the key is programmed into internal registers of the central processing unit (CPU) through system registers. The key may not be replaced without rebooting the device. As soon as the key is programmed, a basic block protection feature is enabled to protect the whole software stack. Optionally, a third-party application is purchased and delivered to a client’s system, and the third-party application is restricted to use only in the client’s system. The client obtains a licensing key to program the client’s system when the third-party application is installed. The third-party application is loaded with a basic block protection mechanism enabled by the operating system kernel. The operating system kernel configures the client’s system so that the basic block protection mechanism is enabled while the licensed application is running.
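
By way of a non-limiting illustration, the boot-time behaviour of the key may be modelled as a register that accepts a single write per boot and offers no read path to software. The following is a minimal sketch under stated assumptions; the names key_register, key_register_reset, key_register_write and protection_enabled are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a key register that is written once per boot.
 * Software gets no accessor for the key; only the checksum logic uses it. */
typedef struct {
    uint32_t key;     /* consumed by the checksum logic only */
    bool     locked;  /* set by the first write, cleared only by a reset */
} key_register;

/* Models a reboot of the device. */
static void key_register_reset(key_register *r)
{
    r->key = 0;
    r->locked = false;
}

/* Programs the key during booting. A second write is refused, which
 * models that the key may not be replaced without rebooting. */
static bool key_register_write(key_register *r, uint32_t key)
{
    if (r->locked)
        return false;
    r->key = key;
    r->locked = true;
    return true;
}

/* Basic block protection is assumed to be active once a key is programmed. */
static bool protection_enabled(const key_register *r)
{
    return r->locked;
}
```

In this model a boot loader calls key_register_write exactly once, and any later attempt to replace the key fails until key_register_reset is called.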

The following exemplary instructions, when executed, record the reference checksum in a configuration register in the processor:

    my_function:              my_function:
        add t0, t1, t2            li a5, 0x04abcdef
        add t1, t2, t0            csrw pccsref, a5
        add t2, t0, t1            add t0, t1, t2
        ret                       add t1, t2, t0
                                  add t2, t0, t1
                                  ret

In the above exemplary instructions, “04” refers to the number of instructions, “abcdef” refers to the reference checksum, “csrw” is an instruction which writes a configuration register, and “pccsref” refers to a register for the reference checksum.

The following exemplary instructions, when executed, record the reference checksum in a configuration register in the processor:

    my_function:              my_function:
        add t0, t1, t2            li a5, 0x04000000
        add t1, t2, t0            csrw pccsref, a5
        add t2, t0, t1            add t0, t1, t2
        ret                       add t1, t2, t0
                                  add t2, t0, t1
                                  ret

In the above exemplary instructions, “04” refers to the number of instructions, “000000” refers to padding, “csrw” is an instruction which writes a configuration register, and “pccsref” refers to a register for the reference checksum.
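
By way of a non-limiting illustration, the two listings above suggest that the 32-bit value written to “pccsref” carries the number of instructions in its upper 8 bits and the reference checksum, or padding, in its lower 24 bits. The following sketch packs and unpacks such a value. The field widths are inferred from the examples 0x04abcdef and 0x04000000 and are an assumption, and the names block_header, encode_pccsref and decode_pccsref are hypothetical.

```c
#include <stdint.h>

/* Assumed layout of the value written to pccsref:
 * bits 31..24 = number of instructions, bits 23..0 = reference checksum. */
typedef struct {
    uint8_t  instruction_count;   /* e.g. 0x04 */
    uint32_t reference_checksum;  /* e.g. 0xabcdef, 24 bits used */
} block_header;

/* Split a pccsref value into its two assumed fields. */
static block_header decode_pccsref(uint32_t word)
{
    block_header h;
    h.instruction_count  = (uint8_t)(word >> 24);
    h.reference_checksum = word & 0x00FFFFFFu;
    return h;
}

/* Build a pccsref value, e.g. to replace the padding after compiling. */
static uint32_t encode_pccsref(uint8_t count, uint32_t checksum)
{
    return ((uint32_t)count << 24) | (checksum & 0x00FFFFFFu);
}
```

Under this assumed layout, a post-compilation tool could emit encode_pccsref(4, 0) as a placeholder and later replace it with encode_pccsref(4, checksum) once the reference checksum is known.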

The following exemplary sequence of instructions in C language, when executed, calculates the lookup table from a secret polynomial. When the secret polynomial is not known, then a pre-computed lookup table is used to increase performance of the CRC32 technique.

    /* Polynomial holds the secret polynomial; Crc32Lookup is assumed to be
       declared as uint32_t Crc32Lookup[4][256]. */
    for (unsigned int i = 0; i <= 0xFF; i++) {
        uint32_t crc = i;
        /* advance the CRC by eight bit steps for byte value i */
        for (unsigned int j = 0; j < 8; j++)
            crc = (crc >> 1) ^ ((crc & 1) * Polynomial);
        Crc32Lookup[0][i] = crc;
    }
    /* each further table advances the previous one by one more byte */
    for (unsigned int i = 0; i <= 0xFF; i++) {
        Crc32Lookup[1][i] = (Crc32Lookup[0][i] >> 8) ^ Crc32Lookup[0][Crc32Lookup[0][i] & 0xFF];
        Crc32Lookup[2][i] = (Crc32Lookup[1][i] >> 8) ^ Crc32Lookup[0][Crc32Lookup[1][i] & 0xFF];
        Crc32Lookup[3][i] = (Crc32Lookup[2][i] >> 8) ^ Crc32Lookup[0][Crc32Lookup[2][i] & 0xFF];
    }

The following exemplary sequences of assembly instructions illustrate the three scenarios mentioned above:

Scenario 1 -

        csrw pccsseed, 0x0123
    my_previous_block:
        li a5, 0x04123456
        csrw pccsref, a5
        add t0, t1, t2
        add t1, t2, t0
        add t2, t0, t1
        csrw pccsseed, 0x1234
    my_next_block:
        li a5, 0x03234567
        csrw pccsref, a5
        add t1, t2, t3
        add t2, t3, t1

Scenario 2 -

        csrw pccsseed, 0x2345
    my_loop:
        li a5, 0x05123456
        csrw pccsref, a5
        add t0, t1, t2
        add t1, t2, t0
        add t2, t0, t1
        /* Seed of my_loop */
        csrw pccsseed, 0x2345
        bne t0, t1, my_loop
        /* Seed of my_next_block */
        csrw pccsseed, 0x3456
    my_next_block:
        li a5, 0x03234567
        csrw pccsref, a5
        add t1, t2, t3
        add t2, t3, t1

Scenario 3 -

    my_1st_function:                    my_dst_function:
        li a5, 0x08123456                   li a5, 0x07234567
        csrw pccsref, a5                    csrw pccsref, a5
        add t0, t1, t2                      add t0, t1, t2
        add t1, t2, t0                      add t1, t2, t0
        add t2, t0, t1                      add t2, t0, t1
        csrw pccsseed, 0x5678               ret
        jal my_dst_function
        /* It will return here */
        /* Seed of my_2nd_function */
        csrw pccsseed, 0x4567
    my_2nd_function:

The execution order of consecutive blocks of code is controlled using the seed value, thereby improving the management of the execution of the blocks of code.
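
By way of a non-limiting illustration, the effect of the seed value on the execution order may be modelled as follows. The sketch assumes that the seed is used as the starting value of the validation checksum of the following block, which is only one possible way of using a seed. The names DEMO_KEY, fold_word, seeded_checksum and block_is_valid are hypothetical, and the common CRC-32 polynomial stands in for the secret key.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed key: the common CRC-32 polynomial stands in for the secret one. */
#define DEMO_KEY 0xEDB88320u

/* Fold one 32-bit instruction word into the checksum, bit by bit. */
static uint32_t fold_word(uint32_t crc, uint32_t word)
{
    crc ^= word;
    for (int bit = 0; bit < 32; bit++)
        crc = (crc >> 1) ^ ((crc & 1u) * DEMO_KEY);
    return crc;
}

/* Checksum of a block whose calculation starts from the seed that the
 * preceding block wrote to the seed register. */
static uint32_t seeded_checksum(uint32_t seed, const uint32_t *insns,
                                size_t count)
{
    uint32_t crc = seed;
    for (size_t i = 0; i < count; i++)
        crc = fold_word(crc, insns[i]);
    return crc;
}

/* True when the block was entered with the seed its reference was built
 * for, i.e. when it was reached from a block that may validly precede it. */
static bool block_is_valid(uint32_t seed_register, const uint32_t *insns,
                           size_t count, uint32_t reference)
{
    return seeded_checksum(seed_register, insns, count) == reference;
}
```

In terms of Scenario 1, a reference built with the seed 0x1234 validates only when the preceding block has written 0x1234 to the seed register; entering the same block with the seed 0x0123 still in place yields a different checksum.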

The following exemplary sequence of instructions in C language calculates the validation checksum using a lookup table.

    /* Pass 0 as previousCrc32 to start a new checksum. */
    uint32_t crc32_4bytes(const void* data, size_t length, uint32_t previousCrc32) {
        const uint32_t* current = (const uint32_t*) data;
        /* start from the inverted previous value (assumed initialization) */
        uint32_t crc = ~previousCrc32;

        // process four bytes at once
        while (length >= 4) {
            crc ^= *current++;
            crc = Crc32Lookup[3][crc & 0xFF] ^
                  Crc32Lookup[2][(crc >> 8) & 0xFF] ^
                  Crc32Lookup[1][(crc >> 16) & 0xFF] ^
                  Crc32Lookup[0][crc >> 24];
            length -= 4;
        }
        return ~crc;
    }

FIG. 9 is an illustration of an exemplary computing arrangement (e.g. a processing device) 900 in which the various architectures and functionalities of the various previous implementations may be implemented. As shown, the computing arrangement 900 includes at least one processor 904 that is connected to a bus 902, wherein the computing arrangement 900 may be implemented using any suitable protocol, such as PCI (Peripheral Component Interconnect), PCI-Express, AGP (Accelerated Graphics Port), HyperTransport, or any other bus or point-to-point communication protocol(s). The computing arrangement 900 also includes a memory 906.

Control logic (software) and data are stored in the memory 906, which may take the form of random-access memory (RAM). In the present description, a single semiconductor platform may refer to a sole unitary semiconductor-based integrated circuit or chip. It should be noted that the term single semiconductor platform may also refer to multi-chip modules with increased connectivity which simulate on-chip operation, and make substantial improvements over utilizing a conventional central processing unit (CPU) and bus implementation. Of course, the various modules may also be situated separately or in various combinations of semiconductor platforms per the desires of the user.

The computing arrangement 900 may also include a secondary storage 910. The secondary storage 910 includes, for example, a hard disk drive and a removable storage drive, representing a floppy disk drive, a magnetic tape drive, a compact disk drive, a digital versatile disk (DVD) drive, a recording device, or a universal serial bus (USB) flash memory. The removable storage drive reads from and/or writes to a removable storage unit in a well-known manner.

Computer programs, or computer control logic algorithms, may be stored in at least one of the memory 906 and the secondary storage 910. Such computer programs, when executed, enable the computing arrangement 900 to perform various functions as described in the foregoing. The memory 906, the secondary storage 910, and any other storage are possible examples of computer-readable media.

In an implementation, the architectures and functionalities depicted in the various previous figures may be implemented in the context of the processor 904, a graphics processor coupled to a communication interface 912, an integrated circuit (not shown) that is capable of at least a portion of the capabilities of both the processor 904 and a graphics processor, or a chipset (i.e., a group of integrated circuits designed to work and sold as a unit for performing related functions, etc.).

Furthermore, the architectures and functionalities depicted in the various previous figures may be implemented in the context of a general computer system, a circuit board system, a game console system dedicated for entertainment purposes, or an application-specific system. For example, the computing arrangement 900 may take the form of a desktop computer, a laptop computer, a server, a workstation, a game console, or an embedded system.

Furthermore, the computing arrangement 900 may take the form of various other devices including, but not limited to, a personal digital assistant (PDA) device, a mobile phone device, a smart phone, a television, etc. Additionally, although not shown, the computing arrangement 900 may be coupled to a network (e.g., a telecommunications network, a local area network (LAN), a wireless network, a wide area network (WAN) such as the Internet, a peer-to-peer network, a cable network, or the like) for communication purposes through an I/O interface 908.

It should be understood that the arrangement of components illustrated in the figures described is exemplary and that other arrangements may be possible. It should also be understood that the various system components (and means) defined by the claims, described below, and illustrated in the various block diagrams represent components in some systems configured according to the subject matter disclosed herein. For example, one or more of these system components (and means) may be realized, in whole or in part, by at least some of the components illustrated in the arrangements illustrated in the described figures.

In addition, while at least one of these components is implemented at least partially as an electronic hardware component, and therefore constitutes a machine, the other components may be implemented in software that, when included in an execution environment, constitutes a machine, hardware, or a combination of software and hardware.

Although the disclosure and its advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims.

Claims

1. A computer-implemented method for executing code in a processor (102, 600), the method comprising: fetching, by the processor (102, 600), a block of code (200A-N, 300A-N) to be executed, reading a reference checksum and recording the reference checksum in a configuration register (606) in the processor (102, 600); executing the block of code (200A-N, 300A-N) using the processor (102, 600); generating, by the processor (102, 600), a validation checksum based on the executed code; and comparing, by the processor (102, 600), the validation checksum against the recorded reference checksum.

2. The computer-implemented method of claim 1, wherein generating the validation checksum comprises updating the validation checksum for each instruction in the block of code (200A-N, 300A-N) as the instruction is executed.

3. The computer-implemented method of claim 1 or claim 2, wherein generating the validation checksum comprises generating the validation checksum using a key (204), wherein the reference checksum is generated using the same key (204).

4. The computer-implemented method of claim 3, wherein the key (204) is stored in the processor (102, 600).

5. The computer-implemented method of claim 4, wherein the key (204) is recorded in a write-only register of the processor (102, 600) during a booting operation of the processor (102, 600).

6. The computer-implemented method of claim 3, wherein the key (204) is received via software and provided to the processor (102, 600) using the software.

7. The computer-implemented method of claim 6, wherein the provided key is encrypted with an asymmetric public key and an associated decryption private key is configured into the processor (102, 600).

8. The computer-implemented method of any preceding claim, wherein fetching the block of code includes reading a block size for the block of code (200A-N, 300A-N), and generating the validation checksum comprises determining that the execution is complete based on the block size.

9. The computer-implemented method of any preceding claim, wherein the reference checksum is encrypted and generating the validation checksum comprises encrypting the validation checksum.

10. The computer-implemented method of any preceding claim, wherein the reference checksum is calculated for the block of code (200A-N, 300A-N) after compiling and is added to the beginning of the block of code before providing the block of code (200A-N, 300A-N) to the processor (102, 600) for execution.

11. The computer-implemented method of any preceding claim, wherein generating the validation checksum is implemented in the processor logic.

12. The computer-implemented method of claim 11, wherein the processor logic includes instructions to create the configuration register (606) and to recognize and store the reference checksum.

13. The computer-implemented method of any one of claims 1 to 11, wherein the reference checksum is embedded with instructions to record the configuration register value.

14. The computer-implemented method of any preceding claim, further comprising generating an interrupt by the processor (102, 600) if the validation checksum does not match the reference checksum.

15. The computer-implemented method of any preceding claim, wherein generating the validation checksum comprises generating the validation checksum using a seed value, where the block of code (200A-N, 300A-N) is configured to provide a seed value to each other block of code that can validly follow the present block of code.

16. A processing device (100) for executing code, the processing device (100) configured to perform the method of any preceding claim.