EP4205010A1

EP4205010A1 - Fault resistant memory access

Info

Publication number: EP4205010A1
Application number: EP20774955.7A
Authority: EP
Inventors: Andrew Dellow; Mark Bowen Hill; Tariq Kurd
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2020-09-17
Filing date: 2020-09-17
Publication date: 2023-07-05
Also published as: WO2022058010A1

Abstract

Described herein is a data processing device (100) comprising an execution core (101) and a memory system (102-109) communicatively coupled to the execution core via a data path, the memory system being configured to receive from the execution core a request to fetch data from a location in the memory system, the request including data specifying the location, and in response to that request: retrieve a data unit from the location; form an error check value computed over both of (i) at least part of the data specifying the location and (ii) at least part of the retrieved data unit; and transmit the retrieved data unit and the error check value to the execution core. This can usefully reveal that a fault has occurred in accessing the memory system.

Description

FAULT RESISTANT MEMORY ACCESS

FIELD OF THE INVENTION

This invention relates to the detection of faults in data processing devices, for example when accessing the memory of a security central processing unit.

BACKGROUND

Faults can occur in data processing devices such as security central processing units (CPU). These faults can lead to undesirable behaviour of the device. For example, a fault in such a device can lead to an incorrect instruction being run, which can give an unexpected result.

Faults can be introduced accidentally or deliberately. An example of an accidental fault is one caused by a high energy particle or beam, such as a cosmic ray, that strikes a part of the processing device, causing it to malfunction. Variations in characteristics or performance of a component device of a processing device, for example due to small feature size, can introduce faults into the processing device.

Examples of deliberate fault injection are those caused by laser probing (such as heating up at least a portion of the processing device using a laser) and introducing glitches in the power supply to the processing device. Deliberate fault injection can be used to try to get unexpected behaviour in the processing device. If a processing device runs incorrect instructions, an attacker can use this to try and compromise the processing device. This can help an attacker gain control of the processing device and/or reveal details of the operation of the processing device.

A security CPU is implemented within a secure subsystem, where it has a local memory. The local memory is accessible from outside the subsystem, but the access from other components is very restricted. The security CPU authenticates code and copies it to the local memory before it executes it. If it is possible to tamper with the local memory, then the integrity of the device is broken. One method of breaking the integrity is to cause the CPU to fetch from the wrong memory location. This could, for example, cause the CPU to fail to execute a branch which checks whether security criteria have been met. The CPU then fails to detect a security problem and continues executing.

Injected faults can also corrupt the memory contents, corrupt memory write data between the core and memory, corrupt read data between the memory and the core and corrupt the memory address used to access the memory.

It is desirable to provide secure devices which can prevent or reduce the number of instances of an attacker gaining access to secret information.

One method of fault protection is to use parity protection on the data busses. However, this may only offer weak protection and does not check whether the memory address has been corrupted.

In US2017/0185535 A1, the contents of the memory are protected, but the method does not provide for detection of corruption on the bus network.

US 7603609 B2 describes a method and system for optimized instruction fetch to protect against soft and hard errors. This method replaces faulty read data with a known value.

In the presence of faults, it is desirable to be able to check that the CPU accessed the correct location in the memory when making memory requests and to check that write data reached the memory without corruption and read data was returned from the memory without corruption. It is further desirable to protect the long wires between the memory system and the core which may have injected faults.

SUMMARY

According to one aspect there is provided a data processing device comprising an execution core and a memory system communicatively coupled to the execution core via a data path, the memory system being configured to receive from the execution core a request to fetch data from a location in the memory system, the request including data specifying the location, and in response to that request: retrieve a data unit from the location; form an error check value computed over both of (i) at least part of the data specifying the location and (ii) at least part of the retrieved data unit; and transmit the retrieved data unit and the error check value to the execution core. This can usefully reveal that a fault has occurred in accessing the memory system and may provide a low cost, simple method for detecting address faults in the memory access, write data faults between the core and the memory, read data faults between the memory and core and fetch buffer faults.

Only every n-th location in the memory may be addressable by a request from the execution core, and the memory system may be configured to, in response to the request: retrieve n data units from each of n contiguous locations including the location specified in the request; form n error check values each computed over both of (i) at least part of the data specifying the location and (ii) at least part of a respective one of the retrieved data units; and transmit the n retrieved data units and the n error check values to the execution core. This may allow the integrity of the n values to be verified in response to such a request. n may be 2. This may provide an efficient balance between transferred data content and error check data.

The length of each data unit may be two bytes. This may provide an efficient balance between transferred data content and error check data.

The data processing device may comprise a bus for conveying the or each retrieved data unit and the or each error check value from the memory system to the execution core. The width of the bus may be at least n times the total number of bits in a data unit and an error check value. This may allow the data units and the error check values responsive to a single request to be passed simultaneously.

The memory system may be configured to generate the or each error check value using a cyclic redundancy check computed over the inputs thereto. This may be an efficient manner of computing the error check value.

The error check value may be a remainder of the cyclic redundancy check. This may efficiently compress the error check information.

The memory system may be configured to transmit the or each data unit and the or each error check value responsive to a single request in parallel to the execution core. This may be an efficient way of transporting data between the components of the data processing device.

The memory system may comprise a storage block and an output interface, wherein the storage block stores data at the location and the output interface is configured to form the error check value. This may allow the memory system to store the data internally.

The memory system may comprise an input interface configured to receive the request and scramble the data specifying the location to determine the location specified by the data. This may conveniently allow the memory address and data to be scrambled before accessing the memory.

The execution core may be configured to: form the request including data specifying the location; store, locally to the execution core, the data specifying the location; transmit the request to the memory system; receive from the memory system a data unit and an error check value; form a local error check value computed over both of (i) at least part of the data specifying the location as stored locally to the execution core and (ii) at least part of the data unit as received from the memory system; check whether the local error check value matches the error check value received from the memory system; and if the local error check value does not match the error check value received from the memory system, raise an error.

The execution core may be configured to raise the error such that execution is halted. This may prevent a fault from corrupting the system or prevent an attacker from being able to influence or control the operation of a processing module in fetching instructions from memory without such influence or control being detected. Once a fault has been detected, steps can be taken to reassert authorised control and limit the effect of the attack or other fault.

The execution core may have an execution pipeline and the execution core may be configured to locally store the data specifying the location in the execution pipeline.

The execution core may be configured to: interpret the data unit as received from the memory system as an instruction; execute that instruction; and perform the said check before completing the instruction. This may prevent a faulty instruction from updating state in the CPU as a result of an accidental or deliberate fault.

The memory system may be parity protected. This may provide additional fault protection.

BRIEF DESCRIPTION OF THE FIGURES

The present invention will now be described by way of example with reference to the accompanying drawings.

In the drawings:

Figure 1 shows an example of a data processing device comprising an execution core and a memory.

Figure 2 shows an exemplary flowchart detailing the method steps performed by the memory.

DETAILED DESCRIPTION

It is desirable to provide secure devices which can prevent or reduce the number of instances of an attacker gaining access to secret information. In general, it is desirable to prevent an attacker from being able to influence or control the operation of a processing module in fetching instructions from memory without such influence or control being detected. Once it is detected, steps can be taken to reassert authorised control and limit the effect of the attack or other fault.

The present disclosure relates to a method for ensuring that the processing module accesses the correct location in the memory, write data reaches the memory without corruption and read data is returned from the memory without corruption.

As will be described in more detail below, additional signals are added to the memory bus interface to protect the data and the memory address. The write data is protected on the memory request path. The read data is protected on the memory response path. The accessed address is looped back to the core processor. The core processor then checks that the accessed address matches the expected address.

In general, the memory receives a request from the core to fetch data from a location in the memory. The request includes data specifying the location of the requested data in the memory. In response to that request, the memory is configured to retrieve a data unit from the location. An error check value computed over both of (i) at least part of the data specifying the location and (ii) at least part of the retrieved data unit is formed. The retrieved data unit and the error check value are then transmitted to the execution core.

The error check is preferably a cyclic redundancy check (CRC). However, other suitable error checks may also be used.

In a preferred implementation, CRC protection is added to the read data and the write data. The error check value may be a CRC remainder. The address can be included in the CRC calculation so that the CRC check only passes if the address was received correctly. A CRC check can detect multiple bit errors; how many are guaranteed to be detected depends on the polynomial used for the CRC check.

CRC remainders are generated from not only the data but also from the requested address. This means that when the data is received (write data at the memory, read data at the core) then the CRC check includes whether the address was correctly received or not. For the write data check at the memory, this is simple because the address is sent as part of the request.
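As a minimal sketch of this idea, the Python below computes a 12-bit CRC remainder over an address concatenated with a 16-bit data unit, and shows that a changed address or changed data gives a different remainder. The bit convention for the polynomial 0x8f8 (read here in Koopman notation, i.e. x^12 + x^8 + x^7 + x^6 + x^5 + x^4 + 1), the zero initial value, the 32-bit address width, the concatenation order and the names `crc12` and `check_value` are all assumptions made for illustration; the description does not specify them.

```python
POLY = 0x1F1      # assumption: 0x8f8 read in Koopman notation; x^12 term is implicit here
ADDR_BITS = 32    # assumption: address width
DATA_BITS = 16    # one data unit (two bytes)


def crc12(value, nbits):
    """Bitwise, MSB-first CRC-12 remainder of an nbits-wide integer (zero initial value)."""
    crc = 0
    for i in range(nbits - 1, -1, -1):
        feedback = ((crc >> 11) ^ (value >> i)) & 1
        crc = (crc << 1) & 0xFFF
        if feedback:
            crc ^= POLY
    return crc


def check_value(addr, data):
    """Remainder over the address concatenated with one data unit."""
    message = ((addr & 0xFFFFFFFF) << DATA_BITS) | (data & 0xFFFF)
    return crc12(message, ADDR_BITS + DATA_BITS)


reference = check_value(0x1000, 0xBEEF)
same = check_value(0x1000, 0xBEEF)          # same address, same data
wrong_addr = check_value(0x1004, 0xBEEF)    # address differs by one bit
wrong_data = check_value(0x1000, 0xBEEE)    # data differs by one bit
print(reference == same, reference == wrong_addr, reference == wrong_data)
```

Because the address is part of the CRC input, data that is intact but was read from the wrong location fails the comparison in the same way as corrupted data.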

For the read data, this means that the address is looped back to the core, and so the core records the originally requested address and uses that to check the CRC. The address loopback is achieved without directly adding the address to the bus response.

For load data, the load-store unit (LSU) checks the CRC to confirm that data was read from the correct address. For instructions, as well as checking that the fetch was received from the correct address from the bus, the CRC can also be used to show that instructions issue at the correct PC, by pushing instructions into the fetch buffer with the CRC remainder and checking the CRC remainder at execution. This allows faults in the instruction fetch buffer and surrounding logic to also be detected, as these faults often cause instructions to issue at an incorrect PC.

In a preferred embodiment where the execution core is a RISC-V core, a CRC remainder may be calculated for every 16 bits of data, because RISC-V instructions are of variable length but are always a multiple of 16 bits.

Figure 1 shows an example of a Hi2120CS NB-IoT data processing device 100 comprising a HiMiDeerSV100 RISC-V execution core 101 and a memory 105 communicatively coupled to the execution core via a data path, which in this example is a bus. The memory 105 is configured to store a respective one of a plurality of data units at a respective one of a plurality of locations. In this example, the memory 105 is parity protected. As will be described in more detail below, the memory is configured to receive, from the execution core 101, a request to fetch data from a location in the memory 105. The request includes data specifying the location of the data in the memory 105.

In response to that request, the memory system is configured to retrieve a data unit from the location and form an error check value computed over both of (i) at least part of the data specifying the location and (ii) at least part of the retrieved data unit. The memory system then transmits the retrieved data unit and the error check value to the execution core 101.

The bus conveys the retrieved data unit(s) and the error check value(s) from the memory system to the execution core. The width of the bus is preferably at least n times the total number of bits in a data unit and an error check value to allow the data units and the error check values responsive to a single request to be passed simultaneously. Preferably, n is 2. The length of each data unit may be two bytes. This may provide an efficient balance between transferred data content and error check data.
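The width relationship can be worked through with the values used in this description (n = 2, two-byte data units, 12-bit CRC remainders). The helper name `min_bus_width` is hypothetical and the snippet is only an arithmetic illustration, not a statement of the actual bus width of any device.

```python
def min_bus_width(n, data_unit_bits, check_bits):
    """Smallest width that carries n data units and n check values at once."""
    return n * (data_unit_bits + check_bits)


# n = 2 data units of 16 bits, each with a 12-bit CRC remainder
width = min_bus_width(n=2, data_unit_bits=16, check_bits=12)
print(width)
```

With these values the response path needs at least 56 bits: 32 bits of read data plus two 12-bit remainders.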

In the example shown in Figure 1, the AHB (Advanced High-performance Bus) signals are as follows. HWDATA_CRC is generated from HWDATA (write/store data) and HADDR (request address). The CRC is checked at the memory to detect address/write data corruption. HRDATA_CRC is generated at the memory from HRDATA (read data) and HADDR (address) and is checked in the core. For loads, the HRDATA_CRC check ensures that the address reaches the memory and is returned to the core correctly.
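The store-side check can be sketched as follows. This is an illustration under stated assumptions rather than the described hardware: it assumes a single remainder over a 32-bit HADDR concatenated with a 32-bit HWDATA, reads the polynomial 0x8f8 in Koopman notation, uses a zero initial value, and the names `crc12`, `hwdata_crc` and `MemoryModel` are hypothetical.

```python
POLY = 0x1F1  # assumption: 0x8f8 read in Koopman notation; x^12 term is implicit here


def crc12(value, nbits):
    """Bitwise, MSB-first CRC-12 remainder of an nbits-wide integer (zero initial value)."""
    crc = 0
    for i in range(nbits - 1, -1, -1):
        feedback = ((crc >> 11) ^ (value >> i)) & 1
        crc = (crc << 1) & 0xFFF
        if feedback:
            crc ^= POLY
    return crc


def hwdata_crc(haddr, hwdata):
    """Remainder over HADDR concatenated with HWDATA (both assumed 32 bits wide)."""
    message = ((haddr & 0xFFFFFFFF) << 32) | (hwdata & 0xFFFFFFFF)
    return crc12(message, 64)


class MemoryModel:
    """Hypothetical stand-in for the memory-side store check."""

    def __init__(self):
        self.cells = {}

    def store(self, haddr, hwdata, crc):
        # Recompute the remainder from what actually arrived at the memory.
        if hwdata_crc(haddr, hwdata) != crc:
            raise RuntimeError(f"HWDATA_CRC mismatch for store to {haddr:#x}")
        self.cells[haddr] = hwdata


mem = MemoryModel()
crc = hwdata_crc(0x2000, 0xCAFEF00D)     # formed by the core
mem.store(0x2000, 0xCAFEF00D, crc)       # arrives intact, so it is accepted

try:
    mem.store(0x2000 ^ 0x10, 0xCAFEF00D, crc)   # address bit flipped in flight
    fault_detected = False
except RuntimeError:
    fault_detected = True
print(fault_detected)
```

The remainder travels with the request, so the memory needs no extra state to perform the check: it recomputes the value from the address and data it received.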

As shown in Figure 1, in a first step, the SV100 core 101 makes a bus request (I-BUS and/or D-BUS). HWDATA_CRC signals are added to the AHB, which in this example are CRC12 (polynomial 0x8f8) generated from HADDR and HWDATA and are invalid for loads. The arbiter 102 selects a bus request and passes it to block 103. At block 103, which may act as an input interface for the memory system, the memory address and data are scrambled before accessing the memory. Block 104 generates parity to write to the memory. The memory 105 is then accessed. At block 106, HADDR, HWDATA and HRDATA (memory read data) are descrambled. At block 107, HWDATA_CRC is checked for stores. At block 108, which may act as an output interface for the memory system, the parity of the load data from the memory is checked, and HRDATA_CRC for loads is generated from HADDR and HRDATA (read data) for return to the core. At block 109, the response signals are registered, if necessary for timing. Back at the core 101, the core receives the response and checks the CRCs against the read data and the requested address. A bus sideband signal relating to the system therefore loops back the memory address to the core. Fault detection logic relating to the signal checks that the actual address accessed matches the requested address. As described above, the sideband signal may be a CRC calculated across the instruction and the returned address. The returned CRC may be checked using the returned data and an independently calculated version of the expected address, and may cause an alarm to fire on failure of the check.

In this example, CRC12 remainders are generated for each 16 bits of data. Variable length encoding means that 16-bit quantities are issued (one or more at a time). CRC12 remainders can be pushed into the instruction-fetch buffer, one for each 16 bits of instruction (much smaller than recording the looped back PC). In the execute stage, the CRC remainder is checked against the local execute PC and the issued instruction. This confirms that the instruction fetch unit (IFU) issued the instruction at the correct PC. This protects the instruction-fetch buffer against fault injection, for example missing a push due to a fault. HRDATA_CRC is also checked in the load-store unit (LSU). Therefore, this check confirms that loads accessed the correct address and also means that the memory system can’t return a CRC for stores.
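The fetch-buffer check can be illustrated with a small model in which each 16-bit element is queued together with its remainder and the execute stage recomputes the remainder from its own PC. This is a sketch only: the Koopman reading of polynomial 0x8f8, the zero initial value, the 32-bit PC width, the example halfword values and the names `crc12`, `check_value` and `run` are assumptions, not details taken from the description.

```python
from collections import deque

POLY = 0x1F1  # assumption: 0x8f8 read in Koopman notation; x^12 term is implicit here


def crc12(value, nbits):
    """Bitwise, MSB-first CRC-12 remainder of an nbits-wide integer (zero initial value)."""
    crc = 0
    for i in range(nbits - 1, -1, -1):
        feedback = ((crc >> 11) ^ (value >> i)) & 1
        crc = (crc << 1) & 0xFFF
        if feedback:
            crc ^= POLY
    return crc


def check_value(pc, halfword):
    """Remainder over a 32-bit PC (assumed width) and one 16-bit element."""
    return crc12(((pc & 0xFFFFFFFF) << 16) | (halfword & 0xFFFF), 48)


def run(entries, start_pc):
    """Execute-stage check: recompute each remainder from the local PC."""
    buffer = deque(entries)
    pc = start_pc
    while buffer:
        halfword, crc = buffer.popleft()
        if check_value(pc, halfword) != crc:
            raise RuntimeError(f"instruction issued at wrong PC {pc:#x}")
        pc += 2
    return pc


# Illustrative 16-bit elements and the remainders formed for their addresses.
program = {0x100: 0x4501, 0x102: 0x0505, 0x104: 0x8082}
entries = [(hw, check_value(pc, hw)) for pc, hw in program.items()]

end_pc = run(entries, 0x100)                   # all pushes present

try:
    run(entries[:1] + entries[2:], 0x100)      # the push for 0x102 was lost
    missed_push_detected = False
except RuntimeError:
    missed_push_detected = True
print(hex(end_pc), missed_push_detected)
```

Only the 12-bit remainder is stored per element, yet a lost push is still caught because the next element's remainder was formed for a different PC than the one the execute stage expects.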

Longer CRC lengths may give improved protection. However, other CRC lengths, such as CRC6 remainders, may be used.

In one embodiment, four bytes are read from the memory at once and then the CRCs are calculated across each pair of bytes. For example, bytes 0-3 are read and a CRC is formed from 0-1 and 2-3. Instructions are preferably issued in 16-bit elements. Each instruction may have 1 to m 16-bit elements (1 to 3 in this embodiment). Therefore, it is desirable to be able to check the CRC across any pair of bytes. Preferably, the lowest numbered byte should be even.

For example:

Addr 0: 3 2 1 0

Addr 1: 7 6 5 4

Addr 2: b a 9 8

The above bytes may be 12 bytes in the memory. For example, [1:0] could be an instruction, [5:4, 3:2] could be an instruction, [b:a, 9:8, 7:6] could be an instruction, etc. Therefore, when the instruction is issued, one to three pairs of bytes are issued, each pair with its CRC. The CRC can then be checked for each pair. Therefore, each pair may have its own CRC, and four bytes may be read from the memory so as to generate two CRCs to return to the core.
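The per-pair arrangement can be sketched using the 12-byte layout above. The snippet assumes little-endian packing of each byte pair and assumes that each pair's own byte address feeds its remainder (the description says only that at least part of the data specifying the location is used). The Koopman reading of polynomial 0x8f8, the zero initial value and the names `crc12`, `check_value` and `fetch_word` are likewise assumptions.

```python
POLY = 0x1F1  # assumption: 0x8f8 read in Koopman notation; x^12 term is implicit here


def crc12(value, nbits):
    """Bitwise, MSB-first CRC-12 remainder of an nbits-wide integer (zero initial value)."""
    crc = 0
    for i in range(nbits - 1, -1, -1):
        feedback = ((crc >> 11) ^ (value >> i)) & 1
        crc = (crc << 1) & 0xFFF
        if feedback:
            crc ^= POLY
    return crc


def check_value(addr, halfword):
    """Remainder over a 32-bit byte address (assumed width) and one byte pair."""
    return crc12(((addr & 0xFFFFFFFF) << 16) | (halfword & 0xFFFF), 48)


MEMORY = bytes(range(12))   # bytes 0x0..0xb from the layout above


def fetch_word(word_addr):
    """Read four bytes and return two (halfword, remainder) pairs."""
    base = word_addr * 4
    pairs = []
    for byte_addr in (base, base + 2):
        # Assumed little-endian: the even-numbered byte is the low byte.
        halfword = MEMORY[byte_addr] | (MEMORY[byte_addr + 1] << 8)
        pairs.append((halfword, check_value(byte_addr, halfword)))
    return pairs


low, high = fetch_word(1)            # bytes 4-5 and 6-7
print(hex(low[0]), hex(high[0]))
```

Because every pair carries its own remainder, an instruction such as [5:4, 3:2] that straddles two four-byte reads can still be checked element by element when it is issued.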

In a preferred implementation, only every n-th location in the memory is addressable by a request from the execution core. In this case, the memory system may be configured to, in response to the request: retrieve n data units from each of n contiguous locations including the location specified in the request; form n error check values each computed over both of (i) at least part of the data specifying the location and (ii) at least part of a respective one of the retrieved data units; and transmit the n retrieved data units and the n error check values to the execution core. This may allow the integrity of the n values to be verified in response to such a request.

For efficiency, preferably, the memory transmits the fetched data unit(s) and the error check value(s) responsive to a single request in parallel to the execution core.

The method described herein therefore provides a low cost, simple method for detecting address faults in the memory access, write data faults between the core and the memory, read data faults between the memory and core and fetch buffer faults.

Where a fault occurs, it may be desirable to try to correct that fault and to carry on with accessing the memory. An alternative is to stop the process on fault detection. Where a fault is deliberately introduced, processing can be stopped, the fault reviewed and the processing core or module reset to clear the fault.

The execution core may be configured to: form the request including data specifying the location; store, locally to the execution core, the data specifying the location; transmit the request to the memory; receive from the memory a data unit and an error check value; form a local error check value computed over both of (i) at least part of the data specifying the location as stored locally to the execution core and (ii) at least part of the data unit as received from the memory; check whether the local error check value matches the error check value received from the memory; and if the local error check value does not match the error check value received from the memory, raise an error. For example, in response to detecting an error, the execution core may be configured to raise the error such that execution is halted. In one implementation, the execution core may have an execution pipeline and the execution core may be configured to locally store the data specifying the location in the execution pipeline.
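The core-side sequence above can be sketched as follows. The model is illustrative only: the memory system is represented by a plain function, the fault is simulated by flipping one address bit inside that function, and the Koopman reading of polynomial 0x8f8, the zero initial value, the 32-bit address width and all names (`crc12`, `check_value`, `Core`, `FetchFault`, `healthy_port`, `glitched_port`) are assumptions.

```python
POLY = 0x1F1  # assumption: 0x8f8 read in Koopman notation; x^12 term is implicit here


def crc12(value, nbits):
    """Bitwise, MSB-first CRC-12 remainder of an nbits-wide integer (zero initial value)."""
    crc = 0
    for i in range(nbits - 1, -1, -1):
        feedback = ((crc >> 11) ^ (value >> i)) & 1
        crc = (crc << 1) & 0xFFF
        if feedback:
            crc ^= POLY
    return crc


def check_value(addr, data):
    """Remainder over a 32-bit address (assumed width) and one 16-bit data unit."""
    return crc12(((addr & 0xFFFFFFFF) << 16) | (data & 0xFFFF), 48)


class FetchFault(Exception):
    """Raised when the local and received check values differ."""


class Core:
    def __init__(self, memory_port):
        self.memory_port = memory_port

    def load(self, addr):
        requested = addr                         # stored locally to the core
        data, received_crc = self.memory_port(addr)
        local_crc = check_value(requested, data)
        if local_crc != received_crc:
            raise FetchFault(f"check failed for request to {requested:#x}")
        return data


STORE = {0x20: 0x1111, 0x24: 0x2222}


def healthy_port(addr):
    data = STORE[addr]
    return data, check_value(addr, data)


def glitched_port(addr):
    # Simulated fault: one address bit flips on the way to the memory.
    return healthy_port(addr ^ 0x4)


value = Core(healthy_port).load(0x20)
try:
    Core(glitched_port).load(0x20)
    fault_raised = False
except FetchFault:
    fault_raised = True
print(hex(value), fault_raised)
```

In the glitched case the memory returns a valid data unit and a remainder that is self-consistent for address 0x24; the mismatch only appears because the core recomputes the remainder from the address it stored locally.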

In one implementation, the execution core may be configured to interpret the data unit as received from the memory as an instruction, and perform the error check before completing the instruction. If the error check indicates a fault, the instruction may be prevented from completing, so as to prevent the faulty instruction from updating state in the CPU.

Figure 2 summarizes a method 200 in which the following steps are performed by the memory system. At step 201, the method comprises retrieving a data unit from the location. At step 202, an error check value computed over both of (i) at least part of the data specifying the location and (ii) at least part of the retrieved data unit is formed. At step 203, the retrieved data unit and the error check value are transmitted to the execution core.

The techniques described herein may be useful in various security CPUs, for example in HiMiDeerSVxxx security CPUs which form part of the Huawei product line.

The method described herein may therefore make it possible, in the presence of faults, to check that the CPU accessed the correct location in the memory when making memory requests, that write data reached the memory without corruption, and that read data was returned to the core from the memory without corruption. It may also enable protection of the long wires between the memory system and the core, which may be subject to injected faults.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

Claims

1. A data processing device (100) comprising an execution core (101) and a memory system (102-109) communicatively coupled to the execution core via a data path, the memory system being configured to receive from the execution core a request to fetch data from a location in the memory system, the request including data specifying the location, and in response to that request: retrieve a data unit from the location; form an error check value computed over both of (i) at least part of the data specifying the location and (ii) at least part of the retrieved data unit; and transmit the retrieved data unit and the error check value to the execution core.

2. A data processing device as claimed in claim 1, wherein only every n-th location in the memory system is addressable by a request from the execution core, and the memory system is configured to, in response to the request: retrieve n data units from each of n contiguous locations including the location specified in the request; form n error check values each computed over both of (i) at least part of the data specifying the location and (ii) at least part of a respective one of the retrieved data units; and transmit the n retrieved data units and the n error check values to the execution core.

3. A data processing device as claimed in claim 2, wherein n is 2.

4. A data processing device as claimed in claim 2 or 3, wherein the length of each data unit is two bytes.

5. A data processing device as claimed in any of claims 2 to 4, comprising a bus for conveying the or each retrieved data unit and the or each error check value from the memory system to the execution core, the width of the bus being at least n times the total number of bits in a data unit and an error check value.

6. A data processing device as claimed in any preceding claim, wherein the memory system is configured to generate the or each error check value using a cyclic redundancy check computed over the inputs thereto.

7. A data processing device as claimed in claim 6, wherein the error check value is a remainder of the cyclic redundancy check.

8. A data processing device as claimed in any preceding claim, wherein the memory system is configured to transmit the or each data unit and the or each error check value responsive to a single request in parallel to the execution core.

9. A data processing device as claimed in any preceding claim, wherein the memory system comprises a storage block (105) and an output interface (108), the storage block (105) stores data at the location and the output interface (108) is configured to form the error check value.

10. A data processing device as claimed in claim 9, wherein the memory system (105) comprises an input interface (103) configured to receive the request and scramble the data specifying the location to determine the location specified by the data.

11. A data processing device as claimed in any preceding claim, wherein the execution core is configured to: form the request including data specifying the location; store, locally to the execution core, the data specifying the location; transmit the request to the memory system; receive from the memory system a data unit and an error check value; form a local error check value computed over both of (i) at least part of the data specifying the location as stored locally to the execution core and (ii) at least part of the data unit as received from the memory system; check whether the local error check value matches the error check value received from the memory system; and if the local error check value does not match the error check value received from the memory system, raise an error.

12. A data processing device as claimed in claim 11, wherein the execution core is configured to raise the error such that execution is halted.

13. A data processing device as claimed in claim 11 or 12, wherein the execution core has an execution pipeline and the execution core is configured to locally store the data specifying the location in the execution pipeline.

14. A data processing device as claimed in any of claims 11 to 13, wherein the execution core is configured to: interpret the data unit as received from the memory system as an instruction; execute that instruction; and perform the said check before completing the instruction.