WO2023025370A1 - Control flow integrity - Google Patents

Publication number
WO2023025370A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
computing system
calling chain
stage
pointer
Prior art date
Application number
PCT/EP2021/073329
Other languages
French (fr)
Inventor
Qiming Li
Kui Wang
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/EP2021/073329
Publication of WO2023025370A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
    • G06F21/54Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by adding security routines or objects to programs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033Test or assess software

Definitions

  • the present disclosure relates to methods and systems for providing control flow integrity in a computing system.
  • control flow of a program refers to the order in which statements and function calls are executed by the computing system.
  • Modern computing systems implement programs with highly complex control flows. Control flows may present a security risk if an attacker can hijack the control flow and force the computing system to execute malicious code. As such, computing systems may implement measures to protect the integrity of control flows.
  • a computing system comprising a processor and a memory communicatively coupled to the processor.
  • the memory is configured to store program code executable by the processor.
  • the program code comprises at least one calling chain comprising multiple stages, wherein respective stages of the calling chain comprise a respective data structure of data variables, and a mapping configured to map a first set of data variables of a data structure of a first stage of the calling chain to a second set of data variables of a data structure of a second stage of the calling chain.
  • the computing system is configured to: access the program code from the memory and generate a data value for respective stages of the calling chain based on an output of a cryptographically secure function.
  • the data value is generated based on a first input identifying a data variable of the data structure of the respective stage, a second input encoding information specific to the stage and a third input comprising a data value of the subsequent stage of the calling chain.
  • the computing system according to the first aspect provides control flow integrity for a computer program.
  • a computing system comprising a processor and a memory communicatively coupled to the processor.
  • the memory is configured to store program code executable by the processor.
  • the program code comprises at least one calling chain comprising multiple stages, wherein respective stages of the calling chain comprise a respective data structure of data variables, and a mapping configured to map a first set of data variables of a data structure of a first stage of the calling chain to a second set of data variables of a data structure of a second stage of the calling chain and a first set of data values that validate the calling chain in the control flow of the program code.
  • the computing system is configured to access the program code and the first set of data values, generate a second set of data values for the respective stages of the calling chain based on an output of a cryptographically secure function and compare the second set of data values to the first set of values to determine the validity of the calling chain.
  • Each of the data values of the second set of data values is generated based on a first input identifying a data variable of the data structure of the respective stage, a second input encoding information specific to the stage and a third input comprising a data value of the subsequent stage of the calling chain.
  • each of the data variables comprises a pointer.
  • the computing system according to the first implementation form provides protection from pointer substitution attacks.
  • the computing system sets the third input for the last stage of the calling chain to a default data value.
  • the information specific to the stage comprises identification data identifying the data structure type from a set of valid data structure types for the computing system.
  • the third implementation form binds the data value for the respective stages to the data structure associated to the stage.
  • the data structure comprises a list, a struct or a dictionary.
  • the computing system comprises an instruction set extension comprising one or more hardware-implemented instructions.
  • the computing system is arranged to generate the one or more data values for each stage of the calling chain using the instruction set extension.
  • the sixth implementation form improves performance by reducing the number of hardware instructions required to generate data values.
  • the instruction set extension comprises a pointer authentication security extension.
  • the instruction set extension comprises one or more further instructions to augment the PAuth security extension.
  • the seventh and eighth implementation forms provide an efficient implementation based on a customization of an existing pointer authentication extension.
  • the computing system is configured to generate a fault exception in response to determining that a data value in the first set of data values is invalid.
  • Figure 1 shows a schematic diagram of a function calling chain, according to an example.
  • Figure 2 shows a method for providing control flow integrity for a computer program in a computing system, according to an example.
  • Figure 3 shows a method for evaluating the control flow integrity of a computer program, according to an example.
  • Figure 4 shows a simplified schematic diagram of a computing system, according to an example.
  • Forward-edge Control Flow Integrity (CFI) of a software program refers to the security property that only the intended functions are called during program execution.
  • attackers cannot force the program to call an unintended function supplied by the attacker.
  • Such an attack is possible when functions are not called directly, but via function pointers stored in writable memory.
  • an attacker may utilize a vulnerability that allows modifications to such function pointers and make them point to carefully selected locations in the code to achieve a malicious purpose, such as gaining privileges in the system that should not be granted to the attacker.
  • ISAs Instruction Set Architectures
  • PAuth Pointer Authentication
  • PAC Pointer Authentication Code
  • a PAuth extension comprises a pair of operations. Firstly, a signing operation denoted: p ← PAC(p, modifier)
  • the PAC operation takes as input a pointer value p stored in a CPU register, optionally together with another register that contains a further data value, sometimes referred to as a modifier.
  • a cryptographically secure message authentication code (MAC) is computed from the pointer and the modifier, and the resulting PAC output may be inserted into unused higher bits in the pointer.
  • Secondly, in an AUT operation, such a signed pointer is verified against a modifier. If the signed pointer has not been tampered with, the AUT operation will succeed, and the PAC is removed from the signed pointer stored in a CPU register. The pointer can then be used to call functions or load data from memory. If the signed pointer has been tampered with, or the modifier is different from the one used during signing, the AUT operation will fail and the pointer is left corrupted. Subsequent use of the pointer will cause, for example, a memory error.
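  • By way of illustration only, the signing and verification operations can be modelled in software. The following Python sketch is not an implementation of any real instruction set: HMAC-SHA256 stands in for the hardware MAC, and the 48-bit address width, the helper names pac and aut, and the use of bit 62 as a corruption marker are all assumptions made for the example.

```python
import hashlib
import hmac

ADDR_BITS = 48                       # assumed width of a real address
ADDR_MASK = (1 << ADDR_BITS) - 1
PAC_BITS = 64 - ADDR_BITS            # unused upper bits that hold the PAC


def _mac(key: bytes, address: int, modifier: int) -> int:
    """Truncated HMAC, a software stand-in for the hardware MAC."""
    msg = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") >> (64 - PAC_BITS)


def pac(key: bytes, pointer: int, modifier: int = 0) -> int:
    """Signing: place the PAC in the unused upper bits of the pointer."""
    address = pointer & ADDR_MASK
    return (_mac(key, address, modifier) << ADDR_BITS) | address


def aut(key: bytes, signed_pointer: int, modifier: int = 0) -> int:
    """Verification: strip the PAC on success, otherwise corrupt the pointer."""
    address = signed_pointer & ADDR_MASK
    if pac(key, address, modifier) == signed_pointer:
        return address
    return address | (1 << 62)       # assumed marker; faults when dereferenced
```

  • In this model a pointer signed with modifier 7 is recovered by aut only when the same key and the same modifier are supplied; any other combination yields a value outside the assumed 48-bit address range.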
  • Signing of pointers using PAC may be performed by a component of an operating system, such as a loader program.
  • a loader program loads program executables and links the program with shared libraries in the system as required. During such a linking stage, pointers to required functions in the shared libraries are computed and placed within a specific region in the program. If the PAuth instruction set extension is enabled for the program, these function pointers are signed by the loader.
  • the cryptographic keys used by these operations may be managed using a component with a higher privilege in the system, such as the operating system, a hypervisor, or a Trusted Execution Environment (TEE).
  • the cryptographic keys may be different for each device power cycle and for each program process. In other words, the keys are not persistent, and change when the computing device reboots. In examples, they may also change when the same computer program ends its current execution and is loaded and executed a second time.
  • code compilation may happen during the execution of a program.
  • the program may be executed by an interpreter which translates source code into native instructions on-the-fly, or perform Just-in-Time (JIT) compilation on part of the program.
  • an interpreter or the JIT compiler may sign pointers and/or data, provided that they run in the same process space as the compiled code.
  • FPAC Faulting PAC
  • When FPAC is implemented and enabled, the AUT operation generates hardware faults when verification of a PAC fails. In this case, the system does not have to wait until a corrupted pointer is loaded for use before a hardware fault occurs. In general it is infeasible to guess the correct PAC value without knowing the cryptographic key used to generate the PAC value. As such, an attacker cannot take an arbitrary pointer and forge a valid PAC for it. However, a pointer substitution attack may still be possible, where the attacker replaces a valid signed pointer by another signed pointer, where the latter signed pointer is also valid but points to a different function. In this case the construction of a modifier as a further input to the PAC becomes significant: if modifiers are not used or not carefully chosen, pointer substitution attacks may still be feasible despite the presence of a hardware MAC.
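  • The role of the modifier in a pointer substitution attack can be illustrated with a small software model. The Python sketch below is an assumption-laden example rather than a description of real hardware: HMAC-SHA256 replaces the hardware MAC, the exception PacFault stands in for an FPAC hardware fault, and the slot names, addresses and modifier values are hypothetical.

```python
import hashlib
import hmac

ADDR_BITS = 48                               # assumed address width
ADDR_MASK = (1 << ADDR_BITS) - 1


class PacFault(Exception):
    """Stands in for the hardware fault raised when FPAC is enabled."""


def pac(key, pointer, modifier=0):
    address = pointer & ADDR_MASK
    msg = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    tag = int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")
    return (tag << ADDR_BITS) | address


def aut_fpac(key, signed_pointer, modifier=0):
    address = signed_pointer & ADDR_MASK
    if pac(key, address, modifier) != signed_pointer:
        raise PacFault(hex(signed_pointer))
    return address


key = b"per-process-key!"
LOG, SHELL = 0x4000_1000, 0x4000_2000        # two legitimate function addresses

# Without a modifier, every signed pointer verifies in every slot.
slots = {"on_log": pac(key, LOG), "on_admin": pac(key, SHELL)}
slots["on_log"] = slots["on_admin"]          # attacker substitutes a valid pointer
hijacked = aut_fpac(key, slots["on_log"])    # verification succeeds

# With a slot-specific modifier, the same substitution faults.
MOD = {"on_log": 1, "on_admin": 2}
slots = {"on_log": pac(key, LOG, MOD["on_log"]),
         "on_admin": pac(key, SHELL, MOD["on_admin"])}
slots["on_log"] = slots["on_admin"]
try:
    aut_fpac(key, slots["on_log"], MOD["on_log"])
    blocked = False
except PacFault:
    blocked = True
```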
  • Figure 1 is a simplified schematic diagram of a calling chain 100 for an indirect function call, according to an example.
  • the calling chain 100 shown in Figure 1 comprises four stages and begins with an object pointer 105.
  • the object pointer obj 105 points to a Struct-type memory object 110.
  • dict_ptr 120 may be located in the Struct-type object 110.
  • the pointer dict_ptr 120 points to lookup key 125 in a dictionary-type object 130.
  • the lookup key 125 may be used to locate a list pointer, list_ptr 135, and a list index, list_index 140, in the dictionary object 130.
  • the list pointer 135 and list index 140 are used to locate a function pointer, func_ptr 145, in a list-type object 150.
  • func_ptr 145 points to a specific function 155 in code 160.
  • the calling chain 100 shown in Figure 1 is illustrative of a calling chain that may be found in a program written in, for example, an OOP language.
  • the process illustrated in Figure 1 may be repeated many times before a target function in the form of a pointer to a segment of executable code is identified.
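  • The four-stage resolution of Figure 1 can be mirrored with ordinary containers. The Python sketch below is illustrative only: native dictionaries and lists stand in for the memory objects 110, 130 and 150, the key "draw" and the addresses are hypothetical, and the lookup key is passed in by the caller as a simplifying assumption.

```python
def target_function():
    return "target reached"


# code 160: maps an address to a function (address values are hypothetical)
code = {0x1000: target_function}

# list-type object 150: holds function pointers, func_ptr 145 at index 1
list_obj = [0x0FF0, 0x1000]

# dictionary-type object 130: lookup key 125 -> (list_ptr 135, list_index 140)
dict_obj = {"draw": (list_obj, 1)}

# Struct-type object 110: holds dict_ptr 120
struct_obj = {"dict_ptr": dict_obj}

# object pointer 105
obj = struct_obj


def resolve(obj, lookup_key):
    """Walk the calling chain from the object pointer to the target function."""
    dict_ptr = obj["dict_ptr"]                       # stage 1: struct field
    list_ptr, list_index = dict_ptr[lookup_key]      # stage 2: dictionary lookup
    func_ptr = list_ptr[list_index]                  # stage 3: list indexing
    return code[func_ptr]                            # stage 4: function address
```

  • Every intermediate value in resolve (dict_ptr, the lookup key, list_ptr, list_index and func_ptr) lives in writable memory in this model, which is why each one is a candidate for substitution.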
  • PAuth operations PAC and AUT are insufficient for protecting the control flow of the calling chain 100 without further instructions.
  • the PAC operation is configured to receive a function pointer as input.
  • the calling chain 100 also uses non-pointer, reference data values such as the lookup key 125 and list index 140. These data values may be stored in writable memory and as such may also be vulnerable to substitution attacks similar to attacks previously described in relation to function pointers.
  • the methods and systems described herein provide control flow integrity for indirect function calls.
  • the methods described may be used to sign pointers and reference data values to prevent pointer substitution attacks where a legitimately signed pointer is substituted by another legitimately signed pointer to modify the control flow of programs.
  • the integrity of function calls with long indirect calling chains may be protected, as well as function calls with shorter chains or even those with a single level of indirection.
  • the methods described herein utilize a hardware-implemented MAC.
  • the underlying computing system has access to a secure hardware MAC for which forgery is infeasible for an attacker and for which cryptographic keys are managed in a more privileged or trusted component of the computing system.
  • MAC hardware-supported message authentication code
  • the actual instructions used to implement these operations may vary, depending on the underlying platform. According to examples these five operations may be implemented using direct single instruction hardware or alternatively using sequences of instructions provided in an instruction set architecture of a platform.
  • the two signing operations, SignPointer and SignData take three input parameters, the item to be signed, a first data value, S, referred to herein as the scene and a second data value, C, referred to herein as the context. Each of the operations outputs a data value.
  • the two verification operations, VerifyPointer and VerifyData take a signed item, a scene, S, and a context, C, as input parameters, and output the original item if verification succeeds, or result in an error when verification fails.
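  • The interface of the pointer signing and verification operations can be sketched as follows. This Python model covers SignPointer and VerifyPointer only and is an assumption-based illustration: HMAC-SHA256 replaces the hardware MAC, the key constant, the 48-bit address width and the VerificationError exception are hypothetical, and the scene and context are taken to be 64-bit integers.

```python
import hashlib
import hmac

KEY = b"held-by-os-or-tee"           # hypothetical; real keys never reach user code
ADDR_BITS = 48                       # assumed address width
ADDR_MASK = (1 << ADDR_BITS) - 1


class VerificationError(Exception):
    """Raised when a signed item does not match its scene and context."""


def _tag(pointer: int, scene: int, context: int) -> int:
    msg = b"".join(v.to_bytes(8, "little") for v in (pointer, scene, context))
    digest = hmac.new(KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")      # 16-bit tag


def sign_pointer(pointer: int, scene: int, context: int) -> int:
    """SignPointer: the signed pointer keeps the size of a pointer."""
    pointer &= ADDR_MASK
    return (_tag(pointer, scene, context) << ADDR_BITS) | pointer


def verify_pointer(signed_pointer: int, scene: int, context: int) -> int:
    """VerifyPointer: return the original pointer or raise an error."""
    pointer = signed_pointer & ADDR_MASK
    if sign_pointer(pointer, scene, context) != signed_pointer:
        raise VerificationError(hex(signed_pointer))
    return pointer
```

  • A pointer signed under scene 3 and context 9 is returned unchanged by verify_pointer for that same pair, and verification with a different scene or a different context raises the error.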
  • the bit length of pointers is fixed whereas the length of data values that may be input to SignData may vary.
  • bit length of a signed pointer may be kept to the same size as a pointer.
  • the data that is being signed in an indirect call resolution are often relatively small. As such it may be possible to keep the size of the data unchanged before and after signing operations.
  • the LoadAndVerify operation loads a second signed pointer, SignedPointer2, from the memory address given in a first signed pointer, SignedPointer1, and verifies the first signed pointer, SignedPointer1, using the given scene, Scene1, and the loaded second signed pointer, SignedPointer2, as the context.
  • a pointer or data in an indirect calling chain may be immutable, for example when it is in read-only memory or hardcoded in the code. In these cases protection of such pointers and data may be omitted in practice.
  • the scene and context data values encode information relating to pointers or reference data in stages of a calling chain.
  • the scene used in the signing and verification operations is based on the information required to prepare a particular stage to move forward in an indirect calling chain.
  • the object at memory location A contains dictionary entries where each entry contains a lookup key and a value.
  • the next stage data is found in an entry of the dictionary object where the lookup key matches the input K.
  • the starting point is a pointer that points to a memory address A and an integer index I.
  • the data for the next stage is loaded from the memory location A + I*S, where S is the size of an element in the list.
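  • The two resolution rules described above, matching a lookup key in a dictionary object and loading a list element from A + I*S, can be shown with a short sketch. The Python example below assumes 64-bit little-endian list elements and models memory as a byte string; the function names are hypothetical.

```python
import struct

ELEMENT_SIZE = 8    # S: assumed size of one list element in bytes


def load_list_element(memory: bytes, base: int, index: int) -> int:
    """Load the next-stage data from address A + I*S."""
    offset = base + index * ELEMENT_SIZE
    return struct.unpack_from("<Q", memory, offset)[0]


def lookup_dictionary(entries, key):
    """Return the value of the dictionary entry whose lookup key matches K."""
    for entry_key, value in entries:
        if entry_key == key:
            return value
    raise KeyError(key)
```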
  • the first data value, i.e. the scene (S), may be constructed using a unique identifier for each supported indirection type.
  • the scene (S) may then be constructed from (1) the indirection type identifier for the current stage, (2) a container identifier or address, (3) reference data used to resolve the current stage, and (4) a type or identifier for the target or next stage.
  • the table below illustrates the different types for the example shown in Figure 1 with constructions of the scene for different types of input data to the signing operation.
  • dict_ptr, list_ptr, list_index and func_ptr refer to, respectively, a dictionary type pointer, a list pointer, a list index and a function pointer.
  • the components may be combined to form a scene of a certain size.
  • the components may be concatenated and compressed when they exceed a certain bit length.
  • the scene may be compressed using a hardware-implemented hash function.
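  • One possible construction of the scene from its four components is sketched below. The Python example is an illustration under stated assumptions: the numeric indirection identifiers are hypothetical, each component is treated as a 64-bit integer, SHA-256 stands in for a hardware-implemented hash function, and the concatenation is always compressed to 64 bits for simplicity rather than only when it exceeds a threshold.

```python
import hashlib
from enum import IntEnum


class Indirection(IntEnum):
    """Hypothetical identifiers for the supported indirection types."""
    STRUCT = 1
    DICT = 2
    LIST = 3
    FUNCTION = 4


def make_scene(indirection, container, reference, target, bits=64):
    """Concatenate the four scene components and compress them to `bits` bits."""
    parts = (int(indirection), container, reference, int(target))
    raw = b"".join(p.to_bytes(8, "little") for p in parts)
    digest = hashlib.sha256(raw).digest()    # software stand-in for a hardware hash
    return int.from_bytes(digest[: bits // 8], "little")
```

  • Because every component feeds the hash, two stages that differ only in their reference data, for example two different lookup keys into the same dictionary, receive different scenes.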
  • the second data value or context used in the signing and verification operations encodes information for remaining targets at a certain stage in an indirect calling chain.
  • context data values are computed recursively in reverse, from the end of the chain.
  • RCi recursive context
  • the recursive context for the current stage is the signed pointer in the next stage.
  • a reference value such as zero may be used as the recursive context to indicate that the next is the last stage.
  • the signing of the first stage pointer 120 in Figure 1 may be written as follows: dict_ptr ← SignPointer(dict_ptr, S1, SignPointer(list_ptr, S2, SignPointer(func_ptr, S3, …)))
  • Si denotes the scene for the i-th stage.
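  • The reverse, recursive computation of contexts can be sketched in a few lines. In the Python example below, sign_pointer is a software stand-in built on HMAC-SHA256, the key and the 48-bit address width are assumptions, and zero is used as the reference value for the last stage; sign_chain is a hypothetical helper name.

```python
import hashlib
import hmac

KEY = b"example-key-only"            # hypothetical key for the model
ADDR_MASK = (1 << 48) - 1            # assumed address width


def sign_pointer(pointer, scene, context):
    msg = b"".join(v.to_bytes(8, "little") for v in (pointer, scene, context))
    tag = int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")
    return (tag << 48) | (pointer & ADDR_MASK)


def sign_chain(stages, last_context=0):
    """Sign a chain given as [(pointer, scene), ...] ordered first to last.

    Signing starts at the end of the chain, and the signed pointer of each
    stage becomes the recursive context of the stage before it.
    """
    context = last_context
    signed = []
    for pointer, scene in reversed(stages):
        context = sign_pointer(pointer, scene, context)
        signed.append(context)
    signed.reverse()
    return signed
```

  • With this arrangement the signature of the first stage depends, through the nested contexts, on every later pointer and scene in the chain.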
  • a compiler is responsible for generating machine instructions.
  • instructions to protect the forward-edge may be inserted by programmers, with help from a compiler.
  • compiler is used broadly to refer to any combination of compilers, interpreters, linkers, assemblers and/or low level libraries which are software components that transform program source code into machine instructions.
  • the five hardware MAC operations previously discussed may be implemented using the Pointer Authentication (PAuth) security extension.
  • Such a compression function may be implemented using a hardware supported hash function.
  • the Pointer Authentication Code using Generic Key (PACGA) instruction may be used.
  • Such hash functions or the PACGA instruction can also be used to combine the scene and context so that the result can be fed into a normal PAC or AUT instruction as a modifier.
  • the PACGA instruction is used as an example to demonstrate how a compress operation can be implemented and used for these purposes.
  • any hardware supported hash function may also be used to implement this operation.
  • the PACGA instruction may be described as follows: tag ← PACGA(value, modifier)
  • Both the value and the modifier are 64-bit values contained in two general purpose CPU registers, and the output is a 32-bit tag stored in the upper 32 bits of a destination 64-bit general purpose register. Therefore, a 64-bit value can be compressed to a 32-bit tag by applying PACGA with a 0 modifier.
  • represents concatenation of the two 32-bit tags.
  • concatenation can be done, for example, by computing PACGA(V2, 0) first, followed by a shift of the destination register by 32-bits to the right, then a computation of PACGA(V1, 0) where the result is stored in the same destination register.
  • This may be used as an input to a PAC or AUT instruction as previously described. Similarly, if the scene, S, is longer than 64 bits, this method may be applied repeatedly until the size is reduced to 64 bits.
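  • The compression of a 128-bit value into 64 bits by two PACGA computations can be modelled as follows. The Python sketch is not the real instruction: HMAC-SHA256 replaces the hardware MAC, the function names are hypothetical, and the second tag is merged into the destination with a bitwise OR as an assumption about how both tags end up in the same register.

```python
import hashlib
import hmac


def pacga(key: bytes, value: int, modifier: int) -> int:
    """Model of PACGA: a 32-bit tag in the upper half of a 64-bit result."""
    msg = value.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    tag = int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "little")
    return tag << 32


def compress128(key: bytes, v1: int, v2: int) -> int:
    """Compress two 64-bit halves into one 64-bit value of two 32-bit tags."""
    dest = pacga(key, v2, 0)         # tag of V2 in the upper 32 bits
    dest >>= 32                      # shift it down to the lower 32 bits
    dest |= pacga(key, v1, 0)        # tag of V1 fills the upper 32 bits
    return dest
```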
  • the five operations for the hardware MAC may be implemented as follows. The operations shown below may be used to sign data that is at most 32 bits long. Longer data items may be broken into smaller pieces, which are then signed separately:
  • the part “SignedData & 0xffffffff” denotes a logical-AND operation between the SignedData and 0xffffffff. This operation clears the upper 32 bits of SignedData and leaves the lower 32 bits untouched, which is the original unsigned data. Then the tag is recomputed using PACGA. After that, an XOR operation of the SignedData with the tag is computed. If the tag stored in the upper 32 bits of SignedData matches the computed tag, the upper 32 bits of SignedData will be cleared and the original data is recovered. Otherwise some bits will remain set in the upper 32 bits and the value of the data will be invalidated.
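  • The mask, recompute and XOR sequence for data verification can be illustrated with a software model. In the Python sketch below, HMAC-SHA256 stands in for PACGA, the function names are hypothetical, and the modifier is formed as PACGA(scene, context), which is one assumed way of combining the two values.

```python
import hashlib
import hmac

MASK32 = 0xFFFFFFFF


def pacga(key: bytes, value: int, modifier: int) -> int:
    """Model of PACGA: a 32-bit tag in the upper half of a 64-bit result."""
    msg = value.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    tag = int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "little")
    return tag << 32


def sign_data(key: bytes, data: int, scene: int, context: int) -> int:
    """Store the tag in the upper 32 bits and the data in the lower 32 bits."""
    if not 0 <= data <= MASK32:
        raise ValueError("data must fit in 32 bits")
    modifier = pacga(key, scene, context)    # assumed scene/context combination
    return pacga(key, data, modifier) | data


def verify_data(key: bytes, signed_data: int, scene: int, context: int) -> int:
    """Recompute the tag over the lower 32 bits and XOR it into SignedData."""
    modifier = pacga(key, scene, context)
    tag = pacga(key, signed_data & MASK32, modifier)
    return signed_data ^ tag                 # upper bits clear only on a match
```

  • A caller can treat any result with non-zero upper 32 bits as invalid data.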
  • the input and output pointers are in the same register. If AUT fails, the result will be a corrupt pointer that will trigger a memory fault when used.
  • for the fourth and fifth operations, even though PACGA is used to sign data, it is also possible to use PAC instructions to sign the data and later use AUT instructions to verify it, essentially treating data as pointers. In that case the fourth operation becomes the same as the first operation, and the fifth operation becomes the same as the second operation.
  • for LoadAndVerify, the following sequence of operations may be executed.
  • SP1 is a signed pointer to be validated
  • S1 is the scene for SP1.
  • in Step 3, if VerifyPointer succeeds, the PAC bits in SP1 will be cleared, making SP1 in Step 4 the same as P1. In this case, P1 and SP1 cancel out in the XOR operations in Step 4, making the final SP2 correct (the next pointer loaded in Step 2). If VerifyPointer fails in Step 3, SP1 will contain a corrupt value which is different from P1, making SP2 invalid after Step 4.
  • If the FPAC operation previously described is enabled, a failed authentication in Step 3 would already generate a hardware fault, making Step 4 unnecessary.
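  • A possible reading of the four LoadAndVerify steps is sketched below in Python. The step breakdown is inferred from the surrounding description and is an assumption, as are the HMAC-SHA256 stand-in for the hardware MAC, the 48-bit address width, the dictionary used as memory and the use of bit 62 to mark a corrupt pointer.

```python
import hashlib
import hmac

ADDR_BITS = 48                       # assumed address width
ADDR_MASK = (1 << ADDR_BITS) - 1


def sign_pointer(key, pointer, scene, context):
    pointer &= ADDR_MASK
    msg = b"".join(v.to_bytes(8, "little") for v in (pointer, scene, context))
    tag = int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")
    return (tag << ADDR_BITS) | pointer


def verify_pointer(key, signed_pointer, scene, context):
    """Return the plain pointer on success, a corrupt pointer on failure."""
    pointer = signed_pointer & ADDR_MASK
    if sign_pointer(key, pointer, scene, context) == signed_pointer:
        return pointer
    return pointer | (1 << 62)       # assumed corruption marker


def load_and_verify(key, memory, sp1, s1):
    p1 = sp1 & ADDR_MASK                      # Step 1: address P1 without PAC bits
    sp2 = memory[p1]                          # Step 2: load the next signed pointer
    sp1 = verify_pointer(key, sp1, s1, sp2)   # Step 3: SP2 acts as the context
    return sp2 ^ p1 ^ sp1                     # Step 4: P1 and SP1 cancel on success
```

  • When verification succeeds, the XOR in the last step returns SP2 unchanged; when it fails, the corruption marker carries over into the returned value.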
  • new extensions to the instruction set architecture of an existing computer architecture may be added to implement the five hardware MAC operations.
  • performance may be maximized in terms of both speed and program size by reducing the number of CPU instructions to implement each of the five operations. All the five operations involve at most three CPU registers, which is consistent with many current designs for computer architectures.
  • a customized Pointer Authentication extension may be used to implement the five operations for the hardware implemented MAC.
  • This customized extension may be implemented in a way that the syntax of the PAuth instructions is the same as defined in the ARMv8.3 Instruction Set Architecture, but with configurable semantics, so that the ISA can remain compatible with ordinary ARMv8.3 applications, while allowing more efficient forward-edge control flow integrity.
  • the configuration of the instruction semantics may be done, for example, by using a configuration bit in a system register, where ARMv8.3 semantics are expected when the bit is 0, and customized semantics are expected when the bit is 1. Other configurations, such as a dedicated register number, may also be specified in this way.
  • a dedicated register is reserved for the context.
  • the modifier parameter in PAuth is used to specify the scene.
  • the dedicated register can be a new system register, or one of the general purpose registers. There are 32 64-bit general purpose registers in ARMv8.3, named from x0 to x31, many of which are readily usable for such purposes.
  • the dedicated register is referred to herein as a context register regardless of whether it is a system register or general purpose register.
  • when a PAC or AUT instruction is executed, it takes the pointer and modifier from its parameters and uses the modifier as the scene, while at the same time taking the value of the context register as the context.
  • the implementation of the four signing and verification operations can be simplified as below, assuming that the context is already stored in the context register.
  • the second customization is to change the semantics of the AUT instructions so that a memory error is triggered immediately when the instructions fail, instead of simply leaving a corrupt pointer and only triggering a memory error later when the pointer is used to load content from the memory.
  • This is similar to the FPAC feature in the ARMv8.3 ISA, except that the AUT* instructions in ARMv8.3 do not take a context register into consideration.
  • the third customization required is when loading data from a signed pointer to the context register. When using a Load instruction to load content from a memory address (a pointer) into the context register, the Load instruction should ignore the PAC bits from the pointer. In fact, since memory errors are now triggered with AUT instructions by the second customization, it is not necessary to rely on the Load instructions to trigger memory errors and such changes are reasonable.
  • the LoadAndVerify operation may be implemented with the simplified steps below, where SP1 is a signed pointer to be validated, and S1 is the scene for SP1: LoadAndVerify(SP1, S1):
  • SP2 contains the result of the LoadAndVerify operation, which is another signed pointer. If the verification fails in Step 2, a memory error is triggered and the operation fails as expected.
  • the methods and systems described herein provide complete protection that prevents pointer substitution attacks.
  • the scene specifies how to resolve the next stage pointers and data in a calling chain.
  • the context allows all the stages of an entire calling chain to be bonded to the final target in a specific order.
  • FIG. 2 is a block diagram showing a method 200 for providing control flow integrity in a computing system comprising a processor and a memory communicatively coupled to the processor, according to an example.
  • the memory may be configured to store program code executable by the processor where the program code comprises at least one calling chain comprising multiple stages, wherein respective stages of the calling chain comprise a respective data structure of data variables, and a mapping configured to map a first set of data variables of a data structure of a first stage of the calling chain to a second set of data variables of a data structure of a second stage of the calling chain.
  • the method 200 may be implemented by the computing system.
  • the method comprises accessing the program code.
  • the method comprises generating a data value for respective stages of the calling chain based on an output of a cryptographically secure function.
  • Figure 3 is a block diagram showing a method 300 for verifying the control flow integrity on a computing system comprising a processor and a memory communicatively coupled to the processor.
  • the memory stores program code executable by the processor comprising at least one calling chain comprising multiple stages and a first set of data values that validate the calling chain in the control flow of the program code.
  • the method 300 may be implemented by the computing system.
  • the method comprises accessing the program code and the first set of data values.
  • the method comprises generating a second set of data values for the respective stages of the calling chain based on an output of a cryptographically secure function.
  • the method comprises comparing the second set of data values to the first set of values to determine the validity of the calling chain.
  • each of the data values of the second set of data values is generated based on a first input identifying a data variable of the data structure of the respective stage, a second input encoding information specific to the stage and a third input comprising a data value of the subsequent stage of the calling chain.
  • the second and third inputs may be the scene and context as previously described herein.
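  • The verification flow of method 300 can be sketched as a regenerate-and-compare routine. The Python example below is illustrative only: HMAC-SHA256 stands in for the cryptographically secure function, each stage is reduced to a hypothetical pair of (variable identifier, stage information), zero is the default value for the last stage, and ControlFlowFault models the fault exception.

```python
import hashlib
import hmac


class ControlFlowFault(Exception):
    """Stands in for the fault raised when a data value is invalid."""


def stage_value(key, variable_id, stage_info, next_value):
    """Data value for one stage from its three inputs."""
    msg = b"".join(v.to_bytes(8, "little") for v in (variable_id, stage_info, next_value))
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:8], "little")


def generate_values(key, stages, default=0):
    """Generate one data value per stage, working back from the last stage."""
    values, next_value = [], default
    for variable_id, stage_info in reversed(stages):
        next_value = stage_value(key, variable_id, stage_info, next_value)
        values.append(next_value)
    return values[::-1]


def validate_chain(key, stages, first_set):
    """Compare freshly generated values against the stored first set."""
    second_set = generate_values(key, stages)
    if second_set != first_set:
        raise ControlFlowFault("calling chain failed validation")
    return True
```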
  • the present disclosure is described with reference to flow charts and/or block diagrams of the method, devices and systems according to examples of the present disclosure. Although the flow diagrams described above show a specific order of execution, the order of execution may differ from that which is depicted. Blocks described in relation to one flow chart may be combined with those of another flow chart.
  • the machine-readable instructions may, for example, be executed by a general-purpose computer, a special purpose computer, an embedded processor or processors of other programmable data processing devices to realize the functions described in the description and diagrams.
  • a processor or processing apparatus may execute the machine-readable instructions.
  • modules of apparatus may be implemented by a processor executing machine-readable instructions stored in a memory, or a processor operating in accordance with instructions embedded in logic circuitry.
  • the term 'processor' is to be interpreted broadly to include a CPU, processing unit, logic unit, or programmable gate set etc.
  • the methods and modules may all be performed by a single processor or divided amongst several processors.
  • Such machine-readable instructions may also be stored in a computer readable storage that can guide the computer or other programmable data processing devices to operate in a specific mode.
  • Such machine-readable instructions may also be loaded onto a computer or other programmable data processing devices, so that the computer or other programmable data processing devices perform a series of operations to produce computer-implemented processing, thus the instructions executed on the computer or other programmable devices provide an operation for realizing functions specified by flow(s) in the flow charts and/or block(s) in the block diagrams.
  • FIG. 4 is a block diagram of a computing system 400 that may be used for implementing the methods disclosed herein. Specific devices may utilize all of the components shown or only a subset of the components, and levels of integration may vary from device to device. Furthermore, a device may contain multiple instances of a component, such as multiple processing units, processors, memories, transmitters, receivers, etc.
  • the computing system 400 includes a processing unit 402.
  • the processing unit includes a central processing unit (CPU) 414, a graphics processing unit (GPU) 416, a memory 408, and may further include a mass storage device 404, a video adapter 410, and an I/O interface 412 connected to a bus 418.
  • The bus 418 may be one or more of any type of several bus architectures including a memory bus or memory controller, a peripheral bus, or a video bus.
  • The CPU 414 and GPU 416 may comprise any type of electronic data processors.
  • The memory 408 may comprise any type of non-transitory system memory such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous DRAM (SDRAM), read-only memory (ROM), or a combination thereof.
  • The memory 408 may include ROM for use at boot-up, and DRAM for program and data storage for use while executing programs.
  • The mass storage 404 may comprise any type of non-transitory storage device configured to store data, programs, and other information and to make the data, programs, and other information accessible via the bus 418.
  • The mass storage 404 may comprise, for example, one or more of a solid state drive, a hard disk drive, a magnetic disk drive, or an optical disk drive.
  • The video adapter 410 and the I/O interface 412 provide interfaces to couple external input and output devices to the processing unit 402.
  • Input and output devices include a display 420 coupled to the video adapter 410 and a mouse, keyboard, or printer 422 coupled to the I/O interface 412.
  • Other devices may be coupled to the processing unit 402, and additional or fewer interface cards may be utilized.
  • A serial interface such as Universal Serial Bus (USB) (not shown) may be used to provide an interface for an external device.
  • The processing unit 402 also includes one or more network interfaces 406, which may comprise wired links, such as an Ethernet cable, or wireless links to access nodes or different networks.
  • The network interfaces 406 allow the processing unit 402 to communicate with remote units via the networks.
  • The network interfaces 406 may provide wireless communication via one or more transmitters/transmit antennas and one or more receivers/receive antennas.
  • The processing unit 402 is coupled to a local-area network 424 or a wide-area network for data processing and communications with remote devices, such as other processing units, the Internet, or remote storage facilities.

Abstract

A computing system comprising a processor and a memory communicatively coupled to the processor is provided. The memory is configured to store program code executable by the processor, the program code comprising at least one calling chain (100) comprising multiple stages (105, 110, 130, 150, 160), wherein respective stages of the calling chain comprise a respective data structure of data variables (115, 120, 125, 135, 140, 145, 155), and a mapping configured to map a first set of data variables of a data structure of a first stage of the calling chain to a second set of data variables of a data structure of a second stage of the calling chain. The computing system is configured to access the program code from the memory and generate a data value for respective stages of the calling chain.

Description

CONTROL FLOW INTEGRITY
TECHNICAL FIELD
The present disclosure relates to methods and systems for providing control flow integrity in a computing system.
BACKGROUND
In computer programming, the control flow of a program refers to the order in which statements and function calls are executed by the computing system. Modern computing systems implement programs with highly complex control flows. Control flows may present a security risk if an attacker can hijack the control flow and force the computing system to execute malicious code. As such, computing systems may implement measures to protect the integrity of control flows.
SUMMARY
It is an object of the invention to provide systems and methods for control flow integrity in a computer system.
The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect, a computing system is provided. The computing system comprises a processor and a memory communicatively coupled to the processor. The memory is configured to store program code executable by the processor. The program code comprises at least one calling chain comprising multiple stages, wherein respective stages of the calling chain comprise a respective data structure of data variables, and a mapping configured to map a first set of data variables of a data structure of a first stage of the calling chain to a second set of data variables of a data structure of a second stage of the calling chain. The computing system is configured to: access the program code from the memory and generate a data value for respective stages of the calling chain based on an output of a cryptographically secure function. The data value is generated based on a first input identifying a data variable of the data structure of the respective stage, a second input encoding information specific to the stage and a third input comprising a data value of the subsequent stage of the calling chain.
The computing system according to the first aspect provides control flow integrity for a computer program.
According to a second aspect, a computing system is provided. The computing system comprises a processor and a memory communicatively coupled to the processor. The memory is configured to store program code executable by the processor. The program code comprises at least one calling chain comprising multiple stages, wherein respective stages of the calling chain comprise a respective data structure of data variables, and a mapping configured to map a first set of data variables of a data structure of a first stage of the calling chain to a second set of data variables of a data structure of a second stage of the calling chain and a first set of data values that validate the calling chain in the control flow of the program code. The computing system is configured to access the program code and the first set of data values, generate a second set of data values for the respective stages of the calling chain based on an output of a cryptographically secure function and compare the second set of data values to the first set of data values to determine the validity of the calling chain. Each of the data values of the second set of data values is generated based on a first input identifying a data variable of the data structure of the respective stage, a second input encoding information specific to the stage and a third input comprising a data value of the subsequent stage of the calling chain.
In a first implementation form of a computing system according to the first aspect each of the data variables comprises a pointer.
The computing system according to the first implementation form provides protection from pointer substitution attacks.
In a second implementation form the computing system sets the third input for the last stage of the calling chain to a default data value.
In a third implementation form the information specific to the stage comprises identification data identifying the data structure type from a set of valid data structure types for the computing system. The third implementation form binds the data value for the respective stages to the data structure associated to the stage.
In a fourth implementation form the data structure comprises a list, a struct or a dictionary.
In a fifth implementation form the computing system comprises an instruction set extension comprising one or more hardware-implemented instructions.
In a sixth implementation form the computing system is arranged to generate the one or more data values for each stage of the calling chain using the instruction set extension.
The sixth implementation form improves performance by reducing the number of hardware instructions required to generate data values.
In a seventh implementation form the instruction set extension comprises a pointer authentication security extension.
In an eighth implementation form, the instruction set extension comprises one or more further instructions to augment the PAuth security extension.
The seventh and eighth implementation forms provide an efficient implementation based on a customization of an existing pointer authentication extension.
In a first implementation form of the second aspect the computing system is configured to generate a fault exception in response to determining that a data value in the first set of data values is invalid.
These and other aspects of the invention will be apparent from the embodiment(s) described below.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of the present disclosure, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
Figure 1 shows a schematic diagram of a function calling chain, according to an example.
Figure 2 shows a method for providing control flow integrity for a computer program in a computing system, according to an example.
Figure 3 shows a method for evaluating the control flow integrity of a computer program, according to an example.
Figure 4 shows a simplified schematic diagram of a computing system, according to an example.
DETAILED DESCRIPTION
Example embodiments are described below in sufficient detail to enable those of ordinary skill in the art to embody and implement the systems and processes herein described. It is important to understand that embodiments can be provided in many alternate forms and should not be construed as limited to the examples set forth herein.
Accordingly, while embodiments can be modified in various ways and take on various alternative forms, specific embodiments thereof are shown in the drawings and described in detail below as examples. There is no intent to limit to the particular forms disclosed. On the contrary, all modifications, equivalents, and alternatives falling within the scope of the appended claims should be included. Elements of the example embodiments are consistently denoted by the same reference numerals throughout the drawings and detailed description where appropriate.
The terminology used herein to describe embodiments is not intended to limit the scope. The articles “a,” “an,” and “the” are singular in that they have a single referent, however the use of the singular form in the present document should not preclude the presence of more than one referent. In other words, elements referred to in the singular can number one or more, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, items, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, items, steps, operations, elements, components, and/or groups thereof. Unless otherwise defined, all terms (including technical and scientific terms) used herein are to be interpreted as is customary in the art. It will be further understood that terms in common usage should also be interpreted as is customary in the relevant art and not in an idealized or overly formal sense unless expressly so defined herein.
Forward-edge Control Flow Integrity (CFI) of a software program refers to the security property that only the intended functions are called during program execution. In particular, attackers cannot force the program to call an unintended function supplied by the attacker. Such an attack is possible when functions are not called directly, but via function pointers stored in writable memory. In this case, an attacker may utilize a vulnerability that allows modifications to such function pointers and make them point to carefully selected locations in the code to achieve a malicious purpose such as gaining privileges in the system that should not be granted to the attacker.
According to examples, hardware-assisted approaches may be used to defend against such attacks. For example, some Instruction Set Architectures (ISAs) provide a Pointer Authentication (PAuth) security extension that allows programs to insert a Pointer Authentication Code (PAC) into function pointers when they are stored in writable memory. PACs may be validated when signed pointers are loaded during compilation and execution.
In examples described herein a PAuth extension comprises a pair of operations. Firstly, a signing operation denoted:
p <- PAC(p, modifier)
Secondly a validation or verification operation denoted:
AUT(p, modifier)
During signing, the PAC operation takes a pointer value p stored in a CPU register as input and optionally with another register that contains a further data value, sometimes referred to as a modifier. A cryptographically secure message authentication code (MAC) is computed from the pointer and the modifier, and the resulting PAC output may be inserted into unused higher bits in the pointer. During an AUT operation, such a signed pointer is verified against a modifier. If the signed pointer has not been tampered with, the AUT operation will succeed, and the PAC is removed from the signed pointer stored in a CPU register. The pointer can then be used to call functions or load data from memory. If the signed pointer has been tampered with, or the modifier is different from the one used during signing, the AUT operation will fail and the pointer is left corrupted. Subsequent use of the pointer will cause, for example, a memory error.
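The signing and verification behaviour described above can be modelled in software. The following is a minimal sketch, not the hardware implementation: it assumes a 48-bit address with a 16-bit PAC in the upper bits, uses a truncated HMAC-SHA-256 in place of the hardware MAC, and marks a failed verification by setting an arbitrary error bit. The key, the bit widths and the function names `pac` and `aut` are illustrative assumptions.

```python
import hashlib
import hmac

KEY = b"demo-key-not-for-real-use"  # hypothetical; real keys are held by privileged components
ADDR_BITS = 48                      # assumed width of a virtual address
ADDR_MASK = (1 << ADDR_BITS) - 1
ERROR_BIT = 1 << 62                 # assumed marker for a corrupted pointer


def _mac16(address: int, modifier: int) -> int:
    """Truncated HMAC over (address, modifier), standing in for the hardware MAC."""
    msg = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")


def pac(pointer: int, modifier: int = 0) -> int:
    """Sign: insert the PAC into the unused upper bits of the pointer."""
    address = pointer & ADDR_MASK
    return (_mac16(address, modifier) << ADDR_BITS) | address


def aut(signed: int, modifier: int = 0) -> int:
    """Verify: strip the PAC on success, otherwise return a corrupted pointer."""
    address = signed & ADDR_MASK
    if (signed >> ADDR_BITS) == _mac16(address, modifier):
        return address
    return address | ERROR_BIT      # a later use of this value would fault


p = 0x7F0012345678
signed = pac(p, modifier=0x42)
recovered = aut(signed, modifier=0x42)
```

Verifying with a different modifier, or after changing the address bits, returns a value with the error bit set rather than the original pointer.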
Signing of pointers using PAC may be performed by a component of an operating system, such as a loader program. A loader program loads program executables and links the program with shared libraries in the system as required. During such a linking stage, pointers to required functions in the shared libraries are computed and placed within a specific region in the program. If the PAuth instruction set extension is enabled for the program, these function pointers are signed by the loader.
The cryptographic keys used by these operations may be managed using a component with a higher privilege in the system, such as the operating system, a hypervisor, or a Trusted Execution Environment (TEE). The cryptographic keys may be different for each device power cycle and for each program process. In other words, the keys are not persistent, and change when the computing device reboots. In examples, they may also change when the same computer program ends its current execution and is loaded and executed a second time.
In some examples code compilation may happen during the execution of a program. In this case, the program may be executed by an interpreter which translates source code into native instructions on-the-fly, or performs Just-in-Time (JIT) compilation on part of the program. In this case an interpreter or the JIT compiler may sign pointers and/or data, provided that they run in the same process space as the compiled code.
In some ISA extensions a further optional operation may be available referred to herein as Faulting PAC (FPAC). When FPAC is implemented and enabled, the AUT operation generates hardware faults when verification of a PAC fails. In this case, the system may not have to wait until a corrupted pointer is loaded for use for hardware faults to occur. In general it is infeasible to guess the correct PAC value without knowing the cryptographic key used to generate the PAC value. As such an attacker cannot use an arbitrary pointer and forge a valid PAC for the pointer. However, a pointer substitution attack may still be possible where the attacker replaces a valid signed pointer by another signed pointer, where the latter signed pointer is also valid but pointing to a different function. In this case the construction of a modifier as a further input to the PAC becomes significant: if modifiers are not used or not carefully chosen, pointer substitution attacks may still be feasible despite the presence of a hardware MAC.
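A pointer substitution attack and the role of the modifier can be illustrated with a small simulation. This is a sketch under stated assumptions: the MAC is a truncated HMAC, the function and slot addresses are made up, and binding each pointer to its storage address is just one possible modifier choice used for illustration, not the construction proposed in this disclosure.

```python
import hashlib
import hmac

KEY = b"demo-key"                 # hypothetical key
ADDR_BITS = 48
ADDR_MASK = (1 << ADDR_BITS) - 1


def sign(pointer, modifier):
    """Place a 16-bit truncated HMAC over (pointer, modifier) above the address bits."""
    msg = pointer.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    tag = int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")
    return (tag << ADDR_BITS) | pointer


def verify(signed, modifier):
    """Return the plain pointer, or None when authentication fails."""
    pointer = signed & ADDR_MASK
    return pointer if sign(pointer, modifier) == signed else None


LOG_FN, SHELL_FN = 0x401000, 0x402000        # hypothetical function addresses
SLOT_A, SLOT_B = 0x70000010, 0x70000018      # hypothetical storage locations

# Case 1: no modifier. Any validly signed pointer verifies in any slot.
memory = {SLOT_A: sign(LOG_FN, 0), SLOT_B: sign(SHELL_FN, 0)}
memory[SLOT_A] = memory[SLOT_B]              # attacker copies one signed pointer over another
substituted_without_modifier = verify(memory[SLOT_A], 0)

# Case 2: the storage address is the modifier, binding each pointer to its slot.
memory = {SLOT_A: sign(LOG_FN, SLOT_A), SLOT_B: sign(SHELL_FN, SLOT_B)}
memory[SLOT_A] = memory[SLOT_B]
substituted_with_modifier = verify(memory[SLOT_A], SLOT_A)
```

In the first case the substituted pointer authenticates and redirects the call; in the second it is rejected because the modifier no longer matches the one used for signing.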
In modern computing systems, complex indirect calls occur naturally and frequently. For example, long calling chains arise in Object Oriented Programming (OOP) languages as virtual function calls. In these function calls, the target function may not be determined at compile time but is instead resolved at runtime. The starting point for resolving the function call is a pointer and some reference data. This may be used to point to a further pointer and reference data. This process may be repeated until the target function is found. Such reference data may include data offset values, dictionary lookup keys and table indices.
Figure 1 is a simplified schematic diagram of a calling chain 100 for an indirect function call, according to an example. The calling chain 100 shown in Figure 1 comprises four stages and begins with an object pointer 105. The object pointer obj 105 points to a Struct-type memory object 110. From the object pointer obj 105 and an offset 115 a further pointer, dict_ptr 120, may be located in the Struct-type object 110. The pointer dict_ptr 120 points to a lookup key 125 in a dictionary-type object 130. The lookup key 125 may be used to locate a list pointer, list_ptr 135, and a list index, list_index 140, in the dictionary object 130. The list pointer 135 and list index 140 are used to locate a function pointer, func_ptr 145, in a list-type object 150. Finally, func_ptr 145 points to a specific function 155 in code 160.
The calling chain 100 shown in Figure 1 is illustrative of a calling chain that may be found in a program written in, for example, an OOP language. In real systems, the process illustrated in Figure 1 may be repeated many times before a target function in the form of a pointer to a segment of executable code is identified. In the example illustrated in Figure 1, PAuth operations PAC and AUT are insufficient for protecting the control flow of the calling chain 100 without further instructions. In particular, the PAC operation is configured to receive a function pointer as input. However, the calling chain 100 also uses non-pointer, reference data values such as the lookup key 125 and list index 140. These data values may be stored in writable memory and as such may also be vulnerable to substitution attacks similar to attacks previously described in relation to function pointers.
The methods and systems described herein provide control flow integrity for indirect function calls. The methods described may be used to sign pointers and reference data values to prevent pointer substitution attacks where a legitimately signed pointer is substituted by another legitimately signed pointer to modify the control flow of programs. The integrity of function calls with long indirect calling chains may be protected, as well as function calls with shorter chains or even those with a single level of indirection.
The methods described herein utilize a hardware-implemented MAC. In the examples described it is assumed that the underlying computing system has access to a secure hardware MAC for which forgery is infeasible for an attacker and for which cryptographic keys are managed in a more privileged or trusted component of the computing system.
According to examples, the methods described herein are based on a hardware-supported message authentication code (MAC) that provides at least the following five operations:
1. SignedPointer <- SignPointer(Pointer, S, C)
2. {Pointer, Error} <- VerifyPointer(SignedPointer, S, C)
3. {SignedPointer2, Error} <- LoadAndVerify(SignedPointer1, Scene1)
4. SignedData <- SignData(Data, S, C)
5. {Data, Error} <- VerifyData(SignedData, S, C)
The actual instructions used to implement these operations may vary, depending on the underlying platform. According to examples, these five operations may be implemented using direct single-instruction hardware or alternatively using sequences of instructions provided in an instruction set architecture of a platform. The two signing operations, SignPointer and SignData, take three input parameters: the item to be signed, a first data value, S, referred to herein as the scene, and a second data value, C, referred to herein as the context. Each of the operations outputs a data value. Similarly, the two verification operations, VerifyPointer and VerifyData, take a signed item, a scene, S, and a context, C, as input parameters, and output the original item if verification succeeds, or result in an error when verification fails. In many platforms the bit length of pointers is fixed whereas the length of data values that may be input to SignData may vary.
The bit length of a signed pointer may be kept to the same size as a pointer. In contrast it is in general not possible to sign arbitrary data and keep the size of the signed data the same as the original data. However, the data that is being signed in an indirect call resolution are often relatively small. As such it may be possible to keep the size of the data unchanged before and after signing operations.
The LoadAndVerify operation loads a second signed pointer, SignedPointer2, from the memory address given in a first signed pointer, SignedPointer1, and verifies the first signed pointer, SignedPointer1, using the given scene, Scene1, and the loaded second signed pointer, SignedPointer2, as the context.
In some cases a pointer or data in an indirect calling chain is immutable, for example when it is in read-only memory or hardcoded in the code. In these cases protection of such pointers and data may be omitted in practice.
In the five operations described previously the scene and context data values encode information relating to pointers or reference data in stages of a calling chain. The scene used in the signing and verification operations is based on the information required to prepare a particular stage to move forward in an indirect calling chain.
In the example shown in Figure 1, there are four types of indirection, namely, Pointer 160, Struct 110, Dictionary 130 and List 150. These indirection types differ in terms of the information that is needed to resolve the next target in an indirect calling chain. This information consists of a pointer and optional reference data. Referring to Figure 1, for the target of Pointer type 160 a memory address is required to load the pointer. For the Struct type 110, a pointer that points to memory address A and an offset X 115 are required to load the next stage data from the memory location A + X. To resolve a target of the Dictionary type 130, the starting point is a pointer that points to memory address A and a lookup key K. The object at memory location A contains dictionary entries where each entry contains a lookup key and a value. In this case, the next stage data is found in an entry of the dictionary object where the lookup key matches the input K. To resolve a target of the List type 150, the starting point is a pointer that points to a memory address A and an integer index I. The data for the next stage is loaded from the memory location A + I*S, where S is the size of an element in the list.
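The four resolution rules can be sketched over a toy memory model. In this illustration memory is a Python dictionary from address to stored value; the addresses, the lookup key "draw", the element size and the helper names are hypothetical and only serve to walk a chain shaped like the one in Figure 1.

```python
ELEM_SIZE = 8  # assumed size S of a list element, in bytes


def resolve_pointer(mem, a):
    """Pointer type: load the next-stage data directly from address A."""
    return mem[a]


def resolve_struct(mem, a, offset):
    """Struct type: load the next-stage data from A + X."""
    return mem[a + offset]


def resolve_dict(mem, a, key):
    """Dictionary type: find the entry at A whose lookup key matches K."""
    for entry_key, value in mem[a]:
        if entry_key == key:
            return value
    raise KeyError(key)


def resolve_list(mem, a, index):
    """List type: load the next-stage data from A + I * S."""
    return mem[a + index * ELEM_SIZE]


def target():
    return "target function reached"


mem = {
    0x1000 + 0x10: 0x2000,               # struct field at offset 0x10 holds dict_ptr
    0x2000: [("draw", (0x3000, 2))],     # dictionary entry: key -> (list_ptr, list_index)
    0x3000 + 2 * ELEM_SIZE: 0x4000,      # list element 2 holds func_ptr
    0x4000: target,                      # code located at func_ptr
}

dict_ptr = resolve_struct(mem, 0x1000, 0x10)
list_ptr, list_index = resolve_dict(mem, dict_ptr, "draw")
func_ptr = resolve_list(mem, list_ptr, list_index)
result = resolve_pointer(mem, func_ptr)()
```

Every value read along the way (the offset, the lookup key, the list index and each pointer) steers the resolution, which is why the reference data needs protection as well as the pointers.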
According to examples the first data value, i.e. the scene (S), may be constructed using a unique identifier for each supported indirection type. The scene (S) may then be constructed from (1) the indirection type identifier for the current stage, (2) a container identifier or address, (3) reference data used to resolve the current stage, and (4) a type or identifier for the target or next stage. The table below illustrates the different types for the example shown in Figure 1 with constructions of the scene for different types of input data to the signing operation. In the table below, dict_ptr, list_ptr, list_index and func_ptr refer to, respectively, a dictionary type pointer, a list pointer, a list index and a function pointer.
[Table 1: scene constructions for the indirection types of Figure 1; provided as image imgf000012_0001]
After components of a scene are determined, the components may be combined to form a scene of a certain size. For example, the components may be concatenated and compressed when they exceed a certain bit length. According to examples, the scene may be compressed using a hardware-implemented hash function.
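One way to combine the four scene components is sketched below. The type identifiers, the field widths and the 64-bit scene size are assumptions made for illustration, and SHA-256 stands in for the hardware-implemented hash function. With this packing the concatenated components always exceed 64 bits, so the sketch always compresses.

```python
import hashlib
import struct

# Hypothetical identifiers for the indirection types of Figure 1.
TYPE_IDS = {"pointer": 1, "struct": 2, "dictionary": 3, "list": 4}
SCENE_BITS = 64  # assumed target size of a scene


def make_scene(kind, container, reference, next_kind):
    """Concatenate the four scene components and compress them to SCENE_BITS."""
    raw = struct.pack(
        "<BQQB",
        TYPE_IDS[kind],        # (1) indirection type of the current stage
        container,             # (2) container identifier or address
        reference,             # (3) reference data (offset, key or index)
        TYPE_IDS[next_kind],   # (4) type of the target or next stage
    )
    digest = hashlib.sha256(raw).digest()   # stand-in for a hardware hash
    return int.from_bytes(digest[: SCENE_BITS // 8], "little")


s_struct = make_scene("struct", 0x1000, 0x10, "dictionary")
s_list = make_scene("list", 0x3000, 2, "pointer")
```

Changing any component, for example the offset within the struct, yields a different scene, so a value signed for one position in the chain does not verify at another.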
The second data value, or context, used in the signing and verification operations encodes information for the remaining targets at a certain stage in an indirect calling chain. In the calling chain, context data values are computed recursively in reverse, from the end of the chain. In one example, the recursive context (RC) data value for the i-th stage (RC_i) is defined as:
RC_i = SignPointer(Pointer_{i+1}, Scene_{i+1}, RC_{i+1})
In other words, the recursive context for the current stage is the signed pointer in the next stage. For the final signed pointer in the calling chain, a reference value such as zero may be used as the recursive context to indicate that it is the last stage. For example, the signing of the first stage pointer 120 in Figure 1 may be written as follows:
dict_ptr <- SignPointer(dict_ptr, S_1, SignPointer(list_ptr, S_2, SignPointer(func_ptr, S_3, 0)))
where S_i denotes the scene for the i-th stage. When verifying signed pointers the signed pointer for the subsequent stage is loaded and used as the context for the current stage. This may be achieved using the LoadAndVerify operation.
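The recursive context can be sketched by signing a chain from its last stage backwards. This is an illustrative model only: SignPointer is approximated by a truncated HMAC over (pointer, scene, context), and the pointer and scene values are made up.

```python
import hashlib
import hmac

KEY = b"demo-key"                 # hypothetical key
ADDR_BITS = 48
ADDR_MASK = (1 << ADDR_BITS) - 1


def sign_pointer(pointer, scene, context):
    """Model of SignPointer: a 16-bit tag over (pointer, scene, context)."""
    msg = b"".join(v.to_bytes(8, "little") for v in (pointer, scene, context))
    tag = int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")
    return (tag << ADDR_BITS) | pointer


def verify_pointer(signed, scene, context):
    """Model of VerifyPointer: return the pointer or raise on failure."""
    pointer = signed & ADDR_MASK
    if sign_pointer(pointer, scene, context) != signed:
        raise ValueError("pointer authentication failed")
    return pointer


def sign_chain(pointers, scenes):
    """Sign from the end of the chain: RC_i is the signed pointer of stage i+1."""
    context = 0                   # default context for the last stage
    signed = []
    for pointer, scene in zip(reversed(pointers), reversed(scenes)):
        context = sign_pointer(pointer, scene, context)
        signed.append(context)
    return signed[::-1]


def chain_is_valid(signed, scenes):
    """Verify every stage, using the next signed pointer as the context."""
    contexts = signed[1:] + [0]
    try:
        for sp, scene, context in zip(signed, scenes, contexts):
            verify_pointer(sp, scene, context)
    except ValueError:
        return False
    return True


pointers = [0x2000, 0x3000, 0x4000]   # dict_ptr, list_ptr, func_ptr (hypothetical)
scenes = [0x11, 0x22, 0x33]           # S_1, S_2, S_3 (hypothetical)
signed = sign_chain(pointers, scenes)

# Substitute the last pointer with another one that is validly signed on its own.
forged = signed[:2] + [sign_pointer(0x5000, 0x33, 0)]
```

The substituted final pointer passes its own check, but the preceding stage fails because its context, the signed pointer of the next stage, has changed.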
For programs written in high level object oriented programming languages, programmers typically do not have the control of how virtual function calls are made at a low level. In this case, a compiler is responsible for generating machine instructions. In languages that do allow programmers to code at a low level, instructions to protect the forward-edge may be inserted by programmers, with the help from a compiler.
Herein, the term compiler is used broadly to refer to any combination of compilers, interpreters, linkers, assemblers and/or low level libraries which are software components that transform program source code into machine instructions. According to a first example the five hardware MAC operations previously discussed may be implemented using the Pointer Authentication (PAuth) security extension.
There are two main differences between the required hardware MAC operations and the PAuth instructions PAC and AUT. Firstly, the signing and verification of PAuth accept only a modifier and the data to be signed or verified, whereas the signing and verification operations in the five hardware MAC operations accept both a scene (S) and a context (C) as inputs. The second difference is that the LoadAndVerify operation that loads data and a signed pointer may not be permitted within an instruction set architecture.
Furthermore, as previously discussed, the ability to compress data down to specific lengths may be needed to sign arbitrary length data values. Such a compression function may be implemented using a hardware supported hash function. In some architectures, the Pointer Authentication Code using Generic Key (PACGA) instruction may be used. Such hash functions or the PACGA instruction can also be used to combine the scene and context so that the result can be fed into a normal PAC or AUT instruction as a modifier.
In what follows, the PACGA instruction is used as an example to demonstrate how a compress operation can be implemented and used for these purposes. However, any hardware supported hash function may also be used to implement this operation.
The PACGA instruction may be described as follows:
tag <- PACGA(value, modifier)
Both the value and the modifier are 64-bit values contained in two general purpose CPU registers, and the output is a 32-bit tag stored in the upper 32 bits of a destination 64-bit general purpose register. Therefore, a 64-bit value can be compressed to a 32-bit tag by applying PACGA with a 0 modifier. Hence, a compress operation can be defined so that two 64-bit register values V1 and V2 can be compressed to a single 64-bit tag, T, by computing:
T = Compress(V1, V2) = PACGA(V1, 0) | PACGA(V2, 0)
The vertical bar ‘|’ represents concatenation of the two 32-bit tags. In practice, such concatenation can be done, for example, by computing PACGA(V2, 0) first, followed by a shift of the destination register by 32-bits to the right, then a computation of PACGA(V1, 0) where the result is stored in the same destination register.
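The compress operation can be modelled as follows. This is a sketch that assumes a software stand-in for PACGA (a truncated HMAC with a made-up generic key) which, like the instruction described above, returns its 32-bit tag in the upper half of a 64-bit value.

```python
import hashlib
import hmac

GENERIC_KEY = b"demo-generic-key"   # hypothetical generic key


def pacga(value, modifier):
    """Model of PACGA: a 32-bit tag placed in the upper 32 bits of a 64-bit result."""
    msg = value.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(GENERIC_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little") << 32


def compress(v1, v2):
    """Compress(V1, V2) = PACGA(V1, 0) | PACGA(V2, 0), with '|' as concatenation."""
    low = pacga(v2, 0) >> 32        # tag of V2, shifted right into the lower 32 bits
    high = pacga(v1, 0)             # tag of V1, already in the upper 32 bits
    return high | low


modifier = compress(0x1122334455667788, 0x99AABBCCDDEEFF00)
```

The result fits a single 64-bit register, so it can be passed as the modifier of an ordinary PAC or AUT instruction.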
Scene, S, and context, C, may be mixed into a single Modifier (M) by computing:
M = Compress(S, C) = PACGA(S, 0) | PACGA(C, 0)
This may be used as an input to a PAC or AUT instruction as previously described. Similarly, if the scene, S, is more than 64 bits long, this method may be applied repeatedly until the size is reduced to 64 bits.
The five operations for the hardware MAC may be implemented as follows. The operations shown below may be used to sign data of at most 32 bits. Longer data items may be broken into smaller pieces, which are then signed separately:
1. SignedPointer <- SignPointer(Pointer, Scene, Context)
= PAC(Pointer, Compress(Scene, Context))
2. {Pointer, Error} <- VerifyPointer(SignedPointer, Scene, Context)
= AUT(SignedPointer, Compress(Scene, Context))
4. SignedData <- SignData(Data, Scene, Context)
= PACGA(Data, Compress(Scene, Context)) | Data
5. {Data, Error} <- VerifyData(SignedData, Scene, Context)
= SignedData XOR PACGA(SignedData & 0xffffffff, Compress(Scene, Context))
In the last operation, the part “SignedData & 0xffffffff” denotes a bitwise AND operation between SignedData and 0xffffffff. This operation clears the upper 32 bits of SignedData and leaves the lower 32 bits untouched, which is the original unsigned data. Then the tag is recomputed using PACGA. After that, an XOR operation of SignedData with the tag is computed. If the tag stored in the upper 32 bits of SignedData matches the computed tag, the upper 32 bits of SignedData will be cleared and the original data is recovered. Otherwise some bits will remain set in the upper 32 bits and the value of the data will be invalidated.
In the first and second operations the input and output pointers are in the same register. If AUT fails, the result will be a corrupt pointer that will trigger a memory fault when used. In the fourth and fifth operations, even though PACGA is used to sign data, it is also possible to use PAC instructions to sign the data and later use AUT instructions to verify it, essentially treating data as pointers. In that case the fourth operation becomes the same as the first operation, and the fifth operation becomes the same as the second operation.
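The fourth and fifth operations can be sketched in software. The model below assumes the same truncated-HMAC stand-in for PACGA and the Compress construction described earlier; the key and the sample values are hypothetical.

```python
import hashlib
import hmac

GENERIC_KEY = b"demo-generic-key"   # hypothetical generic key


def pacga(value, modifier):
    """Model of PACGA: 32-bit tag in the upper half of a 64-bit result."""
    msg = value.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(GENERIC_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little") << 32


def compress(scene, context):
    """Mix scene and context into one 64-bit modifier."""
    return pacga(scene, 0) | (pacga(context, 0) >> 32)


def sign_data(data, scene, context):
    """SignData: tag in the upper 32 bits, original data in the lower 32 bits."""
    if not 0 <= data < 1 << 32:
        raise ValueError("this sketch signs at most 32 bits at a time")
    return pacga(data, compress(scene, context)) | data


def verify_data(signed_data, scene, context):
    """VerifyData: XOR with the recomputed tag; a match clears the upper 32 bits."""
    data = signed_data & 0xFFFFFFFF
    return signed_data ^ pacga(data, compress(scene, context))


signed_index = sign_data(7, scene=0x22, context=0x1234)
recovered = verify_data(signed_index, scene=0x22, context=0x1234)
mismatched = verify_data(signed_index, scene=0x22, context=0x9999)
```

With the correct scene and context the result is the original 32-bit value; with a mismatch, residual bits remain in the upper half and invalidate the value.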
To implement the third operation, LoadAndVerify, the following sequence of operations may be executed. SP1 is a signed pointer to be validated, and S1 is the scene for SP1.
LoadAndVerify(SP1, S1):
1. Remove PAC bits from SP1 and store the result in P1 (i.e., P1 is a normal pointer)
2. Load memory content at address P1 into SP2 (SP2 has the next signed pointer)
3. Compute SP1 <- VerifyPointer(SP1, S1, SP2)
4. Compute SP2 <- SP2 XOR P1 XOR SP1 (when FPAC is not enabled)
In step 3, if VerifyPointer succeeds, the PAC bits in SP1 will be cleared, making SP1 in step 4 the same as P1. In this case, P1 and SP1 cancel out in the XOR operations in step 4, making the final SP2 correct (the next pointer loaded in step 2). If VerifyPointer fails in step 3, SP1 will contain a corrupt value which is different from P1, making SP2 invalid after step 4.
If the FPAC operation previously described is enabled, a failed authentication in Step 3 would already generate a hardware fault, making Step 4 unnecessary.
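The four steps of LoadAndVerify, for the case where FPAC is not enabled, can be traced with a small model. The sketch assumes a truncated-HMAC stand-in for SignPointer and VerifyPointer, a Python dictionary as memory, and an arbitrary error bit to represent a corrupted pointer; all addresses and scenes are hypothetical.

```python
import hashlib
import hmac

KEY = b"demo-key"                 # hypothetical key
ADDR_BITS = 48
ADDR_MASK = (1 << ADDR_BITS) - 1
ERROR_BIT = 1 << 62               # assumed marker left by a failed verification


def sign_pointer(pointer, scene, context):
    msg = b"".join(v.to_bytes(8, "little") for v in (pointer, scene, context))
    tag = int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")
    return (tag << ADDR_BITS) | pointer


def verify_pointer(signed, scene, context):
    """Return the plain pointer on success, a corrupted pointer on failure."""
    pointer = signed & ADDR_MASK
    if sign_pointer(pointer, scene, context) == signed:
        return pointer
    return pointer | ERROR_BIT


def load_and_verify(memory, sp1, s1):
    p1 = sp1 & ADDR_MASK                  # step 1: remove the PAC bits
    sp2 = memory[p1]                      # step 2: load the next signed pointer
    sp1 = verify_pointer(sp1, s1, sp2)    # step 3: verify with SP2 as context
    return sp2 ^ p1 ^ sp1                 # step 4: P1 and SP1 cancel on success


S1, S2 = 0x11, 0x22
sp2 = sign_pointer(0x3000, S2, 0)
sp1 = sign_pointer(0x2000, S1, sp2)
memory = {0x2000: sp2}
loaded = load_and_verify(memory, sp1, S1)

memory[0x2000] = sign_pointer(0x5000, S2, 0)   # attacker substitutes the next pointer
tampered = load_and_verify(memory, sp1, S1)
```

In the untampered case the function returns SP2 unchanged. After the substitution, the verification of SP1 fails and the error bit propagates into the returned value, so the substituted pointer no longer carries a valid PAC.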
According to a second example, new extensions to the instruction set architecture of an existing computer architecture may be added to implement the five hardware MAC operations. In this case, performance may be maximized in terms of both speed and program size by reducing the number of CPU instructions to implement each of the five operations. All the five operations involve at most three CPU registers, which is consistent with many current designs for computer architectures.
According to a third example, a customized Pointer Authentication extension may be used to implement the five operations for the hardware implemented MAC. This customized extension may be implemented in a way that the syntax of the PAuth instructions is the same as defined in the ARMv8.3 Instruction Set Architecture, but with configurable semantics, so that the ISA can remain compatible with ordinary ARMv8.3 applications, while allowing more efficient forward-edge control flow integrity. The configuration of the instruction semantics may be done, for example, by using a configuration bit in a system register, where ARMv8.3 semantics are expected when the bit is 0, and customized semantics are expected when the bit is 1. Other configurations, such as a dedicated register number, may also be specified in this way.
In the third example, instead of adding more instructions for compression as in the first example, a dedicated register is reserved for the context. The modifier parameter in PAuth is used to specify the scene. The dedicated register can be a new system register, or one of the general purpose registers. There are 32 64-bit general purpose registers in ARMv8.3, named from x0 to x31, many of which are readily usable for such purposes. The dedicated register is referred to herein as a context register regardless of whether it is a system register or general purpose register.
More specifically, when a PAC or AUT instruction is executed, it takes the pointer and modifier from its parameters, uses the modifier as the scene but at the same time it takes the value of the context register as the context. In other words, the implementation of the four signing and verification operations can be simplified as below, assuming that the context is already stored in the context register.
1. SignedPointer <- SignPointer(Pointer, Scene, Context)
= PAC(Pointer, Scene) // with context already in CR
2. {Pointer, Error} <- VerifyPointer(SignedPointer, Scene, Context)
= AUT(SignedPointer, Scene) // with context already in CR
4. SignedData <- SignData(Data, Scene, Context)
= PACGA(Data, Scene) | Data // with context already in CR
5. {Data, Error} <- VerifyData(SignedData, Scene, Context)
= SignedData XOR PACGA(SignedData & 0xffffffff, Scene) // with context already in CR
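The four simplified operations above can be modelled in software as follows. This Python sketch is a toy model under stated assumptions, not the disclosed implementation: the ToyCpu class, the key, the 48-bit address layout and the use of HMAC-SHA256 as the MAC are hypothetical, and the data to be signed is assumed to fit in 32 bits so that the 32-bit PACGA result can occupy the upper half of the register.

```python
import hashlib
import hmac

KEY = b"demo-key"            # hypothetical secret key
ADDR_MASK = (1 << 48) - 1    # assumed layout: low 48 bits address, high 16 bits PAC
LOW32 = 0xFFFFFFFF


class ToyCpu:
    """Toy model with a context register (CR) that PAC, AUT and PACGA read implicitly."""

    def __init__(self):
        self.cr = 0

    def _mac(self, value, scene, nbytes):
        msg = b"".join(v.to_bytes(8, "little") for v in (value, scene, self.cr))
        digest = hmac.new(KEY, msg, hashlib.sha256).digest()
        return int.from_bytes(digest[:nbytes], "little")

    def pac(self, pointer, scene):
        """SignPointer: place a 16-bit MAC in the unused upper pointer bits."""
        return pointer | (self._mac(pointer, scene, 2) << 48)

    def aut(self, signed, scene):
        """VerifyPointer: return (pointer, error)."""
        pointer = signed & ADDR_MASK
        return pointer, self.pac(pointer, scene) != signed

    def pacga(self, data, scene):
        """Generic MAC returned in the upper 32 bits, lower 32 bits zero."""
        return self._mac(data, scene, 4) << 32

    def sign_data(self, data, scene):
        return self.pacga(data, scene) | data      # data is assumed to fit in 32 bits

    def verify_data(self, signed, scene):
        """VerifyData: the MAC halves cancel only when the signature is valid."""
        result = signed ^ self.pacga(signed & LOW32, scene)
        return result & LOW32, (result >> 32) != 0
```

Because the context is read from the register rather than passed as an operand, changing the value of the context register between signing and verification is sufficient, in this model, to make a previously valid signature fail.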
The second customization is to change the semantics of the AUT instructions so that a memory error is triggered immediately when the instructions fail, instead of simply leaving a corrupt pointer and only triggering a memory error later when the pointer is used to load content from the memory. This is similar to the FPAC feature in the ARMv8.3 ISA, except that the AUT* instructions in ARMv8.3 do not take a context register into consideration.
The third customization is required when loading data from a signed pointer into the context register. When using a Load instruction to load content from a memory address (a pointer) into the context register, the Load instruction should ignore the PAC bits of the pointer. In fact, since memory errors are now triggered by the AUT instructions under the second customization, it is not necessary to rely on the Load instructions to trigger memory errors, and such a change is reasonable.
With customized AUT instructions and the special Load instruction, the LoadAndVerify operation may be implemented with the simplified steps below, where SP1 is a signed pointer to be validated, and S1 is the scene for SP1:
LoadAndVerify(SP1, S1):
1. Load memory content from SP1 into context register with PAC bits ignored in SP1
2. Compute SP1 <- VerifyPointer(SP1, S1, CR)
3. Copy the content of context register to SP2.
At the end of these three steps, SP2 contains the result of the LoadAndVerify operation, which is another signed pointer. If the verification fails in Step 2, a memory error is triggered and the operation fails as expected.
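The three-step LoadAndVerify operation with a context register may be sketched as below. The Python code is an illustrative model only: the key, the pointer layout, the HMAC-SHA256 based MAC, the dictionary standing in for memory and the use of a Python MemoryError to represent the hardware memory error are all assumptions of the example.

```python
import hashlib
import hmac

KEY = b"demo-key"            # hypothetical secret key
ADDR_MASK = (1 << 48) - 1    # assumed layout: low 48 bits address, high 16 bits PAC


def pac(pointer, scene, cr):
    """Sign a pointer under a scene, with the context taken from the context register."""
    msg = b"".join(v.to_bytes(8, "little") for v in (pointer, scene, cr))
    tag = int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")
    return pointer | (tag << 48)


def aut(signed, scene, cr):
    """Customized AUT: fault immediately instead of returning a corrupt pointer."""
    pointer = signed & ADDR_MASK
    if pac(pointer, scene, cr) != signed:
        raise MemoryError("pointer authentication failed")
    return pointer


def load_and_verify(memory, sp1, s1):
    cr = memory[sp1 & ADDR_MASK]   # step 1: load via SP1, PAC bits ignored
    aut(sp1, s1, cr)               # step 2: verify SP1 against the loaded context
    return cr                      # step 3: copy the context register to SP2
```

Compared with the variant that relies on XOR operations, no separate step is needed to invalidate SP2 in this model, since a failed verification in step 2 prevents step 3 from being reached.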
The methods and systems described herein provide complete protection that prevents pointer substitution attacks. The scene specifies how to resolve the next stage pointers and data in a calling chain. The context allows all the stages of an entire calling chain to be bonded to the final target in a specific order.
The methods described provide a scalable and flexible design. In particular, computations can be done efficiently using existing instructions such as PAuth as found in the ARMv8.3 ISA, without customization. This method can also be implemented with a dedicated register and customized PAuth instructions, resulting in better performance. Furthermore, this method can be implemented with new ISA extensions and may be applied in many use cases such as Object Oriented Programming, dynamic linking, function tables, kernel loadable modules, and so on, whenever indirect function calls are involved.
Figure 2 is a block diagram showing a method 200 for providing control flow integrity in a computing system comprising a processor and a memory communicatively coupled to the processor, according to an example. The memory may be configured to store program code executable by the processor where the program code comprises at least one calling chain comprising multiple stages, wherein respective stages of the calling chain comprise a respective data structure of data variables, and a mapping configured to map a first set of data variables of a data structure of a first stage of the calling chain to a second set of data variables of a data structure of a second stage of the calling chain. The method 200 may be implemented by the computing system.
At block 210 the method comprises accessing the program code. At block 220 the method comprises generating a data value for respective stages of the calling chain based on an output of a cryptographically secure function.
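Block 220 may be illustrated by a short sketch that generates one data value per stage, working backwards from the last stage so that each value depends on the value of the subsequent stage. The Python code below is a minimal model under stated assumptions: the key, the 64-bit truncated HMAC-SHA256 used as the cryptographically secure function, the representation of a stage as a (variable, scene) pair of integers and the default context of 0 for the last stage are all hypothetical.

```python
import hashlib
import hmac

KEY = b"demo-key"        # hypothetical secret key
DEFAULT_CONTEXT = 0      # assumed default value for the last stage of the chain


def stage_value(variable, scene, context):
    """Data value for one stage from (variable identifier, scene, next stage's value)."""
    msg = b"".join(v.to_bytes(8, "little") for v in (variable, scene, context))
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:8], "little")


def generate_chain_values(stages):
    """Return one value per stage; stages is a list of (variable, scene) pairs."""
    values = []
    context = DEFAULT_CONTEXT
    for variable, scene in reversed(stages):   # last stage first
        context = stage_value(variable, scene, context)
        values.append(context)
    values.reverse()                           # restore calling order
    return values
```

Since every value feeds into the value of the preceding stage, a change to any later stage propagates to the value of the first stage, which is what bonds the whole chain to its final target in this model.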
Figure 3 is a block diagram showing a method 300 for verifying the control flow integrity on a computing system comprising a processor and a memory communicatively coupled to the processor. The memory stores program code executable by the processor comprising at least one calling chain comprising multiple stages and a first set of data values that validate the calling chain in the control flow of the program code. The method 300 may be implemented by the computing system.
At block 310 the method comprises accessing the program code and the first set of data values. At block 320, the method comprises generating a second set of data values for the respective stages of the calling chain based on an output of a cryptographically secure function. At block 330 the method comprises comparing the second set of data values to the first set of values to determine the validity of the calling chain.
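Blocks 310 to 330 may be sketched in the same style: the second set of data values is regenerated from the calling chain and compared with the stored first set. The Python code below is an illustrative model only; the key, the truncated HMAC-SHA256 function, the (variable, scene) representation of a stage and the default context of 0 are assumptions of the example rather than features of the disclosure.

```python
import hashlib
import hmac

KEY = b"demo-key"        # hypothetical secret key
DEFAULT_CONTEXT = 0      # assumed default value for the last stage of the chain


def stage_value(variable, scene, context):
    """Data value for one stage from (variable identifier, scene, next stage's value)."""
    msg = b"".join(v.to_bytes(8, "little") for v in (variable, scene, context))
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:8], "little")


def expected_values(stages):
    """Block 320: regenerate the values, starting from the last stage."""
    values = []
    context = DEFAULT_CONTEXT
    for variable, scene in reversed(stages):
        context = stage_value(variable, scene, context)
        values.append(context)
    values.reverse()
    return values


def chain_is_valid(stages, stored_values):
    """Block 330: the chain is valid only if every regenerated value matches."""
    return expected_values(stages) == list(stored_values)
```

For example, if the values are stored for a three-stage chain and one stage is later substituted, the regenerated values differ from the stored ones and chain_is_valid returns False.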
According to examples, each of the data values of the second set of data values is generated based on a first input identifying a data variable of the data structure of the respective stage, a second input encoding information specific to the stage and a third input comprising a data value of the subsequent stage of the calling chain. According to examples, the second and third inputs may be the scene and context as previously described herein.
The present disclosure is described with reference to flow charts and/or block diagrams of the method, devices and systems according to examples of the present disclosure. Although the flow diagrams described above show a specific order of execution, the order of execution may differ from that which is depicted. Blocks described in relation to one flow chart may be combined with those of another flow chart. In some examples, some blocks of the flow diagrams may not be necessary and/or additional blocks may be added. It shall be understood that each flow and/or block in the flow charts and/or block diagrams, as well as combinations of the flows and/or diagrams in the flow charts and/or block diagrams can be realized by machine readable instructions.
The machine-readable instructions may, for example, be executed by a general-purpose computer, a special purpose computer, an embedded processor or processors of other programmable data processing devices to realize the functions described in the description and diagrams. In particular, a processor or processing apparatus may execute the machine-readable instructions. Thus, modules of apparatus may be implemented by a processor executing machine-readable instructions stored in a memory, or a processor operating in accordance with instructions embedded in logic circuitry. The term 'processor' is to be interpreted broadly to include a CPU, processing unit, logic unit, or programmable gate set etc. The methods and modules may all be performed by a single processor or divided amongst several processors. Such machine-readable instructions may also be stored in a computer readable storage that can guide the computer or other programmable data processing devices to operate in a specific mode.
Such machine-readable instructions may also be loaded onto a computer or other programmable data processing devices, so that the computer or other programmable data processing devices perform a series of operations to produce computer-implemented processing, thus the instructions executed on the computer or other programmable devices provide an operation for realizing functions specified by flow(s) in the flow charts and/or block(s) in the block diagrams.
Figure 4 is a block diagram of a computing system 400 that may be used for implementing the methods disclosed herein. Specific devices may utilize all of the components shown or only a subset of the components, and levels of integration may vary from device to device. Furthermore, a device may contain multiple instances of a component, such as multiple processing units, processors, memories, transmitters, receivers, etc. The computing system 400 includes a processing unit 402. The processing unit includes a central processing unit (CPU) 414, a graphics processing unit (GPU) 416, a memory 408, and may further include a mass storage device 404, a video adapter 410, and an I/O interface 412 connected to a bus 418.
The bus 418 may be one or more of any type of several bus architectures including a memory bus or memory controller, a peripheral bus, or a video bus. The CPU 414 and GPU 416 may comprise any type of electronic data processors. The memory 408 may comprise any type of non-transitory system memory such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous DRAM (SDRAM), read-only memory (ROM), or a combination thereof. In an embodiment, the memory 408 may include ROM for use at boot-up, and DRAM for program and data storage for use while executing programs.
The mass storage 404 may comprise any type of non-transitory storage device configured to store data, programs, and other information and to make the data, programs, and other information accessible via the bus 418. The mass storage 404 may comprise, for example, one or more of a solid state drive, hard disk drive, a magnetic disk drive, or an optical disk drive.
The video adapter 410 and the I/O interface 412 provide interfaces to couple external input and output devices to the processing unit 402. As illustrated, examples of input and output devices include a display 420 coupled to the video adapter 410 and a mouse, keyboard, or printer 422 coupled to the I/O interface 412. Other devices may be coupled to the processing unit 402, and additional or fewer interface cards may be utilized. For example, a serial interface such as Universal Serial Bus (USB) (not shown) may be used to provide an interface for an external device.
The processing unit 402 also includes one or more network interfaces 406, which may comprise wired links, such as an Ethernet cable, or wireless links to access nodes or different networks. The network interfaces 406 allow the processing unit 402 to communicate with remote units via the networks. For example, the network interfaces 406 may provide wireless communication via one or more transmitters/transmit antennas and one or more receivers/receive antennas. In an embodiment, the processing unit 402 is coupled to a local-area network 424 or a wide-area network for data processing and communications with remote devices, such as other processing units, the Internet, or remote storage facilities.
Although the present disclosure and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims.
The present inventions can be embodied in other specific apparatus and/or methods. The described embodiments are to be considered in all respects as illustrative and not restrictive. In particular, the scope of the invention is indicated by the appended claims rather than by the description and figures herein. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A computing system, comprising: a processor; a memory communicatively coupled to the processor, the memory configured to store program code executable by the processor, the program code comprising at least one calling chain comprising multiple stages, wherein respective stages of the calling chain comprise a respective data structure of data variables, and a mapping configured to map a first set of data variables of a data structure of a first stage of the calling chain to a second set of data variables of a data structure of a second stage of the calling chain; and wherein the computing system is configured to: access the program code from the memory; and generate a data value for respective stages of the calling chain based on an output of a cryptographically secure function; wherein the data value is generated based on a first input identifying a data variable of the data structure of the respective stage, a second input encoding information specific to the stage and a third input comprising a data value of the subsequent stage of the calling chain.
2. The computing system of claim 1 wherein each of the data variables comprises a pointer.
3. The computing system of claim 1 or 2, comprising setting the third input for the last stage of the calling chain to a default data value.
4. The computing system of claims 1 to 3 wherein the information specific to the stage comprises identification data identifying the data structure type from a set of valid data structure types for the computing system.
5. The computing system of claim 4, wherein the data structure comprises a list, a struct or a dictionary.
6. The computing system of claims 1 to 5, wherein the computing system comprises an instruction set extension comprising one or more hardware-implemented instructions.
7. The computing system of claim 6 wherein the computing system is arranged to generate the one or more data values for each stage of the calling chain using the instruction set extension.
8. The computing system of claim 6 or 7 wherein the instruction set extension comprises a pointer authentication security extension.
9. The computing system of claim 8 wherein the instruction set extension comprises one or more further instructions to augment the PAuth security extension.
10. A computing system, comprising: a processor; a memory communicatively coupled to the processor, the memory configured to store: program code executable by the processor, the program code comprising at least one calling chain comprising multiple stages, wherein respective stages of the calling chain comprise a respective data structure of data variables, and a mapping configured to map a first set of data variables of a data structure of a first stage of the calling chain to a second set of data variables of a data structure of a second stage of the calling chain; and a first set of data values that validate the calling chain in the control flow of the program code; wherein the computing system is configured to: access the program code and the first set of data values; generate a second set of data values for the respective stages of the calling chain based on an output of a cryptographically secure function; and compare the second set of data values to the first set of values to determine the validity of the calling chain, wherein each of the data values of the second set of data values is generated based on a first input identifying a data variable of the data structure of the respective stage, a second input encoding information specific to the stage and a third input comprising a data value of the subsequent stage of the calling chain.
11. The computing system of claim 10, wherein the computing system is configured to generate a fault exception in response to determining that a data value in the first set of data values is invalid.
PCT/EP2021/073329 2021-08-24 2021-08-24 Control flow integrity WO2023025370A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2021/073329 WO2023025370A1 (en) 2021-08-24 2021-08-24 Control flow integrity

Publications (1)

Publication Number Publication Date
WO2023025370A1 true WO2023025370A1 (en) 2023-03-02

Family

ID=77595576

Country Status (1)

Country Link
WO (1) WO2023025370A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200125501A1 (en) * 2019-06-29 2020-04-23 Intel Corporation Pointer based data encryption

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALI JOSE MASHTIZADEH ET AL: "CCFI", PROCEEDINGS OF THE 22ND ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS '15, 12 October 2015 (2015-10-12), New York, New York, USA, pages 941 - 951, XP055340240, ISBN: 978-1-4503-3832-5, DOI: 10.1145/2810103.2813676 *
HANS LILJESTRAND ET AL: "PAC it up: Towards Pointer Integrity using ARM Pointer Authentication", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 22 November 2018 (2018-11-22), XP081022499 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21763380

Country of ref document: EP

Kind code of ref document: A1