CN111859383B

CN111859383B - Software automatic segmentation method, system, storage medium, computer equipment and terminal

Info

Publication number: CN111859383B
Application number: CN202010514295.XA
Authority: CN
Inventors: 李兴华; 张晓涵; 石志远; 杨超; 杨力; 柯海娟; 智一方; 潘晓波; 马建峰
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-06-08
Filing date: 2020-06-08
Publication date: 2021-08-06
Anticipated expiration: 2040-06-08
Also published as: WO2021248666A1; CN111859383A

Abstract

The invention belongs to the technical field of secure-zone partitioning of application programs, and discloses an automatic software partitioning method, system, storage medium, computer device and terminal, namely the MulTEE partitioning framework. A user first annotates the security-sensitive data of an application program. MulTEE then automatically divides the application program into an untrusted module and a plurality of secure modules, each secure module being a minimal program slice of the sensitive data: backward data flow analysis identifies code that may affect the confidentiality of the sensitive data, and forward slicing identifies code that may affect its integrity. The security-sensitive modules are deployed within secure zones to protect them from attack. The invention evaluates MulTEE with the Memcached database, the LibreSSL cryptography library and the Digital Bitbox bitcoin wallet; the results show that it achieves a smaller TCB size with acceptable performance overhead.

Description

Software automatic segmentation method, system, storage medium, computer equipment and terminal

Technical Field

The invention belongs to the technical field of secure-zone partitioning of application programs, and particularly relates to a software automatic segmentation method, a system, a storage medium, computer equipment and a terminal.

Background

Applications are increasingly deployed in third-party data centers and public cloud environments that are not fully trusted, such as Amazon AWS and Microsoft Azure. This places very high demands on the cloud data center, which must protect sensitive data even from the most privileged attackers (e.g., system administrators). Conventional encryption techniques for protecting sensitive data severely limit the operations that can be performed on that data, while fully homomorphic encryption, although it allows arbitrary operations, incurs a large computational overhead. A new direction for protecting applications in untrusted environments is to use the trusted execution mechanisms provided by modern CPUs, such as Intel's Software Guard Extensions (SGX). With Intel SGX, a user can perform arbitrary operations on data inside a secure zone (enclave) while the data remains protected. The secure zone is a separate memory region that is transparently encrypted by hardware and isolated from the rest of the system, including higher-privileged system software.

Executing sensitive code in an isolated and trusted environment has attracted the attention of many researchers over the last decade. McCune et al. introduced Flicker, an infrastructure that executes code in an isolated and trusted environment while relying on only 250 lines of code in the TCB to provide strong isolation. The same group proposed TrustVisor, a special-purpose virtual machine monitor (VMM) that provides confidentiality and integrity guarantees for the sensitive code and data of applications. Geneiatakis et al. identify authentication points and "sensitive" data in binary applications in order to adjust software defenses dynamically. Many studies have proposed sandboxes to achieve security and privacy. In Robusta, the authors isolate native code, preventing external modification of data.

Hardware isolation techniques such as ARM TrustZone and Intel SGX have become popular security primitives that can prevent information leakage and data forgery attacks. ARM TrustZone, for example, provides hardware isolation at the physical bus level on ARM-based platforms, separating the "secure world" from the "normal world". In TrustICE, the authors ensure the isolation of secure code in the normal world by using a Trusted Domain Controller (TDC) that resides in the secure world, establishing what they call an Isolated Computing Environment (ICE). This functionality has been used to protect various software and hardware applications, such as programming-language runtimes, operating system kernels, one-time password tokens, trusted platform modules, and hardware trojan defenses. Intel SGX achieves hardware isolation (i.e., the secure enclave) through memory encryption and has been used to protect containers in the cloud. In the field of network applications, Kim et al. use OpenSGX (an SGX emulator) to address privacy and security issues in inter-domain routing based on Software Defined Networking (SDN) and in peer-to-peer anonymity networks (Tor) [27]. Both hardware isolation primitives require an application-specific programming model that divides runtime resources between separate execution environments. However, most existing research focuses on applying the isolation frameworks rather than on addressing the resulting programming challenges.

Trusted-execution application architectures such as Haven, SCONE, VC3 and Graphene execute complete applications by adding enough system support to a single secure zone, providing isolation at a coarse granularity. Haven runs unmodified Windows applications using the Drawbridge library OS; Graphene uses a library operating system to run Linux applications in a secure zone; SCONE places a modified version of the standard C library in the secure zone to support recompiled Linux applications; VC3 protects map/reduce jobs using secure zones and forces map/reduce tasks to interact with the untrusted environment only through a specific interface. This approach requires less development effort because it can execute most applications unmodified, but it increases the size of the TCB because all application code and data, together with the system support code, must be placed in the secure zone. In many applications only a portion of the code handles sensitive data, while the remaining code is not security sensitive and needs no protection, so the size of the TCB can be reduced by placing only the security-sensitive code in a single secure zone. Partitioning can be done manually or automatically so that complex applications can use the security functions of SGX. More recently, Lind et al. proposed an automatic partitioning framework for Intel SGX that partitions applications into secure zones through program analysis and provides security assurances about the secure-zone code and its interface with the untrusted environment. Atamli-Reineh et al. proposed two multi-secure-zone solutions, which differ in whether each secure zone stores a single secret or multiple secrets.

The independent-secret multi-zone scheme stores different secrets in different secure zones and thereby realizes the principle of least privilege, but it incurs a large performance overhead. The authors therefore proposed a hybrid scheme in which each secure zone stores one or more secrets; their security analysis shows that the hybrid scheme defends well against various attacks and performs considerably better than the single-secret scheme. At present, however, there is no software partitioning method suited to secure-zone scenarios that can guide the partitioning of a hybrid multi-secure-zone framework so as to ensure that the partitioning is reasonable and that security and performance are balanced.

As applications grow in size and complexity, the architectures of SGX applications have also diversified. The most straightforward architecture runs unmodified applications by adding enough system support (e.g., a library OS or the C standard library) to a single secure zone, but this approach has a very large Trusted Computing Base (TCB). Even in well-designed code the number of bugs is proportional to code size, and an attacker needs to exploit only one vulnerability in the secure-zone code to break the security of trusted execution. Privilege separation can prevent a software system from being completely compromised through a single vulnerability, because a compromised protection domain cannot directly access the code or data of the parts of the system running in other protection domains. Calls to functions in other protection domains are converted into Remote Procedure Calls (RPCs), and data access is restricted to the protection domain where necessary. However, although privilege separation has significant potential for improving the security of software in SGX, developers face challenges in using it to obtain security guarantees, in restructuring software systems into working modules that can execute in SGX, and in maintaining good runtime performance of the decomposed program. First, privilege-separated systems typically aim to assign the minimum privileges to each protection domain, but it is unclear whether such privilege assignments actually satisfy the intended security guarantees. For example, Provos et al. manually restructured OpenSSH into a privileged server process and a number of unprivileged monitor processes, each of which handles one user connection. In that design, a compromised monitor process should not affect the server process or the other monitor processes. However, it was not until long afterwards that privilege-separated OpenSSH was shown to satisfy the strong Clark-Wilson integrity model.

Second, in SGX, a program split into several separate software modules leaks more side-channel information, so side-channel security after privilege separation requires particular attention. Finally, too many partitions cause excessive performance overhead, while too few cannot meet the security requirements, so achieving the security goal while still allowing the user to trade off security against performance is critical to successful privilege separation. At present, however, there is no software partitioning method suited to secure-zone scenarios to guide the partitioning of a hybrid multi-secure-zone architecture.

Through the above analysis, the problems and defects of the prior art are as follows: at present, there is no software partitioning method suited to multi-secure-zone scenarios to guide the partitioning of an application into multiple secure zones.

The difficulty in solving the above problems and defects is:

When the SGX function is used to protect an application, the application must be modified and restructured. This places strict demands on developers and requires a complex tradeoff between performance and security, and there is no suitable theory to guide developers in how to partition the application.

The first problem to be solved is the partitioning of an application into multiple secure zones, that is, how to divide the application reasonably into multiple modules while reducing the coupling between the modules as far as possible, because calls between modules introduce a large performance overhead. The second problem is the security of the partitioning: a well-designed method is needed to ensure that, as far as possible, confidential data in each module does not propagate to other modules.

The significance of solving the above problems and defects is as follows:

By using the method, the workload of a programmer who needs to adapt an application to SGX is greatly reduced, and the application no longer has to be read and modified manually. The method supports a quantitative tradeoff between the performance and the security of the partitioned software according to a threshold specified by the user, and reduces the difficulty of manual partitioning.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a method, a system, a storage medium, computer equipment and a terminal for automatically segmenting software.

The invention is realized as follows. In the software automatic segmentation method, namely the MulTEE partitioning framework, a user first annotates the security-sensitive data of the application program. MulTEE then automatically divides the application into an untrusted module and multiple secure modules, each secure module being a minimal program slice of the sensitive data; backward data flow analysis identifies code that may affect the confidentiality of the sensitive data, and forward slicing identifies code that may affect its integrity. Finally, the security-sensitive modules are deployed within secure zones to protect them from attack.

Further, the software automatic segmentation method comprises the following steps: first, given an input set of source code files, a preprocessor converts all source code files into LLVM assembly code and links the LLVM assembly code into a single file, and the list of security-sensitive variables to be protected is marked together with the names of the functions in which the variables are located;

second, the intermediate file obtained by the preprocessor is analyzed to identify the code set related to the security-sensitive variables specified by the user;

third, according to the security requirements of the user, relevance analysis is used to place sensitive variables with closely related information in the same secure zone;

fourth, the code is split into two components: (1) code marked as contaminated during propagation in the system dependency graph (SDG), grouped through relevance analysis; and (2) uncontaminated code, as defined in the context. Component (1) constitutes the secure slices and component (2) forms the ordinary slice, which are deployed inside and outside the secure zone, respectively.

Further, the first step further includes: the preprocessor takes two types of input: (1) a set of source code files, denoted $F = \{f_1, f_2, \ldots, f_n\}$, all of which the preprocessor converts into LLVM assembly code and links into a single file; (2) the list of security-sensitive variables that must be protected, together with the names of the functions in which the variables are located, denoted $S = \{s_1, s_2, \ldots, s_n\}$.
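The preprocessing input described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the helper name `build_preprocess_commands`, the output file name `app.linked.ll`, and the example variable names are assumptions introduced here. The commands use the standard `clang -S -emit-llvm` and `llvm-link -S` invocations to produce and link LLVM assembly; the sketch only builds the argument lists and does not run them.

```python
from pathlib import Path


def build_preprocess_commands(sources, linked_output="app.linked.ll"):
    """Build argv lists that turn the file set F into one linked LLVM assembly file.

    `sources` plays the role of F = {f_1, ..., f_n}. The function only
    constructs the commands; running them (e.g. with subprocess.run) is
    left to the caller.
    """
    commands = []
    ll_files = []
    for src in sources:
        ll_file = str(Path(src).with_suffix(".ll"))
        ll_files.append(ll_file)
        # clang -S -emit-llvm writes human-readable LLVM assembly (.ll)
        commands.append(["clang", "-S", "-emit-llvm", src, "-o", ll_file])
    # llvm-link merges the per-file modules into a single module
    commands.append(["llvm-link", "-S", *ll_files, "-o", linked_output])
    return commands


# S = {s_1, ..., s_n}: each entry pairs a hypothetical sensitive variable
# with the name of the function in which it is located.
sensitive_variables = [
    ("load_wallet", "master_key"),
    ("check_login", "password_hash"),
]

for argv in build_preprocess_commands(["wallet.c", "login.c"]):
    print(" ".join(argv))
```

Keeping the sensitive-variable list as (function, variable) pairs mirrors the two pieces of information the text says the user must mark.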

Further, the second step further includes: the data flow and control flow dependences of the security-sensitive variables are identified using automatic symbolic program slicing. The initial function set comprises the definition statements of the security-sensitive variables and is continually expanded during the slicing analysis. To obtain a minimized program subset, the slicing rules are defined as follows:

the data flow of the program is traced as execution proceeds from n to m. Rule (1) indicates that the propagation of a secret reference should be traced; rule (2) indicates that, when a secret reference is used as a parameter of a function call, the propagation of the other parameters of the function needs to be traced; rule (3) indicates that, when a secret reference is used as a parameter of a function call, the propagation of the return value of the function needs to be traced; rule (4) indicates that, when any reference to a secret is dereferenced, the secret and its reference need to be traced;

the propagation of sensitive information in the system dependency graph (SDG) is determined using llvm-slicing, a static program slicing framework. Propagation analysis is performed on each sensitive variable in {s} in turn and independently. The resulting set of sensitive variables (secret references) changes continually during the iteration, and the analysis stops once this set no longer changes, yielding a large number of potentially sensitive functions.
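The iterate-until-unchanged propagation described above can be illustrated with a small worklist computation over a dependence graph. This is a minimal sketch under stated assumptions: the graph is a plain dictionary of "flows to" edges rather than a real SDG produced by llvm-slicing, the rule set (1) to (4) is not modelled, and the names `reachable`, `reverse` and `slice_for` are hypothetical.

```python
from collections import deque


def reachable(graph, seeds):
    """Return every node reachable from `seeds`; stops when the set is stable."""
    seen = set(seeds)
    work = deque(seeds)
    while work:  # an empty worklist means the set no longer changes
        node = work.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def reverse(graph):
    """Flip every edge so that backward dependences can be followed."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, set()).add(src)
    return rev


def slice_for(graph, seeds):
    """Union of the forward and backward closures of one sensitive variable."""
    forward = reachable(graph, seeds)
    backward = reachable(reverse(graph), seeds)
    return forward | backward


# Toy "flows to" edges between hypothetical program entities.
deps = {
    "read_key": {"key"},
    "key": {"encrypt"},
    "encrypt": {"send"},
    "log": {"send"},
}

print(sorted(slice_for(deps, {"key"})))
# ['encrypt', 'key', 'read_key', 'send']
```

Each sensitive variable is analyzed independently by calling `slice_for` with its own seed set, which matches the per-variable analysis described in the text; `log` stays outside the slice because it neither depends on nor influences `key`.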

Further, the third step further includes:

for a sensitive variable a, one set represents all sensitive variables p that the sensitive variable a covers, and a second set represents all code reached by the propagation of the sensitive variable a;

definition 1: a similarity function sim(x, y) maps the variables x and y to a number in [0, 1] that measures the similarity between x and y; sim(x, y) = 1 means that the objects x and y are the same, while sim(x, y) = 0 means that they are very different;

the Jaccard similarity coefficient used in information retrieval is taken as the similarity measure; for two sets $A$ and $B$ it is defined as $J(A, B) = |A \cap B| / |A \cup B|$;

a relevance threshold $\rho \in [0, \mathrm{sim}(s_1, s_2)_{\max}]$ is used as the criterion for deciding whether variables are placed in the same group; the relevance threshold represents how much performance a user of the software is willing to sacrifice in exchange for the security of the system. When $\mathrm{sim}(s_1, s_2) \geq \rho$, $s_1$ and $s_2$ are placed in the same secure-zone group; when $\mathrm{sim}(s_1, s_2) < \rho$, $s_1$ and $s_2$ are placed in different secure-zone groups. During partitioning, the slices belonging to different variables may overlap. When $\rho = \mathrm{sim}(s_1, s_2)_{\max}$, the automatic partitioning scheme stores each secret in a separate secure zone; when $\rho = 0$, the final architecture may degrade into the architecture in the threat model.
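The threshold-based grouping can be sketched as follows. The sketch assumes that the similarity of two sensitive variables is the Jaccard coefficient of the code sets reached by their propagation, and that grouping is transitive (if a pairs with b and b with c, all three share a group); neither assumption is confirmed by the text. The names `jaccard` and `group_by_threshold` and the example slices are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical (an assumption of this sketch).
    return len(a & b) / len(union) if union else 1.0


def group_by_threshold(slices, rho):
    """Group variables whose pairwise similarity is at least rho (union-find)."""
    names = list(slices)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if jaccard(slices[x], slices[y]) >= rho:
                parent[find(x)] = find(y)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical slices: functions reached by each sensitive variable.
slices = {
    "sym_key": {"enc", "dec", "wrap"},
    "rsa_key": {"wrap", "unwrap", "dec"},
    "password": {"login"},
}

print(jaccard(slices["sym_key"], slices["rsa_key"]))  # 0.5
print(group_by_threshold(slices, 0.4))  # [['password'], ['rsa_key', 'sym_key']]
print(group_by_threshold(slices, 0.6))  # [['password'], ['rsa_key'], ['sym_key']]
print(group_by_threshold(slices, 0.0))  # [['password', 'rsa_key', 'sym_key']]
```

With a threshold of 0 every pair satisfies the criterion, so all variables fall into one group, which corresponds to the degenerate single-zone case mentioned in the text; raising the threshold splits the variables into more secure zones.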

Further, the fourth step further includes: the program slicer slices and deploys the program according to the result of the variable relevance analysis. The program slicer injects secure communication code into every slice that needs to communicate with another slice; communication between two secure zones is performed by sharing a Sealed Blob. The injected communication code comprises two parts:

shared memory access, in which a program slice stores data to or loads data from the shared memory to communicate with another slice;

Ecall/Ocall instructions added at the secure-zone boundary to switch program execution between the inside and the outside of the secure zone: an Ocall instruction is added to program code inside the secure zone to jump outside the secure zone, and code outside the secure zone uses an Ecall instruction to call a function inside the secure zone;

further, slices with overlapping parts can be handled by four processing schemes: Normal, Duplicated, Standalone and Hybrid;

the Normal processing mode places the part shared by two secure-zone code segments into a separate secure zone;

the Duplicated processing mode retains the shared part in both secure zones;

the Standalone processing mode places the shared part into one of the two secure-zone code segments according to the number of calls, and adds the corresponding calling code to the other segment;

the Hybrid processing mode uses the gcov runtime analysis tool to count the number of calls to each boundary function, assigns a cost to each secure-zone boundary function according to the number of calls to the overlapping part, and moves some of the functions into the secure zone within a configurable threshold.
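The first three overlap-handling modes can be illustrated with set operations on two slices. This is a hedged sketch: the function `place_overlap`, its parameters, and the rule that the Standalone mode keeps the shared part with the slice that calls it more often are assumptions made for illustration. The Hybrid mode is left out because the text does not specify its cost function.

```python
def place_overlap(slice_a, slice_b, mode, calls_a=0, calls_b=0):
    """Return the secure-zone contents for two overlapping slices.

    calls_a / calls_b are the (hypothetical) numbers of calls each slice
    makes into the shared part; they are only used by the Standalone mode.
    """
    a, b = set(slice_a), set(slice_b)
    shared = a & b
    if mode == "Normal":
        # The shared part gets a secure zone of its own.
        zones = [a - shared, b - shared, shared]
    elif mode == "Duplicated":
        # Both secure zones keep a copy of the shared part.
        zones = [a, b]
    elif mode == "Standalone":
        # The shared part stays with the heavier caller; the other slice
        # would reach it through injected calling code.
        zones = [a, b - shared] if calls_a >= calls_b else [a - shared, b]
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return [zone for zone in zones if zone]


slice_a = {"f", "g", "h"}
slice_b = {"h", "k"}

for mode in ("Normal", "Duplicated", "Standalone"):
    zones = place_overlap(slice_a, slice_b, mode, calls_a=10, calls_b=2)
    total = sum(len(zone) for zone in zones)
    print(mode, len(zones), "zones,", total, "functions in total")
```

The printed totals show the tradeoff the modes make: Duplicated keeps two zones but stores the shared function twice, while Normal avoids duplication at the price of a third zone and therefore more boundary crossings.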

It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:

firstly, a preprocessor converts all source code files into LLVM assembly codes and links the LLVM assembly codes into a single file according to an input source code file set; marking a safety sensitive variable list needing to be protected and a function name where the variable is located;

It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:

Another object of the present invention is to provide an automatic software partitioning system implementing the automatic software partitioning method, the automatic software partitioning system including:

the preprocessing module is used for converting all source code files into LLVM assembly code by means of the preprocessor and linking the LLVM assembly code into a single file, and for marking the list of security-sensitive variables to be protected together with the names of the functions in which the variables are located;

the program slicing module is used for analyzing the intermediate file obtained by the preprocessor and identifying the code set related to the security-sensitive variables specified by the user;

the variable relevance analysis module is used for placing sensitive variables with closely related information in the same secure zone by using relevance analysis according to the security requirements of the user;

the program slice optimization module is used for splitting the code into two components: (1) code marked as contaminated during propagation in the SDG, grouped through relevance analysis; and (2) uncontaminated code, as defined in the context. Component (1) constitutes the secure slices and component (2) forms the ordinary slice, which are deployed inside and outside the secure zone, respectively.

Another object of the present invention is to provide a terminal having the software automatic segmentation system installed therein.

Combining all of the above technical solutions, the invention has the following advantages and positive effects: it provides a higher security level with only a small difference in performance; it realizes an automatic software partitioning method for a hybrid multi-secure-zone architecture and groups the sensitive variables specified by developers in a reasonable way; it divides an application program into a plurality of security-sensitive secure-zone parts and a security-insensitive non-secure part, each secure zone containing one group of sensitive variables; and it processes and optimizes the secure-zone interfaces so that the performance overhead remains acceptable.

Trusted execution mechanisms provided by modern CPUs, such as Intel SGX secure zones, can protect applications that execute in untrusted environments. Existing approaches run applications in secure zones by extracting their security-sensitive portions through program analysis, but this results in a single secure zone with a large Trusted Computing Base (TCB) and violates the principle of least privilege. With the MulTEE partitioning framework of the invention, the user first annotates the security-sensitive application data. MulTEE then automatically divides the application into an untrusted module and multiple secure modules, each of which is a minimal program slice of the sensitive data: to maintain data confidentiality, backward data flow analysis identifies code that may be exposed to sensitive data, and to ensure data integrity, forward slicing identifies code that may affect sensitive data. Finally, the security-sensitive modules are placed within secure zones to protect them from attack. After partitioning, a multi-secure-zone software optimization method based on dynamic boundary expansion and minimal slice merging is applied. The invention evaluates MulTEE with the Memcached database, the LibreSSL cryptography library and the Digital Bitbox wallet, and the results show that the method achieves a smaller TCB size with acceptable performance overhead.

The invention analyzes the relevance between sensitive data items from the range over which each item propagates, and groups the sensitive variables accordingly. Sensitive information of different groups is stored in different secure zones, and each secure zone stores only the functions related to its sensitive information, so that when one secure zone is compromised the sensitive information in the other secure zones remains protected.

The invention provides a variable relevance analysis method based on the taint propagation range: starting from the list of sensitive variables provided by developers, the relevance among the sensitive variables is analyzed, the sensitive variables are grouped, and the variables of a group and the statements they affect are stored in the same secure zone. When a program is divided into a plurality of secure zones, redundant code can arise among the secure zones; a Hybrid-EBA redundant-code processing scheme based on function call counts is therefore designed, which minimizes the TCB while keeping throughput overhead low. The effectiveness and efficiency of the invention were evaluated using 4 practical applications that perform common tasks. The evaluation results show that the invention successfully achieves the program slicing goal of hardware isolation and meets the security requirements. Furthermore, the invention makes both the TCB size and the context switch time controllable, which is essential for accommodating various application requirements.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a software automatic segmentation method according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of an automatic software partitioning system according to an embodiment of the present invention.

In the figure: 1, preprocessing module; 2, program slicing module; 3, variable relevance analysis module; 4, program slice optimization module.

Fig. 3 is a diagram of a multi-safe zone application architecture provided by an embodiment of the present invention.

Fig. 4 is a flow chart of a system provided by an embodiment of the invention.

Fig. 5 is a schematic diagram of a slice relationship provided by an embodiment of the present invention.

Fig. 6 is a graph of the result of the Memcached security experiment provided by the embodiment of the present invention.

Fig. 7 is a diagram of the results of the PolarSSL security experiment provided by an embodiment of the present invention.

FIG. 8 is a graph of the results of a Digital Bitbox safety experiment provided by an embodiment of the present invention.

Fig. 9 is a diagram of a result of a Libmodbus security experiment according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In view of the problems in the prior art, the present invention provides a method, a system, a storage medium, a computer device, and a terminal for automatically segmenting software, and the present invention is described in detail below with reference to the accompanying drawings.

As shown in fig. 1, the automatic software partitioning method provided by the present invention includes the following steps:

S101: the preprocessor converts all source code files into LLVM assembly code and links the LLVM assembly code into a single file; the list of security-sensitive variables to be protected is marked together with the names of the functions in which the variables are located;

S102: the intermediate file obtained by the preprocessor is analyzed to identify the code set related to the security-sensitive variables specified by the user;

S103: according to the security requirements of the user, relevance analysis is used to place sensitive variables with closely related information in the same secure zone;

S104: the code is split into two components: (1) code marked as contaminated during propagation in the SDG, grouped through relevance analysis; and (2) uncontaminated code, as defined in the context. Component (1) constitutes the secure slices and component (2) forms the ordinary slice, which are deployed inside and outside the secure zone, respectively.

As shown in fig. 2, the software automatic segmentation system provided by the present invention includes:

The preprocessing module 1 is used for converting the input set of source code files into LLVM assembly code by means of the preprocessor and linking the LLVM assembly code into a single file, and for marking the list of security-sensitive variables to be protected together with the names of the functions in which the variables are located.

The program slicing module 2 is used for analyzing the intermediate file obtained by the preprocessor and identifying the code set related to the security-sensitive variables specified by the user.

The variable relevance analysis module 3 is used for placing sensitive variables with closely related information in the same secure zone by using relevance analysis according to the security requirements of the user.

The program slice optimization module 4 is used for splitting the code into two components: (1) code marked as contaminated during propagation in the SDG, grouped through relevance analysis; and (2) uncontaminated code, as defined in the context. Component (1) constitutes the secure slices and component (2) forms the ordinary slice, which are deployed inside and outside the secure zone, respectively.

The technical solution of the present invention is further described below with reference to the accompanying drawings.

1. Preliminaries and system architecture

Protecting application program data is important. Past events indicate that data leaks and integrity violations may have a significant impact on the reputation of users and application providers.

Today, applications are often deployed in untrusted environments, such as public clouds controlled by third-party providers. Besides the applications themselves being vulnerable, the underlying infrastructure (i.e., the operating system (OS) and the virtual machine hypervisor) may not be trusted by the application owner, and software-based solutions implemented as part of the operating system or the hypervisor cannot protect the application data. New hardware security features such as Intel SGX provide a solution through a trusted execution model, which supports memory and execution isolation of application code and data from the rest of the environment, including higher-privileged system software. In this work, the invention addresses the problem of how developers can use trusted execution to protect the security-sensitive code and data of applications at a fine granularity.

1.1 Intel SGX trusted execution framework

Intel's Software Guard Extensions (SGX) allow applications to protect the confidentiality and integrity of their code and data even if an attacker controls all software (operating system, hypervisor and BIOS) and has physical access to the machine, including the memory and system buses. SGX provides trusted execution mechanisms for applications in the form of secure zones (enclaves). Secure-zone code and data reside in a protected memory region called the enclave page cache (EPC). Only application code executing inside the secure zone may access the EPC, while code inside the secure zone may also access memory outside it. An on-chip memory encryption engine encrypts and decrypts cache lines as they are written to and fetched from EPC memory. Because secure-zone code always executes in user mode, any interaction with the operating system through system calls, e.g., for network or disk I/O, must be performed outside the secure zone.

Using Intel's SGX SDK, developers can create a secure-area library that is loaded into the secure area and executed by an SGX-capable CPU. Developers define the interface between the secure-area code and the other, untrusted application code: (i) a call to a function inside the secure area is called a secure-area entry call (Ecall). For each defined Ecall, the SDK adds instructions to marshal the parameters outside, unmarshal them inside the secure area, and execute the function; (ii) outside calls (Ocalls) allow code inside the secure area to call untrusted functions outside it. The added SDK code leaves the secure area, unmarshals the parameters, calls the function, and then re-enters the secure area.

Every Ecall and Ocall introduces performance overhead, because the hardware must perform certain operations to maintain the security guarantees of SGX.

1.2 System architecture

Multi-security-zone architectures for an application can be divided into two types, single-secret and mixed-secret. They differ in whether each security zone may store only one secret and its related code or may store several secrets. The invention secures the application by decomposing it into a mixed-secret multi-security-zone architecture. This design is based mainly on the following consideration: several sensitive variables may be strongly associated, and in most cases storing strongly associated sensitive variables in separate trusted execution environments does not provide significantly higher confidentiality but does introduce a large amount of additional performance overhead. For example, if an application encrypts data with a symmetric key and then encrypts that key with an asymmetric key, both keys may be placed in the same secure area: the leakage of either key compromises the data, so there is no need to protect the asymmetric and symmetric keys separately.

As shown in FIG. 3, the architecture uses a plurality of security zones to protect security-sensitive code and data. Each security zone stores one or more secrets and provides the system support needed by the code it contains. Security zones share data with each other through Sealed Data Blobs protected by shared keys. Function calls into and out of a security zone are implemented with Ecall and Ocall instructions, and function calls between security zones are orchestrated by non-sensitive code outside the security zones. Under this architecture, partitions are finer-grained than in the preceding architectures.

1.4 Threat model

The present invention considers code to be security sensitive if it has direct access to sensitive data or indirectly affects the confidentiality or integrity of that data. The adversary's goal is to reveal confidential data or to destroy its integrity. The present invention assumes a powerful and active adversary, such as a malicious system administrator, who may control the hardware and software of the machine executing the application. Thus, an attacker can (i) access or modify any data in memory or on disk; (ii) view or modify application code; (iii) modify the operating system or other system software.

The present invention does not consider denial-of-service (DoS) attacks: an adversary with complete control of the machine may simply decide not to run the application, and replication can be used to detect and mitigate such attacks. Like other work, the invention also does not address side-channel attacks that exploit timing effects or page faults; dedicated countermeasures exist for both.

2. Introduction to the MulTEE scheme

The present invention designs a fine-grained automatic partitioning framework that protects existing C applications by executing security-sensitive code in Intel SGX secure areas. The framework meets the following requirements: (1) it must protect the confidentiality of sensitive input data and the integrity of sensitive output data; (2) it applies the principle of least privilege to minimize the attack surface of the sensitive data; (3) it modifies the application code automatically; (4) it keeps the performance overhead acceptable. To meet these requirements, the present invention operates in four phases (see FIG. 4): code analysis, variable association analysis, forward and backward slicing, and program slice optimization.

2.1 Preprocessing

To meet the requirements of variable association analysis and application slicing, the preprocessor takes two kinds of input: (1) a set of source code files, denoted $F = \{f_1, f_2, \ldots, f_n\}$; the preprocessor converts all source code files into LLVM assembly code and links them into a single file; (2) the list of security-sensitive variables that must be protected, together with the name of the function in which each variable is located, denoted $S = \{s_1, s_2, \ldots, s_n\}$. Because these variables are determined by each application itself, they must be annotated manually by the developers.

2.2 Program slicing

In the program slicing stage, the intermediate file produced by the preprocessing stage is analyzed to identify the set of code related to the security-sensitive variables specified by the user. The invention uses automatic symbolic program slicing to identify the data-flow and control-flow dependences of the security-sensitive variables. The function set contains the definition statements of the security-sensitive variables and is expanded continuously during slicing analysis, which ensures the confidentiality and integrity of the sensitive data. To obtain a minimal program subset, the invention defines the slicing rules as follows:

When a program executes from n to m, the data flow of the program is traced. Rule (1) states that the propagation of references to a secret must be tracked. Rule (2) states that when a reference to a secret is passed as a parameter of a function call, the propagation of the other parameters of that function must be tracked. Rule (3) states that when a reference to a secret is passed as a parameter of a function call, the propagation of the function's return value must be tracked. Finally, rule (4) states that when any reference to a secret is dereferenced, both the secret and its reference must be tracked.

The present invention uses llvm-slicing, a static program slicing framework, to determine the propagation of sensitive information (i.e., security-sensitive variables) in a System Dependency Graph (SDG). Propagation analysis is carried out on the sensitive variables in $\{s\}$ in sequence, and the resulting set of sensitive variables is recorded as $\{s_{fi}\}$. $\{s_{fi}\}$ changes continuously during the iteration, and the analysis stops when $s_{fi}$ and its reference set no longer change. This propagation analysis finally yields a large number of potentially sensitive functions. The initial security-sensitive variables and the code identified by the propagation analysis are then analyzed statistically and together form the critical components of the program to be protected.
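The iteration described above, which stops once the sensitive set no longer changes, can be illustrated with a minimal worklist sketch. This is only an assumption-laden illustration: it treats propagation as plain reachability over a toy dependency graph, it does not model rules (1) to (4) individually, and it does not use the llvm-slicing API. The graph, the node names, and the function `propagate` are hypothetical.

```python
from collections import deque


def propagate(sdg, seeds):
    """Return every node reachable from the seed nodes.

    sdg maps a node to the nodes that depend on it (data or control flow).
    """
    tainted = set(seeds)
    worklist = deque(seeds)
    # The loop ends when no new node is added, i.e. the set has stopped changing.
    while worklist:
        node = worklist.popleft()
        for successor in sdg.get(node, ()):
            if successor not in tainted:
                tainted.add(successor)
                worklist.append(successor)
    return tainted


# Toy dependency graph (hypothetical names): edges point from a value to its dependents.
toy_sdg = {
    "priv_key": ["sign_tx"],
    "sign_tx": ["signature"],
    "log_msg": ["print_log"],
}

sensitive_slice = propagate(toy_sdg, ["priv_key"])
```

In this toy graph only the nodes that depend on `priv_key` end up in the slice; the logging code stays outside it.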

2.3 Analysis of variable associations

Function calls between multiple security zones introduce a large amount of overhead, so giving every secret its own security zone is not feasible. To reduce the number of security zones, the method uses association analysis to place sensitive variables that carry similar information in the same security zone, according to the security requirements of the user. In the following, each sensitive variable a is associated with the set of all code to which a propagates.

Definition 1 (similarity function). The similarity function sim(x, y) maps variables x and y to a number in [0, 1] that measures the similarity between x and y. sim(x, y) = 1 means that x and y are identical, while sim(x, y) = 0 means that they are entirely different.

The present invention uses the Jaccard similarity coefficient, which is widely used in information retrieval, as the similarity measure:
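The formula itself is not reproduced in this text. For two sets A and B, the standard Jaccard coefficient is |A ∩ B| / |A ∪ B|. The sketch below assumes that the coefficient is applied to the sets of code items (for example, function names) in the slices of two sensitive variables; the slice contents and the name `jaccard` are hypothetical.

```python
def jaccard(slice_a, slice_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two slices given as collections."""
    a, b = set(slice_a), set(slice_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty slices count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical slices: the functions reached by two sensitive variables.
slice_rsa = {"f1", "f2", "f3"}
slice_secret = {"f2", "f3", "f4"}

score = jaccard(slice_rsa, slice_secret)  # 2 shared items out of 4 distinct items
```

Identical slices give 1 and disjoint slices give 0, which matches the range required by Definition 1.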

There are two kinds of relationship between the slices of different security-sensitive variables, as follows:

Table 1. Example of variable association analysis

To meet the security requirements of different scenarios, the invention uses an association threshold $\rho \in [0, \mathrm{sim}(s_1, s_2)_{\max}]$ as the criterion for deciding whether variables are placed in the same group. The threshold represents how much performance the user of the software is willing to sacrifice in exchange for the security of the system. When $\mathrm{sim}(s_1, s_2) \geq \rho$, the invention places $s_1$ and $s_2$ in the same security zone group; when $\mathrm{sim}(s_1, s_2) < \rho$, it places $s_1$ and $s_2$ in different security zone groups. The slices of different variables may overlap during this division; the handling of such overlaps is described in section 2.4. When $\rho = \mathrm{sim}(s_1, s_2)_{\max}$, the automatic partitioning scheme stores each secret in a separate security zone, which gives the highest security but seriously affects the performance of the software. When $\rho = 0$, the final architecture degrades to architecture 3 in section 1.2.
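The effect of the threshold can be sketched as follows. The sketch assumes that similarity is the Jaccard coefficient of the slices and that grouping is transitive (if a is grouped with b and b with c, all three share a group); the text does not state how chains of pairwise similarities are resolved, so the union-find merging used here is an assumption. All names and slice contents are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two non-empty sets."""
    return len(a & b) / len(a | b)


def group_by_threshold(slices, rho):
    """Group variables whose pairwise similarity is at least rho."""
    parent = {name: name for name in slices}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for x, y in combinations(slices, 2):
        if jaccard(slices[x], slices[y]) >= rho:
            parent[find(x)] = find(y)  # merge the two groups

    groups = {}
    for name in slices:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical slices for three annotated variables.
toy_slices = {
    "rsa": {"f1", "f2", "f3"},
    "secret": {"f2", "f3", "f4"},
    "srvcert": {"f9"},
}

loose = group_by_threshold(toy_slices, 0.4)   # rsa and secret (similarity 0.5) merge
strict = group_by_threshold(toy_slices, 0.6)  # every variable stays alone
single = group_by_threshold(toy_slices, 0.0)  # everything lands in one group
```

Raising the threshold yields more, smaller groups, and a threshold of 0 collapses all variables into one group, in line with the behavior described for the two extremes.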

2.4 Program slice optimization

The results of the association analysis form two sets of program components: (1) code marked by developers and code tainted during propagation through the SDG, grouped by the association analysis; (2) untainted code. By the definition of taint used here, component (1) constitutes the secure slices and component (2) forms the common slice, which are deployed inside and outside the security zones, respectively.

The invention develops a program slicer that slices and deploys the program according to the results of the variable association analysis. An important design principle of the program slicer is that the sliced program must be functionally equivalent to the original program. Slicing the original program directly does not achieve this, because code that used to communicate freely is now split between the inside and the outside of the secure zone, which are two separate CPU modes that do not execute concurrently. To restore the missing communication, the program slicer injects secure communication code into every slice that needs to communicate with another slice. Communication between two secure zones is performed through a shared Sealed Blob. The injected communication code therefore consists of two parts:

(1) shared memory access, in which a slice loads data from or stores data to the shared memory, thereby communicating with another slice;

(2) Ecall/Ocall instructions added at the boundary of the security zone to switch program execution between the inside and the outside of the zone: an Ocall instruction is added to code inside the security zone to jump outside it, and code outside the security zone uses an Ecall instruction to call a function inside it.

The program slicing stage improves the security of the program by minimizing the amount of code inside the secure area, but frequent cross-boundary calls between the security zone and the outside world cause significant overhead. In addition, different security zone groups partially overlap, and how these overlaps are handled has a great influence on the execution efficiency of the program. The invention therefore provides an adaptive security zone boundary relocation method.

As shown in FIG. 5, slices with overlapping portions can be handled in four ways, which the present invention refers to as Normal, Duplicated, Standalone, and Hybrid:

The Normal processing mode places the code shared by two security zones in a separate security zone. It achieves the minimum average TCB but requires more external calls and incurs more switching overhead.

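The Duplicated processing mode keeps the shared code in both security zones. This slightly increases the TCB size but reduces the number of interactions with the outside and improves the performance of the software.

The Standalone processing mode places the shared code in one of the two security zones, chosen according to the number of calls, and adds the corresponding calling code to the other.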

The invention provides the Hybrid processing mode, which borrows the idea of the enclave boundary relocation (EBR) technique from Glamdring. It uses the gcov runtime analysis tool to count the number of calls to each boundary function and assigns a cost to each security zone boundary function according to the number of calls into the overlapping part. Within a configurable threshold, some of these functions are moved into a security zone. Adding extra functions to a security zone does not violate the security guarantees of the present invention, but it increases the average TCB size.
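The trade-off between the Normal and Duplicated modes can be made concrete with a small sketch. It assumes that a slice is a set of functions and that TCB size is simply the number of functions in a zone; both are simplifications introduced for illustration, and the call-count-driven Standalone and Hybrid modes are not modelled. The names `layout` and `average_tcb` and the slice contents are hypothetical.

```python
def layout(slice_a, slice_b, mode):
    """Return the security zones produced for two overlapping slices."""
    a, b = set(slice_a), set(slice_b)
    shared = a & b
    if mode == "normal":
        # Shared code is moved into a zone of its own.
        zones = [a - shared, b - shared, shared]
    elif mode == "duplicated":
        # Shared code is kept in both zones.
        zones = [a, b]
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return [zone for zone in zones if zone]


def average_tcb(zones):
    """Average number of functions per zone."""
    return sum(len(zone) for zone in zones) / len(zones)


# Hypothetical slices that overlap in one function, f3.
slice_a = {"f1", "f2", "f3"}
slice_b = {"f3", "f4", "f5"}

normal_zones = layout(slice_a, slice_b, "normal")
duplicated_zones = layout(slice_a, slice_b, "duplicated")

normal_total = sum(len(zone) for zone in normal_zones)
duplicated_total = sum(len(zone) for zone in duplicated_zones)
```

In this toy case Normal produces three small zones with no repeated code, while Duplicated produces two larger zones that both contain `f3`. Every call from either slice into `f3` crosses a zone boundary under Normal but stays inside the zone under Duplicated, which is the source of the extra switching overhead described above.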

The technical effects of the present invention will be described in detail with reference to experiments.

1. Analysis of experiments

1.1 Experimental setup

The present invention was evaluated using four real C programs, as shown in Table 2: the key-value database Memcached, the PolarSSL library, the Digital Bitbox bitcoin wallet, and libmodbus, a library for the industrial control network protocol Modbus. The tests cover functionality such as protocol processing, encryption, and statistical calculation, all of which carry a risk of information leakage and/or data forgery. Table 2 also summarizes statistics of the programs. The programs differ in lines of code (LoC), branch statements, loops, functions, and global variables, and together provide a comprehensive test set for evaluating the partitioning method proposed by the invention.

Table 2. Raw data of the experimental software

The scheme is implemented as follows: the experiments generate LLVM assembly code with Clang 3.9 and run the llvm-slicing tool, which is about 9,000 LoC in size, on that LLVM assembly code.

Memcached is a distributed key-value storage database. It supports several operations: set(k, v), get(k), delete(k), and increment/decrement(k, i). The invention performs slice analysis on Memcached 1.5.13 (including the asynchronous event processing library libevent 1.4.25). The Memcached assembly code has 36250 lines of code and 655 functions.

PolarSSL is an implementation of the SSL and TLS protocols and their respective encryption algorithms. It is currently the smallest SSL code base and is efficient and easy to port and integrate. To enhance the security of PolarSSL when it is used as a Certificate Authority (CA), the invention performs slice analysis on PolarSSL 2.16.1. PolarSSL has 74772 lines of code and 817 functions. The invention compiles it with O2 optimization and disables inline assembly so that the relationships between variables are not disturbed.

Digital Bitbox is a bitcoin wallet designed for a high-security USB microcontroller. It supports (1) hierarchical deterministic key generation, (2) transaction signing, and (3) encrypted communication. The invention performs slice analysis on Digital Bitbox 2.0.0 (including the cryptographic library Secp256k1 1.0.0 and the JSON library Yajl 2.1.0). Digital Bitbox has 60876 lines of code and 873 functions.

Libmodbus is an implementation of the industrial control protocol Modbus. It allows Modbus devices to send and receive data and supports connections over both serial ports and Ethernet interfaces. The invention performs slice analysis on libmodbus 3.1.4. Libmodbus has 8853 lines of code and 150 functions.

1.2 Security analysis

The invention evaluates the security of the partitioned application according to the size of the TCB.

1.2.1 Memcached

Security target: the invention aims to protect the integrity and confidentiality of all key-value pairs of a Memcached instance deployed on an untrusted platform and to prevent an attacker from reading or modifying the stored key-value data. Therefore, the invention slices the seven key variables, including buf, pid_file, and passswd, separately.

Security-sensitive code: the partitioning scheme changes with the security requirements. When the threshold is set below 0.15, all annotated variables are placed in the same security zone. The redundant part accounts for about 12% of the code. With the redundancy processing scheme provided by the invention, the redundant code can be reduced to 0.1%, while the number of Ecalls is reduced by 42.7% and the number of Ocalls by 18.2%, as shown in FIG. 6.

1.2.3 PolarSSL

Security target: the invention aims to protect the confidentiality of the private key of the PolarSSL CA root certificate and the confidentiality of the session data. Therefore, the invention slices the four key variables srvcert, rsa, ssl, and secret separately, as shown in FIG. 7.

Security-sensitive code: the partitioning scheme changes with the security requirements, as shown in the figure. The present invention finds that rsa and secret are always placed in the same security zone regardless of the threshold. This is because a message is first encrypted with a symmetric encryption algorithm, and the key used by the symmetric encryption is then exchanged using asymmetric encryption; for the message, leaking rsa is therefore equivalent to leaking secret, and either may destroy its confidentiality and integrity. The redundant part accounts for about 4% of the code. With the redundancy processing scheme provided by the invention, the redundant code can be reduced to 1%, while the number of Ecalls is reduced by 14% and the number of Ocalls by 5%.

1.2.4 Digital Bitbox

Security target: the invention aims to protect a bitcoin transaction service deployed on an untrusted platform, so that an attacker cannot (i) read or modify the private key in the wallet or (ii) create a transaction. Therefore, the invention slices the four key variables ecdh_secret, priv_key, shared_sec, and sign, as shown in FIG. 8.

Security-sensitive code: the partitioning scheme changes with the security requirements, as shown in the figure. When the threshold is set above 0.583, the number of Ocalls and the TCB size increase dramatically. The redundant part accounts for about 10.5% of the code. With the redundancy processing scheme provided by the invention, the redundant code can be reduced by 19% and the number of Ecalls/Ocalls by 22%.

1.2.5 Libmodbus

Security target: the invention aims to protect the confidentiality and integrity of the data packets received by a server that supports the Modbus protocol and to prevent an attacker from reading or modifying the instructions to be executed by an industrial control system. Therefore, the invention slices the four key variables ctx, mb_mapping, query, and raw_req, as shown in FIG. 9.

Security-sensitive code: the partitioning scheme changes with the security requirements, as shown in the figure. The redundant part accounts for about 17% of the code. With the redundancy processing scheme provided by the invention, the redundant code can be reduced by 17% and the number of Ecalls/Ocalls by 12%.

1.2.6 Discussion

Table 3. Comparison of security test results

The security evaluation shows that the scheme of the invention achieves a small TCB while protecting the security-sensitive functions of practical applications. FIG. 4 compares the TCB size when running Memcached under the present invention with that of Glamdring, SCONE, and Graphene: SCONE and Graphene place the entire application in a secure area, and Glamdring stores all security-related variables in a single secure area. The TCB of the proposed scheme is similar in size to that of Glamdring and about one third of the size of those of SCONE and Graphene.

1.2.7 Performance evaluation

Migrating part of a program into a secure area requires additional communication (i.e., Ecalls/Ocalls) to preserve the original functionality, which introduces additional time overhead compared with the original program that is not protected by SGX. The invention therefore quantitatively evaluates the communication overhead of the generated program by the number of Ecalls/Ocalls.

Table 4. Comparison of performance test results

Redundancy processing method	TCB size	Number of Ecalls	Number of Ocalls
Normal (baseline)	14525	138	242
Duplicated	14846 (+2.2%)	74 (-46.3%)	195 (-19.4%)
Standalone	14525 (0%)	124 (-10.1%)	226 (-6.6%)
Hybrid-EBR	14554 (+0.1%)	79 (-42.7%)	198 (-18.2%)

The four redundancy processing methods are each used to analyze Memcached with the security threshold fixed in [0.15, 0.4]; the experimental results are shown in the table above. The redundant code processing algorithm provided by the invention effectively reduces the amount of redundant code, greatly reduces the number of Ecalls/Ocalls, and improves the performance of the software.
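The percentages in Table 4 are relative changes with respect to the Normal baseline. The sketch below recomputes them from the absolute values in the table as (value - baseline) / baseline; the dictionary layout and the function name `relative_change` are illustrative choices. The recomputed values agree with the table to within roughly 0.1 percentage point; small differences in the last digit would be consistent with truncation rather than rounding in the table.

```python
# Absolute values taken from Table 4.
baseline = {"tcb": 14525, "ecall": 138, "ocall": 242}
methods = {
    "Duplicated": {"tcb": 14846, "ecall": 74, "ocall": 195},
    "Standalone": {"tcb": 14525, "ecall": 124, "ocall": 226},
    "Hybrid-EBR": {"tcb": 14554, "ecall": 79, "ocall": 198},
}


def relative_change(value, base):
    """Change of value relative to base, in percent."""
    return 100.0 * (value - base) / base


changes = {
    name: {key: relative_change(values[key], baseline[key]) for key in baseline}
    for name, values in methods.items()
}

for name, row in changes.items():
    print(f"{name}: TCB {row['tcb']:+.2f}%, Ecall {row['ecall']:+.2f}%, Ocall {row['ocall']:+.2f}%")
```

The comparison shows why Hybrid is attractive: it removes almost as many Ecalls and Ocalls as Duplicated while adding far less code to the TCB.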

2. Results

The invention provides, for the first time, an automatic program partitioning framework for multiple security zones. It automatically divides a target program into several secure slices and one common slice so that the program can work with a trusted execution environment based on hardware isolation. In its design, the invention focuses on optimizing the trusted computing base of the system to ensure security, while improving the practicality of the software through adaptive handling of redundant boundaries. The experimental results on four real C programs demonstrate the effectiveness and performance of the invention. Although this work primarily targets the Intel SGX platform [4], the invention can also be applied to other isolation-based frameworks, such as ARM TrustZone.

It is an object of the present invention to bridge the gap between hardware security and software developers/end-users so that they can benefit from employing hardware security primitives in system design without having to bear a significant development burden or delve into hardware security.

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a disk, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.

The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A software automatic segmentation method is characterized in that a MulTEE partition frame is adopted, and security sensitive application program data are annotated firstly; then, MulTEE will automatically divide the application into untrusted modules and multiple secure modules, each secure module being the smallest program slice of sensitive data, identify codes that may affect the confidentiality of sensitive data using backward dataflow analysis, and identify codes that may affect the integrity of sensitive data using forward slices; finally, the security-sensitive module is placed in a secure zone to protect it from attacks;

the software automatic segmentation method further comprises the following steps:

fourthly, forming two groups of program components according to the result of the association analysis: (1) code marked by developers and code tainted during SDG propagation, grouped through the association analysis; (2) untainted code; by the definition of taint in this context, component (1) constitutes the secure slices and component (2) forms the common slice, which are deployed inside and outside the security zone respectively;

the first step further comprises: the preprocessor takes two kinds of input: (1) a set of source code files, denoted $F = \{f_1, f_2, \ldots, f_n\}$, wherein the preprocessor converts all source code files into LLVM assembly code and links them into a single file; (2) the list of security-sensitive variables that must be protected and the name of the function in which each variable is located, denoted $S = \{s_1, s_2, \ldots, s_n\}$, which is annotated manually;

the third step further comprises:

a set of all codes representing the propagation of the sensitive variable a;

2. The software auto-segmentation method of claim 1, wherein the second step further comprises: the data flow and control flow dependence of the security sensitive variable are identified by using an automatic symbolic program slice, a function set comprises definition statements of the security sensitive variable, the function set is continuously expanded in the process of slice analysis, and in order to obtain a minimized program subset, a slice rule is defined as follows:

determining the propagation of sensitive information in a system dependency graph (SDG) by using llvm-slicing, a static software analysis framework; carrying out propagation analysis on the sensitive variables in $\{s\}$ sequentially and independently, and recording the obtained sensitive variable set as $\{s_{fi}\}$, wherein $\{s_{fi}\}$ changes continuously during the iteration; and stopping the analysis when $s_{fi}$ and its reference set no longer change, thereby obtaining a large number of potentially sensitive functions.

3. The software auto-segmentation method of claim 1, wherein the fourth step further comprises: the program slicer slices and deploys a program according to a result obtained by the variable relevance analysis, the program slicer injects a secure communication code into two slices which need to communicate with another slice, communication between two secure areas is performed by sharing a Sealed Blob, and the injected communication code comprises two parts:

and adding the Ecall/Ocall instruction at the boundary of the safety area to realize the switching of the program execution sequence inside and outside the safety area, adding the Ocall instruction to the program code in the safety area to jump outside the safety area, and calling the function in the safety area by using the Ecall instruction outside the safety area.

4. The software automatic segmentation method according to claim 3, wherein a slice with an overlapping part has four processing schemes: Normal, Duplicated, Standalone and Hybrid;

the Duplicated processing mode retains the overlapping portions in both security zones;

the Standalone processing mode puts the redundant part into one of the security zones according to the number of calls and adds the related calling code to the other;

5. A computer arrangement, characterized in that the computer arrangement comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the method according to any one of claims 1-4.

6. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the method of any one of claims 1-4.

7. A software automatic segmentation system for implementing the software automatic segmentation method according to any one of claims 1 to 4, wherein the software automatic segmentation system comprises:

and the program slice optimization module is used for forming two groups of program components from the result of the association analysis: (1) code marked by developers and code tainted during SDG propagation, grouped through the association analysis; (2) untainted code; by the definition of taint in this context, component (1) constitutes the secure slices and component (2) forms the common slice, which are deployed inside and outside the security zone, respectively.

8. A terminal characterized in that it comprises the software auto-segmentation system of claim 7.