CN112487438A

CN112487438A - Heap object Use-After-Free vulnerability detection method based on identifier consistency

Info

Publication number: CN112487438A
Application number: CN202011453648.6A
Authority: CN
Inventors: 宋巍; 桂滨法; 熊海龙
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2020-12-12
Filing date: 2020-12-12
Publication date: 2021-03-12
Anticipated expiration: 2040-12-12
Also published as: CN112487438B

Abstract

The invention discloses a heap object Use-After-Free vulnerability detection method based on identifier consistency, which takes the source code of a C/C++ program as input and outputs the detected heap object Use-After-Free vulnerabilities. The method first performs static analysis on the input program to find the code locations of heap object allocation, pointer propagation and pointer dereferencing; it then instruments the program at these locations so that each allocated heap object and all pointers to that object are assigned the same unique identifier; finally, it runs the instrumented program, and the instrumentation code executed at run time compares the identifier of each pointer with the identifier of the object the pointer actually points to, so as to determine whether a vulnerability exists. The method is both effective and efficient, and can detect heap object Use-After-Free vulnerabilities with low performance overhead and low memory overhead.

Description

Heap object Use-After-Free vulnerability detection method based on identifier consistency

Technical Field

The invention belongs to the field of program analysis and testing, and in particular relates to a heap object Use-After-Free vulnerability detection method based on identifier consistency.

Background

Low-level languages such as C and C++ provide low-level management of heap memory, and developers can flexibly allocate and release heap objects in heap memory. Owing to the flexibility and efficiency of C and C++, a large number of system programs developed in these languages, such as browsers, databases and servers, are widely used in daily life. However, as programs grow in scale and modular design spreads, it is difficult for developers to always allocate and release heap objects correctly. Programs developed in C and C++ are therefore prone to temporal memory errors such as Use-After-Free vulnerabilities, which have become a significant cause of insecurity in modern software systems. One study reports that the number of registered Use-After-Free vulnerabilities rose year by year from 2009 to 2019, and that over 80% of them were rated high or critical severity. Developers should therefore pay close attention to heap object Use-After-Free vulnerabilities in C/C++ programs.

At present, compared with other memory errors, heap object Use-After-Free vulnerabilities in C/C++ programs are difficult to detect manually or by static analysis, for the following reasons. First, pointers may be aliased, and it is difficult to infer all pointer aliases scattered across many data structures. Second, it is challenging to determine which memory object a pointer should point to. Last but not least, the path explosion caused by growing program size makes inter-procedural analysis quite difficult.

Although many research works have proposed dynamic methods for detecting heap object Use-After-Free vulnerabilities in C/C++ programs, most of them are based on object location. They use shadow memory to record the allocation/release status of heap objects, but do not distinguish between different heap objects that are allocated at the same heap address one after another during program execution. As a result, they can only detect a Use-After-Free vulnerability that occurs while the freed heap memory has not yet been reallocated. Once the heap memory that held a released heap object is reallocated to store another heap object, they cannot detect a Use-After-Free vulnerability that occurs on the reallocated memory. Prediction-based dynamic detection methods, for their part, are only suitable for detecting concurrent Use-After-Free vulnerabilities in multi-threaded programs, and cannot detect Use-After-Free vulnerabilities in sequential programs or within a single thread of a multi-threaded program. An effective dynamic method for detecting heap object Use-After-Free vulnerabilities in C/C++ programs that is not affected by memory reuse is therefore still lacking.
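The limitation of location-based detection described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions and is not taken from any existing tool: the heap is modelled as a fixed pool of slots, the shadow memory as one status flag per slot, and the names `loc_alloc`, `loc_free`, `loc_check` and `POOL_SLOTS` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the "heap" is a small pool of slots, and the
 * location-based checker keeps one shadow flag per slot. */
#define POOL_SLOTS 4

static bool shadow_allocated[POOL_SLOTS]; /* shadow memory: slot status */

/* Allocate the lowest free slot; returns -1 when the pool is full. */
int loc_alloc(void) {
    for (int slot = 0; slot < POOL_SLOTS; slot++) {
        if (!shadow_allocated[slot]) {
            shadow_allocated[slot] = true;
            return slot;
        }
    }
    return -1;
}

/* Release a slot by clearing its shadow flag. */
void loc_free(int slot) {
    assert(slot >= 0 && slot < POOL_SLOTS);
    shadow_allocated[slot] = false;
}

/* Location-based check: an access is accepted whenever the slot is
 * currently allocated, no matter which object lives there. */
bool loc_check(int slot) {
    assert(slot >= 0 && slot < POOL_SLOTS);
    return shadow_allocated[slot];
}
```

In this model a stale slot index is rejected immediately after `loc_free`, but as soon as `loc_alloc` hands the same slot to a new object, `loc_check` accepts the stale index again. This is the missed case that arises from memory reuse.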

Disclosure of Invention

The invention aims to provide a heap object Use-After-Free vulnerability detection method based on identifier consistency, for efficiently and effectively detecting heap object Use-After-Free vulnerabilities in C/C++ programs.

The technical solution that achieves this aim is as follows: a heap object Use-After-Free vulnerability detection method based on identifier consistency, which takes the source code of a C/C++ program as input and outputs the detected heap object Use-After-Free vulnerabilities, and comprises the following steps:

step 1, converting the source code of the input C/C++ program into LLVM IR files by using LLVM and Clang, and finding, based on the LLVM IR files, the code positions of heap object allocation, pointer propagation and pointer dereferencing in the input C/C++ program;

step 2, performing code instrumentation on the input C/C++ program so as to assign the same unique identifier to each allocated heap object and all pointers pointing to it and to insert a memory check for each pointer dereference, thereby obtaining the instrumented C/C++ program;

step 3, running the instrumented C/C++ program, and executing the instrumentation code while the program runs to compare whether the object identifier of the pointer matches the identifier of the object the pointer currently actually points to, so as to detect whether a heap object Use-After-Free vulnerability occurs.

Compared with the prior art, the invention has the following notable advantages: (1) the method is not affected by memory reuse during program execution and can effectively detect heap object Use-After-Free vulnerabilities as the program runs; (2) the method improves the accuracy of detecting heap object Use-After-Free vulnerabilities while keeping runtime overhead and memory overhead low.
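Advantage (1) can be illustrated with a minimal simulation of the identifier consistency idea. This is a sketch only, not the implementation of the invention: the heap is modelled as a fixed pool of slots so that memory reuse is deterministic, and the names `id_ptr`, `id_alloc`, `id_free`, `id_check`, `ID_POOL_SLOTS` and `ID_INVALID` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ID_POOL_SLOTS 4
#define ID_INVALID 0u   /* identifier value reserved for "no live object" */

/* Identifier currently attached to the object in each slot. */
static uint64_t slot_id[ID_POOL_SLOTS];
static uint64_t next_id = 1; /* increases monotonically, never reused */

/* A pointer carries the identifier of the object it was derived from. */
typedef struct {
    int slot;
    uint64_t id;
} id_ptr;

/* Allocate the lowest free slot and give the new object a fresh identifier. */
id_ptr id_alloc(void) {
    id_ptr p = { -1, ID_INVALID };
    for (int slot = 0; slot < ID_POOL_SLOTS; slot++) {
        if (slot_id[slot] == ID_INVALID) {
            slot_id[slot] = next_id++;
            p.slot = slot;
            p.id = slot_id[slot];
            return p;
        }
    }
    return p; /* pool exhausted */
}

/* Releasing an object invalidates the identifier stored for its slot. */
void id_free(id_ptr p) {
    assert(p.slot >= 0 && p.slot < ID_POOL_SLOTS);
    slot_id[p.slot] = ID_INVALID;
}

/* Dereference check: the pointer's identifier must equal the identifier
 * of the object that currently occupies the slot. */
bool id_check(id_ptr p) {
    assert(p.slot >= 0 && p.slot < ID_POOL_SLOTS);
    return p.id != ID_INVALID && slot_id[p.slot] == p.id;
}
```

Because a copied `id_ptr` keeps the identifier of the original object, a stale alias is still rejected after the slot has been handed to a new object with a different identifier.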

Drawings

FIG. 1 is a flow chart of a heap object Use-After-Free vulnerability detection method based on identifier consistency.

FIG. 2 is an exemplary diagram of source code for a C/C++ program with a heap object Use-After-Free vulnerability.

FIG. 3 is an exemplary diagram of code in the form of LLVM IR resulting from compiling source code using LLVM and Clang.

FIG. 4 is an exemplary diagram of code in LLVM IR form after instrumentation is complete.

FIG. 5 is an exemplary diagram of diagnostic information given when a heap object Use-After-Free vulnerability is detected.

Detailed Description

The invention provides an efficient heap object Use-After-Free vulnerability detection method based on identifier consistency, which takes the source code of a C/C++ program as input and outputs the detected Use-After-Free vulnerabilities; the overall flow is shown in FIG. 1. The detection method is implemented as follows:

Step 1, converting the source code of the input C/C++ program into LLVM IR files by using LLVM and Clang, and finding, based on the LLVM IR files, the code positions of heap object allocation, pointer propagation and pointer dereferencing in the input C/C++ program, with the following specific steps:

step 1-1, compiling the C/C++ program source code by using LLVM and Clang and converting it into LLVM IR files, the resulting LLVM IR files being an intermediate representation of the C/C++ program source code;

step 1-2, traversing all LLVM IR files, searching for all heap object allocation, pointer propagation and pointer dereferencing statements, and recording the code position of each statement.
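The search in step 1-2 can be sketched as a classification of individual IR statements. The example below is a rough illustration only: it matches substrings in one line of textual LLVM IR, whereas an actual analysis would operate on LLVM's in-memory instruction objects. It assumes the opaque-pointer `ptr` syntax of recent LLVM releases and the C allocation functions `malloc`, `calloc`, `realloc` and `free`; the names `site_kind` and `classify_ir_line` are hypothetical, and a statement that belongs to more than one category is reported under a single one here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SITE_OTHER,
    SITE_HEAP_ALLOC,
    SITE_HEAP_FREE,
    SITE_PTR_PROPAGATION,
    SITE_PTR_DEREF
} site_kind;

/* Classify one line of textual LLVM IR with plain substring matching.
 * This is a heuristic for illustration, not a real IR parser. */
site_kind classify_ir_line(const char *line) {
    assert(line != NULL);
    if (strstr(line, "call ") != NULL) {
        if (strstr(line, "@malloc(") != NULL ||
            strstr(line, "@calloc(") != NULL ||
            strstr(line, "@realloc(") != NULL)
            return SITE_HEAP_ALLOC;
        if (strstr(line, "@free(") != NULL)
            return SITE_HEAP_FREE;
        return SITE_OTHER;
    }
    /* Storing a pointer value copies it into another location. */
    if (strstr(line, "store ptr ") != NULL)
        return SITE_PTR_PROPAGATION;
    /* Address arithmetic and casts derive a new pointer from an old one. */
    if (strstr(line, "getelementptr ") != NULL ||
        strstr(line, "bitcast ") != NULL)
        return SITE_PTR_PROPAGATION;
    /* Remaining loads and stores access memory through a pointer. */
    if (strstr(line, "load ") != NULL || strstr(line, "store ") != NULL)
        return SITE_PTR_DEREF;
    return SITE_OTHER;
}
```

Applying `classify_ir_line` to every line of an IR file and recording the line number of each non-`SITE_OTHER` result would yield a list of candidate instrumentation positions of the kind step 1-2 describes.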

Step 2, performing code instrumentation on the input C/C++ program so as to assign the same unique identifier to each allocated heap object and all pointers pointing to it and to insert a memory check for each pointer dereference, thereby obtaining the instrumented C/C++ program, with the following specific steps:

step 2-1, using the code instrumentation facility provided by LLVM to instrument the input C/C++ program, wherein the specific instrumentation rules include:

(1) at all code positions of heap object allocation, replacing the heap object allocation function with a custom heap object allocation function, so that a unique identifier is assigned to each newly allocated heap object when it is allocated and the unique identifier of the heap object is invalidated when it is released;

(2) at all code positions of pointer propagation, inserting corresponding code to propagate the identifier of the heap object to all pointers pointing to it, so that all pointers pointing to the heap object have the same unique identifier as the heap object;

(3) at all code positions of pointer dereferencing, inserting a corresponding memory check to determine whether a heap object Use-After-Free vulnerability occurs when a pointer to the heap object is dereferenced;
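The three instrumentation rules can be sketched as a small runtime library that the inserted code would call. The following is an illustration under stated assumptions, not the runtime of the invention: the text does not specify how identifiers are stored, so the sketch assumes a fixed-size side table mapping live object addresses to identifiers and a pointer representation that carries its identifier alongside the address. The names `tagged_ptr`, `uaf_malloc`, `uaf_free`, `uaf_propagate`, `uaf_check` and `MAX_LIVE` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LIVE 64          /* capacity of the hypothetical side table */

typedef struct {
    uintptr_t addr;          /* object address kept as an integer */
    uint64_t  id;            /* identifier shared by object and pointers */
} tagged_ptr;

static tagged_ptr live[MAX_LIVE];   /* live objects: address -> identifier */
static uint64_t   id_counter = 1;   /* 0 is reserved as "invalid" */

/* Rule (1), allocation: wrap malloc and tag the new object. */
tagged_ptr uaf_malloc(size_t size) {
    tagged_ptr p = { 0, 0 };
    void *mem = malloc(size);
    if (mem == NULL) return p;
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i].id == 0) {
            p.addr = (uintptr_t)mem;
            p.id = id_counter++;
            live[i] = p;
            return p;
        }
    }
    free(mem);               /* side table full: give up in this sketch */
    return p;
}

/* Rule (1), release: invalidate the object's identifier, then free it. */
void uaf_free(tagged_ptr p) {
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i].id != 0 && live[i].addr == p.addr && live[i].id == p.id) {
            live[i].id = 0;
            free((void *)p.addr);
            return;
        }
    }
}

/* Rule (2), propagation: a copied pointer inherits the identifier. */
tagged_ptr uaf_propagate(tagged_ptr src) {
    return src;
}

/* Rule (3), dereference check: the pointer's identifier must match the
 * identifier of the object currently registered at that address. */
bool uaf_check(tagged_ptr p) {
    if (p.id == 0) return false;
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live[i].id != 0 && live[i].addr == p.addr)
            return live[i].id == p.id;
    }
    return false;            /* no live object at this address */
}
```

Keeping the address as a `uintptr_t` lets the check run after the object has been freed without reading freed memory. Whether or not the allocator later reuses the address, a stale `tagged_ptr` fails `uaf_check`, because either no object is registered there or the registered identifier differs.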

step 2-2, optimizing the instrumented LLVM IR files by using the code optimization facility provided by LLVM, wherein the specific optimization rules include:

(1) when a pointer dereference accesses a stack object or a global object, the inserted dereference memory check is eliminated, since such an access cannot involve a heap object Use-After-Free vulnerability;

(2) when a pointer dereferences the same heap object multiple times inside a loop, one dereference memory check is inserted before the loop and the dereference memory checks inside the loop are eliminated;

(3) when several pointer dereferences accessing the same heap object occur in sequentially executed code, only one dereference memory check is kept and the remaining dereference memory checks are eliminated;

step 2-3, generating the instrumented C/C++ program from the resulting LLVM IR files.
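The effect of optimization rule (2) can be sketched by counting how often a check runs. This example is an illustration only: `deref_check` is a hypothetical stand-in that merely counts invocations instead of comparing identifiers, and the hoisting shown is valid only under the assumption that the loop body cannot release the object being accessed.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long checks_executed = 0;  /* counts runtime checks */

/* Stand-in for the inserted dereference check; a real check would
 * compare identifiers, this one only counts how often it runs. */
static void deref_check(const int *p) {
    assert(p != NULL);
    checks_executed++;
}

/* Unoptimized instrumentation: one check per dereference in the loop. */
long sum_naive(const int *obj, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        deref_check(obj);
        sum += obj[i];
    }
    return sum;
}

/* After rule (2): a single check is placed in front of the loop. */
long sum_hoisted(const int *obj, size_t n) {
    long sum = 0;
    deref_check(obj);
    for (size_t i = 0; i < n; i++) {
        sum += obj[i];
    }
    return sum;
}

/* Number of checks executed so far, for comparing the two variants. */
unsigned long checks_so_far(void) {
    return checks_executed;
}
```

For an array of `n` elements the naive variant executes `n` checks while the hoisted variant executes one, which is where the reduction in runtime overhead comes from.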

Step 3, running the instrumented C/C++ program, and executing the instrumentation code while the program runs to compare whether the object identifier of the pointer matches the identifier of the object the pointer currently actually points to, so as to detect whether a heap object Use-After-Free vulnerability occurs, with the following specific steps:

step 3-1, running the instrumented C/C++ program by using a command line tool;

step 3-2, using a test case from the test suite shipped with the C/C++ program, or a test case generated by a fuzzing tool, as the input of the instrumented C/C++ program;

step 3-3, when a pointer dereference occurs in the program, the program executes the instrumented dereference memory check code, which determines whether a heap object Use-After-Free vulnerability occurs by comparing whether the identifier of the heap object associated with the pointer matches the identifier of the heap object the pointer currently actually points to;

step 3-4, if the identifiers match, no Use-After-Free vulnerability has occurred and the program continues to execute; if they do not match, the program crashes and outputs the corresponding Use-After-Free vulnerability diagnostic information, namely the source code positions at which the heap object was allocated, released and accessed.
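The diagnostic information of step 3-4 can be sketched as a record of three source positions that is formatted when a check fails. The report layout below is an assumption made for illustration and is not the format shown in FIG. 5; the names `src_loc`, `object_history` and `format_uaf_report` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A source position, as could be captured with __FILE__ and __LINE__. */
typedef struct {
    const char *file;
    int line;
} src_loc;

/* Hypothetical per-object record of where it was allocated and freed. */
typedef struct {
    src_loc allocated_at;
    src_loc freed_at;
} object_history;

/* Format a report into buf. Returns the number of characters snprintf
 * would have written, excluding the terminating null character. */
int format_uaf_report(char *buf, size_t size,
                      const object_history *h, src_loc access) {
    assert(buf != NULL && h != NULL);
    return snprintf(buf, size,
                    "Use-After-Free detected\n"
                    "  allocated at %s:%d\n"
                    "  freed at %s:%d\n"
                    "  accessed at %s:%d\n",
                    h->allocated_at.file, h->allocated_at.line,
                    h->freed_at.file, h->freed_at.line,
                    access.file, access.line);
}
```

A runtime built along these lines would print the formatted report to standard error and then terminate the process, for example with `abort()`, when the identifier comparison fails.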

The present invention will be described in detail with reference to the following examples and drawings.

Examples

The invention relates to a heap object Use-After-Free vulnerability detection method based on identifier consistency. To detect a heap object Use-After-Free vulnerability, static analysis is first performed on the input C/C++ program to find the positions of code related to heap object allocation, pointer propagation and pointer dereferencing; the C/C++ program is then instrumented at the located code positions to assign the same unique identifier to each allocated heap object and all pointers pointing to it and to insert a memory check for each pointer dereference; finally, the instrumented program is run, and the instrumentation code executed at run time compares whether the object identifier of the pointer matches the identifier of the object the pointer currently actually points to, so as to detect whether the program has a heap object Use-After-Free vulnerability.

In combination with the example, the method includes:

step 1, for the source code of the input C/C++ program, obtaining the positions of code related to heap object allocation, pointer propagation and pointer dereferencing in the program source code, with the following specific steps:

step 1-1, converting the source code of the C/C++ program into LLVM IR files by using LLVM and Clang, wherein code examples in source code form and in LLVM IR form are shown in FIG. 2 and FIG. 3 respectively;

step 1-2, traversing all LLVM IR files and searching for all statements related to heap object allocation, pointer propagation and pointer dereferencing; when the search is finished, the code positions of all such statements in the program are obtained;

step 2, performing code instrumentation on the input C/C++ program at the code positions of all heap object allocation, pointer propagation and pointer dereferencing statements, with the following specific steps:

step 2-1, using the formulated instrumentation rules to instrument heap object allocation, pointer propagation and pointer dereferencing statements respectively, so that the same unique identifier is assigned to each allocated heap object and all pointers pointing to it and a memory check is inserted for each pointer dereference, as shown in FIG. 4;

step 2-2, optimizing the instrumented LLVM IR files according to the formulated optimization rules;

step 2-3, generating the instrumented C/C++ program from the resulting LLVM IR files;

step 3-1, using a command line tool to run the instrumented C/C++ program, such as /a.out, where / is the path at which the program to be run is located and a.out is the name of the program to be run;

step 3-2, optionally using a test case from the test suite shipped with the C/C++ program, or a test case generated by a fuzzing tool, as the input of the instrumented C/C++ program;

step 3-4, if the identifiers match, no Use-After-Free vulnerability has occurred and the program continues to execute; if they do not match, the program crashes and outputs the corresponding Use-After-Free vulnerability diagnostic information, namely the source code positions at which the heap object was allocated, released and accessed. As shown in FIG. 5, line 1 indicates that a heap object Use-After-Free vulnerability is detected; lines 2-7 show the source code position at which the heap object was allocated; lines 8-13 show the source code position at which the heap object was released; lines 14-19 show the source code position at which the heap object was accessed.

Claims

1. A heap object Use-After-Free vulnerability detection method based on identifier consistency, which takes the source code of a C/C++ program as input and outputs the detected Use-After-Free vulnerabilities, characterized by comprising the following specific steps:

step 1, converting the source code of the input C/C++ program into LLVM IR files by using LLVM and Clang, and finding, based on the LLVM IR files, the code positions of heap object allocation, pointer propagation and pointer dereferencing in the input C/C++ program;

step 2, performing code instrumentation on the input C/C++ program so as to assign the same unique identifier to each allocated heap object and all pointers pointing to it and to insert a memory check for each pointer dereference, thereby obtaining the instrumented C/C++ program;

step 3, running the instrumented C/C++ program, and executing the instrumentation code while the program runs to compare whether the object identifier of the pointer matches the identifier of the object the pointer currently actually points to, so as to detect whether a heap object Use-After-Free vulnerability occurs.

2. The heap object Use-After-Free vulnerability detection method based on identifier consistency according to claim 1, wherein in step 1, LLVM and Clang are used to convert the source code of the input C/C++ program into LLVM IR files, and the code positions of heap object allocation, pointer propagation and pointer dereferencing in the input C/C++ program are found based on the LLVM IR files, with the following specific steps:

3. The heap object Use-After-Free vulnerability detection method based on identifier consistency according to claim 1, wherein in step 2, code instrumentation is performed on the input C/C++ program so as to assign the same unique identifier to each allocated heap object and all pointers pointing to it and to insert a memory check for each pointer dereference, thereby obtaining the instrumented C/C++ program, with the following specific steps:

step 2-1, using the code instrumentation facility provided by LLVM to instrument the input C/C++ program;

(1) when a pointer dereference accesses a stack object or a global object, eliminating the inserted dereference memory check;

4. The identifier consistency-based heap object Use-After-Free vulnerability detection method according to claim 3, wherein in step 2-1, the specific instrumentation rules include:

(1) at all code positions of heap object allocation, replacing the called heap object allocation function with a custom heap object allocation function, so that a unique identifier is assigned to each newly allocated heap object when it is allocated and the unique identifier of the heap object is invalidated when it is released;

(2) at all code positions of pointer propagation, inserting corresponding code to propagate the identifier of the heap object to all pointers pointing to it, so that all pointers pointing to the heap object have the same unique identifier as the heap object;

(3) at all code positions of pointer dereferencing, inserting a corresponding memory check to determine whether a heap object Use-After-Free vulnerability occurs when a pointer to the heap object is dereferenced.

5. The heap object Use-After-Free vulnerability detection method based on identifier consistency according to claim 1, wherein in step 3, the instrumented C/C++ program is run, and the instrumentation code is executed while the program runs to compare whether the object identifier of the pointer matches the identifier of the object the pointer currently actually points to, so as to detect whether a heap object Use-After-Free vulnerability occurs, with the following specific steps: