CN113971278A

CN113971278A - Memory vulnerability detection method and device, equipment and storage medium thereof

Info

Publication number: CN113971278A
Application number: CN202010722853.1A
Authority: CN
Inventors: 陈书杰
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Suzhou Software Technology Co Ltd
Priority date: 2020-07-24
Filing date: 2020-07-24
Publication date: 2022-01-25

Abstract

The embodiments of the present application provide a memory vulnerability detection method, apparatus, device, and storage medium. The method comprises: performing syntax analysis on an obtained source program to be detected to generate an abstract syntax tree; extracting feature information of each node in the abstract syntax tree; determining a target node based on the feature information of each node; generating a pointer-related control flow graph based on the target node; and determining a vulnerability node in the source program to be detected based on the pointer-related control flow graph and a preset vulnerability detection rule. In this way, the detection precision and detection efficiency of memory vulnerability detection can be improved.

Description

Memory vulnerability detection method and device, equipment and storage medium thereof

Technical Field

The present application belongs to the field of computer technology, and relates to, but is not limited to, a memory vulnerability detection method, apparatus, device, and storage medium.

Background

With the growing demand for computer applications, the design and development of application programs have become correspondingly more complex, and the number of variables that developers handle during implementation has increased greatly, so that code causing memory leaks is, to a greater or lesser extent, introduced during routine software development. Memory leak defects are covert and cumulative, and are more difficult to detect than other illegal memory access errors, because a memory leak is caused by a memory block not being released and is therefore a defect of omission rather than a defect of commission. In addition, memory leaks typically do not directly produce observable error symptoms, but accumulate gradually, reducing the overall performance of the system and, in extreme cases, potentially causing the system to crash. Therefore, how to detect memory vulnerabilities efficiently and accurately is a technical problem to be solved urgently.

Disclosure of Invention

The embodiment of the application provides a memory vulnerability detection method, a device, equipment and a storage medium thereof, which can improve the detection precision and detection efficiency of memory vulnerability detection.

The technical scheme of the embodiment of the application is realized as follows:

The embodiments of the present application provide a memory vulnerability detection method, which comprises the following steps:

carrying out syntax analysis on the obtained source program to be detected to generate an abstract syntax tree;

extracting characteristic information of each node in the abstract syntax tree;

determining a target node based on the characteristic information of each node;

generating a pointer-related control flow graph based on the target node;

and determining the vulnerability node in the source program to be detected based on the pointer-related control flow graph and a preset vulnerability detection rule.

The embodiments of the present application provide a memory vulnerability detection apparatus, which includes at least a syntax analysis module, a feature extraction module, a first determination module, a first generation module, and a second determination module, wherein:

the syntax analysis module is used for performing syntax analysis on the acquired source program to be detected to generate an abstract syntax tree;

the feature extraction module is used for extracting feature information of each node in the abstract syntax tree;

the first determining module is used for determining a target node based on the characteristic information of each node;

the first generation module is used for generating a pointer-related control flow graph based on the target node;

and the second determining module is used for determining the vulnerability node in the source program to be detected based on the pointer-related control flow graph and a preset vulnerability detection rule.

The embodiment of the present application provides a memory vulnerability detection device, the memory vulnerability detection device at least includes: a memory, a communication bus, and a processor, wherein:

the memory is used for storing a memory vulnerability detection program;

the communication bus is used for realizing connection communication between the processor and the memory;

the processor is configured to execute the memory vulnerability detection program stored in the memory to implement the steps of the memory vulnerability detection method provided in other embodiments.

The embodiment of the application provides a storage medium, wherein a memory vulnerability detection program is stored on the storage medium, and when being executed by a processor, the memory vulnerability detection program realizes the steps of the memory vulnerability detection method provided by other embodiments.

The embodiments of the present application provide a memory vulnerability detection method, apparatus, device, and storage medium. Syntax analysis is first performed on an obtained source program to be detected to generate an abstract syntax tree; feature information of each node in the abstract syntax tree is then extracted; target nodes, which at least include pointer definition nodes, are determined based on the feature information of each node; a pointer-related control flow graph (PCFG) is generated based on the target nodes; and a vulnerability node with a memory leak in the source program to be detected is determined based on the pointer-related control flow graph and a preset vulnerability detection rule. In this way, the features of memory vulnerabilities are formally described based on the PCFG, vulnerability determination rules are abstracted from that formal description, and the efficiency of memory-related vulnerability detection is effectively improved.

Drawings

Fig. 1 is a schematic diagram of an implementation flow of a memory vulnerability detection method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of an implementation flow of generating a pointer-related control flow graph according to an embodiment of the present application;

Fig. 3 is a schematic diagram of another implementation flow of the memory vulnerability detection method according to an embodiment of the present application;

Fig. 4 is a schematic diagram of an implementation framework of the memory vulnerability detection method according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a memory vulnerability detection apparatus according to an embodiment of the present application;

Fig. 6 is a schematic diagram of the composition structure of the memory vulnerability detection apparatus according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application clearer, specific technical solutions of the present application will be described in further detail below with reference to the accompanying drawings in the embodiments of the present application. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

In the following description, the terms "first/second/third" merely distinguish similar objects and do not denote a particular order. Where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.

Before the embodiments of the present application are described in further detail, the terms and expressions used in them are explained as follows.

1) A memory vulnerability, which may also be referred to as a memory leak, refers to a condition in which dynamically allocated memory is not released after use.

2) Vulnerability feature σ: σ(VulType) = {N, C}, where VulType represents a vulnerability type and N represents the vulnerability-related danger points, with N = {N1, N2, …, Ni}, i = 0, 1, 2, …. For a program Prog, all program statements containing the risk factor ξ are referred to as vulnerability-related danger points N. C represents the vulnerability-related constraints, that is, the constraint relations between elements of the vulnerability-related danger points N. In the embodiments of the present application, C is defined as C = C1 || C2 || C3 || … || Cj || …, j = 0, 1, 2, …, with Cj = BCj ∧ ACj. That is, the feature of a vulnerability may include multiple vulnerability-related constraints, and any vulnerability-related constraint Cj consists of two parts, a basic constraint BCj and an additional constraint ACj. The basic constraint BCj represents the basic conditions that must be satisfied for the vulnerability to arise, and the additional constraint ACj represents the additional conditions that must be satisfied for the vulnerability to arise.

3) Pointer-related Control Flow Graph (PCFG): the PCFG includes only nodes for pointer definition, pointer access, memory allocation and release, together with the related nodes that preserve the complete structure of the program, and each node in the graph has a feature attribute. PCFG = (N, E, Entry, Exit), where N is the set of nodes, and each node n in N has the form n = (id, lnum, nextn_id, nf). Here id represents the number of node n in the PCFG, lnum represents the line number of the current node in the source program, nextn_id represents the id of the next node it points to in the PCFG, and nf represents the node feature. E is the set of edges representing the directed relationships between nodes; each edge is an ordered node pair <ni, nj> representing a possible control flow from node ni to node nj. Entry is the entry node of the PCFG and Exit is the exit node of the PCFG.

4) Executable path (Path): in the PCFG, Path = <n0, n1, n2, …, nm> is a statement sequence; if <n(i-1), ni> ∈ E holds for i = 1, 2, …, m, then Path is an executable path of the PCFG.

5) Data dependency (DataDep): for any two nodes nx, ny on the PCFG, if there is an executable path from node nx to node ny, a variable var is defined in node nx and used in node ny, and var is not redefined anywhere else on the path from nx to ny, then node ny is said to be data dependent on node nx, denoted as DataDep(nx, ny, var).

6) Successor relationship (Suc): for any two nodes nx, ny on the PCFG, if an executable path from node nx to node ny exists, node ny is called a successor node of node nx, denoted as Suc(nx, ny).

7) Pointer definition DPV(pv): the declaration and definition of a pointer variable pv is denoted by DefPointerVariable(pv), abbreviated in the embodiments of the present application as DPV(pv).

8) Memory allocation function MMF(pv): memory allocation for a pointer variable pv is denoted by MemoryMallocFunction(pv), abbreviated MMF(pv). Common memory allocation functions include malloc, calloc, and realloc in the C language, and new and new[] in the C++ language.

9) Memory release function MFF(pv): releasing the memory pointed to by a pointer pv is denoted by MemoryFreeFunction(pv), abbreviated MFF(pv). Common memory release functions include free in the C language, and delete and delete[] in the C++ language. In the embodiments of the present application, a match between the allocation function MMF(pv) and the release function MFF(pv), and a mismatch between them, are each denoted by a dedicated relation symbol.

10) Pointer use UPV(pv): the use of a pointer variable pv is denoted by UsePointerVariable(pv), abbreviated UPV(pv).

11) Operation containment relationship: Has(n, Operation/Function) indicates that the current statement n contains the specified operation or function. Operation includes DPV(pv), UPV(pv), and the like, and Function includes MMF(pv), MFF(pv), and the like. For example, Has(n, DPV(pv)) indicates that statement n in the program Prog defines the pointer variable pv.

12) Alias relationship: for an executable path in a program, if the path contains a pointer variable pv and a pointer variable pv', and pv' and pv point to the same memory address, then pv' and pv are said to have an alias relationship, abbreviated IsAlias(pv, pv').

13) Abstract Syntax Tree (AST): a data structure that represents program statements in the form of a tree.
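The PCFG, executable path, successor, and data dependency definitions in terms 3) to 6) can be illustrated with a small Python sketch. This is only one possible reading, not the implementation of the application: the class and method names are hypothetical, `nextn_id` is assumed to be a list so that branches fit, the `defs` and `uses` fields are added for illustration, and `data_dep` checks a single given path rather than searching for one.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set, Tuple


@dataclass
class PCFGNode:
    id: int               # number of the node in the PCFG
    lnum: int             # line number in the source program
    nextn_id: List[int]   # ids of the next nodes (a list is assumed, to allow branches)
    nf: str               # node feature, e.g. "DECL_pointer1"
    defs: Set[str] = field(default_factory=set)  # hypothetical: variables defined here
    uses: Set[str] = field(default_factory=set)  # hypothetical: variables used here


@dataclass
class PCFG:
    nodes: Dict[int, PCFGNode] = field(default_factory=dict)  # N
    edges: Set[Tuple[int, int]] = field(default_factory=set)  # E, ordered pairs <ni, nj>
    entry: int = -1
    exit: int = -1

    def add(self, node: PCFGNode) -> None:
        self.nodes[node.id] = node
        self.edges.update((node.id, nxt) for nxt in node.nextn_id)

    def is_executable_path(self, path: Sequence[int]) -> bool:
        """Every consecutive pair of the sequence must be an edge in E."""
        return all((a, b) in self.edges for a, b in zip(path, path[1:]))

    def suc(self, nx: int, ny: int) -> bool:
        """Suc(nx, ny): ny is reachable from nx over at least one edge."""
        seen: Set[int] = set()
        queue = deque(self.nodes[nx].nextn_id)
        while queue:
            n = queue.popleft()
            if n == ny:
                return True
            if n not in seen and n in self.nodes:
                seen.add(n)
                queue.extend(self.nodes[n].nextn_id)
        return False

    def data_dep(self, path: Sequence[int], var: str) -> bool:
        """DataDep(path[0], path[-1], var), checked along one given executable path."""
        if len(path) < 2 or not self.is_executable_path(path):
            return False
        first, last = self.nodes[path[0]], self.nodes[path[-1]]
        if var not in first.defs or var not in last.uses:
            return False
        # var must not be redefined between the defining and the using node.
        return all(var not in self.nodes[n].defs for n in path[1:-1])


g = PCFG(entry=0, exit=2)
g.add(PCFGNode(0, 3, [1], "DECL_pointer1", defs={"pointer1"}))
g.add(PCFGNode(1, 4, [2], "COND"))
g.add(PCFGNode(2, 9, [], "MALLOC_pointer1", uses={"pointer1"}))
```

In this toy graph, the sequence 0, 1, 2 is an executable path, node 2 is a successor of node 0, and node 2 is data dependent on node 0 through `pointer1`.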

To better understand the embodiments of the present application, memory vulnerability detection methods in the related art and their shortcomings are described first.

The existing memory vulnerability detection method mainly comprises the following two implementation modes:

First implementation: conceptual feature analysis. Conceptual feature analysis mainly refers to summarizing the conceptual features of a vulnerability from one or more perspectives. This method considers the features of a vulnerability to include the objects affected by the vulnerability, the effect of the vulnerability, the manner in which the effect is caused, and the type of input that causes the effect. The objects affected by a vulnerability mainly include user files, stack data, system directories, system programs, system devices, and the like; the effects of a vulnerability mainly include replacement, modification, readability, appendability, creatability, locking, and the like; the manners of causing the effect mainly include configuration errors, changes to environment variables, improper execution of a system protection mechanism, inheritance of unnecessary permissions, and the like; the types of input that cause the effect include temporary files, environment variables, user commands, configuration files, system call parameters, and the like. Conceptual feature analysis helps to understand vulnerability features from an overall, macroscopic perspective and provides important guidance for vulnerability research.

Second implementation: formalized feature analysis. Formalized feature analysis builds on the conceptual features of a vulnerability, defines a formal description method, and describes the vulnerability features more precisely. There are two classical lines of research in formalized feature analysis:

(1) Formalized feature analysis based on the code property graph. This approach first fuses the abstract syntax tree, the control flow graph, and the data flow graph into a code property graph. Based on the code property graph, the vulnerability feature is abstracted into a triple (Ssrc, Sdst, Ssan), where Ssrc is a syntactic description of the source information controlled by an attacker, Sdst is a description of the security-sensitive operation, and Ssan is a description of the sanitization operation. If there is no qualified Ssan on the way from the source Ssrc to the sensitive operation Sdst, the program may have a vulnerability.

(2) Formalized feature analysis based on feature operations. This approach takes the feature operations involved in the formation of a vulnerability as its entry point, and is the formalized feature analysis method currently considered by many vulnerability researchers.

The two methods mainly have the following defects:

First, research on software vulnerability feature analysis shows that existing conceptual feature analysis helps to understand and study vulnerability features from an overall, macroscopic perspective, but is difficult to apply to concrete vulnerability discovery and detection.

Second, existing formalized vulnerability feature analysis has two problems. On the one hand, many formalized feature analysis methods can analyze the features of only one vulnerability or one class of vulnerabilities and do not extend well, so different formal description methods must be defined for the features of different vulnerabilities; a uniform standard is lacking, which hinders practical application. On the other hand, formalized feature analysis is highly abstract, and the resulting traversal rules are very complex, which makes the rules hard to understand and apply.

In view of the foregoing problems, an embodiment of the present application provides a memory vulnerability detection method, which is applied to a memory vulnerability detection device, where the memory vulnerability detection device may be a terminal device with computing capability and communication capability, such as a desktop computer, a notebook computer, a tablet computer, and the like. Fig. 1 is a schematic diagram of an implementation flow of a memory vulnerability detection method according to an embodiment of the present application, and as shown in fig. 1, the method includes:

Step S101, performing syntax analysis on the obtained source program to be detected to generate an abstract syntax tree.

A source program, also referred to as source code, is an uncompiled text file written in accordance with a certain programming language specification, which is a series of human-readable computer language instructions. Since the machine can only execute the binary instructions, when executing the source program, the compiler needs to compile the source program to obtain the binary instructions. An abstract syntax tree is an abstract representation of the syntax structure of a source program. The syntax structure of the programming language is represented in the form of a tree, each node on the tree representing a structure in the source code.

In step S101, the front end of the GCC compiler may be used to parse the source program and generate an abstract syntax tree file in the .tu format.

Step S102, extracting feature information of each node in the abstract syntax tree.

For user-defined variables and user-defined functions, the specific variable or function names do not matter; only the role that the variable or function plays in the code does. Therefore, after the abstract syntax tree is generated from the source program, the node identifiers (that is, the node names) of all nodes in the abstract syntax tree can be symbolized to obtain symbolized node identifiers, and feature information is then extracted based on the symbolized node identifiers.

When step S102 is implemented, a suitable keyword may be selected from a defined keyword table according to the node identifier of each node and the operation implemented by the node, and the feature information of each node is represented in the form "keyword_variable" or "keyword_function". For example, if the node identifier of a node is pointer1_decl, the identifier shows that the node is a definition and declaration node of a pointer variable; the keyword corresponding to the definition and declaration of a variable is DECL, so the feature information of the node is DECL_pointer1.

Step S103, determining a target node based on the feature information of each node.

Here, the target node includes at least a pointer definition node. The operation realized by each node can be determined through the characteristic information of each node, so that the pointer definition node can be determined, and the pointer definition node can be determined as the target node.

Step S104, generating a pointer-related control flow graph based on the target node.

Here, when step S104 is implemented, the target node may be taken as an initial node, and each child node of the initial node is then traversed in the abstract syntax tree; if a child node of the initial node is a pointer access node, a memory allocation node, or a memory release node, the child node is added to the pointer-related control flow graph.

It should be noted that, in the embodiment of the present application, each child node of the initial node refers to a direct child node and an indirect child node of the initial node.

Step S105, determining a vulnerability node in the source program to be detected based on the pointer-related control flow graph and a preset vulnerability detection rule.

Here, when step S105 is implemented, each executable path in the pointer-related control flow graph may first be obtained, and it is then determined whether each node in the executable path satisfies the preset vulnerability detection rule; a node that satisfies the vulnerability detection rule is determined to be a vulnerability node. In the embodiments of the present application, the vulnerability detection rule may include a missed-release memory constraint condition, a repeated-allocation memory constraint condition, and an allocation-release mismatch constraint condition, and a node that satisfies at least one of these three constraint conditions is considered to satisfy the vulnerability detection rule.

In the memory vulnerability detection method provided by the embodiments of the present application, syntax analysis is first performed on an obtained source program to be detected to generate an abstract syntax tree; feature information of each node in the abstract syntax tree is then extracted; target nodes, which at least include pointer definition nodes, are determined based on the feature information of each node; a pointer-related control flow graph (PCFG) is generated based on the target nodes; and a vulnerability node with a memory leak in the source program to be detected is determined based on the pointer-related control flow graph and a preset vulnerability detection rule. In this way, the features of memory vulnerabilities are formally described based on the PCFG, vulnerability determination rules are abstracted from that formal description, and the efficiency of memory-related vulnerability detection is effectively improved.

In some embodiments, step S101 shown in Fig. 1, "performing syntax analysis on the obtained source program to be detected to generate an abstract syntax tree", may be implemented by the following steps:

Step S1011, marking each variable in the source program to be detected to obtain a marked source program to be detected.

Here, the PCFG includes only nodes for pointer definition, pointer access, memory allocation and release, together with the related nodes that preserve the complete structure of the program, so pointer-type variables must be distinguished from common variables: a pointer-type variable is marked as a first-type variable, and a non-pointer-type variable is marked as a second-type variable. For example, a pointer variable is marked pointer_decl and a common variable is marked var_decl.

Step S1012, performing semantic analysis on the marked source program to be detected by using a compiler, and generating an abstract syntax tree.

Through the abstract syntax tree generated in the above steps S1011 to S1012, the pointer type variable and the common type variable are already distinguished, so that when a pointer-related control flow graph is subsequently generated, targeted extraction and generation can be performed according to the variable type, and the generation efficiency can be improved.

In some embodiments, the step S102 "extracting feature information of each node in the abstract syntax tree" can be implemented by:

Step S1021, performing symbolization processing on the node identifier of each node in the abstract syntax tree to obtain a symbolized node identifier.

Here, in an actual implementation procedure, the above step S1021 can be implemented by the following steps S211 to S213:

step S211, obtaining node identifiers of each node in the abstract syntax tree.

Here, the node identifier of each node in the abstract syntax tree generated from the source program reflects only the operation implemented by the node. For example, if one line of code in the source program is int test(int x), the code is a function definition statement, and the node corresponding to this line in the abstract syntax tree is identified as function_decl; if another line of code is char *data, the code is a pointer variable definition statement, and the node corresponding to this line in the abstract syntax tree is identified as pointer_decl.

Step S212, when the node identifier is a user-defined variable name, mapping the node identifier to a preset symbolized variable name.

Here, in the embodiments of the present application, symbolized variable names may be set in advance. User-defined non-pointer variable names are mapped to symbolized non-pointer variable names, for example var1, var2, var3, and so on, and user-defined pointer variable names are mapped to symbolized pointer variable names, for example pointer1, pointer2, pointer3, and so on.

Continuing the above example, in this step the node identifier pointer_decl is mapped to pointer1_decl.

Step S213, when the node identifier is a user-defined function name, mapping the node identifier to a preset symbolized function name.

Here, the preset symbolized function names may be fun1, fun2, fun3, and so on. Continuing the above example, in this step the node identifier function_decl is mapped to fun1_decl.

Step S1022, determining feature keywords corresponding to each node in the abstract syntax tree.

In the embodiment of the present application, the corresponding relationship between the node identifier and the feature keyword may be determined according to the operation actually implemented by the code, for example, if the operation implemented by the code is a variable declaration operation, the corresponding feature keyword is DECL; if the operation actually realized by the code is the memory allocation operation, the corresponding characteristic keyword is MALLOC; and if the operation actually realized by the code is the condition judgment operation, the corresponding characteristic keyword is COND.

Step S1023, based on the symbolized node identification and the feature keyword of each node, determines feature information of each node.

Here, when step S1023 is implemented, the feature information of each node is represented in the form "keyword_symbolized node identifier". For example, for the node fun1_decl, the symbolized node identifier is fun1 and the corresponding feature keyword is DECL, so the feature information of the node is DECL_fun1.
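Steps S1021 to S1023 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the implementation of the application: the `Symbolizer` class and the `feature_info` helper are hypothetical names, and the caller is assumed to already know whether an identifier is a pointer, a non-pointer variable, or a function.

```python
from typing import Dict


class Symbolizer:
    """Maps user-defined names to preset symbolized names (pointer1, var1, fun1, ...)."""

    def __init__(self) -> None:
        # One table per identifier kind, filled in order of first appearance.
        self._maps: Dict[str, Dict[str, str]] = {"pointer": {}, "var": {}, "fun": {}}

    def symbolize(self, name: str, kind: str) -> str:
        table = self._maps[kind]
        if name not in table:
            table[name] = f"{kind}{len(table) + 1}"
        return table[name]


def feature_info(keyword: str, symbol: str) -> str:
    """Combine a feature keyword with a symbolized identifier, e.g. DECL_pointer1."""
    return f"{keyword}_{symbol}"


sym = Symbolizer()
# "char *data" declares a pointer; its feature information becomes DECL_pointer1.
info = feature_info("DECL", sym.symbolize("data", "pointer"))
```

Because the mapping is keyed by the original name, a later use of `data` resolves to the same symbol `pointer1`, while a different pointer receives `pointer2`.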

In some embodiments, the step S104 "generating a pointer-related control flow graph based on the target node" may be implemented by steps S1041 to S1043 as described in fig. 2, which are described below in conjunction with the steps.

Step S1041, determining each target node as an initial node of the pointer-related control flow graph.

Here, since pointer access, memory allocation, or memory release all require that a pointer is defined first, in the embodiment of the present application, a pointer definition node (i.e., a target node) is used as an initial node.

Step S1042, based on the abstract syntax tree and the characteristic information of each node in the abstract syntax tree, determining the target child node of each initial node.

Here, the target child nodes are those child nodes of the initial node that perform pointer access, memory allocation, or memory release, or that preserve the complete structure of the program; that is, the target child nodes include pointer access nodes, memory allocation nodes, memory release nodes, and nodes that preserve the complete structure of the program.

Step S1043, adding the target child node to the pointer-related control flow graph.

Here, after all child nodes of the initial node in the abstract syntax tree are traversed, a final pointer-related control flow graph is obtained.
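The traversal in steps S1041 to S1043 can be sketched in Python as follows. The sketch only collects the feature information of the retained nodes and does not build edges. The `AstNode` type and the `USE` and `FREE` keywords are assumptions made for illustration (the text names only DECL, MALLOC, and COND), and COND stands in for the nodes that preserve program structure.

```python
from dataclasses import dataclass, field
from typing import List

# Feature prefixes that are kept in the PCFG (USE_ and FREE_ are hypothetical keywords).
KEEP = ("DECL_pointer", "USE_pointer", "MALLOC_pointer", "FREE_pointer", "COND")


@dataclass
class AstNode:
    feature: str
    children: List["AstNode"] = field(default_factory=list)


def collect_pcfg_nodes(root: AstNode) -> List[str]:
    """Start at each pointer definition node and keep its direct and indirect
    child nodes whose feature matches one of the KEEP prefixes."""
    kept: List[str] = []

    def visit_descendants(node: AstNode) -> None:
        for child in node.children:
            if child.feature.startswith(KEEP):
                kept.append(child.feature)   # step S1043: add the target child node
            visit_descendants(child)         # indirect child nodes are visited too

    def find_targets(node: AstNode) -> None:
        if node.feature.startswith("DECL_pointer"):
            kept.append(node.feature)        # step S1041: initial node
            visit_descendants(node)          # step S1042: find target child nodes
        else:
            for child in node.children:
                find_targets(child)

    find_targets(root)
    return kept


tree = AstNode("DECL_fun1", [
    AstNode("DECL_pointer1", [
        AstNode("MALLOC_pointer1"),
        AstNode("DECL_var1"),
        AstNode("COND", [AstNode("FREE_pointer1")]),
    ]),
])
result = collect_pcfg_nodes(tree)
```

In this toy tree, the non-pointer declaration `DECL_var1` and the enclosing function declaration are dropped, and the remaining four nodes are kept in traversal order.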

In some embodiments, the preset vulnerability detection rule includes at least one of a missed-release memory constraint condition, a repeated-allocation memory constraint condition, and an allocation-release mismatch constraint condition, and correspondingly, the step S105 "determining a vulnerability node in the source program to be detected based on the pointer-related control flow graph and the preset vulnerability detection rule" may be implemented by:

step S1051, determining each executable path in the pointer-related control flow graph.

Here, in the pointer-related control flow graph, a statement sequence formed by m nodes is considered an executable path if an edge exists between every two adjacent nodes in the sequence. In step S1051, the executable paths in the pointer-related control flow graph are determined in turn.

Step S1052, acquiring feature information of each node to be detected in each executable path.

Step S1053, determining, based on the feature information of each node to be detected, whether each node to be detected in the executable path satisfies the vulnerability detection rule.

Here, when a node to be detected satisfies the vulnerability detection rule, that is, at least one of the missed-release memory constraint condition, the repeated-allocation memory constraint condition, and the allocation-release mismatch constraint condition, the node is considered to satisfy the preset vulnerability detection rule, and step S1054 is performed. When a node to be detected satisfies none of the three constraint conditions, the node is determined not to satisfy the vulnerability detection rule, and the next node is examined, until all nodes to be detected in the pointer-related control flow graph have been traversed.
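One way to determine the executable paths in step S1051 is a depth-first search from the entry node to the exit node, sketched below in Python. The adjacency-list representation and the function name are assumptions for illustration, and the sketch does not revisit a node within a path, so back edges of loops are not followed.

```python
from typing import Dict, List


def executable_paths(adj: Dict[int, List[int]], entry: int, exit_: int) -> List[List[int]]:
    """Enumerate simple paths from entry to exit by depth-first search."""
    paths: List[List[int]] = []

    def dfs(node: int, path: List[int]) -> None:
        if node == exit_:
            paths.append(path)
            return
        for nxt in adj.get(node, []):
            if nxt not in path:          # skip nodes already on this path
                dfs(nxt, path + [nxt])

    dfs(entry, [entry])
    return paths


# A graph with one branch: node 1 leads to either node 2 or node 3.
adj = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}
paths = executable_paths(adj, 0, 4)
```

For this graph the search yields two paths, one through each branch, and every consecutive pair in each path is an edge of the graph.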

In the embodiment of the present application, the missed-release memory constraint condition, the repeated-allocation memory constraint condition, and the allocation-release mismatch constraint condition each include a basic constraint condition and an additional constraint condition, which are described below.

The basic constraint condition of the missed-release memory constraint condition is that a pointer definition node ni exists in the executable path, a memory allocation node nj exists among the successor nodes of ni, and the data of nj depends on the pointer variable pv defined by ni. The additional constraint condition of the missed-release memory constraint condition is that a first node nk or a second node nu does not exist among the successor nodes of nj, wherein the first node nk is a memory release node whose data depends on the pointer variable pv defined by ni; the second node nu defines a pointer variable pv' that is in an alias relationship with pv, and nu has a successor node nv (nv is also a successor node of nj) that is a memory release node whose data depends on the pointer variable pv' defined by nu.

The basic constraint condition of the repeated-allocation memory constraint condition is that a pointer definition node ni exists in the executable path, a memory allocation node nj exists among the successor nodes of ni, and the data of nj depends on the pointer variable pv defined by ni. The additional constraint condition of the repeated-allocation memory constraint condition is that a first node nk exists among the successor nodes of nj, wherein nk is a memory allocation node whose data depends on the pointer variable pv defined by ni; or that a node nu exists in the executable path, wherein nu defines a pointer variable pv' that is in an alias relationship with pv, and nu has a successor node nv (nv is also a successor node of nj) that performs a memory allocation operation and whose data depends on the pointer variable pv'.

The basic constraint condition of the allocation-release mismatch constraint condition is that a pointer definition node ni exists in the executable path, a memory allocation node nj exists among the successor nodes of ni, and the data of nj depends on the pointer variable pv defined by ni. The additional constraint condition of the allocation-release mismatch constraint condition is that a node nk exists among the successor nodes of nj, wherein nk performs a memory release operation, the release function does not match the allocation function, and the data of nk depends on the pointer variable pv; or that a node nu exists in the executable path, wherein nu defines a pointer variable pv' that is in an alias relationship with pv, and nu has a successor node nv (nv is also a successor node of nj) that performs a memory release operation whose release function does not match the allocation function and whose data depends on the pointer variable pv'.

Step S1054, determining the node to be detected as a vulnerability node with memory leakage.

Here, because memory leakage is caused by a missed release of memory, a repeated allocation of memory, or a mismatch between allocation and release, a node to be detected that satisfies at least one of the missed-release memory constraint condition, the repeated-allocation memory constraint condition, and the allocation-release mismatch constraint condition is determined to be a vulnerability node with memory leakage. In this way, nodes at which a memory leak may exist are detected, which supports the comprehensiveness and accuracy of the detection.
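Steps S1052 to S1054 can be sketched as follows for a single executable path. This is one possible reading of the three constraint conditions, not the implementation of the embodiment: the Node fields, the rule names, and the MATCHING_RELEASE table of paired allocation and release functions are all assumptions introduced for illustration, and the missed-release check treats a release through either pv or an alias pv' as sufficient.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Node(NamedTuple):
    lnum: int                       # line number in the source program
    kind: str                       # "def", "alloc" or "free"
    var: str                        # pointer variable the node depends on
    func: str = ""                  # allocation or release function name, if any
    alias_of: Optional[str] = None  # for "def" nodes: the pointer this one aliases


# Assumed pairing of allocation and release functions (not listed in the source text).
MATCHING_RELEASE = {"malloc": "free", "calloc": "free", "new": "delete", "new[]": "delete[]"}


def check_path(path: List[Node]) -> List[Tuple[str, int]]:
    """Return (rule, line number) findings for one executable path."""
    findings: List[Tuple[str, int]] = []
    for i, ni in enumerate(path):
        if ni.kind != "def" or ni.alias_of is not None:
            continue
        # pv together with every alias pv' defined later on the path
        names: Set[str] = {ni.var}
        for n in path[i + 1:]:
            if n.kind == "def" and n.alias_of in names:
                names.add(n.var)
        allocs = [j for j in range(i + 1, len(path))
                  if path[j].kind == "alloc" and path[j].var == ni.var]
        if not allocs:
            continue  # basic constraint not met: no allocation nj depends on pv
        j = allocs[0]
        nj = path[j]
        later = [n for n in path[j + 1:] if n.var in names]
        if not any(n.kind == "free" for n in later):
            findings.append(("missed-release", nj.lnum))
        for n in later:
            if n.kind == "alloc":
                findings.append(("repeated-allocation", n.lnum))
            elif n.kind == "free" and n.func != MATCHING_RELEASE.get(nj.func):
                # An allocator missing from the table is also reported as a mismatch.
                findings.append(("allocation-release-mismatch", n.lnum))
    return findings


# Hypothetical paths, one per constraint condition.
leaky = [Node(1, "def", "p"), Node(2, "alloc", "p", "malloc")]
repeated = [Node(1, "def", "p"), Node(2, "alloc", "p", "malloc"),
            Node(3, "alloc", "p", "malloc"), Node(4, "free", "p", "free")]
mismatch = [Node(1, "def", "p"), Node(2, "alloc", "p", "new"),
            Node(3, "def", "q", alias_of="p"), Node(4, "free", "q", "free")]
```

In the third path the release is performed through the alias q, so the mismatch is found only because aliases of pv are tracked alongside pv itself.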

In some embodiments, as shown in fig. 3, after step S105, the method further comprises:

Step S106, determining the position information of the vulnerability node in the source program to be detected.

After the vulnerability node is determined, the position information of the vulnerability node in the source program to be detected can be determined according to the identifier of the vulnerability node. In the embodiment of the present application, the position information may be represented by a line number.

In some embodiments, the code statement corresponding to the vulnerability node may also be determined.

Step S107, outputting the vulnerability node and the position information.

Here, in the implementation of step S107, the vulnerability node and the position information may be output and displayed in a highlighted color, for example yellow or red, on the display interface of the memory vulnerability detection apparatus. In some embodiments, a preset vulnerability identification may be output at the position of the vulnerability node in the source program, or a voice prompt may be output together with the vulnerability node and position information, so as to alert the user that a memory vulnerability has been detected and allow the user to attend to and repair it in time.
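As a small illustration of steps S106 and S107, the sketch below formats one finding with its line number and optionally wraps it in a red highlight. The use of ANSI escape codes for a text terminal, the function name format_finding, and the message layout are assumptions; the embodiment does not specify how the display interface renders the highlight.

```python
RED = "\033[31m"   # ANSI escape sequence for red foreground text
RESET = "\033[0m"  # ANSI escape sequence that restores the default style


def format_finding(lnum: int, statement: str, rule: str, color: bool = True) -> str:
    """Render one vulnerability node together with its position information."""
    text = f"line {lnum}: {statement}  [{rule}]"
    return f"{RED}{text}{RESET}" if color else text


# Hypothetical finding for a statement on line 7 of the source program.
plain = format_finding(7, "data = malloc(10);", "missed-release", color=False)
highlighted = format_finding(7, "data = malloc(10);", "missed-release")
```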

Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.

The embodiment of the application provides a memory-related vulnerability detection method based on feature analysis. Lexical, grammatical, and semantic features related to memory vulnerabilities are obtained by analyzing code segments that contain memory-related vulnerabilities, and these features are merged into a pointer-related control flow graph. Based on the pointer-related control flow graph, formalized feature descriptions are given for three vulnerabilities, namely memory leakage, repeated release, and reuse after release, and vulnerability determination rules are abstracted from these formalized descriptions, so that the efficiency of memory-related vulnerability detection can be effectively improved.

Fig. 4 is a schematic diagram of an implementation framework of the memory vulnerability detection method provided in the embodiment of the present application. As shown in fig. 4, the framework of the memory vulnerability detection method based on feature analysis mainly includes: an AST generation module 401, a node feature extraction module 402, a PCFG generation module 403, and a vulnerability determination module 404. Each module is described below.

The AST generation module 401: when implemented, this module may parse the source code using the front end of the GNU Compiler Collection (GCC) to generate an abstract syntax tree file in .tu format.

It should be noted that, because the PCFG includes only pointer definitions, pointer accesses, memory allocation and release, and the related nodes needed to keep the program structure complete, pointer type variables and normal variables are distinguished when the GCC compiler is used to generate the AST: the node name of a pointer variable in the AST is pointer_decl, and the node name of a normal variable is var_decl. Extracting the PCFG of a program directly from a text-formatted AST is inefficient, so the AST is processed further and stored in a designed data structure, which facilitates traversal of the program AST. The data structure of the nodes in the abstract syntax tree is designed as the following code segment:

The data structure design of the node attributes is shown as the following code segment:

The node feature extraction module 402 mainly extracts the feature information of program nodes from the AST file. Node feature extraction is based on source code symbolization, so the program source code must first be symbolized. When the source code is analyzed, the specific names of user-defined variables and user-defined functions are not of interest; only the role that a variable or function plays in the code matters. The source code can therefore be subjected to code symbolization processing.

The code symbolization processing of the source code needs to be realized by the following steps:

Step 421, removing non-ASCII characters and comments from the code;

Step 422, mapping the user-defined variables into a symbolized variable name set one by one.

In actual implementation, the symbolized variable names may be var1, var2, and so on.

Step 423, mapping the user-defined functions to the symbolic function name set one by one.

In actual implementation, the symbolized function names may be fun1, fun2, and so on.
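Steps 421 to 423 can be sketched as below. The sketch assumes the names of the user-defined variables and functions are already known (for example from the AST) and are passed in as lists; the function name symbolize is hypothetical, and the regular-expression comment removal is a simplification that does not account for comment markers inside string literals.

```python
import re
from typing import Dict, Iterable


def symbolize(source: str, variables: Iterable[str], functions: Iterable[str]) -> str:
    # Step 421: strip /* ... */ and // comments, then drop non-ASCII characters.
    text = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    text = re.sub(r"//[^\n]*", "", text)
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Steps 422 and 423: one-to-one maps to var1, var2, ... and fun1, fun2, ...
    mapping: Dict[str, str] = {}
    for n, name in enumerate(variables, start=1):
        mapping[name] = f"var{n}"
    for n, name in enumerate(functions, start=1):
        mapping[name] = f"fun{n}"
    if not mapping:
        return text
    # Whole-word matching so that a name is not replaced inside a longer identifier.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


# Hypothetical input, unrelated to vulnerability instance 1.
sample = "/* demo */\nint test(int x) { return x + 1; } // done\n"
result = symbolize(sample, ["x"], ["test"])
```

Passing the names explicitly keeps the sketch independent of any particular parser; a separate name set for pointer variables could be added in the same way.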

The following illustrates the code symbolization process. First, the source code of vulnerability instance 1 is shown as the following code segment:

In the above code segment, a user-defined function test() and two user-defined variables, x and data, are included. During code symbolization, non-ASCII characters and comments are first removed from the source code, that is, the comment "/* custom function test() */" in the first line is deleted. The user-defined variables and functions are then mapped: the variables x and data are mapped to var1 and pointer1 respectively, and the function test is mapped to fun1, which yields the following symbolized code segment:

Since the AST of the source code has already been generated by the AST generation module, the features of the AST nodes are extracted following the idea of code symbolization. In essence, the key information of each node is extracted and the result is used as the feature of the corresponding PCFG node.

The implementation process of the node feature extraction can be realized by the following steps:

Step 31, mapping the user-defined variables into a symbolized variable name set one by one.

Here, the symbolized variable names may be var1, var2, var3, and the like.

Step 32, mapping the user-defined functions into a symbolized function name set one by one.

Here, the symbolizing function names may be fun1, fun2, fun3, and the like.

Step 33, based on the operation implemented by the current node, selecting a suitable keyword from the defined keyword table and expressing the node feature in the form "keyword_variable" or "keyword_function".

Node feature extraction is described below with reference to Table 1.

TABLE 1 node feature extraction

In table 1, the first column "source code statement" is a program source code statement, the second column "source code symbolization processing" is a result of symbolizing a source code, the third column "AST node name" is a node name corresponding to the current source code statement in the AST, and the fourth column "node feature extraction" is key information obtained by processing the AST based on the source code symbolization processing.

As can be seen from the "node feature extraction" results in the fourth column of Table 1, after feature extraction is performed on the AST nodes, the functions and variables in the source program are mapped to uniform symbols. For example, the feature extracted from "char *data" is DECL_pointer1: the feature records only that the statement declares a pointer variable, which is named pointer1, and the original name of the variable is ignored.
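Steps 31 to 33 can be sketched as a lookup in a keyword table followed by symbol assignment. Only the feature DECL_pointer1 and the AST node names pointer_decl and var_decl come from the text; the CALL keyword, the call_expr entry, the use of DECL for var_decl, and the Symbols helper class are assumptions added for illustration.

```python
from typing import Dict

# Keyword table. Only DECL for pointer_decl is attested in the text;
# the var_decl and call_expr entries are hypothetical.
KEYWORDS: Dict[str, str] = {"pointer_decl": "DECL", "var_decl": "DECL", "call_expr": "CALL"}


class Symbols:
    """Assigns pointer1, var1, fun1, ... in order of first appearance."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}
        self._counts: Dict[str, int] = {}

    def symbol(self, name: str, kind: str) -> str:
        if name not in self._names:
            self._counts[kind] = self._counts.get(kind, 0) + 1
            self._names[name] = f"{kind}{self._counts[kind]}"
        return self._names[name]


def node_feature(ast_node_name: str, name: str, kind: str, symbols: Symbols) -> str:
    """Join the keyword for the node's operation with the symbolized name."""
    return f"{KEYWORDS[ast_node_name]}_{symbols.symbol(name, kind)}"


symbols = Symbols()
features = [
    node_feature("var_decl", "x", "var", symbols),
    node_feature("pointer_decl", "data", "pointer", symbols),
    node_feature("call_expr", "test", "fun", symbols),
]
```

The same original name always receives the same symbol, so later nodes that refer to data are also expressed in terms of pointer1.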

The PCFG generation module 403 is mainly used to generate the PCFG corresponding to the program based on the AST. The data structure adopted for the PCFG is shown by the following code segment:

In the code segment, id uniquely identifies a node of the PCFG; lnum records the line number of the node in the source program; nextid is the id of the next node of the current node; and isCond indicates whether the current node is a conditional node. If it is, leftid and rightid are the ids of the left branch and the right branch of the current node respectively; otherwise, both leftid and rightid default to 0. vector<string> nf stores the feature information of the current node, that is, the node feature information extracted by the node feature extraction module.
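The fields described above can be mirrored in a small sketch. The original code segment is presumably C++ (given the vector<string> member); the Python dataclass below is only a rendering of the listed fields, with the field names kept as described. The successors helper and the convention that an id of 0 means "no node" for nextid as well are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PCFGNode:
    id: int                # unique identifier of the PCFG node
    lnum: int              # line number of the node in the source program
    nextid: int = 0        # id of the next node of the current node
    isCond: bool = False   # True when the current node is a conditional node
    leftid: int = 0        # left-branch id; defaults to 0 for non-conditional nodes
    rightid: int = 0       # right-branch id; defaults to 0 for non-conditional nodes
    nf: List[str] = field(default_factory=list)  # node feature strings

    def successors(self) -> List[int]:
        """Ids of the nodes that can follow this one (0 is treated as absent)."""
        ids = [self.leftid, self.rightid] if self.isCond else [self.nextid]
        return [i for i in ids if i != 0]


# Hypothetical nodes: a pointer declaration followed by a conditional node.
decl = PCFGNode(id=1, lnum=3, nextid=2, nf=["DECL_pointer1"])
cond = PCFGNode(id=2, lnum=4, isCond=True, leftid=3, rightid=4)
```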

In the present embodiment, the PCFG is generated recursively. During generation, the sequence, selection, and loop structures in the program statements are analyzed separately. Because variables must be declared before they are used, the pointer variable declaration nodes of the program are processed first. Since the PCFG includes only pointer definitions, pointer accesses, memory allocation and release, and the related nodes needed to keep the program structure complete, all pointer variable declaration nodes in the program are stored in one node set during generation. Each node is then checked against this node set: if the current node exists in the node set, it is added to the PCFG; otherwise it is discarded. The next node of the current node is then analyzed, and the PCFG is output once all nodes have been visited.
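The node selection part of this procedure can be sketched as a recursive filter. The sketch makes several assumptions that are not in the text: statements are modelled as plain tuples, only the selection structure ("if") is handled, a node is treated as belonging to the node set when it references a declared pointer variable, and a conditional node is kept when it guards a kept node. Wiring the kept nodes into graph edges is omitted.

```python
from typing import List, Set

# Statement shapes (hypothetical):
#   ("decl_ptr", lnum, name)                      pointer variable declaration
#   ("use", lnum, [names])                        statement referencing variables
#   ("if", lnum, [names], then_stmts, else_stmts) selection structure


def collect_pointer_decls(stmts: list) -> Set[str]:
    """First pass: gather the names of all declared pointer variables."""
    found: Set[str] = set()
    for s in stmts:
        if s[0] == "decl_ptr":
            found.add(s[2])
        elif s[0] == "if":
            found |= collect_pointer_decls(s[3]) | collect_pointer_decls(s[4])
    return found


def select_nodes(stmts: list, pointers: Set[str]) -> List[int]:
    """Second pass: line numbers of the statements kept for the PCFG."""
    kept: List[int] = []
    for s in stmts:
        if s[0] == "decl_ptr":
            kept.append(s[1])
        elif s[0] == "use" and pointers & set(s[2]):
            kept.append(s[1])
        elif s[0] == "if":
            inner = select_nodes(s[3], pointers) + select_nodes(s[4], pointers)
            # The condition is kept when it guards pointer-related nodes,
            # so that the program structure stays complete.
            if inner or pointers & set(s[2]):
                kept.append(s[1])
                kept.extend(inner)
    return kept


program = [
    ("decl_ptr", 1, "data"),
    ("use", 2, ["x"]),
    ("use", 3, ["data"]),
    ("if", 4, ["x"], [("use", 5, ["data"])], [("use", 6, ["x"])]),
    ("use", 7, ["data"]),
]
pointer_names = collect_pointer_decls(program)
kept_lines = select_nodes(program, pointer_names)
```

Lines 2 and 6 reference only the ordinary variable x and are discarded, while the condition on line 4 is retained because its branch contains a pointer access.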

The vulnerability determination module 404 is mainly configured to determine whether a vulnerability exists in a program based on the formalized description of the vulnerability features and the PCFG of the target program. The following determination rule (determination 1) is adopted in the determination process: for an executable path p ∈ Path in the program Prog, if vulnerability-related danger points N exist on p and the elements of N satisfy the vulnerability-related constraints C, the program has a vulnerability.

The main idea of vulnerability determination based on vulnerability features is to traverse the PCFG and record the vulnerability-related danger points N on each executable path during the traversal. Based on determination 1, it is then determined whether elements of the danger points N corresponding to an executable path satisfy the vulnerability-related constraints C. If so, it is determined that the corresponding vulnerability exists in the program, and the information of all vulnerability-related danger points causing the vulnerability is output.

In the embodiment of the application, the characteristics of three memory related vulnerabilities, namely memory leakage, multiple release and reuse after release, are mainly analyzed, and the vulnerability characteristics are defined. This is exemplified by memory leakage.

A memory leak means that, because of improper coding by the programmer, some allocated heap memory is not released while the program runs, so that system memory is consumed continuously and the program eventually slows down or the computer system crashes. According to the vulnerability feature definition, the feature of the memory leak is σ(ML) = {N, C}. The formalized descriptions of the vulnerability-related danger points N and the vulnerability feature constraints C are given below.

Vulnerability-related danger points N = {NDPV, NMMF, NMFF}, where:

NDPV represents the set of defining nodes of all pointer variables in the source program Prog, and in the embodiment of the present application, NDPV can be represented by the formula (1-1):

NMMF represents a set of all memory allocation nodes in the source program Prog, and in this embodiment of the present application, NMMF may be represented by equation (1-2):

NMFF represents the set of all memory release nodes in the source program Prog, and in this embodiment, NMFF may be represented by formula (1-3):

Vulnerability-related constraints C = C1 ∨ C2 ∨ C3. The constraints C1, C2, and C3 are explained below.

1. Missed memory release: C1 = BC1 ∧ AC1, where BC1 is the basic constraint and AC1 is the additional constraint.

The basic constraint BC1 indicates that, for a certain executable path in the program Prog, there is a node ni that defines the pointer variable pv; among the successor nodes of ni there is a node nj that performs the MMF(pv) operation, and the data of nj depends on the pointer variable pv defined by ni. In the present embodiment, BC1 may be represented by formula (1-4):

After the basic constraint BC1 is satisfied, the additional constraint AC1 also needs to be satisfied. AC1 covers two possible scenarios. In the first, for the executable path, there is no node nk among the successor nodes of nj such that nk performs the MFF(pv) operation and the data of nk depends on the pointer variable pv defined by ni. In the second, for the executable path, there is no node nu among the successor nodes of nj such that nu defines a pointer variable pv' in an alias relationship with the pointer variable pv defined by ni, and nu has a successor node nv (nv is also a successor node of nj) that performs the MFF(pv') operation and whose data depends on the pointer variable pv' defined by nu. In the embodiment of the present application, AC1 can be represented by formula (1-5):

2. Repeated memory allocation: C2 = BC2 ∧ AC2, where BC2 is the basic constraint and AC2 is the additional constraint.

The basic constraint BC2 indicates that, for a certain executable path in the program Prog, there is a node ni that defines the pointer variable pv; among the successor nodes of ni there is a node nj that performs the MMF(pv) operation, and the data of nj depends on the pointer variable pv defined by ni. In the present embodiment, BC2 may be represented by formula (1-6):

After the basic constraint BC2 is satisfied, the additional constraint AC2 also needs to be satisfied. AC2 covers two possible scenarios. In the first, for the executable path, there is a node nk among the successor nodes of nj such that nk performs the MMF(pv) operation and the data of nk depends on the pointer variable pv defined by ni. In the second, for the executable path, there is a node nu on the path such that nu defines a pointer variable pv' in an alias relationship with the pointer variable pv defined by ni, and nu has a successor node nv (nv is also a successor node of nj) that performs the MMF(pv') operation and whose data depends on the pointer variable pv' defined by nu. In the present embodiment, AC2 may be represented by formula (1-7):

3. Allocation-release mismatch: C3 = BC3 ∧ AC3, where BC3 is the basic constraint and AC3 is the additional constraint.

The basic constraint BC3 indicates that, for a certain executable path in the program Prog, there is a node ni that defines the pointer variable pv; among the successor nodes of ni there is a node nj that performs the MMF(pv) operation, and the data of nj depends on the pointer variable pv defined by ni. In the present embodiment, BC3 may be represented by formula (1-8):

After the basic constraint BC3 is satisfied, the additional constraint AC3 also needs to be satisfied. AC3 covers two possible scenarios. In the first, for the executable path, there is a node nk among the successor nodes of nj such that nk performs a memory release operation whose release function does not match the allocation function, and the data of nk depends on the pointer variable pv defined by ni. In the second, for the executable path, there is a node nu on the path such that nu defines a pointer variable pv' in an alias relationship with the pointer variable pv defined by ni, and nu has a successor node nv (nv is also a successor node of nj) that performs a memory release operation whose release function does not match the allocation function and whose data depends on the pointer variable pv' defined by nu. In this embodiment, AC3 can be represented by formula (1-9):

In the memory vulnerability detection method provided by the embodiment of the application, the lexical, grammatical, and semantic features related to the generation of memory-related vulnerabilities are summarized by analyzing a large number of code segments containing such vulnerabilities, and a vulnerability feature definition and a formalized description method for vulnerability features are provided. Because the vulnerability feature definition provided by the embodiment of the application captures the common features of memory-related vulnerabilities, it facilitates the study of what such vulnerabilities have in common. Meanwhile, the features of memory-related vulnerabilities are integrated into a control flow graph to obtain a pointer-related control flow graph; formalized feature descriptions of three vulnerabilities, namely memory leakage, multiple release, and reuse after release, are given based on the PCFG, and vulnerability determination rules are abstracted from these formalized descriptions.

An embodiment of the present application provides a memory vulnerability detection apparatus. Fig. 5 is a schematic structural diagram of the memory vulnerability detection apparatus in an embodiment of the present application. As shown in fig. 5, the memory vulnerability detection apparatus 500 includes: a syntax analysis module 501, a feature extraction module 502, a first determination module 503, a first generation module 504, and a second determination module 505, wherein:

the syntax analysis module 501 is configured to perform syntax analysis on the obtained source program to be detected, and generate an abstract syntax tree;

the feature extraction module 502 is configured to extract feature information of each node in the abstract syntax tree;

the first determining module 503 is configured to determine a target node based on feature information of each node, where the target node includes at least a pointer definition node;

the first generating module 504 is configured to generate a pointer-related control flow graph based on the target node;

the second determining module 505 is configured to determine, based on the pointer-related control flow graph and a preset vulnerability detection rule, a vulnerability node with a memory leak in the source program to be detected.

In some embodiments, the syntax analysis module 501 further comprises:

the marking unit is used for marking all variables in the source program to be detected to obtain the marked source program to be detected, wherein the pointer type variables are marked as first type variables, and the non-pointer type variables are marked as second type variables;

and the semantic analysis unit is used for performing semantic analysis on the marked source program to be detected by utilizing a compiler to generate an abstract syntax tree.

In some embodiments, the feature extraction module 502 further comprises:

the symbolization processing unit is used for symbolizing the node identification of each node in the abstract syntax tree to obtain a symbolized node identification;

the first determining unit is used for determining the characteristic keywords corresponding to each node in the abstract syntax tree;

and the second determining unit is used for determining the characteristic information of each node based on the symbolic node identification and the characteristic key word of each node.

In some embodiments, the symbolization processing unit comprises:

the first acquiring subunit is used for acquiring the node identification of each node in the abstract syntax tree;

the first mapping subunit is used for mapping the node identifier into a preset symbolized variable name when the node identifier is a user-defined variable name;

and the second mapping subunit is used for mapping the node identifier into a preset symbolized function name when the node identifier is the user-defined function name.

In some embodiments, the first generation module 504 further comprises:

a third determining unit, configured to determine each target node as an initial node of the pointer-related control flow graph;

a fourth determining unit, configured to determine a target child node of each initial node based on the abstract syntax tree and feature information of each node in the abstract syntax tree, where the target child node is a pointer access node, a memory allocation node, or a memory release node;

and the node adding unit is used for adding the target child node into the pointer-related control flow graph.

In some embodiments, the preset vulnerability detection rule includes at least one of a missed-release memory constraint condition, a repeated-allocation memory constraint condition, and an allocation-release mismatch constraint condition, and correspondingly, the second determining module 505 further includes:

a fifth determining unit, configured to determine each executable path in the pointer-related control flow graph;

the acquiring unit is used for acquiring each node to be detected in each executable path;

and the sixth determining unit is used for determining the node to be detected as a vulnerability node with memory leakage when the node to be detected satisfies at least one of the missed-release memory constraint condition, the repeated-allocation memory constraint condition, and the allocation-release mismatch constraint condition.

In some embodiments, the apparatus further comprises:

a third determining module, configured to determine location information of the vulnerability node in the source program to be detected;

and the output module is used for outputting the vulnerability node and the position information.

It is noted that the description of the above apparatus embodiment is similar to the description of the above method embodiment, and has similar beneficial effects as the method embodiment. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.

It should be noted that, in the embodiment of the present application, if the memory vulnerability detection method is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.

Correspondingly, an embodiment of the present application further provides a storage medium, where a memory vulnerability detection program is stored on the storage medium, and when being executed by a processor, the memory vulnerability detection program implements the steps of the memory vulnerability detection method provided in the other embodiments.

Correspondingly, an embodiment of the present application provides a memory vulnerability detection device. Fig. 6 is a schematic structural diagram of a memory vulnerability detection device 600 according to an embodiment of the present application. As shown in fig. 6, the memory vulnerability detection device 600 at least includes: a memory 601, a communication bus 602, and a processor 603, wherein:

the memory 601 is used for storing a memory vulnerability detection program;

the communication bus 602 is used for realizing connection communication between the processor and the memory;

the processor 603 is configured to execute a memory vulnerability detection program stored in the memory, so as to implement the steps of the memory vulnerability detection method provided in other embodiments.

The memory 601 may be either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. Among them, the nonvolatile Memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), a Flash Memory (Flash Memory), and the like. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM). The memory 601 described in embodiments herein is intended to comprise these and any other suitable types of memory.

As an example in which the method provided by the embodiment of the present application is implemented by combining software and hardware, the method may be directly embodied as a combination of software modules executed by the processor 603. The software modules may be located in a storage medium located in the memory 601; the processor 603 reads the executable instructions included in the software modules in the memory 601 and, in combination with the necessary hardware (for example, the processor 603 and other components connected to the communication bus 602), implements the memory vulnerability detection method provided in the embodiments described above.

By way of example, the Processor 603 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor or the like.

The above description of the memory vulnerability detection device and the storage medium embodiment is similar to the description of the method embodiment, and has similar beneficial effects to the method embodiment. For technical details that are not disclosed in the embodiments of the memory vulnerability detection apparatus and the storage medium of the present application, please refer to the description of the embodiments of the method of the present application for understanding.

It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method or system. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods and systems according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims

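1. A memory vulnerability detection method, characterized by comprising the following steps:

parsing the acquired source program to be detected to generate an abstract syntax tree;

extracting characteristic information of each node in the abstract syntax tree;

determining a target node based on the characteristic information of each node;

generating a pointer-related control flow graph based on the target node;

and determining, based on the pointer-related control flow graph and a preset vulnerability detection rule, a vulnerability node having a memory leak in the source program to be detected.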

2. The method according to claim 1, wherein the parsing of the acquired source program to be detected to generate an abstract syntax tree comprises:

marking all variables in a source program to be detected to obtain the marked source program to be detected, wherein pointer variables are marked as first type variables, and non-pointer variables are marked as second type variables;

and performing syntax analysis on the marked source program to be detected by using a compiler to generate the abstract syntax tree.

3. The method of claim 2, wherein extracting feature information of each node in the abstract syntax tree comprises:

performing symbolization processing on the node identification of each node in the abstract syntax tree to obtain symbolized node identification;

determining characteristic keywords corresponding to each node in the abstract syntax tree;

and determining the characteristic information of each node based on the symbolized node identification and the characteristic keywords of each node.

4. The method according to claim 3, wherein said symbolizing the node identifier of each node in the abstract syntax tree to obtain a symbolized node identifier comprises:

acquiring node identification of each node in the abstract syntax tree;

when the node identification is a user-defined variable name, mapping the node identification to a preset symbolized variable name;

and when the node identification is a user-defined function name, mapping the node identification to a preset symbolized function name.
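
As a non-limiting illustration of the symbolization described in claim 4, the following minimal sketch maps user-defined names to preset symbols. The `Node` structure, the node kinds `"var"` and `"func"`, the symbol prefixes `VAR` and `FUN`, and the `library_names` set are assumptions introduced for this example only and are not specified in the present application.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str   # hypothetical node category: "var", "func", or anything else
    ident: str  # node identification as it appears in the source program


def symbolize(nodes, library_names=frozenset({"malloc", "free"})):
    """Return the symbolized identification of each node, in order."""
    var_map, fun_map = {}, {}
    result = []
    for node in nodes:
        if node.kind == "var":
            # User-defined variable name -> VAR1, VAR2, ... (same name, same symbol)
            sym = var_map.setdefault(node.ident, f"VAR{len(var_map) + 1}")
        elif node.kind == "func" and node.ident not in library_names:
            # User-defined function name -> FUN1, FUN2, ...
            sym = fun_map.setdefault(node.ident, f"FUN{len(fun_map) + 1}")
        else:
            # Library functions and other identifications are kept unchanged
            sym = node.ident
        result.append(sym)
    return result


nodes = [Node("var", "buf"), Node("func", "malloc"),
         Node("func", "load_cfg"), Node("var", "buf"), Node("var", "n")]
print(symbolize(nodes))
```

Keeping library function names such as `malloc` and `free` unsymbolized is a design choice of this sketch, so that allocation and release calls remain recognizable after symbolization.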

5. The method of claim 1, wherein the target node comprises at least a pointer definition node, and wherein the generating a pointer-related control flow graph based on the target node comprises:

determining each target node as an initial node of a pointer-related control flow graph;

determining target child nodes of each initial node based on the abstract syntax tree and the characteristic information of each node in the abstract syntax tree, wherein the target child nodes comprise pointer access nodes, memory allocation nodes or memory release nodes and nodes capable of ensuring the complete structure of a program;

and adding the target child node into the pointer-related control flow graph.
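
As a non-limiting illustration of the graph construction described in claim 5, the following simplified sketch reduces an ordinary control flow graph to the pointer-related nodes. The node kinds (`"ptr_def"`, `"ptr_access"`, `"alloc"`, `"free"`, `"branch"`, `"return"`, `"other"`), the integer node identifiers, and the function name `pointer_related_cfg` are assumptions made for this example; the application itself derives the child nodes from the abstract syntax tree and the characteristic information of each node.

```python
# Hypothetical node kinds retained in the pointer-related graph:
# pointer definition/access, allocation/release, and structural nodes.
KEEP = {"ptr_def", "ptr_access", "alloc", "free", "branch", "return"}


def pointer_related_cfg(kinds, succ):
    """kinds: node id -> kind; succ: node id -> list of successor ids.

    Returns a graph over the retained nodes only, where each retained node
    points to the nearest retained nodes reachable through dropped nodes.
    """
    def next_kept(start):
        found, seen = [], set()
        stack = list(succ.get(start, []))
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if kinds[n] in KEEP:
                found.append(n)          # stop at the first retained node
            else:
                stack.extend(succ.get(n, []))  # skip over a dropped node
        return sorted(found)

    return {n: next_kept(n) for n in kinds if kinds[n] in KEEP}


kinds = {1: "ptr_def", 2: "other", 3: "alloc", 4: "branch",
         5: "other", 6: "free", 7: "return"}
succ = {1: [2], 2: [3], 3: [4], 4: [5, 7], 5: [6], 6: [7]}
print(pointer_related_cfg(kinds, succ))
```

In this sketch, nodes 2 and 5 are unrelated to the pointer and are dropped, while the branch and return nodes are retained so that the program structure stays complete.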

6. The method according to claim 1, wherein the determining, based on the pointer-related control flow graph and a preset vulnerability detection rule, a vulnerability node having a memory leak in a source program to be detected comprises:

determining each executable path in the pointer-related control flow graph;

acquiring each node to be detected in each executable path;

and when the nodes to be detected meet the vulnerability detection rules, determining the nodes to be detected as vulnerability nodes.
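
As a non-limiting illustration of the path-based detection described in claim 6, the following sketch enumerates the paths of a small pointer-related control flow graph and applies one hypothetical detection rule: an allocation node with no later release node on the same path is reported. The rule, the node kinds, the single-pointer simplification, and the function names `paths` and `leaking_allocs` are assumptions for this example; the preset vulnerability detection rules of the application are not reproduced here.

```python
def paths(succ, start, prefix=None):
    """Yield every acyclic path from start to a node with no unvisited successor."""
    path = (prefix or []) + [start]
    # Successors already on the path are skipped so that loops terminate.
    nxt = [n for n in succ.get(start, []) if n not in path]
    if not nxt:
        yield path
    for n in nxt:
        yield from paths(succ, n, path)


def leaking_allocs(kinds, succ, start):
    """Return allocation nodes left unreleased on at least one path."""
    leaks = set()
    for path in paths(succ, start):
        open_allocs = []
        for n in path:
            if kinds[n] == "alloc":
                open_allocs.append(n)
            elif kinds[n] == "free" and open_allocs:
                # Simplification: a release closes the most recent allocation.
                open_allocs.pop()
        leaks.update(open_allocs)
    return sorted(leaks)


kinds = {1: "ptr_def", 3: "alloc", 4: "branch", 6: "free", 7: "return"}
succ = {1: [3], 3: [4], 4: [6, 7], 6: [7], 7: []}
print(leaking_allocs(kinds, succ, 1))
```

In the example graph, the path through node 6 releases the memory, while the path that goes directly from the branch node 4 to the return node 7 does not, so the allocation at node 3 is reported.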

7. The method according to any one of claims 1 to 6, further comprising:

determining the position information of the vulnerability node in the source program to be detected;

and outputting the vulnerability node and the position information.

8. A memory vulnerability detection apparatus, characterized by at least comprising: a syntax analysis module, a feature extraction module, a first determination module, a first generation module and a second determination module, wherein:

9. A memory vulnerability detection device, characterized by at least comprising: a memory, a communication bus, and a processor, wherein:

the memory is used for storing a memory vulnerability detection program;

the processor is configured to execute the memory vulnerability detection program stored in the memory to implement the steps of the memory vulnerability detection method according to any one of claims 1 to 7.

10. A storage medium having a memory vulnerability detection program stored thereon, wherein the memory vulnerability detection program, when executed by a processor, implements the steps of the memory vulnerability detection method according to any one of claims 1 to 7.