CN111967012B - Digest generation method for C/C++ code vulnerability patch - Google Patents

Info

Publication number
CN111967012B
Authority
CN
China
Prior art keywords
patch
path
control flow
basic block
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010666854.9A
Other languages
Chinese (zh)
Other versions
CN111967012A (en)
Inventor
杨珉
张源
江喆越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN202010666854.9A
Publication of CN111967012A
Application granted
Publication of CN111967012B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577Assessing vulnerabilities and evaluating computer system security
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/70Software maintenance or management
    • G06F8/75Structural analysis for program understanding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033Test or assess software

Abstract

The invention belongs to the technical field of binary vulnerability analysis, and in particular relates to a digest generation method for C/C++ code vulnerability patches. The method comprises the following steps: first, the post-dominator nodes of the patch basic blocks in the control flow graph are used to remove patch-irrelevant paths as far as possible, so as to determine the control flow paths most relevant to the patch; then, a customized symbolic execution tool performs symbolic execution on the patch-related paths and extracts robust patch semantic information; finally, the path digests of all anchor nodes are computed and serve as the digest data of the whole patch. The method generates complete, fine-grained semantic information for binary vulnerability patches and provides binary vulnerability analysts with high-precision, easily extensible patch digest information.

Description

Digest generation method for C/C++ code vulnerability patch
Technical Field
The invention belongs to the technical field of binary vulnerability patch analysis, and particularly relates to a digest generation method for a C/C++ code vulnerability patch.
Background
Source-level patches are usually generated by comparing the source code before and after the patch with the diff command. In form, such a patch contains only the added and deleted code fragments, so security analysts can hardly recover the contextual semantics of the patch from the patch file, which makes patches difficult to analyze and use. To support better analysis and understanding of patches, the invention proposes, for the first time, the concept of a patch digest, which aims to help security analysts and users of the target program analyze and understand the semantics of a patch, evaluate the patch status of a target binary, and apply the patch more conveniently.
At present, no research addresses the automatic analysis and collection of patch semantics. The existing work closest to patch digest generation falls into two kinds: first, manually analyzing the patch source code to obtain the semantics of the modification introduced by the patch; second, automatically extracting the patch from the patch source code and compiling it into binary form to obtain a binary signature of the patch.
Both approaches have drawbacks. Manual analysis often requires extensive reverse engineering and static analysis, so it does not scale. The patch signatures extracted by the second approach are often tied to a specific compilation environment and cannot provide high-level patch semantics independent of a specific binary file; a great deal of the original patch information is therefore lost, which hampers subsequent patch analysis.
Based on the above analysis, the invention provides a novel automatic digest generation method for C/C++ code vulnerability patches. By abstracting the semantics introduced by a source-level vulnerability patch into an intermediate representation, it adapts to more complex security patch analysis and application scenarios.
Disclosure of Invention
The invention aims to provide security analysts with a method for automatically extracting the semantic information of code vulnerability patches. Because a patch is a set of discrete source code fragments that lack context, it is difficult to analyze semantically; the invention therefore recovers the complete semantics introduced by the patch by restoring the patch into the control flow graphs of the related function before and after the patch is applied. Since the modification made by a patch may be very small, the invention must strictly select the context region the patch involves, so as to avoid the influence of patch-irrelevant code. To support this selection, the invention introduces the concept of an anchor node, which captures the code affected by the modification before and after the patch while filtering out irrelevant code. Based on the anchor nodes, the invention uses backward path slicing to select all paths affected by the patch, uses symbolic execution to abstract the behavioral semantics of those paths, and thereby forms the semantic digest of the patch.
The technical scheme of the invention is described in detail below.
A digest generation method for a C/C++ code vulnerability patch comprises the following specific steps:
1) Finding a post-dominator node of each patch basic block in the control flow graph (CFG) of the binary files compiled from the source code before/after the patch, and taking it as an anchor node, wherein the anchor node preserves the semantic information of the patch basic block on the path and cuts off the redundant basic blocks after the patch;
2) Collecting the control flow and data flow semantic features on the patch-related paths determined by a single anchor node, and encoding them into language-independent semantic information that serves as the path digest;
3) Collecting the path digests corresponding to all anchor nodes as the digest information of the whole patch.
In the invention, in step 1), the patch basic blocks are located as follows: first, the source code before and after the patch is collected; then a patch file is generated with the diff command; then the function containing the patch and the line numbers of the patch lines are collected from the patch header information, so as to locate the basic blocks affected by the patch.
In the present invention, in step 1), the method for obtaining the anchor node is as follows:
1) First, determining the set of all patch basic blocks: for the source code before the patch, the basic blocks affected by deleted lines; for the source code after the patch, the basic blocks affected by added lines;
2) Then, taking one of the patch basic blocks and computing its post-dominator nodes in the control flow graph of the corresponding function;
3) Selecting candidates starting from the nearest post-dominator node: if the post-dominator node exists in both the pre-patch and post-patch function control flow graphs, it is the anchor node of the current patch basic block; otherwise, continuing with the next candidate;
4) Returning to step 1) until anchor nodes have been selected for all basic blocks affected by the patch.
In the invention, in step 2), the control flow and data flow semantic features include path constraints, memory state and function calls.
In the invention, in step 2), symbolic execution is used to simulate the patch-related function of the target binary file and obtain the path digest; the specific method is as follows:
first, the entry of the patch-related function is taken as the initial state, and calls between functions are ignored;
then, during the simulated execution, the uninitialized context information is handled, including the initial parameters of the patch-related function, the values of undetermined memory variables, and the return values of function calls;
finally, the symbolic execution engine executes along the path and collects the symbolic values of the control flow and data flow semantic features as the final path digest.
In the invention, in step 2), the patch digest is expressed in Backus-Naur form (BNF).
Compared with the prior art, the invention has the beneficial effects that:
On the one hand, the patch digest generation technique provided by the invention automatically generates the contextual semantic information of the patch in the source code before/after the patch. On the other hand, the patch digest uses a general BNF representation, so security analysts can quickly use it for in-depth work on patch analysis, patch application, patch detection and similar tasks.
Drawings
Fig. 1 is the architecture diagram of binary vulnerability patch digest generation.
Fig. 2 is an example patch.
Fig. 3 is an example of the anchor node of patch basic blocks.
Fig. 4 is the BNF definition of the path digest.
Fig. 5 is the BNF definition of the patch digest.
Fig. 6 shows the details of the anchor node selection algorithm.
Detailed Description
The technical scheme of the invention is described in detail below with reference to the accompanying drawings and examples.
The invention mainly targets C/C++ code vulnerability patches and abstracts a source code patch into an intermediate-level semantic digest, which facilitates analysis and application of the patch. The overall framework of the invention is shown in Fig. 1. It comprises an anchor node selection module, a path digest generation module and a patch digest generation module. The execution flow of the invention is as follows:
1) The anchor node selection module finds the post-dominator nodes of the patch basic blocks in the control flow graphs (CFGs) of the binary files compiled from the source code before and after the patch, so as to shorten the control flow paths through the patch basic blocks as far as possible and reduce the influence of irrelevant basic blocks.
2) The path digest generation module collects the control flow and data flow semantic features on the patch-related paths determined by a single anchor node and encodes them into language-independent semantic information.
3) The patch digest generation module collects the path digests corresponding to all anchor nodes as the digest information of the whole patch.
The modules are further described below:
1. Anchor node selection
The anchor node defined by the invention is a control flow basic block: a post-dominator of a patch basic block, which accurately preserves the semantic information of the patch basic block on the path while truncating the redundant basic blocks after the patch. If the patch file of a function contains multiple patch basic blocks, there are multiple corresponding anchor nodes.
(1) Locating patch modified basic blocks
As shown in the example C/C++ vulnerability patch of Fig. 2, a patch file contains two types of patch lines: deleted lines, which begin with "-", and added lines, which begin with "+". The control flow basic blocks affected by deleted lines appear in the source code before the patch, and those affected by added lines appear in the source code after the patch. To express the semantic information of the patch completely, patch-related paths must therefore be collected from both the pre-patch and the post-patch source code.
To locate the patch basic blocks, the invention first collects the source code before and after the patch, then generates a patch file with the diff command, and then collects the function containing the patch and the line numbers of the patch lines from the patch header information. Because a patch generally affects both versions of the source code (deleted lines affect the pre-patch source code, and added lines affect the post-patch source code), both versions are compiled into binary files, and the debugging information is used to locate the basic blocks affected by the patch in the control flow graph (CFG) of the patch-related function.
(2) Anchor node selection
Once the patch basic blocks are located, the invention must collect all control flow paths affected by the patch in order to extract its semantic information. The intuitive approach is to collect every path that passes through a patch basic block. However, patches tend to be small, and the stretch from a patch basic block to the function exit usually contains many redundant basic blocks unrelated to the patch; the semantic information in these blocks dilutes the patch semantics. To solve this problem, the invention uses anchor nodes to truncate the paths after the patch, so that the patch semantic information is emphasized as much as possible.
To select an appropriate anchor node, the invention requires that an anchor node satisfy the following three conditions:
1) Any path through the patch basic block reaches at least one anchor node. This ensures that all basic blocks affected by the patch are covered.
2) No path after the anchor node reaches the patch basic block. This ensures that no patch basic block is removed by the subsequent path pruning.
3) The anchor node exists in the source code both before and after the patch is applied. This ensures that the patch-related paths before and after the patch are consistent.
To find anchor nodes that meet these conditions, the invention examines and filters the post-dominator nodes of each patch basic block. Anchor node selection comprises the following steps:
1) First, determine the set of all patch basic blocks: for the source code before the patch, the basic blocks affected by deleted lines; for the source code after the patch, the basic blocks affected by added lines.
2) Then take one of the patch basic blocks and compute its post-dominator nodes (excluding the patch basic block itself) in the control flow graph of the corresponding function.
3) Select candidates starting from the nearest post-dominator node. If the post-dominator node exists in both the pre-patch and post-patch function control flow graphs, it is the anchor node of the current patch basic block. Otherwise, continue with the next candidate.
4) Return to 1) until anchor nodes have been selected for all basic blocks affected by the patch.
With this algorithm, the invention obtains all anchor nodes of a patch file. These anchor nodes satisfy the three conditions above. For conditions 1 and 2: because the anchor node is a post-dominator of the patch basic block, every path that passes through the patch basic block must reach the anchor node, and paths leaving the anchor node do not pass through the patch basic block. For condition 3: the invention ensures that the source code of an anchor node is the same before and after the patch, and that its assembly code is the same as well.
As shown in Fig. 3, which gives the control flow graphs of the same function in the source code before and after the patch, nodes a and f in the left graph correspond to the two lines of code deleted in Fig. 2, and nodes h and i in the right graph correspond to the two lines of code added in Fig. 2. These nodes are identified as patch basic blocks, and the anchor node selection algorithm above determines that the anchor node of each of them is node g. Node g thus cuts off all patch-irrelevant paths after it and retains the most compact patch-related paths.
2. Path digest generation
The path digest generation module generates patch semantic information on the patch-related paths. Here, patch-related paths are the set of control flow paths that begin at the function entry and end at a given anchor node. Exact semantic information would require manual analysis; to extract semantics automatically, the invention takes the control flow semantics and data flow semantics of a path as an approximation of the patch semantics. In addition, to obtain language-independent semantic information, the invention uses symbolic execution to simulate the patch-related function of the target binary file and selects the path constraints, memory state and function calls as the path digest. The format of the path digest and the extraction procedure are described next.
(1) Representation of the path digest
A path digest combines three kinds of path semantic information. During symbolic execution, the execution engine can conveniently record all three: path constraints, memory state and the function call sequence. The invention requires the path digest to be independent of any specific binary and of the source language. To fix the format of the path digest, the invention reuses the representation of the symbolic execution engine, that is, all three kinds of semantic information are uniformly represented by symbolic values.
Path constraint: a path constraint is the value of a branch condition, related to the input symbols, in a program branch instruction; the constraints form a series of quantifier-free Boolean formulas. During symbolic execution, each time a conditional jump is encountered, the symbolic execution tool adds the path constraint of the current execution path to the constraint set of that path. In the invention, path constraints are represented as abstract syntax trees (ASTs), where each AST has a comparison operator as its root node and symbolic expressions as its left and right subtrees.
Memory state: the set of memory accesses along the patch-related path. The invention considers only global memory accesses, because global memory accesses are consistent across different binaries of the same patch, whereas local memory accesses are typically specific to a particular binary. Note that the invention treats the memory pointed to by any symbolic address as global memory. Each access to global memory is represented as a key-value pair. Since different binary files may place the same global memory at different concrete addresses, the invention represents the addresses as symbolic addresses in the format g_idx, where idx is the number of global memory or symbolic addresses recorded so far. The value of a memory access is represented by a concrete or symbolic value.
Function call sequence: an ordered list of all called functions in the path of the symbolic execution. For each function call, the invention only needs to record its name and ignore its parameters. For functions without names, the present invention simply assigns a special name func_unknow to them.
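The three components described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `Constraint`, `MemoryState` and `call_name` are hypothetical and do not come from the patent, indices are assumed to start at 0, and expressions are shown as plain strings or tuples rather than the symbolic values of a real execution engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """One path constraint: a comparison operator at the AST root."""
    op: str        # e.g. "<", "==", "!="
    left: object   # left subtree (symbolic expression)
    right: object  # right subtree (symbolic expression)


class MemoryState:
    """Key-value view of global memory accesses, keyed by g_<idx>."""

    def __init__(self):
        self._names = {}  # concrete or symbolic address -> "g_<idx>"
        self.values = {}  # "g_<idx>" -> concrete or symbolic value

    def key_for(self, address):
        # idx is the number of addresses recorded so far (assumed 0-based).
        if address not in self._names:
            self._names[address] = f"g_{len(self._names)}"
        return self._names[address]

    def write(self, address, value):
        self.values[self.key_for(address)] = value


def call_name(name):
    """Name recorded in the call sequence; unnamed functions get func_unknow."""
    return name if name else "func_unknow"
```

Keying memory by `g_<idx>` instead of the concrete address is what lets two binaries that place the same global at different addresses produce comparable memory states.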
(2) Digest extraction process
Symbolic execution uses symbolic variables in place of concrete values during program execution. As execution proceeds, the engine collects the branch conditions along a path as path constraints, and a constraint solver can finally be used to solve for program inputs that satisfy them. Unlike ordinary symbolic execution, the invention takes the entry of the patch-related function as the initial state and ignores calls between functions. Accordingly, some uninitialized context information must be handled during the simulated execution, including the initial parameters of the patch-related function, the values of undetermined memory variables, and the return values of function calls. The symbolic execution engine then executes along each specific path and collects the three kinds of patch semantic information as the final path digest.
As shown in Fig. 4, the final path digest of a single anchor node is expressed in Backus-Naur form (BNF) and consists of path constraints, a memory state and a function call sequence. The path constraints are a set of abstract syntax trees (ASTs); the memory state is a set of memory elements expressed as key-value pairs; the function call sequence is the sequence of function calls on the specific path.
Because an anchor node exists in both the pre-patch and post-patch function control flow graphs, the path digest finally produced by this module consists of the BNF path digest for the pre-patch function control flow graph and the BNF path digest for the post-patch function control flow graph.
3. Patch digest generation module
In the anchor node selection module, the invention computes the set of anchor nodes of a patch file. Each element of this set corresponds to two path sets: the patch-related paths before the patch and the patch-related paths after the patch. To express all the semantic information of a patch file completely, the patch digest module computes the digest sets of the two path sets for every anchor node, and the final patch digest consists of these two path digest sets.
As shown in Fig. 5, the entire patch digest is expressed in BNF. PatchDigest denotes the patch digest, which is made up of the path digest tuples (PathDigests) governed by the anchor nodes. Each path digest tuple consists of the set of path digests for the patch-related paths of the pre-patch binary (PrePatchPathDigests) and the set for the post-patch binary (PostPatchPathDigests); each set contains one path digest per patch-related path.
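The nesting described above can be mirrored by a small data structure. The sketch below is an assumption-laden illustration, not the grammar of Fig. 5: the class names follow the text (PatchDigest, PathDigests), while the field names (`by_anchor`, `pre_patch`, `post_patch`, `constraints`, `memory`, `calls`) and the `add` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PathDigest:
    """Digest of one patch-related path."""
    constraints: list = field(default_factory=list)  # path constraint ASTs
    memory: dict = field(default_factory=dict)       # g_<idx> -> value
    calls: list = field(default_factory=list)        # ordered callee names


@dataclass
class PathDigests:
    """Tuple governed by one anchor node."""
    pre_patch: list = field(default_factory=list)    # PrePatchPathDigests
    post_patch: list = field(default_factory=list)   # PostPatchPathDigests


@dataclass
class PatchDigest:
    """Digest of a whole patch: one PathDigests entry per anchor node."""
    by_anchor: dict = field(default_factory=dict)

    def add(self, anchor, version, digest):
        # version is "pre" or "post" (a hypothetical convention).
        entry = self.by_anchor.setdefault(anchor, PathDigests())
        target = entry.pre_patch if version == "pre" else entry.post_patch
        target.append(digest)
```

Keeping the pre-patch and post-patch sets side by side under the same anchor makes it straightforward to compare what the patch changed on the paths that reach that anchor.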
Example 1
This embodiment implements an anchor node selection module, which first analyzes the patch file obtained from the source code before and after the patch, locates the patch basic blocks using the source line information of the patch, and then selects an anchor node for each patch basic block with a suitable algorithm. The path digest generation module automatically extracts the semantic information of the patch-related paths between the function entry and the anchor node, thereby generating the semantic digest of a specific patch basic block. This section describes the implementation of these two modules.
1. Anchor node selection module
The module is mainly used for determining the basic block position of the patch code in the binary file, and then selecting an anchor node of a proper patch basic block. The module takes the source codes before and after the patch and the patch file as input, and finally generates an anchor node set of each patch basic block. This section describes in detail the positioning of the patch basic blocks and the selection algorithm of the anchor nodes.
(1) Positioning of patch basic blocks
To map the patch into the binary executable, the invention first parses the patch file to extract the function the patch belongs to and the line numbers of the patch. To obtain the patch file, the invention directly compares the source code before and after the patch with the diff command, which produces a patch file in a specific format. Each patch file contains a patch header indicating the name of the function containing the patch and the location of the code lines affected by the patch. However, the function name given in the patch file may be hidden by a macro, such as SYSCALL_DEFINE(func), so the invention determines the patch-related function by parsing the source code.
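As a rough illustration of the patch file parsing step, the following Python sketch extracts the deleted and added line numbers from a unified diff. It assumes a single-file patch in unified format (as produced by `diff -u`); the function name `patch_lines` is hypothetical, and the function-name hint from the hunk header is returned only as a hint, since the text notes it may be hidden by a macro.

```python
import re

# "@@ -old_start[,old_len] +new_start[,new_len] @@ optional context"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@ ?(.*)$")


def patch_lines(diff_text):
    """Return (deleted, added, hints) for a single-file unified diff.

    deleted: line numbers in the pre-patch file
    added:   line numbers in the post-patch file
    hints:   function context strings taken from the hunk headers
    """
    deleted, added, hints = [], [], []
    old_no = new_no = None
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            old_no, new_no = int(m.group(1)), int(m.group(2))
            if m.group(3):
                hints.append(m.group(3))
            continue
        if old_no is None:
            continue  # file header lines before the first hunk
        if line.startswith("\\"):
            continue  # "\ No newline at end of file"
        if line.startswith("-"):
            deleted.append(old_no)
            old_no += 1
        elif line.startswith("+"):
            added.append(new_no)
            new_no += 1
        else:  # context line: present in both versions
            old_no += 1
            new_no += 1
    return deleted, added, hints
```

The deleted line numbers would then be looked up in the debugging information of the pre-patch binary and the added line numbers in that of the post-patch binary.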
Next, the module must map the patch lines to basic blocks in the binary file. To perform this mapping from source code to binary, the invention uses the debugging information in the binary file directly. The positioning process is as follows:
1) First, the source code before and after the patch is compiled into binary files. Because most released binary programs are compiled with standard O2 optimization, the invention also uses O2 optimization. The invention also compiles both versions with debugging information, so the -g option must be enabled during compilation. Finally, the module generates the control flow graph (CFG) of the patch-related function and uses the debugging information in the binary file to locate the patch basic blocks in the CFG by their patch line numbers. Note that the invention uses the open-source symbolic execution engine angr to export the control flow graph of the target binary file.
(2) Selection algorithm for control flow anchor nodes
According to the above, for each patch basic block the same control flow anchor node must be found in the binary control flow graphs obtained by compiling the pre-patch and post-patch source code. As shown in Fig. 6, the anchor node selection algorithm is as follows:
C_pre is the set of patch basic blocks affected by the patch in the pre-patch source code, and C_post is the set of patch basic blocks affected by the patch in the post-patch source code; cfg_pre is the control flow graph of the patch-related function in the pre-patch source code, and cfg_post is the control flow graph of the patch-related function in the post-patch source code. anchors is the set of anchor nodes of all patch basic blocks and is initially empty.
The invention must select an anchor node for every patch basic block in C_pre and C_post. For a given patch basic block, the algorithm computes the sequence of all its post-dominator nodes with the post_minimizer function. Note that this sequence is topologically sorted by the top_sort function, which guarantees that selection starts from the node closest to the patch basic block. The algorithm then examines the post-dominator nodes in order: if a post-dominator node is not the patch basic block itself and the check_existence function confirms that it also exists in the other version of the source code, it is the anchor node and is added to the anchors set. The final anchors set contains the anchor nodes of all patch basic blocks in a patch file.
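The selection procedure can be sketched in Python as below. This is a simplified illustration, not the algorithm of Fig. 6 itself: the function names `post_dominators` and `select_anchors` are hypothetical, each CFG is assumed to be a dict from node label to successor list with a single exit node, post-dominators are computed with a plain iterative dataflow pass, and "exists in the other version" is reduced to a label lookup, whereas the text requires identical source and assembly code.

```python
def post_dominators(cfg, exit_node):
    """Map each node to the set of nodes that post-dominate it."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            succs = cfg.get(n, [])
            common = set.intersection(*(pdom[s] for s in succs)) if succs else set()
            new = {n} | common
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def select_anchors(c_pre, c_post, cfg_pre, cfg_post, exit_node):
    """Pick, for every patch block, its nearest post-dominator shared by both CFGs."""
    anchors = set()
    for blocks, own, other in ((c_pre, cfg_pre, cfg_post),
                               (c_post, cfg_post, cfg_pre)):
        pdom = post_dominators(own, exit_node)
        other_nodes = set(other) | {s for v in other.values() for s in v}
        for block in blocks:
            # Post-dominators form a chain; the nearest one has the largest set.
            candidates = sorted(pdom[block] - {block},
                                key=lambda n: len(pdom[n]), reverse=True)
            for cand in candidates:
                if cand in other_nodes:  # simplified existence check
                    anchors.add(cand)
                    break
    return anchors


# Hypothetical CFGs reusing the node names the text gives for Fig. 3;
# the edges are invented for illustration.
cfg_pre = {"e": ["a"], "a": ["f", "b"], "f": ["g"], "b": ["g"], "g": ["x"]}
cfg_post = {"e": ["h"], "h": ["i", "b"], "i": ["g"], "b": ["g"], "g": ["x"]}
anchors = select_anchors({"a", "f"}, {"h", "i"}, cfg_pre, cfg_post, "x")
```

On these invented graphs every patch block resolves to node g, which matches the outcome the text reports for Fig. 3.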
2. Path digest generation
The path digest generation module generates a language-independent semantic digest from the patch-related control flow paths. Given an anchor node in a patch-related function, the patch-related paths are all paths that start at the function entry and end at the anchor node, and the invention enumerates them. To avoid path explosion during symbolic execution, the invention unrolls each loop only once and discards paths that prove unsatisfiable during symbolic execution. This ensures that every collected patch-related path, and its digest, is feasible in actual execution.
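The path enumeration can be sketched as a depth-first search from the function entry to the anchor node. The sketch below makes two assumptions that are not stated in the text: "unrolling a loop once" is interpreted as allowing each CFG edge at most once per path, and a path ends the first time it reaches the anchor. The function name `enumerate_paths` is hypothetical, and the satisfiability filtering, which needs a constraint solver, is not shown.

```python
def enumerate_paths(cfg, entry, anchor):
    """List all entry-to-anchor paths, using each CFG edge at most once per path."""
    paths = []

    def walk(node, path, used_edges):
        if node == anchor:
            paths.append(path)  # stop at the first arrival at the anchor
            return
        for succ in cfg.get(node, []):
            edge = (node, succ)
            if edge in used_edges:
                continue  # this edge was already taken on the current path
            walk(succ, path + [succ], used_edges | {edge})

    walk(entry, [entry], frozenset())
    return paths


# A hypothetical function with one loop in front of the anchor.
cfg = {
    "entry": ["loop"],
    "loop": ["body", "anchor"],
    "body": ["loop"],
    "anchor": ["exit"],
}
paths = enumerate_paths(cfg, "entry", "anchor")
```

For this graph the search yields two paths: one that skips the loop body and one that runs it exactly once.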
This module extracts the semantic information of the patch-related paths, namely the control flow and data flow features described above: path constraints, memory state and function calls. These features are extracted by customized symbolic execution. Unlike ordinary symbolic execution, the invention uses the open-source symbolic execution engine angr to simulate execution along a path and modifies it so that symbolic execution starts at the entry of the specified path. In addition, some variables require extra handling:
1) Function parameters: the invention determines the parameters of the patch-related function from the function calling convention and initializes each of them to a uniquely identified symbolic value (for example, arg0 is assigned to the first parameter).
2) Undetermined memory: undetermined memory includes uninitialized memory and memory pointed to by a symbolic address. Uninitialized global memory (the data and bss segments) is assigned unique symbolic values. Local memory is given an initial value of 0. Memory pointed to by a symbolic address is assigned a symbolic value derived from the address, and that symbolic value is read and written during the simulated execution. The invention also maintains a mapping table between symbolic addresses and their values, so that when a symbolic address is dereferenced, the symbolic execution engine correctly reuses the value stored at that address.
3) Return value of a function call: the return value of a function call is represented as a symbolic value of the form {funcname}_ret_{idx}, where funcname is the name of the called function and idx is the number of times that function has been called on the current path. If the name of the called function cannot be determined, the function is given a unique symbolic name derived from its address.
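The naming scheme for these placeholder symbols can be illustrated with a small helper. The sketch rests on several assumptions: the return-value pattern is taken to be `{funcname}_ret_{idx}`, idx is assumed to count earlier calls on the same path (so the first call gets 0), and the address-derived fallback name `func_<hex address>` is purely hypothetical. The class name `SymbolNamer` is also invented for this example.

```python
from collections import Counter


class SymbolNamer:
    """Hands out symbolic names for parameters and call return values on one path."""

    def __init__(self):
        self._calls = Counter()  # callee name -> calls seen so far on this path

    def argument(self, index):
        # arg0 is the first parameter, as in the example given in the text.
        return f"arg{index}"

    def return_value(self, funcname, address=None):
        if not funcname:
            # Hypothetical fallback: derive a unique name from the callee address.
            funcname = f"func_{address:#x}"
        idx = self._calls[funcname]
        self._calls[funcname] += 1
        return f"{funcname}_ret_{idx}"
```

A fresh `SymbolNamer` would be created per path, so that the counters restart and the same call sequence always yields the same symbol names regardless of which binary is analyzed.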
The path summary is represented by symbolic values in the symbolic execution engine that abstract language independent semantic features, including control flow features (path constraints, function call sequences) and data flow features (memory states).
The patch digest finally obtained by the invention consists of the path summaries determined by all anchor nodes and represents the contextual semantic information of each patch line within the whole patch. These semantic features strip out redundant, patch-independent semantic information as far as possible, while providing security analysts with a summary format (BNF notation) that is easy to reuse and analyze.
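As a rough illustration of how path summaries might be grouped into a digest for the whole patch, the following sketch uses plain Python data classes. The names `PathSummary` and `build_patch_digest`, the field layout, the block label `bb_exit`, and the example strings are hypothetical; the patent's actual BNF format is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class PathSummary:
    """Symbolic features collected along one patch-related path."""
    anchor: str                                      # anchor node that bounds the path
    constraints: list = field(default_factory=list)  # control flow: path constraints
    calls: list = field(default_factory=list)        # control flow: function call sequence
    memory: dict = field(default_factory=dict)       # data flow: symbolic address -> value


def build_patch_digest(path_summaries):
    """Group path summaries by anchor node to form the digest of the whole patch."""
    digest = {}
    for summary in path_summaries:
        digest.setdefault(summary.anchor, []).append(summary)
    return digest


# Two paths that end at the same (hypothetical) anchor block.
summaries = [
    PathSummary("bb_exit", ["arg1 <= 64"], ["memcpy(arg0, arg2, arg1)"],
                {"mem[arg0]": "mem[arg2]"}),
    PathSummary("bb_exit", ["arg1 > 64"]),
]
digest = build_patch_digest(summaries)
```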

Claims (5)

1. A digest generation method for a C/C++ code vulnerability patch, characterized by comprising the following specific steps:
1) In the control flow graph (CFG) of the binary file compiled from the source code before/after the patch, finding a post-dominator node of the patch basic block to serve as an anchor node, wherein the anchor node preserves the semantic information of the patch basic block on the path and cuts off the redundant basic blocks that follow the patch;
2) Collecting the control flow and data flow semantic features on the patch-related paths determined by a single anchor node, and encoding them into language-independent semantic information that serves as the path summary;
3) Collecting the path summaries corresponding to all anchor nodes as the digest information of the whole patch; wherein:
in step 1), the anchor node is obtained as follows:
(1) first, determining the set of all patch basic blocks: for the source code before the patch, the basic blocks affected by deleted lines; for the source code after the patch, the basic blocks affected by added lines;
(2) then, taking one of the patch basic blocks and computing its post-dominator nodes in the control flow graph of the corresponding function;
(3) selecting candidates starting from the nearest post-dominator node: if the post-dominator node exists in the function control flow graphs both before and after the patch, it is the anchor node of the current patch basic block; otherwise, continuing with the next candidate;
(4) returning to step (1) until the anchor nodes of all basic blocks affected by the patch have been selected.
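The anchor selection in steps (2) and (3) can be sketched on a toy control flow graph. This is a minimal illustration in plain Python and makes several assumptions: each CFG is a dictionary from block label to successor labels, every block reaches a single exit block, and blocks of the pre-patch and post-patch binaries can be matched by label. Matching real binary basic blocks across versions is harder and is not addressed here. The function names and block labels are hypothetical.

```python
def post_dominators(cfg, exit_node):
    """Iterative data-flow computation: node -> set of nodes that post-dominate it."""
    nodes = set(cfg)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            succs = cfg[n]
            if succs:
                # A node is post-dominated by itself plus whatever
                # post-dominates all of its successors.
                new = {n} | set.intersection(*(pdom[s] for s in succs))
            else:
                new = {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def select_anchor(patch_block, cfg, exit_node, other_cfg):
    """Return the nearest post-dominator of patch_block that also exists in other_cfg."""
    pdom = post_dominators(cfg, exit_node)
    # Strict post-dominators form a chain; the nearest one has the largest
    # post-dominator set, so sorting by set size orders them nearest first.
    candidates = sorted(pdom[patch_block] - {patch_block},
                        key=lambda d: len(pdom[d]), reverse=True)
    for candidate in candidates:
        if candidate in other_cfg:
            return candidate
    return None


# Post-patch function: a new "guard" block checks a bound before the copy.
post_cfg = {"entry": ["guard"], "guard": ["fail", "copy"], "fail": ["cleanup"],
            "copy": ["cleanup"], "cleanup": ["exit"], "exit": []}
# Pre-patch function: no guard, the copy runs unconditionally.
pre_cfg = {"entry": ["copy"], "copy": ["cleanup"], "cleanup": ["exit"], "exit": []}

anchor = select_anchor("guard", post_cfg, "exit", pre_cfg)
```

In this toy example the nearest post-dominator of `guard` is `cleanup`, which exists in both graphs, so it becomes the anchor; if it were missing from the pre-patch graph, the search would fall through to `exit`.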
2. The digest generation method according to claim 1, wherein in step 1), the patch basic block is located as follows: first, the source code before and after the patch is collected; then a patch file is generated with the diff command; then the patched functions and the line numbers of the patch lines are collected from the patch header information to locate the basic blocks affected by the patch.
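The line-number part of this step can be sketched with Python's standard `difflib` module in place of the diff command, which is a substitution made only to keep the example self-contained. The sketch reads the hunk headers of a unified diff to recover the numbers of deleted lines (in the pre-patch file) and added lines (in the post-patch file). The function name `changed_lines` and the file labels are hypothetical. Recovering the enclosing function name and mapping line numbers to basic blocks (for example through debug information) are not shown.

```python
import difflib
import itertools
import re

# Unified-diff hunk header, e.g. "@@ -1,4 +1,6 @@"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_lines(before_src, after_src):
    """Return (deleted line numbers before the patch, added line numbers after it)."""
    diff = difflib.unified_diff(before_src.splitlines(), after_src.splitlines(),
                                "before.c", "after.c", lineterm="")
    deleted, added = [], []
    old_no = new_no = 0
    # Skip the two file-header lines ("--- before.c" and "+++ after.c").
    for line in itertools.islice(diff, 2, None):
        match = HUNK_RE.match(line)
        if match:
            old_no, new_no = int(match.group(1)), int(match.group(2))
        elif line.startswith("-"):
            deleted.append(old_no)
            old_no += 1
        elif line.startswith("+"):
            added.append(new_no)
            new_no += 1
        else:
            # Context line: present in both versions.
            old_no += 1
            new_no += 1
    return deleted, added


before = ("int f(char *dst, char *src, int n) {\n"
          "    memcpy(dst, src, n);\n"
          "    return 0;\n"
          "}\n")
after = ("int f(char *dst, char *src, int n) {\n"
         "    if (n > 64)\n"
         "        return -1;\n"
         "    memcpy(dst, src, n);\n"
         "    return 0;\n"
         "}\n")
deleted_lines, added_lines = changed_lines(before, after)
```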
3. The digest generation method according to claim 1, wherein in step 2), the control flow and data flow semantic features include path constraints, memory states, and function calls.
4. The digest generation method according to claim 1, wherein in step 2), the path summary is obtained by using symbolic execution to simulate execution of the patch-related function of the target binary file; specifically:
first, the entry of the patch-related function is taken as the initial state, and calls between functions are ignored;
then, uninitialized context information is handled during simulated execution, including the initial parameters of the patch-related function, the values of undetermined memory variables, and the return values of function calls;
the symbolic execution engine then simulates execution along the path and collects the symbolic values of the control flow and data flow semantic features as the final path summary.
5. The digest generation method according to claim 1, wherein in step 2), the path summary is represented in Backus-Naur form (BNF).
CN202010666854.9A 2020-07-13 2020-07-13 Digest generation method for C/C++ code vulnerability patch Active CN111967012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010666854.9A CN111967012B (en) 2020-07-13 2020-07-13 Digest generation method for C/C++ code vulnerability patch

Publications (2)

Publication Number Publication Date
CN111967012A (en) 2020-11-20
CN111967012B (en) 2024-03-08

Family

ID=73361519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010666854.9A Active CN111967012B (en) 2020-07-13 2020-07-13 Digest generation method for C/C++ code vulnerability patch

Country Status (1)

Country Link
CN (1) CN111967012B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902898A (en) * 2012-12-27 2014-07-02 中国电信股份有限公司 Method and device for identifying viruses
CN111104335A (en) * 2019-12-25 2020-05-05 清华大学 C language defect detection method and device based on multi-level analysis
CN111177733A (en) * 2019-12-30 2020-05-19 北京航空航天大学 Software patch detection method and device based on data flow analysis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8479188B2 (en) * 2010-07-08 2013-07-02 Microsoft Corporation Binary code change vulnerability prioritization

Also Published As

Publication number Publication date
CN111967012A (en) 2020-11-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant