CN111967012A - Abstract generation method for C/C++ code vulnerability patch - Google Patents

Publication number
CN111967012A
Authority
CN
China
Prior art keywords
patch
path
basic block
control flow
node
Prior art date
Legal status
Granted
Application number
CN202010666854.9A
Other languages
Chinese (zh)
Other versions
CN111967012B (en)
Inventor
杨珉
张源
江喆越
Current Assignee
Fudan University
Original Assignee
Fudan University
Application filed by Fudan University
Priority to CN202010666854.9A
Publication of CN111967012A
Application granted
Publication of CN111967012B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Stored Programmes (AREA)

Abstract

The invention belongs to the technical field of binary vulnerability analysis and relates in particular to a summary generation method for C/C++ code vulnerability patches. The method proceeds in three steps. First, the post-dominator nodes of the patch basic blocks in the control flow graph are used to prune, as far as possible, the paths that are unrelated to the patch, which determines the control flow paths most relevant to the patch. Second, a customized symbolic execution tool executes the patch-related paths symbolically and extracts robust patch semantic information. Finally, the path summaries of all anchor nodes are collected as the summary data of the whole patch. The method generates complete, fine-grained semantic information for binary vulnerability patches and gives binary vulnerability analysts patch summary information that is precise and easy to extend.

Description

Abstract generation method for C/C++ code vulnerability patch
Technical Field
The invention belongs to the technical field of binary vulnerability patch analysis and relates in particular to a summary generation method for C/C++ code vulnerability patches.
Background
Source-level patches are usually generated by comparing the source code before and after the fix with the diff command. In form, such a patch contains only the added and deleted code fragments, so it is hard for security analysts to recover the contextual semantics of the patch from the patch file alone, which makes the patch difficult to analyze and use. To support better analysis and understanding, the invention introduces the concept of a patch summary. It aims to help security analysts and users of the target program analyze and understand the semantics of a patch, evaluate the patch status of a target binary, and apply the patch more conveniently.
At present there is no research on automatically analyzing and collecting patch semantics. The existing work closest to patch summary generation falls into two categories: first, manually analyzing the patch source code to obtain the semantics that the patch introduces and modifies; second, automatically extracting the patch from the patch source code and compiling it into binary form to obtain a binary signature of the patch.
Both approaches have drawbacks. Manual analysis often requires a large amount of reverse engineering and static analysis and cannot be applied at scale. The signatures extracted by the second approach are often tied to a specific compilation environment and cannot provide high-level patch semantics that are independent of a particular binary file; much of the original patch information is lost, which hampers subsequent patch analysis.
Based on this analysis, the invention provides a new automatic summary generation method for C/C++ code vulnerability patches. By abstracting the semantics introduced by a source-level vulnerability patch into an intermediate representation, the method adapts to more complex security patch analysis and application scenarios.
Disclosure of Invention
The invention aims to provide security analysts with a method for automatically extracting the semantic information of a code vulnerability patch. Because a patch is a set of discrete source code fragments that lacks surrounding context and is therefore hard to analyze semantically, the invention restores the patch into the control flow graphs of the related functions before and after the patch is applied, so as to obtain the complete semantics introduced by the patch. Since the modification made by a patch can be very small, the context region related to the patch must be selected strictly to avoid interference from patch-independent code. To support this selection, the invention introduces the concept of an anchor node, which captures the code affected by the modification before and after patching while filtering out irrelevant code. Based on the anchor nodes, the invention screens all paths affected by the patch using backward path slicing, abstracts the behavioral semantics of those paths using symbolic execution, and composes them into the semantic summary of the patch.
The technical scheme of the invention is specifically introduced as follows.
A summary generation method for C/C++ code vulnerability patches comprises the following specific steps:
1) in the control flow graph (CFG) of each binary file compiled from the pre-patch and post-patch source code, finding a post-dominator node of each patch basic block to serve as an anchor node, wherein the anchor node preserves the semantic information of the patch basic block on the path while cutting off the redundant basic blocks after the patch;
2) collecting the semantic features of control flow and data flow on the patch-related paths determined by a single anchor node, and encoding them into language-independent semantic information as a path summary;
3) collecting the path summaries corresponding to all anchor nodes as the summary information of the whole patch.
In the invention, in step 1), the patch basic block is located as follows: first, the source code before and after the patch is collected; then a patch file is generated with the diff command; then the function containing the patch and the line numbers of the patch lines are collected from the patch header information; finally, the basic blocks affected by the patch are located.
In the invention, in step 1), the anchor nodes are obtained as follows:
1) first, determine the set of all patch basic blocks: for the pre-patch source code these are the basic blocks affected by deleted lines, and for the post-patch source code these are the basic blocks affected by added lines;
2) then take one patch basic block and compute its post-dominator nodes in the control flow graph of the corresponding function;
3) starting from the nearest post-dominator node, check whether the node exists in both the pre-patch and post-patch function control flow graphs; if so, it is the anchor node of the current patch basic block; otherwise, continue with the next post-dominator node;
4) return to step 1) until anchor nodes have been selected for all basic blocks affected by the patch.
In the invention, in step 2), the semantic features of control flow and data flow comprise path constraints, memory state, and function calls.
In the invention, in step 2), symbolic execution is used to simulate the execution of the patch-related functions of the target binary file and obtain the path summary. The specific method is as follows:
first, the entry of the patch-related function is taken as the initial state, and calls between functions are ignored;
then, uninitialized context information is handled during simulated execution, including the initial parameters of the patch-related function, the values of undetermined memory variables, and the return values of function calls;
finally, the symbolic execution engine executes along the path and collects the symbolic values of the control flow and data flow semantic features as the final path summary.
In the invention, in step 2), the patch summary is expressed in Backus-Naur Form (BNF).
Compared with the prior art, the invention has the following beneficial effects:
On the one hand, the patch summary generation technique provided by the invention automatically generates the contextual semantic information of the patch in the pre-patch and post-patch source code. On the other hand, the patch summary is expressed in the general-purpose BNF notation, so security analysts can use it to carry out fast, in-depth work on patch analysis, patch application, patch detection, and related tasks.
Drawings
Fig. 1 is the architecture diagram of binary vulnerability patch summary generation.
Fig. 2 is an example patch.
Fig. 3 is an example of the anchor node of patch basic blocks.
Fig. 4 is the BNF grammar of a path summary.
Fig. 5 is the BNF grammar of a patch summary.
Fig. 6 shows the details of the anchor node selection algorithm.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the drawings and the embodiment.
The method targets C/C++ code vulnerability patches and abstracts source-level patches into a mid-level semantic summary that facilitates the analysis and application of patches. The overall framework of the invention is shown in Fig. 1. It comprises an anchor node selection module, a path summary generation module, and a patch summary generation module. The execution flow is as follows:
1) The anchor node selection module finds a post-dominator node of each patch basic block in the control flow graph (CFG) of the binary files compiled from the pre-patch and post-patch source code, so as to shorten the control flow paths passing through the patch basic block as much as possible and reduce the influence of irrelevant basic blocks.
2) The path summary generation module collects the semantic features of control flow and data flow on the patch-related paths determined by a single anchor node and encodes them into language-independent semantic information.
3) The patch summary generation module collects the path summaries corresponding to all anchor nodes as the summary information of the whole patch.
The individual modules are described further below:
1. Anchor node selection
The anchor node defined by the invention is a control flow basic block: a post-dominator of a patch basic block that accurately preserves the semantic information of the patch basic block on a path while truncating the redundant basic blocks after the patch. If the patch file of a function contains multiple patch basic blocks, there are correspondingly multiple anchor nodes.
(1) Locating the basic blocks modified by the patch
As shown in the example C/C++ vulnerability patch in Fig. 2, a patch file contains two types of patch lines: deleted lines, which begin with "-", and added lines, which begin with "+". The control flow basic blocks affected by deleted lines appear in the pre-patch source code, and the control flow basic blocks affected by added lines appear in the post-patch source code. To express the semantic information of the patch completely, the patch-related paths therefore need to be collected in both the pre-patch and the post-patch source code.
To locate the patch basic blocks, the invention first collects the source code before and after the patch, then generates the patch file with the diff command, and then collects the function containing the patch and the line numbers of the patch lines from the patch header information. Because a patch generally affects both versions of the source code (deleted lines affect the pre-patch source code, added lines affect the post-patch source code), the invention compiles the pre-patch and post-patch source code into two binary files and uses debugging information to locate the basic blocks affected by the patch in the control flow graph (CFG) of the patch-related function.
(2) Anchor node selection
After the positions of the patch basic blocks are determined, the invention needs to collect all control flow paths affected by the patch in order to extract the semantic information of the patch. The intuitive approach is to collect every path that passes through a patch basic block. However, patches are often tiny, and the stretch from the patch basic block to the function exit usually contains a large number of redundant basic blocks unrelated to the patch; the semantic information in these redundant blocks dilutes the semantic information of the patch. To solve this problem, the invention uses the anchor node to truncate the path after the patch, so that the resulting summary is dominated by patch semantics.
To select a proper anchor node, the invention requires the anchor node to meet the following three conditions:
1) Any path through the patch basic block reaches at least one anchor node. This ensures that all basic blocks affected by the patch are covered.
2) No path that continues after the anchor node reaches the patch basic block. This ensures that no patch basic block is lost when the path is truncated.
3) The anchor node exists in the source code both before and after the patch is applied. This ensures that the related paths before and after the patch are consistent.
To find an anchor node that meets the above conditions, the invention checks and screens the post-dominator nodes of the patch basic block. The anchor node selection method comprises the following specific steps:
1) First, determine the set of all patch basic blocks: in the pre-patch source code these are the basic blocks affected by deleted lines, and in the post-patch source code these are the basic blocks affected by added lines.
2) Then take one patch basic block and compute its post-dominator nodes (excluding the patch basic block itself) in the control flow graph of the corresponding function.
3) Starting from the nearest post-dominator node, check whether the node exists in both the pre-patch and post-patch function control flow graphs. If so, it is the anchor node of the current patch basic block. Otherwise, continue with the next post-dominator node.
4) Return to 1) until anchor nodes have been selected for all basic blocks affected by the patch.
This algorithm yields all anchor nodes of a patch file, and these anchor nodes satisfy the three conditions above. For conditions 1 and 2, because the anchor node is a post-dominator of the patch basic block, every path from the patch basic block to the function exit necessarily passes through the anchor node, and a path starting from the anchor node does not pass through the patch basic block. For condition 3, the invention checks that the anchor node corresponds to the same source code, and the same assembly code, before and after the patch.
Fig. 3 shows the control flow graphs of the same function in the pre-patch and post-patch source code. Nodes a and f in the left graph correspond to the two deleted lines in Fig. 2, and nodes h and i in the right graph correspond to the two added lines in Fig. 2. These nodes are identified as patch basic blocks, and the anchor node selection algorithm determines node g to be their anchor node. Node g truncates all subsequent patch-independent paths and preserves the shortest patch-related paths.
2. Path summary generation
The path summary generation module generates the patch semantic information on the patch-related paths. A patch-related path here is any control flow path that starts at the function entry and ends at a given anchor node. Fully accurate semantic information requires manual analysis; to extract semantic information automatically, the method takes the control flow semantics and data flow semantics of a path as an approximation of the patch semantics. In addition, to extract language-independent semantic information, the invention uses symbolic execution to simulate the execution of the patch-related functions of the target binary file and selects the path constraints, memory state, and function calls as the path summary. The format of the path summary and the extraction process are described next.
(1) Representation of a path summary
The path summary is a combination of three kinds of path semantic information. During symbolic execution, the execution engine can conveniently record the path constraints, the memory state, and the function call sequence. The invention requires the path summary to be independent of any specific binary and of the source language. To fix the format of the path summary, the invention reuses the representation of the symbolic execution engine, that is, all three kinds of semantic information are uniformly represented by symbolic values.
Path constraint: a path constraint is the value of a branch condition that depends on input symbols in a program branch instruction; the path constraints form a series of quantifier-free Boolean formulas. During symbolic execution, whenever a conditional jump is encountered, the symbolic execution tool adds the constraint of the current execution path to the constraint set of that path. The path constraints in the invention are represented as abstract syntax trees (ASTs), where each AST has a comparison operator as its root node and symbolic expressions as its left and right subtrees.
Memory state: the set of memory accesses along the patch-related path. The invention considers only global memory accesses, because global memory accesses are consistent across binaries of the same patch, while local memory accesses are typically specific to a particular binary. Note that the memory pointed to by any symbolic address is treated as global memory. The invention represents each access to global memory as a key-value pair. Because different binary files may place the same global memory at different actual addresses, addresses are represented as symbolic addresses of the form g_idx, where idx is the number of global memory locations or symbolic addresses recorded so far. The accessed values are represented as concrete or symbolic values.
Function call sequence: an ordered list of all functions called on the symbolically executed path. For each function call, the invention records only the function name and ignores its parameters. A function without a name is simply assigned the special name func_unknown.
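The three components described above can be pictured as one small record per path. The following is a minimal Python sketch, not the implementation of the invention: the class names Constraint and PathSummary, the helper methods, and the choice to start idx at 0 are illustrative assumptions; only the g_idx naming scheme and the func_unknown placeholder come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Constraint:
    """One path constraint as a tiny AST: comparison operator at the root."""
    op: str        # comparison operator, e.g. "<=", "==", "!="
    left: object   # left subtree: symbolic expression (name, constant, nested tuple)
    right: object  # right subtree


@dataclass
class PathSummary:
    """Hypothetical container for the three kinds of path semantics."""
    constraints: List[Constraint] = field(default_factory=list)
    memory: Dict[str, object] = field(default_factory=dict)  # g_idx -> value
    calls: List[str] = field(default_factory=list)           # ordered call names
    _addr_names: Dict[object, str] = field(default_factory=dict, repr=False)

    def global_name(self, addr) -> str:
        """Map a concrete or symbolic address to a binary-independent g_idx name."""
        if addr not in self._addr_names:
            # Assumption: idx counts the addresses recorded so far, starting at 0.
            self._addr_names[addr] = f"g_{len(self._addr_names)}"
        return self._addr_names[addr]

    def record_store(self, addr, value) -> None:
        """Record a global memory access as a key-value pair."""
        self.memory[self.global_name(addr)] = value

    def record_call(self, name=None) -> None:
        """Record only the callee name; unnamed callees become func_unknown."""
        self.calls.append(name if name else "func_unknown")


summary = PathSummary()
summary.constraints.append(Constraint("<=", "arg1", 8))
summary.record_store(0x601040, "arg0")   # concrete global address
summary.record_store("arg0 + 4", 1)      # symbolic address, treated as global
summary.record_call("memcpy")
summary.record_call(None)
```

Because addresses are renamed in order of first use, two binaries that touch the same globals in the same order produce the same keys even if the actual addresses differ.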
(2) Abstract extraction process
Symbolic execution represents concrete program variables by symbolic variables. During symbolic execution, the execution engine collects the branch conditions along a path as path constraints and finally uses a constraint solver to find program inputs that satisfy them. Unlike ordinary symbolic execution, the invention takes the entry of the patch-related function as the initial state and ignores calls between functions. Accordingly, some uninitialized context information has to be handled during simulated execution, including the initial parameters of the patch-related function, the values of undetermined memory variables, and the return values of function calls. The symbolic execution engine then executes along a specific path and collects the three kinds of patch semantic information as the final path summary.
As shown in Fig. 4, the final path summary of a single anchor node is expressed in Backus-Naur Form (BNF) and consists of path constraints, memory state, and a function call sequence. The path constraints are a set of constraints, each represented as an abstract syntax tree (AST); the memory state is a set of memory elements represented as key-value pairs; the function call sequence is the sequence of function calls on the particular path.
Since the anchor node exists in the function control flow graph both before and after the patch, the path summary finally generated by the module consists of the BNF path summary for the pre-patch function control flow graph and the BNF path summary for the post-patch function control flow graph.
3. Patch summary generation module
The anchor node selection module computes the anchor node set of a patch file. Each element of this set corresponds to two path sets: the patch-related paths before the patch and the patch-related paths after the patch. To express all semantic information of a patch file, the patch summary module computes the summaries of both path sets for every anchor node, and the collection of these path summaries forms the final patch summary.
As shown in Fig. 5, the whole patch summary is expressed in BNF. PatchDigest denotes the patch summary and is composed of the path summary tuples controlled by the anchor nodes (PathDigests). Each tuple consists of the path summary set of the pre-patch binary (PrePatchPathDigests) and the path summary set of the post-patch binary (PostPatchPathDigests), which contain the path summary of each patch-related path.
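The nesting described above can be illustrated with plain dictionaries. This is a minimal sketch under stated assumptions: the function build_patch_digest, its parameters, and the dictionary layout are hypothetical, and the real grammar is the one given in Fig. 5; only the names PrePatchPathDigests and PostPatchPathDigests are taken from the text.

```python
def build_patch_digest(anchors, pre_paths, post_paths, summarize):
    """Collect, per anchor node, the path summaries before and after the patch.

    anchors    : iterable of anchor node labels
    pre_paths  : dict anchor -> list of patch-related paths in the pre-patch CFG
    post_paths : dict anchor -> list of patch-related paths in the post-patch CFG
    summarize  : callable turning one path into its path summary
    """
    digest = {}
    for anchor in anchors:
        digest[anchor] = {
            "PrePatchPathDigests": [summarize(p) for p in pre_paths.get(anchor, [])],
            "PostPatchPathDigests": [summarize(p) for p in post_paths.get(anchor, [])],
        }
    return digest


# Toy usage: a "summary" here is just the joined block labels of the path.
patch_digest = build_patch_digest(
    anchors=["g"],
    pre_paths={"g": [["entry", "a", "f", "g"]]},
    post_paths={"g": [["entry", "h", "g"], ["entry", "i", "g"]]},
    summarize=lambda path: "->".join(path),
)
```

In a full pipeline the summarize callable would be the symbolic execution step that produces constraints, memory state, and call sequence for one path.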
Example 1
In this embodiment, an anchor node selection module is designed. The module first derives a patch file from the source code before and after the patch, locates the patch basic blocks using the source line information of the patch, and then selects an anchor node for each patch basic block with a suitable algorithm. The path summary generation module designed by the invention automatically extracts the semantic information of the patch-related paths from the function entry to the anchor node, thereby generating the semantic summary of a specific patch basic block. This section describes the implementation of these two modules.
First, the anchor node selection module
The module determines the positions of the basic blocks of the patch code in the binary file and then selects a suitable anchor node for each patch basic block. It takes the pre-patch and post-patch source code and the patch file as input and finally generates the anchor node set of the patch basic blocks. This section explains the location of patch basic blocks and the anchor node selection algorithm in detail.
(1) Location of patch basic blocks
To map the patch into the binary executable file, the invention first parses the patch file to extract the functions related to the patch and the line numbers of the patch lines. To obtain the patch file, the invention compares the source code before and after the patch with the diff command, which produces a patch file in a specific format. Each patch file contains a patch header that indicates the name of the function containing the patch and the position of the code lines affected by the patch. However, the function name given in the patch file may be hidden by a macro (such as SYSCALL_DEFINE(func)), so the invention determines the patch-related function by analyzing the source code.
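The line-number part of this step can be sketched with Python's standard difflib module standing in for the diff command. This is an illustration under assumptions, not the tool used by the invention: the function patch_lines and the sample sources PRE and POST are hypothetical, and resolving the enclosing function (including macro-wrapped names) is not covered.

```python
import difflib
import re

# Hunk header of a unified diff: "@@ -start[,count] +start[,count] @@"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def patch_lines(pre_src, post_src):
    """Return (deleted line numbers in pre_src, added line numbers in post_src)."""
    diff = difflib.unified_diff(
        pre_src.splitlines(), post_src.splitlines(),
        fromfile="pre.c", tofile="post.c", lineterm="",
    )
    deleted, added = [], []
    pre_no = post_no = 0
    in_hunk = False
    for line in diff:
        m = HUNK_RE.match(line)
        if m:
            # Line counters restart at the positions given in the hunk header.
            pre_no, post_no = int(m.group(1)), int(m.group(3))
            in_hunk = True
        elif not in_hunk:
            continue  # "---" / "+++" file header lines before the first hunk
        elif line.startswith("-"):
            deleted.append(pre_no)   # deleted line: exists only before the patch
            pre_no += 1
        elif line.startswith("+"):
            added.append(post_no)    # added line: exists only after the patch
            post_no += 1
        else:
            pre_no += 1              # context line: present in both versions
            post_no += 1
    return deleted, added


PRE = "\n".join([
    "int f(char *s, int n) {",
    "    char buf[8];",
    "    strcpy(buf, s);",
    "    return 0;",
    "}",
])
POST = "\n".join([
    "int f(char *s, int n) {",
    "    char buf[8];",
    "    if (n > 8)",
    "        return -1;",
    "    memcpy(buf, s, n);",
    "    return 0;",
    "}",
])
deleted_lines, added_lines = patch_lines(PRE, POST)
```

The deleted line numbers refer to the pre-patch file and the added line numbers to the post-patch file, which matches the way the two binaries are later searched for patch basic blocks.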
The module then maps the patch lines to basic blocks in the binary file. To accomplish this mapping from source code to binary code, the invention uses the debugging information in the binary file. The location process is as follows:
1) First, the source code before and after the patch is compiled into binary files. Because most released binary programs are compiled at the standard O2 optimization level, the invention also compiles with O2 optimization. In addition, debugging information is required, so the -g option is enabled during compilation.
2) Finally, the module generates the control flow graph (CFG) of the patch-related function and uses the debugging information in the binary file to locate the patch basic blocks in the CFG by patch line number. The invention uses the open-source symbolic execution engine angr to derive the control flow graph of the target binary file.
(2) Selection algorithm for control flow anchor nodes
As described above, for each patch basic block the same control flow anchor node must be found in the binary control flow graphs obtained by compiling the pre-patch and post-patch source code. As shown in Fig. 6, the anchor node selection algorithm is as follows:
C_pre is the set of patch basic blocks affected by the patch in the pre-patch source code, and C_post is the set of patch basic blocks affected by the patch in the post-patch source code. cfg_pre is the control flow graph of the patch-related function in the pre-patch source code, and cfg_post is the control flow graph of the patch-related function in the post-patch source code. anchors is the set of anchor nodes of all patch basic blocks and is initially empty.
The invention selects anchor nodes for all patch basic blocks in C_pre and C_post. For a given patch basic block, the invention first uses the post_registers function to compute the sequence of all its post-dominator nodes. This sequence is topologically sorted by the top_sort function, which guarantees that selection starts from the node closest to the patch basic block. The invention then examines the post-dominator nodes in order. If a post-dominator node is not the patch basic block itself and the check_existence function verifies that it also exists in the other version of the source code, that node is the anchor node and is added to the anchors set. The final anchors set contains the anchor nodes corresponding to all patch basic blocks in one patch file.
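The selection loop can be sketched in plain Python on a toy graph. This is a simplified illustration under several assumptions, not the algorithm of Fig. 6: the CFG is a dictionary of block labels, post-dominators are computed with a basic iterative set intersection, candidates are ordered by the size of their own post-dominator sets instead of a topological sort, and the existence check is reduced to a label lookup, whereas the invention compares source and assembly code. The function names and the toy graph are hypothetical.

```python
def post_dominators(cfg, exit_node):
    """Map every node of cfg (dict: node -> successor list) to its post-dominator set."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            succs = cfg.get(n, [])
            # n is post-dominated by itself plus whatever post-dominates all successors.
            common = set.intersection(*(pdom[s] for s in succs)) if succs else set()
            new = common | {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def select_anchors(patch_blocks, cfg, other_cfg_nodes, exit_node):
    """Pick, for each patch basic block, the nearest post-dominator shared by both CFGs."""
    pdom = post_dominators(cfg, exit_node)
    anchors = {}
    for block in patch_blocks:
        # Nearer post-dominators have larger post-dominator sets of their own.
        candidates = sorted(pdom[block] - {block},
                            key=lambda d: len(pdom[d]), reverse=True)
        for cand in candidates:
            if cand in other_cfg_nodes:  # simplified stand-in for check_existence
                anchors[block] = cand
                break
    return anchors


# Hypothetical pre-patch CFG; blocks a and f are removed by the patch.
cfg_pre = {
    "entry": ["a", "b"], "a": ["f"], "b": ["f"], "f": ["g"],
    "g": ["x", "y"], "x": ["exit"], "y": ["exit"], "exit": [],
}
post_nodes = {"entry", "b", "h", "i", "g", "x", "y", "exit"}
anchors = select_anchors(["a", "f"], cfg_pre, post_nodes, "exit")
```

In this toy graph the nearest post-dominator of a is f, but f does not survive the patch, so the search moves on to g, which exists in both versions.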
Second, the path summary generation module
The path summary generation module generates a language-independent semantic summary based on the patch-related control flow paths. Given an anchor node in a patch-related function, the patch-related paths are all paths that start at the function entry and end at the anchor node, and the invention enumerates them. Furthermore, to avoid the path explosion problem during symbolic execution, the invention unrolls each loop only once and discards paths whose constraints cannot be satisfied during symbolic execution. This ensures that the collected patch-related paths and their summaries are feasible in actual execution.
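The path enumeration with bounded loops can be sketched as a depth-first search. This is a minimal sketch under assumptions: the function enumerate_paths and the toy graph are hypothetical, "unrolling a loop once" is approximated by allowing each basic block to appear at most twice on a path, and the feasibility check that discards unsatisfiable paths is left to the symbolic execution engine and is not shown.

```python
def enumerate_paths(cfg, entry, anchor, max_visits=2):
    """Return all paths entry -> anchor in which no block occurs more than max_visits times.

    cfg is a dict mapping a block label to the list of its successor labels.
    A path stops as soon as it reaches the anchor node.
    """
    paths = []

    def walk(node, path, visits):
        visits[node] = visits.get(node, 0) + 1
        path.append(node)
        if node == anchor:
            paths.append(list(path))  # truncate the path at the anchor node
        else:
            for succ in cfg.get(node, []):
                if visits.get(succ, 0) < max_visits:
                    walk(succ, path, visits)
        path.pop()
        visits[node] -= 1

    walk(entry, [], {})
    return paths


# Hypothetical function with one loop: h is the loop header, body jumps back to h.
loop_cfg = {"entry": ["h"], "h": ["anchor", "body"], "body": ["h"]}
loop_paths = enumerate_paths(loop_cfg, "entry", "anchor")
```

With the default bound, the loop body is either skipped or executed exactly once, so the number of enumerated paths stays finite even when the function contains loops.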
This module needs to extract the semantic information of the patch related path, specifically, the control flow and data flow features mentioned above, including: path constraints, memory state, and function calls. These features can be extracted through customized symbolic execution, but unlike ordinary symbolic execution, the present invention employs an open-source symbolic execution engine angr to simulate path execution and custom modify it so that the entry of symbolic execution is the entry of the specified path. In addition, the present invention also requires additional processing of some variables:
1) Function parameters: the invention determines the parameters of the patch-related function from the function calling convention and initializes each of them to a uniquely identified symbolic value (e.g., arg0 is assigned to the first parameter).
2) Undetermined memory: undetermined memory includes uninitialized memory and memory pointed to by a symbolic address. For uninitialized global memory (the .data and .bss segments), the invention assigns unique symbolic values. For local memory, the invention sets the initial value to 0. For memory pointed to by a symbolic address, the invention assigns a symbolic value derived from the address and reads and writes that value during interpreted execution. The invention also maintains a mapping table between each symbolic address and its value, so that when a symbolic address is dereferenced, the symbolic execution engine correctly reuses the value stored at that address.
3) Return values of function calls: the return value of a function call is represented by a symbolic value of the form {funcname}_ret_{idx}, where funcname is the name of the called function and idx is the number of times that function has been called on the current path. If the name of the called function cannot be determined, the function is given a unique symbolic name derived from its address.
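The three naming rules above can be illustrated with a small Python sketch that is independent of angr. Everything in it is an assumption made for illustration: the class name `SymbolTable`, the `sub_<address>` naming for unnamed callees, the `mem_<address>` naming for fresh memory symbols, and the choice to start `idx` at 0. Symbolic values are represented as plain strings rather than solver expressions.

```python
from collections import defaultdict


class SymbolTable:
    """Hands out symbolic names for parameters, call returns, and symbolic memory."""

    def __init__(self):
        self._call_counts = defaultdict(int)  # callee name -> calls seen on this path
        self._memory = {}                     # symbolic address -> stored value

    def arg(self, index):
        # Rule 1: the i-th parameter gets a unique symbolic value such as "arg0".
        return f"arg{index}"

    def call_return(self, funcname=None, address=None):
        # Rule 3: returns are named {funcname}_ret_{idx}; unnamed callees get a
        # name derived from their address (the "sub_" prefix is hypothetical).
        if funcname is None:
            if address is None:
                raise ValueError("either funcname or address is required")
            funcname = f"sub_{address:x}"
        idx = self._call_counts[funcname]
        self._call_counts[funcname] += 1
        return f"{funcname}_ret_{idx}"

    def load(self, sym_addr):
        # Rule 2: the first read of a symbolic address creates a symbol derived
        # from the address; later reads reuse the value kept in the mapping table.
        if sym_addr not in self._memory:
            self._memory[sym_addr] = f"mem_{sym_addr}"
        return self._memory[sym_addr]

    def store(self, sym_addr, value):
        self._memory[sym_addr] = value


table = SymbolTable()
first_alloc = table.call_return("malloc")    # "malloc_ret_0"
second_alloc = table.call_return("malloc")   # "malloc_ret_1"
table.store("arg0+8", first_alloc)
reloaded = table.load("arg0+8")              # reuses the stored value
```

Keeping the counter per path (one `SymbolTable` per explored path) matches the description that idx counts calls on the current path.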
Path digests are expressed in terms of the symbolic values of the symbolic execution engine, which abstract language-independent semantic features, including control flow features (path constraints, function call sequences) and data flow features (memory states).
The final patch digest consists of the path digests determined by all anchor nodes and represents the contextual semantic information of every patch line in the patch. These semantic features remove patch-independent redundant semantic information as far as possible, while giving security analysts a digest format (expressed in BNF, Backus-Naur form) that is easy to reuse and analyze.
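One possible way to hold such digests in memory is sketched below in Python. The text does not give the BNF grammar, so the textual layout produced by `render` is a hypothetical stand-in and not the invention's format; the names `PathDigest` and `patch_digest` and the sample symbolic values are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathDigest:
    """Semantic features collected along one patch-related path."""
    constraints: tuple = ()  # path constraints over symbolic values
    calls: tuple = ()        # function call sequence, in path order
    memory: tuple = ()       # (symbolic address, symbolic value) pairs

    def render(self):
        # Hypothetical text layout; the actual BNF grammar is not given here.
        memory = ", ".join(f"[{addr}] = {value}" for addr, value in self.memory)
        return "\n".join([
            "constraints: " + " && ".join(self.constraints),
            "calls: " + " ; ".join(self.calls),
            "memory: " + memory,
        ])


def patch_digest(digests_by_anchor):
    """Collect the rendered path digests of every anchor node into one patch digest."""
    return {
        anchor: [digest.render() for digest in digests]
        for anchor, digests in digests_by_anchor.items()
    }


sample = PathDigest(
    constraints=("arg1 <= 0x100",),
    calls=("memcpy(arg0, arg2, arg1)",),
    memory=(("arg0", "mem_arg2"),),
)
summary = patch_digest({"anchor_block": [sample]})
```

Grouping the path digests by anchor node mirrors the statement that the patch digest is made up of the path digests of all anchor nodes.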

Claims (6)

1. A digest generation method for C/C++ code vulnerability patches, characterized by comprising the following steps:
1) in the control flow graphs (CFGs) of the binary files compiled from the pre-patch and post-patch source code, respectively, finding a post-dominator node of each patch basic block as an anchor node, wherein the anchor node preserves the semantic information of the patch basic block on the path while cutting off the redundant basic blocks that follow the patch;
2) collecting the control flow and data flow semantic features on the patch-related paths determined by a single anchor node, and encoding them into language-independent semantic information as a path digest;
3) collecting the path digests corresponding to all anchor nodes as the digest of the whole patch.
2. The digest generation method according to claim 1, wherein in step 1), the patch basic blocks are located as follows: first collecting the source code before and after the patch, then generating a patch file with the diff command, then obtaining the function containing the patch and the line numbers of the patch lines from the patch header information, and locating the basic blocks affected by the patch.
3. The digest generation method according to claim 1, wherein in step 1), the anchor node is obtained as follows:
1) first determining the set of all patch basic blocks, wherein for the pre-patch source code the patch basic blocks are the basic blocks affected by deleted lines, and for the post-patch source code they are the basic blocks affected by added lines;
2) then taking one patch basic block and computing its post-dominator nodes in the control flow graph of the corresponding function;
3) selecting candidates starting from the nearest post-dominator node: if the post-dominator node exists in both the pre-patch and post-patch function control flow graphs, it is the anchor node of the current patch basic block; otherwise continuing with the next candidate;
4) returning to step 1) until anchor nodes have been selected for all basic blocks affected by the patch.
4. The digest generation method of claim 1, wherein in step 2), the control flow and data flow semantic features include path constraints, memory states, and function calls.
5. The digest generation method according to claim 1, wherein in step 2), the path digest is obtained by simulating the execution of the patch-related function of the target binary file using symbolic execution, specifically:
first, taking the entry of the patch-related function as the initial state and ignoring inter-function calls;
then handling uninitialized context information during simulated execution, including the initial parameters of the patch-related function, the values of undetermined memory variables, and the return values of function calls;
finally, the symbolic execution engine performs simulated execution along the path and collects the symbolic values of the control flow and data flow semantic features as the final path digest.
6. The digest generation method according to claim 1, wherein in step 2), the patch digest is expressed in BNF (Backus-Naur form).
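
The anchor selection of claim 3 can be illustrated with the following Python sketch, which is not part of the claims. It assumes a CFG given as a dictionary of successor lists, a single exit block that every block can reach, and basic blocks of the pre-patch and post-patch binaries that have already been matched under shared labels; how blocks are matched across the two binaries is not covered here. The function names and block labels are hypothetical.

```python
def post_dominators(cfg, exit_node):
    """Iteratively compute the post-dominator set of every block.

    Assumes a single exit block that is reachable from every block.
    """
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = set(nodes)
            for succ in cfg.get(n, ()):
                new &= pdom[succ]
            new |= {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def select_anchor(cfg, exit_node, patch_block, other_cfg_nodes):
    """Return the nearest post-dominator of `patch_block` that also exists in the other CFG."""
    pdom = post_dominators(cfg, exit_node)
    # Post-dominators of a block form a chain; a nearer one has a larger set.
    candidates = sorted(
        pdom[patch_block] - {patch_block},
        key=lambda n: len(pdom[n]),
        reverse=True,
    )
    for candidate in candidates:
        if candidate in other_cfg_nodes:
            return candidate
    return None


# Hypothetical post-patch CFG: "fix" and "new_tmp" were introduced by the patch.
post_cfg = {
    "entry": ["fix", "join"],
    "fix": ["new_tmp"],
    "new_tmp": ["join"],
    "join": ["ret"],
    "ret": [],
}
pre_nodes = {"entry", "join", "ret"}
anchor = select_anchor(post_cfg, "ret", "fix", pre_nodes)  # "join"
```

In the example the nearest post-dominator of the patch block exists only after the patch, so the search moves on to the next one, which is present in both graphs.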
CN202010666854.9A 2020-07-13 2020-07-13 Digest generation method for C/C++ code vulnerability patch Active CN111967012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010666854.9A CN111967012B (en) 2020-07-13 2020-07-13 Digest generation method for C/C++ code vulnerability patch

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010666854.9A CN111967012B (en) 2020-07-13 2020-07-13 Digest generation method for C/C++ code vulnerability patch

Publications (2)

Publication Number Publication Date
CN111967012A true CN111967012A (en) 2020-11-20
CN111967012B CN111967012B (en) 2024-03-08

Family

ID=73361519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010666854.9A Active CN111967012B (en) 2020-07-13 2020-07-13 Digest generation method for C/C++ code vulnerability patch

Country Status (1)

Country Link
CN (1) CN111967012B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120011493A1 (en) * 2010-07-08 2012-01-12 Microsoft Corporation Binary code change vulnerability prioritization
CN103902898A (en) * 2012-12-27 2014-07-02 中国电信股份有限公司 Method and device for identifying viruses
CN111104335A (en) * 2019-12-25 2020-05-05 清华大学 C language defect detection method and device based on multi-level analysis
CN111177733A (en) * 2019-12-30 2020-05-19 北京航空航天大学 Software patch detection method and device based on data flow analysis

Also Published As

Publication number Publication date
CN111967012B (en) 2024-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant