CN109002723B - Sectional type symbol execution method - Google Patents

Info

Publication number
CN109002723B
CN109002723B CN201810819763.7A CN201810819763A CN109002723B CN 109002723 B CN109002723 B CN 109002723B CN 201810819763 A CN201810819763 A CN 201810819763A CN 109002723 B CN109002723 B CN 109002723B
Authority
CN
China
Prior art keywords
program
control flow
flow graph
symbol execution
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810819763.7A
Other languages
Chinese (zh)
Other versions
CN109002723A (en)
Inventor
胡昌振
马锐
窦伯文
王龙
高浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201810819763.7A priority Critical patent/CN109002723B/en
Publication of CN109002723A publication Critical patent/CN109002723A/en
Application granted granted Critical
Publication of CN109002723B publication Critical patent/CN109002723B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577Assessing vulnerabilities and evaluating computer system security
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)

Abstract

The invention provides a segmented symbolic execution method that divides a program into coarse-grained program segments and analyzes each segment by independent symbolic execution, in order to improve on the efficiency and accuracy that existing symbolic execution tools achieve on large-scale programs and that existing segmented symbolic execution achieves with sequential analysis. The method divides a program into a number of relatively large program segments by a clustering method, performs symbolic execution on each program segment independently, and then merges the symbolic execution results of the program segments to complete the analysis of the whole program.

Description

Sectional type symbol execution method
Technical Field
The invention belongs to the technical field of vulnerability mining in information security, and particularly relates to a segmented symbolic execution method.
Background
Symbolic execution is a technique for detecting software bugs that runs a program on symbolic values instead of concrete values and finds errors by analyzing path constraints. It has become one of the effective techniques for finding bugs and security vulnerabilities in programs, and major software companies such as Microsoft use it for security testing and quality assurance. Symbolic execution typically tests a program by collecting the constraints along an execution path and negating them to reach other paths; by computing logical expressions over the program rather than relying on manual code analysis, it aims to improve analysis efficiency and test coverage. The path tree explored by symbolic execution can become very complex and lead to the path explosion problem. However, because path constraints can be solved, symbolic execution can reach paths that other detection techniques, such as fuzzing, cannot, and so it can effectively find errors that occur only under special conditions. For these reasons symbolic execution has become an important technique in practice for software error analysis and security vulnerability checking.
There are many symbolic execution tools, such as angr, KLEE, and JPF.
angr is an automated binary analysis tool developed at the University of California, Santa Barbara. It implements the symbolic execution techniques in common use today and supports both dynamic and static symbolic analysis of binary programs. angr was originally used to find backdoors in programs and is now used across the field of software analysis.
KLEE is a tool developed at Stanford University that uses symbolic execution to construct test cases for programs. While analyzing a program to construct test cases, KLEE also uses symbolic execution and constraint solving to analyze the value range of symbols at key program points and to check whether that range is safe.
JPF (Java PathFinder) is an open-source symbolic execution tool from NASA for Java bytecode programs. It provides complete symbolic execution functionality, including symbolization of input variables, generation of basic path constraints, and program path search.
These are currently popular symbolic execution tools and are practical for program analysis, but they share the same shortcoming: when analyzing large programs they suffer from path explosion and low analysis efficiency, which incurs a very high cost.
Segmented symbolic execution divides a program into several segments for analysis. Researchers have already studied this approach, and methods similar to the one provided by the invention include: Xiao Q, Chen Y, Wu C, et al. pbSE: Phase-Based Symbolic Execution [C]// IEEE/IFIP International Conference on Dependable Systems and Networks. IEEE, 2017: 133; and Fan Wenqing. Research on the segmented symbolic execution model and its environment interaction problem [D]. Beijing University of Posts and Telecommunications, 2010. However, these methods have several problems. First, they are mainly intended to solve the analysis problem of external processes rather than to deal with path explosion in symbolic execution. Second, they divide the program mainly by function, which usually produces too many program segments; too many segments severely cut off the data relationships among program segments, so that program execution state information is lost and the symbolic execution results become inaccurate. Third, they all execute the program segments sequentially, so the execution of each segment depends on the order and state data of the segments before it, and the efficiency of symbolic execution cannot be improved significantly.
Disclosure of Invention
In view of the above shortcomings, the present invention adopts a segmented symbolic execution method that divides a program into coarse-grained program segments and analyzes the program by executing each segment independently, so as to improve on the efficiency and accuracy that existing symbolic execution tools achieve on large-scale programs and that existing segmented symbolic execution achieves with sequential analysis.
The invention is realized by the following technical scheme:
A segmented symbolic execution method divides a program into a number of relatively large program segments by a clustering method, performs symbolic execution on each program segment independently, and then merges the symbolic execution results of the program segments to complete the analysis of the whole program.
Further, before the program is divided, a control flow graph is extracted from the program. The nodes of the control flow graph are the basic blocks of the program and its directed edges are the jumps between basic blocks. The control flow graph is then divided into several control flow subgraphs by a clustering method.
Further, a weight is set for each node in the control flow graph. Each node represents a single basic block, and the number of instructions in the basic block is taken as the node weight, which represents the size of the basic block.
Further, the program control flow graph is divided into several relatively large control flow subgraphs by a clustering method, in the following manner:
step one, selecting an edge in the control flow graph according to a clustering algorithm;
step two, deleting from the control flow graph the edge selected in step one;
step three, calculating the modularity of the control flow graph; if the modularity has improved, updating the control flow graph, otherwise returning to step one;
step four, once the division of the control flow graph is complete, obtaining the divided subgraphs, which are the result of program segmentation.
Further, the independent symbolic execution of each program segment is carried out as follows:
(1) determining the starting node and terminating node of each program segment;
(2) completing the missing jump information between basic blocks;
(3) traversing each program segment and selecting a corresponding analysis strategy;
(4) analyzing according to the selected analysis strategy.
Further, result merging includes state data merging and constraint merging. Merging is performed on pairs of connected program segments, and the symbolic execution result of the whole program is obtained once the state data and constraints of all connected program segments have been merged.
The invention has the beneficial effects that:
Symbolic execution of large-scale programs and existing segmented symbolic execution both rely on sequential analysis, which is inefficient. To address this, the invention analyzes the program with a segmented symbolic execution method in which program segments are analyzed independently. The program is divided into segments, each segment is symbolically executed on its own, and the results are merged, which improves the efficiency of symbolic execution.
Existing segmented symbolic execution produces too many segments, which severely cuts off the flow of data between program segments and makes the results inaccurate. To address this, the invention divides the program into coarse-grained segments with a clustering algorithm, keeping the number of segments as small as possible while keeping each segment small enough that path explosion is unlikely, thereby alleviating the inaccuracy that segmentation introduces.
Drawings
FIG. 1 is a flow chart of the segmented symbolic execution method of the present invention.
Detailed Description
The invention provides a segmented symbolic execution method to address the low efficiency of conventional symbolic execution on large-scale programs and the inaccuracy of existing segmented symbolic execution. Unlike existing segmented symbolic execution, which executes the program segments in sequential order, the method analyzes program segments independently. It divides a program into a number of relatively large segments by a clustering method, performs symbolic execution on each segment independently, and then merges the per-segment results to complete the analysis of the whole program. The method is general across symbolic execution tools: this embodiment is implemented on angr, and the method can also be applied to other symbolic execution tools such as KLEE and JPF.
As shown in FIG. 1, the input of the invention is a program. Control flow analysis generates a control flow graph, which is then divided by a clustering-based program segment division method to obtain the program segments. Next, each program segment is analyzed by independent symbolic execution, and the jump information lost through segmentation is completed. After symbolic execution has finished on every program segment, the results are merged, which covers merging the state data, merging the constraints, and obtaining the final symbolic execution result. The four processing stages, namely control flow analysis, program segment division, symbolic execution of a single program segment, and result merging, are described below.
1. Control flow analysis
Control flow analysis first extracts a control flow graph from the program. The nodes of the control flow graph are the program's basic blocks, and the directed edges are the jumps between basic blocks. In this embodiment the control flow graph is extracted with angr; in other implementations it may be obtained with other tools.
The control flow graph obtained from angr is then modified by adding node weights. The weight of a node is the number of instructions in the corresponding basic block and indicates the size of that block. The resulting weighted control flow graph is used for the subsequent program segment division.
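As an illustration only, the weighted control flow graph described above could be represented as follows. This is a minimal sketch that does not call angr; the class name `ControlFlowGraph`, its methods, the block addresses, and the instruction strings are all hypothetical stand-ins for whatever the extraction tool provides.

```python
from dataclasses import dataclass, field


@dataclass
class ControlFlowGraph:
    """Directed graph of basic blocks; node weight = instruction count."""
    weights: dict = field(default_factory=dict)  # block address -> number of instructions
    edges: set = field(default_factory=set)      # (source address, target address)

    def add_block(self, addr, instructions):
        # The node weight is the size of the basic block in instructions.
        self.weights[addr] = len(instructions)

    def add_jump(self, src, dst):
        if src not in self.weights or dst not in self.weights:
            raise KeyError("both basic blocks must be added before the jump")
        self.edges.add((src, dst))

    def successors(self, addr):
        return sorted(dst for src, dst in self.edges if src == addr)


# Hypothetical toy program: three basic blocks with made-up addresses.
cfg = ControlFlowGraph()
cfg.add_block(0x1000, ["push rbp", "mov rbp, rsp", "cmp edi, 0", "jle 0x1020"])
cfg.add_block(0x1010, ["mov eax, 1", "jmp 0x1020"])
cfg.add_block(0x1020, ["pop rbp", "ret"])
cfg.add_jump(0x1000, 0x1010)
cfg.add_jump(0x1000, 0x1020)
cfg.add_jump(0x1010, 0x1020)
```

In a real implementation the blocks and jumps would come from the control flow graph produced by the analysis tool rather than being entered by hand.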
2. Program segment partitioning
Next, the program is divided into segments. Specifically, the control flow graph obtained in the previous step is divided with a clustering algorithm, in the following steps:
(1) Select an edge in the control flow graph according to the clustering algorithm.
(2) Delete from the control flow graph the edge selected in step (1).
(3) Calculate the modularity of the control flow graph. If the modularity has improved, update the control flow graph; otherwise return to step (1).
(4) Once the division of the control flow graph is complete, obtain the divided subgraphs.
The divided control flow subgraphs are the result of program segment division. Each subgraph corresponds to a program segment that can be symbolically executed independently.
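The text does not name the clustering algorithm or the rule used to select edges. The sketch below assumes a Girvan-Newman style procedure: the edge with the highest edge betweenness is selected and deleted, modularity is recomputed against the original graph, and the partition is updated only when modularity improves. All function names are hypothetical, the graph is treated as undirected, and node weights are ignored for brevity.

```python
from collections import deque


def adjacency(nodes, edges):
    adj = {n: set() for n in nodes}
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def components(nodes, edges):
    """Connected components of the undirected graph, as a list of sets."""
    adj, seen, parts = adjacency(nodes, edges), set(), []
    for start in nodes:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        while queue:
            for nxt in adj[queue.popleft()]:
                if nxt not in part:
                    part.add(nxt)
                    queue.append(nxt)
        seen |= part
        parts.append(part)
    return parts


def modularity(parts, original_edges):
    """Newman modularity of a partition, measured on the original edge set."""
    m = len(original_edges)
    if m == 0:
        return 0.0
    q = 0.0
    for part in parts:
        inside = sum(1 for e in original_edges if e <= part)
        degree = sum(len(e & part) for e in original_edges)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


def edge_betweenness(nodes, edges):
    """Brandes-style shortest-path edge betweenness for an unweighted graph."""
    adj = adjacency(nodes, edges)
    score = {e: 0.0 for e in edges}
    for s in nodes:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[u] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {n: 0.0 for n in order}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1 + delta[w])
                score[frozenset((u, w))] += share
                delta[u] += share
    return score


def partition(nodes, directed_edges):
    original = {frozenset(e) for e in directed_edges if e[0] != e[1]}
    remaining = set(original)
    best_parts = components(nodes, remaining)
    best_q = modularity(best_parts, original)
    while remaining:
        scores = edge_betweenness(nodes, remaining)                       # step (1): select an edge
        remaining.remove(max(scores, key=lambda e: (scores[e], sorted(e))))  # step (2): delete it
        parts = components(nodes, remaining)
        q = modularity(parts, original)                                   # step (3): modularity
        if q > best_q:                                                    # update only on improvement
            best_parts, best_q = parts, q
    return best_parts, best_q                                             # step (4): subgraphs


# Hypothetical toy graph: two triangles of basic blocks joined by one jump.
nodes = ["a", "b", "c", "d", "e", "f"]
jumps = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "d")]
segments, q = partition(nodes, jumps)
```

On this toy graph the bridge between the two triangles is removed first, and the resulting two-segment partition has the highest modularity. A production version would presumably also use the node weights to bound segment size, which this sketch does not attempt.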
3. Symbolic execution of a single program segment
Symbolic execution is performed independently on each program segment, in four steps:
(1) Determine the starting node and terminating node of each program segment.
(2) Complete the missing jump information between basic blocks.
(3) Traverse each program segment and select a corresponding analysis strategy.
(4) Analyze according to the selected analysis strategy.
In step (2), the division into segments does not take into account the direct and indirect addressing involved in program calls, returns, and similar transfers, so a target address may not be found, for example when a function returns. In that case the corresponding jump information must be completed from the original control flow graph.
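Steps (1) and (2) can be sketched roughly as below, under the assumption that a starting node is a block entered from outside the segment (or the program entry) and a terminating node is a block that jumps out of the segment or has no successor inside it. The function name `segment_boundaries` and the toy graph are hypothetical; the jumps that cross the segment boundary are looked up in the original control flow graph so that they can be restored.

```python
def segment_boundaries(segment, edges, program_entry):
    """Return (starting nodes, terminating nodes, cut jumps) for one program segment."""
    # Blocks reached by a jump from outside the segment, plus the program entry.
    starts = {dst for src, dst in edges if dst in segment and src not in segment}
    if program_entry in segment:
        starts.add(program_entry)
    # Blocks that leave the segment, or that have no successor inside it.
    has_internal_successor = {src for src, dst in edges
                              if src in segment and dst in segment}
    leaving = {src for src, dst in edges if src in segment and dst not in segment}
    ends = leaving | {n for n in segment if n not in has_internal_successor}
    # Jumps of the original graph that cross the segment boundary.
    cut = sorted((src, dst) for src, dst in edges
                 if (src in segment) != (dst in segment))
    return starts, ends, cut


# Hypothetical original control flow graph and its two segments.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
first = segment_boundaries({"a", "b", "c"}, edges, program_entry="a")
second = segment_boundaries({"d", "e", "f"}, edges, program_entry="a")
```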
In step (3), an analysis strategy is selected according to the type of the program segment. In this embodiment, a sequential program segment is analyzed with an ordinary exploration strategy, and a program segment containing loops is analyzed with a hybrid dynamic and static execution strategy.
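One way to make this choice, shown here only as a sketch, is to check whether the subgraph of a segment contains a directed cycle. The text does not say how segment types are detected, so the cycle test and the strategy labels `plain_exploration` and `hybrid_dynamic_static` are assumptions made for illustration.

```python
def has_loop(segment, edges):
    """True if the subgraph induced by `segment` contains a directed cycle."""
    succ = {n: [] for n in segment}
    for src, dst in edges:
        if src in segment and dst in segment:
            succ[src].append(dst)
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, finished
    colour = {n: WHITE for n in segment}

    def visit(node):
        colour[node] = GREY
        for nxt in succ[node]:
            # A jump back to a block on the current path closes a cycle.
            if colour[nxt] == GREY or (colour[nxt] == WHITE and visit(nxt)):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in segment)


def choose_strategy(segment, edges):
    # Hypothetical labels for the two strategies named in the text.
    return "hybrid_dynamic_static" if has_loop(segment, edges) else "plain_exploration"


straight = choose_strategy({"a", "b", "c"}, [("a", "b"), ("b", "c")])
looping = choose_strategy({"d", "e"}, [("d", "e"), ("e", "d")])
```

The recursive traversal is adequate for small segments; a very large segment would call for an iterative version.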
In step (4), an existing symbolic execution tool performs symbolic execution from the entry of the program segment, from each starting node to the terminating node, and the state reached at the terminating node is kept as the input to the subsequent result merging. This embodiment uses angr for symbolic execution; other symbolic execution tools may be substituted in a specific implementation.
4. Result merging
Result merging applies to pairs of program segments that are connected in the original control flow graph. Merging the results of all connected program segments yields the symbolic execution result of the whole program.
For two program segments connected by a directed edge, the segment containing the edge's start node is called the upstream program segment, and the segment containing the edge's end node is called the downstream program segment.
Result merging consists mainly of two parts: state data merging and constraint merging.
In this embodiment, state data merging in turn consists of register merging and memory merging.
Register merging obtains the list of registers from the program's architecture information and then merges the register values of the two program segments one register at a time.
The main idea of register merging is to use the state data produced by the upstream program segment as the input of the downstream program segment. Specifically, four different cases arise when the values of a register in the two states are merged:
(1) the value in the downstream state is a concrete value rather than a symbolic value, and no replacement is needed;
(2) the value in the downstream state is a symbolic value and the corresponding value in the upstream state is a concrete value, so the concrete value is substituted for the symbolic variable;
(3) the values in both the upstream and downstream states are symbolic values, so the upstream symbolic expression is substituted into the downstream symbolic expression;
(4) the register in the downstream state has not been initialized, and its value is set directly to the value of the corresponding register in the upstream state.
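The four cases can be sketched with a toy expression type. This is not angr's state representation: the classes `Sym` and `Add`, the convention that the downstream segment starts with each register holding a fresh symbol named `init_<register>`, and the use of `None` for an uninitialized register are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def substitute(expr, bindings):
    """Replace symbols by their bound values and fold concrete sums."""
    if isinstance(expr, Sym):
        return bindings.get(expr.name, expr)
    if isinstance(expr, Add):
        left = substitute(expr.left, bindings)
        right = substitute(expr.right, bindings)
        if isinstance(left, int) and isinstance(right, int):
            return left + right
        return Add(left, right)
    return expr  # concrete integer


def merge_registers(upstream, downstream, registers):
    # Assumed convention: the downstream entry value of register r is Sym("init_r").
    bindings = {f"init_{r}": upstream[r] for r in registers
                if upstream.get(r) is not None}
    merged = {}
    for reg in registers:
        value = downstream.get(reg)
        if value is None:                 # case (4): uninitialized downstream
            merged[reg] = upstream.get(reg)
        elif isinstance(value, int):      # case (1): concrete downstream value
            merged[reg] = value
        else:                             # cases (2) and (3): substitute upstream value
            merged[reg] = substitute(value, bindings)
    return merged


upstream = {"rax": 5, "rbx": Add(Sym("input"), 1), "rcx": 7, "rdx": 9}
downstream = {"rax": Add(Sym("init_rax"), 2),   # case (2)
              "rbx": Add(Sym("init_rbx"), 3),   # case (3)
              "rcx": 42,                        # case (1)
              "rdx": None}                      # case (4)
merged = merge_registers(upstream, downstream, ["rax", "rbx", "rcx", "rdx"])
```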
Memory merging follows the same idea as register merging but differs in the details. Because memory is a contiguous range of addresses and the length of each read or write is not fixed, the memory data must be obtained by inserting analysis breakpoints, and the address and length of every memory write must be recorded during symbolic execution.
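The bookkeeping described here might look like the following sketch. It assumes concrete byte values and a hypothetical `WriteLog` that a write breakpoint would fill in during execution; replaying the upstream log before the downstream log lets later writes override earlier ones, while untouched upstream bytes carry over.

```python
class WriteLog:
    """Records every memory write as (address, length, data) in execution order."""

    def __init__(self):
        self.writes = []

    def record(self, address, data):
        # In a real tool this would be called from a memory-write breakpoint.
        self.writes.append((address, len(data), bytes(data)))


def merge_memory(upstream_log, downstream_log):
    """Replay upstream writes, then downstream writes, into one byte map."""
    memory = {}
    for log in (upstream_log, downstream_log):
        for address, length, data in log.writes:
            for offset in range(length):
                memory[address + offset] = data[offset]
    return memory


# Hypothetical addresses and values.
up, down = WriteLog(), WriteLog()
up.record(0x2000, b"\x01\x02\x03\x04")
down.record(0x2002, b"\xff")
merged_memory = merge_memory(up, down)
```

Symbolic memory contents would need the same substitution step as registers, which this sketch leaves out.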
The constraints to be merged are the conditions that must hold for execution to proceed from the entry of a program segment to its exit. Constraint merging takes two steps: the first replaces the symbolic values in the constraints of the downstream state, in the same way as in register merging; the second copies the constraints of the upstream state into the downstream state.
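A minimal sketch of the two steps follows, assuming each constraint is a triple `(operator, left, right)` whose operands are either integers or symbol names, and that `bindings` maps each downstream entry symbol to the upstream value it stands for. These representations and names are hypothetical; a real implementation would work on the solver expressions of the symbolic execution tool.

```python
def substitute_operand(operand, bindings):
    # Symbol names are strings; concrete operands are left untouched.
    return bindings.get(operand, operand) if isinstance(operand, str) else operand


def merge_constraints(upstream_constraints, downstream_constraints, bindings):
    # Step 1: replace the symbolic values in the downstream constraints.
    rewritten = [(op, substitute_operand(left, bindings),
                  substitute_operand(right, bindings))
                 for op, left, right in downstream_constraints]
    # Step 2: copy the upstream constraints into the merged (downstream) state.
    return list(upstream_constraints) + rewritten


upstream_constraints = [(">", "input", 0)]
downstream_constraints = [("<", "init_rax", 10), ("==", "init_rbx", 5)]
bindings = {"init_rax": "input", "init_rbx": 5}
path_condition = merge_constraints(upstream_constraints,
                                   downstream_constraints, bindings)
```

The merged list is what a constraint solver would then be asked to satisfy in order to decide whether the combined path is feasible.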
After all merging is complete, the symbolic execution result for the whole program is available, including the state reached when execution arrives at the terminating node and the corresponding path condition. Finally, the program state is used to determine whether that state triggers a vulnerability, and the path constraints are solved to determine whether the path is feasible (that is, whether the solution set is non-empty); when the solution set is non-empty, a test case is generated.

Claims (4)

1. A segmented symbolic execution method, characterized in that a control flow graph is extracted from a program, the nodes of the control flow graph being the basic blocks of the program and the directed edges of the control flow graph being the jumps between basic blocks; the program control flow graph is divided into several control flow subgraphs by a clustering method; symbolic execution is then performed independently on each program segment; and finally the symbolic execution results of the program segments are merged to complete the analysis of the whole program; wherein the program control flow graph is divided into several control flow subgraphs by the clustering method in the following manner:
step one, selecting an edge in the control flow graph according to a clustering algorithm;
step two, deleting from the control flow graph the edge selected in step one;
step three, calculating the modularity of the control flow graph; if the modularity has improved, updating the control flow graph, otherwise returning to step one;
step four, once the division of the control flow graph is complete, obtaining the divided subgraphs, which are the result of program segmentation.
2. The segmented symbolic execution method of claim 1, wherein a weight is set for each node in the control flow graph, each node represents a single basic block, and the number of instructions in each basic block is taken as the weight of the node, representing the size of the basic block.
3. The segmented symbolic execution method of claim 1 or 2, wherein the independent symbolic execution of each program segment is carried out as follows:
(1) determining the starting node and terminating node of each program segment;
(2) completing the missing jump information between basic blocks;
(3) traversing each program segment and selecting a corresponding analysis strategy;
(4) analyzing according to the selected analysis strategy.
4. The segmented symbolic execution method of claim 1 or 2, wherein the result merging includes state data merging and constraint merging, and the symbolic execution result of the whole program is obtained after the state data and constraints of all connected program segments have been merged.
CN201810819763.7A 2018-07-24 2018-07-24 Sectional type symbol execution method Active CN109002723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810819763.7A CN109002723B (en) 2018-07-24 2018-07-24 Sectional type symbol execution method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810819763.7A CN109002723B (en) 2018-07-24 2018-07-24 Sectional type symbol execution method

Publications (2)

Publication Number Publication Date
CN109002723A CN109002723A (en) 2018-12-14
CN109002723B CN109002723B (en) 2021-09-07

Family

ID=64597107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810819763.7A Active CN109002723B (en) 2018-07-24 2018-07-24 Sectional type symbol execution method

Country Status (1)

Country Link
CN (1) CN109002723B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688403A (en) * 2021-10-26 2021-11-23 江苏通付盾科技有限公司 Intelligent contract vulnerability detection method and device based on symbolic execution verification
CN116541280B (en) * 2023-05-06 2023-12-26 中国电子技术标准化研究院 Fuzzy test case generation method based on neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049377A (en) * 2012-12-14 2013-04-17 中国信息安全测评中心 Parallel symbolic execution method based on path cluster reductions
CN106156366A (en) * 2016-08-01 2016-11-23 浙江工业大学 A kind of pinning control node selecting method based on cluster

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
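Research on the segmented symbolic execution model and its environment interaction problem; Fan Wenqing; China Doctoral Dissertations Full-text Database; 2010-11-15; pp. 5-6, 16 *
A Kripke structure generation method based on control flow information; Niu Xiaopeng et al.; Computer Science; 2012-06-30; Vol. 39, No. 6; pp. 93-97 *
Automatic software module partitioning based on a hypergraph model; Wei Xiaofeng et al.; Computer Engineering; 2016-01-31; Vol. 42, No. 1; pp. 71, 73 *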

Also Published As

Publication number Publication date
CN109002723A (en) 2018-12-14

Similar Documents

Publication Publication Date Title
CN109739755B (en) Fuzzy test system based on program tracking and mixed execution
CN101714118B (en) Detector for binary-code buffer-zone overflow bugs, and detection method thereof
US9983984B2 (en) Automated modularization of graphical user interface test cases
US9569345B2 (en) Architectural failure analysis
US7971193B2 (en) Methods for performining cross module context-sensitive security analysis
US9251045B2 (en) Control flow error localization
Partush et al. Abstract semantic differencing for numerical programs
US20110055777A1 (en) Verification of Soft Error Resilience
EP3264274B1 (en) Input discovery for unknown program binaries
CN109002723B (en) Sectional type symbol execution method
CN113468525A (en) Similar vulnerability detection method and device for binary program
CN105487983A (en) Sensitive point approximation method based on intelligent route guidance
CN115544490A (en) Method and system for detecting password constant in binary file
Hua et al. On the effectiveness of deep vulnerability detectors to simple stupid bug detection
Liu et al. Vulnerability analysis for x86 executables using genetic algorithm and fuzzing
Ritter et al. Formal verification of designs with complex control by symbolic simulation
Su et al. STCG: state-aware test case generation for simulink models
JP2017041196A (en) Stub object determination device, method, and program
WO2023067665A1 (en) Analysis function addition method, analysis function addition device, and analysis function addition program
Yuan et al. A method for detecting buffer overflow vulnerabilities
Zhang et al. INSTRCR: Lightweight instrumentation optimization based on coverage-guided fuzz testing
Li et al. Software Source code security audit algorithm supporting incremental checking
Li et al. Effective fault localization based on minimum debugging frontier set
US6986110B1 (en) Automated method and system for backtracing of instruction parameters from specified instruction in test cases
KR102421394B1 (en) Apparatus and method for detecting malicious code using tracing based on hardware and software

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant