CN109002723A - A segmented symbolic execution method - Google Patents

A segmented symbolic execution method

Info

Publication number
CN109002723A
Authority
CN
China
Prior art keywords
program
analysis
segmented
control flow
flow graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810819763.7A
Other languages
Chinese (zh)
Other versions
CN109002723B (en)
Inventor
胡昌振
马锐
窦伯文
王龙
高浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201810819763.7A
Publication of CN109002723A
Application granted
Publication of CN109002723B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Abstract

The present invention provides a segmented symbolic execution method that divides a program into coarse-grained segments and analyzes the program symbolically with each segment executed independently. This improves the efficiency of current symbolic execution tools on large-scale programs, and the efficiency and accuracy of analysis relative to existing segmented symbolic execution methods that analyze segments in sequence. The method divides the program into multiple relatively large program segments by a clustering method, performs independent symbolic execution on each segment, and then merges the symbolic execution results of the segments to complete the analysis of the whole program.

Description

A segmented symbolic execution method
Technical field
The invention belongs to the technical field of vulnerability discovery in information security, and specifically relates to a segmented symbolic execution method.
Background art
Symbolic execution is a technique that detects software vulnerabilities by replacing concrete values with symbolic values; it finds program errors by analyzing path constraints. Symbolic execution has become one of the effective techniques for finding errors and security vulnerabilities in programs, and major software companies such as Microsoft use it for security testing and quality assurance. Symbolic execution typically tests a program by collecting the path taken by an execution and negating path conditions. Its goal is to replace manual code analysis with the computation of program logic expressions, improving analysis efficiency and test coverage. Although an overly complex path tree can lead to the path explosion problem, symbolic execution can solve path constraints and therefore reach paths that other detection techniques, such as fuzz testing, cannot, so it can effectively find certain specific errors. This approach has also been applied in practice, and symbolic execution has become an important technique for software error analysis and security vulnerability checking.
Many symbolic execution tools exist at present, such as angr, KLEE and JPF.
angr is an automated binary analysis tool developed at the University of California, Santa Barbara. It implements currently popular symbolic execution techniques and supports both static and dynamic symbolic analysis of binary programs. angr was originally used to search for backdoors in programs and can now be applied to software analysis in general.
KLEE is a tool developed at Stanford University that uses symbolic execution to construct program test cases. While analyzing a program to construct test cases, KLEE also uses symbolic execution and constraint solving at critical program points to analyze the value ranges of symbols and check whether they lie within a safe range.
JPF is an open-source symbolic execution tool from NASA for Java bytecode programs. It provides complete symbolic execution functionality, including symbolization of input variables, generation of basic path constraints, and program path search.
As currently popular symbolic execution tools, these are practical for program analysis, but they share the same drawback: when analyzing large programs they all suffer from path explosion and low analysis efficiency, which leads to huge overhead.
Segmented symbolic execution divides a program into multiple segments for analysis, and related research already exists. Methods similar to the present invention include: Xiao Q, Chen Y, Wu C, et al. pbSE: Phase-Based Symbolic Execution[C]//IEEE/IFIP International Conference on Dependable Systems and Networks. IEEE, 2017: 133-144; and 范文庆 (Fan Wenqing). Research on the segmented symbolic execution model and its environment interaction problem[D]. Beijing University of Posts and Telecommunications, 2010. These methods, however, have several problems. First, they mainly aim to solve the problem of analyzing external procedures, rather than the path explosion problem during symbolic execution. Second, they mainly divide the program by function, which usually yields too many program segments; excessive segmentation severely cuts the data connections between segments, so program execution state information is lost and the symbolic execution results become inaccurate. Third, they all execute the program segments symbolically in sequence: the execution of each segment depends on the order of, and the state data from, the other segments, so the efficiency of symbolic execution cannot be improved noticeably.
Summary of the invention
In view of the above deficiencies, the present invention adopts a segmented symbolic execution method that divides the program into coarse-grained segments and analyzes the program symbolically with each segment executed independently. This improves the analysis efficiency of current symbolic execution tools on large-scale programs, and the analysis efficiency and accuracy relative to existing sequential segmented symbolic execution methods.
The invention is realized by the following technical solution:
A segmented symbolic execution method divides a program into multiple relatively large program segments by a clustering method, performs independent symbolic execution on each program segment, and then merges the symbolic execution results of the segments to complete the analysis of the whole program.
Further, before the program is divided, a control flow graph is first extracted from the program; its nodes are program basic blocks and its directed edges are jumps between basic blocks. The control flow graph is then divided into multiple control flow subgraphs by the clustering method.
Further, a weight is set for each node in the control flow graph. Each node represents a single basic block, and the number of instructions in the basic block is used as the node's weight, indicating the size of the basic block.
Further, the program control flow graph is divided into multiple relatively large control flow subgraphs by the clustering method, specifically as follows:
Step 1: select an edge in the control flow graph according to the clustering algorithm;
Step 2: delete the edge selected in Step 1 from the control flow graph;
Step 3: compute the modularity of the control flow graph; if the modularity improves, update the control flow graph; otherwise return to Step 1;
Step 4: the division of the control flow graph is complete, and the resulting subgraphs are the result of program segmentation.
Further, the independent symbolic execution of each program segment specifically proceeds as follows:
(1) determine the start nodes and terminal nodes of each program segment;
(2) complete the missing jump information between basic blocks;
(3) traverse each program segment and select the corresponding analysis strategy;
(4) analyze the segment according to the selected analysis strategy.
Further, the result merging includes state data merging and constraint merging. Result merging is performed on pairs of connected program segments; once the state data and constraints of all connected program segments have been merged, the symbolic execution result of the whole program is obtained.
Beneficial effects of the invention:
To address the low efficiency of symbolic execution on large-scale programs and of the sequential analysis used by existing segmented symbolic execution, the present invention analyzes the program with an independent-analysis form of segmented symbolic execution. The method analyzes program segments independently: the program is divided into segments, each segment is executed symbolically on its own, and the results are merged, which improves the efficiency of symbolic execution.
To address the problem that existing segmented symbolic execution produces too many segments, which severely cuts the propagation of data flow information between segments and makes the results inaccurate, the present invention divides the program into coarse-grained segments with a clustering algorithm. While keeping each segment small enough, as far as possible, to avoid path explosion, it minimizes the number of segments produced, which alleviates the inaccuracy caused by segmentation.
Description of the drawings
Fig. 1 is a flow chart of the segmented symbolic execution method of the present invention.
Detailed description of the embodiments
To address the low efficiency of conventional symbolic execution on large-scale programs and the inaccuracy of existing segmented symbolic execution, the present invention proposes a segmented symbolic execution method. Unlike earlier segmented symbolic execution, which executes program segments in sequence, this method executes program segments independently. It divides the program into multiple relatively large segments by a clustering method, performs independent symbolic execution on each segment, and then merges the per-segment results to complete the analysis of the whole program. The invention is general with respect to symbolic execution tools: it is implemented on angr, and it can equally be applied to other symbolic execution tools such as KLEE and JPF.
As shown in Fig. 1, the input of the invention is a program. Control flow analysis generates a control flow graph, which is then divided by the clustering-based segment division method into program segments. Next, each program segment is analyzed by independent symbolic execution, and the jump information lost through segmentation is completed. After symbolic execution of every segment, the per-segment results are merged, including state data merging and constraint merging, to obtain the symbolic execution result. Control flow analysis, program segment division, single-segment symbolic execution and result merging are described in turn below.
1. Control flow analysis
Control flow analysis first extracts the control flow graph from the program. The nodes of the control flow graph are program basic blocks, and the directed edges are jumps between basic blocks. In this embodiment, the control flow graph is extracted with angr. It will be understood that, in a specific implementation, the control flow graph can also be obtained with other tools.
The control flow graph obtained with angr is then modified by adding node weights. A node's weight is the number of instructions in the program basic block and indicates the size of that block. The control flow graph produced in this step is used for the subsequent program segment division.
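The weighted control flow graph can be pictured with a small data structure. The following is a minimal sketch in plain Python, not the patent's implementation: the names `BasicBlock`, `WeightedCFG` and `build_weighted_cfg` are hypothetical, and the blocks and jumps are hand-written stand-ins for what a tool such as angr would recover from a binary.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    addr: int
    instructions: list  # disassembled instructions, illustrative strings only


@dataclass
class WeightedCFG:
    weights: dict = field(default_factory=dict)  # block address -> instruction count
    edges: set = field(default_factory=set)      # (source address, target address)


def build_weighted_cfg(blocks, jumps):
    """Attach the instruction count of each basic block as its node weight."""
    cfg = WeightedCFG()
    for block in blocks:
        cfg.weights[block.addr] = len(block.instructions)
    for src, dst in jumps:
        # Keep only jumps whose endpoints are known basic blocks.
        if src in cfg.weights and dst in cfg.weights:
            cfg.edges.add((src, dst))
    return cfg


# Hypothetical four-block function with one conditional branch.
blocks = [
    BasicBlock(0x1000, ["push rbp", "mov rbp, rsp", "cmp edi, 0", "jle 0x1020"]),
    BasicBlock(0x1010, ["mov eax, 1", "jmp 0x1030"]),
    BasicBlock(0x1020, ["mov eax, 0"]),
    BasicBlock(0x1030, ["pop rbp", "ret"]),
]
jumps = [(0x1000, 0x1010), (0x1000, 0x1020), (0x1010, 0x1030), (0x1020, 0x1030)]
cfg = build_weighted_cfg(blocks, jumps)
```

Storing the weight next to the node keeps the size of each block available to the later division step without consulting the binary again.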
2. Program segment division
Next, the program is divided into segments. Specifically, the control flow graph obtained in the previous step is divided with a clustering algorithm, in the following steps:
(1) Select an edge in the control flow graph according to the clustering algorithm.
(2) Delete the edge selected in step (1) from the control flow graph.
(3) Compute the modularity of the clustering; if the modularity improves, update the control flow graph; otherwise return to step (1).
(4) The division of the control flow graph is complete, yielding the divided subgraphs.
The control flow subgraphs obtained by the division are the result of program segment division. Each subgraph corresponds to a program segment that can be executed symbolically on its own.
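The text does not name the clustering algorithm or the rule for selecting an edge. As one possible reading of the steps above, the sketch below assumes a Girvan-Newman style procedure: the edge with the highest edge betweenness is removed, Newman modularity is recomputed on the resulting components, and the partition with the best modularity is kept. It treats the control flow graph as undirected and ignores node weights; all function names are hypothetical.

```python
from collections import deque


def adjacency(nodes, edges):
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def components(nodes, edges):
    """Connected components of the undirected graph (the candidate segments)."""
    adj, seen, parts = adjacency(nodes, edges), set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        part, queue = set(), deque([start])
        while queue:
            x = queue.popleft()
            part.add(x)
            for y in adj[x] - seen:
                seen.add(y)
                queue.append(y)
        parts.append(part)
    return parts


def modularity(parts, all_edges):
    """Newman modularity of a partition, measured on the original edge set."""
    m = len(all_edges)
    degree = {}
    for u, v in all_edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for part in parts:
        inside = sum(1 for u, v in all_edges if u in part and v in part)
        total = sum(degree.get(n, 0) for n in part)
        q += inside / m - (total / (2 * m)) ** 2
    return q


def edge_betweenness(nodes, edges):
    """Brandes-style shortest-path edge betweenness for an unweighted graph."""
    adj = adjacency(nodes, edges)
    score = {frozenset(e): 0.0 for e in edges}
    for s in nodes:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {}, []
        queue = deque([s])
        while queue:
            x = queue.popleft()
            order.append(x)
            for y in adj[x]:
                if y not in dist:
                    dist[y], sigma[y], preds[y] = dist[x] + 1, 0.0, []
                    queue.append(y)
                if dist[y] == dist[x] + 1:
                    sigma[y] += sigma[x]
                    preds[y].append(x)
        delta = {n: 0.0 for n in order}
        for y in reversed(order):
            for x in preds.get(y, []):
                share = sigma[x] / sigma[y] * (1.0 + delta[y])
                score[frozenset((x, y))] += share
                delta[x] += share
    return score


def partition_cfg(nodes, directed_edges):
    all_edges = {frozenset(e) for e in directed_edges if e[0] != e[1]}
    remaining = set(all_edges)
    best_parts = components(nodes, remaining)
    best_q = modularity(best_parts, all_edges)
    while remaining:
        score = edge_betweenness(nodes, remaining)
        remaining.remove(max(score, key=score.get))  # steps (1) and (2)
        parts = components(nodes, remaining)
        q = modularity(parts, all_edges)             # step (3)
        if q > best_q:
            best_q, best_parts = q, parts
    return best_parts, best_q                        # step (4)


# Two triangles joined by a single jump: the natural split is at the bridge.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 4)]
parts, q = partition_cfg(nodes, edges)
```

Because modularity penalizes very small clusters, keeping the best-scoring partition tends to yield a few large segments, which matches the coarse-grained division the method aims for. A production version would likely also bound segment size using the node weights.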
3. Single-segment symbolic execution
Symbolic execution of a single program segment is performed independently, in four steps:
(1) determine the start nodes and terminal nodes of each program segment;
(2) complete the missing jump information between basic blocks;
(3) traverse each program segment and select the corresponding analysis strategy;
(4) analyze the segment according to the selected analysis strategy.
In step (2), because the division does not take into account the direct and indirect addressing used by program calls and returns, the target address may not be found when the program returns. In that case the corresponding jump information must be completed from the original control flow graph.
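Steps (1) and (2) can be illustrated by comparing a segment with the original control flow graph. The sketch below is an assumption about how this might be done, not the patent's code: a start node is any block entered from outside the segment (or the program entry), a terminal node is any block that leaves the segment or has no successor, and the edges cut by the division are returned so the missing jump information can be restored. The name `segment_boundaries` is hypothetical.

```python
def segment_boundaries(segment, cfg_edges, program_entry):
    """Return (start nodes, terminal nodes, cut edges) for one program segment."""
    entries, exits, cut_edges = set(), set(), []
    if program_entry in segment:
        entries.add(program_entry)
    for src, dst in cfg_edges:
        if src not in segment and dst in segment:
            entries.add(dst)            # control enters the segment here
            cut_edges.append((src, dst))
        elif src in segment and dst not in segment:
            exits.add(src)              # control leaves the segment here
            cut_edges.append((src, dst))
    # Blocks with no successor at all (for example a final return) also end the segment.
    has_successor = {src for src, _ in cfg_edges}
    exits |= {n for n in segment if n not in has_successor}
    return entries, exits, cut_edges


# Original control flow graph with blocks 1..5; block 1 is the program entry.
cfg_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 5)]
middle = segment_boundaries({3, 4}, cfg_edges, program_entry=1)
last = segment_boundaries({5}, cfg_edges, program_entry=1)
```

The cut edges are exactly the jumps that the division removed, so looking them up in the original graph gives each segment the jump targets it would otherwise be missing.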
In step (3), the analysis strategy is selected according to the type of the program segment. In this embodiment, an ordinary exploration strategy is selected for a sequential segment, and a hybrid dynamic/static execution strategy is selected for a segment containing loops.
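One simple way to tell the two segment types apart is to look for a cycle among the jumps that stay inside the segment. The following sketch assumes that criterion; the strategy labels `"plain_exploration"` and `"hybrid_dynamic_static"` are hypothetical names, not identifiers from angr or from the patent.

```python
def has_loop(segment, cfg_edges):
    """Iterative depth-first search for a cycle among edges inside the segment."""
    succ = {n: [] for n in segment}
    for src, dst in cfg_edges:
        if src in segment and dst in segment:
            succ[src].append(dst)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in segment}
    for root in segment:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(succ[root]))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:              # all successors visited
                color[node] = BLACK
                stack.pop()
            elif color[nxt] == GREY:     # back edge: the segment contains a loop
                return True
            elif color[nxt] == WHITE:
                color[nxt] = GREY
                stack.append((nxt, iter(succ[nxt])))
    return False


def choose_strategy(segment, cfg_edges):
    if has_loop(segment, cfg_edges):
        return "hybrid_dynamic_static"
    return "plain_exploration"


cfg_edges = [(1, 2), (2, 3), (3, 2), (3, 4)]
loop_strategy = choose_strategy({2, 3}, cfg_edges)
straight_strategy = choose_strategy({3, 4}, cfg_edges)
```

Only edges with both endpoints in the segment are considered, so a loop that spans two segments is not detected here; whether such a case should count as cyclic is left open by the text.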
In step (4), an existing symbolic execution tool executes the segment symbolically from its entry, from each start node to the terminal nodes. The state reached at the terminal nodes is the input to the result merging in the next step. In this embodiment, symbolic execution is performed with angr; it will be understood that, in a specific implementation, angr can be replaced by other symbolic execution tools.
4. Result merging
Result merging is performed on two program segments that are connected in the original control flow graph. Once the results of all connected program segments have been merged, the symbolic execution result of the whole program is obtained.
For two program segments connected by a directed edge, the segment containing the edge's start node is called the upstream segment, and the segment containing the edge's end node is called the downstream segment.
Result merging mainly consists of two parts: state data merging and constraint merging.
In this embodiment, state data merging consists of two parts: register merging and memory merging.
Register merging obtains the register list from the program's architecture information and then merges the register information of the two segments register by register.
The main idea of register merging is to use the state data produced by the upstream segment as the input of the downstream segment. When the values of a register in the two states are merged, four cases arise:
(1) the value in the downstream state is a concrete value rather than a symbolic value; no substitution is needed;
(2) the value in the downstream state is symbolic and the corresponding value in the upstream state is concrete; the concrete value is substituted for the symbolic variable;
(3) the values in both the upstream and downstream states are symbolic; the upstream symbolic expression is substituted into the downstream one;
(4) the register is uninitialized in the downstream state; its value is set directly to the value of the corresponding register in the upstream state.
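The four cases can be made concrete with a toy representation of register values. This is a sketch under stated assumptions rather than the patent's implementation: concrete values are Python integers, symbolic values are strings or nested tuples such as `("add", "rax_in", 2)`, `None` marks an uninitialized register, and the downstream segment is assumed to name the unknown entry value of register `r` as `"r_in"`. The helpers `simplify`, `substitute` and `merge_registers` are hypothetical.

```python
def simplify(expr):
    """Fold an operation whose operands are all concrete integers."""
    op, *args = expr
    if all(isinstance(a, int) for a in args):
        if op == "add":
            return args[0] + args[1]
        if op == "mul":
            return args[0] * args[1]
    return expr


def substitute(expr, bindings):
    """Replace symbols in expr with the values bound to them."""
    if isinstance(expr, str):
        return bindings.get(expr, expr)
    if isinstance(expr, tuple):
        op, *args = expr
        return simplify((op, *(substitute(a, bindings) for a in args)))
    return expr  # concrete integer


def merge_registers(upstream, downstream, register_names):
    # Entry symbol "<reg>_in" of the downstream segment is bound to the
    # value that register holds when the upstream segment finishes.
    bindings = {f"{r}_in": upstream[r] for r in register_names
                if upstream.get(r) is not None}
    merged = {}
    for reg in register_names:
        value = downstream.get(reg)
        if value is None:              # case (4): take the upstream value
            merged[reg] = upstream.get(reg)
        elif isinstance(value, int):   # case (1): already concrete
            merged[reg] = value
        else:                          # cases (2) and (3): substitute
            merged[reg] = substitute(value, bindings)
    return merged


upstream = {"rax": 5, "rbx": ("add", "arg0", 1), "rcx": 7}
downstream = {"rax": ("add", "rax_in", 2), "rbx": ("mul", "rbx_in", 3),
              "rcx": None, "rdx": 9}
merged = merge_registers(upstream, downstream, ["rax", "rbx", "rcx", "rdx"])
```

In this example `rax` falls under case (2) and collapses to a concrete value, `rbx` under case (3) and stays symbolic in terms of the upstream symbol `arg0`, `rcx` under case (4), and `rdx` under case (1).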
Memory merging is similar to register merging, with some differences. Because memory is a contiguous range of addresses and the length of each read or write is not fixed, memory data must be obtained by inserting analysis breakpoints, and the addresses and lengths of memory writes must be recorded during symbolic execution.
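Recording each write's address and length lets the merge touch only the memory that the downstream segment actually changed. The sketch below assumes concrete byte values for brevity and a hypothetical `WriteLog` filled by a write breakpoint; with a real tool the recorded data could be symbolic and would be substituted in the same way as register values.

```python
class WriteLog:
    """Writes observed during symbolic execution of one segment, in order."""

    def __init__(self):
        self.writes = []  # (address, length, data)

    def record(self, address, data: bytes):
        self.writes.append((address, len(data), data))


def merge_memory(upstream_memory, downstream_log):
    """Replay the downstream writes on top of the upstream memory image."""
    merged = dict(upstream_memory)  # byte address -> byte value
    for address, length, data in downstream_log.writes:
        for offset in range(length):
            merged[address + offset] = data[offset]
    return merged


upstream_memory = {0x2000: 0x11, 0x2001: 0x22, 0x2002: 0x33}
log = WriteLog()
log.record(0x2001, b"\xaa\xbb")   # two-byte write overlapping upstream data
log.record(0x2010, b"\x01")       # one-byte write to a fresh address
merged_memory = merge_memory(upstream_memory, log)
```

Bytes that the downstream segment never wrote keep their upstream values, which mirrors case (4) of register merging.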
The constraints in constraint merging are the constraints required for the program to execute from the entry of a program segment to its exit. Constraint merging has two steps: the first replaces the symbolic values in the downstream constraints, in a way similar to register merging; the second copies the constraints of the upstream state into the downstream state.
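The two steps can be sketched with the same toy expression form as tuples of an operator and its operands. Everything here is illustrative: `substitute`, `merge_constraints` and `evaluate` are hypothetical helpers, and the brute-force search over a small range merely stands in for a real constraint solver.

```python
def substitute(expr, bindings):
    """Replace symbols in expr with the expressions bound to them."""
    if isinstance(expr, str):
        return bindings.get(expr, expr)
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *(substitute(a, bindings) for a in args))
    return expr  # concrete integer


def merge_constraints(upstream_constraints, downstream_constraints, bindings):
    # Step 1: rewrite downstream constraints in terms of upstream values.
    rewritten = [substitute(c, bindings) for c in downstream_constraints]
    # Step 2: carry the upstream constraints over to the merged state.
    return list(upstream_constraints) + rewritten


def evaluate(expr, env):
    """Evaluate an expression for one concrete assignment of the symbols."""
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, int):
        return expr
    op, a, b = expr
    x, y = evaluate(a, env), evaluate(b, env)
    return {"add": x + y, "lt": x < y, "gt": x > y}[op]


upstream_constraints = [("lt", "arg0", 10)]
downstream_constraints = [("gt", "rax_in", 0)]
bindings = {"rax_in": ("add", "arg0", 1)}  # rax at the upstream exit
merged = merge_constraints(upstream_constraints, downstream_constraints, bindings)

# Stand-in for a solver: search a small range for a satisfying input.
witness = next((v for v in range(-5, 20)
                if all(evaluate(c, {"arg0": v}) for c in merged)), None)
```

If the merged constraint set has a solution, that solution is an input driving execution through both segments, which is how a test case for the whole path would be produced.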
After all merging is complete, the symbolic execution result for the complete program is obtained, including the state in which the program reaches its end nodes and the corresponding path conditions. Finally, the program state can be used to determine whether it triggers a vulnerability, and the path constraints can be used to determine whether the path is feasible (that is, whether the solution set is non-empty); when the solution set is non-empty, a test case can be generated.

Claims (6)

1. A segmented symbolic execution method, characterized in that a program is divided into multiple program segments by a clustering method, independent symbolic execution is performed on each program segment, and the symbolic execution results of the program segments are then merged to complete the analysis of the whole program.
2. The segmented symbolic execution method according to claim 1, characterized in that, before the program is divided, a control flow graph is first extracted from the program, its nodes being program basic blocks and its directed edges being jumps between basic blocks, and the control flow graph is then divided into multiple control flow subgraphs by the clustering method.
3. The segmented symbolic execution method according to claim 2, characterized in that a weight is set for each node in the control flow graph, each node representing a single basic block, and the number of instructions in each basic block is used as the node's weight, indicating the size of the basic block.
4. The segmented symbolic execution method according to claim 3, characterized in that the program control flow graph is divided into multiple control flow subgraphs by the clustering method, specifically as follows:
Step 1: select an edge in the control flow graph according to the clustering algorithm;
Step 2: delete the edge selected in Step 1 from the control flow graph;
Step 3: compute the modularity of the control flow graph; if the modularity improves, update the control flow graph; otherwise return to Step 1;
Step 4: the division of the control flow graph is complete, and the resulting subgraphs are the result of program segmentation.
5. The segmented symbolic execution method according to claim 1, 2, 3 or 4, characterized in that the independent symbolic execution of each program segment specifically proceeds as follows:
(1) determine the start nodes and terminal nodes of each program segment;
(2) complete the missing jump information between basic blocks;
(3) traverse each program segment and select the corresponding analysis strategy;
(4) analyze the segment according to the selected analysis strategy.
6. The segmented symbolic execution method according to claim 1, 2, 3 or 4, characterized in that the result merging includes state data merging and constraint merging, and after the state data and constraints of all connected program segments have been merged, the symbolic execution result of the whole program is obtained.
CN201810819763.7A 2018-07-24 2018-07-24 Sectional type symbol execution method Active CN109002723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810819763.7A CN109002723B (en) 2018-07-24 2018-07-24 Sectional type symbol execution method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810819763.7A CN109002723B (en) 2018-07-24 2018-07-24 Sectional type symbol execution method

Publications (2)

Publication Number Publication Date
CN109002723A true CN109002723A (en) 2018-12-14
CN109002723B CN109002723B (en) 2021-09-07

Family

ID=64597107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810819763.7A Active CN109002723B (en) 2018-07-24 2018-07-24 Sectional type symbol execution method

Country Status (1)

Country Link
CN (1) CN109002723B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049377A (en) * 2012-12-14 2013-04-17 中国信息安全测评中心 Parallel symbolic execution method based on path cluster reductions
CN106156366A (en) * 2016-08-01 2016-11-23 浙江工业大学 A kind of pinning control node selecting method based on cluster

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
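牛小鹏 et al.: "Kripke structure generation method based on control flow information" (基于控制流信息的克里普克结构生成方法), Computer Science (《计算机科学》) *
范文庆: "Research on the segmented symbolic execution model and its environment interaction problem" (分段符号执行模型及其环境交互问题研究), China Doctoral Dissertations Full-text Database (《中国博士学位论文全文数据库》) *
魏小凤 et al.: "Automatic software module partitioning based on a hypergraph model" (基于超图模型的软件模块自动划分), Computer Engineering (《计算机工程》) *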

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688403A (en) * 2021-10-26 2021-11-23 江苏通付盾科技有限公司 Intelligent contract vulnerability detection method and device based on symbolic execution verification
CN116541280A (en) * 2023-05-06 2023-08-04 中国电子技术标准化研究院 Fuzzy test case generation method based on neural network
CN116541280B (en) * 2023-05-06 2023-12-26 中国电子技术标准化研究院 Fuzzy test case generation method based on neural network

Also Published As

Publication number Publication date
CN109002723B (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN101714118B (en) Detector for binary-code buffer-zone overflow bugs, and detection method thereof
CN111459799B (en) Software defect detection model establishing and detecting method and system based on Github
US8627290B2 (en) Test case pattern matching
CN102789419B (en) Software fault analysis method based on multi-sample difference comparison
CN104536883B (en) A kind of static defect detection method and its system
US5845064A (en) Method for testing and verification of a CPU using a reference model
CN110580226B (en) Object code coverage rate testing method, system and medium for operating system level program
JPH08241193A (en) Method for analysis of code segment
CN104732152B (en) Buffer-overflow vulnerability automatic testing method based on the beta pruning of semiology analysis path
CN103116540A (en) Dynamic symbolic execution method and device thereof based on overall situation super block dominator graph
CN103294594A (en) Test based static analysis misinformation eliminating method
CN107085533B (en) A kind of analysis method and system that pointer modified influences
CN110287702A (en) A kind of binary vulnerability clone detection method and device
CN105302719A (en) Mutation test method and apparatus
CN106055479A (en) Android application software test method based on compulsory execution
CN104090798A (en) Dynamic and static combined interrupt drive program data race detection method
CN108021507A (en) The parallel route searching method and device of semiology analysis
CN105487983B (en) Sensitive spot approach method based on intelligent Route guiding
CN103218297B (en) The screening technique and device of test data
CN109002723A (en) A segmented symbolic execution method
US6691079B1 (en) Method and system for analyzing test coverage
US9626468B2 (en) Assertion extraction from design and its signal traces
CN105224455B (en) A kind of method for automatically generating character string type test case
CN107247663B (en) Redundancy variant identification method
CN107741907A (en) With reference to bottom instruction and the simulator detection method and device of system information

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant