CN103440122A - Novel static function identification method using reverse extension control flow graphs - Google Patents
- Publication number
- CN103440122A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a novel static function identification method using reverse extension control flow graphs, belonging to the field of software reverse engineering. The method comprises the following steps: 1, building a set of regional reverse extension control flow graphs; 2, denoising the reverse extension control flow graphs by deleting nodes that were found during construction but cannot be generated by a compiler; 3, deleting and merging the reverse extension control flow graphs; 4, identifying the function entry in each reverse extension control flow graph; 5, obtaining the identification results for the multiple functions in the specified region. Unlike conventional methods, this method takes the return instructions of functions as identification features and uses function return instruction nodes as reverse search starting points to construct the reverse extension control flow graphs. It can therefore identify multiple functions in a specified binary code region, and can effectively identify functions that lack specific header byte features and cross-references and so cannot be identified by conventional static identification methods.
Description
Technical Field
The invention belongs to the field of software reverse engineering and relates to a static function identification method using a reverse extended control flow graph.
Background
Binary code review is a security audit performed on binary code. Software of any significant size inevitably uses third-party components, and these components, such as the Microsoft system dynamic link libraries, often ship without source code. Reverse engineering is almost the only way to review such software. In addition, malicious code is currently widespread and seriously threatens the security of computer systems, so detecting it is particularly important. Because the source code of most malicious code is unavailable, reverse engineering is again almost the only means of analysis.
Reverse engineering comprises disassembly and the identification of functions and high-level language elements such as library functions, variables, and structures. Finally, the operational semantics of each function are identified, and analyzing the cross-references among functions further improves the understanding of the whole program's semantics. These steps show that function identification is a crucial link in reverse engineering. Conventional static identification methods identify functions from the characteristic bytes at the start of a function and from cross-reference information between functions. However, binary code often contains large numbers of functions with neither distinctive features nor cross-references, and conventional static identification methods cannot identify such functions effectively.
Disclosure of Invention
In order to solve the problem of identifying multiple functions without significant features and cross-references in a specified binary code region, the invention provides a novel static function identification method using a reverse-extended control flow graph.
The basic idea of the technical scheme adopted by the invention is as follows. For a specified binary code region, every address that matches the characteristics of a function return instruction (generally a RET instruction) is assumed to be a function return address, and a corresponding Reverse Extended Control Flow Graph (RECFG) is then constructed from each such address from bottom to top. A RECFG is a control flow graph in which nodes represent instructions and edges represent control dependencies between instructions. Unlike a traditional control flow graph, it is constructed in reverse from a function return instruction node, and the predecessors of a node are all the instructions that could possibly precede it. Because the reverse search starts at a function return instruction, the RECFG contains the control flow graph of the function to be identified; the traditional control flow graph is thus a subgraph of the RECFG. Any two RECFGs stand in exactly one of three relationships: 1) independent, i.e. they are not connected to each other; 2) both are subgraphs of one graph, i.e. they belong to the same function; 3) conflicting, i.e. the search starting point (function return instruction) of one graph is part of the operand of an instruction in the other graph. Two graphs in relationship 2) can be merged. For two graphs in conflict, one graph is deleted using the multi-attribute decision ideal point method to resolve the conflict. In the end, each independent RECFG corresponds to one function. A function has only one entry point, which may be any node in its RECFG. Traversing the RECFG from such a node yields a subgraph, which is in effect the control flow graph of the function that has that node as its entry point.
The function identification problem thus becomes the problem of identifying the function's entry point. Finally, the entry point is identified by the multi-attribute decision ideal point method, based on the attributes of the control flow graph corresponding to each node.
The static function identification method using a reverse extended control flow graph disclosed by the invention comprises the following steps:
Step 1: establishing a set of regional reverse extended control flow graphs;
Step 2: denoising the reverse extended control flow graphs by deleting nodes that were found during RECFG construction but cannot be generated by a compiler;
Step 3: deleting and merging reverse extended control flow graphs;
Step 4: identifying the function entry in each reverse extended control flow graph;
Step 5: obtaining the identification results for the multiple functions in the specified region.
Unlike conventional methods, this method takes the return instruction of a function as the identification feature and uses the function return instruction node as the reverse search starting point to construct the reverse extended control flow graph. It can identify multiple functions in a specified binary code region, and can effectively identify functions that have no specific header byte features and no cross-references and therefore cannot be identified by conventional static identification methods.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is the reverse extended control flow graph construction algorithm;
FIG. 3 is the reverse search algorithm used within the reverse extended control flow graph construction algorithm;
FIG. 4 is a schematic diagram of a reverse extended control flow graph;
FIG. 5 is the identification result for FIG. 4.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings, but the present invention is not limited thereto, and modifications or equivalent substitutions may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.
Embodiment 1: In the static function identification method using a reverse extended control flow graph of this embodiment, a reverse extended control flow graph is first constructed, for a specified binary code region, from every address that matches the characteristics of a function return instruction. Three attributes of each graph are then calculated (the sum of instruction lengths, the cyclomatic complexity, and the percentage of full-width instructions), and the multi-attribute decision ideal point method is used to resolve possible conflicts between graphs. Finally, function identification is converted into the problem of identifying the function entry point: the three attributes are calculated for the subgraph obtained by traversing from each node without predecessors in each graph, and the ideal point method decides the function entry point, giving the final function identification result. The specific steps are as follows:
Step 1: establishing a set of regional reverse extended control flow graphs:
For a specified code region, a corresponding Reverse Extended Control Flow Graph (RECFG) is constructed from bottom to top from every address that matches the characteristics of a function return instruction, forming the set of regional RECFGs.
As shown in FIG. 2, the construction of a RECFG is divided into two major steps. First, the initial graph is constructed in reverse: the search starting point is added to the graph, and all possible predecessors of each newly added point are searched for repeatedly. FIG. 3 shows the algorithm that searches for all possible predecessors of a point v in the RECFG. It checks whether an instruction of each possible length is a predecessor of v: if the bytes of an instruction are identical to the bytes of the same length immediately before (at lower addresses than) the instruction address corresponding to v, the instruction is a predecessor of v. Second, for the jump targets of the branch instructions in the graph, the control flow graphs starting from those targets are added to the current RECFG using the conventional control-flow-based recursive traversal method. FIG. 4 shows a schematic diagram of a RECFG.
The reverse extended control flow graph (RECFG) is an intermediate representation for function identification proposed by the invention. It is a control flow graph in which nodes represent instructions and edges represent control dependencies between instructions; unlike a traditional control flow graph, the predecessors of a node in a RECFG are all the instructions that could possibly precede it.
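The first construction stage (the reverse predecessor search of FIG. 3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `TOY_LENGTHS` table is a hypothetical stand-in for a real instruction decoder and covers only a few single-opcode 32-bit x86 encodings, the 15-byte bound is the x86 maximum instruction length, and the second stage (following branch targets) is omitted. All function names are introduced here for illustration.

```python
from collections import deque

# Hypothetical toy decoder table: first opcode byte -> total instruction length.
# push ebp, pop ebp, nop, ret, jmp rel8, jz rel8, mov eax, imm32
TOY_LENGTHS = {0x55: 1, 0x5D: 1, 0x90: 1, 0xC3: 1, 0xEB: 2, 0x74: 2, 0xB8: 5}
MAX_LEN = 15  # longest possible x86 instruction, in bytes


def decode_length(code, addr):
    """Return the length of the instruction at addr, or None if undecodable."""
    if not 0 <= addr < len(code):
        return None
    n = TOY_LENGTHS.get(code[addr])
    if n is None or addr + n > len(code):
        return None
    return n


def predecessors(code, v):
    """All addresses whose instruction ends exactly where the one at v begins."""
    preds = []
    for n in range(1, MAX_LEN + 1):
        start = v - n
        if start < 0:
            break
        if decode_length(code, start) == n:
            preds.append(start)
    return preds


def build_recfg(code, ret_addr):
    """Stage 1 only: map each node to its list of possible predecessors."""
    graph = {ret_addr: []}
    queue = deque([ret_addr])
    while queue:
        v = queue.popleft()
        for p in predecessors(code, v):
            graph[v].append(p)
            if p not in graph:
                graph[p] = []
                queue.append(p)
    return graph


# push ebp; mov eax, 0x000000C3; pop ebp; ret
SAMPLE = bytes([0x55, 0xB8, 0xC3, 0x00, 0x00, 0x00, 0x5D, 0xC3])
```

In `SAMPLE`, the byte `0xC3` at offset 2 looks like a return instruction but is part of the immediate operand of the `mov` at offset 1. A graph started from offset 2 finds no predecessors, and it is an instance of the conflict relationship that step 32 resolves.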
Step 2: denoising the reverse extended control flow graph by deleting nodes that were found during RECFG construction but cannot be generated by a compiler:
Step 21: deleting illegal conditional branch instructions, including conditional branch instructions without predecessor nodes, conditional branch instructions whose jump target is an illegal memory address, and conditional branch instructions whose two branches contain instructions with overlapping bytes;
Step 22: deleting paths that do not end in a function return instruction, a jump instruction, or a CALL instruction;
Step 23: checking each path, and deleting nodes whose paired instructions do not match (for example, a PUSHAD with no POPAD on the path) together with all their predecessors;
Step 24: deleting privileged instructions and all their predecessors, including interrupt-related instructions, halt instructions, and instructions operating on special CPU registers;
Step 25: deleting instructions with no practical effect, including NOP, breakpoint instructions, additions of 0, subtractions of 0, logical OR with 0, shift instructions with a shift count of 0, and move instructions whose source and destination operands are the same;
Step 26: taking the subgraph that contains the reverse search starting point as the new reverse extended control flow graph;
Step 27: repeating steps 21 to 26 until the graph no longer changes.
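The fixed-point structure of steps 24 to 27 can be sketched as follows. This is a simplified illustration, not the full denoising procedure: it models only the deletion of flagged instructions (a hypothetical `banned` set of mnemonics) followed by the restriction to the subgraph containing the search starting point, and it assumes the graph is stored as a mapping from each node to its set of predecessors. Predecessors of a deleted node disappear here only if they can no longer reach the starting point, which is an assumption about how "all predecessors" is applied.

```python
def reachable_from_start(preds, start):
    """Nodes connected to start by following predecessor edges backwards."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for p in preds.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def denoise(preds, mnemonic, start, banned):
    """Repeat 'delete banned nodes, keep the start's subgraph' until stable."""
    preds = {v: set(ps) for v, ps in preds.items()}
    while True:
        before = {v: set(ps) for v, ps in preds.items()}
        # Steps 24/25 (simplified): drop nodes whose mnemonic is banned.
        bad = {v for v in preds if v != start and mnemonic[v] in banned}
        preds = {v: ps - bad for v, ps in preds.items() if v not in bad}
        # Step 26: keep only the subgraph that contains the search start.
        keep = reachable_from_start(preds, start)
        preds = {v: ps & keep for v, ps in preds.items() if v in keep}
        # Step 27: stop once a full pass changes nothing.
        if preds == before:
            return preds
```

Running the passes inside one loop matters because a deletion in one pass can expose new candidates for another; comparing against a snapshot is a simple way to detect that the graph has stopped changing.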
Step 3: deleting and merging reverse extended control flow graphs:
Step 31: enumerating every pair of RECFGs in the regional RECFG set, and if the reverse search starting point (function return instruction) of one graph is in the other graph, deleting the graph with fewer nodes from the regional RECFG set;
Step 32: enumerating every pair of RECFGs in the regional RECFG set, and if the reverse search starting point (function return instruction) of one graph is part of the instruction corresponding to some point in the other graph, deleting the worse of the two graphs using the multi-attribute decision ideal point method, according to each graph's sum of instruction lengths, cyclomatic complexity, and percentage of full-width instructions.
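The pairwise test behind steps 31 and 32 can be sketched as a small classifier. The representation is an assumption made for illustration: each graph is a dictionary with a `start` address and a `nodes` mapping from instruction address to instruction length, and the function and label names are hypothetical.

```python
def relation(g1, g2):
    """Classify two RECFGs as 'same_function', 'conflict', or 'independent'."""
    pairs = ((g1, g2), (g2, g1))
    # Step 31 condition: one graph's search start is a node of the other.
    for a, b in pairs:
        if a["start"] in b["nodes"]:
            return "same_function"
    # Step 32 condition: one graph's search start falls strictly inside the
    # bytes of an instruction of the other graph.
    for a, b in pairs:
        s = a["start"]
        if any(addr < s < addr + n for addr, n in b["nodes"].items()):
            return "conflict"
    return "independent"


def smaller_graph(g1, g2):
    """Step 31: pick the graph with fewer nodes as the one to delete."""
    return g1 if len(g1["nodes"]) < len(g2["nodes"]) else g2
```

For a `conflict` pair, the graph to delete would be chosen by the ideal point method described in Embodiment 2 rather than by node count.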
Step 4: identifying the function entry in the reverse extended control flow graph:
Step 41: traversing from each point without predecessors in the graph to obtain the control flow graph that has that point as its entry point;
Step 42: deciding the optimal control flow graph as the function identification result (as shown in FIG. 5) using the multi-attribute decision ideal point method, according to each graph's sum of instruction lengths, cyclomatic complexity, and percentage of full-width instructions.
Step 5: obtaining the identification results for the multiple functions in the specified region.
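The three attributes used in steps 32 and 42 can be computed for one candidate entry point roughly as follows. The sketch makes several assumptions that the text does not fix: the graph is given as a successor map, cyclomatic complexity uses the common formula E - N + 2 for a single connected control flow graph, and `full_width` is a hypothetical precomputed set of addresses of full-width instructions.

```python
def candidate_attributes(succs, length, full_width, entry):
    """Return (length sum, cyclomatic complexity, full-width ratio)
    for the control flow graph reached by traversing forward from entry."""
    seen = {entry}
    stack = [entry]
    while stack:
        v = stack.pop()
        for s in succs.get(v, ()):
            if s not in seen:
                seen.add(s)
                stack.append(s)
    n_nodes = len(seen)
    # Every successor of a visited node is itself visited, so all of these
    # edges lie inside the candidate subgraph.
    n_edges = sum(len(set(succs.get(v, ()))) for v in seen)
    length_sum = sum(length[v] for v in seen)
    cyclomatic = n_edges - n_nodes + 2
    ratio = sum(1 for v in seen if v in full_width) / n_nodes
    return length_sum, cyclomatic, ratio


def entry_candidates(preds):
    """Step 41: nodes of a RECFG that have no predecessors."""
    return sorted(v for v, ps in preds.items() if not ps)
```

Each candidate's attribute tuple then forms one row of the decision matrix that the ideal point method ranks.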
Embodiment 2: This embodiment differs from Embodiment 1 in that it specifies the multi-attribute decision ideal point method used in steps 32 and 42, which comprises the following steps:
(1) Calculating the decision matrix:
The evaluation indexes are the sum of instruction lengths, the cyclomatic complexity, and the percentage of full-width instructions of the graph. A full-width instruction is an instruction that operates on the maximum number of bits the CPU can process at one time. The weights $w_j$ ($j = 1, 2, 3$) of these 3 evaluation indexes are 0.5, 0.38, and 0.12, respectively.
Suppose there are $m$ alternatives. The decision matrix of the 3 evaluation indexes is then $D = (x_{ij})_{m \times 3}$, where the element $x_{ij}$ of $D$ is the value of the $j$-th evaluation index of the $i$-th alternative.
(2) Calculating the normalized decision matrix.
(3) Calculating the weighted normalized decision matrix, where $w_j$ is the weight of the $j$-th evaluation index.
(4) Determining the positive ideal solution $A^*$ and the negative ideal solution $A^-$ from the weighted normalized decision matrix.
(5) Calculating the Euclidean distance between each alternative and the positive ideal solution.
(6) Calculating the relative closeness $C^*$ of each alternative.
(7) Ordering the alternatives by relative closeness.
In the ordering, the larger the closeness $C^*$, the better the alternative; the alternative with the largest value is the optimal one.
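The ideal point method outlined above follows the general shape of the TOPSIS technique, and a minimal sketch of that technique is given here. Only the three weights come from the text. The vector normalization, the treatment of all three indexes as "larger is better", and the use of distances to both ideal solutions in the closeness formula are assumptions drawn from the common textbook form, since the original formulas are not reproduced in this text.

```python
import math

WEIGHTS = (0.5, 0.38, 0.12)  # length sum, cyclomatic complexity, full-width %


def closeness(matrix, weights=WEIGHTS):
    """Relative closeness of each row (alternative) of the decision matrix."""
    # Assumed vector normalization: divide each column by its Euclidean norm.
    norms = [math.sqrt(sum(x * x for x in col)) or 1.0 for col in zip(*matrix)]
    weighted = [
        [w * x / n for x, w, n in zip(row, weights, norms)] for row in matrix
    ]
    # Assumed benefit attributes: the ideal is the column maximum.
    best = [max(col) for col in zip(*weighted)]
    worst = [min(col) for col in zip(*weighted)]
    result = []
    for row in weighted:
        d_pos = math.dist(row, best)
        d_neg = math.dist(row, worst)
        total = d_pos + d_neg
        result.append(d_neg / total if total else 0.0)
    return result


def best_alternative(matrix, weights=WEIGHTS):
    """Index of the alternative with the largest relative closeness."""
    scores = closeness(matrix, weights)
    return max(range(len(scores)), key=scores.__getitem__)
```

With two candidate graphs whose attribute rows are `[11, 2, 0.6]` and `[6, 1, 0.5]`, the first row dominates in every index, so `best_alternative` selects index 0 under these assumptions.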
Claims (6)
1. A new static function identification method using a reverse extended control flow graph, characterized by comprising the following steps:
Step 1: establishing a set of regional reverse extended control flow graphs;
Step 2: denoising the reverse extended control flow graphs by deleting nodes that were found while constructing the reverse extended control flow graphs but cannot be generated by a compiler;
Step 3: deleting and merging reverse extended control flow graphs;
Step 4: identifying the function entry in each reverse extended control flow graph;
Step 5: obtaining the identification results for the multiple functions in the specified region.
2. The new static function identification method using a reverse extended control flow graph according to claim 1, wherein the specific process of step 1 is as follows:
for the specified code region, a corresponding reverse extended control flow graph is constructed from bottom to top from every address that matches the characteristics of a function return instruction, forming the set of regional reverse extended control flow graphs.
3. The new static function identification method using a reverse extended control flow graph according to claim 2, characterized in that the construction of the reverse extended control flow graph is divided into two steps:
first, the initial graph is constructed in reverse: the search starting point is added to the graph, and all possible predecessors of each newly added point are searched for repeatedly;
second, for the jump targets of the branch instructions in the graph, the control flow graphs starting from those jump targets are added to the current reverse extended control flow graph using the conventional control-flow-based recursive traversal method.
4. The new static function identification method using a reverse extended control flow graph according to claim 1, wherein the denoising of the reverse extended control flow graph in step 2 comprises the following steps:
Step 21: deleting illegal conditional branch instructions, including conditional branch instructions without predecessor nodes, conditional branch instructions whose jump target is an illegal memory address, and conditional branch instructions whose two branches contain instructions with overlapping bytes;
Step 22: deleting paths that do not end in a function return instruction, a jump instruction, or a CALL instruction;
Step 23: checking each path, and deleting nodes whose paired instructions do not match together with all their predecessors;
Step 24: deleting privileged instructions and all their predecessors, including interrupt-related instructions, halt instructions, and instructions operating on special CPU registers;
Step 25: deleting instructions with no practical effect, including NOP, breakpoint instructions, additions of 0, subtractions of 0, logical OR with 0, shift instructions with a shift count of 0, and move instructions whose source and destination operands are the same;
Step 26: taking the subgraph that contains the reverse search starting point as the new reverse extended control flow graph;
Step 27: repeating steps 21 to 26 until the graph no longer changes.
5. The new static function identification method using a reverse extended control flow graph according to claim 1, wherein the deleting and merging of reverse extended control flow graphs in step 3 comprises the following steps:
Step 31: enumerating every pair of reverse extended control flow graphs in the regional set, and if the reverse search starting point of one graph is in the other graph, deleting the graph with fewer nodes from the regional set;
Step 32: enumerating every pair of reverse extended control flow graphs in the regional set, and if the reverse search starting point of one graph is part of the instruction corresponding to some point in the other graph, deleting the worse of the two graphs using the multi-attribute decision ideal point method, according to each graph's sum of instruction lengths, cyclomatic complexity, and percentage of full-width instructions.
6. The new static function identification method using a reverse extended control flow graph according to claim 1, wherein the identifying of a function entry in a reverse extended control flow graph in step 4 comprises the following steps:
Step 41: traversing from each point without predecessors in the graph to obtain the control flow graph that has that point as its entry point;
Step 42: deciding the optimal control flow graph as the function identification result using the multi-attribute decision ideal point method, according to each graph's sum of instruction lengths, cyclomatic complexity, and percentage of full-width instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201310291941.0A CN103440122B (en) | 2013-07-12 | 2013-07-12 | A static function identification method using a reverse extended control flow graph
Publications (2)
Publication Number | Publication Date
---|---
CN103440122A (en) | 2013-12-11
CN103440122B (en) | 2016-06-08
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2023-06-28. Address after: Building 1, Kechuang Headquarters, Shenzhen (Harbin) Industrial Park, 288 Zhigu Street, Songbei District, Harbin City, Heilongjiang Province. Patentee after: Harbin Nenchuang Digital Technology Co.,Ltd. Address before: 150000 No. 92, West Da Zhi Street, Nangang District, Harbin, Heilongjiang. Patentee before: HARBIN INSTITUTE OF TECHNOLOGY |