CN114527963A - Class inheritance relationship identification method in C++ binary file and electronic device


Info

Publication number
CN114527963A
Authority
CN
China
Prior art keywords
virtual
node
destructor
virtual function
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011322846.9A
Other languages
Chinese (zh)
Inventor
龚晓锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN202011322846.9A
Publication of CN114527963A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G06F8/315 Object-oriented languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/74 Reverse engineering; Extracting design information from source code

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides a method for identifying class inheritance relationships in a C++ binary file, and an electronic device. The method comprises: extracting the virtual function tables, the virtual base class tables and the symbol table from the binary file; identifying the destructor in each virtual function table according to the symbol table, pairing each virtual function table with its destructor, and querying the cross-references of each virtual function table to obtain the constructors; performing static taint analysis, based on the constructors, the destructors and a control flow graph generated for the functions that directly or indirectly call a constructor or destructor, on the operations that the constructors and destructors perform on the virtual function tables and virtual base class tables, thereby constructing the object memory layout; and analyzing each function in the virtual function tables against the object memory layout to recover the class information in the binary file and obtain the class inheritance relationships. Through heuristic constructor search, an efficient CFG generation algorithm and interprocedural static taint analysis, the invention can recover class inheritance relationships, including virtual inheritance, and improves the recall of class identification.

Description

Class inheritance relationship identification method in C++ binary file and electronic device
Technical Field
The invention belongs to the technical field of software reverse analysis, and particularly relates to a method for identifying class inheritance relationships in a C++ binary file and to an electronic device.
Background
Real-world commercial software generally lacks security auditing, so the demand for auditing such software has grown sharply. However, commercial software is usually large and complex, so auditing it first requires a preliminary understanding of the program, i.e., its high-level abstract information. According to the programming language popularity trend published by TIOBE from 2002 to the present, C++ has consistently ranked in the top three and, owing to its high performance, stability and polymorphism, has remained the mainstream choice for large commercial software. Although recovering high-level abstract information such as class inheritance relationships is extremely difficult in C++ reverse engineering, this information is of great help to security auditing. On the one hand, it helps reverse engineers quickly understand the overall architecture and high-level abstractions of large software, so that subsequent reverse analysis can proceed quickly and purposefully; on the other hand, it helps them understand the context of critical program locations, which supports vulnerability discovery or exploitation.
Existing technical solutions fall into two categories. The first infers inheritance relationships between classes from constructor call relationships (control-flow-based methods) and needs to recover control flow information such as the function call graph (FCG). Under compiler optimization, however, constructors may be inlined, which causes the method to fail. The second identifies class relationships through overwrite analysis and offset recording (data-flow-based methods). Although it needs no control flow information and resists the interference of function inlining to some extent, its heuristics lack generality and cannot be applied across multiple platforms or in more complex compiler optimization settings. In addition, existing solutions cannot identify virtual inheritance, which limits their ability to identify classes and class relationships, and their analysis efficiency is too low to handle large binary files.
Disclosure of Invention
To solve the above problems, the invention provides a method for identifying class inheritance relationships in a C++ binary file and an electronic device. Based on the characteristics of the C++ application binary interface (ABI) implementation, the method recovers class information from compiler-optimized commercial software by extracting information, performing interprocedural static taint analysis, applying heuristic reasoning, and using a set of acceleration techniques.
To achieve this purpose, the invention adopts the following technical solution:
A method for identifying class inheritance relationships in a C++ binary file comprises the following steps:
1) extracting the virtual function tables, the virtual base class tables and the symbol table from the binary file;
2) identifying the destructor in each virtual function table according to the symbol table, pairing each virtual function table with its destructor, and querying the cross-references of each virtual function table to obtain the constructors;
3) performing static taint analysis, based on the constructors, the destructors and a control flow graph generated for the functions that directly or indirectly call a constructor or destructor, on the operations that the constructors and destructors perform on the virtual function tables and virtual base class tables, and constructing the object memory layout;
4) analyzing each function in the virtual function tables against the object memory layout, and recovering the class information in the binary file to obtain the class inheritance relationships.
Further, the destructor in each virtual function table is identified by the following strategy:
1) traversing the functions in each virtual function table, and detecting whether a function contains an overwrite operation, i.e., a memory write of the virtual function table;
2) searching the first subsequent instructions of the function and the second subsequent instructions of the upper-level function that calls it, and detecting, according to the symbol table, whether a delete operation exists;
3) if a function contains an overwrite operation and a delete operation exists in either the first subsequent instructions or the second subsequent instructions, the function is the destructor of the corresponding virtual function table.
Further, the constructors are obtained by:
1) querying the cross-references of each virtual function table and excluding the destructors to obtain the remaining functions;
2) traversing the remaining functions, and detecting whether each contains an overwrite operation on the virtual function table memory;
3) searching the first preceding instructions of a remaining function and the second preceding instructions of the upper-level function that calls it, and detecting whether a new operation exists;
4) if a remaining function contains an overwrite operation and a new operation exists in either the first preceding instructions or the second preceding instructions, the remaining function is a constructor of the corresponding virtual function table.
Further, the control flow graph is generated by:
1) traversing the set formed by the addresses of the constructors and destructors, dividing each constructor and destructor into basic blocks at jump statements, and taking the basic block containing the start address of each constructor or destructor as its initial basic block;
2) for each basic block, if it ends in a direct jump statement or a conditional jump statement, connecting it to the basic block containing the jump target;
3) finding each basic block that contains a call instruction other than a call to a system function, connecting that basic block to the basic block containing the function address the call instruction points to, and merging the set of function addresses pointed to by such call instructions into the set of constructor and destructor addresses;
4) if the target address of a jump statement lies inside an already analyzed basic block and is neither the start address nor the end address of that block, splitting the block at the jump target address;
5) marking the branches that lead into the loop path of a loop structure and the branches that lead into the non-returning path of a no-return structure, respectively.
Further, the object memory layout is constructed by:
1) according to the control flow graph, marking the this pointer of each constructor and destructor as the initial taint, and selecting an execution path;
2) propagating and eliminating taint with respect to the virtual function tables and virtual base class tables, and constructing the object memory layout.
Further, the this pointer of each constructor and destructor is marked as the initial taint as follows:
1) if the constructors are all inlined into an ordinary function, the this pointer is the rax register, and the starting instruction is the instruction following the call to new;
2) if the constructors are not all inlined into an ordinary function, the this pointer is the rcx register, and the starting instruction is the first instruction of the constructor.
Further, the execution path is selected by an intelligent path selection strategy, which comprises:
1) when neither of the two branches carries the loop-path branch mark of a loop structure or the non-returning-path branch mark of a no-return structure, taking the False branch;
2) avoiding any branch that carries the loop-path branch mark of a loop structure or the non-returning-path branch mark of a no-return structure.
Further, selecting a branch that carries the loop-path branch mark of a loop structure or the non-returning-path branch mark of a no-return structure is avoided by:
1) enabling instruction tracing;
2) switching to a random-walk strategy once the same branch instruction has been executed more than a set number of times.
Further, taint is propagated and eliminated with respect to the virtual function tables and virtual base class tables by the following steps:
1) scanning the instructions one by one, basic block by basic block, and selecting a path whenever a branch instruction is encountered;
2) if the value in a tainted register is assigned to another register, marking the target register as tainted; if a constant is assigned to a tainted register, eliminating its taint; if the value in a tainted register is pushed onto the stack, recording the corresponding stack offset and marking it as tainted; if a tainted stack variable is popped from the stack into a register, eliminating the taint of the corresponding stack offset and marking the target register as tainted;
3) when an instruction related to a virtual base class table write operation is executed, finding the corresponding memory offset through the taint, and recording the virtual function table in the corresponding object memory layout;
4) when an instruction related to a virtual function table write operation is executed, finding the corresponding memory offset through the taint, appending the virtual function table to the corresponding object memory layout (tail insertion), and recording its overwrite order within the constructor;
5) when an instruction related to a virtual base class table read operation is executed, finding the corresponding virtual base class table in the object memory layout through the taint, calculating the real memory offset, recording the virtual base class table and its overwrite order, and marking the memory offset as a virtual base class.
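The register and stack rules in item 2) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the tuple-based instruction encoding (`mov_reg`, `mov_imm`, `push`, `pop`), the function name `propagate` and the slot-counting stack model are all hypothetical, and clearing a destination register when an untainted value is written into it is an added assumption that the text does not state.

```python
def propagate(instrs, this_reg="rcx"):
    """Track which registers and stack slots hold a copy of the this pointer."""
    tainted_regs = {this_reg}      # initial taint: the this pointer
    tainted_stack = set()          # tainted stack slots, identified by push depth
    depth = 0                      # number of slots currently pushed
    for ins in instrs:
        op = ins[0]
        if op == "mov_reg":        # ("mov_reg", dst, src)
            _, dst, src = ins
            if src in tainted_regs:
                tainted_regs.add(dst)        # tainted value copied to dst
            else:
                tainted_regs.discard(dst)    # assumption: untainted copy clears dst
        elif op == "mov_imm":      # ("mov_imm", dst, constant)
            tainted_regs.discard(ins[1])     # constant assignment eliminates taint
        elif op == "push":         # ("push", src)
            depth += 1
            if ins[1] in tainted_regs:
                tainted_stack.add(depth)     # record the tainted stack offset
        elif op == "pop":          # ("pop", dst)
            dst = ins[1]
            if depth in tainted_stack:
                tainted_stack.discard(depth) # stack slot is no longer tainted
                tainted_regs.add(dst)        # taint moves to the target register
            else:
                tainted_regs.discard(dst)    # assumption: untainted pop clears dst
            depth -= 1
    return tainted_regs, tainted_stack


trace = [
    ("mov_reg", "rbx", "rcx"),   # rbx now aliases this
    ("push", "rbx"),             # a tainted value is saved on the stack
    ("mov_imm", "rbx", 0),       # rbx is overwritten with a constant
    ("mov_imm", "rcx", 0),       # rcx is overwritten with a constant
    ("pop", "rsi"),              # the saved this pointer is restored into rsi
]
print(propagate(trace))
```

In this trace the taint ends up only in `rsi`, which is the register a later vftable or vbtable write would have to use for the write to be attributed to the object.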
Further, the class information is recovered from the binary file through heuristic reasoning, which comprises the following steps:
1) traversing the object memory layout of each object; for each memory offset with the virtual function table attribute, extracting the first virtual function table stored there, deleting it from the object memory layout, and making it a node of its own; checking whether any of these virtual function tables share the same destructor and, if so, merging their nodes, keeping the larger overwrite-order value when merging;
2) repeatedly extracting virtual function tables from the memory offsets with the virtual function table attribute, taking one virtual function table from each memory offset per round, deleting it from the object memory layout, making it a node of its own, and establishing an inheritance relationship with the node to which the previous virtual function table belongs; in each round, checking whether any of the virtual function tables share the same destructor and, if so, merging their nodes, keeping the larger overwrite-order value when merging;
3) traversing all nodes, checking whether the virtual function tables of different nodes share the same destructor and, if so, merging the nodes, keeping the larger overwrite-order value when merging;
4) for each memory offset with the virtual base class attribute, extracting the virtual function tables in order starting from the second one, deleting each extracted virtual function table from the object memory layout, making it a node of its own, and establishing a virtual inheritance relationship with the node to which the first virtual function table belongs; traversing all nodes to find nodes that have the same destructor as the virtual function table, and merging them, keeping the larger overwrite-order value when merging; if the parent node or nodes of the found node already have the virtual inheritance relationship, deleting the virtual inheritance relationship between the node of the virtual function table and the virtual base class;
5) if an isolated node or node tree exists, recording the virtual function table and overwrite order of its lowest child node, then traversing the other node trees and checking whether the destructor of any node in them contains the virtual function table of the recorded node; if it does and the overwrite order of that node is smaller than the overwrite order of the recorded node, attaching the node or node tree to which the recorded node belongs to the corresponding node and establishing an object membership relationship;
6) aggregating and deduplicating all nodes, keeping only one copy of identical nodes.
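The merge rule that recurs in steps 1) to 4), "merge nodes whose virtual function tables share a destructor and keep the larger overwrite-order value", can be sketched in Python as follows. The node representation (a dict with `vftables`, `dtors` and `order` keys), the function name `merge_by_destructor` and the sample labels are hypothetical; the inheritance and membership edges built by the other steps are left out.

```python
def merge_by_destructor(nodes):
    """Merge nodes that share at least one destructor.

    nodes: list of {"vftables": set, "dtors": set, "order": int}
    Returns a new list in which no two nodes share a destructor.
    """
    merged = []
    for node in nodes:
        current = {
            "vftables": set(node["vftables"]),
            "dtors": set(node["dtors"]),
            "order": node["order"],
        }
        rest = []
        for other in merged:
            if other["dtors"] & current["dtors"]:
                # Same destructor: same class, so fold the nodes together
                current["vftables"] |= other["vftables"]
                current["dtors"] |= other["dtors"]
                # Keep the larger overwrite-order value
                current["order"] = max(current["order"], other["order"])
            else:
                rest.append(other)
        merged = rest + [current]
    return merged


nodes = [
    {"vftables": {"vft_D_0"}, "dtors": {"D::~D"}, "order": 5},
    {"vftables": {"vft_B"}, "dtors": {"B::~B"}, "order": 2},
    {"vftables": {"vft_D_1"}, "dtors": {"D::~D"}, "order": 6},
]
for n in merge_by_destructor(nodes):
    print(sorted(n["vftables"]), n["order"])
```

Because every incoming node absorbs all previously merged nodes it overlaps with, the result is independent of chains such as "A shares a destructor with B, and B with C".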
A storage medium stores a computer program, wherein the computer program is arranged to perform the above method when executed.
An electronic device comprises a memory and a processor, wherein the memory stores a computer program and the processor is arranged to run the computer program to perform the above method.
Compared with the prior art, the invention has the following advantages:
1) It provides a method that combines interprocedural static taint analysis with heuristic reasoning and can recover classes and class relationships, including virtual inheritance, which previous methods could not identify, thereby improving the recall of class identification.
2) It proposes several techniques that improve analysis efficiency, including heuristic constructor search, an efficient CFG generation algorithm and interprocedural static taint analysis, which make it possible to analyze large binary files.
3) In experimental tests, a system implementing the method achieved an average class recovery recall of 84.36% under O2 compiler optimization, about 50% higher than the state-of-the-art tool OOAnalyzer, with an average precision of 97.17%, and its analysis efficiency was more than three orders of magnitude higher than that of OOAnalyzer.
Drawings
FIG. 1 is a diagram of the overall framework of a method, according to an embodiment of the present invention, for identifying C++ binary code class inheritance relationships that is capable of analyzing large commercial software.
FIG. 2 is a diagram of example source code and its class relationships according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the layout of a D object according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a wrapper function that jumps to a destructor according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of the intelligent path selection strategy according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of an object memory layout according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of the node generation process according to an embodiment of the present invention.
Detailed Description
To make the technical solutions in the embodiments of the present invention easier to understand, and to make the objects, features and advantages of the present invention clearer, the technical core of the invention is described in further detail below with reference to examples.
FIG. 1 shows the overall framework of the C++ binary code class inheritance relationship identification technique, capable of analyzing large commercial software, according to an embodiment of the present invention. FIG. 2 shows the source code of a simple C++ program with five classes, A, B, C, D and E, in which D multiply inherits from B and C, B and C both virtually inherit from A, and E is an object member of D. When D is instantiated, the object memory layout is as shown in FIG. 3, where the sequence numbers indicate the initialization order.
The technical solution for identifying C++ binary code class inheritance relationships in large commercial software comprises the following steps:
1. Extracting basic information from the binary file
For the subsequent analysis to work, basic information must first be extracted from the binary file: the virtual function tables (vftable), the virtual base class tables (vbtable) and the symbol table.
1) A vftable is extracted on the basis of the following features:
a) it is located in a read-only section, e.g., .rdata;
b) the addresses it stores point into a code section, e.g., .text;
c) its first entry is cross-referenced, and the vftable is assigned to a virtual function table pointer (vftptr);
d) if runtime type information (RTTI) is present, a runtime type information pointer (RTTIptr) precedes the vftable.
Specifically, if features a) to c) are all present, the vftable can be extracted from the binary file, and feature d) provides additional information about the vftable.
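Features a) to c) amount to a three-part filter over candidate tables. The following Python sketch illustrates that filter on a toy model of a binary; the `Section` record, the function name `is_vftable_candidate` and the precomputed set of cross-referenced addresses are hypothetical stand-ins for whatever the disassembler in use provides, and feature d) (RTTI) is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    start: int
    end: int            # exclusive upper bound
    writable: bool
    executable: bool

    def contains(self, addr: int) -> bool:
        return self.start <= addr < self.end


def is_vftable_candidate(table_addr, entries, sections, xref_targets):
    """Check features a)-c) for a table at table_addr holding the given entries."""
    def section_of(addr):
        return next((s for s in sections if s.contains(addr)), None)

    home = section_of(table_addr)
    # a) the table lies in a read-only data section such as .rdata
    if home is None or home.writable or home.executable:
        return False
    # b) every stored address points into a code section such as .text
    if not entries:
        return False
    for entry in entries:
        target = section_of(entry)
        if target is None or not target.executable:
            return False
    # c) the first entry of the table is cross-referenced by some instruction
    return table_addr in xref_targets


sections = [
    Section(".text", 0x1000, 0x2000, writable=False, executable=True),
    Section(".rdata", 0x3000, 0x4000, writable=False, executable=False),
    Section(".data", 0x5000, 0x6000, writable=True, executable=False),
]
xrefs = {0x3010}
print(is_vftable_candidate(0x3010, [0x1100, 0x1200], sections, xrefs))
print(is_vftable_candidate(0x5010, [0x1100], sections, xrefs))
```

The first call passes all three checks; the second fails feature a) because the table sits in a writable section.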
2) A vbtable is extracted on the basis of the following features:
a) it is located in a read-only section, e.g., .rdata;
b) it has two fixed fields, each 4 bytes long: the first is the vftable offset, fixed at -4 (32-bit program) or -8 (64-bit program), and the second is the virtual base class offset;
c) its first entry is cross-referenced, and the vbtable is assigned to a virtual base class table pointer (vbtptr).
Similarly, if features a) to c) are all present, the vbtable can be extracted from the binary file.
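Feature b) is a fixed-layout check that can be illustrated with Python's `struct` module. The sketch below assumes little-endian signed 32-bit fields (typical for x86 and x64 binaries, but not stated in the text); the function name `parse_vbtable` is hypothetical, and features a) and c) are assumed to have been checked by the caller.

```python
import struct


def parse_vbtable(raw: bytes, is_64bit: bool):
    """Return the virtual base class offset if raw matches feature b), else None."""
    if len(raw) < 8:
        return None
    # Two 4-byte signed fields, assumed little-endian
    vftable_offset, vbase_offset = struct.unpack_from("<ii", raw, 0)
    expected = -8 if is_64bit else -4
    if vftable_offset != expected:
        return None
    return vbase_offset


raw = struct.pack("<ii", -8, 24)
print(parse_vbtable(raw, is_64bit=True))
print(parse_vbtable(raw, is_64bit=False))
```

The same bytes are accepted for a 64-bit program and rejected for a 32-bit one, because only the first field's fixed value distinguishes the two cases.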
3) The symbol table information is extracted by the following steps:
Step a): extracting each vftable according to the vftable features, and recording the vftable address and the virtual function addresses it contains;
Step b): extracting each vbtable according to the vbtable features, and recording the vbtable address and the contents of the vbtable fields;
Step c): extracting the symbol table information according to the vftable addresses, the virtual function addresses they contain, the vbtable addresses and the contents of the vbtable fields.
2. Destructor and constructor analysis
In this step, the destructor in each vftable is identified and paired with that vftable; the cross-references of each vftable are then queried and the constructors are found heuristically.
Specifically, interprocedural static taint analysis does not analyze every function but starts from a destructor (dtor) or constructor (ctor), so the destructors and constructors must first be identified in the binary file and the destructors paired with the vftables.
Earlier control-flow-based methods pair each vftable with a constructor. Under compiler optimization, however, the constructor of a base class may be inlined into the constructor of a derived class, or the constructors may all be inlined into an ordinary function, which causes that pairing to fail. The invention instead pairs each vftable with a destructor. Because destructors are generally declared virtual to prevent memory leaks, they do not suffer from the inlining problem.
A derived class with multiple inheritance has several vftables. Earlier data-flow-based methods merge vftables that are contiguous in the address space according to the OffsetTop field, and bind each vftable to exactly one class. This approach does not hold under compiler optimization, where different vftables of the same class may lie in non-contiguous address ranges and different classes may share the same vftable. Because the earlier approach therefore misses classes, the invention merges vftables by destructor, and the matching is not limited to one-to-one but may also be one-to-many.
In real-world binary files, only one of the vftables of a class records the destructor; the others do not. Careful analysis shows, however, that a vftable without a destructor contains a wrapper function that jumps to the destructor, as shown in FIG. 4. The wrapper's main job is to adjust the this pointer and then jump to the real destructor. As a heuristic, the invention detects function semantics to find semantically identical destructors and merges the different vftables of the same class. A destructor has two main semantic features:
1) an overwrite operation, e.g., mov qword [rsi+0x8], vftptr;
2) a delete operation, e.g., call delete().
The destructor analysis specifically comprises the following steps:
Step 1: traversing the functions in each vftable and detecting whether a function contains a memory write of a vftable, i.e., an overwrite operation; if it does, executing step 2;
Step 2: searching the instructions that follow within the function and the instructions that follow in the upper-level function that calls it, and detecting whether a delete operation exists; the symbols of called functions are obtained from the symbol table produced during information extraction and are matched against common delete symbols by pattern matching; if a match is found, a delete operation exists and step 3 is executed; otherwise, returning to step 1;
Step 3: pairing the vftable with the identified destructor; if not all vftables have been traversed, returning to step 1; otherwise, ending.
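The two-stage test in steps 1 and 2 can be sketched in Python as follows. This is an illustration under assumptions rather than the patented procedure: instructions are modelled as hypothetical `(op, arg)` tuples, "instructions that follow" is interpreted as the instructions after the overwrite in the function itself plus the instructions after each call site in its callers (`caller_tails`), and the symbol patterns in `DELETE_SYMBOL` are an assumed sample of common operator delete spellings, not a list taken from the text.

```python
import re

# Assumed sample of delete symbols: demangled, MSVC-mangled, Itanium-mangled
DELETE_SYMBOL = re.compile(r"operator delete|\?\?3@|_ZdlPv|_ZdaPv")


def calls_delete(instrs):
    """True if any ("call", symbol) instruction targets a delete-like symbol."""
    return any(op == "call" and DELETE_SYMBOL.search(arg) for op, arg in instrs)


def is_destructor(func_instrs, caller_tails, vftable_addrs):
    """Apply step 1 (overwrite) and step 2 (delete afterwards) to one function."""
    # Step 1: find a store of a known vftable address (the overwrite operation)
    overwrite_at = next(
        (i for i, (op, arg) in enumerate(func_instrs)
         if op == "store_vftable" and arg in vftable_addrs),
        None,
    )
    if overwrite_at is None:
        return False
    # Step 2: delete after the overwrite, inside the function ...
    if calls_delete(func_instrs[overwrite_at + 1:]):
        return True
    # ... or after the call site in an upper-level (calling) function
    return any(calls_delete(tail) for tail in caller_tails)


vftables = {0x3010}
deleting_dtor = [("store_vftable", 0x3010), ("call", "??3@YAXPEAX@Z")]
plain_func = [("call", "memcpy")]
base_dtor = [("store_vftable", 0x3010), ("call", "B::~B")]

print(is_destructor(deleting_dtor, [], vftables))
print(is_destructor(plain_func, [], vftables))
print(is_destructor(base_dtor, [[("call", "operator delete(void*)")]], vftables))
```

The third case shows why the caller is inspected: the function itself never calls delete, but the instructions after its call site do.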
In the prior art, all functions are traversed to search for constructors, which significantly increases analysis time on large binary files. The invention instead searches for constructors heuristically, starting from the cross-references of each vftable. Under complex compiler optimization the constructors may all be inlined into an ordinary function, so the constructors found by the invention are not necessarily true constructors but may be functions that exhibit constructor behavior. A constructor has the following semantic features:
1) an overwrite operation, e.g., mov qword [rsi+0x8], vftptr;
2) a new operation, e.g., call new().
The constructor analysis specifically comprises the following steps:
Step 1: querying the cross-references of each vftable, excluding the destructors, traversing the remaining functions and detecting whether each contains an overwrite operation; if it does, executing step 2;
Step 2: searching the instructions that precede within the function and the instructions that precede in the upper-level function that calls it, and detecting whether a new operation exists; the symbols of called functions are obtained from the symbol table produced during information extraction and are matched against common new symbols by pattern matching; if a match is found, a new operation exists and step 3 is executed; otherwise, returning to step 1;
Step 3: adding the identified constructor to the constructor list; if all functions have been traversed, ending; otherwise, returning to step 1.
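The cross-reference-driven search can be sketched in Python as follows. The data model is hypothetical: `vftable_xrefs` maps each vftable address to the names of functions that reference it, `functions` maps names to `(op, arg)` instruction tuples, and `caller_heads` holds the instructions that precede each call site in the calling functions. The patterns in `NEW_SYMBOL` are an assumed sample of operator new spellings.

```python
import re

# Assumed sample of new symbols: demangled, MSVC-mangled, Itanium-mangled
NEW_SYMBOL = re.compile(r"operator new|\?\?2@|_Znw|_Zna")


def calls_new(instrs):
    """True if any ("call", symbol) instruction targets a new-like symbol."""
    return any(op == "call" and NEW_SYMBOL.search(arg) for op, arg in instrs)


def find_constructors(vftable_xrefs, destructors, functions, caller_heads):
    """Return the sorted names of functions that show constructor behaviour."""
    # Step 1: candidates are all functions referencing a vftable, minus destructors
    candidates = set()
    for referrers in vftable_xrefs.values():
        candidates |= set(referrers)
    candidates -= set(destructors)
    vftables = set(vftable_xrefs)

    found = []
    for name in sorted(candidates):
        instrs = functions[name]
        overwrite_at = next(
            (i for i, (op, arg) in enumerate(instrs)
             if op == "store_vftable" and arg in vftables),
            None,
        )
        if overwrite_at is None:
            continue                       # no overwrite operation
        # Step 2: new before the overwrite, in the function or in a caller
        if calls_new(instrs[:overwrite_at]) or any(
            calls_new(head) for head in caller_heads.get(name, [])
        ):
            found.append(name)             # Step 3: record the constructor
    return found


functions = {
    # constructor inlined into an ordinary function
    "make_d": [("call", "operator new(unsigned __int64)"), ("store_vftable", 0x3010)],
    "D::D": [("store_vftable", 0x3010)],
    "D::~D": [("store_vftable", 0x3010), ("call", "operator delete(void*)")],
    "use_d": [("load", 0x3010)],
}
vftable_xrefs = {0x3010: {"make_d", "D::D", "D::~D", "use_d"}}
caller_heads = {"D::D": [[("call", "??2@YAPEAX_K@Z")]]}

print(find_constructors(vftable_xrefs, {"D::~D"}, functions, caller_heads))
```

Only functions that reference a vftable are ever inspected, which is the source of the time saving over traversing every function; `make_d` is reported even though it is not a true constructor, matching the remark that functions with constructor behavior are included.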
3. Interprocedural static taint analysis
In this step, an efficient control flow graph (CFG) generation algorithm generates the CFG for the constructors, the destructors and the other functions that directly or indirectly call a constructor or destructor; interprocedural static taint analysis is then performed on the constructors and destructors to construct the object memory layout.
Previous methods cannot handle indirect data references, so they cannot handle the case where the read and write operations on a vbtable are located in different functions, and therefore cannot identify virtual inheritance. In addition, under compiler optimization, constructor inlining interferes with control flow analysis. For these reasons, the invention adopts an interprocedural approach to solve the indirect data reference problem, so that virtual inheritance can be identified, and analyzes from a global perspective, starting from a constructor or destructor, to construct the object memory layout, thereby resisting the effects of constructor inlining.
The invention adopts static taint analysis in order to avoid the path explosion problem of symbolic execution and the low code coverage of dynamic analysis. The path explosion problem of symbolic execution is particularly severe when analyzing large binary files and greatly increases analysis time. Given the nature of class inheritance relationships, only operations related to vftables and vbtables need attention, which makes taint analysis the more suitable choice. Most existing binary taint analysis techniques are dynamic, and dynamic analysis suffers from low code coverage. This can be addressed by constructing multiple test samples or by running from multiple starting points, but both approaches raise new challenges: constructing multiple test samples requires automatically building test cases that trigger all class methods, and running from multiple starting points requires determining the context of each starting point. Current research has not solved these problems well, so dynamic analysis is not applicable here. For the above reasons, the invention employs interprocedural static taint analysis.
The interprocedural static taint analysis comprises three parts: an efficient CFG generation algorithm, an intelligent path selection strategy, and taint initialization and propagation rules.
The interprocedural static taint analysis specifically comprises the following steps:
Step 1: generating the CFG for the constructors, the destructors and their related functions;
Step 2: performing taint analysis on each constructor and destructor, selecting the execution path with the intelligent path selection strategy;
Step 3: marking the this pointer as the initial taint, propagating and eliminating the taint related to vftables and vbtables according to the taint propagation rules, and constructing the object memory layout once execution finishes.
Previous methods generate CFGs for all functions, which adds significant time overhead when analyzing large binary files. In the field of inheritance relationship identification of C + + binary code classes, only functions related to overwrite analysis need to be analyzed, which generally accounts for about 20% of the total functions, so that the method adopts a strategy of partial CFG generation, and only CFG is generated for constructors, destructors and related functions, and the method specifically comprises the following steps:
step 1: putting the addresses of the constructors and destructors into one set, traversing the set, and dividing each function into basic blocks at its jump statements, wherein each function starts at the basic block containing the function's start address, and each basic block ends with the jump statement used to divide it;
step 2: for each basic block, if it contains a direct jump statement or a conditional jump statement, connecting it to the basic block containing the jump target;
step 3: for each basic block, if it contains a call instruction, determining by pattern matching against the symbol table whether the call is a system call; if so, discarding the call directly; otherwise, connecting the basic block to the basic block at the function address targeted by the call instruction and adding that function address to the set;
step 4: when processing a jump statement, if the jump target address lies inside an already analyzed basic block and is neither the start address nor the end address of that block, splitting the block at the jump target address;
step 5: if a loop structure is encountered, marking the branch that leads into the loop path as loop, to provide a basis for the intelligent path selection strategy;
step 6: if a no-return path structure (noreturn) is encountered, marking the branch that leads into the no-return path as noreturn, likewise to provide a basis for the intelligent path selection strategy.
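The partial CFG generation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Insn` tuple, the toy program with one address per instruction, and the names `function_blocks` and `build_partial_cfg` are all hypothetical. Blocks are split by computing block leaders up front rather than by re-splitting already analyzed blocks as in step 4, and the loop/noreturn marking of steps 5 and 6 is omitted.

```python
from collections import namedtuple

# Hypothetical toy instruction: kind is "op", "jmp", "jcc", "call" or "ret".
Insn = namedtuple("Insn", "kind target")

def function_blocks(program, entry):
    """Split the function at `entry` into basic blocks: {leader: [addresses]}."""
    seen, leaders, stack = set(), {entry}, [entry]
    while stack:  # walk every instruction reachable inside the function
        addr = stack.pop()
        while addr in program and addr not in seen:
            seen.add(addr)
            insn = program[addr]
            if insn.kind in ("jmp", "jcc"):
                leaders.add(insn.target)
                stack.append(insn.target)
                if insn.kind == "jcc":  # conditional jumps also fall through
                    leaders.add(addr + 1)
                    stack.append(addr + 1)
                break
            if insn.kind == "ret":
                break
            addr += 1
    blocks = {}
    for leader in sorted(leaders & seen):
        body, addr = [], leader
        while addr in seen:
            body.append(addr)
            ends = program[addr].kind in ("jmp", "jcc", "ret")
            if ends or addr + 1 in leaders:
                break
            addr += 1
        blocks[leader] = body
    return blocks

def build_partial_cfg(program, seeds, system_calls):
    """Generate blocks and edges only for the seeds and the functions they call."""
    worklist, functions, blocks, edges = list(seeds), set(), {}, set()
    while worklist:
        entry = worklist.pop()
        if entry in functions:
            continue
        functions.add(entry)
        fblocks = function_blocks(program, entry)
        blocks.update(fblocks)
        for leader, body in fblocks.items():
            last = program[body[-1]]
            if last.kind in ("jmp", "jcc"):
                edges.add((leader, last.target))      # step 2: jump edge
            nxt = body[-1] + 1
            if last.kind not in ("jmp", "ret") and nxt in fblocks:
                edges.add((leader, nxt))              # fall-through edge
            for addr in body:
                insn = program[addr]
                if insn.kind == "call" and insn.target not in system_calls:
                    edges.add((leader, insn.target))  # step 3: call edge
                    worklist.append(insn.target)      # step 3: grow the set
    return functions, blocks, edges

PROGRAM = {
    10: Insn("op", None), 11: Insn("jcc", 14),    # constructor entry
    12: Insn("call", 20), 13: Insn("jmp", 15),
    14: Insn("call", 90),                         # 90 stands for a system call
    15: Insn("ret", None),
    20: Insn("op", None), 21: Insn("ret", None),  # helper called by the constructor
    30: Insn("op", None), 31: Insn("ret", None),  # unrelated function, never visited
}
functions, blocks, edges = build_partial_cfg(PROGRAM, [10], {90})
```

In this toy run only the constructor at address 10 and the helper at address 20 receive basic blocks; the unrelated function at address 30 is never touched, which is the saving the partial strategy aims for.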
The main idea of the intelligent path selection strategy is to simulate the real execution flow of the program while avoiding path explosion: paths are screened to improve analysis efficiency, and the execution path is kept from entering an infinite loop, so that the taint analysis can run to completion. As shown in fig. 5, the intelligent path selection strategy comprises the following strategies:
1) at a branch statement, avoiding any branch path marked loop or noreturn;
2) if neither branch carries a loop or noreturn mark, taking the False branch path, because compiler principles and reverse-engineering experience indicate that the False branch path covers the instructions related to the overwriting operation;
3) to avoid being trapped in complex infinite loops such as inter-procedural loops, enabling instruction tracking and switching to a random walk strategy once the same branch instruction has been executed more than 10 times.
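The three strategies can be illustrated with a small Python sketch. The class name `PathSelector`, its parameters, and the representation of branch marks as sets of strings are hypothetical. The source does not say what happens when both branches are marked, so this sketch assumes the False branch is still taken in that case.

```python
import random
from collections import Counter

class PathSelector:
    """Hypothetical helper; marks are sets drawn from {"loop", "noreturn"}."""

    def __init__(self, max_visits=10, seed=0):
        self.visits = Counter()          # instruction tracking per branch address
        self.max_visits = max_visits
        self.rng = random.Random(seed)

    def choose(self, branch_addr, true_marks, false_marks):
        """Return "true" or "false" for the branch instruction at branch_addr."""
        self.visits[branch_addr] += 1
        if self.visits[branch_addr] > self.max_visits:
            return self.rng.choice(("true", "false"))  # strategy 3: random walk
        if false_marks and not true_marks:
            return "true"    # strategy 1: only the False branch is marked
        # Strategy 2 (neither marked) and strategy 1 (only True marked).
        # Assumption: if both branches are marked, False is still taken.
        return "false"

selector = PathSelector()
unmarked = selector.choose(0x100, set(), set())
skip_loop = selector.choose(0x200, {"loop"}, set())
skip_noreturn = selector.choose(0x300, set(), {"noreturn"})
repeated = [selector.choose(0x400, set(), {"loop"}) for _ in range(12)]
```

The first ten visits to the branch at `0x400` follow the deterministic rules; from the eleventh visit on, the selector falls back to a random choice so that the analysis cannot stay in the same cycle indefinitely.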
The taint initialization and taint propagation rules track the this pointer, find the memory read and memory write operations related to the vftable and the vbtable, and construct the object memory layout from them. Some polymorphic classes are defined in the source code but never used, so after compilation they have no constructor in the binary file, and performing static taint analysis on constructors alone may cause false negatives. To prevent memory leaks, the destructor is generally a virtual function and is therefore present in the binary file. Destructors cannot be analyzed alone either, however, because the vftable- and vbtable-related instructions in a destructor may be missing, and the constructor contains more comprehensive information. For these reasons, the inter-procedural static taint analysis first analyzes the constructors and then, for each vftable not yet analyzed, analyzes its destructor; this combined strategy reduces both the false negative rate and the false positive rate. The taint initialization and taint propagation rules specifically comprise the following steps:
step 1: taking each identified constructor/destructor as an analysis starting point, with the this pointer as the initial taint. If the constructor is completely inlined, that is, entirely inlined into an ordinary function, the this pointer is the rax register and the start instruction is the instruction following the call to the new function; otherwise the this pointer is the rcx register and the start instruction is the first instruction of the constructor or of the function in which the constructor is inlined;
step 2: scanning the instructions one by one, basic block by basic block, and selecting a path according to the intelligent path selection strategy whenever a branch instruction is encountered;
step 3: if the value in a tainted register (the rax or rcx register) is assigned to another CPU register, marking the target register as tainted; if a constant is assigned to a tainted register, eliminating its taint. If the value in a tainted register is pushed onto the stack, recording the corresponding stack offset and marking it as tainted; if a tainted stack variable is popped from the stack into a register, eliminating the taint of the corresponding stack offset and marking the target register as tainted;
step 4: when an instruction that writes a vbtable is executed (such as mov qword [rsi+0x8], vbtptr), finding the corresponding memory offset through the taint and recording the vbtable in the corresponding object memory layout;
step 5: when an instruction that writes a vftable is executed (such as mov qword [rsi+0x8], vftptr), finding the corresponding memory offset through the taint, recording the vftable in the corresponding object memory layout by tail insertion, and also recording its overwriting order (the order of overwriting within the constructor). If an instruction that reads a vbtable is involved, looking up the corresponding vbtable in the object memory layout through the taint, calculating the real memory offset, recording the vftable and its overwriting order (by tail insertion), and labeling the memory offset as a virtual base class (vbase).
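The propagation rules in steps 3 to 5 can be sketched over a toy instruction list. Everything here is a simplification under stated assumptions: the tuple-based instruction encoding and the name `run_taint` are hypothetical, stack slots are modeled as a simple list, copying an untainted register is assumed to clear the destination's taint (the text only mentions constants), and the vbtable read case for virtual base classes is not modeled.

```python
def run_taint(insns, this_reg="rcx"):
    """Return {offset: [(kind, table, order), ...]} for writes through the taint."""
    tainted = {this_reg}   # registers currently holding the this pointer
    stack = []             # one flag per pushed slot: was the value tainted?
    layout, order = {}, 0
    for insn in insns:
        op = insn[0]
        if op == "mov":                      # ("mov", dst, src)
            _, dst, src = insn
            if src in tainted:
                tainted.add(dst)
            else:
                tainted.discard(dst)         # assumption, see lead-in
        elif op == "movi":                   # ("movi", dst): constant load
            tainted.discard(insn[1])
        elif op == "push":                   # ("push", src)
            stack.append(insn[1] in tainted)
        elif op == "pop":                    # ("pop", dst)
            if stack.pop():
                tainted.add(insn[1])
            else:
                tainted.discard(insn[1])
        elif op == "store":                  # ("store", base, offset, kind, table)
            _, base, offset, kind, table = insn
            if base in tainted:
                entry_order = None
                if kind == "vftable":        # only vftables get an overwriting order
                    order += 1
                    entry_order = order
                # Tail insertion keeps the entries in write order.
                layout.setdefault(offset, []).append((kind, table, entry_order))
    return layout

# Hypothetical constructor of a class B that derives from A.
CTOR = [
    ("mov", "rsi", "rcx"),
    ("store", "rsi", 0, "vftable", "A::vftable"),
    ("store", "rsi", 8, "vbtable", "B::vbtable"),
    ("push", "rsi"),
    ("movi", "rsi"),                                # rsi loses its taint
    ("store", "rsi", 16, "vftable", "X::vftable"),  # ignored: base is untainted
    ("pop", "rdx"),                                 # taint returns via the stack
    ("store", "rdx", 0, "vftable", "B::vftable"),
]
layout = run_taint(CTOR)
```

The write through `rsi` after the constant load is ignored because the register no longer carries the taint, while the write through `rdx` is recorded because the taint was restored from the stack.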
After the inter-procedural static taint analysis has run, the object memory layout is constructed. The object memory layout records the spatial position and the temporal order of the vftables and vbtables of each object; for example, the information in fig. 3 is converted into the form shown in fig. 6. The vertical direction is the space dimension, recording each memory offset and the attribute of its content, which is one of vftable, vbtable, vbase and var; the horizontal direction is the time dimension, recording the vftable or vbtable and the overwriting order of each vftable.
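One possible way to hold this two-dimensional structure is sketched below. The class names `Slot` and `ObjectLayout`, the method names, and the example tables are hypothetical, since figs. 3 and 6 are not reproduced here; the sketch only shows the space dimension as a mapping from offsets to slots and the time dimension as an ordered list inside each slot.

```python
from dataclasses import dataclass, field

@dataclass
class Slot:
    attr: str                                    # vftable, vbtable, vbase or var
    tables: list = field(default_factory=list)   # time dimension, oldest first

@dataclass
class ObjectLayout:
    slots: dict = field(default_factory=dict)    # space dimension: offset -> Slot

    def record(self, offset, attr, table=None, order=None):
        """Create the slot if needed and append the table by tail insertion."""
        slot = self.slots.setdefault(offset, Slot(attr))
        if table is not None:
            slot.tables.append((table, order))
        return slot

    def rows(self):
        """One row per offset: (offset, attribute, tables in write order)."""
        return [(off, s.attr, [t for t, _ in s.tables])
                for off, s in sorted(self.slots.items())]

layout = ObjectLayout()
layout.record(0x18, "vbase", "VBase::vftable", 1)
layout.record(0x00, "vftable", "Base::vftable", 2)
layout.record(0x00, "vftable", "Derived::vftable", 3)
layout.record(0x08, "vbtable", "Derived::vbtable")
layout.record(0x10, "var")
layout.record(0x18, "vbase", "Derived::vftable{for VBase}", 4)
table = layout.rows()
```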
4. Heuristic reasoning
After the object memory layout is obtained, heuristic reasoning can be performed on it to recover the classes and the relationships among them. Heuristic analysis is performed on the object memory layout of each object, cyclically adding and merging nodes, and object member analysis is then performed on the isolated nodes or node trees. Because different objects may contain the same class node, all nodes are finally aggregated and deduplicated to obtain the class information, which includes the classes, the class relationships and the class methods. The class relationships comprise single inheritance, multiple inheritance, virtual inheritance and object membership.
The heuristic reasoning uses four heuristics that are independent of compiler optimization. Their explanation and the reasons for selecting them are as follows:
1) The method does not recover class inheritance relationships by pairing constructors with vftables, because that approach fails under compiler optimization. Instead, the static taint analysis records the memory layout of each object with respect to the vftable and the vbtable, and compiler principles dictate that the classes owning the vftables at the same memory offset are related by inheritance.
2) Reverse analysis shows that the multiple vftables of one class share the same destructor, which has already been identified during destructor analysis; this heuristic can therefore be used to merge multiple vftables into one class.
3) Identifying a virtual base class involves indirect data references, which previous methods cannot handle. The inter-procedural static taint analysis handles them accurately: it calculates the correct memory offset, records the vftable and its overwriting order, and labels the memory offset as vbase. All classes owning the vftables at that memory offset are related by virtual inheritance, and reverse-engineering experience shows that the class owning the first vftable written at that offset is the virtual base class.
4) Object members are identified by the overwriting order, that is, the temporal order of the vftable overwriting operations in the constructor. In the constructor's execution flow, an object member is initialized after the vftptr of the enclosing class has been initialized, so the overwriting order of the object member is greater than that of the class it belongs to; this is similar to previous methods. In addition, because identification traverses nodes formed from sets of vftables and the vftables are not paired with constructors, destructors are used instead: an object member is recovered by checking whether its vftable appears within a destructor, in combination with the overwriting order, which differs from previous methods.
For the above reasons, the four heuristics can be summarized as follows:
1. the classes owning the vftables at the same memory offset are related by inheritance; this heuristic recovers class inheritance relationships;
2. different vftables of the same class have the same destructor; this heuristic drives node merging;
3. the class owning the first vftable at a memory offset labeled vbase is the virtual base class, and the classes that follow virtually inherit from it; this heuristic recovers virtual inheritance relationships and virtual base classes;
4. the overwriting instruction for an object member's vftable appears in the destructor of the class that contains the object member, and the overwriting order of the object member is greater than that of the class; this heuristic recovers object members.
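The third and fourth heuristics can be read as two small predicates, sketched below in Python. The function names, the dictionary keys (`vftable`, `order`, `dtor_writes`) and the example class names are hypothetical; `dtor_writes` stands for the set of vftables whose overwriting instructions were found in a class's destructor.

```python
def split_virtual_base(vbase_slot):
    """Heuristic 3. vbase_slot lists the vftables at a vbase offset in write order.

    Returns the vftable of the virtual base class and the vftables of the
    classes that virtually inherit from it.
    """
    first, *rest = vbase_slot
    return first, rest

def is_object_member(member, owner):
    """Heuristic 4: the member's vftable is overwritten in the owner's destructor
    and the member's overwriting order is greater than the owner's."""
    return (member["vftable"] in owner["dtor_writes"]
            and member["order"] > owner["order"])

virtual_base, virtual_heirs = split_virtual_base(
    ["Stream::vftable", "File::vftable"])

car = {"vftable": "Car::vftable", "order": 1,
       "dtor_writes": {"Car::vftable", "Engine::vftable"}}
engine = {"vftable": "Engine::vftable", "order": 2,
          "dtor_writes": {"Engine::vftable"}}
engine_in_car = is_object_member(engine, car)
car_in_engine = is_object_member(car, engine)
```

Both conditions of the fourth heuristic are needed: the destructor check ties the member to its owner, and the order check rules out the reverse direction.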
According to the four heuristics, the heuristic reasoning specifically comprises the following steps:
step 1: traversing the object memory layout of each object and performing the node generation operation by executing steps 2 to 6; after the traversal is finished, performing the node deduplication operation by executing step 7;
step 2: extracting the first vftable stored at each memory offset with the vftable attribute, deleting it from the object memory layout, and making it a separate node that represents a class; checking whether any of these vftables share the same destructor and, if so, merging their nodes, keeping the larger overwriting order when merging;
step 3: cyclically extracting the vftables at each memory offset with the vftable attribute, one vftable per memory offset in each iteration; deleting each extracted vftable from the object memory layout, making it a separate node, and establishing an inheritance relationship with the node of the previous vftable at that offset. In each iteration, checking pairwise whether the vftables share the same destructor and, if so, merging their nodes, keeping the larger overwriting order when merging;
step 4: traversing all nodes and checking whether the vftables of different nodes share the same destructor; if so, merging the nodes, keeping the larger overwriting order when merging;
step 5: for each memory offset with the vbase attribute, extracting the vftables in order starting from the second; each extracted vftable is deleted from the object memory layout, becomes a separate node, and establishes a virtual inheritance relationship with the node of the first vftable (the virtual base class), the first vftable itself being extracted together with the second. At the same time, traversing all nodes to find nodes with the same destructor as the vftable and merging them, keeping the larger overwriting order when merging. If the parent node of a found node, or any node above that parent, already has the virtual inheritance relationship, deleting the virtual inheritance relationship between the node of the vftable and the virtual base class;
step 6: if an isolated node or node tree exists, recording the vftable and the overwriting order of its lowest child node, then traversing the other node trees and checking whether the destructor of any node in them contains the vftable of the recorded node; if a destructor contains it and the overwriting order of that destructor's node is smaller than that of the recorded node, attaching the node or node tree of the recorded node (the object member) to the corresponding node and establishing the object member relationship;
step 7: aggregating and deduplicating all nodes, keeping only one copy of each identical node.
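Steps 2 to 4 can be illustrated with a compact Python sketch. It is a simplification under stated assumptions: the function name `recover_classes`, the input shapes and the example names are hypothetical, nodes are keyed directly by destructor so that merging by shared destructor happens implicitly, offsets are processed one after another rather than round by round, and the vbase and object member handling of steps 5 and 6 is left out.

```python
def recover_classes(layout, destructor_of):
    """layout: {offset: [(vftable, order), ...]} for vftable-attribute offsets.
    destructor_of: {vftable: destructor identifier}.

    Returns (classes, inherits) where classes maps a destructor to its merged
    node and inherits is a set of (derived destructor, base destructor) pairs.
    """
    classes, inherits = {}, set()
    for offset in sorted(layout):
        previous = None
        for vftable, order in layout[offset]:
            key = destructor_of[vftable]        # same destructor => same class
            node = classes.setdefault(key, {"vftables": set(), "order": order})
            node["vftables"].add(vftable)
            node["order"] = max(node["order"], order)   # keep the larger order
            if previous is not None and previous != key:
                inherits.add((key, previous))   # later writer derives from earlier
            previous = key
    return classes, inherits

# Hypothetical class C that inherits from both A and B.
LAYOUT = {
    0x00: [("A::vftable", 1), ("C::vftable{for A}", 3)],
    0x10: [("B::vftable", 2), ("C::vftable{for B}", 4)],
}
DESTRUCTORS = {
    "A::vftable": "dtor_A", "B::vftable": "dtor_B",
    "C::vftable{for A}": "dtor_C", "C::vftable{for B}": "dtor_C",
}
classes, inherits = recover_classes(LAYOUT, DESTRUCTORS)
```

The two vftables of `C` end up in a single node because they share `dtor_C`, and the two inheritance edges reproduce the multiple inheritance relationship.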
To help those skilled in the art understand quickly, fig. 7 gives an example of the node generation process of the heuristic reasoning applied to fig. 6.
Although specific details, algorithms and figures of the invention are disclosed for illustrative purposes, they are intended to aid in understanding the invention and implementing it accordingly. Those skilled in the art will appreciate that various substitutions, changes and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. The invention should not be limited to the preferred embodiments and drawings disclosed herein, but should be defined only by the scope of the appended claims.

Claims (10)

1. A method for identifying class inheritance relationships in a C++ binary file, comprising the following steps:
1) extracting a virtual function table, a virtual base class table and a symbol table from the binary file;
2) acquiring the destructor in each virtual function table according to the symbol table, pairing each virtual function table with its corresponding destructor, and performing a cross-reference query on each virtual function table to obtain the constructors;
3) according to the constructors, the destructors, and the generated control flow graph covering the constructors, the destructors and the functions directly or indirectly connected to them by calls, performing static taint analysis on the operations that the constructors and the destructors perform on the virtual function table and the virtual base class table, and constructing an object memory layout;
4) analyzing each function in the virtual function table through the object memory layout, and recovering the class information in the binary file to obtain the class inheritance relationships.
2. The method of claim 1, wherein the destructor in each virtual function table is obtained by the following strategy:
1) traversing the functions in each virtual function table, and detecting whether each function contains an overwriting operation on the virtual function table memory;
2) searching a first subsequent instruction of each function and a second subsequent instruction of the upper-layer function that calls the function, and detecting according to the symbol table whether a delete operation exists;
3) if a function contains an overwriting operation and a delete operation exists in either the first subsequent instruction or the second subsequent instruction, the function is a destructor of the corresponding virtual function table.
3. The method of claim 1, wherein the constructor is obtained by:
1) performing a cross-reference query on each virtual function table and excluding the destructors to obtain the remaining functions;
2) traversing the remaining functions, and detecting whether each contains an overwriting operation on the virtual function table memory;
3) searching a first preceding instruction of each remaining function and a second preceding instruction of the upper-layer function that calls the remaining function, and detecting whether a new operation exists;
4) if a remaining function contains an overwriting operation and a new operation exists in either the first preceding instruction or the second preceding instruction, the remaining function is a constructor of the corresponding virtual function table.
4. The method of claim 1, wherein the control flow graph is generated by:
1) traversing a set formed by the addresses of the constructors and the destructors, dividing each constructor and destructor into basic blocks at its jump statements, and taking the basic block containing the start address of each constructor and destructor as its starting basic block;
2) for each basic block, if the basic block contains a direct jump statement or a conditional jump statement, connecting the basic block to the basic block containing the jump target;
3) searching for basic blocks containing a call instruction that is not a system call, connecting the basic block containing the call instruction to the basic block at the function address targeted by the call instruction, and merging the set of function addresses targeted by such call instructions into the set formed by the addresses of the constructors and the destructors;
4) if the jump target address of a jump statement lies inside an analyzed basic block and is neither the start address nor the end address of that basic block, splitting the basic block at the jump target address;
5) marking, respectively, the branches that lead into the loop path of a loop structure and the branches that lead into the no-return path of a no-return path structure.
5. The method of claim 1, wherein the object memory layout is constructed by:
1) according to the control flow graph, marking the this pointer of each constructor and each destructor as the initial taint, and selecting an execution path;
2) propagating and eliminating the taint with respect to the virtual function table and the virtual base class table, and constructing the object memory layout.
6. The method of claim 5, wherein the this pointer is marked as the initial taint for each constructor and destructor by:
1) if the constructor is entirely inlined into an ordinary function, the this pointer is the rax register, and the start instruction is the instruction following the call to the new function;
2) if the constructor is not entirely inlined into an ordinary function, the this pointer is the rcx register, and the start instruction is the first instruction of the constructor.
7. The method of claim 6, wherein the execution path is selected using an intelligent path selection strategy; the intelligent path selection strategy comprises:
1) when neither of the two branches carries a loop-path branch mark of a loop structure or a no-return-path branch mark of a no-return path structure, taking the False branch path;
2) avoiding the selection of a path marked with a loop-path branch mark of a loop structure or a no-return-path branch mark of a no-return path structure;
wherein the selection of a path marked with a loop-path branch mark of a loop structure or a no-return-path branch mark of a no-return path structure is avoided by:
1) setting instruction tracing;
2) adopting a random walk strategy once the same branch instruction has been executed more than a set number of times.
8. The method of claim 5, wherein the propagation and elimination of the taint are performed with respect to the virtual function table and the virtual base class table by:
1) scanning the instructions one by one, basic block by basic block, and performing path selection if a branch instruction is encountered;
2) if the value in a tainted register is assigned to another register, marking the target register as tainted; if a constant is assigned to a tainted register, eliminating its taint; if the value in a tainted register is pushed onto the stack, recording the corresponding stack offset and marking it as tainted; if a tainted stack variable is popped from the stack into a register, eliminating the taint of the corresponding stack offset and marking the target register as tainted;
3) when an instruction that writes the virtual base class table is executed, finding the corresponding memory offset through the taint, and recording the virtual base class table in the corresponding object memory layout;
4) when an instruction that writes the virtual function table is executed, finding the corresponding memory offset through the taint, recording the virtual function table in the corresponding object memory layout by tail insertion, and recording the overwriting order within the constructor;
5) when an instruction that reads the virtual base class table is involved, looking up the corresponding virtual base class table in the object memory layout through the taint, calculating the real memory offset, recording the virtual base class table and its overwriting order, and marking the memory offset as a virtual base class.
9. The method of claim 1, wherein each function in the binary file is analyzed by:
1) traversing the object memory layout of each object, extracting the first virtual function table stored at each memory offset with the virtual function table attribute, deleting it from the object memory layout and making it a separate node, checking whether any of the virtual function tables share the same destructor, and merging the nodes if so, keeping the larger overwriting order when merging;
2) cyclically extracting the virtual function tables at each memory offset with the virtual function table attribute, one virtual function table per memory offset in each iteration, deleting each extracted virtual function table from the object memory layout, making it a separate node, and establishing an inheritance relationship with the node of the previous virtual function table; in each iteration, checking whether any of the virtual function tables share the same destructor, and merging the nodes if so, keeping the larger overwriting order when merging;
3) traversing all nodes, checking whether the virtual function tables of different nodes share the same destructor, and merging the nodes if so, keeping the larger overwriting order when merging;
4) for each memory offset with the virtual base class attribute, extracting the virtual function tables in order starting from the second, deleting each extracted virtual function table from the object memory layout, making it a separate node, and establishing a virtual inheritance relationship with the node of the first virtual function table; traversing all nodes to find nodes with the same destructor as the virtual function table and merging them, keeping the larger overwriting order when merging; if the parent node of a found node, or any node above it, already has the virtual inheritance relationship, deleting the virtual inheritance relationship between the node of the virtual function table and the virtual base class;
5) if an isolated node or node tree exists, recording the virtual function table and the overwriting order of the lowest child node of the node or node tree, traversing the other node trees, and checking whether the destructor of any node in them contains the virtual function table of the recorded node; if a destructor contains it and the overwriting order of that destructor's node is smaller than that of the recorded node, attaching the node or node tree of the recorded node to the corresponding node and establishing the object member relationship;
6) aggregating and deduplicating all nodes, keeping only one copy of each identical node.
10. An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method according to any of claims 1-9.
CN202011322846.9A 2020-11-23 2020-11-23 Class inheritance relationship identification method in C + + binary file and electronic device Pending CN114527963A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011322846.9A CN114527963A (en) 2020-11-23 2020-11-23 Class inheritance relationship identification method in C + + binary file and electronic device

Publications (1)

Publication Number Publication Date
CN114527963A true CN114527963A (en) 2022-05-24

Family

ID=81618503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011322846.9A Pending CN114527963A (en) 2020-11-23 2020-11-23 Class inheritance relationship identification method in C + + binary file and electronic device

Country Status (1)

Country Link
CN (1) CN114527963A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116340942A (en) * 2023-03-01 2023-06-27 软安科技有限公司 Function call graph construction method based on object propagation graph and pointer analysis
CN116340942B (en) * 2023-03-01 2024-04-30 软安科技有限公司 Function call graph construction method based on object propagation graph and pointer analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination