CN116861442A

CN116861442A - Binary program-oriented data-oriented vulnerability retrieval method

Info

Publication number: CN116861442A
Application number: CN202310839451.3A
Authority: CN
Inventors: 付才; 朱庆晨; 骆天煜; 吕建强; 韩兰胜; 刘铭; 邹德清
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2023-07-10
Filing date: 2023-07-10
Publication date: 2023-10-10

Abstract

The invention provides a binary program-oriented data-oriented vulnerability retrieval method, which belongs to the technical field of computers and comprises the following steps: inputting test data into a target binary program on a computer and running it, and saving a memory snapshot when the target binary program runs to the vulnerable function where a memory error vulnerability is located; marking the test data that was input into the buffer in the memory snapshot as the taint source, and executing the target binary program with the first instruction of the vulnerable function as the starting point; and, if both an arbitrary memory copy Gadget, which copies data in memory to other memory locations, and an arbitrary address write Gadget, which writes data into memory, are found while executing the target binary program, judging that the current memory error vulnerability can be escalated to a data-oriented programming (DOP) vulnerability and exploited. The method can analyze the exploitability of data-oriented vulnerabilities in binary programs.

Description

Binary program-oriented data-oriented vulnerability retrieval method

Technical Field

The invention belongs to the technical field of computers, and particularly relates to a binary program-oriented data-oriented vulnerability retrieval method.

Background

In the binary security field, data-oriented programming (DOP) vulnerabilities have become an important threat. A DOP exploit uses data fragments already present in the program to construct malicious behavior, independently of the program's control flow. This makes DOP vulnerabilities harder for conventional defense mechanisms and detection techniques to handle.

A discovered memory error typically just crashes the program, but if the user carefully constructs the program input, the memory error may be escalated to a DOP vulnerability and exploited. Determining whether a given memory error can be escalated to a DOP vulnerability is therefore an important problem.

Traditional defense mechanisms and detection tools focus mainly on conventional control-flow hijacking attacks and have limited ability to analyze DOP vulnerabilities. As a result, there is currently no efficient and accurate method or system for analyzing whether a given memory error could be exploited as a DOP vulnerability.

Disclosure of Invention

In order to overcome the defects in the prior art, the invention provides a binary program-oriented data-oriented vulnerability retrieval method.

In order to achieve the above object, the present invention provides the following technical solutions:

a binary program-oriented data-oriented vulnerability retrieval method comprises the following steps:

inputting test data into a target binary program on a computer and running the target binary program, and saving a memory snapshot when the target binary program runs to the vulnerable function where a memory error vulnerability is located;

marking the test data in the memory snapshot as the taint source, and executing the target binary program with the first instruction of the vulnerable function as the starting point;

after the target binary program executes a data transfer instruction tainted by the taint source, if the operation of that instruction is a write-back to memory, clearing the cached instructions that contain its destination operand, fetching the cached instructions that contain its source operand in the order in which they entered the cache, and forming a Gadget from those instructions together with the current instruction;

if both an arbitrary memory copy (AMC) Gadget, which copies data in memory to other memory locations, and an arbitrary address write (AMW) Gadget, which writes data to arbitrary memory locations, are found while executing the target binary program, judging that the current memory error vulnerability can be escalated to a data-oriented programming (DOP) vulnerability and exploited.

Further, the method further comprises the following steps:

taking the instruction address of the first instruction of the loophole function as an analysis starting point, taking the found instruction address of the first Gadget G1 as an analysis end point 1, and taking the found instruction address of the second Gadget G2 as an analysis end point 2;

acquiring a backward slice graph between the analysis start point and analysis end point 1, and deriving from it a z3 expression expr1 of analysis end point 1 relative to the analysis start point;

acquiring a backward slice graph between the analysis start point and analysis end point 2, and deriving from it a z3 expression expr2 of analysis end point 2 relative to the analysis start point;

if expr1 == expr2 has a solution, a data flow dependency exists between analysis end point 1 and analysis end point 2; otherwise no data flow dependency exists between them;

if there is a data flow dependency between analysis endpoint 1 and analysis endpoint 2, G1 and G2 are combined into one Gadget.

Further, the acquiring the backward slice between the analysis start point and the analysis end point 1 includes:

collecting binary byte codes from an analysis starting point to an analysis end point 1 and converting the binary byte codes into a VEX IR instruction sequence;

splitting the VEX IR instruction sequence into sub-expressions, and constructing a data flow graph (DFG) according to the statement type and sub-expression type of the VEX IR and the temporary variables, registers and memory that are read;

a backward slice between the analysis start point and the analysis end point 1 is extracted from the DFG.

Further, the method further comprises the following steps:

performing a level-order traversal of the backward slice graph between the analysis start point and analysis end point 1, and splitting each VEX IR node in the backward slice graph into sub-expressions;

converting the sub-expressions into z3 expressions;

solving for the value range of the corresponding registers and memory at analysis end point 1 using the Optimize class of z3, and determining the tainted memory according to that value range;

executing the target binary program over the tainted memory with the found Gadget G1 as the starting point until a second Gadget G12 is found;

if the source address of G12 is covered by the taint of G1, G12 is regarded as a new arbitrary memory read (AMR) Gadget; if the destination address of G12 is covered by the taint of G1, G12 is regarded as a new arbitrary address write (AMW) Gadget; if the source address of G12 is covered by the taint of G1 and the destination address of G12 can also be tainted, G12 is regarded as an arbitrary memory copy (AMC) Gadget;

continuing to execute the target binary program with G12 as the starting point until no new Gadget can be found or the Gadgets explored so far make the program exploitable through DOP.

Further, the Gadget G1 is an arbitrary address write (AMW) Gadget or an arbitrary memory copy (AMC) Gadget.

Further, the method further comprises the following steps: path exploration is performed in the process of executing a target binary program, and the path exploration method comprises the following steps:

during path exploration, defining every branch whose condition is tainted as a symbolized branch, and defining the basic block containing a symbolized branch as a symbolized node; a symbolized node has the following form:

mov register 1, [ Address 1]

cmp [ Address 2], register 1

jxx code segment address

if the cmp instruction in the jump node of the symbolized branch is tainted, defining the current branch as a controllable branch;

executing one path at a time while executing the target binary program, and recording all controllable branches passed in the current run together with their jump results;

after the exploration of a single path is finished, performing a generational search over the controllable branches, wherein the generational search comprises the following steps:

each time a symbolized branch is reached, recording the jump direction of the current run, taking out the path reaching the symbolized branch, and inverting the jump direction at the tail of the path to generate a new path for the next run;

when executing the newly generated path, collecting its path constraints using symbolic execution, verifying that the path constraints are solvable, and solving for the symbolized input that leads to the path; then executing the path and performing a new path search.

Further, the method further comprises the following steps: using a Dispatcher search algorithm to judge whether the current program execution is inside a loop structure, and limiting the number of loop iterations to mitigate path explosion; the Dispatcher search algorithm is:

searching for loop structures in the target binary program using the LoopFinder tool, and recording all entry edges and exit edges of each loop;

using a stack to store the loop structures in which the current simulated execution is located, continuously obtaining the instructions being simulated at run time, and checking whether the current instruction address lies on an entry edge or an exit edge of a loop;

if the current instruction lies on an entry edge of a loop, checking whether the element at the top of the stack is that loop; if not, pushing the loop onto the stack; otherwise, this indicates a second iteration entering the same loop;

if the current instruction lies on an exit edge of a loop, popping the loop at the top of the stack and checking whether the stack is empty; if it is empty, the current simulated execution is not inside any loop.
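The stack-based loop tracking described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Loop` and `LoopTracker` names are hypothetical, and each loop is assumed to be summarized by the instruction addresses reached through its entry and exit edges (information that a loop-finding analysis would supply).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    """Hypothetical loop summary: addresses reached via entry/exit edges."""
    name: str
    entry_addrs: frozenset
    exit_addrs: frozenset


class LoopTracker:
    def __init__(self, loops):
        self.loops = list(loops)
        self.stack = []    # loops currently being executed, innermost last
        self.rounds = {}   # loop name -> number of times an entry edge was hit

    def on_instruction(self, addr):
        """Update the stack for one simulated instruction.

        Returns True while the execution is inside at least one loop.
        """
        for loop in self.loops:
            if addr in loop.entry_addrs:
                # Push only if this loop is not already on top of the stack;
                # otherwise this is another round of the same loop.
                if not self.stack or self.stack[-1] is not loop:
                    self.stack.append(loop)
                self.rounds[loop.name] = self.rounds.get(loop.name, 0) + 1
            elif addr in loop.exit_addrs and self.stack and self.stack[-1] is loop:
                self.stack.pop()
        return bool(self.stack)


loop = Loop("L", frozenset({0x10}), frozenset({0x30}))
tracker = LoopTracker([loop])
in_loop = [tracker.on_instruction(a) for a in (0x00, 0x10, 0x20, 0x10, 0x20, 0x30)]
```

The per-loop round counter is what a caller would consult to cap the number of iterations.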

The binary program-oriented data-oriented vulnerability retrieval method provided by the invention has the following beneficial effects:

the invention provides a method for analyzing the DOP exploitability of a given memory error. Whether the memory error can be exploited as a DOP vulnerability is judged by searching for Gadgets related to the memory error: if both an arbitrary memory copy Gadget, which copies data in memory to other memory locations, and an arbitrary address write Gadget, which writes data into memory, are found while executing the target binary program, the current memory error is judged to be escalatable to a DOP exploit. The analysis result can help vulnerability researchers and security specialists assess the risk that a vulnerability in a software system will be exploited through DOP.

Drawings

In order to more clearly illustrate the embodiments of the present invention and the design thereof, the drawings required for the embodiments will be briefly described below. The drawings in the following description are only some of the embodiments of the present invention and other drawings may be made by those skilled in the art without the exercise of inventive faculty.

FIG. 1 is a schematic illustration of the present invention;

FIG. 2 is a flow chart of the present invention;

FIG. 3 is a schematic diagram of the Dispatcher of the present invention.

Detailed Description

The present invention will be described in detail below with reference to the drawings and the embodiments, so that those skilled in the art can better understand the technical scheme of the present invention and can implement the same. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.

Examples:

The invention provides a binary program-oriented data-oriented vulnerability retrieval method which, as shown in FIG. 1, comprises the following steps: inputting test data into a target binary program and running it, and, when the target binary program runs to the vulnerable function and triggers a memory error, saving a memory snapshot to obtain the current data in each register and in memory; marking the test data in the input buffer of the memory snapshot as the taint source, and starting taint propagation from the first instruction of the vulnerable function; during taint propagation, when a tainted operation instruction is executed, storing it directly in the instruction cache; when an untainted data transfer instruction is executed, clearing all instructions in the instruction cache that contain the destination operand of the current instruction; after a tainted data transfer instruction is executed, checking the behavior of the current instruction: if it writes back to memory, clearing the cached instructions that contain its destination operand, and taking out the cached instructions that contain its source operand, in the order in which they entered the cache, to form a Gadget together with the current instruction; if it reads memory and writes the content into a register, clearing the cached instructions that contain that register and writing the current instruction into the cache; and, if both an arbitrary memory copy Gadget and an arbitrary address write Gadget are found in the program, judging that the current memory error can be escalated to a DOP vulnerability and exploited.

The following are specific details and principles of the invention:

S1, modeling Gadget functionality; S2, searching for Gadgets; S3, evaluating Gadget capability; S4, stitching incomplete Gadgets; S5, activating Gadgets; S6, evaluating the DOP availability of the carrier program.

1) First, Gadget functionality is modeled and classified. This is used to analyze whether a known memory error makes the carrier program exploitable through DOP.

The invention holds that the mere presence of operation Gadgets cannot determine whether a DOP vulnerability exists; complete data-flow exploit semantics also require powerful memory-operation code fragments.

The method targets the binary code fragments required for DOP exploitation and classifies them by combining taint analysis results, according to how the different parts of each instruction are tainted.

The present invention defines Gadgets with three functions: arbitrary memory copy (AMC, Arbitrary Memory Copy), which allows the contents of one memory region to be copied to another memory region; arbitrary memory read (AMR, Arbitrary Memory Read), which allows data to be read from an arbitrary memory address and written to a fixed memory buffer; and arbitrary address write (AMW, Arbitrary Memory Write), which allows data to be written to an arbitrary location in memory. In addition, according to the form in which a Gadget exists, an instruction sequence Gadget and a library function Gadget are defined as follows:

Instruction sequence Gadget: defined as a sequence of instructions that reads data from an address in memory into a register, performs an operation on the register, and finally writes the result back to memory. The formalized code representation is shown in code segment 1, where operation_inst represents an operation instruction in the assembly code. This instruction is optional: code segment 1 still forms an instruction sequence Gadget after it is removed.

mov register 1, [ Address 1]

operation_inst register 1, constant 1

mov [ address 2], register 1

Library function Gadget: is defined as a library function implementation with memory assignment functions. Wherein the formal representation of the library function Gadget based on the memcpy implementation is shown in code segment 2.

mov rdi, address 1

mov rsi, address 2

mov rdx, register 1

call memcpy

2) The Gadget search algorithm defined in S2 operates on a memory snapshot of the program.

Before analysis starts, the PoC is used as the program input, the program is run to the function containing the crash point, and a program memory snapshot is saved. The snapshot is then loaded, the data the user input into the buffer is marked as the taint source, and simulated execution starts from the first instruction of the vulnerable function.

Taint propagation analysis, data flow analysis, Gadget identification, Dispatcher identification and so on are performed during simulated execution.

3) External input is marked as tainted data and taint propagation is carried out during execution. Tainted instructions are continuously received and cached; when an operation that writes back to memory is encountered, instruction relevance analysis is performed to find the cached instructions that can form a Gadget with it.

The following is the implementation details of the Gadget search algorithm:

External input is marked as tainted data, taint propagation is carried out during execution, and tainted instructions are continuously received and cached. When an operation that stores content back into memory is encountered, instruction relevance analysis is performed to find the cached instructions that can form a Gadget with it.

While the algorithm runs, all data transfer instructions and all tainted operation instructions are analyzed. When a tainted operation instruction is received, it is stored directly in the instruction cache. When an untainted data transfer instruction is received, all cached instructions that contain the destination operand of the current instruction are cleared. When a tainted data transfer instruction is received, the behavior of the current instruction is checked: if it writes back to memory, the cached instructions that contain its destination operand are cleared, and the cached instructions that contain its source operand are taken out, in the order in which they entered the cache, to form a Gadget together with the current instruction; if it reads memory and writes the content into a register, the cached instructions that contain that register are cleared and the current instruction is written into the cache.
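The cache rules above can be illustrated with a small sketch over a pre-tainted instruction trace. Everything here is a simplifying assumption for illustration: the `Insn` record, its `kind` values, and the choice to emit a Gadget only when at least one cached instruction is related to the store are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical trace record; taint is assumed to be computed elsewhere."""
    text: str
    kind: str      # "load" (memory -> register), "store" (-> memory) or "op"
    dst: str       # destination operand
    srcs: tuple    # source operands
    tainted: bool


def mentions(insn, operand):
    return operand == insn.dst or operand in insn.srcs


def find_gadgets(trace):
    cache, gadgets = [], []
    for insn in trace:
        if insn.kind == "op":
            if insn.tainted:              # tainted operation: cache it
                cache.append(insn)
            continue                      # untainted operations are ignored
        # Every data transfer invalidates cached uses of its destination.
        cache = [c for c in cache if not mentions(c, insn.dst)]
        if not insn.tainted:
            continue
        if insn.kind == "store":
            # Collect cached instructions touching the stored value,
            # in the order they entered the cache.
            related = [c for c in cache if any(mentions(c, s) for s in insn.srcs)]
            if related:                   # assumption: a lone store is no Gadget
                gadgets.append(related + [insn])
        else:                             # tainted load: cache it
            cache.append(insn)
    return gadgets


trace = [
    Insn("mov rax, [addr1]", "load", "rax", ("[addr1]",), True),
    Insn("add rax, 1", "op", "rax", ("rax", "1"), True),
    Insn("mov rbx, [addr3]", "load", "rbx", ("[addr3]",), False),
    Insn("mov [addr2], rax", "store", "[addr2]", ("rax",), True),
]
gadgets = find_gadgets(trace)
```

On this trace the result is a single Gadget made of the tainted load, the add, and the store, which matches the shape of code segment 1.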

To explore all possible Gadgets, a path exploration strategy is needed during simulated execution. During path exploration, every branch whose condition is tainted is defined as a symbolized branch, and the basic block containing a symbolized branch is defined as a symbolized node.

Formalized assembly code representation of symbolized nodes in basic blocks is shown in code segment 3:

mov register 1, [ Address 1]

cmp [ Address 2], register 1

jxx code segment address

If the cmp instruction in the jump node is tainted, the current branch is a controllable branch. Simulated execution follows one path at a time and records all controllable branches passed in the current run together with their jump results. After a single run completes, a generational search (vanilla generational search) algorithm is applied over the controllable branches: each time a symbolized branch is reached, the jump direction of the current run is recorded, the path reaching that branch is taken out, and the jump direction at the tail of the path is inverted to generate a new path for the next run. When a newly generated path is executed, symbolic execution is used to collect its path constraints, verify that they are solvable, and solve for the symbolized input that leads to the path. The path is then executed and a new round of path exploration begins.
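A toy version of this generational search is sketched below. The target function `run`, the branch labels, and the brute-force `solve` helper are all hypothetical; in particular, `solve` merely stands in for symbolic execution plus constraint solving by enumerating 8-bit inputs.

```python
from collections import deque


def run(x):
    """Toy target: the controllable branches taken for an 8-bit input x."""
    path = [("b1", x > 100)]
    if x > 100:
        path.append(("b2", x % 2 == 0))
    else:
        path.append(("b3", x == 7))
    return tuple(path)


def solve(prefix):
    """Stand-in for constraint solving: find an input following the prefix."""
    for x in range(256):
        if run(x)[:len(prefix)] == prefix:
            return x
    return None  # the flipped path is unreachable


def generational_search(seed):
    seen = set()
    worklist = deque([(seed, 0)])  # (input, first branch index allowed to flip)
    while worklist:
        x, bound = worklist.popleft()
        path = run(x)
        if path in seen:
            continue
        seen.add(path)
        for i in range(bound, len(path)):
            addr, taken = path[i]
            # Keep the path up to branch i and invert the direction at its tail.
            flipped = path[:i] + ((addr, not taken),)
            new_input = solve(flipped)
            if new_input is not None:
                worklist.append((new_input, i + 1))
    return seen


paths = generational_search(0)
```

The `bound` value prevents a child run from re-flipping branches its parent already flipped, which is what keeps the search from generating the same path repeatedly.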

3.1 The invention designs a VEX IR-based Gadget data flow analysis, which is used to solve for the data flow expressions of the tainted variables of the Gadgets found in step S2.

Overall idea of the VEX IR-based Gadget data flow analysis: using data flow analysis and the z3 solver, obtain the exact value range of the registers and memory involved in a Gadget, or determine whether a data flow dependency exists between two Gadgets. The specific steps are: collect the binary bytecode from the analysis start point to the analysis end point and convert it into a sequence of VEX IR (intermediate representation) instructions; split each VEX IR statement into sub-expressions; construct the edges of a data flow graph (DFG) according to the statement type, the sub-expression type, and the temporary variables, registers and memory that are read; extract from the DFG the backward slice graph containing the analysis end point; and finally derive the expression from the backward slice, solve for its range, and determine the data flow dependency.

The statement types include: Ist_WrTmp, Ist_Store, Ist_Put. The sub-expression types include: Iex_Binop, Iex_RdTmp, Iex_Unop, Iex_Get, Iex_Load, Iex_Const.

4) Expression acquisition and range solving: first, a level-order traversal is performed on the backward slice graph; each VEX IR node is split into sub-expressions, which are converted into z3 expressions using a conversion strategy chosen according to the VEX IR node type, the sub-expression type, and the operator type. Second, the Optimize class of z3 is used, together with the state at the analysis start point, to solve for the value range of the corresponding registers and memory at the analysis end point.

Wherein the VEX IR node type and the sub-expression type correspond to the statement type and the sub-expression type mentioned in claim 5, respectively.

Operator types include Iop_Add, Iop_Sub, Iop_Mul, Iop_Div, Iop_Or, Iop_And, Iop_Xor, Iop_Shl, Iop_Shr, Iop_Sar, etc. The conversion strategies for the three VEX IR statement types are as follows:

(1) IR statement type Ist_WrTmp. Taking sub-expression type Iex_Binop, operator type Iop_Add, and the VEX IR statement t2 = Add64(t1, value) as an example, the conversion process is as follows:

rvalue1 = z3_exprs["t1"]

rvalue2 = z3.BitVecVal(value, 64)

z3_exprs["t2"] = rvalue1 + rvalue2

(2) IR statement type Ist_Store. Taking the VEX IR statement STle(t2) = t1 as an example, the conversion process is as follows:

var_name = "Store_t2"

rvalue = z3_exprs["t1"]

z3_exprs[var_name] = rvalue

(3) IR statement type Ist_Put. Taking the VEX IR statement PUT(rsi) = t1 as an example, the conversion process is as follows:

reg_name = rename_reg("rsi")

rvalue = z3_exprs["t1"]

z3_exprs[reg_name] = rvalue

Notes: (1) z3_exprs is a dictionary that stores the expressions of the temporary variables and registers. (2) z3.BitVecVal(value, size) creates a z3 bit vector that is size bits wide and holds value. (3) The rename_reg function renames registers to guarantee that register names are unique.
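To make the dictionary-driven dispatch concrete, the sketch below applies the same three conversion rules to simplified statement tuples. It deliberately swaps z3 bit vectors for plain Python integers masked to 64 bits, so it evaluates values rather than building symbolic expressions; the tuple encoding, the `Translator` class, and the `reg_N` naming scheme used by `rename_reg` are assumptions made for this illustration.

```python
MASK64 = (1 << 64) - 1

# Concrete 64-bit stand-ins for a few VEX binary operators.
BINOPS = {
    "Add": lambda a, b: (a + b) & MASK64,
    "Sub": lambda a, b: (a - b) & MASK64,
    "Xor": lambda a, b: a ^ b,
}


class Translator:
    def __init__(self, initial):
        self.exprs = dict(initial)   # plays the role of z3_exprs
        self.reg_versions = {}

    def rename_reg(self, reg):
        """Return a fresh, unique name for each write to a register."""
        version = self.reg_versions.get(reg, 0) + 1
        self.reg_versions[reg] = version
        return f"{reg}_{version}"

    def step(self, stmt):
        kind = stmt[0]
        if kind == "WrTmp":          # e.g. t2 = Add64(t1, value)
            _, tmp, (op, lhs, const) = stmt
            self.exprs[tmp] = BINOPS[op](self.exprs[lhs], const & MASK64)
        elif kind == "Store":        # e.g. STle(t2) = t1
            _, addr_tmp, data_tmp = stmt
            self.exprs[f"Store_{addr_tmp}"] = self.exprs[data_tmp]
        elif kind == "Put":          # e.g. PUT(rsi) = t1
            _, reg, data_tmp = stmt
            self.exprs[self.rename_reg(reg)] = self.exprs[data_tmp]
        else:
            raise ValueError(f"unsupported statement type: {kind}")


tr = Translator({"t1": MASK64})
for stmt in [("WrTmp", "t2", ("Add", "t1", 2)),
             ("Store", "t2", "t1"),
             ("Put", "rsi", "t1")]:
    tr.step(stmt)
```

Replacing the integer arithmetic with bit-vector terms would recover the symbolic form that the range-solving step needs.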

5) Data flow dependency determination:

The specific steps of data flow dependency determination are as follows: obtain the backward slice between the analysis start point and analysis end point 1 and derive the z3 expression expr1 of analysis end point 1 relative to the analysis start point; likewise derive the z3 expression expr2 of analysis end point 2 relative to the analysis start point; if expr1 == expr2 has a solution, a data flow dependency exists between analysis end point 1 and analysis end point 2; otherwise none exists.
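The satisfiability test can be illustrated without a solver by exhaustively searching a small input space. This is only a stand-in for the z3 query: the function name, the 8-bit width, and the two example expressions are hypothetical, and each expression is assumed to depend on a single tainted input.

```python
def find_dependency_witness(expr1, expr2, bits=8):
    """Return an input for which both expressions agree, or None.

    Brute-force replacement for asking a solver whether expr1 == expr2
    is satisfiable; results are compared modulo 2**bits.
    """
    mask = (1 << bits) - 1
    for x in range(1 << bits):
        if expr1(x) & mask == expr2(x) & mask:
            return x
    return None


# Hypothetical expressions: an address written at end point 1 and an
# address read at end point 2, both as functions of the tainted input x.
written_addr = lambda x: x + 0x10
read_addr = lambda x: 2 * x

witness = find_dependency_witness(written_addr, read_addr)
no_witness = find_dependency_witness(lambda x: x, lambda x: x + 1)
```

A returned witness is compared against `None` rather than tested for truthiness, since an input of 0 is a valid solution.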

The invention focuses on three key parameters of a memory-operation Gadget during execution: the source address, the source data, and the destination address. According to how these three parameters are tainted, usable memory-operation Gadgets are divided into the following types:

Arbitrary address write (AMW): the source data and destination address of the Gadget can be controlled, and the source address cannot. That is, the binary program can read data from a fixed user data buffer and write it to an arbitrary address range.

Arbitrary memory read (AMR): the source address of the Gadget can be controlled, and the destination address cannot. That is, the Gadget can read data from any memory address and write it to a fixed memory buffer.

Arbitrary memory copy (AMC): both the source address and the destination address of the Gadget can be controlled. That is, the Gadget can read data from an arbitrary memory address (other than the user input buffer) and write it to an arbitrary region of the memory space.

Operation (OPE): all operands in the Gadget can be controlled, so that operations can be performed on the controlled data.
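For the three memory-operation types, the classification by taint status reduces to a few boolean checks, sketched here. The function and enum names are hypothetical, the OPE type is left out because it concerns operation instructions rather than memory transfers, and the order of the checks is an assumption about how overlapping cases should be resolved.

```python
from enum import Enum
from typing import Optional


class GadgetKind(Enum):
    AMC = "arbitrary memory copy"
    AMW = "arbitrary address write"
    AMR = "arbitrary memory read"


def classify(src_addr_tainted: bool,
             src_data_tainted: bool,
             dst_addr_tainted: bool) -> Optional[GadgetKind]:
    """Classify a memory-operation Gadget from the taint of its parameters."""
    if src_addr_tainted and dst_addr_tainted:
        return GadgetKind.AMC          # both addresses controllable
    if dst_addr_tainted and src_data_tainted:
        return GadgetKind.AMW          # controlled data to a controlled address
    if src_addr_tainted:
        return GadgetKind.AMR          # controlled source, fixed destination
    return None                        # not usable as a memory Gadget


kind = classify(src_addr_tainted=False, src_data_tainted=True, dst_addr_tainted=True)
```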

Meanwhile, the capability of a Gadget is not limited to its instruction sequence but also depends on its context. The present invention defines a Dispatcher search algorithm that uses the angr framework to load a binary program (or a dynamically linked library), constructs the control flow graph (CFG) of the program, and uses the LoopFinder() method to find the loop structures in it.

The Dispatcher and the Enhancer are defined as follows:

Dispatcher: a loop structure whose loop body contains at least two Gadgets on non-overlapping paths. The Dispatcher is shown schematically in FIG. 3.

Enhancer: defined as a loop structure that controls only a single Gadget. It has a counter that is incremented after each entry into the loop and that directly participates in the key variables of the Gadget. The formalized binary instruction description is shown in code segment 4:

cmp index register, immediate 1

ja exit address

inc index register

mov register 1, [ base register 1+ index register ]

mov [ base address register 2+ index register ], register 1

jmp cmp instruction address

A Dispatcher search is performed during simulated execution, and the control relationship between each Gadget found and the Dispatcher is recorded. First, the CFG of the binary program is constructed and the loop structures in it are located. During simulated execution, instructions are received continuously; for each one, the algorithm checks whether it lies in a loop and determines its position in the loop, thereby obtaining the control relationship between the loop and the code fragment. The specific algorithm is shown in the following chart:

During Gadget capability evaluation, a backward slice graph of the data flow from the start of the vulnerable function to the current Gadget is first constructed using the data flow analysis method described in claim 5; a data flow expression for the tainted operands in the Gadget is then built, and its value range is calculated according to the method described in claim 6. The value range of the tainted operands in a Gadget is defined as the Gadget's capability. If the current Gadget is controlled by an Enhancer, its capability is enhanced according to the Enhancer's maximum number of loop iterations.

6) Incomplete Gadget stitching strategy:

Taking an arbitrary memory read (AMR) Gadget and an arbitrary address write (AMW) Gadget as an example: on its own, the AMR Gadget can only write to a fixed address and the AMW Gadget can only read from a fixed address. When the destination address of the AMR Gadget overlaps the source address of the AMW Gadget, memory data can be passed between them, and invoking the AMR Gadget followed by the AMW Gadget forms an arbitrary memory copy Gadget. Similarly, when two AMC Gadgets exist, the range of the destination address of AMC1 and the range of the source address of AMC2 are calculated, the data flow between them is constructed, and a program input that connects AMC1 and AMC2 through that data flow is solved for.

After this work is complete, the Gadget activation algorithm is executed. When the search algorithm finds a new Gadget G1, which must be an AMW Gadget or an AMC Gadget, the capability of G1 is constructed and the corresponding memory is marked. Simulated execution then continues onward from G1 with taint propagation, recording all tainted instructions, until a new Gadget G2 is found. Activation, as defined here, means rewriting the key variables of G2 (source address, destination address, and the data pointed to by the source address) by executing the code sequence of G1 in the program. G2 is then evaluated: the data flow dependency along the execution from G1 to G2 is calculated, and the data flow expression of the tainted variables in G2 and its value range are solved for. If G2 is an AMW Gadget or an AMC Gadget, a tainted variable in G2 is used as a memory write address; the tainted memory continues to be marked according to the value range of that variable, and the analysis procedure defined in claim 3 continues with G2 as the starting point, until the current program is regarded as DOP complete or no new AMC or AMW Gadget can be found during activation.
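The AMR and AMW stitching condition comes down to an address-range overlap test, which the following sketch shows. The `Range` and `Gadget` types, the half-open range convention, and the use of `None` to mean "arbitrary, attacker-controlled" are assumptions for illustration only.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    lo: int
    hi: int  # exclusive upper bound

    def overlap(self, other: "Range") -> Optional["Range"]:
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        return Range(lo, hi) if lo < hi else None


class Gadget(NamedTuple):
    kind: str
    src: Optional[Range]  # None means the address is arbitrary (controlled)
    dst: Optional[Range]


def stitch_amr_amw(amr: Gadget, amw: Gadget) -> Optional[Range]:
    """Return the shared buffer through which AMR then AMW acts as a copy.

    The AMR Gadget writes into a fixed buffer and the AMW Gadget reads from
    a fixed buffer; they compose only if those buffers overlap.
    """
    if amr.kind != "AMR" or amw.kind != "AMW":
        return None
    if amr.dst is None or amw.src is None:
        return None
    return amr.dst.overlap(amw.src)


amr = Gadget("AMR", src=None, dst=Range(0x1000, 0x1040))
amw = Gadget("AMW", src=Range(0x1020, 0x1080), dst=None)
shared = stitch_amr_amw(amr, amw)
```

A non-empty result means the pair can be invoked in sequence as a composite copy whose staging area is the shared range.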

7) Program carrier DOP availability assessment

Starting from a known program memory error, the invention expands the scope of influence of that error. If the program contains an arbitrary memory copy Gadget and an arbitrary address write Gadget that can be activated by the memory error and that have strong tainting capability, the current memory error is judged to be escalatable to a DOP vulnerability and exploitable.

The following is a table for further defining the availability of the carrier program DOP according to the invention:

AMC Gadget	AMW Gadget	AMR Gadget	OPE Gadget	Availability
0	0	x	x	Low
0	1	0	0	Medium
0	1	0	1	Relatively high
0	1	1	0	Relatively high
0	1	1	1	Relatively high
1	0	0	0	Medium
1	0	0	1	Medium
1	0	1	0	Relatively high
1	0	1	1	Relatively high
1	1	x	0	Relatively high
1	1	x	1	High
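The table can be read as an ordered list of patterns in which "x" matches either value. The sketch below encodes it that way; the function name is hypothetical, rows with identical outcomes are merged using don't-care entries, and the four level names (Low, Medium, Relatively high, High) are an assumed normalization of the wording in the table.

```python
# Each rule is (AMC, AMW, AMR, OPE) -> availability; None is a don't-care.
RULES = [
    ((0, 0, None, None), "Low"),
    ((0, 1, 0, 0), "Medium"),
    ((0, 1, 0, 1), "Relatively high"),
    ((0, 1, 1, None), "Relatively high"),
    ((1, 0, 0, None), "Medium"),
    ((1, 0, 1, None), "Relatively high"),
    ((1, 1, None, 0), "Relatively high"),
    ((1, 1, None, 1), "High"),
]


def dop_availability(amc: bool, amw: bool, amr: bool, ope: bool) -> str:
    """Look up the availability level for the set of Gadget types found."""
    found = (int(amc), int(amw), int(amr), int(ope))
    for pattern, level in RULES:
        if all(p is None or p == f for p, f in zip(pattern, found)):
            return level
    raise ValueError(f"no rule matches {found}")


level = dop_availability(amc=True, amw=True, amr=False, ope=True)
```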

The following are specific embodiments of the present invention:

(1) Obtaining a program memory snapshot;

(2) Marking the tainted memory, and performing simulated execution using Triton;

(3) Searching the Gadget;

(4) Evaluating the Gadget capability;

(5) Activating the Gadget;

(6) Program carrier DOP availability evaluation.

Firstly, using GDB to run target binary program and storing the program snapshot before the memory error trigger point to obtain the data in each register and memory.

Triton simulation execution involves the following processes:

Static simulated execution. The input from the PoC is marked as taint data and as symbolic data, and taint propagation analysis is performed during execution.
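
To make the idea of taint propagation concrete, here is a toy model in plain Python. It does not use the Triton API; the three-operation instruction format `(op, dst, src)` and the function `propagate_taint` are assumptions made purely for illustration.

```python
def propagate_taint(instructions, tainted_mem):
    """Propagate taint through a toy instruction list.

    Each instruction is (op, dst, src) with
      "load":  register dst <- memory[src]
      "mov":   register dst <- register src
      "store": memory[dst]  <- register src
    Returns the tainted registers, the tainted memory addresses, and the
    indices of the instructions that handled tainted data.
    """
    mem, regs, hits = set(tainted_mem), set(), []
    for index, (op, dst, src) in enumerate(instructions):
        if op == "load":
            is_tainted, sink = src in mem, regs
        elif op == "mov":
            is_tainted, sink = src in regs, regs
        elif op == "store":
            is_tainted, sink = src in regs, mem
        else:
            raise ValueError(f"unknown op: {op}")
        if is_tainted:
            sink.add(dst)
            hits.append(index)
        else:
            sink.discard(dst)  # destination overwritten with clean data
    return regs, mem, hits


program = [
    ("load", "rax", 0x5000),   # reads the tainted input buffer
    ("mov", "rbx", "rax"),
    ("load", "rcx", 0x6000),   # reads clean memory
    ("store", 0x7000, "rbx"),  # taint reaches a new memory location
    ("mov", "rax", "rcx"),     # rax is overwritten with clean data
]
regs, mem, hits = propagate_taint(program, tainted_mem={0x5000})
```

Here the input buffer at 0x5000 plays the role of the taint source, and the recorded instruction indices correspond to the tainted instructions that the analysis keeps track of.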

Path exploration. All symbolic branches encountered during the execution of step (1) are recorded, and paths are searched level by level; whenever a symbolic branch is encountered, the path leading to the current branch and the jump direction taken are recorded. When N new symbolic branches are explored, N new paths are created. Each path is stored as a dictionary (dict) whose elements are key-value pairs consisting of the address of a jxx instruction and whether the jump is taken. Formally:

{jxx addr1:True,jxx addr2:False}

The tail key-value pair of each of the N collected paths is inverted to generate a new path, and unreachable paths are culled by constraint solving. In addition, to mitigate path explosion, the invention limits the number of executions of a loop in the program to 5: angr is used to identify loop locations, the number of entries into each loop is recorded, and if the same loop is entered too many times, a break signal is sent to the simulated execution module so that execution leaves the loop.
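
A minimal sketch of the two mechanisms just described, under stated assumptions: paths are Python dicts keyed by branch address (relying on dict insertion order), and the names `flip_tail`, `expand` and `LoopGuard` are hypothetical. Culling of unreachable paths by constraint solving and loop detection with angr are not shown.

```python
from collections import Counter

MAX_LOOP_ITERATIONS = 5  # loop bound stated in the text


def flip_tail(path: dict) -> dict:
    """Return a copy of the path with its last branch decision inverted."""
    new_path = dict(path)
    last_addr = next(reversed(new_path))  # dicts keep insertion order
    new_path[last_addr] = not new_path[last_addr]
    return new_path


def expand(paths):
    """Generate one new candidate path per collected (non-empty) path."""
    return [flip_tail(p) for p in paths if p]


class LoopGuard:
    """Counts entries per loop head and signals when to leave the loop."""

    def __init__(self, limit: int = MAX_LOOP_ITERATIONS):
        self.limit = limit
        self.entries = Counter()

    def enter(self, loop_head: int) -> bool:
        """Record one entry; return False once the limit is exceeded."""
        self.entries[loop_head] += 1
        return self.entries[loop_head] <= self.limit


# Illustrative addresses only.
path = {0x401000: True, 0x401020: False}
new_paths = expand([path])

guard = LoopGuard()
allowed = [guard.enter(0x401100) for _ in range(6)]
```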

Gadget capability evaluation. During analysis, a data flow graph is constructed for each memory operation Gadget found, the backward slice graph related to the Gadget is extracted, the data flow expression of each tainted operand in the Gadget is constructed, and the corresponding value range is obtained with the z3 solver as the capability of the Gadget.
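
As a rough illustration of a value range serving as a capability, the sketch below propagates intervals through a hypothetical data flow expression `dst = base + index * 8`. Plain interval arithmetic is used here as a simplified stand-in for the z3 solver named in the text; the expression, the constants and the `Interval` class are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Inclusive integer interval [lo, hi]."""
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, factor: int) -> "Interval":
        """Multiply by a non-negative constant."""
        if factor < 0:
            raise ValueError("factor must be non-negative")
        return Interval(self.lo * factor, self.hi * factor)


def const(value: int) -> Interval:
    return Interval(value, value)


# Hypothetical tainted write address: dst = base + index * 8,
# where index is a single attacker-controlled byte.
index = Interval(0, 255)
dst_range = const(0x601000) + index.scale(8)
```

An interval over-approximates the reachable set (here only every eighth address is actually reachable), which is one reason a constraint solver gives a more precise answer than this sketch.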

Gadget activation. The invention defines the activation process as involving a source Gadget G1 and a target Gadget G2. Note that an arbitrary memory read (AMR) Gadget is comparatively weak, so an AMR Gadget is generally not selected as G1, whereas arbitrary address write (AMW) and arbitrary memory copy (AMC) Gadgets can overwrite critical variables. After the Gadget search module of step (3) finds G1, the value range of the expression of the tainted operand in G1 is obtained using the VEX IR based Gadget data flow analysis algorithm, and this range constitutes the capability of G1.

Finally, the corresponding memory is marked according to the tainting capability of G1, and taint propagation analysis continues with G1 as the starting point. If the source address or the destination address of G2 is affected by the taint, G2 can be activated: if the source address of G2 is covered by the taint of G1, G2 is regarded as a new AMR Gadget; if the destination address of G2 can be tainted, G2 is regarded as a new AMW Gadget; if both hold, G2 constitutes an AMC Gadget. The activation process is iterative: it continues with G2 as the new starting point until no new Gadget can be found or the Gadgets explored so far give the program DOP availability.
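
The iterative activation can be sketched as a fixed-point loop. This is a simplified model under stated assumptions: each Gadget is described by the memory location holding its source address (`src_ptr`), the location holding its destination address (`dst_ptr`), and the set of locations it can overwrite once usable (`writes`). The `Gadget` class and the functions `classify` and `activate` are hypothetical, and "DOP availability" is approximated as having found both an AMC and an AMW Gadget.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Gadget:
    src_ptr: int                                   # where its source address is stored
    dst_ptr: int                                   # where its destination address is stored
    writes: Set[int] = field(default_factory=set)  # locations it can overwrite
    kind: Optional[str] = None                     # known kind (set for the seed G1)


def classify(src_tainted: bool, dst_tainted: bool) -> Optional[str]:
    if src_tainted and dst_tainted:
        return "AMC"
    if dst_tainted:
        return "AMW"
    if src_tainted:
        return "AMR"
    return None


def activate(g1: str, gadgets: Dict[str, Gadget]) -> Dict[str, Optional[str]]:
    """Spread taint from the seed Gadget g1 until nothing changes or
    both an AMC and an AMW Gadget have been found."""
    tainted = set(gadgets[g1].writes)
    kinds: Dict[str, Optional[str]] = {g1: gadgets[g1].kind}
    changed = True
    while changed and not {"AMC", "AMW"} <= set(kinds.values()):
        changed = False
        for name, gadget in gadgets.items():
            if name == g1:
                continue
            kind = classify(gadget.src_ptr in tainted, gadget.dst_ptr in tainted)
            if kind is None or kinds.get(name) == kind:
                continue
            kinds[name] = kind
            changed = True
            if kind in ("AMW", "AMC"):  # AMR Gadgets are not used as sources
                tainted |= gadget.writes
    return kinds


# Illustrative layout: G1 can overwrite both pointers of G2,
# and G2 can overwrite the source pointer of G3.
gadgets = {
    "G1": Gadget(src_ptr=0x10, dst_ptr=0x18, writes={0x20, 0x28}, kind="AMW"),
    "G2": Gadget(src_ptr=0x20, dst_ptr=0x28, writes={0x30}),
    "G3": Gadget(src_ptr=0x30, dst_ptr=0x99),
}
kinds = activate("G1", gadgets)
```

Because the tainted set only grows, a Gadget's classification can only be upgraded (for example from AMR to AMC) in later rounds, so the loop terminates.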

Gadget activation enlarges the influence range of memory errors in the program and propagates the data flow further, so that more Gadgets are found. The method analyzes the semantics of the Gadget instruction sequence, solves the memory influence range of the instruction sequence using data flow techniques, and simulates the effect on memory after the current Gadget has been fully exploited. Compared with prior methods, it finds more Gadgets with higher availability and gives the paths that trigger them.

Stitching of incomplete Gadgets. A program may contain two Gadgets whose control flow order is fixed and whose data flows depend on each other, e.g., G1 and G2 share a common variable and both lie on a legal control flow path. One code representation is as follows:

G1:memcpy(arg2,arg1)

G2:memcpy(arg3,arg2)

During analysis, the data flow analysis module checks whether a data flow dependency exists between G1 and G2, thereby identifying the parameter arg2. Here arg1 and arg3 are tainted by external data. Executing G1 and G2 in sequence achieves an arbitrary memory copy operation, whose Gadget capability is determined by the ranges of arg1 and arg3 solved from the data flow expressions.
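
The effect of executing G1 and then G2 can be simulated on a toy memory model. In the sketch below, memory is a `bytearray`, `arg2` is a fixed intermediate location, and `arg1` and `arg3` are chosen by the caller to stand in for attacker-controlled values. The explicit `size` parameter and the name `stitched_copy` are assumptions added for illustration.

```python
def memcpy(memory: bytearray, dst: int, src: int, size: int) -> None:
    """Toy memcpy(dst, src, size) on a flat bytearray."""
    memory[dst:dst + size] = memory[src:src + size]


def stitched_copy(memory: bytearray, arg1: int, arg3: int, size: int,
                  arg2: int = 0x40) -> None:
    """Run G1 then G2; arg2 is the shared, fixed intermediate buffer."""
    memcpy(memory, arg2, arg1, size)  # G1: memcpy(arg2, arg1)
    memcpy(memory, arg3, arg2, size)  # G2: memcpy(arg3, arg2)


memory = bytearray(128)
memory[0x10:0x14] = b"DATA"
stitched_copy(memory, arg1=0x10, arg3=0x60, size=4)
```

Neither Gadget alone copies between two attacker-chosen addresses; chained through the shared `arg2`, the pair behaves like one arbitrary memory copy from `arg1` to `arg3`.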

Gadget stitching strategy:

The invention designs a Gadget stitching algorithm that uses data flow analysis to functionally combine the Gadgets found, so as to discover more, and more powerful, ways of exploiting the code. The method automatically classifies Gadgets and stitches code functions according to Gadget type, achieving stronger code functionality and broadening the ways in which Gadgets can be exploited.

Program carrier DOP availability evaluation.

While performing the processes defined in (3) and (4), the capability of each Gadget found is calculated. For example, suppose a known out-of-bounds write memory error allows the parameters of memcpy(arg1, arg2, size) to be modified. If arg1 and arg2 are completely overwritten by the contents of the user input buffer and the values written to arg1 and arg2 are unrestricted, the current memcpy is considered an AMC Gadget with full capability. Similarly, if arg2 points to the user input buffer while arg1 can be rewritten by the out-of-bounds write and is unrestricted, the current memcpy is considered an AMW Gadget with full capability.

Furthermore, if an AMW Gadget and an AMC Gadget with memory influence capability are found in a program and at least one OPE Gadget with data operation capability exists, the current memory error is considered to have a high probability of being upgradable to a DOP vulnerability.

The above embodiments are merely preferred embodiments of the present invention, the protection scope of the present invention is not limited thereto, and any simple changes or equivalent substitutions of technical solutions that can be obviously obtained by those skilled in the art within the technical scope of the present invention disclosed in the present invention belong to the protection scope of the present invention.

Claims

1. A binary program-oriented data-oriented vulnerability retrieval method is characterized by comprising the following steps:

if an arbitrary memory copy Gadget (AMC Gadget), which copies data in memory to other memory locations, and an arbitrary address write Gadget (AMW Gadget), which writes data to arbitrary memory locations, are found during execution of the target binary program, the current memory error vulnerability is judged to be upgradable to a data-oriented programming (DOP) vulnerability for exploitation.

2. The binary program-oriented data-oriented vulnerability retrieval method of claim 1, further comprising:

taking the instruction address of the first instruction of the vulnerable function as the analysis starting point, taking the instruction address of the first Gadget G1 found as analysis end point 1, and taking the instruction address of the second Gadget G2 found as analysis end point 2;

3. The binary program-oriented data-oriented vulnerability retrieval method of claim 2, wherein the obtaining the backward slice between the analysis start point and the analysis end point 1 comprises:

collecting the binary bytecode from the analysis starting point to analysis end point 1 and converting it into a VEX IR instruction sequence;

splitting the VEX IR instruction sequence into sub-expressions, and constructing a data flow graph (DFG) according to the VEX IR statement types and sub-expression types and the temporary variables, registers and memory that are read;

4. The binary program-oriented data-oriented vulnerability retrieval method of claim 3, further comprising:

performing a level-order traversal of the backward slice graph between the analysis starting point and analysis end point 1, and splitting each VEX IR node in the backward slice graph into sub-expressions;

converting the sub-expression into a z3 expression;

executing the target binary program from the found Gadget G1 as the starting point, with the tainted memory marked, until a second Gadget G2 is found;

if the source address of G2 is covered by the taint of G1, G2 is regarded as a new arbitrary memory read (AMR) Gadget; if the destination address of G2 is covered by the taint of G1, G2 is regarded as a new arbitrary address write (AMW) Gadget; if the source address of G2 is covered by the taint of G1 and the destination address of G2 can also be tainted, G2 is regarded as an arbitrary memory copy (AMC) Gadget;

5. The binary program-oriented data-oriented vulnerability retrieval method of claim 4, wherein Gadget G1 is an arbitrary address write (AMW) Gadget or an arbitrary memory copy (AMC) Gadget.

6. The binary program-oriented data-oriented vulnerability retrieval method of claim 1, further comprising: path exploration is performed in the process of executing a target binary program, and the path exploration method comprises the following steps:

mov register 1, [ Address 1]

cmp [ Address 2], register 1

jxx code segment address

7. The binary program-oriented data-oriented vulnerability retrieval method of claim 1, further comprising: judging whether the current program execution is inside a loop structure by using a Dispatcher search algorithm, and limiting the number of loop iterations to mitigate path explosion; the Dispatcher search algorithm is: