CN113204764B - Unsigned binary indirect control flow identification method based on deep learning - Google Patents


Info

Publication number
CN113204764B
Authority
CN
China
Prior art keywords: indirect, layer, call, ith, sample
Prior art date
Legal status: Active
Application number
CN202110363702.6A
Other languages
Chinese (zh)
Other versions
CN113204764A (en)
Inventor
王鹃
王蕴茹
杨梦达
王杰
钟璟
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110363702.6A
Publication of CN113204764A
Application granted
Publication of CN113204764B

Classifications

    • G06F 21/563 — Static detection of computer malware by source code analysis
    • G06F 18/2415 — Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 8/436 — Semantic checking (compilation; checking; contextual analysis)
    • G06N 3/044 — Neural networks: recurrent networks, e.g. Hopfield networks
    • G06N 3/047 — Neural networks: probabilistic or stochastic networks
    • G06N 3/08 — Neural networks: learning methods

Abstract

The invention relates to an unsigned binary indirect control flow identification method based on deep learning, which identifies the target basic blocks of indirect jump instructions in a binary through deep learning. The method constructs indirect call branches and function sequences from the instructions, basic blocks and function code blocks in a binary code file, so as to construct triple samples for indirect jumps and indirect calls and to generate an indirect jump training set and an indirect call training set; it constructs neural network indirect jump and indirect call target identification classification models, together with the corresponding classification loss function models; the binary file to be detected is then preprocessed, indirect jump and indirect call samples are generated for its indirect jump and indirect call instructions, target identification is carried out with the trained classification models, and the indirect control flow targets are restored from the classification results. The invention improves the accuracy of identification.

Description

Unsigned binary indirect control flow identification method based on deep learning
Technical Field
The invention belongs to the technical field of software analysis, and particularly relates to an unsigned binary indirect control flow identification method based on deep learning.
Background
Reconstructing a control flow graph from an unsigned binary file is a prerequisite for many problems in software analysis, such as instruction identification and function identification in disassembly. In addition, binary-level control flow graph reconstruction plays an important role in control-flow integrity research, malware classification, provenance tracing and similar problems. In general, statically reconstructing a control flow graph from a binary is a recursive process; however, the process is often hampered by indirect control flow. For direct control flow, the operand of the jump/call instruction is the target address of the instruction in the control flow; for indirect branches, the operands of the corresponding instructions are usually registers or memory locations that store the target addresses, so the targets of indirect control flow are difficult to determine statically. Given the low coverage and low processing efficiency of dynamic analysis methods, statically identifying indirect control flow in unsigned binaries has become an urgent problem to solve.
Existing binary analysis tools typically employ different techniques to handle indirect jumps and indirect calls. Indirect jumps mainly come from jump tables (compiled from switch-case and if-else statements). Existing static jump-table processing methods can be divided into a) heuristic methods based on backward slicing and pattern matching, and b) deeper analysis techniques such as data-flow analysis or value set analysis (VSA). The heuristic methods determine the target basic blocks of a jump table by searching for particular patterns that reveal the base address of the jump table and the bounds of its index. However, pattern-matching-based methods require manually defining different patterns for different compilers and architectures, and therefore lack scalability. Deeper analysis techniques preserve some semantic information and improve identification precision, but their computational cost is high and they are difficult to apply to large-scale applications. For indirect calls, an effective binary-level static analysis technique is still lacking. Indirect calls are mainly produced by compiling function pointers and virtual functions, which implement the dynamic behavior of a program. Mainstream analysis tools typically use constant propagation to resolve the targets of indirect calls: when a constant flows to an indirect call instruction, the constant is regarded as a target of that instruction. However, only a small fraction of indirectly called target functions can be identified in this way.
In view of the above, the invention aims to solve the problem that the targets of indirect control flow instructions in unsigned binaries are difficult to obtain statically, by constructing a semantics-based binary indirect control flow identification scheme. The invention exploits the semantic association between the source and the target of an indirect jump (or call) and automatically identifies the targets of indirect control flow with a deep-learning-based method. In addition, the framework of the invention does not need different techniques to handle indirect jumps and indirect calls, and for indirect jumps it achieves accuracy similar to mainstream binary analysis tools.
Disclosure of Invention
Aiming at the above problems, the invention provides an unsigned binary indirect control flow identification method based on deep learning. The method acquires semantic information among the bytes of a binary and, based on the context association between the source and the target of an indirect control flow, constructs a deep-learning-centered binary indirect control flow target recognition scheme, comprising the following specific steps:
step 1: an original binary code file is introduced, a plurality of bytes in the original binary code file form a plurality of instruction code blocks, the plurality of instruction code blocks form a plurality of basic block code blocks, the plurality of basic block code blocks form a plurality of function code blocks, an indirect call branch and a function sequence are constructed according to the basic block code blocks and the function code blocks, an indirect jump triple sample and an indirect call triple sample are further constructed, the indirect jump triple sample and the indirect call triple sample are respectively marked, and an indirect jump training set and an indirect call training set are generated;
step 2: establishing a neural network indirect jump target identification classification model, sequentially inputting each indirect jump triple sample in an indirect jump training set into the neural network indirect jump target identification classification model, further classifying to obtain a corresponding triple sample prediction result, further combining an indirect jump sample label and a prediction label of the classification model to establish a neural network indirect jump target identification classification loss function model, obtaining an optimization parameter set of a network through optimization training, and establishing the trained neural network indirect jump target identification classification model according to the network optimization parameter set; establishing a neural network indirect call target identification classification model, sequentially inputting each indirectly called triple sample in an indirect call training set into the neural network indirect call target identification classification model, further classifying to obtain a corresponding triple sample prediction result, further establishing a neural network indirect call target identification classification loss function model by combining an indirect call sample label and a label predicted by the classification model, obtaining an optimization parameter set of the network through optimization training, and establishing the trained neural network indirect call target identification classification model according to the network optimization parameter set;
Step 3: extracting, through step 1, the instruction code blocks, basic block code blocks and function code blocks of the binary to be detected, and judging whether each instruction code block in the binary to be detected is an indirect jump instruction code block or an indirect call instruction code block;
preferably, the original binary code file in step 1 is:
text_i = {c_{i,1}, c_{i,2}, ..., c_{i,L}},  i ∈ [1, K]

where text_i represents the ith original binary code file, K represents the number of original binary code files, L represents the number of bytes in the ith original binary code file, and c_{i,j} represents the jth byte in the ith original binary code file, j ∈ [1, L];
Step 1, a plurality of bytes in the original binary code file form a plurality of instruction code blocks, specifically expressed as:

Ins_{i,k} = {c_{i,sins_k}, c_{i,sins_k+1}, ..., c_{i,sins_k+nins_k-1}},  k ∈ [1, N_ins]

where Ins_{i,k} represents the kth instruction code block in the ith original binary code file, N_ins represents the number of instruction code blocks in the ith original binary code file, sins_k is the index of the byte at which the kth instruction code block starts, nins_k represents the number of bytes in the kth instruction code block, and c_{i,sins_k+j} represents the (j+1)th byte of the kth instruction code block in the ith original binary code file, j ∈ [0, nins_k-1];
Step 1, the plurality of instruction code blocks form a plurality of basic block code blocks, specifically expressed as:

B_{i,m} = {Ins_{i,sbb_m}, Ins_{i,sbb_m+1}, ..., Ins_{i,sbb_m+nbb_m-1}},  m ∈ [1, N_bb]

where B_{i,m} represents the mth basic block code block in the ith original binary code file, N_bb represents the number of basic block code blocks in the ith original binary code file, sbb_m is the index of the instruction code block at which the mth basic block code block starts, nbb_m represents the number of instruction code blocks in the mth basic block code block, and Ins_{i,sbb_m+j} represents the (j+1)th instruction code block of the mth basic block code block in the ith original binary code file, j ∈ [0, nbb_m-1];
Step 1, the plurality of basic block code blocks form a plurality of function code blocks, specifically expressed as:

F_{i,n} = {B_{i,sfunc_n}, B_{i,sfunc_n+1}, ..., B_{i,sfunc_n+nfunc_n-1}},  n ∈ [1, N_func]

where F_{i,n} represents the nth function code block in the ith original binary code file, N_func represents the number of function code blocks in the ith original binary code file, sfunc_n is the index of the basic block code block at which the nth function code block starts, nfunc_n represents the number of basic block code blocks in the nth function code block, and B_{i,sfunc_n+j} represents the (j+1)th basic block code block of the nth function code block in the ith original binary code file, j ∈ [0, nfunc_n-1];
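For illustration only, the byte / instruction / basic block / function hierarchy defined above can be represented with simple container types. The sketch below is an assumption made for readability; the class and field names are hypothetical and do not appear in the patent:

    from dataclasses import dataclass, field
    from typing import List

    # Illustrative containers for the hierarchy text_i -> Ins -> B -> F.
    @dataclass
    class Instruction:            # Ins_{i,k}: a run of consecutive bytes
        raw: bytes

    @dataclass
    class BasicBlock:             # B_{i,m}: a run of consecutive instructions
        instructions: List[Instruction] = field(default_factory=list)

    @dataclass
    class Function:               # F_{i,n}: a run of consecutive basic blocks
        blocks: List[BasicBlock] = field(default_factory=list)

    @dataclass
    class Binary:                 # text_i: one original binary code file
        functions: List[Function] = field(default_factory=list)

    # Example: one function containing one basic block with two instructions.
    f = Function(blocks=[BasicBlock(instructions=[Instruction(b"\x55"),
                                                  Instruction(b"\xc3")])])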
The step 1 of constructing the indirect call branch and the function sequence according to the basic block code block and the function code block is as follows:
the indirect call branch:

Br_{i,m} = {B_{i,entry_m}, e, B_{i,entry_m+1}, ..., e, B_{i,call_m}},  m ∈ [1, N_call]

where Br_{i,m} is the indirect call branch sequence of the mth indirect call instruction code block in the ith original binary code file, N_call represents the number of indirect call instruction code blocks in the ith original binary code file, entry_m is the index of the entry basic block of the mth indirect call branch sequence, B_{i,entry_m+1} is the basic block code block that follows B_{i,entry_m} on the branch, and call_m is the index of the basic block code block in which the mth indirect call instruction code block is located;
the function sequence:

Fs_{i,n} = {B_{i,sfunc_n}, e, B_{i,sfunc_n+1}, ..., e, B_{i,sfunc_n+nfunc_n-1}},  n ∈ [1, N_func]

where Fs_{i,n} is the function sequence corresponding to the function F_{i,n}, and e is the control flow inside the function;
Step 1, the indirect jump triple samples are further constructed as follows:

Jdata_{i,k} = (B_{i,m}, e, B_{i,n}),  k ∈ [1, N_data_jmp]

where Jdata_{i,k} represents the kth indirect jump data sample generated from the ith original binary code file, i.e. the sample corresponding to the kth jump table in the ith original binary code file, N_data_jmp represents the number of indirect jump samples in the ith original binary code file, e represents the control flow inside the function code block, B_{i,m} is the basic block code block in which the indirect jump instruction code block of the kth jump table is located, and B_{i,n} is any basic block code block other than B_{i,m} of the function code block in which the kth jump table is located; that is, assuming B_{i,m} ∈ F_{i,l}, then B_{i,n} ∈ F_{i,l} - {B_{i,m}}, m, n ∈ [1, N_bb];
the kth jump table corresponding to Jdata_{i,k} is structured as follows:

JTable_{i,k} = {B_{i,m} : {B_{i,sjt_k}, B_{i,sjt_k+1}, ..., B_{i,sjt_k+njt_k-1}}}

where sjt_k is the index of the basic block code block at which the kth jump table starts, njt_k represents the number of jump entries in the kth jump table, and B_{i,sjt_k+j} represents the (j+1)th jump entry of the kth jump table in the ith original binary code file, j ∈ [0, njt_k-1];
Step 1, further constructing indirect calling triple samples as follows:
Cdata_{i,k} = (Br_{i,k}, E, Fs_{i,n}),  k ∈ [1, N_data_call]

where Cdata_{i,k} represents the kth indirect call data sample generated from the ith original binary code file, i.e. the sample corresponding to the kth indirect call instruction code block, denoted Ins_{i,l}, in the ith original binary code file; N_data_call represents the number of indirect call samples in the ith original binary code file; E represents the control flow between function code blocks; Br_{i,k} is the indirect call branch of the kth indirect call instruction code block Ins_{i,l} in the ith original binary code file, and Br_{i,k} is constructed with a breadth-first search algorithm; Fs_{i,n} is the function sequence corresponding to the nth function F_{i,n} in the ith original binary code file, F_{i,n} being any address-taken function in the binary code;
CTarget(Ins_{i,l}) is defined as the list of function code blocks actually called by Ins_{i,l}, namely:

CTarget(Ins_{i,l}) = {F_{i,ct1}, F_{i,ct2}, ..., F_{i,ctn}}

where F_{i,ct1}, F_{i,ct2}, ..., F_{i,ctn} are the actual target functions of Ins_{i,l}.
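The indirect call branch Br_{i,k} above is built with a breadth-first search over the control flow graph of the function, from the function entry basic block to the basic block containing the indirect call instruction (see also the algorithm of Fig. 5). The following is only a rough sketch of such a search under assumed data structures; the CFG is a plain successor map and all names are hypothetical:

    from collections import deque
    from typing import Dict, List, Optional

    def indirect_call_branch(cfg: Dict[str, List[str]],
                             entry: str,
                             call_block: str) -> Optional[List[str]]:
        """Breadth-first search from the entry basic block to the basic block
        containing the indirect call; returns the block sequence
        [B_entry, ..., B_call], or None if the call block is unreachable."""
        parent = {entry: None}
        queue = deque([entry])
        while queue:
            block = queue.popleft()
            if block == call_block:
                branch = []                     # walk parents back to the entry
                while block is not None:
                    branch.append(block)
                    block = parent[block]
                return branch[::-1]
            for succ in cfg.get(block, []):
                if succ not in parent:
                    parent[succ] = block
                    queue.append(succ)
        return None

    # Example: entry -> b1 -> b2 (indirect call) in a small CFG.
    cfg = {"entry": ["b1", "b3"], "b1": ["b2"], "b2": [], "b3": []}
    print(indirect_call_branch(cfg, "entry", "b2"))   # ['entry', 'b1', 'b2']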
Step 1, marking the indirect jump triple sample and the indirect call triple sample respectively, and generating an indirect jump training set and an indirect call training set as follows:
For an indirect jump triple sample Jdata_{i,k} = (B_{i,m}, e, B_{i,n}):
if B_{i,n} ∈ JTable_{i,k}[B_{i,m}], then Jdata_{i,k} is labeled Jlabel_{i_k,1}; otherwise it is labeled Jlabel_{i_k,0}.
For an indirect call triple sample Cdata_{i,k} = (Br_{i,k}, E, Fs_{i,n}):
if F_{i,n} ∈ CTarget(Ins_{i,l}), then the sample is labeled Clabel_{i_k,1}; otherwise it is labeled Clabel_{i_k,0}.
Step 1, the indirect jump training set is constructed as:

JDATA = {(Jdata_{1,1}, Jlabel_{1_1,k1}), (Jdata_{1,2}, Jlabel_{1_2,k2}), ..., (Jdata_{K,Ndata_jmp_K}, Jlabel_{K_Ndata_jmp_K,kNjmp})}

where JDATA is the indirect jump training set; (Jdata_{1,1}, Jlabel_{1_1,k1}) is the first sample in the data set, Jdata_{1,1} being the 1st sample in the 1st original binary file and Jlabel_{1_1,k1} its label, with k1 taking the value 0 or 1; (Jdata_{i,j}, Jlabel_{i_j,km}) is the mth sample in the data set, Jdata_{i,j} being the jth sample in the ith original binary file and Jlabel_{i_j,km} its corresponding label, m being the index of the sample in the data set, where i ∈ [1, K], j ∈ [1, Ndata_jmp_i], K is the number of original binary code files, Ndata_jmp_i represents the total number of indirect jump samples of the ith binary, and Njmp is the total number of samples in the indirect jump training set.
Step 1, constructing an indirect call training set, namely:
CDATA = {(Cdata_{1,1}, Clabel_{1_1,k1}), (Cdata_{1,2}, Clabel_{1_2,k2}), ..., (Cdata_{K,Ndata_call_K}, Clabel_{K_Ndata_call_K,kNcall})}

where CDATA is the indirect call training set; (Cdata_{1,1}, Clabel_{1_1,k1}) is the first sample in the data set, Cdata_{1,1} being the 1st sample in the 1st original binary file and Clabel_{1_1,k1} its label, with k1 taking the value 0 or 1; (Cdata_{i,j}, Clabel_{i_j,km}) is the mth sample in the data set, Cdata_{i,j} being the jth sample in the ith original binary file and Clabel_{i_j,km} its corresponding label, m being the index of the sample in the data set, where i ∈ [1, K], j ∈ [1, Ndata_call_i], K is the number of original binary code files, Ndata_call_i represents the total number of indirect call samples of the ith binary, and Ncall is the total number of samples in the indirect call training set.
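As a concrete illustration of the triple samples and labels defined above, the sketch below builds labeled indirect jump samples for one jump table. The token encoding (basic blocks as lists of byte tokens joined by the separator "e") and the function name are assumptions for illustration only; the patent does not prescribe this exact representation:

    from typing import List, Tuple

    def make_jump_samples(jump_block: List[str],
                          function_blocks: List[List[str]],
                          jtable_targets: List[List[str]]) -> List[Tuple[List[str], int]]:
        """Build indirect jump triple samples Jdata = (B_m, e, B_n) with labels:
        1 if B_n is an entry of the jump table (Jlabel_{i_k,1}), else 0."""
        samples = []
        for candidate in function_blocks:
            if candidate is jump_block:          # B_n ranges over F_l - {B_m}
                continue
            triple = jump_block + ["e"] + candidate
            label = 1 if candidate in jtable_targets else 0
            samples.append((triple, label))
        return samples

    # Toy example: one jump-table block and three candidate blocks,
    # two of which are real jump-table entries.
    b_jmp = ["ff", "24", "c5"]
    b1, b2, b3 = ["55", "48"], ["c3"], ["90", "90"]
    print(make_jump_samples(b_jmp, [b_jmp, b1, b2, b3], [b1, b2]))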
Preferably, the neural network indirect jump target identification classification model in the step 2 is formed by serially cascading an embedded layer, a deep bidirectional long-short term memory network, an attention layer and a batch normalization layer;
the embedding layer converts each word from a high-dimensional sparse one-hot vector into a low-dimensional dense vector; the low-dimensional dense vector jmp_embedding_vector of each word is a parameter to be optimized.
The deep bidirectional long short-term memory network is formed by serially cascading a first bidirectional long short-term memory layer, a second bidirectional long short-term memory layer and a random deactivation (dropout) layer in sequence;
the ith bidirectional long short-term memory layer selectively discards data through a gating mechanism, then updates the data by combining it with the old state value memorized by the network to obtain the updated value, and outputs the updated value to the next layer;
the weight of the forget gate of the ith bidirectional long short-term memory layer is jmp_weightsf_lstm_i, which is a parameter to be optimized;
the bias of the forget gate of the ith bidirectional long short-term memory layer is jmp_biasf_lstm_i, which is a parameter to be optimized;
the weight of the input gate of the ith bidirectional long short-term memory layer is jmp_weightsi_lstm_i, which is a parameter to be optimized;
the bias of the input gate of the ith bidirectional long short-term memory layer is jmp_biasi_lstm_i, which is a parameter to be optimized;
the weight of the output gate of the ith bidirectional long short-term memory layer is jmp_weightsc_lstm_i, which is a parameter to be optimized;
the bias of the output gate of the ith bidirectional long short-term memory layer is jmp_biasc_lstm_i, which is a parameter to be optimized;
the weight of the cell state of the ith bidirectional long short-term memory layer is jmp_weightso_lstm_i, which is a parameter to be optimized;
the bias of the cell state of the ith bidirectional long short-term memory layer is jmp_biaso_lstm_i, which is a parameter to be optimized;
the random deactivation (dropout) layer discards the output data of the bidirectional long short-term memory layers with a certain probability to avoid overfitting.
The attention layer alleviates the loss of context caused by vanishing gradients on long sample sequences by giving greater weight to important words;
the weight of the attention layer is jmp_weights_attention, which is a parameter to be optimized;
the bias of the attention layer is jmp_bias_attention, which is a parameter to be optimized;
the context vector of the attention layer is jmp_u_attention, which is a parameter to be optimized.
The batch standardization layer comprises a full connection layer, a batch standardization layer and a normalization index layer;
the fully-connected layer outputs a one-dimensional matrix with the size of W x H, W256 and H1, and is used for integrating the output data of the attention layer and mapping the output data to the sample space of the next batch normalization layer;
the weight of the full connection layer is jmp _ weights _ dense, which is a parameter to be optimized;
the bias of the full connection layer is jmp _ bias _ dense, which is a parameter to be optimized.
The batch normalization layer is used for accelerating the optimization training convergence in the step 2;
the translation parameter of the batch normalization layer is jmp _ shift _ bn which is a parameter to be optimized;
the scaling parameter of the batch normalization layer is jmp _ scale _ bn, which is the parameter to be optimized.
The normalization index layer is used for converting continuous output characteristics of batch normalization layer intoDiscrete predictive features; the layer firstly carries out sigmoid operation on output characteristics of a batch standardization layer, then uses a cross entropy loss function which is more suitable for measuring two probability distribution differences as a measurement function, and optimizes a learning result of an upper layer, so that a final result is a label Jlabel predicted for an ith samplei,1*、Jlabeli,2A probability distribution of i ∈ [1, N ]]N represents the number of samples in the deep learning training set, and the problem is classified into two categories, so that the labels are classified into two categories;
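The architecture described above (embedding layer, two stacked bidirectional LSTM layers with dropout, a word-level attention layer with learned weight, bias and context vector, a fully connected layer of width 256, batch normalization and a sigmoid output) can be sketched roughly as follows in Keras. This is only an illustrative sketch, not the patent's implementation; vocabulary size, embedding width, LSTM width, sequence length and dropout rate are assumed hyperparameters:

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    class WordAttention(layers.Layer):
        """Word-level attention with learned weights, bias and context vector
        (the jmp_weights_attention / jmp_bias_attention / jmp_u_attention
        parameters described above)."""
        def build(self, input_shape):
            d = int(input_shape[-1])
            self.w = self.add_weight(name="weights", shape=(d, d), initializer="glorot_uniform")
            self.b = self.add_weight(name="bias", shape=(d,), initializer="zeros")
            self.u = self.add_weight(name="context", shape=(d, 1), initializer="glorot_uniform")

        def call(self, h):                                # h: (batch, time, d)
            v = tf.tanh(tf.tensordot(h, self.w, axes=1) + self.b)
            score = tf.nn.softmax(tf.tensordot(v, self.u, axes=1), axis=1)
            return tf.reduce_sum(score * h, axis=1)       # weighted sum over time

    def build_classifier(vocab_size=260, embed_dim=64, lstm_units=128, seq_len=512):
        inputs = layers.Input(shape=(seq_len,), dtype="int32")
        x = layers.Embedding(vocab_size, embed_dim)(inputs)               # embedding layer
        x = layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True))(x)
        x = layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True))(x)
        x = layers.Dropout(0.5)(x)                                        # random deactivation (dropout) layer
        x = WordAttention()(x)                                            # attention layer
        x = layers.Dense(256)(x)                                          # fully connected layer, W = 256
        x = layers.BatchNormalization()(x)                                # batch normalization layer
        outputs = layers.Dense(1, activation="sigmoid")(x)                # sigmoid output for the two-class labels
        return Model(inputs, outputs)

    model = build_classifier()
    model.summary()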
The neural network indirect jump target identification classification loss function model is a cross-entropy loss function, specifically defined as follows. For the ith sample, the network predicts a probability distribution ŷ^(i) = (ŷ^(i)_1, ŷ^(i)_2) over whether the indirect control flow of the ith sample is correct, where ŷ^(i)_j is the predicted probability value corresponding to the label Jlabel_{i,j}, i ∈ [1, N]. The true label probability distribution y^(i) encodes, for step 2, whether the indirect control flow of the ith sample is correct via the labels Jlabel_{i,1} and Jlabel_{i,2}: if the label of the ith sample is Jlabel_{i,j}, the corresponding probability value y^(i)_j is set to one, and the probability value y^(i)_k of the other label Jlabel_{i,k} (k ≠ j) is zero.
The loss function is defined as:

l(Θ) = -(1/N) Σ_{i=1}^{N} Σ_{j} y^(i)_j · log ŷ^(i)_j

where N is the total number of training samples; the cross-entropy loss function l(Θ) requires computing the per-sample term for all training samples and averaging them. The training objective of the neural network is to make the predicted probability distribution ŷ^(i) as close as possible to the true label probability distribution y^(i), i.e. to minimize the cross-entropy loss function l(Θ); finally, the probability of the predicted classification is obtained by calculation.
The network parameters are optimized with the Adam optimization algorithm, and the network optimization parameter set of step 2 is obtained as follows:
the vector representation of each word is jmp_embedding_vector_best;
for the ith bidirectional long short-term memory layer:
the optimized weight parameters are jmp_weightsf_lstm_best_i, jmp_weightsi_lstm_best_i, jmp_weightsc_lstm_best_i and jmp_weightso_lstm_best_i;
the optimized bias parameters are jmp_biasf_lstm_best_i, jmp_biasi_lstm_best_i, jmp_biasc_lstm_best_i and jmp_biaso_lstm_best_i;
for the attention layer:
the optimized parameters comprise the weight jmp_weights_attention_best, the bias jmp_bias_attention_best and the context vector jmp_u_attention_best;
the optimized weight parameter of the fully connected layer is jmp_weights_dense_best;
the optimized bias parameter of the fully connected layer is jmp_bias_dense_best;
the optimized translation parameter of the batch normalization layer is jmp_shift_bn_best;
the optimized scaling parameter of the batch normalization layer is jmp_scale_bn_best;
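Given those definitions, training the classifier amounts to minimizing the cross-entropy loss l(Θ) over the labeled triple samples with the Adam optimizer. The sketch below continues the illustrative Keras model above; the placeholder data, batch size, epoch count and file name are assumptions:

    import numpy as np
    import tensorflow as tf

    model = build_classifier()   # classifier sketched after the architecture description above

    # Placeholder stand-ins for the encoded JDATA samples and their Jlabel values.
    x_train = np.random.randint(0, 260, size=(1024, 512))
    y_train = np.random.randint(0, 2, size=(1024, 1))

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy",              # cross-entropy loss l(Θ)
                  metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=64, epochs=10, validation_split=0.1)

    # The trained weights (embedding vectors, LSTM gate weights and biases,
    # attention weights, dense and batch-normalization parameters) play the role
    # of the optimized "..._best" parameter set described above.
    model.save_weights("jmp_classifier.weights.h5")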
Step 2, the neural network indirect call target identification classification model is formed by serially cascading an embedding layer, a deep bidirectional long short-term memory network, an attention layer and a batch normalization block;
the embedding layer converts each word from a high-dimensional sparse one-hot vector into a low-dimensional dense vector; the low-dimensional dense vector call_embedding_vector of each word is a parameter to be optimized.
The bidirectional long short-term memory network is formed by serially cascading a first bidirectional long short-term memory layer, a second bidirectional long short-term memory layer and a random deactivation (dropout) layer in sequence;
the ith bidirectional long short-term memory layer selectively discards data through a gating mechanism, then updates the data by combining it with the old state value memorized by the network to obtain the updated value, and outputs the updated value to the next layer;
the weight of the forget gate of the ith bidirectional long short-term memory layer is call_weightsf_lstm_i, which is a parameter to be optimized;
the bias of the forget gate of the ith bidirectional long short-term memory layer is call_biasf_lstm_i, which is a parameter to be optimized;
the weight of the input gate of the ith bidirectional long short-term memory layer is call_weightsi_lstm_i, which is a parameter to be optimized;
the bias of the input gate of the ith bidirectional long short-term memory layer is call_biasi_lstm_i, which is a parameter to be optimized;
the weight of the output gate of the ith bidirectional long short-term memory layer is call_weightsc_lstm_i, which is a parameter to be optimized;
the bias of the output gate of the ith bidirectional long short-term memory layer is call_biasc_lstm_i, which is a parameter to be optimized;
the weight of the cell state of the ith bidirectional long short-term memory layer is call_weightso_lstm_i, which is a parameter to be optimized;
the bias of the cell state of the ith bidirectional long short-term memory layer is call_biaso_lstm_i, which is a parameter to be optimized;
the random deactivation (dropout) layer discards the output data of the bidirectional long short-term memory layers with a certain probability to avoid overfitting.
The attention layer alleviates the loss of context caused by vanishing gradients on long sample sequences by giving greater weight to important words;
the weight of the attention layer is call_weights_attention, which is a parameter to be optimized;
the bias of the attention layer is call_bias_attention, which is a parameter to be optimized;
the context vector of the attention layer is call_u_attention, which is a parameter to be optimized.
The batch normalization block comprises a fully connected layer, a batch normalization layer and a normalized exponential layer;
the fully connected layer outputs a one-dimensional matrix of size W x H, with W = 256 and H = 1, and integrates the output data of the attention layer and maps it to the sample space of the following batch normalization layer;
the weight of the fully connected layer is call_weights_dense, which is a parameter to be optimized;
the bias of the fully connected layer is call_bias_dense, which is a parameter to be optimized.
The batch normalization layer accelerates the convergence of the optimization training of step 2;
the translation parameter of the batch normalization layer is call_shift_bn, which is a parameter to be optimized;
the scaling parameter of the batch normalization layer is call_scale_bn, which is a parameter to be optimized.
The normalized exponential layer converts the continuous output features of the batch normalization layer into discrete prediction features; this layer first performs a sigmoid operation on the output features of the batch normalization layer, and then uses a cross-entropy loss function, which is better suited to measuring the difference between two probability distributions, as the metric function to optimize the learning result of the previous layer, so that the final result is a probability distribution over the labels Clabel_{i,1}* and Clabel_{i,2}* predicted for the ith sample, i ∈ [1, N], where N represents the number of samples in the deep learning training set; since the problem is a binary classification problem, the labels fall into two classes;
The neural network indirect call target identification classification loss function model is a cross-entropy loss function, specifically defined as follows. For the ith sample, the network predicts a probability distribution ŷ^(i) = (ŷ^(i)_1, ŷ^(i)_2) over whether the indirect control flow of the ith sample is correct, where ŷ^(i)_j is the predicted probability value corresponding to the label Clabel_{i,j}, i ∈ [1, N]. The true label probability distribution y^(i) encodes, for step 2, whether the indirect control flow of the ith sample is correct via the labels Clabel_{i,1} and Clabel_{i,2}: if the label of the ith sample is Clabel_{i,j}, the corresponding probability value y^(i)_j is set to one, and the probability value y^(i)_k of the other label Clabel_{i,k} (k ≠ j) is zero.
The loss function is defined as:

l(Θ) = -(1/N) Σ_{i=1}^{N} Σ_{j} y^(i)_j · log ŷ^(i)_j

where N is the total number of training samples; the cross-entropy loss function l(Θ) requires computing the per-sample term for all training samples and averaging them. The training objective of the neural network is to make the predicted probability distribution ŷ^(i) as close as possible to the true label probability distribution y^(i), i.e. to minimize the cross-entropy loss function l(Θ); finally, the probability of the predicted classification is obtained by calculation.
The network parameters are optimized with the Adam optimization algorithm, and the network optimization parameter set of step 2 is obtained as follows:
the vector representation of each word is call_embedding_vector_best;
for the ith bidirectional long short-term memory layer:
the optimized weight parameters are call_weightsf_lstm_best_i, call_weightsi_lstm_best_i, call_weightsc_lstm_best_i and call_weightso_lstm_best_i;
the optimized bias parameters are call_biasf_lstm_best_i, call_biasi_lstm_best_i, call_biasc_lstm_best_i and call_biaso_lstm_best_i;
for the attention layer:
the optimized parameters comprise the weight call_weights_attention_best, the bias call_bias_attention_best and the context vector call_u_attention_best;
the optimized weight parameter of the fully connected layer is call_weights_dense_best;
the optimized bias parameter of the fully connected layer is call_bias_dense_best;
the optimized translation parameter of the batch normalization layer is call_shift_bn_best;
the optimized scaling parameter of the batch normalization layer is call_scale_bn_best;
Preferably, step 3 judges whether an instruction code block in the binary to be detected is an indirect jump instruction code block or an indirect call instruction code block, specifically:
if the instruction code block in the binary to be detected is an indirect jump instruction code block, the function code block in which the indirect jump instruction code block is located, the basic block code blocks and the corresponding jump table are preprocessed through step 1 to obtain indirect jump triple samples of the binary to be detected; the trained neural network indirect jump target identification classification model defined in step 2 then predicts the sample labels of the binary to be detected, and the target basic block code blocks of each indirect jump instruction code block are restored from the sample labels according to the triple definition of step 1: if the kth indirect jump data sample Jdata_{i,k} = (B_{i,m}, e, B_{i,n}) in the ith original binary file is predicted to have the sample label Jlabel_{i_k,1}, then B_{i,n} is a target basic block code block;
if the instruction code block in the binary to be detected is an indirect call instruction code block, the indirect call instruction code block, the original binary file in which it is located and the function code blocks in which it is located are preprocessed through step 1 to obtain the indirect call branch and the function sequences of the binary to be detected, from which the indirect call triple samples are formed as in step 1; the trained neural network indirect call target identification classification model defined in step 2 then predicts the sample labels of the binary to be detected, and the target function code blocks of each indirect call instruction code block are restored from the sample labels according to the triple definition of step 1: if the kth indirect call data sample Cdata_{i,k} = (Br_{i,k}, E, Fs_{i,n}) in the ith original binary file is predicted to have the sample label Clabel_{i_k,1}, then the function F_{i,n} corresponding to Fs_{i,n} is a target function code block.
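A minimal sketch of this detection step: the candidate triples extracted from the binary to be detected are scored by the trained classifier, and the positively classified candidates are taken as the recovered targets. The threshold, the assumed encode_triple helper and all variable names are illustrative only:

    import numpy as np

    def recover_targets(model, encoded_candidates, candidate_ids, threshold=0.5):
        """Score encoded triple samples (one per candidate target) with the
        trained classifier; return the candidates predicted as true targets
        (label Jlabel_{i_k,1} or Clabel_{i_k,1})."""
        probs = model.predict(np.asarray(encoded_candidates), verbose=0).ravel()
        return [cid for cid, p in zip(candidate_ids, probs) if p >= threshold]

    # Usage sketch: encode_triple (assumed) turns a (source, e/E, candidate)
    # triple into a fixed-length token id sequence matching the training encoding.
    # targets = recover_targets(jmp_model,
    #                           [encode_triple(b_jmp, b) for b in candidate_blocks],
    #                           candidate_ids=[b.id for b in candidate_blocks])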
Different from traditional methods that recover indirect control flow targets with techniques such as data-flow analysis, the method starts from the semantic association between the source of an indirect control flow and its target object, treats binary code analogously to natural language, and constructs an automatic, deep-learning-based identification framework for the targets of binary control flow instructions.
In order to fully retain the context association information in a binary, the invention provides a triple sample encoding scheme, an indirect call branch construction algorithm based on breadth-first search, and a function sequence construction method.
The invention provides a method for acquiring basic indirect control flow information based on intermediate code, and constructs a mapping mechanism from the intermediate code to the binary.
The invention learns the semantic association features between the source and the target of binary indirect control flow through a Bi-LSTM + Attention neural network model, and can thereby identify the target basic blocks or functions of jump tables and function pointers in a binary. The model has been verified to provide high identification accuracy.
Different from existing static analysis methods, the method does not need to disassemble the binary, and can recover the target objects of indirect control flow instructions through a pre-trained classifier after only a small amount of data preprocessing.
Drawings
FIG. 1 is a layout of the system design framework of the present invention.
FIG. 2 is a flow diagram of a binary indirect control flow extraction module of an embodiment of the present invention.
FIG. 3 is a schematic structural diagram of a Bi-LSTM + attention neural network constructed according to an embodiment of the present invention.
FIG. 4 is a flow diagram of an indirect control flow instruction target classification detection module of an embodiment of the invention.
Fig. 5 is a flowchart of an indirect call branch generation algorithm according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the drawings and an embodiment.
The following describes the embodiments of the present invention with reference to fig. 1 to 5:
step 1: introducing an original binary code file, wherein a plurality of bytes in the original binary code file form a plurality of instruction code blocks, the plurality of instruction code blocks form a plurality of basic block code blocks, the plurality of basic block code blocks form a plurality of function code blocks, an indirect call branch and a function sequence are constructed according to the basic block code blocks and the function code blocks, an indirect jump triple sample and an indirect call triple sample are further constructed, the indirect jump triple sample and the indirect call triple sample are respectively marked, and an indirect jump training set and an indirect call training set are generated;
step 1, the original binary code file is:
text_i = {c_{i,1}, c_{i,2}, ..., c_{i,L}},  i ∈ [1, K]

where text_i represents the ith original binary code file, K represents the number of original binary code files, L represents the number of bytes in the ith original binary code file, and c_{i,j} represents the jth byte in the ith original binary code file, j ∈ [1, L];
Step 1, a plurality of bytes in the original binary code file form a plurality of instruction code blocks, specifically expressed as:

Ins_{i,k} = {c_{i,sins_k}, c_{i,sins_k+1}, ..., c_{i,sins_k+nins_k-1}},  k ∈ [1, N_ins]

where Ins_{i,k} represents the kth instruction code block in the ith original binary code file, N_ins represents the number of instruction code blocks in the ith original binary code file, sins_k is the index of the byte at which the kth instruction code block starts, nins_k represents the number of bytes in the kth instruction code block, and c_{i,sins_k+j} represents the (j+1)th byte of the kth instruction code block in the ith original binary code file, j ∈ [0, nins_k-1];
Step 1, the plurality of instruction code blocks form a plurality of basic block code blocks, specifically expressed as:

B_{i,m} = {Ins_{i,sbb_m}, Ins_{i,sbb_m+1}, ..., Ins_{i,sbb_m+nbb_m-1}},  m ∈ [1, N_bb]

where B_{i,m} represents the mth basic block code block in the ith original binary code file, N_bb represents the number of basic block code blocks in the ith original binary code file, sbb_m is the index of the instruction code block at which the mth basic block code block starts, nbb_m represents the number of instruction code blocks in the mth basic block code block, and Ins_{i,sbb_m+j} represents the (j+1)th instruction code block of the mth basic block code block in the ith original binary code file, j ∈ [0, nbb_m-1];
Step 1, the plurality of basic block code blocks form a plurality of function code blocks, specifically expressed as:

F_{i,n} = {B_{i,sfunc_n}, B_{i,sfunc_n+1}, ..., B_{i,sfunc_n+nfunc_n-1}},  n ∈ [1, N_func]

where F_{i,n} represents the nth function code block in the ith original binary code file, N_func represents the number of function code blocks in the ith original binary code file, sfunc_n is the index of the basic block code block at which the nth function code block starts, nfunc_n represents the number of basic block code blocks in the nth function code block, and B_{i,sfunc_n+j} represents the (j+1)th basic block code block of the nth function code block in the ith original binary code file, j ∈ [0, nfunc_n-1];
The step 1 of constructing the indirect call branch and the function sequence according to the basic block code block and the function code block is as follows:
the indirect call branch:

Br_{i,m} = {B_{i,entry_m}, e, B_{i,entry_m+1}, ..., e, B_{i,call_m}},  m ∈ [1, N_call]

where Br_{i,m} is the indirect call branch sequence of the mth indirect call instruction code block in the ith original binary code file, N_call represents the number of indirect call instruction code blocks in the ith original binary code file, entry_m is the index of the entry basic block of the mth indirect call branch sequence, B_{i,entry_m+1} is the basic block code block that follows B_{i,entry_m} on the branch, and call_m is the index of the basic block code block in which the mth indirect call instruction code block is located;
the function sequence:

Fs_{i,n} = {B_{i,sfunc_n}, e, B_{i,sfunc_n+1}, ..., e, B_{i,sfunc_n+nfunc_n-1}},  n ∈ [1, N_func]

where Fs_{i,n} is the function sequence corresponding to the function F_{i,n}, and e is the control flow inside the function;
Step 1, the indirect jump triple samples are further constructed as follows:

Jdata_{i,k} = (B_{i,m}, e, B_{i,n}),  k ∈ [1, N_data_jmp]

where Jdata_{i,k} represents the kth indirect jump data sample generated from the ith original binary code file, i.e. the sample corresponding to the kth jump table in the ith original binary code file, N_data_jmp represents the number of indirect jump samples in the ith original binary code file, e represents the control flow inside the function code block, B_{i,m} is the basic block code block in which the indirect jump instruction code block of the kth jump table is located, and B_{i,n} is any basic block code block other than B_{i,m} of the function code block in which the kth jump table is located; that is, assuming B_{i,m} ∈ F_{i,l}, then B_{i,n} ∈ F_{i,l} - {B_{i,m}}, m, n ∈ [1, N_bb];
the kth jump table corresponding to Jdata_{i,k} is structured as follows:

JTable_{i,k} = {B_{i,m} : {B_{i,sjt_k}, B_{i,sjt_k+1}, ..., B_{i,sjt_k+njt_k-1}}}

where sjt_k is the index of the basic block code block at which the kth jump table starts, njt_k represents the number of jump entries in the kth jump table, and B_{i,sjt_k+j} represents the (j+1)th jump entry of the kth jump table in the ith original binary code file, j ∈ [0, njt_k-1];
Step 1, further constructing indirect calling triple samples as follows:
Cdata_{i,k} = (Br_{i,k}, E, Fs_{i,n}),  k ∈ [1, N_data_call]

where Cdata_{i,k} represents the kth indirect call data sample generated from the ith original binary code file, i.e. the sample corresponding to the kth indirect call instruction code block, denoted Ins_{i,l}, in the ith original binary code file; N_data_call represents the number of indirect call samples in the ith original binary code file; E represents the control flow between function code blocks; Br_{i,k} is the indirect call branch of the kth indirect call instruction code block Ins_{i,l} in the ith original binary code file, and Br_{i,k} is constructed with a breadth-first search algorithm; Fs_{i,n} is the function sequence corresponding to the nth function F_{i,n} in the ith original binary code file, F_{i,n} being any address-taken function in the binary code;
CTarget(Ins_{i,l}) is defined as the list of function code blocks actually called by Ins_{i,l}, namely:

CTarget(Ins_{i,l}) = {F_{i,ct1}, F_{i,ct2}, ..., F_{i,ctn}}

where F_{i,ct1}, F_{i,ct2}, ..., F_{i,ctn} are the actual target functions of Ins_{i,l}.
Step 1, marking the indirect jump triple sample and the indirect call triple sample respectively, and generating an indirect jump training set and an indirect call training set as follows:
For an indirect jump triple sample Jdata_{i,k} = (B_{i,m}, e, B_{i,n}):
if B_{i,n} ∈ JTable_{i,k}[B_{i,m}], then Jdata_{i,k} is labeled Jlabel_{i_k,1}; otherwise it is labeled Jlabel_{i_k,0}.
For an indirect call triple sample Cdata_{i,k} = (Br_{i,k}, E, Fs_{i,n}):
if F_{i,n} ∈ CTarget(Ins_{i,l}), then the sample is labeled Clabel_{i_k,1}; otherwise it is labeled Clabel_{i_k,0}.
Step 1, the indirect jump training set is constructed as:

JDATA = {(Jdata_{1,1}, Jlabel_{1_1,k1}), (Jdata_{1,2}, Jlabel_{1_2,k2}), ..., (Jdata_{K,Ndata_jmp_K}, Jlabel_{K_Ndata_jmp_K,kNjmp})}

where JDATA is the indirect jump training set; (Jdata_{1,1}, Jlabel_{1_1,k1}) is the first sample in the data set, Jdata_{1,1} being the 1st sample in the 1st original binary file and Jlabel_{1_1,k1} its label, with k1 taking the value 0 or 1; (Jdata_{i,j}, Jlabel_{i_j,km}) is the mth sample in the data set, Jdata_{i,j} being the jth sample in the ith original binary file and Jlabel_{i_j,km} its corresponding label, m being the index of the sample in the data set, where i ∈ [1, K], j ∈ [1, Ndata_jmp_i], K is the number of original binary code files, Ndata_jmp_i represents the total number of indirect jump samples of the ith binary, and Njmp is the total number of samples in the indirect jump training set.
Step 1, constructing an indirect call training set, namely:
CDATA = {(Cdata_{1,1}, Clabel_{1_1,k1}), (Cdata_{1,2}, Clabel_{1_2,k2}), ..., (Cdata_{K,Ndata_call_K}, Clabel_{K_Ndata_call_K,kNcall})}

where CDATA is the indirect call training set; (Cdata_{1,1}, Clabel_{1_1,k1}) is the first sample in the data set, Cdata_{1,1} being the 1st sample in the 1st original binary file and Clabel_{1_1,k1} its label, with k1 taking the value 0 or 1; (Cdata_{i,j}, Clabel_{i_j,km}) is the mth sample in the data set, Cdata_{i,j} being the jth sample in the ith original binary file and Clabel_{i_j,km} its corresponding label, m being the index of the sample in the data set, where i ∈ [1, K], j ∈ [1, Ndata_call_i], K is the number of original binary code files, Ndata_call_i represents the total number of indirect call samples of the ith binary, and Ncall is the total number of samples in the indirect call training set.
Step 2: establishing a neural network indirect jump target identification classification model, sequentially inputting each indirect jump triple sample in an indirect jump training set into the neural network indirect jump target identification classification model, further classifying to obtain a corresponding triple sample prediction result, further combining an indirect jump sample label and a prediction label of the classification model to establish a neural network indirect jump target identification classification loss function model, obtaining an optimization parameter set of a network through optimization training, and establishing the trained neural network indirect jump target identification classification model according to the network optimization parameter set; establishing a neural network indirect call target identification classification model, sequentially inputting each indirectly called triple sample in an indirect call training set into the neural network indirect call target identification classification model, further classifying to obtain a corresponding triple sample prediction result, further establishing a neural network indirect call target identification classification loss function model by combining an indirect call sample label and a label predicted by the classification model, obtaining an optimization parameter set of the network through optimization training, and establishing the trained neural network indirect call target identification classification model according to the network optimization parameter set;
Step 2, the neural network indirect jump target identification classification model is formed by serially cascading an embedding layer, a deep bidirectional long short-term memory network, an attention layer and a batch normalization block;
the embedding layer converts each word from a high-dimensional sparse one-hot vector into a low-dimensional dense vector; the low-dimensional dense vector jmp_embedding_vector of each word is a parameter to be optimized.
The deep bidirectional long short-term memory network is formed by serially cascading a first bidirectional long short-term memory layer, a second bidirectional long short-term memory layer and a random deactivation (dropout) layer in sequence;
the ith bidirectional long short-term memory layer selectively discards data through a gating mechanism, then updates the data by combining it with the old state value memorized by the network to obtain the updated value, and outputs the updated value to the next layer;
the weight of the forget gate of the ith bidirectional long short-term memory layer is jmp_weightsf_lstm_i, which is a parameter to be optimized;
the bias of the forget gate of the ith bidirectional long short-term memory layer is jmp_biasf_lstm_i, which is a parameter to be optimized;
the weight of the input gate of the ith bidirectional long short-term memory layer is jmp_weightsi_lstm_i, which is a parameter to be optimized;
the bias of the input gate of the ith bidirectional long short-term memory layer is jmp_biasi_lstm_i, which is a parameter to be optimized;
the weight of the output gate of the ith bidirectional long short-term memory layer is jmp_weightsc_lstm_i, which is a parameter to be optimized;
the bias of the output gate of the ith bidirectional long short-term memory layer is jmp_biasc_lstm_i, which is a parameter to be optimized;
the weight of the cell state of the ith bidirectional long short-term memory layer is jmp_weightso_lstm_i, which is a parameter to be optimized;
the bias of the cell state of the ith bidirectional long short-term memory layer is jmp_biaso_lstm_i, which is a parameter to be optimized;
the random deactivation (dropout) layer discards the output data of the bidirectional long short-term memory layers with a certain probability to avoid overfitting.
The attention layer alleviates the loss of context caused by vanishing gradients on long sample sequences by giving greater weight to important words;
the weight of the attention layer is jmp_weights_attention, which is a parameter to be optimized;
the bias of the attention layer is jmp_bias_attention, which is a parameter to be optimized;
the context vector of the attention layer is jmp_u_attention, which is a parameter to be optimized.
The batch standardization layer comprises a full connection layer, a batch standardization layer and a normalization index layer;
the fully-connected layer outputs a one-dimensional matrix with the size of W x H, W256 and H1, and is used for integrating the output data of the attention layer and mapping the output data to the sample space of the next batch normalization layer;
the weight of the full connection layer is jmp _ weights _ dense, which is a parameter to be optimized;
the bias of the full connection layer is jmp _ bias _ dense, which is a parameter to be optimized.
The batch normalization layer is used for accelerating the optimization training convergence in the step 2;
the translation parameter of the batch normalization layer is jmp _ shift _ bn which is a parameter to be optimized;
the scaling parameter of the batch normalization layer is jmp _ scale _ bn, which is the parameter to be optimized.
The normalization index layer is used for converting continuous output characteristics of the batch normalization layer into discrete prediction characteristics; the method comprises the steps of firstly carrying out sigmoid operation on output characteristics of a batch standardization layer, then using a cross entropy loss function which is more suitable for measuring two probability distribution differences as a measurement function, and optimizing a learning result of an upper layer, so that a final result is a label Jlabel predicted according to an ith samplei,1*、Jlabeli,2A probability distribution of i ∈ [1, N ]]N represents the number of samples in the deep learning training set, and the problem is classified into two categories, so that the labels are classified into two categories;
the neural network indirect jump target identification classification loss function model is a cross entropy loss function, which is specifically defined as follows: the cross entropy of the ith sample is

l^{(i)}(\Theta) = -\sum_{j} y^{(i)}_{j} \log \hat{y}^{(i)}_{j}

wherein N is the total number of training samples; \hat{y}^{(i)} is the predicted probability distribution over the labels indicating whether the indirect control flow of the ith sample is correct, and \hat{y}^{(i)}_{j} is the probability value predicted for the jth label; the true label probability distribution is y^{(i)}: for the labels Jlabeli,1, Jlabeli,2 used in step 2 to indicate whether the indirect control flow of the ith sample is correct, if the label of the ith sample is Jlabeli,j, the corresponding probability value y^{(i)}_{j} is set to one, and the probability value y^{(i)}_{k} of the other label Jlabeli,k (k ≠ j) is zero;

the loss function is defined as:

l(\Theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j} y^{(i)}_{j} \log \hat{y}^{(i)}_{j}

that is, the cross entropy loss function l(\Theta) computes the value l^{(i)}(\Theta) for all training samples and averages them; the training target of the neural network is set so that the predicted probability distribution \hat{y}^{(i)} is as close as possible to the real label probability distribution y^{(i)}, i.e. so as to minimize the cross entropy loss function l(\Theta); finally, the probability of the predicted classification is calculated;
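A minimal numerical sketch of this objective, assuming the two-class label distributions described above, might read as follows (illustrative only, not the patented code):

import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # y_true: (N, 2) one-hot true label distributions y^(i)
    # y_pred: (N, 2) predicted probability distributions y_hat^(i)
    y_pred = np.clip(y_pred, eps, 1.0)
    per_sample = -(y_true * np.log(y_pred)).sum(axis=1)   # l^(i)(Theta) for each sample
    return per_sample.mean()                              # average over the N samples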
optimizing the network parameters by using the Adam optimization algorithm, the network optimization parameter set of step 2 is obtained as follows:
the vector representation of each word is jmp_embedding_vector_best;
for the ith bidirectional long-short term memory layer:
the optimized weight parameters are jmp_weightsf_lstm_best_i, jmp_weightsi_lstm_best_i, jmp_weightsc_lstm_best_i and jmp_weightso_lstm_best_i respectively;
the optimized bias parameters are jmp_biasf_lstm_best_i, jmp_biasi_lstm_best_i, jmp_biasc_lstm_best_i and jmp_biaso_lstm_best_i respectively;
for the attention layer:
the optimized parameters comprise the weight jmp_weights_attention_best, the bias jmp_bias_attention_best and the context vector jmp_u_attention_best;
the optimized weight parameter of the fully-connected layer is jmp_weights_dense_best;
the optimized bias parameter of the fully-connected layer is jmp_bias_dense_best;
the optimized translation parameter of the batch normalization layer is jmp_shift_bn_best;
the optimized scaling parameter of the batch normalization layer is jmp_scale_bn_best;
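Purely as a hedged sketch of the cascade described above (embedding, two bidirectional LSTM layers, dropout, attention, fully-connected layer, batch normalization, sigmoid output, trained with Adam and cross entropy), a Keras-style construction could look as follows. All hyperparameter values (embedding size, LSTM units, sequence length, dropout rate) are assumptions, integer byte indices replace the one-hot input described in the text, and the layer names only echo the parameter names above:

import tensorflow as tf
from tensorflow.keras import layers, models

class AttentionPool(layers.Layer):
    # Context-vector attention pooling with a weight matrix, bias and context vector.
    def build(self, input_shape):
        d = int(input_shape[-1])
        self.W = self.add_weight(name="weights_attention", shape=(d, d))
        self.b = self.add_weight(name="bias_attention", shape=(d,))
        self.u = self.add_weight(name="u_attention", shape=(d,))
    def call(self, h):
        v = tf.tanh(tf.tensordot(h, self.W, axes=1) + self.b)
        alpha = tf.nn.softmax(tf.tensordot(v, self.u, axes=1), axis=1)
        return tf.reduce_sum(h * tf.expand_dims(alpha, -1), axis=1)

def build_jmp_model(vocab_size=258, max_len=512, embed_dim=64, lstm_units=128):
    # Illustrative only: embedding -> 2x BiLSTM -> dropout -> attention ->
    # fully-connected -> batch normalization -> sigmoid, trained with Adam.
    inp = layers.Input(shape=(max_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inp)              # role of jmp_embedding_vector
    x = layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True))(x)
    x = layers.Dropout(0.5)(x)                                    # random deactivation layer
    x = AttentionPool()(x)
    x = layers.Dense(256)(x)                                      # fully-connected layer, W = 256
    x = layers.BatchNormalization()(x)                            # shift/scale parameters
    out = layers.Dense(1, activation="sigmoid")(x)                # normalization index layer
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model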
The neural network indirect call target identification classification model of step 2 is formed by serially cascading an embedding layer, a deep bidirectional long-short term memory network, an attention layer and a batch normalization layer;
the embedding layer converts words from high-dimensional sparse one-hot vectors into low-dimensional dense vectors; the low-dimensional dense vector call_embedding_vector of each word is a parameter to be optimized.
The deep bidirectional long-short term memory network is formed by sequentially cascading a first bidirectional long-short term memory layer, a second bidirectional long-short term memory layer and a random deactivation layer in series;
the ith bidirectional long-short term memory layer is used for selectively discarding data through a gating mechanism, and then updating the data by combining the old state value memorized by the network to obtain a determined updated value, which is output to the next layer (the standard gate equations are sketched after this list);
the weight of the forget gate of the ith bidirectional long-short term memory layer is call_weightsf_lstm_i, which is a parameter to be optimized;
the bias of the forget gate of the ith bidirectional long-short term memory layer is call_biasf_lstm_i, which is a parameter to be optimized;
the weight of the input gate of the ith bidirectional long-short term memory layer is call_weightsi_lstm_i, which is a parameter to be optimized;
the bias of the input gate of the ith bidirectional long-short term memory layer is call_biasi_lstm_i, which is a parameter to be optimized;
the weight of the output gate of the ith bidirectional long-short term memory layer is call_weightsc_lstm_i, which is a parameter to be optimized;
the bias of the output gate of the ith bidirectional long-short term memory layer is call_biasc_lstm_i, which is a parameter to be optimized;
the weight of the computing unit state of the ith bidirectional long-short term memory layer is call_weightso_lstm_i, which is a parameter to be optimized;
the bias of the computing unit state of the ith bidirectional long-short term memory layer is call_biaso_lstm_i, which is a parameter to be optimized.
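For reference only, a textbook LSTM cell update of the kind these gate weights and biases parameterize can be written as below; the mapping of the standard symbols onto the parameter names above (call_weightsf_lstm_i, call_weightsi_lstm_i, call_weightsc_lstm_i, call_weightso_lstm_i and the corresponding biases) is an assumption of this sketch, not a statement of the patented formulation:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)        (forget gate)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)        (input gate)
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) (candidate cell state)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t (cell state update)
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)        (output gate)
h_t = o_t \odot \tanh(c_t)                     (hidden state output to the next layer)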
The random deactivation (dropout) layer is used for discarding the output data of the bidirectional long-short term memory layer with a certain probability so as to avoid overfitting.
The attention layer is used for alleviating the context loss caused by gradient vanishing on long sample sequences by giving greater weight to the important words;
the weight of the attention layer is call_weights_attention, which is a parameter to be optimized;
the bias of the attention layer is call_bias_attention, which is a parameter to be optimized;
the context vector of the attention layer is call_u_attention, which is a parameter to be optimized.
The batch standardization layer comprises a fully-connected layer, a batch normalization layer and a normalization index layer;
the fully-connected layer outputs a one-dimensional matrix of size W × H, with W = 256 and H = 1, and is used for integrating the output data of the attention layer and mapping it into the sample space of the following batch normalization layer;
the weight of the fully-connected layer is call_weights_dense, which is a parameter to be optimized;
the bias of the fully-connected layer is call_bias_dense, which is a parameter to be optimized.
The batch normalization layer is used for accelerating the convergence of the optimization training in step 2;
the translation (shift) parameter of the batch normalization layer is call_shift_bn, which is a parameter to be optimized;
the scaling parameter of the batch normalization layer is call_scale_bn, which is a parameter to be optimized.
The normalization index layer is used for converting the continuous output features of the batch normalization layer into discrete prediction features; it first applies a sigmoid operation to the output features of the batch normalization layer, then uses a cross entropy loss function, which is better suited to measuring the difference between two probability distributions, as the measurement function to optimize the learning result of the upper layers, so that the final result is a probability distribution over the labels Clabeli,1* and Clabeli,2* predicted for the ith sample, i ∈ [1, N], where N represents the number of samples in the deep learning training set; since the problem is a binary classification, the labels fall into two categories;
the neural network indirect call target identification classification loss function model is a cross entropy loss function, which is specifically defined as follows: the cross entropy of the ith sample is

l^{(i)}(\Theta) = -\sum_{j} y^{(i)}_{j} \log \hat{y}^{(i)}_{j}

wherein N is the total number of training samples; \hat{y}^{(i)} is the predicted probability distribution over the labels indicating whether the indirect control flow of the ith sample is correct, and \hat{y}^{(i)}_{j} is the probability value predicted for the jth label; the true label probability distribution is y^{(i)}: for the labels Clabeli,1, Clabeli,2 used in step 2 to indicate whether the indirect control flow of the ith sample is correct, if the label of the ith sample is Clabeli,j, the corresponding probability value y^{(i)}_{j} is set to one, and the probability value y^{(i)}_{k} of the other label Clabeli,k (k ≠ j) is zero;

the loss function is defined as:

l(\Theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j} y^{(i)}_{j} \log \hat{y}^{(i)}_{j}

that is, the cross entropy loss function l(\Theta) computes the value l^{(i)}(\Theta) for all training samples and averages them; the training target of the neural network is set so that the predicted probability distribution \hat{y}^{(i)} is as close as possible to the real label probability distribution y^{(i)}, i.e. so as to minimize the cross entropy loss function l(\Theta); finally, the probability of the predicted classification is calculated;
optimizing the network parameters by using the Adam optimization algorithm, the network optimization parameter set of step 2 is obtained as follows:
the vector representation of each word is call_embedding_vector_best;
for the ith bidirectional long-short term memory layer:
the optimized weight parameters are call_weightsf_lstm_best_i, call_weightsi_lstm_best_i, call_weightsc_lstm_best_i and call_weightso_lstm_best_i respectively;
the optimized bias parameters are call_biasf_lstm_best_i, call_biasi_lstm_best_i, call_biasc_lstm_best_i and call_biaso_lstm_best_i respectively;
for the attention layer:
the optimized parameters comprise the weight call_weights_attention_best, the bias call_bias_attention_best and the context vector call_u_attention_best;
the optimized weight parameter of the fully-connected layer is call_weights_dense_best;
the optimized bias parameter of the fully-connected layer is call_bias_dense_best;
the optimized translation parameter of the batch normalization layer is call_shift_bn_best;
the optimized scaling parameter of the batch normalization layer is call_scale_bn_best;
To sum up, the data set to be tested is input in the form of a three-dimensional matrix, in which the first dimension is the length of the longest sample sequence, the second dimension is the total number of sample sequences to be tested, and the third dimension is the dictionary dimension, which is 258 in the present invention. Each sample passes through the embedding layer and the two bidirectional long-short term memory layers in sequence. The result is passed through the random deactivation (dropout) layer and then into the attention layer. The output of the attention layer is processed by the fully-connected layer and the batch normalization layer. The final output is passed through the sigmoid to obtain the probability that the sample is a normal indirect control flow.
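As a hedged illustration of this input form (the token-to-index mapping, the zero padding convention and the helper name are assumptions, not part of the original disclosure), the three-dimensional matrix could be materialized as:

import numpy as np

def encode_samples(samples, dict_dim=258):
    # samples: list of token-index sequences, one per triple sample to be tested.
    # Returns a matrix of shape (longest sample length, number of samples, dict_dim),
    # matching the three dimensions described above; unused positions stay zero.
    max_len = max(len(s) for s in samples)
    x = np.zeros((max_len, len(samples), dict_dim), dtype=np.float32)
    for j, seq in enumerate(samples):
        for t, tok in enumerate(seq):
            x[t, j, tok] = 1.0            # one-hot over the 258-entry dictionary
    return x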
And step 3: extracting the instruction code blocks in the binary system to be detected, the basic block code blocks in the binary system to be detected and the function code blocks in the binary system to be detected from the binary system to be detected through step 1, and judging whether an instruction code block in the binary system to be detected is an indirect jump instruction code block or an indirect call instruction code block;
if an instruction code block in the binary system to be detected belongs to the indirect jump instruction code blocks, the function code block, basic block code block and corresponding jump table where the indirect jump instruction code block is located are preprocessed through step 1 to obtain the indirect jump triple samples of the binary system to be detected; the sample labels of the binary system to be detected are then predicted through the trained neural network indirect jump target identification classification model defined in step 2, and the target basic block code block of each indirect jump instruction code block is restored from the sample labels based on the triple definition of step 1, i.e. for the kth indirect jump data sample Jdatai,k=(Bi,m,e,Bi,n) in the ith original binary file, if the predicted sample label is Jlabeli_k,1, then Bi,n is a target basic block code block;
if the instruction code block in the binary system to be detected belongs to the indirect call instruction code blocks, the original binary file where the indirect call instruction code block is located and the function code block where it is located are preprocessed through step 1 to obtain the indirect call branches of the binary system to be detected and the function sequences of the binary system to be detected, so as to further form the indirect call triple samples of step 1; the sample labels of the binary system to be detected are then predicted through the trained neural network indirect call target identification classification model defined in step 2, and the target function code block of each indirect call instruction code block is restored from the sample labels based on the triple definition of step 1, i.e. for the kth indirect call data sample Cdatai,k=(Bri,k,E,Fsi,n) in the ith original binary file, if the corresponding predicted sample label is Clabeli_k,1, then the function code block Fi,n corresponding to Fsi,n is a target function code block. A sketch of this recovery logic follows below.
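The recovery step can be sketched as follows; the helper names, data shapes and the 0.5 decision threshold are hypothetical, and the real preprocessing and classification models are those defined in steps 1 and 2:

def recover_jump_targets(jmp_triples, probs, threshold=0.5):
    # jmp_triples: list of (B_m, e, B_n) triples built in step 1 for one indirect jump.
    # probs: per-triple probability from the indirect jump classification model.
    # Returns the basic block code blocks B_n predicted as jump targets.
    return [b_n for (b_m, e, b_n), p in zip(jmp_triples, probs) if p >= threshold]

def recover_call_targets(call_triples, probs, threshold=0.5):
    # call_triples: list of (Br_k, E, Fs_n) triples for one indirect call site.
    # Returns the function sequences Fs_n (hence functions F_n) predicted as call targets.
    return [fs_n for (br_k, e_flow, fs_n), p in zip(call_triples, probs) if p >= threshold]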
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (4)

1. An unsigned binary indirect control flow identification method based on deep learning is characterized by comprising the following steps:
step 1: introducing an original binary code file, wherein a plurality of bytes in the original binary code file form a plurality of instruction code blocks, the plurality of instruction code blocks form a plurality of basic block code blocks, the plurality of basic block code blocks form a plurality of function code blocks, an indirect call branch and a function sequence are constructed according to the basic block code blocks and the function code blocks, an indirect jump triple sample and an indirect call triple sample are further constructed, the indirect jump triple sample and the indirect call triple sample are respectively marked, and an indirect jump training set and an indirect call training set are generated;
step 2: establishing a neural network indirect jump target identification classification model, sequentially inputting each indirect jump triple sample in an indirect jump training set into the neural network indirect jump target identification classification model, further classifying to obtain a corresponding triple sample prediction result, further combining an indirect jump sample label and a prediction label of the classification model to establish a neural network indirect jump target identification classification loss function model, obtaining an optimization parameter set of a network through optimization training, and establishing the trained neural network indirect jump target identification classification model according to the network optimization parameter set; establishing a neural network indirect call target identification classification model, sequentially inputting each indirectly called triple sample in an indirect call training set into the neural network indirect call target identification classification model, further classifying to obtain a corresponding triple sample prediction result, further establishing a neural network indirect call target identification classification loss function model by combining an indirect call sample label and a label predicted by the classification model, obtaining an optimization parameter set of the network through optimization training, and establishing the trained neural network indirect call target identification classification model according to the network optimization parameter set;
and step 3: and (2) extracting the instruction code block in the binary system to be detected, the basic block code block in the binary system to be detected and the function code block in the binary system to be detected from the binary system to be detected through the step 1, and judging whether the instruction code block in the binary system to be detected is an indirect jump instruction code block or an indirect call instruction code block.
2. The deep learning based unsigned binary indirect control flow identification method of claim 1,
step 1, the original binary code file is:
texti={ci,1,ci,2,...,ci,L}
i∈[1,K]
wherein texti represents the ith original binary code file, K represents the number of original binary code files, L represents the number of bytes in the ith original binary code file, and ci,j represents the jth byte in the ith original binary code file, j ∈ [1, L];
Step 1, a plurality of bytes in the original binary code file form a plurality of instruction code blocks, which is specifically expressed as:
Insi,k={ci,sins_k,ci,sins_k+1,...,ci,sins_k+nins_k-1}
k∈[1,Nins]
wherein Insi,k represents the kth instruction code block in the ith original binary code file, Nins denotes the number of instruction code blocks in the ith original binary code file, sins_k is the index of the starting byte of the kth instruction code block, nins_k denotes the number of bytes in the kth instruction code block, and ci,sins_k+j represents the (sins_k + j + 1)th byte in the kth instruction code block in the ith original binary code file, where j ∈ [0, nins_k - 1];
Step 1, the plurality of instruction code blocks form a plurality of basic block code blocks, which are specifically expressed as:
Bi,m={Insi,sbb_m,Insi,sbb_m+1,...,Insi,sbb_m+nbb_m-1}
m∈[1,Nbb]
wherein Bi,m represents the mth basic block code block in the ith original binary code file, Nbb denotes the number of basic block code blocks in the ith original binary code file, sbb_m is the subscript of the starting instruction code block of the mth basic block code block, nbb_m denotes the number of instruction code blocks in the mth basic block code block, and Insi,sbb_m+j represents the (sbb_m + j + 1)th instruction code block in the mth basic block code block in the ith original binary code file, where j ∈ [0, nbb_m - 1];
Step 1, the specific expression that the plurality of basic block code blocks form a plurality of function code blocks is as follows:
Fi,n={Bi,sfunc_n,Bi,sfunc_n+1,...,Bi,sfunc_n+nfunc_n-1}
n∈[1,Nfunc]
wherein Fi,n represents the nth function code block in the ith original binary code file, Nfunc indicates the number of function code blocks in the ith original binary code file, sfunc_n is the subscript of the starting basic block code block of the nth function code block, nfunc_n represents the number of basic block code blocks in the nth function code block, and Bi,sfunc_n+j represents the (sfunc_n + j + 1)th basic block code block in the nth function code block in the ith original binary code file, where j ∈ [0, nfunc_n - 1];
The step 1 of constructing the indirect call branch and the function sequence according to the basic block code block and the function code block is as follows:
the indirect call branch:
Bri,m={Bi,entry_m,e,Bi,entry_m+1,...,e,Bi,call_m}
m∈[1,Ncall]
wherein Bri,m is the indirect call branch sequence of the mth indirect call instruction code block in the ith original binary code file, Ncall indicates the number of indirect call instruction code blocks in the ith original binary code file, entry_m is the subscript of the entry basic block of the mth indirect call branch sequence, entry_m+1 is the subscript of the basic block code block following Bi,entry_m, and call_m is the subscript of the basic block code block where the mth indirect call instruction code block is located;
the function sequence is as follows:
Fsi,n={Bi,sfunc_n,e,Bi,sfunc_n+1,...,e,Bi,sfunc_n+nfunc_n-1}
n∈[1,Nfunc]
wherein Fsi,n is the function sequence corresponding to the function Fi,n, and e is the control flow inside the function;
step 1, further constructing indirect jump triple samples as follows:
Jdatai,k=(Bi,m,e,Bi,n)
k∈[1,Ndata_jmp]
wherein Jdatai,k represents the kth indirect jump data sample generated from the ith original binary code file, i.e. the sample corresponding to the kth jump table in the ith original binary code file, Ndata_jmp represents the number of indirect jump samples in the ith original binary code file, e represents the control flow inside a function code block, Bi,m is the basic block code block in which the indirect jump instruction code block of the kth jump table is located, and Bi,n is any basic block code block, other than Bi,m, of the function code block in which the kth jump table is located, i.e. assuming Bi,m∈Fi,l, then Bi,n∈Fi,l-{Bi,m}, m,n∈[1,Nbb];
the kth jump table corresponding to the above Jdatai,k is constructed as follows:
JTablei,k={Bi,m:{Bi,sjt_k,Bi,sjt_k+1,...,Bi,sjt_k+njt_k-1}}
wherein sjt_k is the subscript of the starting basic block code block of the kth jump table, njt_k indicates the number of jump entries in the kth jump table, and Bi,sjt_k+j represents the (sjt_k + j + 1)th jump entry in the kth jump table in the ith original binary code file, where j ∈ [0, njt_k - 1];
Step 1, further constructing indirect calling triple samples as follows:
Cdatai,k=(Bri,k,E,Fsi,n)
k∈[1,Ndata_call]
wherein Cdatai,k represents the kth indirect call data sample generated from the ith original binary code file, i.e. the sample corresponding to the kth indirect call instruction code block in the ith original binary code file, which is assumed to be Insi,l; Ndata_call represents the number of indirect call samples in the ith original binary code file, E represents the control flow between function code blocks, Bri,k is the indirect call branch of the kth indirect call instruction code block Insi,l in the ith original binary code file, Bri,k being constructed on the basis of a breadth-first search algorithm; Fsi,n is the function sequence corresponding to the nth function Fi,n in the ith original binary code file, Fi,n being any address-taken function in the binary code;
CTarget(Insi,l) is defined as the list of function code blocks actually called by Insi,l, i.e.:
CTarget(Insi,l)={Fi,ct1,Fi,ct2,...,Fi,ctn}
wherein Fi,ct1, Fi,ct2, ..., Fi,ctn are the actual target functions of Insi,l;
step 1, marking the indirect jump triple sample and the indirect call triple sample respectively, and generating an indirect jump training set and an indirect call training set as follows:
for an indirect jump triple sample Jdatai,k=(Bi,m,e,Bi,n):
if Bi,n∈JTablei,k[Bi,m], then Jdatai,k is labeled Jlabeli_k,1, otherwise Jlabeli_k,0;
for an indirect call triple sample Cdatai,k=(Bri,k,E,Fsi,n):
if Fi,n∈CTarget(Insi,l), then the sample is labeled Clabeli_k,1, otherwise Clabeli_k,0;
Step 1, generating an indirect jump training set, namely:
JDATA={(Jdata1,1,Jlabel1_1,k1),(Jdata1,2,Jlabel1_2,k2),......,(JdataK,Ndata_jmp_k,JlabelK_Ndata_jmp_k,kNjmp)}
wherein JDATA is the indirect jump training set; (Jdata1,1, Jlabel1_1,k1) is the first sample in the data set, where Jdata1,1 is the 1st sample in the 1st original binary file, Jlabel1_1,k1 is the label of Jdata1,1, and k1 is 0 or 1; (Jdatai,j, Jlabeli_j,km) is the mth sample in the data set, where Jdatai,j is the jth sample in the ith original binary file, Jlabeli_j,km is its corresponding label, and m is the subscript of the sample in the data set, with i ∈ [1, K], j ∈ [1, Ndata_jmp_i]; K is the number of original binary code files, Ndata_jmp_i represents the total number of indirect jump samples of the ith binary, and Njmp is the total number of samples in the indirect jump training set;
step 1, generating an indirect call training set, namely:
CDATA={(Cdata1,1,Clabel1_1,k1),(Cdata1,2,Clabel1_2,k2),......,(CdataK,Ndata_call_k,ClabelK_Ndata_call_k,kNcall)}
wherein CDATA is the indirect call training set; (Cdata1,1, Clabel1_1,k1) is the first sample in the data set, where Cdata1,1 is the 1st sample in the 1st original binary file, Clabel1_1,k1 is the label of Cdata1,1, and k1 is 0 or 1; (Cdatai,j, Clabeli_j,km) is the mth sample in the data set, where Cdatai,j is the jth sample in the ith original binary file, Clabeli_j,km is its corresponding label, and m is the subscript of the sample in the data set, with i ∈ [1, K], j ∈ [1, Ndata_call_i]; K is the number of original binary code files, Ndata_call_i represents the total number of indirect call samples of the ith binary, and Ncall is the total number of samples in the indirect call training set.
3. The deep learning based unsigned binary indirect control flow identification method of claim 1,
step 2, the neural network indirect jump target identification classification model is formed by serially cascading an embedding layer, a deep bidirectional long-short term memory network, an attention layer and a batch normalization layer;
the embedding layer converts words from high-dimensional sparse one-hot vectors into low-dimensional dense vectors;
the low-dimensional dense vector representation of each word is jmp_embedding_vector;
the deep bidirectional long-short term memory network is formed by sequentially cascading a 1st bidirectional long-short term memory layer, a 2nd bidirectional long-short term memory layer and a random deactivation layer in series;
the ith bidirectional long-short term memory layer is used for selectively discarding data through a gating mechanism, and then updating the data by combining the old state value memorized by the network to obtain a determined updated value, which is output to the next layer;
the weight of the forget gate of the ith bidirectional long-short term memory layer is jmp_weightsf_lstm_i;
the bias of the forget gate of the ith bidirectional long-short term memory layer is jmp_biasf_lstm_i;
the weight of the input gate of the ith bidirectional long-short term memory layer is jmp_weightsi_lstm_i;
the bias of the input gate of the ith bidirectional long-short term memory layer is jmp_biasi_lstm_i;
the weight of the output gate of the ith bidirectional long-short term memory layer is jmp_weightsc_lstm_i;
the bias of the output gate of the ith bidirectional long-short term memory layer is jmp_biasc_lstm_i;
the weight of the computing unit state of the ith bidirectional long-short term memory layer is jmp_weightso_lstm_i;
the bias of the computing unit state of the ith bidirectional long-short term memory layer is jmp_biaso_lstm_i;
i∈[1,2];
the random deactivation layer is used for discarding the output data of the bidirectional long-short term memory layer with a certain probability to avoid overfitting;
the attention layer is used for alleviating the context loss caused by gradient vanishing on long sample sequences by giving greater weight to the important words;
the weight of the attention layer is jmp_weights_attention;
the bias of the attention layer is jmp_bias_attention;
the context vector of the attention layer is jmp_u_attention;
the batch standardization layer comprises a fully-connected layer, a batch normalization layer and a normalization index layer;
the fully-connected layer outputs a one-dimensional matrix of size W × H, with W = 256 and H = 1, and is used for integrating the output data of the attention layer and mapping it into the sample space of the following batch normalization layer;
the weight of the fully-connected layer is jmp_weights_dense;
the bias of the fully-connected layer is jmp_bias_dense;
the batch normalization layer is used for accelerating the convergence of the optimization training in step 2;
the translation parameter of the batch normalization layer is jmp_shift_bn;
the scaling parameter of the batch normalization layer is jmp_scale_bn;
the normalization index layer is used for converting the continuous output features of the batch normalization layer into discrete prediction features; this layer first applies a sigmoid operation to the output features of the batch normalization layer, then uses a cross entropy loss function, which is better suited to measuring the difference between two probability distributions, as the measurement function to optimize the learning result of the upper layers, so that the final result is a probability distribution over the labels Jlabeli,1* and Jlabeli,2* predicted for the ith sample, i ∈ [1, N], where N represents the number of samples in the deep learning training set; since the problem is a binary classification, the labels fall into two categories;
the neural network indirect jump target identification classification loss function model is a cross entropy loss function, which is specifically defined as follows: the cross entropy of the ith sample is

l^{(i)}(\Theta) = -\sum_{j} y^{(i)}_{j} \log \hat{y}^{(i)}_{j}

wherein N is the total number of training samples; \hat{y}^{(i)} is the predicted probability distribution over the labels indicating whether the indirect control flow of the ith sample is correct, and \hat{y}^{(i)}_{j} is the probability value predicted for the jth label; the true label probability distribution is y^{(i)}: for the indirect control flow correctness labels Jlabeli,1, Jlabeli,2 of the ith sample, if the label of the ith sample is Jlabeli,j, the corresponding probability value y^{(i)}_{j} is set to one, and the probability value y^{(i)}_{k} of the other label Jlabeli,k (k ≠ j) is zero;

the loss function is defined as:

l(\Theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j} y^{(i)}_{j} \log \hat{y}^{(i)}_{j}

that is, the cross entropy loss function l(\Theta) computes the value l^{(i)}(\Theta) for all training samples and averages them; the training target of the neural network is set so that the predicted probability distribution \hat{y}^{(i)} is as close as possible to the real label probability distribution y^{(i)}, i.e. so as to minimize the cross entropy loss function l(\Theta); finally, the probability of the predicted classification is calculated;
optimizing the network parameters by using the Adam optimization algorithm, the network optimization parameter set of step 2 is obtained as follows:
the vector representation of each word is jmp_embedding_vector_best;
for the ith bidirectional long-short term memory layer:
the optimized weight parameters are jmp_weightsf_lstm_best_i, jmp_weightsi_lstm_best_i, jmp_weightsc_lstm_best_i and jmp_weightso_lstm_best_i respectively;
the optimized bias parameters are jmp_biasf_lstm_best_i, jmp_biasi_lstm_best_i, jmp_biasc_lstm_best_i and jmp_biaso_lstm_best_i respectively;
for the attention layer:
the optimized parameters comprise the weight jmp_weights_attention_best, the bias jmp_bias_attention_best and the context vector jmp_u_attention_best;
the optimized weight parameter of the fully-connected layer is jmp_weights_dense_best;
the optimized bias parameter of the fully-connected layer is jmp_bias_dense_best;
the optimized translation parameter of the batch normalization layer is jmp_shift_bn_best;
the optimized scaling parameter of the batch normalization layer is jmp_scale_bn_best;
step 2, the neural network indirect call target identification classification model is formed by serially cascading an embedding layer, a deep bidirectional long-short term memory network, an attention layer and a batch normalization layer;
the embedding layer converts words from high-dimensional sparse one-hot vectors into low-dimensional dense vectors; the low-dimensional dense vector of each word is call_embedding_vector;
the deep bidirectional long-short term memory network is formed by sequentially cascading a first bidirectional long-short term memory layer, a second bidirectional long-short term memory layer and a random deactivation layer in series;
the ith bidirectional long-short term memory layer is used for selectively discarding data through a gating mechanism, and then updating the data by combining the old state value memorized by the network to obtain a determined updated value, which is output to the next layer;
the weight of the forget gate of the ith bidirectional long-short term memory layer is call_weightsf_lstm_i;
the bias of the forget gate of the ith bidirectional long-short term memory layer is call_biasf_lstm_i;
the weight of the input gate of the ith bidirectional long-short term memory layer is call_weightsi_lstm_i;
the bias of the input gate of the ith bidirectional long-short term memory layer is call_biasi_lstm_i;
the weight of the output gate of the ith bidirectional long-short term memory layer is call_weightsc_lstm_i;
the bias of the output gate of the ith bidirectional long-short term memory layer is call_biasc_lstm_i;
the weight of the computing unit state of the ith bidirectional long-short term memory layer is call_weightso_lstm_i;
the bias of the computing unit state of the ith bidirectional long-short term memory layer is call_biaso_lstm_i;
the random deactivation layer is used for discarding the output data of the bidirectional long-short term memory layer with a certain probability to avoid overfitting;
the attention layer is used for alleviating the context loss caused by gradient vanishing on long sample sequences by giving greater weight to the important words;
the weight of the attention layer is call_weights_attention;
the bias of the attention layer is call_bias_attention;
the context vector of the attention layer is call_u_attention;
the batch standardization layer comprises a fully-connected layer, a batch normalization layer and a normalization index layer;
the fully-connected layer outputs a one-dimensional matrix of size W × H, with W = 256 and H = 1, and is used for integrating the output data of the attention layer and mapping it into the sample space of the following batch normalization layer;
the weight of the fully-connected layer is call_weights_dense;
the bias of the fully-connected layer is call_bias_dense;
the batch normalization layer is used for accelerating the convergence of the optimization training in step 2;
the translation parameter of the batch normalization layer is call_shift_bn;
the scaling parameter of the batch normalization layer is call_scale_bn;
the normalization index layer is used for converting the continuous output features of the batch normalization layer into discrete prediction features; this layer first applies a sigmoid operation to the output features of the batch normalization layer, then uses a cross entropy loss function, which is better suited to measuring the difference between two probability distributions, as the measurement function to optimize the learning result of the upper layers, so that the final result is a probability distribution over the labels Clabeli,1* and Clabeli,2* predicted for the ith sample, i ∈ [1, N], where N represents the number of samples in the deep learning training set; since the problem is a binary classification, the labels fall into two categories;
the neural network indirect call target identification classification loss function model is a cross entropy loss function, which is specifically defined as follows: the cross entropy of the ith sample is

l^{(i)}(\Theta) = -\sum_{j} y^{(i)}_{j} \log \hat{y}^{(i)}_{j}

wherein N is the total number of training samples; \hat{y}^{(i)} is the predicted probability distribution over the labels indicating whether the indirect control flow of the ith sample is correct, and \hat{y}^{(i)}_{j} is the probability value predicted for the jth label; the true label probability distribution is y^{(i)}: for the labels Clabeli,1, Clabeli,2 used in step 2 to indicate whether the indirect control flow of the ith sample is correct, if the label of the ith sample is Clabeli,j, the corresponding probability value y^{(i)}_{j} is set to one, and the probability value y^{(i)}_{k} of the other label Clabeli,k (k ≠ j) is zero;

the loss function is defined as:

l(\Theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j} y^{(i)}_{j} \log \hat{y}^{(i)}_{j}

that is, the cross entropy loss function l(\Theta) computes the value l^{(i)}(\Theta) for all training samples and averages them; the training target of the neural network is set so that the predicted probability distribution \hat{y}^{(i)} is as close as possible to the real label probability distribution y^{(i)}, i.e. so as to minimize the cross entropy loss function l(\Theta); finally, the probability of the predicted classification is calculated;
optimizing the network parameters by using the Adam optimization algorithm, the network optimization parameter set of step 2 is obtained as follows:
the vector representation of each word is call_embedding_vector_best;
for the ith bidirectional long-short term memory layer:
the optimized weight parameters are call_weightsf_lstm_best_i, call_weightsi_lstm_best_i, call_weightsc_lstm_best_i and call_weightso_lstm_best_i respectively;
the optimized bias parameters are call_biasf_lstm_best_i, call_biasi_lstm_best_i, call_biasc_lstm_best_i and call_biaso_lstm_best_i respectively;
for the attention layer:
the optimized parameters comprise the weight call_weights_attention_best, the bias call_bias_attention_best and the context vector call_u_attention_best;
the optimized weight parameter of the fully-connected layer is call_weights_dense_best;
the optimized bias parameter of the fully-connected layer is call_bias_dense_best;
the optimized translation parameter of the batch normalization layer is call_shift_bn_best;
the optimized scaling parameter of the batch normalization layer is call_scale_bn_best.
4. The deep learning based unsigned binary indirect control flow identification method of claim 1,
step 3, judging whether the instruction code block in the binary system to be detected is an indirect jump instruction code block or an indirect call instruction code block, specifically:
if an instruction code block in the binary system to be detected belongs to the indirect jump instruction code blocks, the function code block, basic block code block and corresponding jump table where the indirect jump instruction code block is located are preprocessed through step 1 to obtain the indirect jump triple samples of the binary system to be detected; the sample labels of the binary system to be detected are then predicted through the trained neural network indirect jump target identification classification model defined in step 2, and the target basic block code block of each indirect jump instruction code block is restored from the sample labels based on the triple definition of step 1, i.e. for the kth indirect jump data sample Jdatai,k=(Bi,m,e,Bi,n) in the ith original binary file, if the predicted sample label is Jlabeli_k,1, then Bi,n is a target basic block code block;
if the instruction code block in the binary system to be detected belongs to the indirect call instruction code blocks, the original binary file where the indirect call instruction code block is located and the function code block where it is located are preprocessed through step 1 to obtain the indirect call branches of the binary system to be detected and the function sequences of the binary system to be detected, so as to further form the indirect call triple samples of step 1; the sample labels of the binary system to be detected are then predicted through the trained neural network indirect call target identification classification model defined in step 2, and the target function code block of each indirect call instruction code block is restored from the sample labels based on the triple definition of step 1, i.e. for the kth indirect call data sample Cdatai,k=(Bri,k,E,Fsi,n) in the ith original binary file, if the corresponding predicted sample label is Clabeli_k,1, then the function code block Fi,n corresponding to Fsi,n is a target function code block.




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant