CN115878498A - Key byte extraction method for predicting program behavior based on machine learning - Google Patents


Info

Publication number
CN115878498A
Authority
CN
China
Prior art keywords: program, neural network, target program, input, test set
Prior art date
Legal status
Pending
Application number
CN202310195368.7A
Other languages
Chinese (zh)
Inventor
毛得明 (Mao Deming)
唐娜 (Tang Na)
吴春明 (Wu Chunming)
李芒 (Li Mang)
Current Assignee
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 30 Research Institute filed Critical CETC 30 Research Institute
Priority to CN202310195368.7A priority Critical patent/CN115878498A/en
Publication of CN115878498A publication Critical patent/CN115878498A/en
Pending legal-status Critical Current

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention discloses a method for extracting key bytes by predicting program behavior with machine learning, which comprises the following steps: running an input seed file on a target program and generating a large test set X through fuzz-test mutation algorithms; performing code instrumentation on the target program to obtain a program PI, and running test set X on program PI to obtain a test set Y; taking the test set X-Y data pairs as training data and training a neural network model until the trained model can predict the behavior of the target program; and constructing a saliency map based on the trained neural network model and extracting key bytes from the saliency map, wherein a key byte is an input byte that affects target-program behavior. The invention effectively reduces the time overhead and performance overhead of program-behavior tracking.

Description

Key byte extraction method for predicting program behavior based on machine learning
Technical Field
The invention relates to the technical field of computer information security, and in particular to a method for extracting key bytes by predicting program behavior with machine learning.
Background
Key bytes are the input bytes that affect the behavior of the target program: given a set of inputs, the program behavior is observed and the bytes of the input that affect that behavior are deduced, i.e., the key bytes are extracted. Extracted key bytes can be widely applied to fields such as detection of system privacy-data leakage, vulnerability detection, and guided fuzz testing. To track program data flow, observe program behavior, and extract key bytes, taint-analysis techniques are generally used: they detect security problems by marking sensitive data in the system and tracking the propagation of the marked data through the program. However, as the program scale grows, the time cost of taint analysis multiplies exponentially, because the information flow from every taint source to every taint sink in the program must be tracked.
At present, most program-behavior tracking tools are built on taint-analysis tools such as Valgrind, Pin, and QEMU. James Newsome published TaintCheck, developed on Valgrind, which detects buffer-overflow vulnerabilities but ignores control-flow tracking. Wang Jiang proposed a QEMU-based offline dynamic taint-analysis method for binary programs, which extracts the running trace of a binary by modifying QEMU's decoding and execution mechanism, marks program inputs with a HOOK technique, builds a vulnerability model, and, while virtually replaying the program, completes offline trace analysis and vulnerability detection according to the propagation strategy and security-check strategy generated by the vulnerability model. However, all of the above methods consume excessive time.
Machine learning is a current research focus, and there is strong interest in introducing machine-learning methods into different fields to improve the state of the art. TaintInduce proposes learning the information-propagation rules of a specific platform from instruction inputs and outputs. It learns the propagation rules from a template and uses an algorithm to reduce the task to the prerequisite of learning only different input sets and information-propagation labels. By learning propagation rules with machine learning, TaintInduce improves the accuracy of individual propagation rules, but because of its propagation-based design it still suffers from high false alarms and high overhead when tracking program behavior.
Disclosure of Invention
In view of this, the present invention provides a method for extracting key bytes by predicting program behavior with machine learning, so as to solve the above technical problems.
The invention discloses a method for extracting key bytes by predicting program behavior with machine learning, which comprises the following steps:
step 1: running an input seed file on a target program and generating a large test set X through fuzz-test mutation algorithms; performing code instrumentation on the target program to obtain a program PI, and running test set X on program PI to obtain a test set Y;
step 2: taking the test set X-Y data pairs as training data and training a neural network model until the trained model can predict the behavior of the target program; wherein test set X is the input data of the neural network, test set Y is the label data of the neural network, and the test set X-Y data pairs consist of test set X and test set Y;
step 3: constructing a saliency map based on the trained neural network model and extracting key bytes from the saliency map; wherein a key byte is an input byte that affects target-program behavior.
Further, the step 1 comprises:
step 11: the fuzz test takes the provided seed file as input, runs a large number of mutated inputs on the target program, and checks whether relevant results occur after the runs; wherein the relevant results include the target program crashing and a new execution path being found;
step 12: performing basic-block-level code instrumentation on the target program to obtain a program PI, and running test set X on program PI to obtain the execution paths of the target program, namely test set Y.
Further, in step 11, to ensure that the length of the inputs in test set X is unchanged, the three fuzz-test mutation algorithms bitflip, arithmetic, and interest are selected to generate a large test set X.
Further, the step 12 includes:
step 121: defining a function IFunc for insertion, wherein the function IFunc is inserted before each basic block, and when the basic block is reached during execution of the target program, the function IFunc is first called to output the number of the basic block and the function in which it is located;
step 122: acquiring a target program;
step 123: initializing the number value num of the basic block, wherein the number value starts from 1;
step 124: traversing a function F of the target program;
step 125: traversing each basic block in the function F;
step 126: calling the inserted function IFunc;
step 127: adding 1 to the number value num of the basic block;
step 128: judging whether the traversal of the function F is finished, if not, executing the step 125, and if so, executing the step 129;
step 129: executing test set X with program PI, wherein program PI outputs the position, number, and file name of each executed basic block, thereby obtaining the execution paths of the target program, namely test set Y.
Further, in the step 2:
the neural network model learns the data-flow propagation of different inputs in the target program by observing a large number of test set X-Y data pairs from the execution traces of the target program, thereby simulating the processing logic of the target program; the neural network model takes the program input as the model input and predicts the execution path of the target program.
Further, given a test set X-Y data pair consisting of an input $x_i$ and the corresponding target-program execution path $y_i$, the output of the neural network model is $\hat{y}_i$:

$h = \mathrm{ReLU}(W_k x_i + b_k)$  (1)

$\hat{y}_i = F(x_i;\theta) = \sigma(W_{k+1} h + b_{k+1})$  (2)

wherein $x_i$ denotes the $i$-th item of test set X, $y_i$ denotes the $i$-th item of test set Y, $h$ denotes the output vector of the hidden layer of the neural network, $\mathrm{ReLU}(\cdot)$ denotes the ReLU activation function, $W_k$ and $b_k$ denote the trainable parameters of each layer, $k$ denotes the layer index, $F(x_i)$ denotes the neural network model applied to the test data $x_i$, $\theta$ denotes the trainable weight parameters of the neural network model, and $\sigma(\cdot)$ denotes the sigmoid function.
Further, the neural network model comprises an input layer, a hidden layer and an output layer; the input layer is connected with the output layer through the hidden layer.
Further, the step 3 comprises:
step 31: calculating partial derivatives of the execution path predicted by the trained neural network model relative to the test set X;
step 32: constructing a saliency map based on the partial derivatives;
step 33: key bytes are extracted from the saliency map.
Further, the step 31 specifically includes:
let $F(x)$ denote the computation of the output-variable values of the neural network for an input $x$; the partial derivative of $F$ with respect to the input $x$ is defined as:

$J_F(x) = \dfrac{\partial F(x)}{\partial x} = \left[\dfrac{\partial F_j(x)}{\partial x_n}\right]$  (3)

wherein $X$ denotes the input data of the neural network, i.e. test set X, $x_n$ denotes the $n$-th byte of the input $x$, and the partial derivatives $\partial F_j(x)/\partial x_n$ form the Jacobian matrix of the neural network function, each element of which is the gradient of an output $F_j(x)$ with respect to the $n$-th byte of the input $x$.
Further, the step 32 specifically includes:
the saliency map $S(x)$ is defined as:

$S(x)_n = \sum_{j} \dfrac{\partial F_j(x)}{\partial x_n}$  (4)

wherein $F_j(x)$ is the $j$-th prediction output of $F(x)$; the sum of the derivatives of all outputs with respect to the $n$-th byte of the input represents the influence of the $n$-th byte on the behavior of the currently executed target program, and the larger its value, the larger the influence;
the step 33 specifically includes:

$K = \arg\,\mathrm{top\_k}\big(S(x)\big)$  (5)

wherein $K$ is the set of key bytes, i.e. the important bytes in the input field that affect the execution path of the target program; $\mathrm{top\_k}$ denotes the function that selects the $k$ largest elements from a vector, and $\arg$ denotes the function that returns the indices of the selected elements.
Owing to the adoption of the above technical scheme, the invention has the following advantages:
the invention predicts program behavior by letting a machine-learning model simulate and learn the different behaviors of the program, realizing lightweight and accurate end-to-end information-flow tracking. Compared with traditional program tracking using taint-analysis tools, the method effectively reduces the time overhead and performance overhead of program-behavior tracking. Using the trained model to guide subsequent work such as fuzz testing and vulnerability mining can greatly improve working efficiency and save analysis time.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below cover only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating a method for extracting key bytes based on machine learning prediction program behavior according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the instrumentation logic according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the instrumentation flow of an embodiment of the present invention;
FIG. 4 is a diagram of a neural network model architecture according to an embodiment of the present invention;
FIG. 5 is a key byte diagram according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and examples. It should be understood that the described examples are only some of the possible embodiments and are not intended to limit the invention to the embodiments described herein. All other embodiments obtainable by those of ordinary skill in the art are intended to fall within the scope of the present invention.
The technical problems to be solved by the invention are as follows:
(1) Accuracy problem of key byte extraction
When key bytes are extracted with taint analysis, variables that have no data or control dependency on program behavior are often marked as tainted, producing false positives. For example, with s = a + b and t = s - b, if b is marked as tainted, then rule-based propagation also marks s and t as tainted; however, b cannot actually affect t (t equals a), so too many input bytes are classified as key bytes, causing false positives. Conversely, variables that do have data or control dependencies on program behavior may not be marked as tainted, producing false negatives. For example, with "if b > 1 then a = 5", if b is marked as tainted and its value is greater than 1, a is not marked because a has no direct data flow from b, whereas in fact the value of a depends on b; input bytes that should be key bytes are thus ignored, causing false negatives. Both conditions reduce the accuracy of key-byte extraction.
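To make the two failure modes concrete, the following minimal Python sketch (purely illustrative, not part of the claimed method; the variable names and propagation rule are hypothetical) shows how rule-based propagation over-taints t in the first example and misses the implicit dependency of a on b in the second:

```python
# Minimal sketch of the two failure modes of rule-based taint propagation
# described above (illustrative only).

tainted = {"b"}  # b is marked as a taint source

def propagate_assign(dst, srcs):
    """Rule-based propagation: taint dst if any source operand is tainted."""
    if any(v in tainted for v in srcs):
        tainted.add(dst)

# Case 1: over-tainting (false positive).
# s = a + b and t = s - b both receive the taint, although t == a and does
# not actually depend on b.
propagate_assign("s", ["a", "b"])   # s = a + b  -> tainted
propagate_assign("t", ["s", "b"])   # t = s - b  -> tainted, yet t equals a
print("over-taint:", "t" in tainted)    # True  (false positive)

# Case 2: under-tainting (false negative).
# The implicit flow "if b > 1: a = 5" assigns a constant, so no tainted
# operand appears on the right-hand side and a is never marked, although
# the value of a is controlled by b.
propagate_assign("a", [])           # a = 5 inside "if b > 1"
print("under-taint:", "a" in tainted)   # False (false negative)
```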
(2) System resource and time overhead is excessive
When taint analysis is used to extract key bytes, code that collects information is inserted without breaking the original logic of the target program, so as to obtain information about the program run, and a shadow memory is added alongside the original data to represent the taint state of registers and memory. This approach obtains detailed information about program execution through instrumentation and has high analysis precision, but the frequent instrumentation operations and the shadow-memory design occupy a large amount of system resources and increase the time overhead, and this overhead grows exponentially as the program scale expands.
Referring to fig. 1, the present invention provides an embodiment of a method for extracting key bytes by predicting program behavior with machine learning. The embodiment uses a neural network as the machine-learning model. Taint-analysis techniques track the propagation of marked data through a program at a large cost in time and resources, whereas a neural network can predict program behavior by learning the program's different behaviors; the influence of taint sources on taint sinks in the program is computed by gradient analysis, thereby achieving lightweight end-to-end information-flow tracking.
The overall framework logic is built around the model and can be divided into 3 steps.
Step 1: running an input seed file on a target program and generating a large test set X through fuzz-test mutation algorithms; performing code instrumentation on the target program to obtain a program PI, and running test set X on program PI to obtain a test set Y.
the fuzzy test takes the provided seed file as input, carries out a large amount of variation operations, checks whether the running results cause the crash of the target program, discovers a new execution path and the like. The mutation operation of the fuzz test generally comprises the following 6 types:
Table 1. Fuzz-test mutation operations
No.  Name        Description                                                              Length change
1    bitflip     Flip individual bits (1 to 0, 0 to 1)                                    Unchanged
2    arithmetic  Integer add/subtract operations                                          Unchanged
3    interest    Replace contents of the original file with special values               Unchanged
4    dictionary  Replace/insert automatically generated or user-provided tokens           Changed
5    havoc       Apply a large number of random mutations to the original file            Changed
6    splice      Splice two files to obtain a new file                                    Changed
To ensure that the length of the inputs in test set X is unchanged, the three fuzz-test mutation algorithms bitflip, arithmetic, and interest are selected to generate a large test set; code instrumentation is then performed on the target program, test set X is run on the instrumented program to obtain the program's path-execution information, and test set Y is collected.
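As a rough illustration of this step, the following Python sketch generates a length-preserving test set from a seed file using the three mutation operators named above; the operator parameters, the set of "interesting" values, and the file name are assumptions, not the patent's exact implementation:

```python
import random

# Sketch of length-preserving mutations (bitflip, arithmetic, interest) used to
# generate test set X from a seed file; all parameter choices are illustrative.

INTERESTING_8 = [0x00, 0x01, 0x7F, 0x80, 0xFF]  # assumed "interesting" byte values

def bitflip(data: bytearray) -> bytearray:
    out = bytearray(data)
    bit = random.randrange(len(out) * 8)
    out[bit // 8] ^= 1 << (bit % 8)          # flip one bit: 1 <-> 0
    return out

def arithmetic(data: bytearray) -> bytearray:
    out = bytearray(data)
    i = random.randrange(len(out))
    out[i] = (out[i] + random.choice([-1, 1]) * random.randint(1, 35)) % 256
    return out

def interest(data: bytearray) -> bytearray:
    out = bytearray(data)
    out[random.randrange(len(out))] = random.choice(INTERESTING_8)
    return out

def generate_test_set(seed: bytes, n: int) -> list:
    ops = [bitflip, arithmetic, interest]
    return [bytes(random.choice(ops)(bytearray(seed))) for _ in range(n)]

# Example (file name is hypothetical):
# X = generate_test_set(open("seed.pdf", "rb").read(), 10000)
```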
In code instrumentation, logic code that performs certain functions is inserted before and after the instructions to be observed or processed in the assembly code of the target program, as shown in fig. 2. Code instrumentation can typically be performed at 3 granularities: instruction level, basic-block level, and function level. This patent chooses basic-block-level code instrumentation of the target program.
A basic block is a sequence of program statements with only one entry and one exit; a function is generally divided into several basic blocks by compare-and-jump instruction sequences (e.g., a "CMP" followed by a conditional jump), and if the first instruction of a basic block is executed, the remaining instructions of that block are executed as well. Compared with instruction-level instrumentation, instrumenting basic blocks saves time and reduces program size; compared with function-level instrumentation, instrumenting basic blocks improves accuracy. The instrumentation process for basic blocks is shown in fig. 3 and comprises the following steps:
1) Define a function IFunc for insertion; IFunc is an output function inserted before each basic block, and when the basic block is reached during program execution, IFunc is first called to output the number of the basic block and the function in which it is located;
2) Acquiring a target program;
3) Initializing the number value num of the basic block, wherein the number value starts from 1;
4) Function F of traversing target program
5) Traversing each basic block in the function F;
6) Calling the inserted function IFunc;
7) The number value num of the basic block is added with 1;
8) And after the traversal of the functions is finished, outputting the basic block quantity information corresponding to each function of the program.
Test set X is then executed with the instrumented program, which outputs the position and information of each executed basic block; this yields a test set Y containing the program execution-path information and provides a test set of X-Y data pairs for model training.
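A minimal Python sketch of this collection step is given below; it assumes the instrumented program PI reads its input from stdin and that IFunc prints one line per executed basic block ending with the block number, both of which are illustrative assumptions rather than details stated in the text:

```python
import subprocess

# Sketch of collecting execution-path information: run the instrumented program
# PI on one input from X and record which basic blocks were executed.

def run_instrumented(pi_path: str, input_bytes: bytes) -> set:
    # Assumption: PI reads its input from stdin; adapt if it takes a file path.
    proc = subprocess.run([pi_path], input=input_bytes,
                          capture_output=True, timeout=10)
    executed = set()
    for line in proc.stdout.decode(errors="ignore").splitlines():
        parts = line.split()
        # Assumption: IFunc prints "<function_name> <basic_block_number>".
        if len(parts) >= 2 and parts[-1].isdigit():
            executed.add(int(parts[-1]))
    return executed

# Example (paths are hypothetical):
# paths = [run_instrumented("./program_pi", x) for x in X]
```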
Step 2: taking the test set X-Y data pairs as training data and training the neural network model until the trained model can predict the behavior of the target program; wherein test set X is the input data of the neural network, test set Y is the label data of the neural network, and the test set X-Y data pairs consist of test set X and test set Y.
the test set X is the input of the target program, the test set Y is the execution path of the target program, the input of the target program is usually user input, files or user privacy character strings, and in order to facilitate the understanding and the recognition of the model, the method converts the byte sequence into a bounded numerical value vector with the range of [0,255 ]. The method processes the information, normalizes the execution path variable by binary data, indicates that the basic block is executed by 1, indicates that the basic block is not executed by 0, and uniformly normalizes the test set Y into 01 character strings with the same length so as to ensure the rapid convergence of the model.
The method uses a neural network to build the training model, which consists of 3 fully connected layers: an input layer, a hidden layer, and an output layer. The hidden layer uses ReLU as the activation function for its 4096 hidden units, and the output layer uses sigmoid as the activation function to predict the output variables.
The model learns the propagation process of the information flow by observing a large number of X-Y pairs from the program execution traces. The detailed architecture of the model, which takes the program input as the model input and predicts the program execution path, is shown in fig. 4. Given a specific input $x_i$ of a particular program and the corresponding program execution path $y_i$, the execution path predicted by the model is $\hat{y}_i$, computed as:

$h = \mathrm{ReLU}(W_k x_i + b_k)$  (1)

$\hat{y}_i = F(x_i;\theta) = \sigma(W_{k+1} h + b_{k+1})$  (2)

wherein $x_i$ denotes the $i$-th item of test set X, $y_i$ denotes the $i$-th item of test set Y, $h$ denotes the output vector of the hidden layer of the neural network, $\mathrm{ReLU}(\cdot)$ denotes the ReLU activation function, $W_k$ and $b_k$ denote the trainable parameters of each layer, $k$ denotes the layer index, $F(x_i)$ denotes the neural network model applied to the test data $x_i$, $\theta$ denotes the trainable weight parameters of the neural network model, and $\sigma(\cdot)$ denotes the sigmoid function.
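A minimal PyTorch sketch of the model described by equations (1)-(2), together with a training loop, is given below; the loss function, optimizer, and training schedule are assumptions, since the text only specifies the layer sizes and activation functions:

```python
import torch
import torch.nn as nn

# Sketch of the fully connected model of equations (1)-(2): one hidden layer of
# 4096 ReLU units and a sigmoid output per basic block.

class PathPredictor(nn.Module):
    def __init__(self, input_len: int, num_blocks: int, hidden: int = 4096):
        super().__init__()
        self.hidden = nn.Linear(input_len, hidden)   # W_k, b_k in eq. (1)
        self.out = nn.Linear(hidden, num_blocks)     # output-layer weights in eq. (2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.hidden(x))               # eq. (1)
        return torch.sigmoid(self.out(h))            # eq. (2)

# Assumed training loop: binary cross-entropy over the 0/1 path labels.
def train(model, X, Y, epochs: int = 50, lr: float = 1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), Y)   # X, Y are float tensors built from the encodings
        loss.backward()
        opt.step()
    return model
```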
After the neural network model is trained, the method analyzes the information flow in the target program by constructing a saliency map, as detailed in step 3.
Step 3: constructing a saliency map based on the trained neural network model and extracting key bytes from the saliency map; wherein a key byte is an input byte that affects target-program behavior.
A key byte is an input byte that affects the behavior of the target program. As shown in fig. 5, the red portion is a key-byte diagram of a pdf file; assuming the input x of the target program has length m, the key bytes are the part of these m bytes that affects the program execution path. This patent uses gradient analysis and a saliency map to compute the key bytes in the taint data. The saliency map is a gradient-based attribution method; compared with other gradient-based methods, it focuses on the sensitivity of the neural network output to each feature, i.e., how the output changes with respect to a minute change of the input. The saliency-map method is chosen here because the goal is to infer from the neural network which bytes of the input affect the execution path of the target program, i.e., produce the greatest sensitivity in the network's output.
To extract the key bytes, the partial derivatives of the execution path predicted by the trained neural network model with respect to test set X are first computed. Let $F(x)$ denote the computation of the output-variable values of the neural network for an input $x$ during execution of the target program. The partial derivative of $F$ with respect to a given input $x$ is defined as:

$J_F(x) = \dfrac{\partial F(x)}{\partial x} = \left[\dfrac{\partial F_j(x)}{\partial x_n}\right]$  (3)

wherein $X$ denotes the input data of the neural network, i.e. test set X, $x_n$ denotes the $n$-th byte of the input $x$, and the partial derivatives $\partial F_j(x)/\partial x_n$ form the Jacobian matrix of the neural network function, each element of which is the gradient of an output $F_j(x)$ with respect to the $n$-th byte of the input $x$. A saliency map is then constructed from these partial derivatives. The saliency map $S(x)$ is a vector defined as:

$S(x)_n = \sum_{j} \dfrac{\partial F_j(x)}{\partial x_n}$  (4)

wherein $F_j(x)$ is the $j$-th prediction output of the neural network model; the sum of the derivatives of all outputs with respect to the $n$-th byte of the input measures the influence of the $n$-th byte on the behavior of the currently executed program, and the larger this value, the larger the influence. The flow of program-execution information can be analyzed with the saliency map. After the saliency map is generated, the key bytes are finally extracted. Let $\mathrm{top\_k}$ denote the function that selects the $k$ largest elements of a vector and $\arg$ the function that returns the indices of the selected elements:

$K = \arg\,\mathrm{top\_k}\big(S(x)\big)$  (5)

wherein $K$ is the set of key bytes, i.e. the important bytes in the input field that affect the execution path of the target program.
Most program behaviors of parser programs are determined by bytes at fixed input positions, namely the fixed positions of the file-format header, rather than by the file content. After analyzing several file formats, the total number of key bytes of a file-parsing program was found to lie between 250 and 500, about 5% of the total input bytes. In practice, a threshold of 5% of the input length can therefore be selected as the value of $k$ used to compute the key bytes, and this value can be modified according to the actual situation.
Finally, it should be noted that although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalent substitutions may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which are to be covered by the claims.

Claims (10)

1. A method for extracting key bytes by predicting program behavior with machine learning, characterized by comprising the following steps:
step 1: running an input seed file on a target program and generating a large test set X through fuzz-test mutation algorithms; performing code instrumentation on the target program to obtain a program PI, and running test set X on program PI to obtain a test set Y;
step 2: taking the test set X-Y data pairs as training data and training a neural network model until the trained model can predict the behavior of the target program; wherein test set X is the input data of the neural network, test set Y is the label data of the neural network, and the test set X-Y data pairs consist of test set X and test set Y;
step 3: constructing a saliency map based on the trained neural network model and extracting key bytes from the saliency map; wherein a key byte is an input byte that affects target-program behavior.
2. The method of claim 1, wherein step 1 comprises:
step 11: the fuzz test takes the provided seed file as input, runs a large number of mutated inputs on the target program, and checks whether relevant results occur after the runs; wherein the relevant results include the target program crashing and a new execution path being found;
step 12: performing basic-block-level code instrumentation on the target program to obtain a program PI, and running test set X on program PI to obtain the execution paths of the target program, namely test set Y.
3. The method according to claim 2, wherein, in step 11, in order to ensure that the length of the inputs in test set X is unchanged, the three fuzz-test mutation algorithms bitflip, arithmetic, and interest are selected to generate a large test set X.
4. The method of claim 2, wherein step 12 comprises:
step 121: defining a function IFunc for insertion, wherein the function IFunc is inserted before each basic block, and when the basic block is reached during execution of the target program, the function IFunc is first called to output the number of the basic block and the function in which it is located;
step 122: acquiring a target program;
step 123: initializing the number value num of the basic block, wherein the number value starts from 1;
step 124: traversing a function F of the target program;
step 125: traversing each basic block in the function F;
step 126: calling the inserted function IFunc;
step 127: the number value num of the basic block is added with 1;
step 128: judging whether the traversal of the function F is finished, if not, executing the step 125, and if so, executing the step 129;
step 129: executing test set X with program PI, wherein program PI outputs the position, number, and file name of each executed basic block, thereby obtaining the execution paths of the target program, namely test set Y.
5. The method according to claim 1, characterized in that in step 2:
the neural network model learns the data-flow propagation of different inputs in the target program by observing a large number of test set X-Y data pairs from the execution traces of the target program, thereby simulating the processing logic of the target program; the neural network model takes the program input as the model input and predicts the execution path of the target program.
6. The method of claim 5, wherein, given a test set X-Y data pair consisting of an input $x_i$ and the corresponding target-program execution path $y_i$, the output of the neural network model is $\hat{y}_i$:

$h = \mathrm{ReLU}(W_k x_i + b_k)$  (1)

$\hat{y}_i = F(x_i;\theta) = \sigma(W_{k+1} h + b_{k+1})$  (2)

wherein $x_i$ denotes the $i$-th item of test set X, $y_i$ denotes the $i$-th item of test set Y, $h$ denotes the output vector of the hidden layer of the neural network, $\mathrm{ReLU}(\cdot)$ denotes the ReLU activation function, $W_k$ and $b_k$ denote the trainable parameters of each layer, $k$ denotes the layer index, $F(x_i)$ denotes the neural network model applied to the test data $x_i$, $\theta$ denotes the trainable weight parameters of the neural network model, and $\sigma(\cdot)$ denotes the sigmoid function.
7. The method of any one of claims 1-6, wherein the neural network model comprises an input layer, a hidden layer, and an output layer; the input layer is connected with the output layer through the hidden layer.
8. The method of claim 6, wherein step 3 comprises:
step 31: calculating partial derivatives of the execution path predicted by the trained neural network model relative to the test set X;
step 32: constructing a saliency map based on the partial derivatives;
step 33: key bytes are extracted from the saliency map.
9. The method according to claim 8, wherein the step 31 specifically comprises:
let $F(x)$ denote the computation of the output-variable values of the neural network for an input $x$; the partial derivative of $F$ with respect to the input $x$ is defined as:

$J_F(x) = \dfrac{\partial F(x)}{\partial x} = \left[\dfrac{\partial F_j(x)}{\partial x_n}\right]$  (3)

wherein $X$ denotes the input data of the neural network, i.e. test set X, $x_n$ denotes the $n$-th byte of the input $x$, and the partial derivatives $\partial F_j(x)/\partial x_n$ form the Jacobian matrix of the neural network function, each element of which is the gradient of an output $F_j(x)$ with respect to the $n$-th byte of the input $x$.
10. The method according to claim 9, wherein the step 32 specifically comprises:
the saliency map $S(x)$ is defined as:

$S(x)_n = \sum_{j} \dfrac{\partial F_j(x)}{\partial x_n}$  (4)

wherein $F_j(x)$ is the $j$-th prediction output of $F(x)$; the sum of the derivatives of all outputs with respect to the $n$-th byte of the input represents the influence of the $n$-th byte on the behavior of the currently executed target program, and the larger its value, the larger the influence;
the step 33 specifically comprises:

$K = \arg\,\mathrm{top\_k}\big(S(x)\big)$  (5)

wherein $K$ is the set of key bytes, i.e. the important bytes in the input field that affect the execution path of the target program; $\mathrm{top\_k}$ denotes the function that selects the $k$ largest elements from a vector, and $\arg$ denotes the function that returns the indices of the selected elements.
CN202310195368.7A 2023-03-03 2023-03-03 Key byte extraction method for predicting program behavior based on machine learning Pending CN115878498A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310195368.7A CN115878498A (en) 2023-03-03 2023-03-03 Key byte extraction method for predicting program behavior based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310195368.7A CN115878498A (en) 2023-03-03 2023-03-03 Key byte extraction method for predicting program behavior based on machine learning

Publications (1)

Publication Number Publication Date
CN115878498A true CN115878498A (en) 2023-03-31

Family

ID=85761904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310195368.7A Pending CN115878498A (en) 2023-03-03 2023-03-03 Key byte extraction method for predicting program behavior based on machine learning

Country Status (1)

Country Link
CN (1) CN115878498A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775127A (en) * 2023-05-25 2023-09-19 哈尔滨工业大学 Static symbol execution pile inserting method based on RetroWrite framework

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440201A (en) * 2013-09-05 2013-12-11 北京邮电大学 Dynamic taint analysis device and application thereof to document format reverse analysis
CN112463638A (en) * 2020-12-11 2021-03-09 清华大学深圳国际研究生院 Fuzzy test method based on neural network and computer readable storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440201A (en) * 2013-09-05 2013-12-11 北京邮电大学 Dynamic taint analysis device and application thereof to document format reverse analysis
CN112463638A (en) * 2020-12-11 2021-03-09 清华大学深圳国际研究生院 Fuzzy test method based on neural network and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DONGDONG SHE等: "Neutaint: Efficient Dynamic Taint Analysis with Neural Networks", 《2020 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775127A (en) * 2023-05-25 2023-09-19 哈尔滨工业大学 Static symbol execution pile inserting method based on RetroWrite framework
CN116775127B (en) * 2023-05-25 2024-05-28 哈尔滨工业大学 Static symbol execution pile inserting method based on RetroWrite frames

Similar Documents

Publication Publication Date Title
CN112733137B (en) Binary code similarity analysis method for vulnerability detection
CN111125716B (en) Method and device for detecting Ethernet intelligent contract vulnerability
US7340475B2 (en) Evaluating dynamic expressions in a modeling application
CN109977682A (en) A kind of block chain intelligence contract leak detection method and device based on deep learning
CN107169358A (en) Code homology detection method and its device based on code fingerprint
CN105808438B (en) A kind of Reuse of Test Cases method based on function call path
CN114297654A (en) Intelligent contract vulnerability detection method and system for source code hierarchy
CN116049831A (en) Software vulnerability detection method based on static analysis and dynamic analysis
CN110162972B (en) UAF vulnerability detection method based on statement joint coding deep neural network
CN110096439A (en) A kind of method for generating test case towards solidity language
CN115033895B (en) Binary program supply chain safety detection method and device
CN113326187A (en) Data-driven intelligent detection method and system for memory leakage
CN112364352A (en) Interpretable software vulnerability detection and recommendation method and system
CN115878498A (en) Key byte extraction method for predicting program behavior based on machine learning
CN115455382A (en) Semantic comparison method and device for binary function codes
CN116150757A (en) Intelligent contract unknown vulnerability detection method based on CNN-LSTM multi-classification model
EP1025492A1 (en) Method for the generation of isa simulators and assemblers from a machine description
CN113127933A (en) Intelligent contract Pompe fraudster detection method and system based on graph matching network
CN115576840A (en) Static program pile insertion detection method and device based on machine learning
CN117573142B (en) JAVA code anti-obfuscator based on simulation execution
Zhao et al. Suzzer: A vulnerability-guided fuzzer based on deep learning
CN117725592A (en) Intelligent contract vulnerability detection method based on directed graph annotation network
CN113886832A (en) Intelligent contract vulnerability detection method, system, computer equipment and storage medium
CN117591913A (en) Statement level software defect prediction method based on improved R-transducer
CN110955892B (en) Hardware Trojan horse detection method based on machine learning and circuit behavior level characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230331