CN111159011A - Instruction vulnerability prediction method and system based on deep random forest - Google Patents

Instruction vulnerability prediction method and system based on deep random forest

Info

Publication number
CN111159011A
CN111159011A (application CN201911248246.XA)
Authority
CN
China
Prior art keywords
instruction
vulnerability
forest
sample
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911248246.XA
Other languages
Chinese (zh)
Other versions
CN111159011B (en)
Inventor
顾晶晶
柳塍
晏祖佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN201911248246.XA priority Critical patent/CN111159011B/en
Publication of CN111159011A publication Critical patent/CN111159011A/en
Application granted granted Critical
Publication of CN111159011B publication Critical patent/CN111159011B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3604Software analysis for verifying properties of programs
    • G06F11/3608Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for predicting instruction vulnerability based on a deep random forest. The method comprises the following steps: extracting, for each program instruction, the instruction feature information related to instruction vulnerability and generating an instruction feature vector characterizing that vulnerability; performing fault injection on a training program to obtain the vulnerability value of each program instruction; combining the instruction feature vectors and the instruction vulnerability values to generate an instruction vulnerability sample data set; performing sliding sampling on the instruction vulnerability sample data set with a sliding window to generate an expanded sample data set; constructing and training an instruction vulnerability prediction model based on the deep random forest; and extracting the instruction feature vector of a target program to be predicted and, in combination with the instruction vulnerability prediction model, predicting the instruction vulnerability of the target program. The system is used to implement the method. The method achieves high prediction accuracy, requires only a small sample set and little manual tuning, and can be effectively applied to predicting instruction vulnerability after a program is affected by transient faults.

Description

Instruction vulnerability prediction method and system based on deep random forest
Technical Field
The invention belongs to the field of software reinforcement and software reliability, and particularly relates to a method and a system for predicting instruction vulnerability based on deep random forest.
Background
With the rapid development of semiconductor manufacturing processes, computer chips keep shrinking in size, which greatly increases their sensitivity to space radiation. In a space radiation environment, the single event upset effect produced by high-energy particle irradiation or electromagnetic interference on integrated circuit chips manufactured with advanced processes is one of the main causes of computer system failures. The Single Event Upset (SEU) effect refers to the phenomenon in which a bit of a stored value is affected and its logic state flips; such a phenomenon is generally called a soft error.
Soft errors are generally classified into three types: (1) the soft error has no influence on the normal operation of the program; (2) the soft error causes the program to crash or hang; (3) the soft error causes an implicit error, i.e., the program runs normally but produces a wrong result; such an error is generally referred to as SDC (Silent Data Corruption). The first type of error does not affect program operation, and the second type is serious but easy to detect. Compared with these two types, SDC causes more serious program problems because of its implicit propagation properties.
For software SDC errors, the conventional error detection method based on redundant instructions duplicates all instructions in a program, which incurs a huge performance overhead; current research on redundancy techniques therefore focuses on how to select the fragile instructions in a program for partial redundancy so as to reduce that overhead. Existing selection methods fall into three categories: (1) selection based on fault injection; (2) selection based on program analysis; (3) selection based on machine learning. Fault-injection-based instruction selection injects faults into program instructions one by one and, by observing the fault injection results, screens out the instructions with high vulnerability for redundancy hardening. Program-analysis-based methods determine instruction vulnerability through program analysis; for example, the article "Errorflow model, Modeling and analysis of software propagating hardware faults" establishes an error propagation model to calculate instruction SDC vulnerability by analyzing the propagation paths of transient faults in a program. Machine-learning-based methods combine the advantages of the former two, avoiding the complex propagation calculation and reducing the fault injection overhead. In recent years, prediction of fragile program instructions with methods such as support vector machines and neural networks has been studied, but such methods require large prediction data sets and complicated manual parameter tuning to achieve high accuracy.
Disclosure of Invention
The invention aims to provide a method and a system for predicting instruction vulnerability, which can reduce the data set scale and parameter adjustment complexity required by model training and can be applied to large-scale programs or complex environments.
The technical solution for realizing the purpose of the invention is as follows: an instruction vulnerability prediction method based on a deep random forest comprises the following steps:
step 1, performing static analysis on a training program, extracting the instruction feature information related to instruction vulnerability for each program instruction, and generating an instruction feature vector V_features that characterizes the vulnerability of the corresponding program instruction;
step 2, performing fault injection on the training program and obtaining the vulnerability value P_SDC(I_i) of each program instruction;
step 3, combining the instruction feature vectors V_features and the instruction vulnerability values P_SDC(I_i) to generate an instruction vulnerability sample data set D, where each sample S in the data set comprises the instruction feature vector V_features of a program instruction and its vulnerability value P_SDC(I_i);
Step 4, sliding sampling is carried out on the instruction vulnerability sample data set D through a sliding window model, instruction sequence expansion characteristics of sample data are obtained, and an expansion sample data set is generated;
step 5, constructing and training an instruction vulnerability prediction model based on the deep random forest based on the extended sample data set;
and 6, extracting the instruction feature vector of the target program to be predicted according to the process of the step 1, and combining the instruction vulnerability prediction model obtained in the step 5 to realize the instruction vulnerability prediction of the target program to be predicted.
Further, the instruction feature vector V_features characterizing instruction vulnerability in step 1 is the following 7-tuple:
V_features = <V_tran_bran, V_comp, V_addr, V_mask, V_loop, V_arith, V_block>
in the formula, V_tran_bran denotes branch- and jump-related instruction features, including the branch feature f_is_branch, the function-call feature f_is_call and the return-instruction feature f_is_return; V_comp denotes compare-instruction features, including the integer-compare feature f_is_int_cmp and the floating-point-compare feature f_is_float_cmp; V_addr denotes address-related instruction features, including the used-in-address-computation feature f_is_used_in_add, the destination-operand width feature f_dest_op_width and the used-by-a-store feature f_is_used_store; V_mask denotes fault-masking-related features, including the logical-AND feature f_is_and, the logical-OR feature f_is_or and the logical-shift feature f_is_sh; V_loop denotes loop-related features, including the in-loop position feature f_is_loop and the loop-depth feature f_loop_d; V_arith denotes arithmetic-operation features, including the add/subtract feature f_is_add/sub and the multiply/divide feature f_is_mul/div; V_block denotes basic-block features, including the basic-block length feature f_bb_length, the number of instructions remaining to be executed in the basic block f_bb_remain_ins_num, the number of predecessor basic blocks f_pred_bb_num and the number of successor basic blocks f_suc_bb_num.
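For illustration only, the following minimal Python sketch shows one way to organize this 7-tuple and flatten it into a single feature vector; the group and feature names follow the description above, while the dictionary layout and the flatten_features() helper are assumptions rather than part of the patent.

    # Sketch only: grouping of V_features following the 7-tuple above.
    FEATURE_GROUPS = {
        "V_tran_bran": ["f_is_branch", "f_is_call", "f_is_return"],
        "V_comp":      ["f_is_int_cmp", "f_is_float_cmp"],
        "V_addr":      ["f_is_used_in_add", "f_dest_op_width", "f_is_used_store"],
        "V_mask":      ["f_is_and", "f_is_or", "f_is_sh"],
        "V_loop":      ["f_is_loop", "f_loop_d"],
        "V_arith":     ["f_is_add/sub", "f_is_mul/div"],
        "V_block":     ["f_bb_length", "f_bb_remain_ins_num", "f_pred_bb_num", "f_suc_bb_num"],
    }

    def flatten_features(values: dict) -> list:
        """Concatenate all named features, group by group, into one flat vector V_features."""
        return [values[name] for group in FEATURE_GROUPS.values() for name in group]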
Further, the vulnerability value P_SDC(I_i) of each program instruction in step 2 is obtained using the formula:
P_SDC(I_i) = (1/w) × Σ_{j=1}^{w} (M_j / F_j)
in the formula, I_i denotes the i-th program instruction, P_SDC(I_i) denotes the SDC vulnerability value of instruction I_i, w denotes the bit width of the instruction's destination register, M_j denotes the number of SDC failures observed after fault injection into the j-th bit of instruction I_i, and F_j denotes the total number of fault injections performed on the j-th bit of instruction I_i.
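As an illustration of this formula (not part of the patent text), a short Python sketch that evaluates P_SDC(I_i) from per-bit fault-injection statistics could look as follows; the function and variable names are assumptions.

    def sdc_vulnerability(sdc_counts, injection_counts):
        """P_SDC(I_i): average over the w destination-register bits of M_j / F_j."""
        w = len(injection_counts)              # bit width of the destination register
        assert len(sdc_counts) == w and all(f > 0 for f in injection_counts)
        return sum(m / f for m, f in zip(sdc_counts, injection_counts)) / w

    # Example: a 4-bit destination register with 100 injections per bit.
    p_sdc = sdc_vulnerability([12, 30, 5, 0], [100, 100, 100, 100])   # 0.1175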
Further, in step 4, the sliding sampling performed on the instruction vulnerability sample data set D through the sliding window model, which obtains the instruction sequence expansion features of the sample data and generates the expanded sample data set, specifically includes:
setting the initial value m = 2, where m ∈ N*, 2 ≤ m ≤ p, p ∈ N*, and p is a user-defined value;
step 4-1, constructing a sliding window model as follows:
W_m = m × n
in the formula, W_m is the width of the sliding window, n is the number of features of each sample S in the instruction vulnerability sample data set D, and n is also the sliding step of the window;
step 4-2, concatenating M samples from the instruction vulnerability sample data set D into a new sample E_i, where E_i has M × n features;
step 4-3, performing sliding sampling on E_i with the sliding window model to obtain M+1-m samples of size W_m;
step 4-4, training two random forest regression models on the M+1-m samples of size W_m to obtain 2(M+1-m) regression values as expansion features, where the label of each sample during training is the label of one sample randomly selected from the M samples, i.e. its vulnerability value P_SDC(I_i);
step 4-5, increasing m by 1 and judging whether m is larger than p; if not, returning to step 4-1; otherwise, outputting the (p-1)×(2M-p)-dimensional expansion features obtained over the whole loop;
step 4-6, concatenating the original n-dimensional features of each sample S with its (p-1)×(2M-p)-dimensional expansion features to obtain an expanded sample with (p-1)×(2M-p)+n features for sample S, thereby generating the expanded sample data set (a code sketch of this expansion procedure is given below).
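For concreteness, the following Python/scikit-learn sketch implements steps 4-1 to 4-5 for a batch of concatenated samples E_i. The patent trains the two random forest regressors on the M+1-m windows of each sample; to obtain a runnable example, this sketch pools the windows of all concatenated samples and uses in-sample predictions as the expansion features, which is a simplifying assumption rather than the patented procedure.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def sliding_window_expansion(E, group_labels, n, M=10, p=5):
        """E: (G, M*n) array, one row per concatenated sample E_i.
        group_labels: (G,) vulnerability labels P_SDC used to train the window forests.
        Returns a (G, (p-1)*(2*M-p)) array of expansion features."""
        G = E.shape[0]
        parts = []
        for m in range(2, p + 1):
            W_m = m * n                                   # window width, sliding step = n
            k = M + 1 - m                                 # number of windows per E_i
            windows = np.stack([E[:, j * n : j * n + W_m] for j in range(k)], axis=1)
            flat = windows.reshape(G * k, W_m)
            flat_y = np.repeat(group_labels, k)           # each window inherits its group's label
            for seed in (0, 1):                           # two random forest regressors
                rf = RandomForestRegressor(n_estimators=100, random_state=seed)
                rf.fit(flat, flat_y)
                parts.append(rf.predict(flat).reshape(G, k))   # k regression values per E_i
        return np.concatenate(parts, axis=1)

For M = 10 and p = 5 the returned array has 60 columns per concatenated sample, matching the dimension stated in the embodiment below.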
Further, step 5 of constructing and training the deep-random-forest-based instruction vulnerability prediction model from the expanded sample data set specifically includes:
step 5-1, constructing the first layer of the cascade regression forest, which comprises N random forests, and taking the expanded sample data set obtained in step 4 as the initial input vector of the deep random forest regression, so as to output an output vector comprising N enhanced features;
step 5-2, constructing the next layer of the cascade regression forest, concatenating the output vector v_enhanced of the previous layer with the input vector v_input to form [v_input, v_enhanced] as the input of this layer, and then evaluating the accuracy of the whole cascade forest up to this layer by cross validation, i.e. computing the mean square error between the regression results of all random forests at this layer and the true values;
step 5-3, judging whether the accuracy obtained in the step 5-2 is improved compared with the accuracy corresponding to the previous layer of cascade regression forest, if so, returning to the step 5-2; and otherwise, judging that the accuracy reaches a threshold value, not increasing the number of layers of the deep random forest any more, ending the construction and training process, obtaining an instruction vulnerability prediction model based on the deep random forest, wherein the average value of regression of all random forests in the last layer of cascade regression forest is the prediction result of the instruction vulnerability prediction model.
A deep-random-forest-based instruction vulnerability prediction system, the system comprising:
the first feature extraction module, used for performing static analysis on the training program, extracting the instruction feature information related to instruction vulnerability for each program instruction, and generating an instruction feature vector V_features that characterizes the vulnerability of the corresponding program instruction;
the second feature extraction module, used for performing fault injection on the training program to obtain the vulnerability value P_SDC(I_i) of each program instruction;
the first sample data set construction module, used for combining the instruction feature vectors V_features and the instruction vulnerability values P_SDC(I_i) to generate an instruction vulnerability sample data set D, where each sample S in the data set comprises the instruction feature vector V_features of a program instruction and its vulnerability value P_SDC(I_i);
The second sample data set construction module is used for performing sliding sampling on the instruction vulnerability sample data set D through a sliding window model, obtaining the instruction sequence expansion characteristic of the sample data and generating an expansion sample data set;
the prediction model construction module is used for constructing and training an instruction vulnerability prediction model based on the deep random forest based on the extended sample data set;
and the prediction module is used for extracting the instruction feature vector of the target program to be predicted according to the working process of the first feature extraction module and realizing the instruction vulnerability prediction of the target program to be predicted by combining the instruction vulnerability prediction model.
Further, the second sample data set construction module comprises the following units, executed in sequence:
a parameter initialization unit, used for initializing m = 2, where m ∈ N*, 2 ≤ m ≤ p, p ∈ N*;
A sliding window model construction unit, configured to construct a sliding window model as follows:
W_m = m × n
in the formula, W_m is the width of the sliding window, n is the number of features of each sample S in the instruction vulnerability sample data set D, and n is also the sliding step of the window;
a sample splicing unit, used for concatenating M samples from the instruction vulnerability sample data set D into a new sample E_i, where E_i has M × n features;
a sliding sampling unit, used for performing sliding sampling on E_i with the sliding window model to obtain M+1-m samples of size W_m;
a training unit, used for training two random forest regression models on the M+1-m samples of size W_m to obtain 2(M+1-m) regression values as expansion features, where the label of each sample during training is the label of one sample randomly selected from the M samples, i.e. its vulnerability value P_SDC(I_i);
a first judging unit, used for increasing m by 1 and judging whether m is larger than p; if not, execution returns to the sliding window model construction unit; otherwise, the (p-1)×(2M-p)-dimensional expansion features obtained over the whole loop are output;
an expanded sample data set construction unit, used for concatenating the original n-dimensional features of each sample S with its (p-1)×(2M-p)-dimensional expansion features to obtain an expanded sample with (p-1)×(2M-p)+n features for sample S, thereby generating the expanded sample data set.
Further, the prediction model construction module comprises the following units, executed in sequence:
the first cascade regression forest construction unit, used for constructing the first layer of the cascade regression forest, which comprises N random forests, and taking the expanded sample data set obtained by the second sample data set construction module as the initial input vector of the deep random forest regression, so as to output an output vector comprising N enhanced features;
the second cascade regression forest construction unit, used for constructing the next layer of the cascade regression forest, concatenating the output vector v_enhanced of the previous layer with the input vector v_input to form [v_input, v_enhanced] as the input of this layer, and then evaluating the accuracy of the whole cascade forest up to this layer by cross validation, i.e. computing the mean square error between the regression results of all random forests at this layer and the true values;
the second judging unit is used for judging whether the accuracy obtained by the second cascade regression forest constructing unit is increased compared with the accuracy corresponding to the previous layer of cascade regression forest or not, and if yes, the second cascade regression forest constructing unit is executed in a returning mode; and otherwise, judging that the accuracy reaches a threshold value, not increasing the number of layers of the deep random forest any more, ending the construction and training process, obtaining an instruction vulnerability prediction model based on the deep random forest, wherein the average value of regression of all random forests in the last layer of cascade regression forest is the prediction result of the instruction vulnerability prediction model.
Compared with the prior art, the invention has the following remarkable advantages: 1) the deep random forest model can achieve high prediction accuracy on small-scale samples, so the prediction model requires only a small amount of training data collection work and has low complexity; 2) the deep random forest model automatically adjusts its cascade depth according to the training accuracy, which reduces the difficulty of parameter tuning while maintaining high prediction accuracy; 3) sequence features among the instruction samples are extracted by a sliding window scanning method, so that the feature space reflects the SDC vulnerability of instructions more accurately, improving prediction accuracy.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
FIG. 1 is a flowchart of an instruction vulnerability prediction method based on a deep random forest according to the present invention.
FIG. 2 is a comparison graph of accuracy of prediction results in an embodiment of the present invention.
FIG. 3 is a diagram comparing the mean square error of the prediction results of the present invention with those of other prediction methods.
Detailed Description
With reference to FIG. 1, the present invention provides a method for predicting instruction vulnerability based on a deep random forest, which includes the following steps:
Step 1, performing static analysis on a training program, extracting the instruction feature information related to instruction vulnerability for each program instruction, and generating an instruction feature vector V_features that characterizes the vulnerability of the corresponding program instruction; the instruction feature vector V_features is the following 7-tuple:
V_features = <V_tran_bran, V_comp, V_addr, V_mask, V_loop, V_arith, V_block>
in the formula, V_tran_bran denotes branch- and jump-related instruction features, including the branch feature f_is_branch, the function-call feature f_is_call and the return-instruction feature f_is_return; V_comp denotes compare-instruction features, including the integer-compare feature f_is_int_cmp and the floating-point-compare feature f_is_float_cmp; V_addr denotes address-related instruction features, including the used-in-address-computation feature f_is_used_in_add, the destination-operand width feature f_dest_op_width and the used-by-a-store feature f_is_used_store; V_mask denotes fault-masking-related features, including the logical-AND feature f_is_and, the logical-OR feature f_is_or and the logical-shift feature f_is_sh; V_loop denotes loop-related features, including the in-loop position feature f_is_loop and the loop-depth feature f_loop_d; V_arith denotes arithmetic-operation features, including the add/subtract feature f_is_add/sub and the multiply/divide feature f_is_mul/div; V_block denotes basic-block features, including the basic-block length feature f_bb_length, the number of instructions remaining to be executed in the basic block f_bb_remain_ins_num, the number of predecessor basic blocks f_pred_bb_num and the number of successor basic blocks f_suc_bb_num.
Step 2, performing fault injection on the training program and obtaining the vulnerability value P_SDC(I_i) of each program instruction, using the formula:
P_SDC(I_i) = (1/w) × Σ_{j=1}^{w} (M_j / F_j)
in the formula, I_i denotes the i-th program instruction, P_SDC(I_i) denotes the SDC vulnerability value of instruction I_i, w denotes the bit width of the instruction's destination register, M_j denotes the number of SDC failures observed after fault injection into the j-th bit of instruction I_i, and F_j denotes the total number of fault injections performed on the j-th bit of instruction I_i.
Step 3, combining the instruction feature vectors V_features and the instruction vulnerability values P_SDC(I_i) to generate an instruction vulnerability sample data set D, where each sample S in the data set comprises the instruction feature vector V_features of a program instruction and its vulnerability value P_SDC(I_i).
And 4, performing sliding sampling on the instruction vulnerability sample data set D through the sliding window model to obtain the instruction sequence expansion features of the sample data and generate the expanded sample data set. The method specifically comprises the following steps:
setting the initial value m = 2, where m ∈ N*, 2 ≤ m ≤ p, p ∈ N*, and p is a user-defined value;
step 4-1, constructing a sliding window model as follows:
W_m = m × n
in the formula, W_m is the width of the sliding window, n is the number of features of each sample S in the instruction vulnerability sample data set D, and n is also the sliding step of the window;
step 4-2, concatenating M samples from the instruction vulnerability sample data set D into a new sample E_i, where E_i has M × n features;
step 4-3, performing sliding sampling on E_i with the sliding window model to obtain M+1-m samples of size W_m;
step 4-4, training two random forest regression models on the M+1-m samples of size W_m to obtain 2(M+1-m) regression values as expansion features, where the label of each sample during training is the label of one sample randomly selected from the M samples, i.e. its vulnerability value P_SDC(I_i);
step 4-5, increasing m by 1 and judging whether m is larger than p; if not, returning to step 4-1; otherwise, outputting the (p-1)×(2M-p)-dimensional expansion features obtained over the whole loop;
step 4-6, concatenating the original n-dimensional features of each sample S with its (p-1)×(2M-p)-dimensional expansion features to obtain an expanded sample with (p-1)×(2M-p)+n features for sample S, thereby generating the expanded sample data set.
Here, it is further preferred that p = 5 and M = 10; step 4 then specifically comprises:
setting the initial value m = 2, where m ∈ N*, 2 ≤ m ≤ 5;
Step 4-1, constructing a sliding window model as follows:
W_m = m × n
in the formula, W_m is the width of the sliding window, n is the number of features of each sample S in the instruction vulnerability sample data set D, and n is also the sliding step of the window;
step 4-2, concatenating 10 samples from the instruction vulnerability sample data set D into a new sample E_i, where E_i has 10 × n features;
step 4-3, performing sliding sampling on E_i with the sliding window model to obtain 11-m samples of size W_m;
step 4-4, training two random forest regression models on the 11-m samples of size W_m to obtain 2(11-m) regression values as expansion features, where the label of each sample during training is the label of the 10th sample, i.e. its vulnerability value P_SDC(I_i);
step 4-5, increasing m by 1 and judging whether m is larger than 5; if not, returning to step 4-1; otherwise, outputting the 60-dimensional expansion features obtained over the whole loop;
step 4-6, concatenating the original n-dimensional features of each sample S with its 60-dimensional expansion features to obtain an expanded sample with 60+n features for sample S, thereby generating the expanded sample data set (the dimension count is verified below).
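As a check on these dimensions: the loop over m = 2, 3, 4, 5 contributes 2 × (11-m) expansion features per iteration, i.e. 18 + 16 + 14 + 12 = 60, which agrees with the general expression (p-1) × (2M-p) = 4 × 15 = 60.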
And 5, constructing and training the deep-random-forest-based instruction vulnerability prediction model from the expanded sample data set. The method specifically comprises the following steps:
step 5-1, constructing the first layer of the cascade regression forest, which comprises N random forests, and taking the expanded sample data set obtained in step 4 as the initial input vector of the deep random forest regression, so as to output an output vector comprising N enhanced features;
step 5-2, constructing the next layer of the cascade regression forest, concatenating the output vector v_enhanced of the previous layer with the input vector v_input to form [v_input, v_enhanced] as the input of this layer, and then evaluating the accuracy of the whole cascade forest up to this layer by cross validation, i.e. computing the mean square error between the regression results of all random forests at this layer and the true values;
step 5-3, judging whether the accuracy obtained in the step 5-2 is improved compared with the accuracy corresponding to the previous layer of cascade regression forest, if so, returning to the step 5-2; and otherwise, judging that the accuracy reaches the threshold value, not increasing the number of layers of the deep random forest any more, ending the construction and training process, obtaining an instruction vulnerability prediction model based on the deep random forest, and obtaining the prediction result of the instruction vulnerability prediction model by using the average value of regression of all random forests in the last layer of cascade regression forest.
Here, it is further preferred that the cascade regression forest of each layer in step 5-1 comprises N random forests, specifically N = 4 random forests: two random forest regression models f_normal and two extremely randomized forest regression models f_extremely, as sketched below.
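To make steps 5-1 to 5-3 and this preferred layer composition concrete, the following Python/scikit-learn sketch builds cascade layers of N = 4 forests (ExtraTreesRegressor is assumed as the "extremely randomized forest") and stops adding layers when the cross-validated mean square error no longer improves. The choice of cross_val_predict with 3 folds, the number of trees, and the growing [v_input, v_enhanced] input are illustrative assumptions; the patent does not fix these details.

    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import cross_val_predict

    def make_cascade_layer():
        """Preferred N = 4 layer: two random forests (f_normal) and two extra-trees forests (f_extremely)."""
        return [RandomForestRegressor(n_estimators=100, random_state=0),
                RandomForestRegressor(n_estimators=100, random_state=1),
                ExtraTreesRegressor(n_estimators=100, random_state=2),
                ExtraTreesRegressor(n_estimators=100, random_state=3)]

    def train_cascade(X, y, max_layers=20):
        """Grow cascade layers until the cross-validated MSE stops improving (steps 5-2 / 5-3)."""
        layers, best_mse, v_input = [], np.inf, X
        while len(layers) < max_layers:
            forests = make_cascade_layer()
            # one enhanced feature per forest, obtained by cross validation on this layer's input
            v_enhanced = np.column_stack(
                [cross_val_predict(f, v_input, y, cv=3) for f in forests])
            mse = mean_squared_error(y, v_enhanced.mean(axis=1))
            if mse >= best_mse:                      # accuracy no longer improves, stop adding layers
                break
            best_mse = mse
            for f in forests:                        # refit each forest on all data for prediction time
                f.fit(v_input, y)
            layers.append(forests)
            v_input = np.hstack([v_input, v_enhanced])   # [v_input, v_enhanced] feeds the next layer
        return layers

At prediction time, the same concatenation is repeated layer by layer, and the final prediction is the mean of the regression outputs of the forests in the last layer.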
And 6, extracting the instruction feature vector of the target program to be predicted according to the process of the step 1, and combining the instruction vulnerability prediction model obtained in the step 5 to realize the instruction vulnerability prediction of the target program to be predicted.
The invention provides an instruction vulnerability prediction system based on a deep random forest, which comprises:
The first feature extraction module, used for performing static analysis on the training program, extracting the instruction feature information related to instruction vulnerability for each program instruction, and generating an instruction feature vector V_features that characterizes the vulnerability of the corresponding program instruction.
The second feature extraction module, used for performing fault injection on the training program to obtain the vulnerability value P_SDC(I_i) of each program instruction.
The first sample data set construction module, used for combining the instruction feature vectors V_features and the instruction vulnerability values P_SDC(I_i) to generate an instruction vulnerability sample data set D, where each sample S in the data set comprises the instruction feature vector V_features of a program instruction and its vulnerability value P_SDC(I_i).
And the second sample data set construction module, used for performing sliding sampling on the instruction vulnerability sample data set D through the sliding window model, obtaining the instruction sequence expansion features of the sample data and generating the expanded sample data set. The module specifically comprises the following units:
a parameter initialization unit, used for initializing m = 2, where m ∈ N*, 2 ≤ m ≤ p, p ∈ N*, and p is a user-defined value;
a sliding window model construction unit, configured to construct a sliding window model as follows:
W_m = m × n
in the formula, W_m is the width of the sliding window, n is the number of features of each sample S in the instruction vulnerability sample data set D, and n is also the sliding step of the window;
a sample splicing unit, used for concatenating M samples from the instruction vulnerability sample data set D into a new sample E_i, where E_i has M × n features;
a sliding sampling unit, used for performing sliding sampling on E_i with the sliding window model to obtain M+1-m samples of size W_m;
a training unit, used for training two random forest regression models on the M+1-m samples of size W_m to obtain 2(M+1-m) regression values as expansion features, where the label of each sample during training is the label of one sample randomly selected from the M samples, i.e. its vulnerability value P_SDC(I_i);
a first judging unit, used for increasing m by 1 and judging whether m is larger than p; if not, execution returns to the sliding window model construction unit; otherwise, the (p-1)×(2M-p)-dimensional expansion features obtained over the whole loop are output;
an expanded sample data set construction unit, used for concatenating the original n-dimensional features of each sample S with its (p-1)×(2M-p)-dimensional expansion features to obtain an expanded sample with (p-1)×(2M-p)+n features for sample S, thereby generating the expanded sample data set.
And the prediction model construction module, used for constructing and training the deep-random-forest-based instruction vulnerability prediction model from the expanded sample data set. The module comprises the following units, executed in sequence:
the first cascade regression forest construction unit, used for constructing the first layer of the cascade regression forest, which comprises N random forests, and taking the expanded sample data set obtained by the second sample data set construction module as the initial input vector of the deep random forest regression, so as to output an output vector comprising N enhanced features;
the second cascade regression forest construction unit, used for constructing the next layer of the cascade regression forest, concatenating the output vector v_enhanced of the previous layer with the input vector v_input to form [v_input, v_enhanced] as the input of this layer, and then evaluating the accuracy of the whole cascade forest up to this layer by cross validation, i.e. computing the mean square error between the regression results of all random forests at this layer and the true values;
the second judging unit is used for judging whether the accuracy obtained by the second cascade regression forest constructing unit is increased compared with the accuracy corresponding to the previous layer of cascade regression forest or not, and if yes, the second cascade regression forest constructing unit is executed in a returning mode; and otherwise, judging that the accuracy reaches the threshold value, not increasing the number of layers of the deep random forest any more, ending the construction and training process, obtaining an instruction vulnerability prediction model based on the deep random forest, and obtaining the prediction result of the instruction vulnerability prediction model by using the average value of regression of all random forests in the last layer of cascade regression forest.
And the prediction module is used for extracting the instruction feature vector of the target program to be predicted according to the working process of the first feature extraction module and realizing the instruction vulnerability prediction of the target program to be predicted by combining an instruction vulnerability prediction model.
The present invention will be described in further detail with reference to examples.
Examples
Experimental environment configuration: Intel i7-8750H CPU, 16 GB of memory, running the Ubuntu Linux 16.04 operating system. A subset of the test programs in the MiBench benchmark suite is randomly selected as the training set; an analysis program based on the LLVM (Low Level Virtual Machine) compiler extracts the instruction features of the source programs to generate the instruction feature vectors x, and LLFI (an LLVM-based Fault Injection tool) injects faults into the training programs instruction by instruction to obtain the instruction SDC vulnerability values y. About 4300 sample data points are collected in total, with feature dimension n = 21.
Starting from the 10 th sample, sliding sampling operation is carried out one by utilizing a sliding window, 60-dimensional expansion characteristics are generated through two random forest regressors, expansion samples with 81-dimensional characteristics are finally obtained, and an expansion sample data set is generated. And then, taking the expanded sample data set as an input vector of a deep random forest, using 4 random forest regression models which are the same in pairs for each layer of random forest to generate a 4-dimensional enhancement vector, splicing the 4-dimensional enhancement vector with the initial 21 features to generate a 25-dimensional vector, and taking the 25-dimensional vector as the input of the next layer of random forest. And after the model training is finished, evaluating the accuracy on the test set.
Isqrt (square root computation), FFT (Fourier transform), Dijkstra (shortest-path algorithm), Bitstring (bit/string conversion), Qsort (quick sort) and Rad2deg (radian-to-degree conversion) are selected from the MiBench test suite. After feature extraction with LLVM, the trained prediction model performs SDC vulnerability prediction for every instruction of each test program, the average vulnerability of all instructions of each program is calculated, and the predicted averages of other prediction models are compared at the same time; the results are shown in FIG. 2. As can be seen from the figure, the prediction of the present invention is closer to the true value on every test program, where Baseline denotes the reference instruction vulnerability values obtained by fault injection. FIG. 3 shows the comparison of the mean square error of the prediction results with other prediction methods; the method of the present invention achieves the smallest error on all test programs.
In conclusion, the method has high prediction accuracy, low requirement on the sample set and less manual adjustment work, and can be effectively applied to prediction of instruction vulnerability after the program is influenced by the transient fault.

Claims (10)

1. An instruction vulnerability prediction method based on a deep random forest, characterized by comprising the following steps:
step 1, performing static analysis on a training program, extracting the instruction feature information related to instruction vulnerability for each program instruction, and generating an instruction feature vector V_features that characterizes the vulnerability of the corresponding program instruction;
step 2, performing fault injection on the training program and obtaining the vulnerability value P_SDC(I_i) of each program instruction;
step 3, combining the instruction feature vectors V_features and the instruction vulnerability values P_SDC(I_i) to generate an instruction vulnerability sample data set D, where each sample S in the data set comprises the instruction feature vector V_features of a program instruction and its vulnerability value P_SDC(I_i);
Step 4, sliding sampling is carried out on the instruction vulnerability sample data set D through a sliding window model, instruction sequence expansion characteristics of sample data are obtained, and an expansion sample data set is generated;
step 5, constructing and training an instruction vulnerability prediction model based on the deep random forest based on the extended sample data set;
and 6, extracting the instruction feature vector of the target program to be predicted according to the process of the step 1, and combining the instruction vulnerability prediction model obtained in the step 5 to realize the instruction vulnerability prediction of the target program to be predicted.
2. The method for predicting instruction vulnerability based on a deep random forest as claimed in claim 1, wherein the instruction feature vector V_features characterizing instruction vulnerability in step 1 is the following 7-tuple:
V_features = <V_tran_bran, V_comp, V_addr, V_mask, V_loop, V_arith, V_block>
in the formula, V_tran_bran denotes branch- and jump-related instruction features, including the branch feature f_is_branch, the function-call feature f_is_call and the return-instruction feature f_is_return; V_comp denotes compare-instruction features, including the integer-compare feature f_is_int_cmp and the floating-point-compare feature f_is_float_cmp; V_addr denotes address-related instruction features, including the used-in-address-computation feature f_is_used_in_add, the destination-operand width feature f_dest_op_width and the used-by-a-store feature f_is_used_store; V_mask denotes fault-masking-related features, including the logical-AND feature f_is_and, the logical-OR feature f_is_or and the logical-shift feature f_is_sh; V_loop denotes loop-related features, including the in-loop position feature f_is_loop and the loop-depth feature f_loop_d; V_arith denotes arithmetic-operation features, including the add/subtract feature f_is_add/sub and the multiply/divide feature f_is_mul/div; V_block denotes basic-block features, including the basic-block length feature f_bb_length, the number of instructions remaining to be executed in the basic block f_bb_remain_ins_num, the number of predecessor basic blocks f_pred_bb_num and the number of successor basic blocks f_suc_bb_num.
3. The method for predicting instruction vulnerability based on a deep random forest as claimed in claim 1, wherein the vulnerability value P_SDC(I_i) of each program instruction in step 2 is obtained using the formula:
P_SDC(I_i) = (1/w) × Σ_{j=1}^{w} (M_j / F_j)
in the formula, I_i denotes the i-th program instruction, P_SDC(I_i) denotes the SDC vulnerability value of instruction I_i, w denotes the bit width of the instruction's destination register, M_j denotes the number of SDC failures observed after fault injection into the j-th bit of instruction I_i, and F_j denotes the total number of fault injections performed on the j-th bit of instruction I_i.
4. The method according to claim 1, wherein step 4 of performing sliding sampling on the instruction vulnerability sample data set D through the sliding window model to obtain the instruction sequence expansion features of the sample data and generate the expanded sample data set specifically comprises:
setting the initial value m = 2, where m ∈ N*, 2 ≤ m ≤ p, p ∈ N*, and p is a user-defined value;
step 4-1, constructing a sliding window model as follows:
W_m = m × n
in the formula, W_m is the width of the sliding window, n is the number of features of each sample S in the instruction vulnerability sample data set D, and n is also the sliding step of the window;
step 4-2, concatenating M samples from the instruction vulnerability sample data set D into a new sample E_i, where E_i has M × n features;
step 4-3, performing sliding sampling on E_i with the sliding window model to obtain M+1-m samples of size W_m;
step 4-4, training two random forest regression models on the M+1-m samples of size W_m to obtain 2(M+1-m) regression values as expansion features, where the label of each sample during training is the label of one sample randomly selected from the M samples, i.e. its vulnerability value P_SDC(I_i);
step 4-5, increasing m by 1 and judging whether m is larger than p; if not, returning to step 4-1; otherwise, outputting the (p-1)×(2M-p)-dimensional expansion features obtained over the whole loop;
step 4-6, concatenating the original n-dimensional features of each sample S with its (p-1)×(2M-p)-dimensional expansion features to obtain an expanded sample with (p-1)×(2M-p)+n features for sample S, thereby generating the expanded sample data set.
5. The method of claim 1, wherein p is 5 and M is 10.
6. The method for predicting instruction vulnerability based on a deep random forest according to claim 1, wherein step 5 of constructing and training the deep-random-forest-based instruction vulnerability prediction model from the expanded sample data set specifically comprises:
step 5-1, constructing the first layer of the cascade regression forest, which comprises N random forests, and taking the expanded sample data set obtained in step 4 as the initial input vector of the deep random forest regression, so as to output an output vector comprising N enhanced features;
step 5-2, constructing the next layer of the cascade regression forest, concatenating the output vector v_enhanced of the previous layer with the input vector v_input to form [v_input, v_enhanced] as the input of this layer, and then evaluating the accuracy of the whole cascade forest up to this layer by cross validation, i.e. computing the mean square error between the regression results of all random forests at this layer and the true values;
step 5-3, judging whether the accuracy obtained in the step 5-2 is improved compared with the accuracy corresponding to the previous layer of cascade regression forest, if so, returning to the step 5-2; and otherwise, judging that the accuracy reaches a threshold value, not increasing the number of layers of the deep random forest any more, ending the construction and training process, obtaining an instruction vulnerability prediction model based on the deep random forest, wherein the average value of regression of all random forests in the last layer of cascade regression forest is the prediction result of the instruction vulnerability prediction model.
7. The method for predicting instruction vulnerability based on a deep random forest as claimed in claim 6, wherein the cascade regression forest of each layer in step 5-1 comprises N random forests, specifically: the cascade regression forest comprises N = 4 random forests, namely two random forest regression models f_normal and two extremely randomized forest regression models f_extremely.
8. A system for instruction vulnerability prediction based on a deep random forest, the system comprising:
the first feature extraction module, used for performing static analysis on the training program, extracting the instruction feature information related to instruction vulnerability for each program instruction, and generating an instruction feature vector V_features that characterizes the vulnerability of the corresponding program instruction;
the second feature extraction module, used for performing fault injection on the training program to obtain the vulnerability value P_SDC(I_i) of each program instruction;
the first sample data set construction module, used for combining the instruction feature vectors V_features and the instruction vulnerability values P_SDC(I_i) to generate an instruction vulnerability sample data set D, where each sample S in the data set comprises the instruction feature vector V_features of a program instruction and its vulnerability value P_SDC(I_i);
The second sample data set construction module is used for performing sliding sampling on the instruction vulnerability sample data set D through a sliding window model, obtaining the instruction sequence expansion characteristic of the sample data and generating an expansion sample data set;
the prediction model construction module is used for constructing and training an instruction vulnerability prediction model based on the deep random forest based on the extended sample data set;
and the prediction module is used for extracting the instruction feature vector of the target program to be predicted according to the working process of the first feature extraction module and realizing the instruction vulnerability prediction of the target program to be predicted by combining the instruction vulnerability prediction model.
9. The system of claim 8, wherein the second sample data set construction module comprises the following units, executed in order:
a parameter initialization unit, used for initializing m = 2, where m ∈ N*, 2 ≤ m ≤ p, p ∈ N*, and p is a user-defined value;
a sliding window model construction unit, configured to construct a sliding window model as follows:
W_m = m × n
in the formula, W_m is the width of the sliding window, n is the number of features of each sample S in the instruction vulnerability sample data set D, and n is also the sliding step of the window;
a sample splicing unit, used for concatenating M samples from the instruction vulnerability sample data set D into a new sample E_i, where E_i has M × n features;
a sliding sampling unit, used for performing sliding sampling on E_i with the sliding window model to obtain M+1-m samples of size W_m;
a training unit, used for training two random forest regression models on the M+1-m samples of size W_m to obtain 2(M+1-m) regression values as expansion features, where the label of each sample during training is the label of one sample randomly selected from the M samples, i.e. its vulnerability value P_SDC(I_i);
a first judging unit, used for increasing m by 1 and judging whether m is larger than p; if not, execution returns to the sliding window model construction unit; otherwise, the (p-1)×(2M-p)-dimensional expansion features obtained over the whole loop are output;
an expanded sample data set construction unit, used for concatenating the original n-dimensional features of each sample S with its (p-1)×(2M-p)-dimensional expansion features to obtain an expanded sample with (p-1)×(2M-p)+n features for sample S, thereby generating the expanded sample data set.
10. The system of claim 8, wherein the prediction model construction module comprises the following units, performed in sequence:
the first cascade regression forest construction unit, used for constructing the first layer of the cascade regression forest, which comprises N random forests, and taking the expanded sample data set obtained by the second sample data set construction module as the initial input vector of the deep random forest regression, so as to output an output vector comprising N enhanced features;
the second cascade regression forest construction unit, used for constructing the next layer of the cascade regression forest, concatenating the output vector v_enhanced of the previous layer with the input vector v_input to form [v_input, v_enhanced] as the input of this layer, and then evaluating the accuracy of the whole cascade forest up to this layer by cross validation, i.e. computing the mean square error between the regression results of all random forests at this layer and the true values;
the second judging unit is used for judging whether the accuracy obtained by the second cascade regression forest constructing unit is increased compared with the accuracy corresponding to the previous layer of cascade regression forest or not, and if yes, the second cascade regression forest constructing unit is executed in a returning mode; and otherwise, judging that the accuracy reaches a threshold value, not increasing the number of layers of the deep random forest any more, ending the construction and training process, obtaining an instruction vulnerability prediction model based on the deep random forest, wherein the average value of regression of all random forests in the last layer of cascade regression forest is the prediction result of the instruction vulnerability prediction model.
CN201911248246.XA 2019-12-09 2019-12-09 Instruction vulnerability prediction method and system based on deep random forest Active CN111159011B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911248246.XA CN111159011B (en) 2019-12-09 2019-12-09 Instruction vulnerability prediction method and system based on deep random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911248246.XA CN111159011B (en) 2019-12-09 2019-12-09 Instruction vulnerability prediction method and system based on deep random forest

Publications (2)

Publication Number Publication Date
CN111159011A true CN111159011A (en) 2020-05-15
CN111159011B CN111159011B (en) 2022-05-20

Family

ID=70555803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911248246.XA Active CN111159011B (en) 2019-12-09 2019-12-09 Instruction vulnerability prediction method and system based on deep random forest

Country Status (1)

Country Link
CN (1) CN111159011B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124497A1 (en) * 2015-10-28 2017-05-04 Fractal Industries, Inc. System for automated capture and analysis of business information for reliable business venture outcome prediction
US20190258807A1 (en) * 2017-09-26 2019-08-22 Mcs2, Llc Automated adjusting of devices
CN108334903A (en) * 2018-02-06 2018-07-27 南京航空航天大学 A kind of instruction SDC fragility prediction techniques based on support vector regression
CN108491317A (en) * 2018-02-06 2018-09-04 南京航空航天大学 A kind of SDC error-detecting methods of vulnerability analysis based on instruction
CN109063775A (en) * 2018-08-03 2018-12-21 南京航空航天大学 Instruction SDC fragility prediction technique based on shot and long term memory network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG QIANWEN et al.: "Instruction SDC Vulnerability Analysis Method Based on Machine Learning (基于机器学习的指令SDC脆弱性分析方法)", Journal of Chinese Computer Systems (小型微型计算机系统) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610154A (en) * 2021-08-06 2021-11-05 吉林大学 GPGPU program SDC error detection method and device
CN113610154B (en) * 2021-08-06 2023-12-29 吉林大学 GPGPU program SDC error detection method and device

Also Published As

Publication number Publication date
CN111159011B (en) 2022-05-20

Similar Documents

Publication Publication Date Title
US6473884B1 (en) Method and system for equivalence-checking combinatorial circuits using interative binary-decision-diagram sweeping and structural satisfiability analysis
Gong et al. Automatic detection of infeasible paths in software testing
US7216318B1 (en) Method and system for false path analysis
US10936474B2 (en) Software test program generation
US8230382B2 (en) Model based simulation of electronic discharge and optimization methodology for design checking
US11734480B2 (en) Performance modeling and analysis of microprocessors using dependency graphs
US20190243930A1 (en) Methods and Apparatus for Transforming the Function of an Integrated Circuit
US11409916B2 (en) Methods and apparatus for removing functional bugs and hardware trojans for integrated circuits implemented by field programmable gate array (FPGA)
JP4750665B2 (en) Timing analysis method and apparatus
CN111159011B (en) Instruction vulnerability prediction method and system based on deep random forest
Rejimon et al. An accurate probabilistic model for error detection
US6792581B2 (en) Method and apparatus for cut-point frontier selection and for counter-example generation in formal equivalence verification
Ritter et al. Formal verification of designs with complex control by symbolic simulation
JP5625297B2 (en) Delay test apparatus, delay test method, and delay test program
US6760894B1 (en) Method and mechanism for performing improved timing analysis on virtual component blocks
Ganai et al. Completeness in SMT-based BMC for software programs
JP2001052043A (en) Error diagnosis method and error site proving method for combinational verification
CN112162932B (en) Symbol execution optimization method and device based on linear programming prediction
US10852354B1 (en) System and method for accelerating real X detection in gate-level logic simulation
Liu et al. Tbem: Testing-based gpu-memory consumption estimation for deep learning
CN113901479A (en) Security assessment framework and method for transient execution attack dynamic attack link
Chockler et al. Efficient automatic STE refinement using responsibility
US8527922B1 (en) Method and system for optimal counterexample-guided proof-based abstraction
Wang et al. Fast and accurate statistical static timing analysis
Oyeniran et al. High-Level Fault Diagnosis in RISC Processors with Implementation-Independent Functional Test

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant