Background art
A software vulnerability is a latent security flaw in a program; if exploited by a computer virus, it can cause serious harm to the system. Although many vulnerabilities have been discovered and vulnerability databases have been established, the behavioral descriptions of known vulnerabilities remain incomplete. Discovering unknown vulnerabilities from these descriptions still requires extensive manual judgment, and the degree of automated reasoning is low.
Problem (1): The semantic descriptions of behavioral features in vulnerability databases are too vague, so existing feature descriptions cannot be used to judge whether unknown vulnerabilities exist in software.
Current vulnerability databases describe vulnerability behavior mainly at the semantic level. Take the Google Chrome arbitrary code execution vulnerability as an example. Its semantic description reads: "Versions of Google Chrome before 17.0.963.46 contain a race condition error that allows remote attackers to execute arbitrary code via vectors that trigger a crash of a utility process." Such a description is too imprecise. It only states that a certain type of vulnerability exists in a specific version of a specific program, and it cannot be used to infer whether unknown vulnerabilities exist in other software.
Problem (2): The classification and quantification of vulnerability behavioral features are too simple to be used for detecting unknown vulnerabilities.
When classifying the Google Chrome arbitrary code execution vulnerability, the Common Vulnerabilities and Exposures (CVE) list uses an identifier of the form database name + year of discovery + serial number, e.g. CVE-2011-3961; this carries too little classification information. China's National Computer Network Intrusion Protection Center assigned the vulnerability its own number, NIPC-2012-0428, and classified its type as a race condition error. The same classification is applied to entry CNNVD-201202-170 in the China National Vulnerability Database of Information Security. Classifying by vulnerability attributes is an improvement over classifying by time. The Common Vulnerability Scoring System (CVSS) v2 rates this vulnerability at 9.3, but this score only indicates that the vulnerability is very high risk; the numeric value is of no use for reasoning and judgment.
Problem (3): During vulnerability detection, lexical analysis cannot detect behavioral vulnerabilities, while constraint analysis produces a large amount of noise when tracing behavioral control flow and cannot clearly express behavioral semantics. Detection still requires extensive manual judgment, and the degree of automated reasoning is low.
In summary, a survey of current research at home and abroad reveals the following bottlenecks: the behavioral feature descriptions of known vulnerabilities are too vague, the classification and quantification of features are too simple, and the essential connection between behavioral features and vulnerabilities is not made explicit. As a result, unknown vulnerabilities cannot be detected by inference from these features.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing a software vulnerability detection method, based on a behavioral feature automaton model, that can accurately determine from known behavioral features whether a software vulnerability exists.
The object of the invention can be achieved by the following technical solution:
A software vulnerability detection method based on a behavioral feature automaton model, comprising the following steps:
1) loading the data-constrained automaton model from the vulnerability behavioral feature library into the process, thereby establishing an automaton model with data constraints;
2) converting the behavioral feature sequences in the vulnerability behavioral feature library into a behavioral feature language by means of the automaton model;
3) using the automaton model to iteratively judge whether each behavioral feature is unique, or to measure the similarity between individual behavioral features; if the behavioral feature is unique, performing step 4); if the behavioral feature exhibits similarity, performing step 5);
4) according to the semantics of the behavioral feature, detecting the application state with the automaton model on the basis of mathematical logic, and reporting whether a software vulnerability exists;
5) according to the semantics of the behavioral feature, detecting the application state with the automaton model on the basis of Bayesian logic, and reporting whether a software vulnerability exists.
The automaton comprises a finite-state automaton and a behavioral feature automaton. The finite-state automaton judges whether a behavioral feature is unique, and the behavioral feature automaton measures the similarity between individual behavioral features.
Detecting the application state on the basis of mathematical logic specifically means:
The semantics of the behavioral features are expressed in propositional logic or predicate logic, and the automaton model judges from the known behavioral features whether a software vulnerability exists.
Detecting the application state on the basis of Bayesian logic specifically means:
On the basis of the behavioral feature semantics, Bayes' formula is used to construct a probabilistic logic, and the automaton model judges whether a software vulnerability exists.
Compared with the prior art, the invention can accurately detect the presence of software vulnerabilities from known behavioral features, thereby improving the reliability and security of computer software.
Detailed description of the embodiments
The invention is described in detail below with reference to the accompanying drawing and a specific embodiment.
Embodiment
As shown in Figure 1, a software vulnerability detection method based on a behavioral feature automaton model comprises the following steps:
1) Representing the behavioral features of known vulnerabilities in the vulnerability behavioral feature library;
11) Establishing the automaton model BAR with data constraints:
BAR=function(Behavior(),Automaton(),reasoning())
Behavior=B(Judge,In,Out,State,Action)
Automaton=(FSM(In,Out,State,f(In),g(s)),BM(Behavior))
Reasoning=(Logic(FSM()),NoClassicalLogic(BM()))
Here Behavior denotes the behavioral feature expression, Automaton denotes the automaton, and reasoning denotes the inference method. A behavioral feature is expressed as a function of the constraint Judge, the input In, the output Out, the state State, and the behavior Action. The automaton comprises the finite-state automaton FSM(In, Out, State, f(In), g(s)) and the behavioral feature automaton BM(Behavior), where f(In) denotes the input function and g(s) denotes the state transition function. The inference methods comprise reasoning in classical logic and approximate reasoning in non-classical logic.
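As a rough illustration of how the components of BAR might be organized in code, the following Python sketch mirrors the tuples named above. The class names, field names, and the step helper are assumptions chosen to match the notation; modelling f and g as lookup tables is also an assumption, since the text does not prescribe a representation. The sample events and states are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Behavior:
    """B(Judge, In, Out, State, Action): one behavioral feature record."""
    judge: str    # data constraint, e.g. a guard condition
    inp: str      # input symbol
    out: str      # output symbol
    state: str    # state in which the behavior is observed
    action: str   # action performed

@dataclass
class FSM:
    """FSM(In, Out, State, f(In), g(s)), with f and g as lookup tables."""
    inputs: set
    outputs: set
    states: set
    f: dict   # input function: raw event -> input symbol
    g: dict   # state transition function: (state, input symbol) -> next state

    def step(self, state, event):
        """Map a raw event to an input symbol, then follow the transition."""
        symbol = self.f[event]
        return self.g[(state, symbol)]

@dataclass
class BAR:
    """Automaton model: an FSM plus the behaviors handled by BM(Behavior)."""
    fsm: FSM
    behaviors: list = field(default_factory=list)

# Hypothetical instance; none of these values come from the text.
fsm = FSM(
    inputs={"alloc", "copy"},
    outputs={"ok", "alert"},
    states={"S0", "S1"},
    f={"malloc": "alloc", "memcpy": "copy"},
    g={("S0", "alloc"): "S1", ("S1", "copy"): "S0"},
)
bar = BAR(fsm=fsm, behaviors=[Behavior("len > size", "copy", "alert", "S1", "write")])
```

Keeping the FSM and the behavior records in separate classes reflects the split between FSM(...) and BM(Behavior) in the definition of Automaton.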
12) The behavioral feature sequences in the vulnerability behavioral feature library are converted by the automaton model into the behavioral feature language L(BAR), which lays the foundation for subsequent language classification:
$$L(BAR) = \{\, w \mid w \in In,\ f(S_0, w) \in S,\ f_1(S_0, w) \in B \,\}$$
Here w is a behavior sequence accepted by the automaton BAR, and these sequences belong to the input set In. Starting from the initial state $S_0$, the state transition function $f(S_0, w)$ yields an element of the state set S; likewise, the behavior transition function $f_1(S_0, w) \in B$ yields an element of the behavior set B. Both the state transition function and the behavior transition function depend on the initial state $S_0$ and the input behavior sequence w.
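The membership condition for L(BAR) can be sketched as a small acceptance test. This is only one possible reading: it assumes that f and f_1 are given as tables keyed by (state, symbol), that f_1(S_0, w) is the behavior emitted on the last step of w, and that an undefined transition means rejection. The function name accepts and all sample symbols are hypothetical.

```python
def accepts(w, inputs, states, behaviors, f, f1, s0):
    """Return True if sequence w is in L(BAR) under the toy tables f and f1."""
    if not w or any(symbol not in inputs for symbol in w):
        return False                  # every symbol must belong to In
    state, behavior = s0, None
    for symbol in w:
        key = (state, symbol)
        if key not in f or key not in f1:
            return False              # undefined transition: reject
        behavior = f1[key]            # behavior emitted on this step
        state = f[key]                # next state
    # f(S0, w) must land in S and f1(S0, w) must land in B
    return state in states and behavior in behaviors

# Hypothetical tables for illustration only.
inputs = {"alloc", "copy", "free"}
states = {"S0", "S1", "S2"}
behaviors = {"write_past_end"}
f = {("S0", "alloc"): "S1", ("S1", "copy"): "S2"}
f1 = {("S0", "alloc"): "allocate", ("S1", "copy"): "write_past_end"}

accepted = accepts(["alloc", "copy"], inputs, states, behaviors, f, f1, "S0")
rejected = accepts(["alloc"], inputs, states, behaviors, f, f1, "S0")
```

In this toy setup the sequence alloc, copy is accepted because its final behavior belongs to B, while alloc alone is rejected because its final behavior does not.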
2) Using the automaton model to judge whether a behavioral feature is unique, or to measure the similarity between individual behavioral features;
21) The language $L(w_0)$ accepted by the finite-state automaton is a uniqueness feature of software behavior. Uniqueness is judged as follows:
Let Ω be the whole space of behavioral features L(w). The behavioral features must satisfy the following condition:
$$L(w_1) \cup L(w_2) \cup \dots \cup L(w_n) = \Omega$$
Then $L(w_1), L(w_2), \dots, L(w_n)$ form a partition of Ω. The task is to find a method of extracting the behavioral feature $L(w_i)$ from $w_i$ such that any two features have an empty intersection, i.e. $L(w_i) \cap L(w_j) = \varnothing$ for $i \neq j$. The features then possess "uniqueness" and partition the feature set.
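The uniqueness condition amounts to checking that the extracted feature sets form a partition of Ω. A minimal sketch follows, assuming each L(w_i) is available as a finite Python set; the function name is_partition and the sample elements are hypothetical.

```python
from itertools import combinations

def is_partition(feature_sets, omega):
    """True if the sets are non-empty, pairwise disjoint, and cover omega."""
    nonempty = all(feature_sets)      # an empty set is falsy
    disjoint = all(a.isdisjoint(b) for a, b in combinations(feature_sets, 2))
    covers = set().union(*feature_sets) == set(omega)
    return nonempty and disjoint and covers

# Hypothetical feature space for illustration only.
omega = {"w1", "w2", "w3", "w4"}
unique_features = [{"w1", "w2"}, {"w3"}, {"w4"}]
overlapping_features = [{"w1", "w2"}, {"w2", "w3"}, {"w4"}]
```

Here unique_features passes the check, while overlapping_features fails because two of its sets share the element w2.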
22) Similarity refers to the degree of difference between a behavior and a feature. The similarity $P(w_i \mid L(w_0))$ is the key basis for judging which type of vulnerability a behavior belongs to. It is defined as follows:
$$P(w_i \mid L(w_0)) = \begin{cases} 1, & \text{if the individual's behavioral feature is recognized by the language} \\ S(w_i)/S(w_0), & \text{otherwise} \end{cases}$$
When the individual's behavioral feature can be recognized by the language, the similarity is 1. When it cannot be recognized by the language, the similarity is the ratio of the individual feature $S(w_i)$ to the class feature $S(w_0)$.
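The two-case similarity definition can be sketched as follows. The text does not say how S is measured, so this example makes a strong assumption: features are sets of atomic behavior labels, and the ratio is taken as the fraction of the class's labels that the individual also exhibits. The function name similarity and all labels are hypothetical.

```python
def similarity(individual, class_features, language):
    """Similarity of an individual sequence to a class, per the two cases."""
    if tuple(individual) in language:
        return 1.0                            # recognized by the language
    class_set = set(class_features)
    if not class_set:
        raise ValueError("class feature set must be non-empty")
    shared = set(individual) & class_set      # assumed reading of S(w_i)
    return len(shared) / len(class_set)       # assumed reading of S(w_i)/S(w_0)

# Hypothetical language and class features for illustration only.
language = {("alloc", "copy", "overflow")}
class_features = {"alloc", "copy", "overflow", "crash"}

full_match = similarity(["alloc", "copy", "overflow"], class_features, language)
partial_match = similarity(["alloc", "copy"], class_features, language)
```

Under these assumptions a recognized sequence scores 1, and a sequence sharing two of the four class labels scores 0.5.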
3) According to the semantics of the behavioral features, the automaton model detects the application state on the basis of mathematical logic or Bayesian logic and reports whether a software vulnerability exists. For behavioral features with definite semantics, i.e. those that are unique, the features are described in propositional or predicate logic, classical logic is selected, and an automaton is constructed to perform detection and judgment. For behavioral features with ambiguous semantics, i.e. those that exhibit similarity, Bayesian logic is used to describe the features, and approximate reasoning is used to judge the likelihood that a vulnerability exists.
31) Automaton reasoning detection based on mathematical logic
The finite-state automaton handles the syntactic description of software vulnerabilities, and mathematical logic handles semantic reasoning. By applying inference methods within the automaton, the type of vulnerability can be inferred from the behavioral features that the software is known to exhibit.
311) Semantic definition of behavioral features
Starting from the instruction level of the behavioral features, behavioral semantics are expressed in propositional logic or predicate logic.
For example, define the propositions P: integer overflow vulnerability; Q: buffer overflow vulnerability; R: an overflow occurs.
An integer overflow vulnerability causes an overflow, giving the implication P → R;
A buffer overflow vulnerability causes an overflow, giving the implication Q → R;
Known conditions: an overflow has occurred, R; the vulnerability is not a buffer overflow vulnerability, ¬Q.
Question: which type of vulnerability can be inferred to exist?
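The example can be checked mechanically by enumerating truth assignments. One caveat, stated here as an assumption rather than as part of the original method: the premises P → R, Q → R, R and ¬Q do not by themselves force P in classical logic. The sketch below therefore adds an explicit closed-world premise R → (P ∨ Q), meaning an overflow can only come from one of the two listed vulnerability types. The function names are hypothetical.

```python
from itertools import product

def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b

def models(closed_world):
    """Yield every (P, Q, R) assignment that satisfies the premises."""
    for p, q, r in product([False, True], repeat=3):
        premises = [
            implies(p, r),   # P -> R
            implies(q, r),   # Q -> R
            r,               # an overflow has occurred
            not q,           # not a buffer overflow vulnerability
        ]
        if closed_world:
            premises.append(implies(r, p or q))   # assumed: R -> (P or Q)
        if all(premises):
            yield p, q, r

def entails_p(closed_world):
    """True if P holds in every model of the premises."""
    found = list(models(closed_world))
    return bool(found) and all(p for p, _, _ in found)

with_assumption = entails_p(closed_world=True)
without_assumption = entails_p(closed_world=False)
```

With the closed-world premise the only model has P true, so the integer overflow vulnerability is inferred; without it, a model with P false also satisfies the premises and nothing can be concluded about P.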
312) Defining implication and equivalence formulas
Implication formulas: (I1) P ∧ Q ⇒ P; (I2) Q, Q → R ⇒ R; (I3) P, P → R ⇒ R.
Equivalence formulas: (E1)
313) Constructing the reasoning automaton
The implication formulas and equivalence formulas are encoded in an inference engine, which then infers the unknown vulnerability from the known behavioral features. The reasoning automaton is defined as:
R={w,L(w),V,I,E}
Here R is the reasoning automaton, w is a software behavior, L(w) is a behavioral feature, V is the set of vulnerabilities, I is the set of implication formulas, and E is the set of equivalence formulas.
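One way to picture the tuple R = {w, L(w), V, I, E} is as a small object that eliminates candidate vulnerabilities. The sketch below is an assumption-laden illustration, not the patented procedure: implications are stored as a table from vulnerability to the feature it produces, the ruled_out argument is a hypothetical way to supply negative knowledge such as ¬Q, and the equivalence set is left empty because formula (E1) is not listed in the text.

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningAutomaton:
    vulnerabilities: set                              # V
    implications: dict                                # I: vulnerability -> implied feature
    equivalences: dict = field(default_factory=dict)  # E: left empty in this sketch

    def infer(self, observed_features, ruled_out=frozenset()):
        """Return the vulnerabilities in V that explain an observed feature."""
        return {
            v for v in self.vulnerabilities
            if v not in ruled_out
            and self.implications.get(v) in observed_features
        }

# Hypothetical encoding of the overflow example.
automaton = ReasoningAutomaton(
    vulnerabilities={"integer_overflow", "buffer_overflow"},
    implications={"integer_overflow": "overflow", "buffer_overflow": "overflow"},
)
candidates = automaton.infer({"overflow"}, ruled_out={"buffer_overflow"})
```

With the buffer overflow ruled out, the only remaining candidate is the integer overflow; without that exclusion both types remain candidates.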
32) Approximate automaton reasoning detection based on Bayesian logic
Cases in which the result is known and the cause must be inferred can be handled by deriving the Bayesian posterior probability.
321) Bayesian logic
Let the vulnerability events $B_1, B_2, \dots, B_n$ and the behavioral feature A satisfy the following: $B_1, B_2, \dots, B_n$ are pairwise mutually exclusive, $P(B_i) > 0$ for $i = 1, 2, \dots, n$, and $P(A) > 0$.
Prior probability $P(B_i)$: the probability that a given type of vulnerability occurs, e.g. $P(B_1)$, $P(B_2)$.
Conditional probability $P(A \mid B_i)$: the probability that the behavioral feature appears when the vulnerability is present, e.g. $P(B_1 \to A)$, $P(B_2 \to A)$.
Posterior probability $P(B_i \mid A)$: the probability that a given vulnerability has occurred once the software has behaved abnormally, e.g. $P(A \to B_1) = ?$
Bayesian logic:
$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A \mid B_j)}$$
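The posterior computation can be sketched in a few lines using Bayes' formula with the denominator expanded by the law of total probability. It assumes the vulnerability types are mutually exclusive and together account for every occurrence of A. The function name posterior is hypothetical, and the numbers are invented purely for illustration; they are not measured values.

```python
def posterior(priors, likelihoods):
    """Return P(B_i | A) for each i, given P(B_i) and P(A | B_i)."""
    if priors.keys() != likelihoods.keys():
        raise ValueError("priors and likelihoods must cover the same events")
    joint = {b: priors[b] * likelihoods[b] for b in priors}   # P(B_i) P(A | B_i)
    evidence = sum(joint.values())                            # P(A)
    if evidence <= 0:
        raise ValueError("P(A) must be positive")
    return {b: joint[b] / evidence for b in joint}

# Made-up probabilities for two vulnerability types.
priors = {"integer_overflow": 0.3, "buffer_overflow": 0.7}
likelihoods = {"integer_overflow": 0.8, "buffer_overflow": 0.4}
post = posterior(priors, likelihoods)
most_likely = max(post, key=post.get)
```

With these made-up inputs the posteriors are roughly 0.46 for the integer overflow and 0.54 for the buffer overflow, so the buffer overflow would be reported as the more likely cause.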
322) Bayesian approximate reasoning in the automaton
In the automaton, the implication formula (I2) Q, Q → R ⇒ R can be expressed in probabilistic form, written P(I2), using P(Q) and P(R | Q). From the prior probability $P(B_i)$ and the conditional probability $P(A \mid B_i)$, the posterior probability $P(B_i \mid A)$ is computed, so that logical reasoning is expressed by the Bayesian method. The automaton can then perform approximate reasoning on the behavioral semantic features according to the probability values.
The automaton approximate reasoning method applies under the following conditions: vulnerabilities comprise several types, the types are mutually exclusive, and each type occurs with probability greater than 0. When the probability of each vulnerability occurring and the probability that the software behaves abnormally once that vulnerability has occurred are both known, and the task is to infer the probability that a certain type of vulnerability has occurred given that the software has behaved abnormally, the method can be used.
Introducing conditional probability and Bayes' formula into the automaton inference system better reflects the probabilistic law of "cause and effect" and supports inference "from effect back to cause", so that unknown software vulnerabilities can be inferred from known behavioral features.
On the basis of the behavioral feature semantics, Bayes' formula is used to construct a probabilistic logic and an automaton inference method for inferring unknown vulnerabilities. The method applies where the causal relationship between behavioral features and vulnerabilities has been revealed, and it realizes inference from known behavioral features to unknown vulnerabilities.