Summary of the invention:
The object of the invention is to overcome the deficiencies of the prior art by providing a spectral
method for identifying computer software behavior.
To solve the above technical problem, the present invention provides a spectral method for identifying
computer software behavior, comprising the following steps:
(1) Construct the software behavior representation model: represent the software behavior of S or G
by the two-tuple (A*, B*) of DHMM model parameters;
(2) Extract the software behavior features: perform spectral decomposition on matrix A* and extract
the software behavior feature D;
(3) Measure the software behavior similarity: according to B* and D, calculate the software behavior
similarity between two computer programs, between two computer program groups, or between one
computer program and one program group.
Further, step (1), constructing the software behavior representation model, covers two cases:
The first case: the software behavior representation model of a single computer program
S has M kinds of software behavior, and each behavior corresponds to one hidden state of DHMM(S). The
software behavior of S is represented by the model (A*, B*), and the specific steps are as follows:
Set the number M of hidden states and give the initial state probability distribution C;
Input computer program S;
Call a DHMM training algorithm to find the model parameters A and B that maximize P[S | A, B, C],
and denote them A* and B* respectively;
The second case: the software behavior representation model of a computer program group
For a computer program group G = {S_1, S_2, ..., S_N}, G has M kinds of software behavior, and each
behavior corresponds to one hidden state shared by DHMM(S_1), DHMM(S_2), ..., DHMM(S_N). The software
behavior of G is represented by the model (A*, B*), and the specific steps are as follows:
Set the number M of hidden states and give the initial state probability distribution C;
Input all computer programs S_1, S_2, ..., S_N in G;
Call a DHMM training algorithm to find the model parameters A and B that maximize
P[S_1 | A, B, C] × P[S_2 | A, B, C] × ... × P[S_N | A, B, C], and denote them A* and B* respectively.
Further, in step (2), extracting the software behavior features, the software behavior features are
represented by an M × M matrix D = {d_ij}_(M×M). The elements of the i-th row of D (i = 1, 2, ..., M)
form a row vector D_i = <d_i1, d_i2, ..., d_iM>. The specific steps are as follows:
Input the software behavior model (A*, B*) of computer program S or computer program group G;
Perform spectral decomposition on matrix A*, decomposing it as A* = X Σ X^(-1), where Σ is a diagonal
matrix whose diagonal elements are the eigenvalues of A*, and each column vector of matrix X is the
eigenvector corresponding to one eigenvalue;
Sort the M eigenvalues in Σ by numerical value;
Denote the eigenvector in X corresponding to the 1st eigenvalue after sorting as D_1, the eigenvector
in X corresponding to the 2nd eigenvalue as D_2, and so on; the eigenvector in X corresponding to the
i-th eigenvalue after sorting is denoted D_i, i = 1, 2, ..., M.
Further, step (3) uses the software behavior model (A*, B*) output by step (1) and the software
behavior feature D output by step (2) to calculate the software behavior similarity or dissimilarity
between two programs (denoted T_1 and T_2 respectively), between a single program (denoted T_1) and a
program group (denoted T_2), or between two program groups (denoted T_1 and T_2 respectively). The
higher the dissimilarity, the lower the similarity, and vice versa; a higher similarity means that T_1
and T_2 have more similar software behavior. The similarity measurement uses two kinds of matrices:
B* from the software behavior representation model and the software behavior feature D. To
distinguish them, the two matrices of T_1 are denoted ¹B* and ¹D, and the two matrices of T_2 are
denoted ²B* and ²D. The specific steps are as follows:
Define a dissimilarity dist(Y, Z) between matrices, where Y and Z are any two matrices of the same order;
Input ¹B*, ¹D, ²B* and ²D;
Measure the software behavior dissimilarity between T_1 and T_2 with the formula
[dist(¹B*, ²B*)]^α × [dist(¹D, ²D)]^β, where α and β are two real numbers greater than or equal to 0.
Compared with the prior art, the beneficial effects of the present invention are as follows:
high-level software behavior features are abstracted from the low-level features that represent
software behavior, so that the behavior of software is described at the semantic level; through DHMM
(discrete hidden Markov model) modeling of computer programs and the spectral decomposition method,
the software behavior features of a program are expressed quantitatively, and malware is identified
according to the similarity of the representation models and behavior features.
Detailed description of the invention:
The invention is further described below with reference to the accompanying drawings and specific
embodiments.
The present invention relates to a method for identifying computer software behavior. It uses the
state transition probability matrix and the emission probability matrix of a discrete hidden Markov
model (DHMM) to describe the behavior of software, represents the behavior features of the software
by the result of the spectral decomposition of the state transition probability matrix, and finally
identifies the similarity of software behavior according to the behavior features and the emission
probability matrix. The method flow is shown in Fig. 1 and comprises the following steps:
(1) Construct the software behavior representation model: represent the software behavior of S or G
by the two-tuple (A*, B*) of DHMM model parameters;
(2) Extract the software behavior features: perform spectral decomposition on matrix A* and extract
the software behavior feature D;
(3) Measure the software behavior similarity: according to B* and D, calculate the software behavior
similarity between two computer programs, between two computer program groups, or between one
computer program and one program group.
The present invention processes computer programs represented as event sequences. An event sequence
is a string of events that has an ordering relation in time or space. When it is used to represent a
computer program, an event may be a computer instruction or instruction sequence that the program
contains or that is actually executed on the CPU; it may be an API function, provided to application
programs by the computer operating system or computer equipment, that the program contains or calls
during execution; or it may be any other discrete symbol describing software features. The notation
used is as follows:
1. Event set: V = {V_1, ..., V_k, ..., V_K}. Each element of the set (denoted V_k, k = 1, 2, ..., K)
represents one event (a discrete symbol), and K is the number of events;
2. Single computer program: S = (s_1, ..., s_t, ..., s_n), meaning that the computer program consists
of n events in order. Each event (denoted s_t, t = 1, 2, ..., n) is an element of V, i.e., s_t ∈ V;
3. Computer program group: G = {S_1, S_2, ..., S_N}, meaning that the group consists of N programs,
each of which is an event sequence as defined in item 2;
4. Set of software behaviors: U = {θ_1, θ_2, ..., θ_M}. Each element of the set represents one
abstract software behavior, and M is the number of behaviors;
5. Discrete hidden Markov model DHMM(S) of computer program S: DHMM(S) = (S, Q, A, B, C), where:
- the observation sequence of the model is S = (s_1, ..., s_t, ..., s_n), s_t ∈ V = {V_1, ..., V_k, ..., V_K};
- the state sequence of the model is Q = (q_1, ..., q_t, ..., q_n), where each element (denoted q_t,
t = 1, 2, ..., n) is a hidden state of the model corresponding to one software behavior; the number
of hidden states is M, i.e., q_t ∈ U;
- the state transition probability matrix is A = (a_ij)_(M×M), a_ij = P[q_(t+1) = θ_j | q_t = θ_i],
1 ≤ i ≤ M, 1 ≤ j ≤ M;
- the state emission probability matrix is B = (b_ik)_(M×K), b_ik = P[s_t = V_k | q_t = θ_i],
1 ≤ i ≤ M, 1 ≤ k ≤ K;
- the initial state probability distribution is C = {c_1, ..., c_j, ..., c_M}, c_j = P[q_1 = θ_j], 1 ≤ j ≤ M.
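The notation in items 1 to 5 can be illustrated with a minimal Python sketch. It encodes an event sequence as indices into V and evaluates P[S | A, B, C] with the standard forward algorithm. The event names, the matrix values and the function names `encode` and `sequence_likelihood` are hypothetical and chosen only for illustration; they do not come from the invention.

```python
import numpy as np

def encode(events, vocabulary):
    """Map each event symbol to its index k in the event set V."""
    index = {symbol: k for k, symbol in enumerate(vocabulary)}
    return [index[e] for e in events]

def sequence_likelihood(obs, A, B, C):
    """Forward algorithm: return P[S | A, B, C] for an encoded sequence."""
    alpha = C * B[:, obs[0]]             # alpha_1(i) = c_i * b_i(s_1)
    for k in obs[1:]:
        # alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(s_{t+1})
        alpha = (alpha @ A) * B[:, k]
    return float(alpha.sum())

# Hypothetical toy model with M = 2 hidden behaviors and K = 3 events.
V = ["open", "read", "close"]
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])               # state transition probabilities
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])          # emission probabilities
C = np.array([0.6, 0.4])                 # initial state distribution

S = encode(["open", "read", "close"], V)
print(S, sequence_likelihood(S, A, B, C))
```

The unscaled recursion is adequate for short sequences; long event sequences would normally need rescaling or log-domain arithmetic to avoid underflow.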
Fig. 2 is a schematic diagram of the principle of the present invention. The detailed process of each
step is explained below.
Further, step (1), constructing the software behavior representation model, covers two cases:
The first case: the software behavior representation model of a single computer program
S has M kinds of software behavior, and each behavior corresponds to one hidden state of DHMM(S). The
software behavior of S is represented by the model (A*, B*), and the specific steps are as follows:
Set the number M of hidden states and give the initial state probability distribution C;
Input computer program S;
Call a DHMM training algorithm to find the model parameters A and B that maximize P[S | A, B, C],
and denote them A* and B* respectively;
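The text does not name a particular DHMM training algorithm. As one possible choice, the sketch below uses Baum-Welch (expectation-maximization) re-estimation with the given C held fixed. The function name `baum_welch`, the random initialization, the iteration count and the toy sequence are all assumptions made for illustration. Baum-Welch finds a local maximum of P[S | A, B, C], not necessarily the global one.

```python
import numpy as np

def baum_welch(obs, M, K, C, n_iter=30, seed=0):
    """Estimate (A*, B*) from one encoded event sequence with C held fixed."""
    rng = np.random.default_rng(seed)
    A = rng.random((M, M))
    A /= A.sum(axis=1, keepdims=True)          # random row-stochastic start
    B = rng.random((M, K))
    B /= B.sum(axis=1, keepdims=True)
    obs = np.asarray(obs)
    n = len(obs)
    history = []                               # log P[S | A, B, C] per iteration
    for _ in range(n_iter):
        # Forward pass, rescaled at every step to avoid underflow.
        alpha = np.zeros((n, M))
        scale = np.zeros(n)
        alpha[0] = C * B[:, obs[0]]
        scale[0] = alpha[0].sum()
        alpha[0] /= scale[0]
        for t in range(1, n):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            scale[t] = alpha[t].sum()
            alpha[t] /= scale[t]
        history.append(float(np.log(scale).sum()))
        # Backward pass using the same scale factors.
        beta = np.ones((n, M))
        for t in range(n - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
        gamma = alpha * beta                   # gamma[t, i] = P[q_t = theta_i | S]
        xi = np.zeros((M, M))                  # expected transition counts
        for t in range(n - 1):
            xi += (alpha[t][:, None] * A
                   * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / scale[t + 1]
        # Re-estimate A and B from the expected counts.
        A = xi / xi.sum(axis=1, keepdims=True)
        B = np.stack([gamma[obs == k].sum(axis=0) for k in range(K)], axis=1)
        B /= B.sum(axis=1, keepdims=True)
    return A, B, history

# Hypothetical encoded program: indices into an event set with K = 3 events.
S = [0, 1, 2, 0, 1, 2, 0, 0, 1, 2, 2, 1, 0, 1, 2, 0, 1, 2]
C = np.array([0.5, 0.5])
A_star, B_star, history = baum_welch(S, M=2, K=3, C=C)
print(A_star, B_star, history[-1])
```

The returned `history` holds the log-likelihood at the start of each iteration, which makes it easy to check that training has levelled off before A* and B* are used in the later steps.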
The second case: the software behavior representation model of a computer program group
For a computer program group G = {S_1, S_2, ..., S_N}, G has M kinds of software behavior, and each
behavior corresponds to one hidden state shared by DHMM(S_1), DHMM(S_2), ..., DHMM(S_N). The software
behavior of G is represented by the model (A*, B*), and the specific steps are as follows:
Set the number M of hidden states and give the initial state probability distribution C;
Input all computer programs S_1, S_2, ..., S_N in G;
Call a DHMM training algorithm to find the model parameters A and B that maximize
P[S_1 | A, B, C] × P[S_2 | A, B, C] × ... × P[S_N | A, B, C], and denote them A* and B* respectively.
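The group objective is a product of per-program likelihoods, so in practice it is usually evaluated as a sum of log-likelihoods, which has the same maximizer and does not underflow. The sketch below shows only this objective, not a training procedure; the function names and the toy values are assumptions for illustration.

```python
import numpy as np

def log_likelihood(obs, A, B, C):
    """Return log P[S | A, B, C] using a rescaled forward pass."""
    alpha = C * B[:, obs[0]]
    total = 0.0
    for k in obs[1:]:
        s = alpha.sum()
        total += np.log(s)                   # accumulate the log of each scale factor
        alpha = ((alpha / s) @ A) * B[:, k]
    return float(total + np.log(alpha.sum()))

def group_log_likelihood(group, A, B, C):
    """log(P[S_1|A,B,C] x ... x P[S_N|A,B,C]), the quantity maximized for G."""
    return sum(log_likelihood(S, A, B, C) for S in group)

# Hypothetical toy model (M = 2, K = 3) and a group of two encoded programs.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
C = np.array([0.6, 0.4])
G = [[0, 1, 2], [0, 0, 1]]

group_ll = group_log_likelihood(G, A, B, C)
print(group_ll)
```

An expectation-maximization trainer for the group case would typically accumulate the expected transition and emission counts over all N sequences before normalizing, so that a single (A*, B*) is fitted to the whole group.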
Further, in step (2), extracting the software behavior features, the software behavior features are
represented by an M × M matrix D = {d_ij}_(M×M). The elements of the i-th row of D (i = 1, 2, ..., M)
form a row vector D_i = <d_i1, d_i2, ..., d_iM>. The specific steps are as follows:
Input the software behavior model (A*, B*) of computer program S or computer program group G;
Perform spectral decomposition on matrix A*, decomposing it as A* = X Σ X^(-1), where Σ is a diagonal
matrix whose diagonal elements are the eigenvalues of A*, and each column vector of matrix X is the
eigenvector corresponding to one eigenvalue;
Sort the M eigenvalues in Σ by numerical value;
Denote the eigenvector in X corresponding to the 1st eigenvalue after sorting as D_1, the eigenvector
in X corresponding to the 2nd eigenvalue as D_2, and so on; the eigenvector in X corresponding to the
i-th eigenvalue after sorting is denoted D_i, i = 1, 2, ..., M.
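The feature extraction steps above can be sketched with NumPy's `numpy.linalg.eig`. The text does not state the sort direction, so descending order by modulus is assumed here; the function name `behavior_features` and the example matrix are likewise hypothetical. Eigenvectors are determined only up to a scale factor, and `numpy.linalg.eig` returns them with unit length.

```python
import numpy as np

def behavior_features(A_star):
    """Return sorted eigenvalues and D, whose i-th row is the matching eigenvector."""
    eigvals, X = np.linalg.eig(A_star)       # columns of X are eigenvectors of A*
    order = np.argsort(-np.abs(eigvals))     # assumption: descending by modulus
    return eigvals[order], X[:, order].T     # transpose so eigenvectors become rows D_i

# Hypothetical 2 x 2 transition matrix A* (each row sums to 1).
A_star = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
eigvals, D = behavior_features(A_star)
print(eigvals)
print(D)
```

For a general non-symmetric A* the eigenvalues and eigenvectors may be complex, in which case D is a complex matrix and any dissimilarity applied to it has to handle complex entries.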
Further, step (3) uses the software behavior model (A*, B*) output by step (1) and the software
behavior feature D output by step (2) to calculate the software behavior similarity or dissimilarity
between two programs (denoted T_1 and T_2 respectively), between a single program (denoted T_1) and a
program group (denoted T_2), or between two program groups (denoted T_1 and T_2 respectively). The
higher the dissimilarity, the lower the similarity, and vice versa; a higher similarity means that T_1
and T_2 have more similar software behavior. The similarity measurement uses two kinds of matrices:
B* from the software behavior representation model and the software behavior feature D. To
distinguish them, the two matrices of T_1 are denoted ¹B* and ¹D, and the two matrices of T_2 are
denoted ²B* and ²D. The specific steps are as follows:
Define a dissimilarity dist(Y, Z) between matrices, where Y and Z are any two matrices of the same order;
Input ¹B*, ¹D, ²B* and ²D;
Measure the software behavior dissimilarity between T_1 and T_2 with the formula
[dist(¹B*, ²B*)]^α × [dist(¹D, ²D)]^β, where α and β are two real numbers greater than or equal to 0.
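The dissimilarity formula can be sketched as follows. The text leaves dist(Y, Z) open, so the Frobenius norm of Y - Z is assumed here purely as an example; the function names and the sample matrices are also hypothetical.

```python
import numpy as np

def dist(Y, Z):
    """Assumed matrix dissimilarity: Frobenius norm of Y - Z (same-order matrices)."""
    return float(np.linalg.norm(np.asarray(Y) - np.asarray(Z)))

def behavior_dissimilarity(B1, D1, B2, D2, alpha=1.0, beta=1.0):
    """Evaluate [dist(B1, B2)]^alpha * [dist(D1, D2)]^beta with alpha, beta >= 0."""
    return dist(B1, B2) ** alpha * dist(D1, D2) ** beta

# Hypothetical matrices for two programs T_1 and T_2.
B1 = np.array([[0.5, 0.5], [0.2, 0.8]])
B2 = np.array([[0.4, 0.6], [0.3, 0.7]])
D1 = np.array([[0.7, 0.7], [-0.4, 0.9]])
D2 = np.array([[0.6, 0.8], [-0.5, 0.9]])

print(behavior_dissimilarity(B1, D1, B2, D2, alpha=1.0, beta=2.0))
```

Setting α or β to 0 removes the corresponding factor, so the measure can rely on the emission matrices alone, on the spectral features alone, or on a weighted combination of both.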
Finally, it should be noted that the above embodiments are intended only to illustrate the technical
solution of the present invention and not to limit its scope of protection. Although the present
invention has been described in detail with reference to specific embodiments, those of ordinary
skill in the art should understand that the technical solution of the present invention may be
modified or equivalently replaced without departing from the spirit and scope of the technical
solution of the present invention.