CN103778372B - A spectral method for identifying computer software behavior - Google Patents

A spectral method for identifying computer software behavior

Info

Publication number
CN103778372B
Authority
CN
China
Prior art keywords
software behavior
software
model
computer program
matrix
Prior art date
Application number
CN201410012074.7A
Other languages
Chinese (zh)
Other versions
CN103778372A (en)
Inventor
Chen Lifei (陈黎飞)
Chen Keyi (陈可意)
Original Assignee
Fujian Normal University (福建师范大学)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Normal University
Priority to CN201410012074.7A
Publication of CN103778372A
Application granted
Publication of CN103778372B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Abstract

The invention discloses a spectral method for identifying computer software behavior, comprising: (1) constructing a software behavior representation model; (2) extracting software behavior features; and (3) measuring software behavior similarity. The beneficial effects of the invention are as follows: high-level software behavior features are abstracted from the low-level features that represent software behavior, so that the behavior of software is described at the semantic level; and through DHMM (discrete hidden Markov model) modeling of computer programs and spectral decomposition, the software behavior features of a program are expressed quantitatively, so that malware is identified according to the similarity of representation models and behavior features.

Description

A spectral method for identifying computer software behavior

Technical field:

The present invention relates to a spectral method for identifying computer software behavior.

Background art:

Computer software behavior identification technology is used to help determine whether a computer program is malicious software (malware). Current methods use low-level features that represent software behavior (including signatures, API sequences, and the like) and predict software behavior either by signature matching or by machine-learning-based sequence pattern matching. The former works only for known malware: once a piece of malware mutates, the signature database must be updated promptly. The latter suffers from high false-positive and false-negative rates.

Summary of the invention:

The object of the present invention is to overcome the deficiencies of the prior art by providing a spectral method for identifying computer software behavior.

To solve the above technical problem, the present invention provides a spectral method for identifying computer software behavior, comprising the following steps:

(1) Construct the software behavior representation model: represent the software behavior of a computer program S or a computer program group G by the pair of DHMM model parameters (A*, B*);

(2) Extract software behavior features: perform spectral decomposition on the matrix A* and extract the software behavior feature D;

(3) Measure software behavior similarity: according to B* and D, calculate the software behavior similarity between two computer programs, between two computer program groups, or between one computer program and one program group.

Further, in step (1), the software behavior representation model is constructed in one of two cases:

Case 1: the software behavior representation model of a single computer program

S has M kinds of software behavior, and each kind of behavior corresponds to one hidden state of DHMM(S). The software behavior of S is represented by the model (A*, B*); the specific steps are as follows:

Set the number M of hidden states and give the initial state probability distribution C;

Input the computer program S;

Call a DHMM training algorithm to find the model parameters A and B that maximize P[S | A, B, C], and denote them A* and B*, respectively;

Case 2: the software behavior representation model of a computer program group

For a computer program group G = {S_1, S_2, ..., S_N}, G has M kinds of software behavior, and each kind of behavior corresponds to one hidden state shared by DHMM(S_1), DHMM(S_2), ..., DHMM(S_N). The software behavior of G is represented by the model (A*, B*); the specific steps are as follows:

Set the number M of hidden states and give the initial state probability distribution C;

Input all computer programs S_1, S_2, ..., S_N in G;

Call a DHMM training algorithm to find the model parameters A and B that maximize P[S_1 | A, B, C] × P[S_2 | A, B, C] × ... × P[S_N | A, B, C], and denote them A* and B*, respectively.
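The two cases optimize the same kind of objective, which can be written compactly as follows. This is only a restatement of the two criteria above. The logarithmic form is a standard equivalent added for illustration (the logarithm is monotonic, so the maximizer is unchanged), and the row-sum constraints are the usual conditions on probability matrices, which the text does not state explicitly. K denotes the number of distinct events.

```latex
(A^{*}, B^{*})
  = \arg\max_{A,B} \prod_{n=1}^{N} P[S_n \mid A, B, C]
  = \arg\max_{A,B} \sum_{n=1}^{N} \log P[S_n \mid A, B, C],
\qquad
\text{subject to } \sum_{j=1}^{M} a_{ij} = 1,\quad \sum_{k=1}^{K} b_{ik} = 1 .
```

Case 1 is the special case N = 1, with S_1 = S.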

Further, in step (2), the software behavior features are represented by an M × M software behavior feature matrix D = {d_ij}_{M×M}; the elements of the i-th row of D (i = 1, 2, ..., M) form a row vector D_i = <d_i1, d_i2, ..., d_iM>. The specific steps are as follows:

Input the software behavior model (A*, B*) of the computer program S or the computer program group G;

Perform spectral decomposition on the matrix A*, decomposing it as A* = X Σ X⁻¹, where Σ is a diagonal matrix whose diagonal elements are the eigenvalues of A*, and each column vector of the matrix X is the eigenvector corresponding to the respective eigenvalue;

Sort the M eigenvalues in Σ by numerical magnitude;

Denote the eigenvector in X corresponding to the 1st eigenvalue after sorting as D_1, the eigenvector in X corresponding to the 2nd eigenvalue as D_2, and so on; the eigenvector in X corresponding to the i-th eigenvalue after sorting is denoted D_i, i = 1, 2, ..., M.

Further, step (3) uses the software behavior model (A*, B*) output by step (1) and the software behavior feature D output by step (2) to calculate the software behavior similarity or dissimilarity between two programs (denoted T_1 and T_2, respectively), between a single program (denoted T_1) and a program group (denoted T_2), or between two program groups (denoted T_1 and T_2, respectively). The higher the dissimilarity, the lower the similarity, and vice versa; a higher similarity indicates that T_1 and T_2 have more similar software behavior. The similarity measure uses two kinds of matrices: B* in the software behavior representation model and the software behavior feature D. To distinguish them, the two matrices of T_1 are denoted ¹B* and ¹D, and those of T_2 are denoted ²B* and ²D. The specific steps are as follows:

Set a dissimilarity dist(Y, Z) between matrices, where Y and Z denote any two matrices of the same order;

Input ¹B*, ¹D, ²B*, and ²D;

Use the formula [dist(¹B*, ²B*)]^α × [dist(¹D, ²D)]^β to measure the software behavior dissimilarity between T_1 and T_2, where α and β are two real numbers greater than or equal to 0.
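As a small worked illustration of the formula above (the distance values and exponents are hypothetical numbers chosen only to show the arithmetic; they do not come from the patent):

```latex
\operatorname{dist}({}^{1}B^{*}, {}^{2}B^{*}) = 0.5,\quad
\operatorname{dist}({}^{1}D, {}^{2}D) = 0.2,\quad
\alpha = 1,\ \beta = 2
\;\Longrightarrow\;
0.5^{1} \times 0.2^{2} = 0.5 \times 0.04 = 0.02 .
```

With α = 0, the B* factor equals 1 for any nonzero distance, so only the feature matrices D contribute; β = 0 likewise leaves only B*. Because the two factors are multiplied, a zero distance in either factor (with a positive exponent) gives a dissimilarity of 0 regardless of the other factor.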

Compared with the prior art, the beneficial effects of the present invention are as follows: high-level software behavior features are abstracted from the low-level features that represent software behavior, so that the behavior of software is described at the semantic level; and through DHMM (discrete hidden Markov model) modeling of computer programs and spectral decomposition, the software behavior features of a program are expressed quantitatively, so that malware is identified according to the similarity of representation models and behavior features.

Brief description of the drawings:

Fig. 1 is the flow chart of the present invention.

Fig. 2 is a schematic diagram of the principle of the present invention.

Detailed description of the embodiments:

The invention is further described below with reference to the accompanying drawings and specific embodiments:

The present invention relates to a method for identifying computer software behavior. It uses the state transition probability matrix and the emission probability matrix of a discrete hidden Markov model (DHMM) to describe the behavior of software, represents the behavior features of the software by the result of the spectral decomposition of the state transition probability matrix, and finally identifies the similarity of software behavior according to the behavior features and the emission probability matrix. The method flow is shown in Fig. 1 and comprises the following steps:

(1) Construct the software behavior representation model: represent the software behavior of a computer program S or a computer program group G by the pair of DHMM model parameters (A*, B*);

(2) Extract software behavior features: perform spectral decomposition on the matrix A* and extract the software behavior feature D;

(3) Measure software behavior similarity: according to B* and D, calculate the software behavior similarity between two computer programs, between two computer program groups, or between one computer program and one program group.

The present invention processes computer programs represented as event sequences. An event sequence is a string of events that are ordered in time or space. When it is used to represent a computer program, an event may be a computer instruction or instruction sequence that the program contains or actually executes on the CPU; it may be an API function, provided by the computer operating system or a computer device for applications to call, that the program contains or calls during execution; or it may be another discrete symbol that describes software features. The notation used is as follows:

1. Event set: V = {V_1, ..., V_k, ..., V_K}. Each element of the set (denoted V_k, k = 1, 2, ..., K) represents one event (a discrete symbol), and K is the number of events;

2. Single computer program: S = (s_1, ..., s_t, ..., s_n), meaning that the computer program consists of n events in order, each of which (denoted s_t, t = 1, 2, ..., n) is an element of V, i.e., s_t ∈ V;

3. Computer program group: G = {S_1, S_2, ..., S_N}, meaning that the group consists of N programs, each of which is an event sequence as defined in item 2;

4. A set of software behaviors: U = {θ_1, θ_2, ..., θ_M}. Each element of the set represents an abstract software behavior, and M is the number of behaviors;

5. The discrete hidden Markov model DHMM(S) of a computer program S: DHMM(S) = (S, Q, A, B, C), where:

- The observation sequence of the model is S = (s_1, ..., s_t, ..., s_n), s_t ∈ V = {V_1, ..., V_k, ..., V_K};

- The state sequence of the model is Q = (q_1, ..., q_t, ..., q_n), each element of which (denoted q_t, t = 1, 2, ..., n) is a hidden state of the model corresponding to one software behavior; the number of hidden states is M, i.e., q_t ∈ U;

- State transition probability matrix A = (a_ij)_{M×M}, a_ij = P[q_{t+1} = θ_j | q_t = θ_i], 1 ≤ i ≤ M, 1 ≤ j ≤ M;

- State emission probability matrix B = (b_ik)_{M×K}, b_ik = P[V_k at t | q_t = θ_i], 1 ≤ i ≤ M, 1 ≤ k ≤ K;

- Initial state probability distribution C = {c_1, ..., c_j, ..., c_M}, c_j = P[q_1 = θ_j], 1 ≤ j ≤ M.
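The notation above can be made concrete with a short sketch. The following Python code is illustrative only: the event names, the helper names `encode` and `log_likelihood`, and all matrix values are hypothetical and are not taken from the patent. It encodes a program as a sequence of event indices and evaluates log P[S | A, B, C] with the standard scaled forward algorithm for a discrete HMM.

```python
import numpy as np

def encode(events, vocabulary):
    """Map event names to 0-based indices into the event set V."""
    index = {name: k for k, name in enumerate(vocabulary)}
    return np.array([index[e] for e in events], dtype=int)

def log_likelihood(obs, A, B, C):
    """Return log P[S | A, B, C] using the scaled forward algorithm."""
    alpha = C * B[:, obs[0]]          # joint probability of first event and state
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale             # rescale to avoid numerical underflow
    for k in obs[1:]:
        alpha = (alpha @ A) * B[:, k] # propagate one step, then emit event k
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha = alpha / scale
    return log_p

# Hypothetical event set V (K = 4) and program S (n = 5 events).
V = ["open", "read", "write", "close"]
S = encode(["open", "read", "read", "write", "close"], V)

# Hypothetical model with M = 2 hidden states (abstract behaviors).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])            # a_ij, each row sums to 1
B = np.array([[0.4, 0.3, 0.2, 0.1],
              [0.1, 0.2, 0.3, 0.4]])  # b_ik, each row sums to 1
C = np.array([0.5, 0.5])              # initial state distribution

ll = log_likelihood(S, A, B, C)
print(ll)
```

Working in the log domain with per-step rescaling is a common design choice for long event sequences, where the raw probability would underflow.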

Fig. 2 is a schematic diagram of the principle of the present invention. The detailed process of each step is explained below.

Further, in step (1), the software behavior representation model is constructed in one of two cases:

Case 1: the software behavior representation model of a single computer program

S has M kinds of software behavior, and each kind of behavior corresponds to one hidden state of DHMM(S). The software behavior of S is represented by the model (A*, B*); the specific steps are as follows:

Set the number M of hidden states and give the initial state probability distribution C;

Input the computer program S;

Call a DHMM training algorithm to find the model parameters A and B that maximize P[S | A, B, C], and denote them A* and B*, respectively;

Case 2: the software behavior representation model of a computer program group

For a computer program group G = {S_1, S_2, ..., S_N}, G has M kinds of software behavior, and each kind of behavior corresponds to one hidden state shared by DHMM(S_1), DHMM(S_2), ..., DHMM(S_N). The software behavior of G is represented by the model (A*, B*); the specific steps are as follows:

Set the number M of hidden states and give the initial state probability distribution C;

Input all computer programs S_1, S_2, ..., S_N in G;

Call a DHMM training algorithm to find the model parameters A and B that maximize P[S_1 | A, B, C] × P[S_2 | A, B, C] × ... × P[S_N | A, B, C], and denote them A* and B*, respectively.
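The text does not name a particular DHMM training algorithm. As one possible choice, the sketch below uses Baum-Welch (expectation-maximization) re-estimation with C held fixed, which is an assumption made for illustration. The function names, the random initialization, the iteration count, and the example sequences are all hypothetical. Passing a one-element list covers Case 1; passing all programs of G covers Case 2, since the product of likelihoods becomes a sum of log-likelihoods.

```python
import numpy as np

def forward_backward(obs, A, B, C):
    """Scaled forward-backward pass for one sequence of 0-based event indices."""
    n, M = len(obs), A.shape[0]
    alpha = np.zeros((n, M))
    beta = np.ones((n, M))
    scale = np.zeros(n)
    alpha[0] = C * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(n - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    return alpha, beta, scale

def train_dhmm(sequences, M, K, C, n_iter=20, seed=0):
    """Baum-Welch re-estimation of A and B with C held fixed.

    Returns (A_star, B_star, history); history[i] is the summed
    log-likelihood of all sequences before the i-th update.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((M, M))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((M, K))
    B /= B.sum(axis=1, keepdims=True)
    history = []
    for _ in range(n_iter):
        A_num = np.zeros((M, M))      # expected transition counts
        B_num = np.zeros((M, K))      # expected emission counts
        total = 0.0
        for obs in sequences:
            alpha, beta, scale = forward_backward(obs, A, B, C)
            total += np.log(scale).sum()
            gamma = alpha * beta      # posterior state probabilities per step
            for t in range(len(obs) - 1):
                A_num += (alpha[t][:, None] * A
                          * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / scale[t + 1]
            for t, k in enumerate(obs):
                B_num[:, k] += gamma[t]
        history.append(total)
        A = A_num / A_num.sum(axis=1, keepdims=True)
        B = B_num / B_num.sum(axis=1, keepdims=True)
    return A, B, history

# Hypothetical program group G with N = 2 programs over K = 3 events.
G = [np.array([0, 1, 1, 2, 0, 1, 2, 2]), np.array([0, 0, 1, 2, 2, 1, 0])]
C = np.array([0.5, 0.5])
A_star, B_star, history = train_dhmm(G, M=2, K=3, C=C)
print(A_star, B_star, history[-1])
```

Baum-Welch converges to a local maximum only, so the returned (A*, B*) depends on the initialization; in practice several random restarts may be compared by their final log-likelihood.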

Further, in step (2), the software behavior features are represented by an M × M software behavior feature matrix D = {d_ij}_{M×M}; the elements of the i-th row of D (i = 1, 2, ..., M) form a row vector D_i = <d_i1, d_i2, ..., d_iM>. The specific steps are as follows:

Input the software behavior model (A*, B*) of the computer program S or the computer program group G;

Perform spectral decomposition on the matrix A*, decomposing it as A* = X Σ X⁻¹, where Σ is a diagonal matrix whose diagonal elements are the eigenvalues of A*, and each column vector of the matrix X is the eigenvector corresponding to the respective eigenvalue;

Sort the M eigenvalues in Σ by numerical magnitude;

Denote the eigenvector in X corresponding to the 1st eigenvalue after sorting as D_1, the eigenvector in X corresponding to the 2nd eigenvalue as D_2, and so on; the eigenvector in X corresponding to the i-th eigenvalue after sorting is denoted D_i, i = 1, 2, ..., M.
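The feature extraction steps above can be sketched with NumPy as follows. The function name `behavior_features` and the example matrix are hypothetical. The text says only that the eigenvalues are sorted by numerical magnitude, so sorting in descending order of absolute value is an assumption made here; the eigenvalues of a transition matrix may also be complex in general, which this sketch does not treat specially.

```python
import numpy as np

def behavior_features(A_star):
    """Return (sorted eigenvalues, D), where row i of D is the eigenvector
    paired with the i-th eigenvalue after sorting."""
    eigvals, X = np.linalg.eig(A_star)      # columns of X are right eigenvectors
    order = np.argsort(-np.abs(eigvals))    # assumption: descending by magnitude
    sorted_vals = eigvals[order]
    D = X[:, order].T                       # eigenvectors stored as rows D_1..D_M
    return sorted_vals, D

# Hypothetical 2-state transition matrix A* (rows sum to 1).
A_star = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
eigvals, D = behavior_features(A_star)
print(eigvals)
print(D)
```

Eigenvectors are determined only up to a nonzero scale factor (NumPy returns unit-norm vectors with an arbitrary sign), and the source does not specify a normalization. When feature matrices from different programs are compared, a fixed sign or scaling convention would need to be chosen.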

Further, step (3) uses the software behavior model (A*, B*) output by step (1) and the software behavior feature D output by step (2) to calculate the software behavior similarity or dissimilarity between two programs (denoted T_1 and T_2, respectively), between a single program (denoted T_1) and a program group (denoted T_2), or between two program groups (denoted T_1 and T_2, respectively). The higher the dissimilarity, the lower the similarity, and vice versa; a higher similarity indicates that T_1 and T_2 have more similar software behavior. The similarity measure uses two kinds of matrices: B* in the software behavior representation model and the software behavior feature D. To distinguish them, the two matrices of T_1 are denoted ¹B* and ¹D, and those of T_2 are denoted ²B* and ²D. The specific steps are as follows:

Set a dissimilarity dist(Y, Z) between matrices, where Y and Z denote any two matrices of the same order;

Input ¹B*, ¹D, ²B*, and ²D;

Use the formula [dist(¹B*, ²B*)]^α × [dist(¹D, ²D)]^β to measure the software behavior dissimilarity between T_1 and T_2, where α and β are two real numbers greater than or equal to 0.
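The dissimilarity formula can be sketched as below. The text leaves dist(Y, Z) to be set by the user; the Frobenius norm of the difference is used here purely as an assumed example of such a matrix dissimilarity. The function names and the example matrices are hypothetical.

```python
import numpy as np

def frobenius_dist(Y, Z):
    """Assumed example of dist(Y, Z): Frobenius norm of Y - Z."""
    return float(np.linalg.norm(np.asarray(Y) - np.asarray(Z)))

def behavior_dissimilarity(B1, D1, B2, D2, alpha=1.0, beta=1.0,
                           dist=frobenius_dist):
    """Compute [dist(B1, B2)]^alpha * [dist(D1, D2)]^beta."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be greater than or equal to 0")
    return dist(B1, B2) ** alpha * dist(D1, D2) ** beta

# Hypothetical matrices for T_1 and T_2.
B1 = np.array([[1.0, 0.0], [0.0, 1.0]])
B2 = np.array([[0.0, 1.0], [1.0, 0.0]])
D1 = np.array([[1.0, 0.0], [0.0, 1.0]])
D2 = np.array([[1.0, 0.0], [0.0, 4.0]])

d = behavior_dissimilarity(B1, D1, B2, D2, alpha=1.0, beta=2.0)
print(d)
```

Passing `dist` as a parameter keeps the choice of matrix dissimilarity open, in line with the text, which only requires that Y and Z have the same order.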

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit its scope of protection. Although the present invention has been described in detail with reference to specific embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention.

Claims (1)

1. A spectral method for identifying computer software behavior, characterized in that it is realized by the following steps:
(1) Construct the software behavior representation model: represent the software behavior of S or G by the pair of DHMM model parameters (A*, B*);
(2) Extract software behavior features: perform spectral decomposition on the matrix A* and extract the software behavior feature D;
(3) Measure software behavior similarity: according to B* and D, calculate the software behavior similarity between two computer programs, between two computer program groups, or between one computer program and one program group;
In step (1), the software behavior representation model is constructed in one of two cases:
Case 1: the software behavior representation model of a single computer program
S has M kinds of software behavior, and each kind of behavior corresponds to one hidden state of DHMM(S); the software behavior of S is represented by the model (A*, B*), and the specific steps are as follows:
Set the number M of hidden states and give the initial state probability distribution C;
Input the computer program S;
Call a DHMM training algorithm to find the model parameters A and B that maximize P[S | A, B, C], and denote them A* and B*, respectively;
Case 2: the software behavior representation model of a computer program group
For a computer program group G = {S_1, S_2, ..., S_N}, G has M kinds of software behavior, and each kind of behavior corresponds to one hidden state shared by DHMM(S_1), DHMM(S_2), ..., DHMM(S_N); the software behavior of G is represented by the model (A*, B*), and the specific steps are as follows:
Set the number M of hidden states and give the initial state probability distribution C;
Input all computer programs S_1, S_2, ..., S_N in G;
Call a DHMM training algorithm to find the model parameters A and B that maximize P[S_1 | A, B, C] × P[S_2 | A, B, C] × ... × P[S_N | A, B, C], and denote them A* and B*, respectively;
In step (2), the software behavior features are represented by an M × M software behavior feature matrix D = {d_ij}_{M×M}; the elements of the i-th row of D (i = 1, 2, ..., M) form a row vector D_i = <d_i1, d_i2, ..., d_iM>, and the specific steps are as follows:
Input the software behavior model (A*, B*) of the computer program S or the computer program group G;
Perform spectral decomposition on the matrix A*, decomposing it as A* = X Σ X⁻¹, where Σ is a diagonal matrix whose diagonal elements are the eigenvalues of A*, and each column vector of the matrix X is the eigenvector corresponding to the respective eigenvalue;
Sort the M eigenvalues in Σ by numerical magnitude;
Denote the eigenvector in X corresponding to the 1st eigenvalue after sorting as D_1, the eigenvector in X corresponding to the 2nd eigenvalue as D_2, and so on; the eigenvector in X corresponding to the i-th eigenvalue after sorting is denoted D_i, i = 1, 2, ..., M;
Step (3) uses the software behavior model (A*, B*) output by step (1) and the software behavior feature D output by step (2) to calculate the software behavior similarity or dissimilarity between two programs (denoted T_1 and T_2, respectively), between a single program (denoted T_1) and a program group (denoted T_2), or between two program groups (denoted T_1 and T_2, respectively); the higher the dissimilarity, the lower the similarity, and vice versa; a higher similarity indicates that T_1 and T_2 have more similar software behavior; the similarity measure uses two kinds of matrices: B* in the software behavior representation model and the software behavior feature D; to distinguish them, the two matrices of T_1 are denoted ¹B* and ¹D, and those of T_2 are denoted ²B* and ²D; the specific steps are as follows:
Set a dissimilarity dist(Y, Z) between matrices, where Y and Z denote any two matrices of the same order;
Input ¹B*, ¹D, ²B*, and ²D;
Use the formula [dist(¹B*, ²B*)]^α × [dist(¹D, ²D)]^β to measure the software behavior dissimilarity between T_1 and T_2, where α and β are two real numbers greater than or equal to 0.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410012074.7A CN103778372B (en) 2014-01-13 2014-01-13 A spectral method for identifying computer software behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410012074.7A CN103778372B (en) 2014-01-13 2014-01-13 A spectral method for identifying computer software behavior

Publications (2)

Publication Number Publication Date
CN103778372A CN103778372A (en) 2014-05-07
CN103778372B true CN103778372B (en) 2016-10-19

Family

ID=50570596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410012074.7A CN103778372B (en) 2014-01-13 2014-01-13 A spectral method for identifying computer software behavior

Country Status (1)

Country Link
CN (1) CN103778372B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3258409B1 (en) * 2015-03-18 2019-07-17 Nippon Telegraph and Telephone Corporation Device for detecting terminal infected by malware, system for detecting terminal infected by malware, method for detecting terminal infected by malware, and program for detecting terminal infected by malware

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294948A (en) * 2012-02-27 2013-09-11 百度在线网络技术(北京)有限公司 Software malicious behavior modeling and judging method and device, and mobile terminal
CN103500307A (en) * 2013-09-26 2014-01-08 北京邮电大学 Mobile internet malignant application software detection method based on behavior model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6907396B1 (en) * 2000-06-01 2005-06-14 Networks Associates Technology, Inc. Detecting computer viruses or malicious software by patching instructions into an emulator

Also Published As

Publication number Publication date
CN103778372A (en) 2014-05-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20160518

Address after: 350007 No. 8 Shangsan Road, Cangshan District, Fuzhou, Fujian

Applicant after: Fujian Normal University

Address before: 350117 Qishan Campus of Fujian Normal University, No. 1 Keji Road, University Town, Fuzhou, Fujian

Applicant before: Chen Lifei

DD01 Delivery of document by public notice

Addressee: Chen Lifei

Document name: Notification of Passing Examination on Formalities

GR01 Patent grant
C14 Grant of patent or utility model
DD01 Delivery of document by public notice

Addressee: Fujian Normal University

Document name: Notification of Termination of Patent Right