Instant messaging worm detection method
Technical field
The present invention relates to the field of information security technology, and specifically to a method for detecting instant messaging worms.
Background art
Instant messaging (IM) services are very popular; as a means of real-time communication they have tens of millions of users across the Internet. Many popular systems, such as MSN Messenger (Windows Messenger in Windows XP), Yahoo! Messenger (YIM), AOL Instant Messenger (AIM), and Tencent QQ, have changed the way we communicate with friends, acquaintances, and business colleagues. However, vulnerabilities in instant messaging clients pose a serious security challenge.
Instant messaging worms spread widely across instant messaging networks by exploiting vulnerabilities in IM clients and protocols, and they create security problems for instant messaging services. When an instant messaging worm runs, it usually resides in the instant messaging client and attempts to send itself to all friends of the infected user. Some worms use social engineering in the messages they send, luring recipients into accepting the worm and running a copy. Some IM worms can even exchange messages with recipients and analyze their replies. Many IM worms exist at present, for example Chock, SoFunny, and JS Menger.
IM worms differ from conventional scanning worms and e-mail worms. Although researchers have worked hard to understand and contain the propagation of scanning worms and e-mail worms, that research is not well suited to IM worms because the infection mechanisms differ. M. Williamson et al. applied throttling to instant messaging in order to slow worm propagation. However, this method may delay legitimate communication and is too restrictive for many IM users, for example by allowing only one new contact per day.
Summary of the invention
In view of the above shortcomings of the prior art, the technical problem to be solved by the present invention is to provide an instant messaging worm detection method.
The present invention adopts the following technical solution:
An instant messaging worm detection method, for use with a communication server, comprises the following steps:
1) Learning stage: the behavioral features of worms on the network are analyzed from the data of infected worms on the network, the behavioral data of normal users are analyzed by means of feature functions, and the results are stored in a database;
2) Detection stage: the detection module receives new data passing through the gateway, uses the simple Mahalanobis distance to compare the new data with the feature-function values in the database of step 1), and then judges whether the new data are infected by a worm.
Further, the simple Mahalanobis distance is computed according to formula (6), where $d(x, y)$ is the simple Mahalanobis distance, $m$ is the number of feature functions, $x_i$ is the $i$-th feature value of the new data, $y_i$ is the $i$-th feature value of the learning-stage data, $\bar{y}_i$ is the mean of the $i$-th feature value in the learning stage, $x$ is the feature vector of the new data, $y$ is the mean feature vector of the learning stage, and $\sigma_i^2$ is the variance of the $i$-th feature value. The simple Mahalanobis distance $d(x, y)$ of the new data is calculated, and $\{X_n, n = 1, 2, 3, \dots\}$ denotes the sequence of simple Mahalanobis distances, where $n$ indexes the time interval. The larger the simple Mahalanobis distance, the greater the probability of worm infection.
Further, a non-parametric CUSUM is used to make the detection algorithm insensitive to site access patterns. First, without loss of any statistical feature, $\{X_n, n = 1, 2, 3, \dots\}$ is transformed into another random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ so that the negative values of $Z_n$ do not accumulate over time. $Z_n$ is defined as follows:

$$Z_n = X_n - \beta \tag{11}$$

The parameter $\beta$ is a constant chosen for the specific network conditions; it helps to produce a random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ with negative values. The recursion is as follows:

$$y_n = (y_{n-1} + Z_n)^+, \qquad y_0 = 0 \tag{12}$$

where $(y_{n-1} + Z_n)^+$ equals $y_{n-1} + Z_n$ when $y_{n-1} + Z_n > 0$ and is 0 otherwise. The larger $y_n$ is, the stronger the attack; $y_n$ is the test statistic and represents the accumulated positive values of $X_n$;
where, in formula (13), the initial value is $S_0 = 0$;
The decision function is then expressed as:

$$d_N(y_n) = \begin{cases} 1, & y_n > N \\ 0, & y_n \le N \end{cases} \tag{14}$$

where $N$ is the worm detection threshold and $d_N(y_n)$ is the decision at time $n$: if the test statistic $y_n$ exceeds $N$, then $d_N(y_n)$ is 1, indicating that an attack has occurred; otherwise $d_N(y_n)$ is 0, indicating normal operation.
Further, in order to calculate the simple Mahalanobis distance, incremental learning is used to update the statistics and keep them correct. Let $E_i$ be a feature value of the $i$-th sample, and maintain three variables $(E, \omega, n)$, where $n$ is the number of historical samples. When a new sample is observed, the three variables are updated according to formulas (7), (8) and (9):

$$n = n + 1 \tag{9}$$

The sample variance is calculated as in formula (10).
Further, the feature functions are as follows.
Feature function URL(), formula (1), where $U$ is the set of URLs sent by the user;
Feature function FileReq(), formula (2), where $A$ is the set of file sizes sent by the user;
Feature function IPAddr():
IPAddr() = number of distinct IP addresses (3).
The present invention has the following advantages and beneficial effects:
In the learning stage, the present invention first uses feature functions to distinguish the behavior of ordinary users from that of instant messaging worms. Network worms are then detected by means of the simple Mahalanobis distance. To make the detection mechanism insensitive to site access patterns, a non-parametric CUSUM is adopted, and an alert is generated when the distance of new data exceeds the allowed distance set by the algorithm. The effectiveness of the method was verified with data collected from a university instant messaging server.
The device of the present invention was installed on a gateway, a machine based on a 1 GHz Pentium III. For every 10 seconds of the data set, the CPU time required by the data processing part was recorded. In 99% of the samples, 10 seconds of packets could be processed in less than 2 seconds of CPU time. Moreover, the maximum CPU time required to process any 10-second sample was less than four seconds. For all samples, the service rate exceeded the arrival rate of the traffic. This shows that the real-time performance of the method is sufficient to keep pace with the 10-second traffic bursts of a large network.
Brief description of the drawings
Fig. 1 shows the simulation in which an IM worm propagates by sending URLs in text messages: (a) shows the change in the feature functions, and (b) shows the change in the test statistic after the IM worm is introduced;
Fig. 2 shows the simulation in which an IM worm propagates by sending files: (a) shows the change in the feature functions, and (b) shows the change in the test statistic $y_n$ after the IM worm is introduced.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and embodiments:
An instant messaging worm detection method, for use with a communication server, in which the detection device that carries out the method is placed on the gateway of the communication server and inspects the data passing through the gateway, comprises the following steps:
Step 1) Learning stage: the behavioral features of worms on the network are analyzed from the data of infected worms on the network and stored in a database;
A typical user uses an instant messaging system for work or entertainment, exchanging news of daily life with other people. This may not seem remarkable, but it reveals an important feature: within a given period, a user is likely to communicate with only a few people. An instant messaging worm, by contrast, spreads as widely and as fast as possible, usually by sending the worm code itself or the URL of a website that hosts the worm code or file. Instant messaging worm behavior can therefore be distinguished from normal behavior. After the worm code is loaded, the IM worm sends a message containing a malicious URL to many different users, so it can be inferred that the rate at which this URL is sent will increase. Define the function Count(x) as the number of distinct users to whom one user sends the same value x. For example, if a user sends www.google.com to four different friends in the contact list, then Count(www.google.com) equals four. To capture this feature, the feature function URL() is defined as in formula (1).
Here $U$ is the set of URLs sent by the user.
Another, more common infection characteristic is that the files sent by a victim have the same size and content. In fact, these files are the instant messaging worm itself. To describe this feature, the feature function for file transfer requests, FileReq(), is defined as in formula (2).
Here $A$ is the set of file sizes sent by the user.
A user communicates with several friends within a given period. When users use MSN, they can select from the contact list the friend or friends they want to talk to. A worm, however, tries to propagate as fast as possible, so it may contact a large number of friends in the contact list and thereby deviate from the usage behavior of normal users. An IP address in the contact list can represent one friend, and the feature function IPAddr() is defined to describe this feature as in formula (3).
IPAddr() = number of distinct IP addresses (3)
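The three feature functions can be illustrated with a short sketch. Only Count(x) and IPAddr() are fully specified in the text; formulas (1) and (2) are not reproduced, so this sketch assumes that URL() and FileReq() take the largest Count value over the sets U and A. The `Message` record and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, NamedTuple, Optional, Sequence


class Message(NamedTuple):
    """One outgoing IM event in the observation window (hypothetical layout)."""
    recipient_ip: str
    url: Optional[str] = None        # URL carried in the text message, if any
    file_size: Optional[int] = None  # size of the file offered for transfer, if any


def count(messages: Sequence[Message], field: str) -> Dict[object, int]:
    """Count(x): for each value x of `field`, the number of distinct recipients."""
    recipients = defaultdict(set)
    for msg in messages:
        value = getattr(msg, field)
        if value is not None:
            recipients[value].add(msg.recipient_ip)
    return {value: len(ips) for value, ips in recipients.items()}


def url_feature(messages: Sequence[Message]) -> int:
    # Assumption: URL() is the largest Count(u) over the set U of sent URLs.
    return max(count(messages, "url").values(), default=0)


def filereq_feature(messages: Sequence[Message]) -> int:
    # Assumption: FileReq() is the largest Count(a) over the set A of file sizes.
    return max(count(messages, "file_size").values(), default=0)


def ipaddr_feature(messages: Sequence[Message]) -> int:
    # Formula (3): number of distinct IP addresses contacted.
    return len({msg.recipient_ip for msg in messages})
```

With the example from the text, a user who sends www.google.com to four different friends yields a URL() value of four under this assumption.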
Step 2) Detection stage: the detection module receives new data passing through the gateway, uses the simple Mahalanobis distance to compare the new data with the feature-function values of step 1), and then judges whether the new data are infected by a worm.
The simple Mahalanobis distance is computed according to formula (6), where $d(x, y)$ is the simple Mahalanobis distance, $m$ is the number of feature values of the feature functions, $x_i$ is the $i$-th feature value of the new data, $y_i$ is the $i$-th feature value of the training-stage data, $\bar{y}_i$ is the mean of the $i$-th feature value in the training stage, $x$ is the feature vector of the new data, $y$ is the mean feature vector of the training stage, and $\sigma_i^2$ is the variance of the $i$-th feature value. The simple Mahalanobis distance $d(x, y)$ of the new data is calculated; the larger it is, the greater the probability of worm infection. $\{X_n, n = 1, 2, 3, \dots\}$ denotes the sequence of simple Mahalanobis distances, where $n$ indexes time.
The Mahalanobis distance is the most commonly used multivariate anomaly statistic. The formula essentially describes how anomalous a new sample is relative to the data learned in the past. Here, the distance between the newly observed data and the data obtained in the learning stage is calculated. The larger the distance, the more likely the observation is anomalous.
The Mahalanobis distance is defined in formula (4):

$$d^2(x, y) = (x - y)^{T} C^{-1} (x - y) \tag{4}$$

Here $x$ and $y$ are two feature vectors, and each vector element is a variable. $x$ is the feature vector of the new observation, and $y$ is the mean feature vector calculated in the learning stage. $C^{-1}$ is the inverse of the covariance matrix, $C_{ij} = \mathrm{Cov}(y_i, y_j)$, where $y_i$ and $y_j$ are the $i$-th and $j$-th feature values of the learning-stage feature vector.
Assuming that the features are statistically independent, the Mahalanobis distance provides a useful way to measure the deviation of the current data from the baseline. Under this assumption, the covariance matrix $C$ becomes a diagonal matrix whose diagonal elements are the variances of the individual feature values. The simple Mahalanobis distance is therefore given by formula (5).
Here $m$ is set to 3 (because there are three feature values).
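The step from the full Mahalanobis distance to the simple one can be written out as a worked equation. This is the exact algebraic reduction for a diagonal covariance matrix, using the symbols defined in the text. It is offered as an illustration only: the patent's own formula (5) is not reproduced here and may take a different but related form, for example an absolute deviation divided by the standard deviation.

```latex
C = \operatorname{diag}\left(\sigma_1^2, \sigma_2^2, \dots, \sigma_m^2\right)
\quad\Longrightarrow\quad
(x - y)^{T} C^{-1} (x - y) = \sum_{i=1}^{m} \frac{(x_i - y_i)^2}{\sigma_i^2}
```

With $m = 3$, the sum runs over the three feature values URL(), FileReq() and IPAddr().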
When users keep in touch with friends through the instant messaging system, even active users do not use it all the time, because they are busy with study or other tasks. Feature-function values may therefore fall below the corresponding means, but this does not mean that the behavior is abnormal, so such deviations should not be counted in the Mahalanobis distance. Formula (6) is therefore used to calculate the simple Mahalanobis distance.
Here the superscript $+$ denotes the positive part; for example, $(y_{n-1} + Z_n)^+$ equals $y_{n-1} + Z_n$ when $y_{n-1} + Z_n > 0$ and is 0 otherwise.
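A one-sided distance of this kind might be sketched as follows. Formula (6) itself is not reproduced in the text, so the exact form used here (squared positive deviation divided by the variance) is an assumption, and the function name is hypothetical. The baseline constants are the means and variances reported in Table 1.

```python
from typing import Sequence

# Learning-stage statistics for URL(), FileReq(), IPAddr(), taken from Table 1.
TABLE1_MEAN = [1.333312, 1.271003, 2.600212]
TABLE1_VAR = [0.420157, 0.236540, 0.737141]


def simple_mahalanobis(x: Sequence[float],
                       mean: Sequence[float],
                       var: Sequence[float]) -> float:
    """One-sided simple Mahalanobis distance (assumed form of formula (6)).

    Only deviations above the learning-stage mean contribute; values below
    the mean are treated as normal and add nothing to the distance.
    """
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")
    distance = 0.0
    for x_i, mean_i, var_i in zip(x, mean, var):
        if var_i <= 0:
            raise ValueError("variances must be positive")
        excess = max(x_i - mean_i, 0.0)  # positive part of the deviation
        distance += excess ** 2 / var_i
    return distance
```

An idle interval such as `simple_mahalanobis([0, 0, 0], TABLE1_MEAN, TABLE1_VAR)` gives a distance of zero under this form, which matches the stated intent that below-average activity should not be treated as abnormal.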
In order to calculate the simple Mahalanobis distance, incremental learning is used to update the statistics and keep them correct. Let $E_i$ be a feature value of the $i$-th sample, and maintain three variables $(E, \omega, n)$, where $n$ is the number of historical samples. When a new sample is observed, the three variables are updated according to formulas (7), (8) and (9):

$$n = n + 1 \tag{9}$$

In (7), (8) and (9), the left-hand side of the equals sign is the value after the new sample is included, and the right-hand side uses the values from the previous historical samples.
The sample variance is calculated as in formula (10).
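One common way to realize such an incremental update is to keep a running mean and a running mean of squares, from which the variance follows. Formulas (7), (8) and (10) are not reproduced in the text, so the interpretation of $E$ as the running mean and $\omega$ as the running mean of squares is an assumption, as are the class and attribute names.

```python
class RunningStats:
    """Incrementally maintained statistics for one feature value.

    Assumed meaning of the triple (E, omega, n):
      mean    -> E, running mean of the feature value
      mean_sq -> omega, running mean of the squared feature value
      n       -> number of historical samples
    """

    def __init__(self) -> None:
        self.mean = 0.0
        self.mean_sq = 0.0
        self.n = 0

    def update(self, value: float) -> None:
        # New values on the left, previous history on the right.
        self.mean = (self.mean * self.n + value) / (self.n + 1)
        self.mean_sq = (self.mean_sq * self.n + value * value) / (self.n + 1)
        self.n = self.n + 1

    @property
    def variance(self) -> float:
        # Population variance: mean of squares minus square of the mean.
        # Clamped at zero to absorb floating-point rounding.
        return max(self.mean_sq - self.mean ** 2, 0.0)
```

Each new observation costs constant time and memory, so the baseline can be updated without storing the historical samples.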
To make the detection mechanism insensitive to site access patterns, a non-parametric cumulative sum (CUSUM) method is used. First, without loss of any statistical feature, $\{X_n, n = 1, 2, 3, \dots\}$ is transformed into another random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ so that the negative values of $Z_n$ do not accumulate over time. $Z_n$ is defined as follows:

$$Z_n = X_n - \beta \tag{11}$$

The parameter $\beta$ is a constant chosen for the specific network conditions; it helps to produce a random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ with negative values. The recursion is as follows:

$$y_n = (y_{n-1} + Z_n)^+, \qquad y_0 = 0 \tag{12}$$

where $(y_{n-1} + Z_n)^+$ equals $y_{n-1} + Z_n$ when $y_{n-1} + Z_n > 0$ and is 0 otherwise. The larger $y_n$ is, the stronger the attack; $y_n$ is the test statistic and represents the accumulated positive values of $X_n$;
where, in formula (13), the initial value is $S_0 = 0$;
The decision function is expressed as:

$$d_N(y_n) = \begin{cases} 1, & y_n > N \\ 0, & y_n \le N \end{cases} \tag{14}$$

where $N$ is the worm detection threshold and $d_N(y_n)$ is the decision at time $n$: if the test statistic $y_n$ exceeds $N$, then $d_N(y_n)$ is 1, indicating that an attack has occurred; otherwise $d_N(y_n)$ is 0, indicating normal operation.
β is taken as 3 in the present invention.
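The recursion of formula (12) and the threshold decision can be sketched in a few lines. The value β = 3 comes from the text; the threshold N is not given, so the default of 10 below is purely hypothetical, as is the function name. For reference, this recursion is algebraically the same as $y_n = S_n - \min_{0 \le k \le n} S_k$ with $S_k$ the running sum of the $Z_i$ and $S_0 = 0$; whether formula (13) takes exactly this form is an assumption.

```python
from typing import Iterable, List, Tuple


def cusum_detect(distances: Iterable[float],
                 beta: float = 3.0,
                 threshold: float = 10.0) -> List[Tuple[float, int]]:
    """Non-parametric CUSUM over a sequence of distances X_n.

    Returns one (y_n, decision) pair per interval, where the decision is 1
    when the test statistic y_n exceeds the threshold N and 0 otherwise.
    The default threshold is a hypothetical value, not taken from the text.
    """
    y = 0.0  # y_0 = 0
    results = []
    for x_n in distances:
        z_n = x_n - beta          # formula (11)
        y = max(y + z_n, 0.0)     # formula (12): positive part
        results.append((y, 1 if y > threshold else 0))
    return results
```

For the distance sequence 1, 2, 1, 20, 1 the statistic stays at zero for the first three intervals, jumps to 17 when the large distance arrives, and decays to 15 in the next interval, so the alarm is raised in the fourth interval and remains raised in the fifth.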
Embodiment
The method was verified in a simulated environment. Data sets of 521 users were collected from the communication server of a university (the instant messaging service is available only on campus), and the data were divided into two parts, one for learning and one for classification and detection. 80% of the data were used as training data; the remaining 20% were mixed with IM worm attack data and used for detecting IM worms, with the IM worm data mixed in at random. In addition, every 5 minutes the simulated instant messaging worm sent a file, or a URL in a text message, to the online friends in the contact list.
For normal traffic:
Because they are busy with work or hard study, users do not contact the friends in their contact lists all the time, especially at midnight. Therefore, only the times when the corresponding feature-function value is greater than zero are considered. The results are shown in Table 1:
Table 1
Feature | μ | σ²
URL() | 1.333312 | 0.420157
FileReq() | 1.271003 | 0.236540
IPAddr() | 2.600212 | 0.737141
When ordinary users use the IM service, their text messages contain only a few file transfer requests and URLs. In most cases, users communicate with each other by text message. The results also show that the means of URL() and FileReq() are 1.333312 and 1.271003, with corresponding variances of 0.420157 and 0.236540. This means that although users do send URLs or file transfer requests in text messages, they generally send the same URL or file to only one or two different friends. The mean and variance of IPAddr() are 2.600212 and 0.737141.
Worm detection after instant messaging worm traffic is added:
As shown in Fig. 1, the simulated IM worm propagates by sending URLs in text messages. Panel (a) shows the change in the feature functions: without instant messaging worm traffic, the value of URL() is no greater than 1 and the value of IPAddr() ranges from 0 to 3. After the IM worm is introduced, however, the values of URL() and IPAddr() change abruptly and peak at close to 10, while the value of FileReq() does not change. Therefore, as shown in (b), the IM worm can be detected within one unit of time after the outbreak.
Fig. 2 shows the simulated IM worm propagating by sending files. Panel (a) shows that, without IM worm traffic, the value of FileReq() is no greater than 1 and the value of IPAddr() ranges from 0 to 3. After the IM worm is introduced, however, the values of FileReq() and IPAddr() differ from their normal values: they exceed 7 and peak at 15. The value of URL() is always 0. Panel (b) shows that, after the IM worm is introduced, the method detects it within one unit of time after the outbreak.
The same test was repeated 100 times. The results were similar, and no negative values occurred.
The device of the present invention was installed on a gateway, a machine based on a 1 GHz Pentium III. For every 10 seconds of the data set, the CPU time required by the data processing part was recorded. In 99% of the samples, 10 seconds of packets could be processed in less than 2 seconds of CPU time. Moreover, the maximum CPU time required to process any 10-second sample was less than four seconds. For all samples, the service rate exceeded the arrival rate of the traffic. This shows that the real-time performance of the method is sufficient to keep pace with the 10-second traffic bursts of a large network.