CN103490992B - IM worm detection method - Google Patents

IM worm detection method

Info

Publication number
CN103490992B
Authority
CN
China
Prior art keywords
worm
value
data
mahalanobis distance
yu
Prior art date
Application number
CN201310470865.XA
Other languages
Chinese (zh)
Other versions
CN103490992A (en)
Inventor
郭薇
周翰逊
张国栋
贾大宇
Original Assignee
沈阳航空航天大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 沈阳航空航天大学
Priority to CN201310470865.XA
Publication of CN103490992A
Application granted
Publication of CN103490992B

Abstract

The present invention relates to the field of information security, and specifically to a method for detecting instant messaging (IM) worms. The method has two steps. First, in a learning phase, feature functions are used to distinguish the behavior of ordinary users from the behavior of IM worms. Then, in a detection phase, the similarity between current network traffic and the learned data is computed with a simple Mahalanobis distance. To make the detection mechanism insensitive to site access patterns, a non-parametric CUSUM algorithm is applied to the similarity values, and an alarm is generated when the distance for new network traffic exceeds the allowed distance set by the algorithm.

Description

IM worm detection method

Technical Field

[0001] The present invention relates to the field of information security, and specifically to a method for detecting instant messaging (IM) worms.

Background Art

[0002] Instant messaging (IM) services are very popular. As a means of real-time communication they have tens of millions of users across the Internet. Many popular systems, such as MSN Messenger (Windows Messenger in Windows XP), Yahoo Messenger (YM), AOL Instant Messenger (AIM), and Tencent QQ, have changed the way we communicate with friends, acquaintances, and business colleagues. However, vulnerabilities in IM clients pose a serious security challenge.

[0003] IM worms are a security problem: they spread widely across IM networks by exploiting vulnerabilities in IM clients and protocols as well as the IM service itself. When an IM worm runs, it typically locates the IM client and tries to send itself to all friends of the infected user. Some worms send messages that trick the recipient into accepting and running a copy of the worm. Some IM worms can even exchange short messages with recipients and analyze their replies. There are many examples of IM worms, such as Chock, SoFunny, and JS Menger.

[0004] IM worms differ from ordinary scanning worms and email worms. Although researchers have worked hard to understand and contain the spread of scanning worms and email worms, those studies do not carry over well to IM worms because the infection mechanisms differ. M. Williamson et al. applied throttling to IM worms to slow their spread. However, that approach may delay legitimate communication and can be too restrictive for IM users, for example by allowing only one new contact per day.

Summary of the Invention

[0005] In view of the above shortcomings of the prior art, the technical problem addressed by the present invention is to provide a method for detecting instant messaging worms.

[0006] The present invention adopts the following technical solution:

[0007] A method for detecting instant messaging worms, for use on a communication server, comprising the following steps:

[0008] 1) Learning phase: the behavioral characteristics of worms on the network are analyzed from data on worm infections on the network, the behavior data of normal users is analyzed with feature functions, and the results are stored in a database;

[0009] 2) Detection phase: the detection module receives new data passing through the gateway and uses the simple Mahalanobis distance to compare it with the feature-function values stored in the database in step 1), thereby determining whether the new data is worm-infected.

[0010] Further, the simple Mahalanobis distance is calculated as:

[0011] [Equation (6): formula image CN103490992BD00041 is not reproduced in this text.]

[0012] where $d(x, y)$ is the simple Mahalanobis distance, $m$ is the number of feature functions, $x_i$ is the i-th feature value of the new data, $y_i$ is the i-th feature value of the learning-phase data, $\bar{y}_i$ is the i-th mean feature value of the learning phase, $x$ is the feature vector of the new data, $y$ is the mean feature vector of the learning phase, and $\sigma_i^2$ is the variance of the i-th feature value. The simple Mahalanobis distance $d(x, y)$ of the new data is calculated, and $\{X_n, n = 1, 2, 3, \dots\}$ denotes the sequence of simple Mahalanobis distances, where $n$ denotes the time interval. The larger the simple Mahalanobis distance, the higher the probability of worm infection.

[0013] Further, a non-parametric CUSUM algorithm is used to make the detection algorithm insensitive to site access patterns. First, without loss of any statistical property, the sequence $\{X_n, n = 1, 2, 3, \dots\}$ is transformed into another random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ so that the negative values of $Z_n$ do not accumulate over time. $Z_n$ is defined as:

[0014] $Z_n = X_n - \beta$ (11)

[0015] The parameter $\beta$ is a constant that, for given network conditions, helps produce a random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ with negative values. The recursion is:

[0016] $y_n = (y_{n-1} + Z_n)^+$

[0017] $y_0 = 0$ (12)

[0018] where $(y_{n-1} + Z_n)^+$ equals $(y_{n-1} + Z_n)$ when $(y_{n-1} + Z_n) > 0$ and is 0 otherwise. The larger $y_n$ is, the stronger the attack; $y_n$ is the test statistic and represents the accumulated positive values of $X_n$;

[0019]-[0021] [Formula image CN103490992BD00051 is not reproduced in this text.]

[0022] where $N$ is the worm detection threshold and $d_N(y_n)$ is the decision at time $n$: if the test statistic $y_n$ is greater than $N$, then $d_N(y_n)$ is 1, indicating that an attack is occurring; otherwise $d_N(y_n)$ is 0, indicating normal operation.

[0023] Further, to compute the simple Mahalanobis distance, incremental learning is used to update the statistics and keep them accurate. Let $E_i$ be a feature value of the i-th sample, and define three variables $(E, \omega, n)$:

[Formula image CN103490992BD00052 is not reproduced in this text.]

where $n$ is the length of the sample history. When a new sample is observed, the three variables are updated according to equations (7), (8) and (9):

[0024]-[0026] [Equations (7) to (9): formula image CN103490992BD00053 is not reproduced in this text.]

[0027] The sample variance is calculated as in equation (10):

[0028] [Equation (10): formula image CN103490992BD00054 is not reproduced in this text.]

[0029] Further, the feature functions are as follows. Feature function URL():

[0030] [Equation (1): formula image CN103490992BD00061 is not reproduced in this text.]

[0031] where $U$ is the set of URLs sent by the user;

[0032] Feature function FileReq():

[0033] [Equation (2): formula image CN103490992BD00062 is not reproduced in this text.]

[0034] where $A$ is the set of file sizes sent by the user;

[0035] Feature function IPAddr():

[0036] IPAddr() = number of distinct IP addresses (3).

[0037] The present invention has the following advantages and beneficial effects:

[0038] In the learning phase, the invention first uses feature functions to distinguish the behavior of ordinary users from the behavior of IM worms. It then detects network worms with the simple Mahalanobis distance. To make the detection mechanism insensitive to site access patterns, a non-parametric CUSUM algorithm is used, and an alarm is generated when the distance for new data exceeds the allowed distance set by the algorithm. Data collected from a university IM server demonstrates the effectiveness of the method.

[0039] A device implementing the invention was installed at the gateway on a machine based on a 1 GHz Pentium III. For every 10-second interval of the data set, the CPU time needed to process that portion of the data was recorded. In 99% of the samples, 10 seconds of packets could be processed in less than 2 seconds of CPU time. The longest processing time for any ten-second sample was less than four seconds of CPU time. For all samples the service rate exceeded the traffic arrival rate. This indicates that the method can keep up in real time with 10-second bursts of traffic on a large network.

Brief Description of the Drawings

[0040] FIG. 1 shows a simulated IM worm propagating by sending URLs in text messages: (a) shows the changes in the feature functions, and (b) shows the changes in the test statistic after the IM worm is introduced;

[0041] FIG. 2 shows a simulated IM worm propagating by sending files: (a) shows the changes in the feature functions, and (b) shows the changes in the test statistic $y_n$ after the IM worm is introduced.

Detailed Description of the Embodiments

[0042] The present invention is described in detail below with reference to the drawings and an embodiment:

[0043] A method for detecting instant messaging worms, for use on a communication server. The detection device that carries out the method is installed at the gateway of the communication server and inspects the data passing through the gateway. The method comprises the following steps:

[0044] Step 1) Learning phase: the behavioral characteristics of worms on the network are analyzed from data on worm infections on the network and stored in a database;

[0045] A typical user uses an IM system for work or entertainment, chatting with other people about everyday life. This may seem unremarkable, but it reveals an important characteristic: within a given period a user is likely to communicate with only a few people. An IM worm, by contrast, spreads as widely as possible, typically by sending files or the URL of a website hosting the worm code. IM worm behavior can therefore be distinguished from normal behavior. Once the worm code is loaded, the IM worm sends text messages containing a malicious URL to many different users, so the rate at which that URL is sent can be expected to increase. Define the function Count(X) as the number of distinct users with whom a user communicates using the same value X. For example, if a user sends www.google.com to four different friends in the contact list, then Count(www.google.com) equals four. To capture this characteristic, the feature function URL() is defined as in equation (1).
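The Count(X) definition in paragraph [0045] can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function name `count_recipients`, the `(recipient, value)` message format, and the friend names are hypothetical and do not come from the patent, and the exact form of URL() in equation (1) is not shown here because it is not legible in this text.

```python
from collections import defaultdict


def count_recipients(messages):
    """Return {X: number of distinct recipients that one sender sent X to}.

    `messages` is an iterable of (recipient, value) pairs for a single sender
    within one observation window (a hypothetical input format).
    """
    recipients = defaultdict(set)
    for recipient, value in messages:
        recipients[value].add(recipient)  # a set ignores repeats to the same friend
    return {value: len(users) for value, users in recipients.items()}


# One sender's outgoing text messages containing URLs (illustrative data).
outgoing = [
    ("alice", "www.google.com"),
    ("bob", "www.google.com"),
    ("carol", "www.google.com"),
    ("dave", "www.google.com"),
    ("alice", "www.google.com"),  # second message to the same friend
    ("bob", "www.example.org"),
]

counts = count_recipients(outgoing)
print(counts["www.google.com"])  # 4, matching the example in [0045]
```

Using a set per value means that repeated messages to the same friend do not inflate the count, which matches the "different friends" wording of the example.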

[0046] [Equation (1): formula image CN103490992BD00071 is not reproduced in this text.]

[0047] where $U$ is the set of URLs sent by the user.

[0048] Another common characteristic of infection is that victims send files that are identical in size and content. These files are in fact the IM worm itself. To describe this characteristic, the feature function for file transfer requests is defined as in equation (2).

[0049] [Equation (2): formula image CN103490992BD00072 is not reproduced in this text.]

[0050] where $A$ is the set of file sizes sent by the user.

[0051] A user communicates with several friends within a given period. When using IM software, users choose which friend or friends in the contact list to talk to. A worm, however, tries to spread as fast as possible, so it is likely to contact a large number of friends in the contact list, which deviates from normal user behavior. Since an IP address can stand for a friend in the contact list, the feature function IPAddr() is defined as in equation (3) to describe this characteristic.

[0052] IPAddr() = number of distinct IP addresses (3)
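Equation (3) can be sketched as a count of distinct peer addresses within one observation window. The function name `ip_addr_feature`, the `(timestamp, ip)` event format, and the window boundaries are assumptions made for illustration; the patent only states that the feature is the number of distinct IP addresses.

```python
def ip_addr_feature(events, window_start, window_end):
    """Count distinct peer IP addresses contacted in [window_start, window_end).

    `events` is an iterable of (timestamp_seconds, peer_ip) pairs for one user
    (a hypothetical input format).
    """
    peers = {ip for timestamp, ip in events if window_start <= timestamp < window_end}
    return len(peers)


# Illustrative events: the same friend contacted twice counts once.
events = [(0, "10.0.0.2"), (3, "10.0.0.3"), (7, "10.0.0.2"), (12, "10.0.0.4")]

print(ip_addr_feature(events, 0, 10))   # 2
print(ip_addr_feature(events, 10, 20))  # 1
```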

[0053] Step 2) The detection module receives new data passing through the gateway and uses the simple Mahalanobis distance to compare it with the feature-function values from step 1), thereby determining whether the new data is worm-infected.

[0054] The simple Mahalanobis distance is calculated as:

[0055] [Equation (6): formula image CN103490992BD00073 is not reproduced in this text.]

[0056] where $d(x, y)$ is the simple Mahalanobis distance, $m$ is the number of feature values of the feature functions, $x_i$ is the i-th feature value of the new data, $y_i$ is the i-th feature value of the learning-phase data, $\bar{y}_i$ is the i-th mean feature value of the learning phase, $x$ is the feature vector of the new data, $y$ is the mean feature vector of the learning phase, and $\sigma_i^2$ is the variance of the i-th feature value. The simple Mahalanobis distance of the new data is calculated; the larger it is, the higher the probability of worm infection. $\{X_n, n = 1, 2, 3, \dots\}$ denotes the sequence of simple Mahalanobis distances, where $n$ denotes the length of time.

[0057] The Mahalanobis distance is the most commonly used multivariate outlier statistic. In essence, the formula describes whether a new sample is anomalous relative to the historically learned data. Here, the distance between the newly observed data and the data obtained in the learning phase is computed. The larger the distance, the more likely it is a sign of abnormal behavior.

[0058] The Mahalanobis distance is defined as in equation (4):

[0059] [Equation (4): formula image CN103490992BD00081 is not reproduced in this text.]

[0060] Here $x$ and $y$ are two feature vectors, and each element of a vector is a variable. $x$ is the feature vector of the new observation and $y$ is the mean feature vector computed in the learning phase. $C^{-1}$ is the inverse of the covariance matrix, with $C_{ij} = \mathrm{Cov}(y_i, y_j)$, where $y_i$ and $y_j$ are the i-th and j-th feature values of the learning-phase feature vector.

[0061] The Mahalanobis distance provides a useful way to measure the current deviation from the baseline. Assuming the features are statistically independent, the covariance matrix $C$ becomes a diagonal matrix whose diagonal elements are the variances of the individual features. The simple Mahalanobis distance is then given by equation (5):

[0062] [Equation (5): formula image CN103490992BD00082 is not reproduced in this text.]

[0063] Here $m$ is set to 3, because three features are selected.

[0064] A user who keeps in touch with friends through an IM system does not necessarily use it all the time, because of a busy study or work schedule. A feature function value may therefore fall below its mean, but this does not mean the behavior is anomalous. Such a deviation should therefore not be counted toward the Mahalanobis distance. Equation (6) is used to compute the simple Mahalanobis distance for this reason.

[0065] [Equation (6): formula image CN103490992BD00083 is not reproduced in this text.]

[0066] where $(y_{n-1} + Z_n)^+$ equals $(y_{n-1} + Z_n)$ when $(y_{n-1} + Z_n) > 0$ and is 0 otherwise.
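The idea in paragraph [0064], that feature values below the learned mean should not contribute to the distance, can be sketched as follows. Equation (6) itself survives only as an image reference in this text, so the expression used here is an assumption: a sum over features of the squared positive excess over the mean, divided by the variance. The function name `one_sided_distance`, the `eps` smoothing term, and the baseline numbers are likewise hypothetical.

```python
def one_sided_distance(x, mean, variance, eps=1e-9):
    """Assumed one-sided diagonal Mahalanobis-style distance.

    Only features that exceed their learned mean contribute. `eps` is an
    added safeguard against zero variance, not something the patent states.
    """
    if not (len(x) == len(mean) == len(variance)):
        raise ValueError("x, mean and variance must have the same length")
    total = 0.0
    for xi, mi, vi in zip(x, mean, variance):
        excess = max(xi - mi, 0.0)  # values below the mean count as no deviation
        total += excess * excess / (vi + eps)
    return total


# Illustrative baseline for (URL(), FileReq(), IPAddr()); the numbers are made up.
baseline_mean = [1.0, 1.0, 2.0]
baseline_var = [0.5, 0.25, 1.0]

quiet = one_sided_distance([0.0, 0.0, 1.0], baseline_mean, baseline_var)
busy = one_sided_distance([3.0, 1.0, 4.0], baseline_mean, baseline_var)
print(quiet)  # 0.0: an idle user is not flagged
print(busy)   # about 12: 2^2/0.5 + 0 + 2^2/1.0
```

If the patent's equation (6) uses the absolute deviation over the standard deviation rather than the squared form, only the line that accumulates `total` would change; the one-sided `max(..., 0.0)` step is the part that follows directly from the text.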

[0067] To compute the simple Mahalanobis distance, incremental learning is used to update the statistics and keep them accurate. Let $E_i$ be a feature value of the i-th sample, and define three variables $(E, \omega, n)$:

[0068] [Formula image CN103490992BD00084 is not reproduced in this text.]

where $n$ is the length of the sample history. When a new sample is observed, the three variables are updated according to equations (7), (8) and (9):

[0069]-[0071] [Equations (7) to (9): formula image CN103490992BD00085 is not reproduced in this text.]

[0072] In equations (7), (8) and (9), the left-hand side of the equals sign is the value after the new sample, and the right-hand side is the value for the previous sample history length.

[0073] The sample variance is calculated as in equation (10):

[0074] [Equation (10): formula image CN103490992BD00086 is not reproduced in this text.]
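The incremental update of $(E, \omega, n)$ can be sketched as below. The update formulas (7), (8) and (10) are not legible in this text, so this sketch assumes one common scheme: $E$ is the running mean of the feature value, $\omega$ is the running mean of its square, and the variance is $\omega - E^2$. Only the increment of $n$ by one per sample is stated in the source (claim 3). The class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Running mean and variance of one feature, updated one sample at a time.

    Assumed scheme: keep the mean (E), the mean of squares (omega) and the
    sample count (n), so no past samples need to be stored.
    """

    def __init__(self):
        self.mean = 0.0      # E: running mean of the feature value
        self.mean_sq = 0.0   # omega: running mean of the squared feature value
        self.n = 0           # n: number of samples seen so far

    def update(self, value):
        # Fold the new sample into both running averages, then advance n.
        self.mean = (self.mean * self.n + value) / (self.n + 1)
        self.mean_sq = (self.mean_sq * self.n + value * value) / (self.n + 1)
        self.n += 1

    def variance(self):
        # Population-form variance E[x^2] - E[x]^2, clamped against rounding error.
        return max(self.mean_sq - self.mean ** 2, 0.0)


stats = RunningStats()
for sample in [1.0, 2.0, 3.0, 4.0]:
    stats.update(sample)
print(stats.n, stats.mean, stats.variance())
```

Note that this form divides by n rather than n - 1; if the patent's equation (10) uses the unbiased sample variance, the result would need to be scaled by n / (n - 1).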

[0075] To make the detection mechanism insensitive to site access patterns, a non-parametric cumulative sum (CUSUM) method is used.

[0076] The non-parametric CUSUM algorithm makes detection insensitive to site access patterns. First, without loss of any statistical property, the sequence $\{X_n, n = 1, 2, 3, \dots\}$ is transformed into another random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ so that the negative values of $Z_n$ do not accumulate over time. $Z_n$ is defined as:

[0077] $Z_n = X_n - \beta$ (11)

[0078] The parameter $\beta$ is a constant that, for given network conditions, helps produce a random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ with negative values. The recursion is:

[0079] $y_n = (y_{n-1} + Z_n)^+$

[0080] $y_0 = 0$ (12)

[0081] where $(y_{n-1} + Z_n)^+$ equals $(y_{n-1} + Z_n)$ when $(y_{n-1} + Z_n) > 0$ and is 0 otherwise. The larger $y_n$ is, the stronger the attack; $y_n$ is the test statistic and represents the accumulated positive values of $X_n$;

[0082]-[0085] [Formula image CN103490992BD00092 is not reproduced in this text.]

[0086] where $N$ is the worm detection threshold and $d_N(y_n)$ is the decision at time $n$: if the test statistic $y_n$ is greater than $N$, then $d_N(y_n)$ is 1, indicating that an attack is occurring; otherwise $d_N(y_n)$ is 0, indicating normal operation.

[0087] In the present invention, $\beta$ is set to 3.
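The recursion in equations (11) and (12), together with the threshold decision described in [0086], can be sketched as follows. The value $\beta = 3$ comes from [0087]; the threshold value 10 and the input sequence are hypothetical, since the patent does not state a value for $N$ in this text. The function name `cusum_alarms` is also an assumption.

```python
def cusum_alarms(distances, beta=3.0, threshold=10.0):
    """Non-parametric CUSUM over a sequence of distances X_n.

    Returns (statistics, decisions), where statistics[n] is y_n and
    decisions[n] is 1 if y_n exceeds the threshold N, else 0.
    """
    y = 0.0  # y_0 = 0
    statistics, decisions = [], []
    for x in distances:
        z = x - beta          # equation (11): Z_n = X_n - beta
        y = max(y + z, 0.0)   # equation (12): y_n = (y_{n-1} + Z_n)^+
        statistics.append(y)
        decisions.append(1 if y > threshold else 0)
    return statistics, decisions


# Illustrative distance sequence: quiet traffic, then a burst.
distances = [1.0, 2.0, 0.5, 9.0, 12.0, 1.0]
statistics, decisions = cusum_alarms(distances)
print(statistics)  # [0.0, 0.0, 0.0, 6.0, 15.0, 13.0]
print(decisions)   # [0, 0, 0, 0, 1, 1]
```

Because the statistic is clamped at zero, long quiet periods do not build up "credit" that would delay detection, while a sustained run of distances above $\beta$ accumulates until it crosses the threshold.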

[0088] Example

[0089] The method of the present invention was validated in a simulation environment. A data set covering 521 users was collected from a university communication server (the IM service is available only on campus), and the data was divided into two parts for learning and for classification and detection. 80% of the data was used as training data; the remaining 20% was mixed with IM worm attack data and used to detect IM worms. The IM worm data was mixed in randomly. In addition, every 5 minutes the simulated IM worm sent a file, or a URL in a text message, to the online friends in the contact list.

[0090] For normal traffic:

[0091] Because they are busy with work or study, users do not contact the friends in their contact list at every moment, especially at midnight. Accordingly, the cases considered are those in which the corresponding feature function value is well above zero. The results are shown in Table 1:

[0092] Table 1

[Table image CN103490992BD00093 is not reproduced in this text.]

[0094] When ordinary users use the IM service, text messages contain only a few file transfer requests and URLs. In most cases, users simply communicate with each other through text messages. The results also show that the means of URL() and FileReq() are 1.333312 and 1.271003, with corresponding variances of 0.420157 and 0.236540. This means that although users do send URLs in text messages or file transfer requests, they usually send the same URL or file to only one or two different friends. The mean and variance of IPAddr() are 2.600212 and 0.73714.

[0095] Worm detection after IM worm traffic is added:

[0096] As shown in FIG. 1, the simulated IM worm propagates by sending URLs in text messages. (a) shows the changes in the feature functions: when there is no IM worm traffic, URL() does not exceed 1 and IPAddr() ranges from 0 to 3. However, (b) shows that after the IM worm is introduced, URL() and IPAddr() jump abruptly to peaks close to 10. The value of FileReq() does not change. The IM worm can therefore be detected within one time unit after the outbreak.

[0097] FIG. 2 shows the simulated IM worm propagating by sending files. (a) shows that FileReq() does not exceed 1 and IPAddr() ranges from 0 to 3 when no IM worm traffic is added. After the IM worm is introduced, however, FileReq() and IPAddr() depart from their normal values: they rise above 7 and reach a peak of 15. URL() remains 0 throughout. Thus (b) shows that the method detects the IM worm within one time unit after the outbreak.

[0098] The same experiment was repeated 100 times. The results were similar, and no negative results occurred.

[0099] A device implementing the invention was installed at the gateway on a machine based on a 1 GHz Pentium III. For every 10-second interval of the data set, the CPU time needed to process that portion of the data was recorded. In 99% of the samples, 10 seconds of packets could be processed in less than 2 seconds of CPU time. The longest processing time for any ten-second sample was less than four seconds of CPU time. For all samples the service rate exceeded the traffic arrival rate. This indicates that the method can keep up in real time with 10-second bursts of traffic on a large network.
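The real-time claim in [0099] amounts to a utilization check: processing keeps up as long as the CPU time spent per interval is below the interval length. A minimal sketch of that arithmetic, using the upper bounds quoted in the text (2 seconds for 99% of samples, 4 seconds in the worst case, per 10-second interval); the function name `utilization` is hypothetical.

```python
INTERVAL_SECONDS = 10.0  # length of each traffic sample, from [0099]


def utilization(cpu_seconds, interval_seconds=INTERVAL_SECONDS):
    """Fraction of one CPU needed to process one interval of traffic."""
    return cpu_seconds / interval_seconds


typical_bound = utilization(2.0)  # upper bound for 99% of the samples
worst_bound = utilization(4.0)    # upper bound for every sample

print(typical_bound)  # 0.2
print(worst_bound)    # 0.4

# Utilization below 1 means the service rate exceeds the arrival rate.
keeps_up = worst_bound < 1.0
print(keeps_up)  # True
```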

Claims (3)

1. A method for detecting instant messaging worms, for use on a communication server, characterized by comprising the following steps: 1) in a learning phase, analyzing the behavioral characteristics of worms on the network from data on worm infections on the network, analyzing the behavior data of normal users with feature functions, and storing the results in a database; 2) configuring a detection module in the gateway, wherein in a detection phase the detection module receives new data passing through the gateway and uses the simple Mahalanobis distance to compare it with the feature-function values learned and stored in the database in step 1), thereby determining whether the new data is worm-infected; the simple Mahalanobis distance is calculated as:
[Equation (6): formula image CN103490992BC00021 is not reproduced in this text.]
where $d(x, y)$ is the simple Mahalanobis distance, $m$ is the number of feature functions, $x_i$ is the i-th feature value of the new data, $y_i$ is the i-th feature value of the learning-phase data, $\bar{y}_i$ is the i-th mean feature value of the learning phase, $x$ is the feature vector of the new data, $y$ is the mean feature vector of the learning phase, and $\sigma_i^2$ is the variance of the i-th feature value; the simple Mahalanobis distance $d(x, y)$ of the new data is calculated, and $\{X_n, n = 1, 2, 3, \dots\}$ denotes the sequence of simple Mahalanobis distances, where $n$ denotes the time interval; the larger the simple Mahalanobis distance, the higher the probability of worm infection; the feature functions are: feature function URL() and feature function FileReq():
[Equations (1) and (2): formula image CN103490992BC00022 is not reproduced in this text.]
where $U$ is the set of URLs sent by the user and $A$ is the set of file sizes sent by the user, and in equations (1) and (2) the function Count(X) is defined as the number of users for whom the same X is used in the communicated content during communication; feature function IPAddr(): IPAddr() = number of distinct IP addresses (3).
2. The method for detecting instant messaging worms according to claim 1, characterized in that a non-parametric CUSUM algorithm is used to make the detection algorithm insensitive to site access patterns: first, without loss of any statistical property, the sequence $\{X_n, n = 1, 2, 3, \dots\}$ is transformed into another random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ so that the negative values of $Z_n$ do not accumulate over time, with $Z_n$ defined as: $Z_n = X_n - \beta$ (11); the parameter $\beta$ is a constant that, for given network conditions, helps produce a random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ with negative values; the recursion is given by equation (12), where $(y_{n-1} + Z_n)^+$ equals $(y_{n-1} + Z_n)$ when $(y_{n-1} + Z_n) > 0$ and is 0 otherwise; the larger $y_n$ is, the stronger the attack, where
[Formula image CN103490992BC00031 is not reproduced in this text.]
$y_n$ is the test statistic and represents the accumulated positive values of $X_n$ (13), where
[Formula image CN103490992BC00032 is not reproduced in this text.]
initially $S_0 = 0$; the decision function is then expressed as:
[Formula image CN103490992BC00033 is not reproduced in this text.]
where $N$ is the worm detection threshold and $d_N(y_n)$ is the decision at time $n$: if the test statistic $y_n$ is greater than $N$, then $d_N(y_n)$ is 1, indicating that an attack is occurring; otherwise $d_N(y_n)$ is 0, indicating normal operation.
3. The method for detecting instant messaging worms according to claim 1, characterized in that, to compute the simple Mahalanobis distance, incremental learning is used to update the statistics and keep them accurate; let $E_i$ be a feature value of the i-th sample, and define three variables $(E, \omega, n)$:
[Formula image CN103490992BC00034 is not reproduced in this text.]
where $n$ is the length of the sample history; when a new sample is observed, the three variables are updated according to equations (7), (8) and (9):
[Equations (7) and (8): formula image CN103490992BC00035 is not reproduced in this text.]
$n = n + 1$ (9); the sample variance is calculated as in equation (10):
[Equation (10): formula image CN103490992BC00036 is not reproduced in this text.]
CN201310470865.XA 2013-10-10 2013-10-10 IM worm detection method CN103490992B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310470865.XA CN103490992B (en) 2013-10-10 2013-10-10 IM worm detection method

Publications (2)

Publication Number Publication Date
CN103490992A CN103490992A (en) 2014-01-01
CN103490992B CN103490992B (en) 2016-10-19

Family

ID=49830963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310470865.XA CN103490992B (en) 2013-10-10 2013-10-10 IM worm detection method

Country Status (1)

Country Link
CN (1) CN103490992B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104219225A (en) * 2014-07-31 2014-12-17 珠海市君天电子科技有限公司 Worm virus detection and prevention method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101496025A (en) * 2005-12-13 2009-07-29 约吉安全系统公司 System and method for providing network security to mobile devices
CN102457525A (en) * 2011-12-19 2012-05-16 河海大学 Load-based anomaly intrusion detection method and system

Also Published As

Publication number Publication date
CN103490992A (en) 2014-01-01

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model