CN102158486A - Method for rapidly detecting network invasion - Google Patents


Info

Publication number
CN102158486A
Authority
CN
China
Prior art keywords
feature
sample
network intrusion
attributes
samples
Legal status
Pending
Application number
CN2011100842307A
Other languages
Chinese (zh)
Inventor
李元诚
李盼
Current Assignee
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date
2011-04-02
Filing date
2011-04-02
Publication date
2011-08-17
Application filed by North China Electric Power University
Priority to CN2011100842307A
Publication of CN102158486A
Legal status: Pending (current)

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a fast network intrusion detection method in the field of information security, intended to solve the slow computation and poor real-time performance of current network intrusion detection methods. The method includes: preprocessing the feature attributes of each sample in the training set; selecting among the feature attributes of the samples to reduce feature dimensionality; composing the reduced feature attributes into a first feature vector and using it as the input of a ball vector machine learning algorithm to train a classifier; preprocessing the feature attributes of each sample in the test set; selecting among the feature attributes of the test samples to reduce feature dimensionality; composing the reduced feature attributes into a second feature vector and feeding it to the trained classifier to test it; and determining from the test result whether the network has been intruded. The invention reduces the computational complexity of network intrusion detection and improves the real-time performance and accuracy of detection.

Description

A Fast Detection Method for Network Intrusion

Technical Field

The invention belongs to the field of information security technology, and in particular relates to a method for rapidly detecting network intrusion.

Background Art

With the spread and development of information technology, human society has entered the networked era. However, the Internet is an open network serving the public, and current Internet protocols do not fully account for information confidentiality and system security, so security problems such as illegal intrusion, hacker attacks, and leakage of confidential information are difficult to guard against. Computer networks are continually intruded and important intelligence is continually stolen, which poses a major challenge for network intrusion detection. Existing network intrusion detection methods usually build models with machine learning: raw data are first captured from the network, quantized, and normalized; the processed data are then used as input to a machine learning algorithm to train a classifier.

In network intrusion detection, each network connection is treated as one sample, and each sample usually contains dozens of features. During modeling, all features are quantized and normalized, so that each raw sample is represented as a vector, and the set of all these vectors forms the training set of the classifier. A machine learning algorithm is then trained on this training set. Feeding a test sample to the trained classifier yields a predicted output value that determines whether the connection is normal or abnormal.

The above process is the typical network intrusion detection method at present. Its main drawbacks are as follows. The raw intrusion detection data usually contain dozens of features, and feeding such data to some classification algorithms makes classification very slow. In addition, not all of these features contribute positively to the detection result; some may even have a negative effect. Furthermore, to keep classification fast, these methods usually train on small training sets, which leads to low detection accuracy and high false positive and false negative rates.

To address these shortcomings, the invention proposes a fast network intrusion detection method based on a BVM (Ball Vector Machine) with LLE (Locally Linear Embedding, a manifold learning algorithm) feature extraction. A sample set containing both normal and abnormal connections is used as the training set; each sample is quantized and normalized, and its key features are extracted to reduce the dimensionality of the intrusion detection data. A classification algorithm then trains a classifier, which is finally used to classify unknown connections as normal or abnormal. During dimensionality reduction, the manifold learning algorithm analyzes the feature attributes of a large number of samples to discover meaningful low-dimensional structure hidden in the high-dimensional data, thereby reducing the dimensionality of the feature attributes. For classification, the ball vector machine algorithm is introduced. It uses the concept of a core set and turns the problem of finding the support vectors of a support vector machine into the problem of finding the minimum enclosing ball (MEB) of the training sample set. Because the core set is much smaller than the original training set, the cost of solving the optimization problem is greatly reduced. While maintaining the detection rate, the BVM algorithm effectively reduces training time and thus improves the real-time performance of detection.

Summary of the Invention

The object of the invention is to provide a fast network intrusion detection method that solves two problems of current methods: slow computation caused by high data dimensionality, and insufficient real-time performance and accuracy of detection.

To achieve the above object, the technical solution provided by the invention is a fast network intrusion detection method, characterized in that the method comprises:

Step 1: Taking a sample set containing both normal and abnormal connections as the training set, preprocess the feature attributes of each sample in the training set;

Step 2: Use the locally linear embedding algorithm from manifold learning to select the feature attributes of the samples in the sample set, achieving feature dimensionality reduction;

Step 3: Compose the feature attributes reduced in Step 2 into a first feature vector, and use the first feature vector as the input of the ball vector machine learning algorithm to train a classifier;

Step 4: Preprocess the feature attributes of each sample in the test set;

Step 5: Use the locally linear embedding algorithm from manifold learning to select the feature attributes of the samples in the test set, achieving feature dimensionality reduction;

Step 6: Compose the feature attributes reduced in Step 5 into a second feature vector, and use the second feature vector as the input of the classifier trained in Step 3 to test the classifier;

Step 7: Determine from the test result whether the network has been intruded.

Preprocessing the feature attributes of each sample in the training set specifically comprises:

Step 11: Identify the character features in each sample;

Step 12: Replace the character features with the corresponding numeric feature values according to a preset keyword table;

Step 13: Use the numeric feature values of each sample as its feature attributes, and normalize the feature attributes of each sample.

The normalization uses the maximum-value normalization method, specifically the formula [formula image not recoverable], where $x_i$ is a feature attribute of a sample and $N$ is the number of samples.

Step 2 comprises:

Step 21: Use the K-nearest-neighbor method to find the $K$ nearest neighbors of each sample, where $K$ is a given value;

Step 22: Construct the local reconstruction weight matrix of each sample from the $K$ neighbors found in Step 21;

Step 23: Compute the low-dimensional output value of each sample from its local reconstruction weight matrix and its $K$ neighbors.

The local reconstruction weight matrix is constructed with the error function

$$\varepsilon(W)=\sum_{i=1}^{N}\Big\|x_i-\sum_{k=1}^{K} w_{j_k,i}\,x_{j_k,i}\Big\|^2,$$

where $\sum_{k=1}^{K} w_{j_k,i}=1$.

The low-dimensional output values $y_i$ satisfy the mapping condition

$$\min_{Y}\ \varepsilon(Y)=\sum_{i=1}^{N}\Big\|y_i-\sum_{k=1}^{K} w_{j_k,i}\,y_{j_k,i}\Big\|^2,$$

subject to

$$\sum_{i=1}^{N} y_i=0 \quad\text{and}\quad \frac{1}{N}\sum_{i=1}^{N} y_i y_i^{\mathsf T}=I,$$

where $I$ is the $m\times m$ identity matrix.

Step 3 comprises:

Step 31: Given a radius $r$ and $l=0$, select any point $z$ of the training set as the initial core set $S_0=\{z\}$, and compute the initial center $c_0$ of the ball from $S_0$;

Step 32: Iterate. In the $l$-th iteration, if the core set $S_l$ covers all samples in the training set, i.e. all samples lie inside the ball $B(c_l,(1+\varepsilon)r)$, the iteration ends; otherwise go to Step 33. Here $\varepsilon>0$ is a preset value;

Step 33: In the kernel feature space, find any sample point $\varphi(x)$ outside the ball $B(c_l,(1+\varepsilon)r)$ and form the core set $S_{l+1}=S_l\cup\{x\}$, where $\varphi(\cdot)$ is the kernel mapping function;

Step 34: From the core set $S_{l+1}$, compute its center $c_{l+1}$ with the update formula $c_{l+1}=\varphi(x)+\beta_l\,(c_l-\varphi(x))$, where $\beta_l=r/\|c_l-\varphi(x)\|$;

Step 35: Set $l=l+1$ and return to Step 32.

The invention reduces the computational complexity of network intrusion detection and improves the real-time performance and accuracy of detection.

Description of Drawings

Fig. 1 is a flow chart of the fast network intrusion detection method;

Fig. 2 is a schematic diagram of feature dimensionality reduction of the samples in a sample set using the locally linear embedding algorithm from manifold learning;

Fig. 3 is a flow chart of training a classifier with the feature vectors as the input of the ball vector machine learning algorithm.

Detailed Description of the Embodiments

Preferred embodiments are described in detail below with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications.

The fast network intrusion detection method provided by the invention is a BVM-based method with LLE feature extraction. It helps solve the problems of earlier detection methods, which cannot effectively reduce the dimensionality of the feature attributes of intrusion detection data and which suffer from poor real-time performance and low detection rates. To this end, the invention takes a sample set containing both normal and abnormal connections as the training set, quantizes and normalizes each sample in it, extracts the key features of each sample with the LLE manifold learning algorithm to reduce the dimensionality of the intrusion detection data, trains a classifier with the ball vector machine algorithm, and finally uses the trained classifier to classify unknown connections as normal or abnormal.

Fig. 1 is a flow chart of the fast network intrusion detection method. As shown in Fig. 1, the method provided by the invention comprises the following steps:

Step 1: Taking a sample set containing both normal and abnormal connections as the training set, preprocess the feature attributes of each sample in the training set.

The training set can be downloaded directly from the Internet; a dataset dedicated to evaluating network intrusion detection, the KDDCUP'99 dataset, is publicly available. The samples in the training set are network connections. So that the trained classifier detects the test set more accurately, the training samples must include both normal and abnormal connections.

Preprocessing the attributes of each sample in the training set specifically comprises:

Step 11: Identify the character features in each sample. Each sample, i.e. each connection, contains numeric features and/or character features; a numeric feature (which is simply a number) can be used directly as a numeric feature value.

Step 12: Replace the character features with the corresponding numeric feature values according to a preset keyword table.

The keyword table contains at least two fields: the character feature and its corresponding numeric feature value. In this way, the character features of each sample (which are strings) are converted into numeric feature values.

Step 12 is in effect the quantization step.

Step 13: Use the numeric feature values of each sample as its feature attributes, and normalize the feature attributes of each sample.

The normalization uses the maximum-value normalization method, specifically the formula [formula image not recoverable], where [formula image not recoverable], $x_i$ is a feature attribute of a sample, and $N$ is the number of samples.
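As a rough illustration of Steps 11 to 13, the sketch below quantizes records with a keyword table and then normalizes the feature columns. The keyword table entries (`tcp`, `udp`, `SF`, and so on, loosely modeled on KDDCUP'99 symbolic fields) are hypothetical, and one shared table is a simplification of what would more likely be a per-field table. Because the normalization formula survives only as an image, dividing each feature by its maximum absolute value over the $N$ samples is an assumed reading of "maximum-value normalization", not the patent's exact formula.

```python
import numpy as np

# Hypothetical keyword table (Step 12): character feature -> numeric feature value.
# A real table would more likely be keyed per field; one shared dict keeps the sketch short.
KEYWORDS = {"tcp": 1, "udp": 2, "icmp": 3, "http": 1, "ftp": 2, "SF": 1, "REJ": 2}

def quantize(record, table=KEYWORDS):
    """Steps 11-12: numeric fields pass through, character fields are looked up."""
    values = []
    for field in record:
        try:
            values.append(float(field))         # numeric feature: use as-is
        except ValueError:
            values.append(float(table[field]))  # character feature: keyword lookup
    return values

def max_normalize(X):
    """Step 13 (assumed form): divide each feature by its max |value| over the N samples."""
    X = np.asarray(X, dtype=float)
    col_max = np.abs(X).max(axis=0)
    col_max[col_max == 0] = 1.0                 # leave all-zero features unchanged
    return X / col_max
```

Typical use is `max_normalize([quantize(r) for r in records])`, which yields one normalized feature vector per connection.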

Step 2: Use the locally linear embedding algorithm from manifold learning to select the feature attributes of the samples in the sample set, achieving feature dimensionality reduction.

Fig. 2 is a schematic diagram of feature dimensionality reduction of the samples in a sample set using the locally linear embedding algorithm. As shown in Fig. 2, Step 2 specifically comprises:

Step 21: Use the K-nearest-neighbor method to find the $K$ nearest neighbors of each sample, where $K$ is a given value.

The $K$ sample points closest to a given sample point are defined as its $K$ nearest neighbors, where $K$ is a preset value. Distances can be computed with the Euclidean distance: for $x,y\in\mathbb{R}^N$, the Euclidean distance between $x$ and $y$ is

$$d(x,y)=\Big(\sum_{i=1}^{N}(x_i-y_i)^2\Big)^{1/2}$$
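Step 21 can be sketched as a brute-force NumPy search, assuming the samples are stored as rows of an array. This is an illustration, not the patent's implementation; the helper names are hypothetical, and a sample is excluded from its own neighbor list.

```python
import numpy as np

def euclidean(x, y):
    """d(x, y) = (sum_i (x_i - y_i)^2)^(1/2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def k_nearest(X, K):
    """Indices of the K nearest neighbors of every row of X (Euclidean distance)."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(D2, np.inf)                    # a sample is not its own neighbor
    return np.argsort(D2, axis=1)[:, :K]
```

Sorting squared distances gives the same neighbor order as sorting the distances themselves, so the square root is skipped inside `k_nearest`.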

Step 22: Construct the local reconstruction weight matrix of each sample from the $K$ neighbors obtained in Step 21.

The local reconstruction weight matrix $W=(w_{ij})\in M_{n,n}$ is defined as follows: if $x_i$ and $x_j$ are not neighbors, then $w_{ij}=0$; if $x_{j_k}$ $(k=1,2,\ldots,K)$ are the neighbors of $x_i$, the weights satisfy the constraint

$$\sum_{k=1}^{K} w_{j_k,i}=1.$$

Approximating $X$ by $XW$ introduces some error. The Frobenius norm of a matrix $A=(a_{i,j})\in M_{m,n}$ is defined as

$$\|A\|_F=\Big(\sum_{i=1}^{m}\sum_{j=1}^{n}a_{i,j}^2\Big)^{1/2}.$$

$W$ is found under the following constraint:

$$\min_W \|X-XW\|_F^2,$$

that is,

$$\min_W \sum_{i=1}^{N}\Big\|x_i-\sum_{k=1}^{K} w_{j_k,i}\,x_{j_k,i}\Big\|^2 \quad\text{s.t.}\quad \sum_{k=1}^{K} w_{j_k,i}=1,$$

where $x_{j_k,i}$ denotes the $k$-th nearest neighbor of $x_i$. This amounts to solving a series of least-squares problems. For $x_i$, for example, the weights $w_{j_k,i}$ are obtained from the system

$$\sum_{k=1}^{K} w_{j_k,i}=1,\qquad Xw_i=x_i,$$

where $w_i$ is the $i$-th column of $W$.
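The least-squares problem for each $x_i$ is commonly solved through the local Gram matrix of its centered neighbors, followed by rescaling so the weights sum to one. The sketch below does that under two stated assumptions: a small regularization term `reg` (common in LLE implementations, not mentioned in the source) keeps the system solvable when $K$ exceeds the feature dimension, and the weights are stored row-wise, with `W[i, j]` being the weight of $x_j$ in reconstructing $x_i$ (the transpose of the column layout used in $XW$). The `neighbors` array would come from a K-nearest-neighbor search as in Step 21; the function name is hypothetical.

```python
import numpy as np

def reconstruction_weights(X, neighbors, reg=1e-3):
    """Row-wise LLE weights: W[i, j] is the weight of x_j in rebuilding x_i."""
    N, K = neighbors.shape
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[neighbors[i]] - X[i]             # K x d, neighbors shifted so x_i is the origin
        G = Z @ Z.T                            # local K x K Gram matrix
        tr = np.trace(G)
        G = G + (reg * tr if tr > 0 else reg) * np.eye(K)  # assumed regularization
        w = np.linalg.solve(G, np.ones(K))     # constrained minimizer, up to scale
        W[i, neighbors[i]] = w / w.sum()       # enforce sum_k w_{j_k,i} = 1
    return W
```

Entries for non-neighbors stay zero, matching the rule $w_{ij}=0$ when $x_i$ and $x_j$ are not adjacent.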

Step 23: Compute the low-dimensional output value of each sample from its local reconstruction weight matrix and its $K$ neighbors.

With the weight matrix $W$, suitable $y_i$ can be found in the low-dimensional space by solving

$$\min_Y\ \varepsilon(Y)=\sum_{i=1}^{N}\Big\|y_i-\sum_{k=1}^{K} w_{j_k,i}\,y_{j_k,i}\Big\|^2,$$

where $y_i$ is the output vector of $x_i$ and $y_{j_k,i}$ $(k=1,2,\ldots,K)$ are the neighbors of $y_i$, subject to the two conditions

$$\sum_{i=1}^{N} y_i=0 \quad\text{and}\quad \frac{1}{N}\sum_{i=1}^{N} y_i y_i^{\mathsf T}=I,$$

where $I$ is the $m\times m$ identity matrix. The loss function can then be rewritten as

$$\varepsilon(Y)=\operatorname{tr}\big(Y^{\mathsf T} M Y\big),$$

where $Y$ is the $N\times m$ matrix whose $i$-th row is $y_i^{\mathsf T}$, $W$ is read row-wise (row $i$ holds the weights of $x_i$), and $M$ is the $N\times N$ symmetric matrix $M=(I-W)^{\mathsf T}(I-W)$, with $I$ here the $N\times N$ identity.

To minimize the loss, $Y$ is formed from the eigenvectors of $M$ corresponding to its $m$ smallest non-zero eigenvalues. In practice, the eigenvalues of $M$ are sorted in ascending order; the first is essentially zero and is discarded, so the eigenvectors corresponding to the 2nd through $(m+1)$-th eigenvalues are taken as the output.
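The eigen-decomposition of Step 23 might be sketched as follows, assuming $W$ stores the weights row-wise (row $i$ holds the weights of $x_i$, each row summing to one), which is the layout under which $M=(I-W)^{\mathsf T}(I-W)$ matches the loss. Scaling the kept eigenvectors by $\sqrt{N}$ is one way to satisfy $\frac{1}{N}\sum_i y_iy_i^{\mathsf T}=I$; the helper name `lle_embedding` is hypothetical.

```python
import numpy as np

def lle_embedding(W, m):
    """Step 23: low-dimensional outputs from row-wise weights W (N x N)."""
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W).T @ (I - W)            # N x N, symmetric positive semi-definite
    vals, vecs = np.linalg.eigh(M)     # eigenvalues returned in ascending order
    # drop the first (near-zero, constant) eigenvector, keep the 2nd..(m+1)-th
    Y = vecs[:, 1:m + 1] * np.sqrt(N)  # rows are y_i; gives (1/N) Y^T Y = I
    return Y, vals
```

Because each row of $W$ sums to one, the constant vector lies in the null space of $M$; the remaining eigenvectors are orthogonal to it, which is why the kept columns also satisfy $\sum_i y_i=0$.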

Step 3: Compose the feature attributes reduced in Step 2 into a first feature vector, and use the first feature vector as the input of the ball vector machine learning algorithm to train a classifier.

Fig. 3 is a flow chart of training a classifier with the feature vectors as the input of the ball vector machine learning algorithm.

As shown in Fig. 3, Step 3 specifically comprises:

Step 31: Given a radius $r$, select any point $z\in S$ of the training set $S$ as the initial core set $S_0=\{z\}$, and compute the initial center $c_0$ of the ball from $S_0$. Here $z$ is simply one sample.

Step 32: Iterate. In the $l$-th iteration, if the core set $S_l$ covers all samples in the training set, i.e. all samples lie inside the ball $B(c_l,(1+\varepsilon)r)$ (where $\varepsilon>0$ is a preset value), the iteration ends; otherwise go to Step 33.

Step 33: In the kernel feature space, find any sample point $\varphi(x)$ outside the ball $B(c_l,(1+\varepsilon)r)$ and form the core set $S_{l+1}=S_l\cup\{x\}$, where $\varphi(\cdot)$ is the kernel mapping function.

Step 34: From the core set $S_{l+1}$, compute its center $c_{l+1}$ with the update formula $c_{l+1}=\varphi(x)+\beta_l\,(c_l-\varphi(x))$, where $\beta_l=r/\|c_l-\varphi(x)\|$.

Step 35: Set $l=l+1$ and return to Step 32.
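The iteration of Steps 31 to 35 can be run entirely through kernel evaluations by keeping the center as a combination $c=\sum_j\alpha_j\varphi(x_j)$. The update $c_{l+1}=\varphi(x)+\beta_l(c_l-\varphi(x))$ then becomes $\alpha\leftarrow\beta_l\alpha+(1-\beta_l)e_x$, and it places $\varphi(x)$ exactly at distance $\beta_l\|c_l-\varphi(x)\|=r$ from the new center. The sketch below assumes a Gaussian (RBF) kernel, picks the farthest outside point as the "any point" of Step 33, and adds an iteration cap in case $r$ is chosen too small. It covers only the enclosing-ball core-set search, not how class labels enter the kernel in a full BVM classifier, and all names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) (assumed kernel choice)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def ball_core_set(X, r, eps=0.1, gamma=1.0, z=0, max_iter=1000):
    """Steps 31-35, with the center kept as c = sum_j alpha[j] * phi(x_j)."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(X))
    alpha[z] = 1.0                      # Step 31: S_0 = {z}, so c_0 = phi(z)
    core = {z}
    for _ in range(max_iter):           # Step 35: each pass is one value of l
        # ||c - phi(x_i)||^2 = alpha^T K alpha - 2 (K alpha)_i + k(x_i, x_i)
        d2 = alpha @ K @ alpha - 2.0 * (K @ alpha) + np.diag(K)
        d = np.sqrt(np.maximum(d2, 0.0))
        outside = np.flatnonzero(d > (1.0 + eps) * r)
        if outside.size == 0:           # Step 32: all samples inside B(c_l, (1+eps) r)
            return alpha, sorted(core), d
        i = int(outside[np.argmax(d[outside])])  # Step 33: here, the farthest outside point
        core.add(i)                     # S_{l+1} = S_l U {x}
        beta = r / d[i]                 # Step 34: beta_l = r / ||c_l - phi(x)||
        alpha = beta * alpha            # c_{l+1} = beta c_l + (1 - beta) phi(x)
        alpha[i] += 1.0 - beta
    raise RuntimeError("no convergence; r may be too small for this data")
```

Since every update is a convex combination, `alpha` stays non-negative and sums to one; the returned core set is typically far smaller than the training set, which is the source of the speed-up the description claims.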

Step 4: Preprocess the feature attributes of each sample in the test set.

The test set can be downloaded from the Internet, or obtained by simulating a real network environment, capturing the packets of network connections in the simulated network, and analyzing them.

Step 5: Use the locally linear embedding algorithm from manifold learning to select the feature attributes of the samples in the test set, achieving feature dimensionality reduction.

Step 5 is carried out in the same way as Step 2, except that dimensionality reduction is applied to the test set instead of the training set.

Step 6: Compose the feature attributes reduced in Step 5 into a second feature vector, and use the second feature vector as the input of the classifier trained in Step 3 to test the classifier.

The procedure of this step is similar to that of Step 3, except that the input is the second feature vector and the output is either a normal connection or an abnormal connection.

Step 7: Determine from the test result whether the network has been intruded.

Based on the output of Step 6: if the connection is normal, the network has not been intruded; if the connection is abnormal, the network is determined to have been intruded.

The manifold learning algorithm used in the invention has strong dimensionality reduction capability and can discover useful features of the data. The ball vector machine learning algorithm obtains the core set of the samples by solving for the minimum enclosing ball; because the core set is much smaller than the original training set, computation time and memory usage are greatly reduced, as are the time needed to find the support vectors and the cost of solving the optimization problem. The method thus detects network intrusion while ensuring both the accuracy and the timeliness of detection.

The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be determined by the scope of the claims.

Claims (7)

1. A fast network intrusion detection method, characterized in that the method comprises: Step 1: taking a sample set containing both normal and abnormal connections as the training set, preprocessing the feature attributes of each sample in the training set; Step 2: using the locally linear embedding algorithm from manifold learning to select the feature attributes of the samples in the sample set, achieving feature dimensionality reduction; Step 3: composing the feature attributes reduced in Step 2 into a first feature vector, and using the first feature vector as the input of the ball vector machine learning algorithm to train a classifier; Step 4: preprocessing the feature attributes of each sample in the test set; Step 5: using the locally linear embedding algorithm from manifold learning to select the feature attributes of the samples in the test set, achieving feature dimensionality reduction; Step 6: composing the feature attributes reduced in Step 5 into a second feature vector, and using the second feature vector as the input of the classifier trained in Step 3 to test the classifier; Step 7: determining from the test result whether the network has been intruded.

2. The fast network intrusion detection method according to claim 1, characterized in that preprocessing the feature attributes of each sample in the training set specifically comprises: Step 11: identifying the character features in each sample; Step 12: replacing the character features with the corresponding numeric feature values according to a preset keyword table; Step 13: using the numeric feature values of each sample as its feature attributes, and normalizing the feature attributes of each sample.

3. The fast network intrusion detection method according to claim 2, characterized in that the normalization uses the maximum-value normalization method, specifically the formula [formula image not recoverable], where $x_i$ is a feature attribute of a sample and $N$ is the number of samples.

4. The fast network intrusion detection method according to claim 1, characterized in that Step 2 comprises: Step 21: using the K-nearest-neighbor method to find the $K$ nearest neighbors of each sample, where $K$ is a given value; Step 22: constructing the local reconstruction weight matrix of each sample from the $K$ neighbors obtained in Step 21; Step 23: computing the low-dimensional output value of each sample from its local reconstruction weight matrix and its $K$ neighbors.

5. The fast network intrusion detection method according to claim 4, characterized in that the local reconstruction weight matrix is constructed with the error function $\varepsilon(W)=\sum_{i=1}^{N}\big\|x_i-\sum_{k=1}^{K} w_{j_k,i}\,x_{j_k,i}\big\|^2$, where $\sum_{k=1}^{K} w_{j_k,i}=1$.

6. The fast network intrusion detection method according to claim 4, characterized in that the low-dimensional output values $y_i$ satisfy the mapping condition $\min_Y \sum_{i=1}^{N}\big\|y_i-\sum_{k=1}^{K} w_{j_k,i}\,y_{j_k,i}\big\|^2$, subject to $\sum_{i=1}^{N} y_i=0$ and $\frac{1}{N}\sum_{i=1}^{N} y_i y_i^{\mathsf T}=I$, where $I$ is the $m\times m$ identity matrix.

7. The fast network intrusion detection method according to claim 1, characterized in that Step 3 comprises: Step 31: given a radius $r$ and $l=0$, selecting any point $z$ of the training set as the initial core set $S_0=\{z\}$, and computing the initial center $c_0$ of the ball from $S_0$; Step 32: iterating, where in the $l$-th iteration, if the core set $S_l$ covers all samples in the training set, i.e. all samples lie inside the ball $B(c_l,(1+\varepsilon)r)$, the iteration ends, and otherwise the method goes to Step 33, $\varepsilon>0$ being a preset value; Step 33: in the kernel feature space, finding any sample point $\varphi(x)$ outside the ball $B(c_l,(1+\varepsilon)r)$ and forming the core set $S_{l+1}=S_l\cup\{x\}$, where $\varphi(\cdot)$ is the kernel mapping function; Step 34: from the core set $S_{l+1}$, computing its center $c_{l+1}$ with the update formula $c_{l+1}=\varphi(x)+\beta_l\,(c_l-\varphi(x))$, where $\beta_l=r/\|c_l-\varphi(x)\|$; Step 35: setting $l=l+1$ and returning to Step 32.
CN2011100842307A 2011-04-02 2011-04-02 Method for rapidly detecting network invasion Pending CN102158486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100842307A CN102158486A (en) 2011-04-02 2011-04-02 Method for rapidly detecting network invasion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100842307A CN102158486A (en) 2011-04-02 2011-04-02 Method for rapidly detecting network invasion

Publications (1)

Publication Number Publication Date
CN102158486A true CN102158486A (en) 2011-08-17

Family

ID=44439669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100842307A Pending CN102158486A (en) 2011-04-02 2011-04-02 Method for rapidly detecting network invasion

Country Status (1)

Country Link
CN (1) CN102158486A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105072115A (en) * 2015-08-12 2015-11-18 国家电网公司 Information system intrusion detection method based on Docker virtualization
CN106951778A (en) * 2017-03-13 2017-07-14 步步高电子商务有限责任公司 Intrusion detection method for complex streaming-data event analysis
CN107066881A (en) * 2016-12-14 2017-08-18 四川长虹电器股份有限公司 Intrusion detection method based on Kohonen neural networks
CN107404471A (en) * 2017-04-05 2017-11-28 青海民族大学 Network traffic anomaly detection method based on the ADMM algorithm
CN109962909A (en) * 2019-01-30 2019-07-02 大连理工大学 A network intrusion anomaly detection method based on machine learning
CN110198319A (en) * 2019-06-03 2019-09-03 电子科技大学 Security protocol vulnerability mining method based on multiple counterexamples
CN110249331A (en) * 2017-01-30 2019-09-17 微软技术许可有限责任公司 Continuous learning for intrusion detection
CN110875912A (en) * 2018-09-03 2020-03-10 中移(杭州)信息技术有限公司 A deep learning-based network intrusion detection method, device and storage medium
CN111753877A (en) * 2020-05-19 2020-10-09 海克斯康制造智能技术(青岛)有限公司 Product quality detection method based on deep neural network transfer learning
CN111797997A (en) * 2020-07-08 2020-10-20 北京天融信网络安全技术有限公司 Network intrusion detection method, model construction method, device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6157921A (en) * 1998-05-01 2000-12-05 Barnhill Technologies, Llc Enhancing knowledge discovery using support vector machines in a distributed network environment
CN101478534A (en) * 2008-12-02 2009-07-08 广东海洋大学 Network exception detecting method based on artificial immunity principle

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李元诚 (Li Yuancheng) et al.: "An Intrusion Detection Method Based on LLE and BVM", Information Networking and Automation (ICINA) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105072115B (en) * 2015-08-12 2018-06-08 国家电网公司 Information system intrusion detection method based on Docker virtualization
CN105072115A (en) * 2015-08-12 2015-11-18 国家电网公司 Information system intrusion detection method based on Docker virtualization
CN107066881A (en) * 2016-12-14 2017-08-18 四川长虹电器股份有限公司 Intrusion detection method based on Kohonen neural networks
CN110249331A (en) * 2017-01-30 2019-09-17 微软技术许可有限责任公司 Continuous learning for intrusion detection
US11689549B2 (en) 2017-01-30 2023-06-27 Microsoft Technology Licensing, Llc Continuous learning for intrusion detection
CN106951778A (en) * 2017-03-13 2017-07-14 步步高电子商务有限责任公司 Intrusion detection method for complex streaming-data event analysis
CN107404471A (en) * 2017-04-05 2017-11-28 青海民族大学 Network traffic anomaly detection method based on the ADMM algorithm
CN110875912A (en) * 2018-09-03 2020-03-10 中移(杭州)信息技术有限公司 A deep learning-based network intrusion detection method, device and storage medium
CN109962909B (en) * 2019-01-30 2021-05-14 大连理工大学 A network intrusion anomaly detection method based on machine learning
CN109962909A (en) * 2019-01-30 2019-07-02 大连理工大学 A network intrusion anomaly detection method based on machine learning
CN110198319A (en) * 2019-06-03 2019-09-03 电子科技大学 Security protocol vulnerability mining method based on multiple counterexamples
CN111753877A (en) * 2020-05-19 2020-10-09 海克斯康制造智能技术(青岛)有限公司 Product quality detection method based on deep neural network transfer learning
CN111753877B (en) * 2020-05-19 2024-03-05 海克斯康制造智能技术(青岛)有限公司 Product quality detection method based on deep neural network transfer learning
CN111797997A (en) * 2020-07-08 2020-10-20 北京天融信网络安全技术有限公司 Network intrusion detection method, model construction method, device and electronic equipment

Similar Documents

Publication Publication Date Title
CN102158486A (en) Method for rapidly detecting network invasion
CN112953924B (en) Network abnormal flow detection method, system, storage medium, terminal and application
CN107070943B (en) Industrial internet intrusion detection method based on flow characteristic diagram and perceptual hash
CN109309675A (en) A network intrusion detection method based on convolutional neural network
CN110233849A (en) Method and system for network security situation analysis
CN102291392B (en) Hybrid intrusion detection method based on Bagging algorithm
CN103838835B (en) Network sensitive-video detection method
CN112491796A (en) Intrusion detection and semantic decision tree quantitative interpretation method based on convolutional neural network
TW200849917A (en) Detecting method of network invasion
CN104636764B (en) Image steganalysis method and device
CN112087443A (en) An intelligent detection method of sensor data anomaly under cyber-physical attack on large-scale industrial sensor network
CN110324178B (en) Network intrusion detection method based on multiple empirical kernel learning
CN112261063A (en) Network malicious traffic detection method combined with deep hierarchical network
Nuo A novel selection method of network intrusion optimal route detection based on naive Bayesian
CN116662817A (en) Asset identification method and system of Internet of things equipment
CN110097120B (en) Network flow data classification method, equipment and computer storage medium
CN116260565A (en) Chip electromagnetic side channel analysis method, system and storage medium
Xu et al. CGASNet: a generalized zero-shot learning compound fault diagnosis approach for bearings
CN114826681A (en) DGA domain name detection method, system, medium, equipment and terminal
CN115277189A (en) Unsupervised intrusion traffic detection and identification method based on generative adversarial network
CN112383488B (en) A Content Identification Method for Encrypted and Unencrypted Data Streams
CN103093236B (en) A Mobile Terminal Pornography Filtering Method Based on Image Semantic Analysis
CN108009434A (en) Rich model Stego-detection Feature Selection Algorithms based on rough set α-positive domain reduction
CN115883198B (en) A multi-factor network abnormal behavior detection method
Luo et al. A hierarchical CNN-transformer model for network intrusion detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110817