CN102158486A - Method for rapidly detecting network invasion

Info

Publication number: CN102158486A
Application number: CN2011100842307A
Authority: CN
Inventors: 李元诚; 李盼
Original assignee: North China Electric Power University
Current assignee: North China Electric Power University
Priority date: 2011-04-02
Filing date: 2011-04-02
Publication date: 2011-08-17

Abstract

The invention discloses a method for rapidly detecting network intrusion, in the technical field of information security, which addresses the slow computation and lack of real-time detection in currently used network intrusion detection methods. The method includes: preprocessing the feature attributes of each sample in the training set; selecting the feature attributes of the samples in the sample set to achieve feature dimensionality reduction; composing the dimension-reduced feature attributes into a first feature vector, and using the first feature vector as the input of the ball vector machine learning algorithm to train a classifier; preprocessing the feature attributes of each sample in the test set; selecting the feature attributes of the samples in the test set to achieve feature dimensionality reduction; composing the dimension-reduced feature attributes into a second feature vector, and testing the classifier using the second feature vector as the input of the trained classifier; and determining from the test result whether the network has been intruded. The invention reduces the computational complexity of network intrusion detection and improves the real-time performance and accuracy of detection.

Description

A Fast Detection Method of Network Intrusion

Technical field

The invention belongs to the technical field of information security, and in particular relates to a method for rapidly detecting network intrusion.

Background art

With the spread and development of information technology, human society has entered the networked era. The Internet, however, is an open, public network. Current Internet protocols do not fully address the confidentiality of information or the security of systems, so it is hard to guard against illegal intrusion, hacker attacks, and leakage of confidential information. Computer networks are intruded upon continually and important intelligence is stolen continually, which makes network intrusion detection a major challenge. Existing network intrusion detection methods usually build their models with machine learning: raw data is first captured from the network, then quantized and normalized; a machine learning algorithm then takes the processed data as input to train a classifier.

In network intrusion detection, one network connection is treated as one sample, and each sample usually contains dozens of features. During modeling, all features are quantized and normalized, that is, each raw sample is represented as a vector, and the set of all such vectors serves as the training set of the classifier. A machine learning algorithm then trains the classifier with the training set as input. Processing an input test sample with the trained classifier yields a predicted output value, which determines whether the sample is a normal connection or an abnormal connection.

The above process is the typical network intrusion detection method at present. Its main drawbacks are as follows. The raw intrusion detection data usually contains dozens of features, and feeding all of them to certain classification algorithms makes classification very slow. In addition, not all of these features contribute positively to the detection result; some even have a negative effect. Furthermore, to keep classification fast, such methods usually train on small data sets, which leads to low detection accuracy and high false positive and false negative rates.

To address the shortcomings of the detection techniques described above, the invention proposes a rapid network intrusion detection method based on a BVM (Ball Vector Machine) with LLE (Locally Linear Embedding, a manifold learning algorithm) feature extraction. A sample set containing both normal and abnormal connections is used as the training set. Each sample in the training set is quantized and normalized, and its key features are extracted to reduce the dimensionality of the intrusion detection data. A classification algorithm then trains a classifier, and the trained classifier finally classifies unknown connections as normal or abnormal. A manifold learning algorithm is introduced for the dimensionality reduction: it analyzes the feature attributes of a large number of samples to discover meaningful low-dimensional structure hidden in the high-dimensional data, and thereby reduces the dimensionality of the high-dimensional feature attributes. The ball vector machine is used as the classification learning algorithm. It introduces the concept of a core set and converts the problem of finding the support vectors in a support vector machine into the problem of finding the minimum enclosing ball (MEB) that contains the training sample set. Because the core set is much smaller than the original training sample set, the cost of solving the optimization problem is greatly reduced. While maintaining the detection rate, the BVM algorithm effectively reduces training time and thus improves the real-time performance of detection.

Summary of the invention

The purpose of the invention is to propose a method for rapidly detecting network intrusion, to solve the slow computation caused by high dimensionality and the lack of real-time performance and accuracy in currently used network intrusion detection methods.

To achieve the above purpose, the technical solution provided by the invention is a method for rapidly detecting network intrusion, characterized in that the method includes:

Step 1: Use a sample set containing both normal connections and abnormal connections as the training set, and preprocess the feature attributes of each sample in the training set;

Step 2: Use the locally linear embedding algorithm, a manifold learning algorithm, to select the feature attributes of the samples in the sample set, achieving feature dimensionality reduction;

Step 3: Compose the feature attributes reduced in step 2 into a first feature vector, and use the first feature vector as the input of the ball vector machine learning algorithm to train a classifier;

Step 4: Preprocess the feature attributes of each sample in the test set;

Step 5: Use the locally linear embedding algorithm to select the feature attributes of the samples in the test set, achieving feature dimensionality reduction;

Step 6: Compose the feature attributes reduced in step 5 into a second feature vector, and test the classifier using the second feature vector as the input of the classifier trained in step 3;

Step 7: Determine from the test result whether the network has been intruded.

Preprocessing the feature attributes of each sample in the training set specifically includes:

Step 11: Find the character features in each sample;

Step 12: Replace the character features with the corresponding numeric feature values according to a preset keyword table;

Step 13: Use the numeric feature values in each sample as the feature attributes of that sample, and normalize the feature attributes in each sample.

The normalization uses the maximum-value normalization method, specifically the formula [formula not reproduced in this text], where $x_i$ is a feature attribute of the sample and $N$ is the number of samples.

Step 2 includes:

Step 21: Use the K-nearest-neighbor method to find the K nearest neighbors of each sample, where K is a given value;

Step 22: Use the K nearest neighbors obtained in step 21 to construct the local reconstruction weight matrix of each sample;

Step 23: Compute the low-dimensional output value of each sample from its local reconstruction weight matrix and its K nearest neighbors.

The local reconstruction weight matrix is constructed using the error function [formula not reproduced in this text], where [formula not reproduced in this text].

The low-dimensional output value $y_i$ satisfies the mapping conditions [formula not reproduced in this text] and [formula not reproduced in this text], where $I$ is the $m \times m$ identity matrix.

Step 3 includes:

Step 31: Given a radius $r$ and $l = 0$, select an arbitrary point $z$ in the training set as the initial core set, $S_0 = \{z\}$, and compute the initial center $c_0$ of the ball from $S_0$;

Step 32: Iterate. In the $l$-th iteration, if the core set $S_l$ covers all samples in the training set, that is, all samples fall inside the ball $B(c_l, (1+\varepsilon)r)$, the iteration ends; otherwise, go to step 33. Here $\varepsilon$ is a preset value with $\varepsilon > 0$;

Step 33: In the kernel feature space, find any sample point $\varphi(x)$ outside the ball $B(c_l, (1+\varepsilon)r)$ and form the core set $S_{l+1} = S_l \cup \{x\}$, where $\varphi(\cdot)$ is the kernel mapping function;

Step 34: From the core set $S_{l+1}$, solve for its center $c_{l+1}$, using the update formula $c_{l+1} = \varphi(x) + \beta_l (c_l - \varphi(x))$ with $\beta_l = r / \|c_l - \varphi(x)\|$;

Step 35: Set $l = l + 1$ and return to step 32.

The invention reduces the computational complexity of network intrusion detection and improves the real-time performance and accuracy of detection.

Description of drawings

Fig. 1 is a flow chart of the method for rapidly detecting network intrusion;

Fig. 2 is a schematic diagram of feature dimensionality reduction of the samples in the sample set using the locally linear embedding algorithm;

Fig. 3 is a flow chart of training the classifier with the feature vector as the input of the ball vector machine learning algorithm.

Detailed description

The preferred embodiments are described in detail below with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its application.

The method provided by the invention is a BVM-based rapid network intrusion detection method using LLE feature extraction. It helps solve the problems that earlier detection methods could not reduce the dimensionality of the feature attributes of intrusion detection data well, and that their real-time performance and detection rate were poor. To this end, the solution of the invention is as follows: a sample set containing both normal and abnormal connections is used as the training set; each sample in the training set is quantized and normalized; the LLE algorithm, a manifold learning algorithm, extracts the key features of each sample to reduce the dimensionality of the intrusion detection data; the ball vector machine algorithm then trains a classifier; and the trained classifier finally classifies unknown connections to determine whether they are normal or abnormal.

Fig. 1 is a flow chart of the method for rapidly detecting network intrusion. As shown in Fig. 1, the method provided by the invention includes the following steps:

Step 1: Use a sample set containing both normal connections and abnormal connections as the training set, and preprocess the feature attributes of each sample in the training set.

The training set can be downloaded directly from the Internet. A data set dedicated to evaluating network intrusion detection, called the KDDCUP’99 data set, is available online. The samples in the training set are network connections. So that the trained classifier detects the test set more accurately, the training set must contain both normal connections and abnormal connections.

Preprocessing the attributes of each sample in the training set specifically includes:

Step 11: Find the character features in each sample. Each sample, that is, each connection, contains numeric features and/or character features. A numeric feature (which is simply a number) can be used directly as a numeric feature value.

Step 12: Replace the character features with the corresponding numeric feature values according to a preset keyword table.

The keyword table includes at least two fields: the character feature and the corresponding numeric feature value. In this way, the character features of each sample (which are simply strings) can be converted into numbers, that is, numeric feature values.

Step 12 is in effect the quantization process.
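Steps 11 and 12 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the keyword table entries, their numeric codes, the shortened six-field record, and the function names `is_number` and `quantize` are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical keyword table: character feature -> numeric feature value.
# The entries and codes are illustrative assumptions, not values from the patent.
KEYWORD_TABLE: Dict[str, float] = {
    "tcp": 1.0, "udp": 2.0, "icmp": 3.0,   # protocol names
    "http": 4.0, "smtp": 5.0, "ftp": 6.0,  # service names
    "SF": 7.0, "REJ": 8.0, "S0": 9.0,      # connection status flags
}

def is_number(field: str) -> bool:
    """Step 11: a field that does not parse as a number is a character feature."""
    try:
        float(field)
        return True
    except ValueError:
        return False

def quantize(record: List[str], table: Dict[str, float]) -> List[float]:
    """Step 12: keep numeric fields, look up character fields in the table."""
    values = []
    for field in record:
        if is_number(field):
            values.append(float(field))
        else:
            values.append(table[field])  # raises KeyError for an unknown keyword
    return values

# A shortened, made-up connection record (a real record has many more fields).
record = "0,tcp,http,SF,181,5450".split(",")
features = quantize(record, KEYWORD_TABLE)
print(features)
```

Raising `KeyError` on an unknown keyword is a design choice of this sketch; a deployed system would need a policy for keywords missing from the table.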

The normalization uses the maximum-value normalization method, specifically the formula [formula not reproduced in this text], where $x_i$ is a feature attribute of the sample and $N$ is the number of samples.

Step 2: Use the locally linear embedding algorithm, a manifold learning algorithm, to select the feature attributes of the samples in the sample set, achieving feature dimensionality reduction.

Fig. 2 is a schematic diagram of feature dimensionality reduction of the samples in the sample set using the locally linear embedding algorithm. As shown in Fig. 2, step 2 specifically includes:

Step 21: Use the K-nearest-neighbor method to find the K nearest neighbors of each sample, where K is a given value.

The K sample points closest to a given sample point are defined as its K nearest neighbors, where K is a predetermined value. The distance can be computed as the Euclidean distance: for $x, y \in \mathbb{R}^N$ with components $x^{(i)}$ and $y^{(i)}$, the Euclidean distance between $x$ and $y$ is

$d(x, y) = \left( \sum_{i=1}^{N} \left( x^{(i)} - y^{(i)} \right)^{2} \right)^{\frac{1}{2}}$
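Step 21 can be sketched as a brute-force neighbor search using the Euclidean distance defined above. The function names `euclidean` and `k_nearest` and the sample points are hypothetical; excluding the query point from its own neighbor list is an assumption of this sketch.

```python
import math
from typing import List, Sequence

def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def k_nearest(samples: List[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k samples closest to samples[i], excluding i itself."""
    others = [j for j in range(len(samples)) if j != i]
    others.sort(key=lambda j: euclidean(samples[i], samples[j]))
    return others[:k]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbors = k_nearest(points, 0, 2)
print(neighbors)
```

This search compares every pair of samples, so its cost grows quadratically with the number of samples; a spatial index could replace it for large training sets.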

Step 22: Use the K nearest neighbors obtained in step 21 to construct the local reconstruction weight matrix of each sample.

The local reconstruction weight matrix $W = (w_{ij}) \in M_{n,n}$ is a weight matrix such that $w_{ij} = 0$ if $x_i$ and $x_j$ are not neighbors. Let $x_{j_k}$ ($k = 1, 2, \ldots, K$) be the neighbors of $x_i$; then the constraint $\sum_{k=1}^{K} w_{j_k,i} = 1$ holds.

Approximating $X$ by $XW$ introduces some error. The Frobenius norm of a matrix is defined here as follows: for $A = (a_{i,j}) \in M_{m,n}$, $\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j}^{2} \right)^{\frac{1}{2}}$.

$W$ is found subject to the following constraint: [formula not reproduced in this text], that is, [formula not reproduced in this text], where $x_{j_k,i}$ denotes the K nearest neighbors of $x_i$. This is equivalent to solving a series of least squares problems. For $x_i$, for example, the weights can be obtained from the following system of equations:

$w_{j_k,i} : \begin{cases} \sum_{k=1}^{K} w_{j_k,i} = 1 \\ X w_i = x_i \end{cases}$
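One common way to solve the per-sample least squares problem of step 22 is sketched below. It assumes the usual LLE objective, minimizing $\| x_i - \sum_k w_{j_k,i} x_{j_k,i} \|^2$ subject to the weights summing to 1, and solves it through the local Gram matrix. The regularization term `reg` is an assumption added for numerical stability and is not mentioned in the patent; the names `solve` and `reconstruction_weights` are hypothetical.

```python
from typing import List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def reconstruction_weights(x: Sequence[float],
                           neighbors: List[Sequence[float]],
                           reg: float = 1e-3) -> List[float]:
    """Weights that best reconstruct x from its neighbors and sum to 1."""
    k = len(neighbors)
    diffs = [[nb_c - x_c for nb_c, x_c in zip(nb, x)] for nb in neighbors]
    # Local Gram matrix G[a][b] = (neighbor_a - x) . (neighbor_b - x)
    gram = [[sum(p * q for p, q in zip(diffs[a], diffs[b])) for b in range(k)]
            for a in range(k)]
    trace = sum(gram[a][a] for a in range(k))
    for a in range(k):
        gram[a][a] += reg * trace if trace > 0 else reg  # assumed regularizer
    w = solve(gram, [1.0] * k)       # solve G w = 1
    total = sum(w)
    return [v / total for v in w]    # rescale so the weights sum to 1

weights = reconstruction_weights((1.0, 0.0), [(0.0, 0.0), (2.0, 0.0)])
print(weights)
```

In the example the sample lies midway between its two neighbors, so the two weights come out equal. The weights for sample $x_i$ fill the entries of $W$ that correspond to its K neighbors; all other entries stay zero.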

Given the weight matrix $W$, suitable outputs $y_i$ can be found in the low-dimensional space. This is done subject to the following constraint: [formula not reproduced in this text], where $y_i$ is the output vector of $x_i$ and $y_{j_k,i}$ ($k = 1, 2, \ldots, K$) are the neighbors of $y_i$. Two conditions must be satisfied: [formula not reproduced in this text] and [formula not reproduced in this text], where $I$ is the $m \times m$ identity matrix. The loss function can therefore be rewritten as [formula not reproduced in this text], where $M$ is an $N \times N$ symmetric matrix: $M = (I - W)^T (I - W)$.

To minimize the loss function, $Y$ is taken as the eigenvectors corresponding to the $m$ smallest nonzero eigenvalues of $M$. In practice, the eigenvalues of $M$ are sorted in ascending order; the first eigenvalue is almost zero and is discarded. The eigenvectors corresponding to the 2nd through $(m+1)$-th eigenvalues are usually taken as the output.
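The embedding formulas themselves are not shown in this passage. For orientation only, the block below writes out the standard locally linear embedding formulation that matches the surrounding description. It is an assumption that the patent's formulas take exactly this form, and the rewritten loss assumes that the weights for $x_i$ are stored in row $i$ of $W$.

```latex
% Embedding cost: each output is reconstructed from its neighbors with the fixed weights
\Phi(Y) = \sum_{i=1}^{N} \Bigl\| y_i - \sum_{k=1}^{K} w_{j_k,i}\, y_{j_k,i} \Bigr\|^{2}

% Two conditions on the outputs: zero mean and unit covariance
\sum_{i=1}^{N} y_i = 0, \qquad \frac{1}{N} \sum_{i=1}^{N} y_i y_i^{T} = I

% Rewritten loss with the symmetric matrix M = (I - W)^T (I - W)
\Phi(Y) = \sum_{i=1}^{N} \sum_{j=1}^{N} M_{ij}\, y_i^{T} y_j

% Minimizers are eigenvectors of M; the near-zero eigenvalue is discarded
M v = \lambda v, \qquad 0 \approx \lambda_1 \le \lambda_2 \le \dots \le \lambda_{m+1}
```

Under these assumptions, the zero-mean condition removes the constant eigenvector that belongs to the near-zero eigenvalue, which is why the text discards the first eigenvalue and keeps the 2nd through $(m+1)$-th.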

Step 3: Compose the feature attributes reduced in step 2 into a first feature vector, and use the first feature vector as the input of the ball vector machine learning algorithm to train a classifier.

As shown in Fig. 3, step 3 specifically includes:

Step 31: Given a radius $r$, select an arbitrary point $z \in S$ in the training set $S$ as the initial core set $S_0 = \{z\}$, and compute the initial center $c_0$ of the ball from $S_0$. Here $z$ is simply one sample.

Step 32: Iterate. In the $l$-th iteration, if the core set $S_l$ covers all samples in the training set, that is, all samples fall inside the ball $B(c_l, (1+\varepsilon)r)$ (where $\varepsilon$ is a preset value with $\varepsilon > 0$), the iteration ends; otherwise, go to step 33.

Step 33: In the kernel feature space, find any sample point $\varphi(x)$ outside the ball $B(c_l, (1+\varepsilon)r)$ and form the core set $S_{l+1} = S_l \cup \{x\}$, where $\varphi(\cdot)$ is the kernel mapping function.

Step 34: From the core set $S_{l+1}$, solve for its center $c_{l+1}$, using the update formula $c_{l+1} = \varphi(x) + \beta_l (c_l - \varphi(x))$ with $\beta_l = r / \|c_l - \varphi(x)\|$.
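The ball-growing loop of steps 31 to 34 can be sketched as follows. This covers only the enclosing-ball iteration, not a complete ball vector machine classifier (it omits the label-dependent kernel and the decision function). The center is kept implicitly as coefficients over training samples, $c = \sum_i \alpha_i \varphi(x_i)$, so that only kernel evaluations are needed. The linear kernel, the choice of the first violating sample, the `max_iter` cap, and all function names are assumptions of this sketch.

```python
import math
from typing import Dict, List, Sequence

def linear_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed kernel for the example; any valid kernel could be substituted."""
    return sum(p * q for p, q in zip(a, b))

def distance_to_center(alpha: Dict[int, float], data, x, kernel) -> float:
    """Feature-space distance between phi(x) and c = sum_i alpha[i] * phi(data[i])."""
    cross = sum(a * kernel(data[i], x) for i, a in alpha.items())
    norm_c = sum(ai * aj * kernel(data[i], data[j])
                 for i, ai in alpha.items() for j, aj in alpha.items())
    return math.sqrt(max(kernel(x, x) - 2.0 * cross + norm_c, 0.0))

def enclosing_ball(data: List[Sequence[float]], r: float, eps: float,
                   kernel=linear_kernel, start: int = 0,
                   max_iter: int = 1000) -> Dict[int, float]:
    """Return the coefficients of the center; their keys are the core-set indices."""
    alpha = {start: 1.0}                 # step 31: S_0 = {z}, c_0 = phi(z)
    for _ in range(max_iter):
        outside = None
        for i, x in enumerate(data):     # step 32: is every sample inside the ball?
            d = distance_to_center(alpha, data, x, kernel)
            if d > (1.0 + eps) * r:
                outside = (i, d)         # step 33: first sample found outside
                break
        if outside is None:
            return alpha                 # all samples inside B(c, (1 + eps) r)
        i, d = outside
        beta = r / d                     # step 34: beta = r / ||c - phi(x)||
        # c_new = phi(x) + beta * (c - phi(x)) = (1 - beta) * phi(x) + beta * c
        alpha = {j: beta * a for j, a in alpha.items()}
        alpha[i] = alpha.get(i, 0.0) + (1.0 - beta)
    raise RuntimeError("no enclosing ball found within max_iter iterations")

data = [(0.0,), (1.0,), (2.0,)]
ball = enclosing_ball(data, r=1.5, eps=0.1)
print(sorted(ball))   # indices of the core-set samples
```

The iteration cap is there because the loop need not terminate when the given radius $r$ is too small to enclose the data. In the example, a single update moves the center from sample 0 to the point 0.5, after which all three samples lie within $(1+\varepsilon)r$.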

Step 4: Preprocess the feature attributes of each sample in the test set.

The test set can be downloaded from the Internet, or it can be obtained by simulating a real network environment, capturing the packets of the network connections in the simulated network, and analyzing them.

Step 5: Use the locally linear embedding algorithm, a manifold learning algorithm, to select the feature attributes of the samples in the test set, achieving feature dimensionality reduction.

Step 5 is carried out in the same way as step 2, except that the dimensionality reduction is applied to the test set instead of the training set.

Step 6: Compose the feature attributes reduced in step 5 into a second feature vector, and test the classifier using the second feature vector as the input of the classifier trained in step 3.

The procedure is similar to step 3, except that the input is the second feature vector and the output is either a normal connection or an abnormal connection.

Based on the output of step 6, if the connection is normal, the network has not been intruded; if the connection is abnormal, the network is determined to have been intruded.

The manifold learning algorithm used by the invention has strong dimensionality reduction capability and can discover the useful features of the data. The ball vector machine learning algorithm obtains the core set of the samples by solving for the minimum enclosing ball. Because the core set is much smaller than the original training sample set, both the computation time and the memory footprint are greatly reduced, as are the time needed to find the support vectors and the cost of solving the optimization problem. The method thus detects network intrusion while maintaining the accuracy and real-time performance of detection.

The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims

1. A method for rapidly detecting network intrusion, characterized in that the method comprises:

Step 1: Use a sample set containing both normal connections and abnormal connections as the training set, and preprocess the feature attributes of each sample in the training set;

Step 2: Use the locally linear embedding algorithm, a manifold learning algorithm, to select the feature attributes of the samples in the sample set, achieving feature dimensionality reduction;

Step 3: Compose the feature attributes reduced in step 2 into a first feature vector, and use the first feature vector as the input of the ball vector machine learning algorithm to train a classifier;

Step 4: Preprocess the feature attributes of each sample in the test set;

Step 5: Use the locally linear embedding algorithm to select the feature attributes of the samples in the test set, achieving feature dimensionality reduction;

Step 6: Compose the feature attributes reduced in step 5 into a second feature vector, and test the classifier using the second feature vector as the input of the classifier trained in step 3;

Step 7: Determine from the test result whether the network has been intruded.

2. The method for rapidly detecting network intrusion according to claim 1, characterized in that preprocessing the feature attributes of each sample in the training set specifically comprises:

Step 11: Find the character features in each sample;

Step 12: Replace the character features with the corresponding numeric feature values according to a preset keyword table;

Step 13: Use the numeric feature values in each sample as the feature attributes of that sample, and normalize the feature attributes in each sample.

3. The method for rapidly detecting network intrusion according to claim 2, characterized in that the normalization uses the maximum-value normalization method, specifically the formula [formula not reproduced in this text], where $x_i$ is a feature attribute of the sample and $N$ is the number of samples.

4. The method for rapidly detecting network intrusion according to claim 1, characterized in that step 2 comprises:

Step 21: Use the K-nearest-neighbor method to find the K nearest neighbors of each sample, where K is a given value;

Step 22: Use the K nearest neighbors obtained in step 21 to construct the local reconstruction weight matrix of each sample;

Step 23: Compute the low-dimensional output value of each sample from its local reconstruction weight matrix and its K nearest neighbors.

5. The method for rapidly detecting network intrusion according to claim 4, characterized in that the local reconstruction weight matrix is constructed using the error function [formula not reproduced in this text], where [formula not reproduced in this text].

6. The method for rapidly detecting network intrusion according to claim 4, characterized in that the low-dimensional output value $y_i$ satisfies the mapping conditions [formula not reproduced in this text] and [formula not reproduced in this text], where $I$ is the $m \times m$ identity matrix.

7. The method for rapidly detecting network intrusion according to claim 1, characterized in that step 3 comprises:

Step 31: Given a radius $r$ and $l = 0$, select an arbitrary point $z$ in the training set as the initial core set, $S_0 = \{z\}$, and compute the initial center $c_0$ of the ball from $S_0$;

Step 32: Iterate. In the $l$-th iteration, if the core set $S_l$ covers all samples in the training set, that is, all samples fall inside the ball $B(c_l, (1+\varepsilon)r)$, the iteration ends; otherwise, go to step 33. Here $\varepsilon$ is a preset value with $\varepsilon > 0$;

Step 33: In the kernel feature space, find any sample point $\varphi(x)$ outside the ball $B(c_l, (1+\varepsilon)r)$ and form the core set $S_{l+1} = S_l \cup \{x\}$, where $\varphi(\cdot)$ is the kernel mapping function;

Step 34: From the core set $S_{l+1}$, solve for its center $c_{l+1}$, using the update formula $c_{l+1} = \varphi(x) + \beta_l (c_l - \varphi(x))$ with $\beta_l = r / \|c_l - \varphi(x)\|$;

Step 35: Set $l = l + 1$ and return to step 32.