CN113159310A - Intrusion detection method based on residual error sparse width learning system - Google Patents


Info

Publication number
CN113159310A
Authority
CN
China
Prior art keywords
bls
model
intrusion detection
data
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011524068.1A
Other languages
Chinese (zh)
Inventor
王振东
刘尧迪
李大海
王俊岭
曾珽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi University of Science and Technology
Original Assignee
Jiangxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi University of Science and Technology filed Critical Jiangxi University of Science and Technology
Priority to CN202011524068.1A priority Critical patent/CN113159310A/en
Publication of CN113159310A publication Critical patent/CN113159310A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The invention relates to the technical field of network and host intrusion detection, and in particular to an intrusion detection method based on a residual sparse width learning system, comprising the following steps: Step 1, preprocessing the original intrusion detection data set; Step 2, dividing the standard data set into a training set and a test set; Step 3, training the BLS model and tuning its parameters; Step 4, inputting the test data into the trained RES-BLS intrusion detection model to obtain a classification result for each piece of data. The method effectively remedies the low accuracy, low true-positive rate, and high false-positive rate of the width learning system: the model solves the output weight matrix of BLS by singular value decomposition (SVD), continuously adjusts the error during network training through residual learning, and finally prunes the redundant features and output weights of the network through sparse pruning, removing redundant nodes and preventing the model from falling into a local optimum.

Description

Intrusion detection method based on residual error sparse width learning system
Technical Field
The invention belongs to the technical field of network and host intrusion detection, and particularly relates to an intrusion detection method based on a residual error sparse width learning system.
Background
The width learning system (broad learning system, BLS) is a feedforward neural network. The algorithm generates feature nodes and enhancement nodes by sparse auto-encoding or random mapping and computes the corresponding output weights through the ridge-regression generalized inverse. The implementation is simple, but a conventional BLS contains feature nodes and enhancement nodes with very small output weights, so these nodes contribute little to the final output of the network; a large number of such redundant nodes increases the complexity of the network structure and reduces learning efficiency. In addition, a conventional BLS computes the output weights only once and never adjusts the error, which affects the classification performance of the model and may leave it trapped in a local optimum. To overcome these shortcomings of the conventional BLS, a residual sparse width learning system is proposed and applied to intrusion detection.
An intrusion detection system monitors network traffic in real time and raises an alarm or takes active defense measures when suspicious traffic is found; as an active security protection technology, intrusion detection has therefore become one of the key technologies for guaranteeing network security. Detection methods fall into two main categories: misuse detection and anomaly detection. Misuse detection extracts features of known intrusion behaviors and attempts, writes them into a rule base, and pattern-matches monitored network behavior against that rule base to judge whether it is an intrusion or an intrusion attempt; its advantage is a low false-alarm rate, but its main drawbacks are that intrusion signatures are difficult to collect and update and that maintaining the signature base requires substantial work. Anomaly detection instead detects attack behavior among a large number of normal user behaviors; its obvious advantage is that it can detect unknown attacks, but it easily produces a higher false-positive rate during detection. As detection theory has matured, intrusion detection systems have developed continuously, from early pattern-matching algorithms and rule-based expert systems to today's artificial-intelligence-based algorithms, which achieve good detection results; methods such as support vector machines, artificial neural networks, swarm intelligence algorithms, and deep learning have all been applied to intrusion detection research. However, with the arrival of the big-data era, network topologies have become more complex, traffic volumes larger, and intrusion behaviors ever-changing, so existing intrusion detection systems show many shortcomings in actual use, such as high false-alarm and missed-report rates; in particular, in high-speed switched network environments, existing systems cannot inspect all packets well. Defects in the data-analysis methods keep packet-analysis accuracy low and lead to frequent missed reports, and because updates to detection rules lag behind updates to attack techniques, new attacks have no corresponding detection rules and go undetected. Because intrusion behaviors are so variable, it is difficult for an intrusion detection model to maintain stable detection performance across all attack types. How to detect intrusion behavior in the network quickly and accurately, now and in future network environments, has become a key problem that intrusion detection research urgently needs to solve; an intrusion detection method based on the residual sparse width learning system is therefore designed, and is urgently needed in the technical field of network and host intrusion detection.
Disclosure of Invention
The invention provides an intrusion detection method based on a residual error sparse width learning system, which aims to solve the problems in the prior art.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
according to the embodiment of the invention, the intrusion detection method based on the residual error sparse width learning system comprises the following steps:
step1, preprocessing an original intrusion detection data set, wherein the preprocessing processes of the intrusion detection data set based on the network and the intrusion detection data set based on the host are respectively as follows:
A. preprocessing the intrusion detection data set based on the network:
(a) High-dimensional feature mapping: discrete features are converted into numeric features using a high-dimensional feature mapping;
(b) Data normalization: because values of the same attribute can differ greatly, which hampers the training of the neural network, the data are normalized to real numbers in [0, 1];
B. preprocessing an intrusion detection data set based on a host:
(a) File-type conversion: the txt files are converted into xls files and the attributes of each data type are separated, to facilitate processing in MATLAB;
(b) Bag-of-words characterization of the data: because the raw data in the data set are text features and inconvenient to use directly, the data are characterized by word frequency using a bag-of-words model;
step2, standard data set partitioning: dividing a standard data set into a training set and a test set;
step3, model training: training and parameter tuning are carried out on the BLS model;
(a) Initialize the parameters of the BLS model: the number of feature nodes $n$, the number of enhancement nodes $m$, the sparsity parameter $\theta_k$ with its associated index vector, and the initial weights and thresholds of the network;
(b) Compute the feature-node mapping values $Z_i = \phi(XW_{ei} + \beta_{ei})$ and the enhancement-node mapping values $H_j = \xi(Z^n W_{hj} + \beta_{hj})$ of the BLS model;
(c) Merging the characteristic node mapping values and the enhanced node mapping values into a matrix A;
(d) Compute the residual of the pseudo-identity matrix $A^{+}A$ of the BLS model, sort the enhancement nodes by residual, compute the relative error of each enhancement node, prune the enhancement nodes according to the relative error, and retain the enhancement nodes with smaller relative error; if the iteration termination condition is met, end the iteration and go to step (e), otherwise return to step (d), where the relative error is:

$$\omega_j = \frac{\operatorname{sum}\left(\lvert \tilde{\mathbb{I}}_j - I_j \rvert\right)}{\operatorname{sum}(I)}$$
(e) outputting the optimal BLS model, namely the RES-BLS model;
and Step4, inputting the test data into the trained RES-BLS intrusion detection model, and further obtaining the classification result of each piece of data.
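Step 1 above can be sketched as follows. This is a minimal Python illustration rather than the MATLAB processing mentioned in the description, and the function names and the choice of one-hot encoding as the high-dimensional feature mapping are assumptions, not taken from the patent.

```python
import numpy as np

def one_hot_columns(values):
    """Map one discrete (categorical) feature column to one-hot numeric columns."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out

def min_max_normalize(X):
    """Scale every column of X into [0, 1]; constant columns become all zeros."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant attributes
    return (X - lo) / span
```

A categorical column such as a protocol field expands into one column per category, and each numeric attribute is rescaled independently, matching steps (a) and (b) of the network-based preprocessing.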
Further, the training and prediction time of the RES-BLS model is determined by its time complexity. Assume the number of training samples is $m$, the number of iterations is $l$, the numbers of neurons in the feature-node layer and the output layer are $n_1$ and $n_3$ respectively, and the number of neurons in the enhancement-node layer is $n_2$. In the original BLS model, the time complexity of processing one sample is $O((n_1+n_2)\, n_3)$, so the overall time complexity of the BLS model is $O(m\, l\, (n_1+n_2)\, n_3)$. The RES-BLS model, an improvement on BLS, must additionally sort the enhancement nodes by relative error, at a cost of $O(n_2 \log n_2)$; the overall time complexity of the RES-BLS model is therefore $O(m\, l\, ((n_1+n_2)\, n_3 + n_2 \log n_2))$.
The invention has the following advantages:
the intrusion detection method based on the residual sparse width learning system effectively remedies the low accuracy, low true-positive rate, and high false-positive rate of the width learning system. The model solves the output weight matrix of BLS by SVD, continuously adjusts the error during network training through residual learning, and finally prunes the redundant features and output weights of the network through sparse pruning, removing redundant nodes and keeping the model out of local optima, which effectively improves the detection performance of the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, and sizes shown in this specification are used only to complement the content disclosed in the specification, so that those skilled in the art can understand and read it; they do not limit the conditions under which the invention can be implemented and carry no technical significance in themselves. Any structural modification, change of proportion, or adjustment of size that does not affect the functions and purposes of the invention shall still fall within the scope of the invention.
FIG. 1 is a schematic diagram of a RES-BLS intrusion detection framework of the present invention;
FIG. 2 is a flow chart of the RES-BLS algorithm of the present invention;
FIG. 3 is a schematic diagram of the residual error on the KDDCup99 data set of the present invention;
Detailed Description
The present invention is described in terms of particular embodiments, other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure, and it is to be understood that the described embodiments are merely exemplary of the invention and that it is not intended to limit the invention to the particular embodiments disclosed. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the present specification, the terms "upper", "lower", "left", "right", "middle", and the like are used for clarity of description, and are not intended to limit the scope of the present invention, and changes or modifications in the relative relationship may be made without substantial changes in the technical content.
Referring to fig. 1-3, the present invention provides a technical solution:
an intrusion detection method based on a residual error sparse width learning system comprises the following steps:
step1, preprocessing an original intrusion detection data set, wherein the preprocessing processes of the intrusion detection data set based on the network and the intrusion detection data set based on the host are respectively as follows:
A. preprocessing the intrusion detection data set based on the network:
(a) High-dimensional feature mapping: discrete features are converted into numeric features using a high-dimensional feature mapping;
(b) Data normalization: because values of the same attribute can differ greatly, which hampers the training of the neural network, the data are normalized to real numbers in [0, 1];
B. preprocessing an intrusion detection data set based on a host:
(a) File-type conversion: the txt files are converted into xls files and the attributes of each data type are separated, to facilitate processing in MATLAB;
(b) Bag-of-words characterization of the data: because the raw data in the data set are text features and inconvenient to use directly, the data are characterized by word frequency using a bag-of-words model;
step2, standard data set partitioning: dividing a standard data set into a training set and a test set;
step3, model training: training and parameter tuning are carried out on the BLS model;
(a) Initialize the parameters of the BLS model: the number of feature nodes $n$, the number of enhancement nodes $m$, the sparsity parameter $\theta_k$ with its associated index vector, and the initial weights and thresholds of the network;
(b) Compute the feature-node mapping values $Z_i = \phi(XW_{ei} + \beta_{ei})$ and the enhancement-node mapping values $H_j = \xi(Z^n W_{hj} + \beta_{hj})$ of the BLS model;
(c) Merging the characteristic node mapping values and the enhanced node mapping values into a matrix A;
(d) Compute the residual of the pseudo-identity matrix $A^{+}A$ of the BLS model, sort the enhancement nodes by residual, compute the relative error of each enhancement node, prune the enhancement nodes according to the relative error, and retain the enhancement nodes with smaller relative error; if the iteration termination condition is met, end the iteration and go to step (e), otherwise return to step (d), where the relative error is:

$$\omega_j = \frac{\operatorname{sum}\left(\lvert \tilde{\mathbb{I}}_j - I_j \rvert\right)}{\operatorname{sum}(I)}$$
(e) outputting the optimal BLS model, namely the RES-BLS model;
and Step4, inputting the test data into the trained RES-BLS intrusion detection model, and further obtaining the classification result of each piece of data.
In the invention: the training and prediction time of the RES-BLS model is determined by its time complexity. Assume the number of training samples is $m$, the number of iterations is $l$, the numbers of neurons in the feature-node layer and the output layer are $n_1$ and $n_3$ respectively, and the number of neurons in the enhancement-node layer is $n_2$. In the original BLS model, the time complexity of processing one sample is $O((n_1+n_2)\, n_3)$, so the overall time complexity of the BLS model is $O(m\, l\, (n_1+n_2)\, n_3)$. The RES-BLS model, an improvement on BLS, must additionally sort the enhancement nodes by relative error, at a cost of $O(n_2 \log n_2)$; the overall time complexity of the RES-BLS model is therefore $O(m\, l\, ((n_1+n_2)\, n_3 + n_2 \log n_2))$. Comparing the two models, the time complexity of the RES-BLS model is slightly higher than that of the BLS model, but its detection accuracy is superior; the detection performance of the model is thus effectively improved at a modest extra cost.
In the invention, the width learning system is an efficient, laterally expandable incremental learning system: it takes the random-vector functional-link neural network as its feature mapping and builds on the single-hidden-layer neural network by connecting the mapped features and the enhancement nodes directly to the output. Assume $N$ input data of dimension $M$, $X \in \mathbb{R}^{N \times M}$; $n$ groups of mapped features are obtained by equation (1):

$$Z_i = \phi(XW_{ei} + \beta_{ei}) \tag{1}$$

where $W_{ei}$ is the input weight matrix between the input and the feature nodes and $\beta_{ei}$ is the bias of the feature nodes. Let $Z^n = [Z_1, Z_2, \ldots, Z_n]$ be the set of the first $n$ groups of mapped features; $m$ groups of enhancement nodes can then be generated through the nonlinear activation transformation (2):

$$H_j = \xi(Z^n W_{hj} + \beta_{hj}) \tag{2}$$

where $W_{hj}$ is the weight matrix between the feature nodes and the enhancement nodes and $\beta_{hj}$ is the bias of the enhancement nodes. Let $H^m = [H_1, H_2, \ldots, H_m]$ be the feature set of the first $m$ groups of enhancement nodes.

The width learning system as a whole can thus be represented by equation (3):

$$Y = [Z_1, Z_2, \ldots, Z_n \mid H_1, H_2, \ldots, H_m]\, W^m = [Z^n \mid H^m]\, W^m = A^m W^m \tag{3}$$

where the network target weight matrix is $W^m = [Z^n \mid H^m]^{+} Y = (A^m)^{+} Y$, and $(A^m)^{+}$ can be computed through the approximation $(A^m)^{+} = \lim_{\lambda \to 0} (\lambda I + A^{\mathsf T} A)^{-1} A^{\mathsf T}$. Here $Y \in \mathbb{R}^{N \times Q}$ is the output of the network; when only binary classification and regression tasks are considered, $Y \in \mathbb{R}^{N}$.
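Equations (1)-(3) can be sketched numerically as follows. The random mappings, tanh activations, group sizes, function names, and the finite ridge parameter `lam` standing in for the limit $\lambda \to 0$ are all illustrative assumptions, not the patent's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def bls_state_matrix(X, n_groups=3, k_per_group=4, m_enhance=8):
    """Build A^m = [Z^n | H^m]: random feature maps (eq. 1) and enhancement nodes (eq. 2)."""
    Zs = []
    for _ in range(n_groups):
        We = rng.standard_normal((X.shape[1], k_per_group))
        be = rng.standard_normal(k_per_group)
        Zs.append(np.tanh(X @ We + be))          # Z_i = phi(X W_ei + beta_ei)
    Zn = np.hstack(Zs)
    Wh = rng.standard_normal((Zn.shape[1], m_enhance))
    bh = rng.standard_normal(m_enhance)
    H = np.tanh(Zn @ Wh + bh)                    # H_j = xi(Z^n W_hj + beta_hj)
    return np.hstack([Zn, H])

def output_weights(A, Y, lam=1e-6):
    """W^m = (lam*I + A^T A)^{-1} A^T Y, the ridge approximation of (A^m)^+ Y."""
    return np.linalg.solve(lam * np.eye(A.shape[1]) + A.T @ A, A.T @ Y)
```

Predictions are then simply `A @ W`, since the mapped features and enhancement nodes connect directly to the output layer.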
In the invention, the residual stage of the width learning system mainly comprises SVD-based BLS, the truncation error in SVD, the pruning of hidden neurons, and the sparse stage of BLS. The RES-BLS model consists of three main steps, as shown in fig. 2: first, the data are preprocessed and converted into a form that can be fed into the RES-BLS model; second, the BLS network is refined using the residual-sorting idea, for which the residual of the pseudo-identity matrix $A^{+}A$ of the enhancement-node neurons must be computed, and the obtained residual is then used to rank each enhancement-node neuron; finally, the ranked redundant enhancement-node neurons are separated from the network and the corresponding columns are removed from the data matrix $A$, reducing its dimension.
1. BLS based on SVD decomposition
In equation (3), $W^m = (A^m)^{+} Y$ is computed through the ridge approximation $W^m = \lim_{\lambda \to 0}(\lambda I + A^{\mathsf T} A)^{-1} A^{\mathsf T} Y$; this computation, however, is often numerically unstable. In addition, computing the output weight matrix $W^m$ in equation (3) involves a matrix inversion, and if the input sample is too large the complexity of that inversion rises, lowering the training efficiency of BLS. To overcome the numerical instability and reduce the computational complexity of BLS, an SVD-based BLS algorithm is proposed, changing the way the BLS output weight matrix is solved.
Singular value decomposition (SVD) is one of the best-known and most widely used matrix factorization methods; applying it to BLS can simplify the intrusion detection data set, remove noise points, and improve the accuracy of the algorithm. Suppose $N$ is an $m \times n$ real matrix; then there exist an orthogonal matrix $U$ of order $m$ and an orthogonal matrix $V$ of order $n$ such that $N$ can be represented as:

$$N = U D V^{\mathsf T}$$

where $U = [u_1, u_2, \ldots, u_m]$ is the orthogonal matrix of left singular vectors, whose column vectors $u_i$ are the eigenvectors of $N N^{\mathsf T}$; $V = [v_1, v_2, \ldots, v_n]$ is the orthogonal matrix of right singular vectors, whose column vectors $v_i$ are the eigenvectors of $N^{\mathsf T} N$; and they satisfy $N N^{\mathsf T} = U \Lambda_1 U^{\mathsf T}$, $N^{\mathsf T} N = V \Lambda_2 V^{\mathsf T}$, $U^{\mathsf T} U = V^{\mathsf T} V = I$. The $m \times n$ matrix $D$ has the block form

$$D = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}, \qquad \Sigma = \operatorname{diag}\!\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_r}\right),$$

and is uniquely determined by the decomposition; its non-zero diagonal entries are the singular values of the matrix $N$, where $\lambda_i$ ($i = 1, 2, \ldots, r$) are the non-zero diagonal elements of the matrix $\Lambda_1$ (equivalently $\Lambda_2$). This factorization $N = U D V^{\mathsf T}$ is the singular value decomposition of the matrix $N$.
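The factorization $N = U D V^{\mathsf T}$ can be checked directly with NumPy; this is only a numerical illustration of the decomposition described above, with an arbitrary example matrix.

```python
import numpy as np

def svd_factor(N):
    """Factor N = U D V^T (thin form); return U, the singular values, and V^T."""
    U, s, Vt = np.linalg.svd(N, full_matrices=False)
    return U, s, Vt
```

The singular values come back non-negative and in descending order, and the thin factors reconstruct the original matrix.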
2. Truncation error in SVD
The matrix $A$ in equation (3) is SVD-decomposed as follows:

$$A = P R Q^{\mathsf T} \tag{4}$$

At this point

$$W = A^{+} Y = Q R^{+} P^{\mathsf T} Y \tag{5}$$

where

$$\left(R^{+}\right)_{kk} = \begin{cases} 1/\theta_k, & \theta_k > \alpha \\ 0, & \theta_k \le \alpha \end{cases} \tag{6}$$

Here $\alpha$ is a threshold and $\theta_k$ is the $k$-th singular value on the diagonal of $R$. SVD presents a problem at this point: in practice, very small singular values are usually set to 0 in order to avoid small values blowing up in the computation of equation (5). Zeroing a singular value inevitably causes a small numerical error, called the residual here; a sparse model designed around these small error values can greatly improve the final classification accuracy of the model. The residual acts directly on the computation of $W^m = (A^m)^{+} Y$, since it affects the computation of the pseudoinverse of $A^m$.
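The thresholded pseudoinverse of equations (4)-(6) can be sketched as below; the threshold value and the example matrices are illustrative assumptions.

```python
import numpy as np

def truncated_pinv(A, alpha=1e-10):
    """A^+ = Q R^+ P^T with singular values theta_k <= alpha zeroed (eqs. 4-6)."""
    P, theta, Qt = np.linalg.svd(A, full_matrices=False)
    # invert only the singular values above the threshold; zero the rest
    r_plus = np.array([1.0 / t if t > alpha else 0.0 for t in theta])
    return Qt.T @ np.diag(r_plus) @ P.T
```

Zeroing a tiny singular value instead of inverting it keeps $1/\theta_k$ from exploding, at the cost of the small residual discussed above.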
3. Pruning of hidden neurons
The residual arises in the process of computing the weight matrix $W$. To design a good pruning tool, one must know where the BLS model incurs errors in the computation of $A^{+}$. In fact, if the matrix $A$ has linearly independent columns, then $A^{+}A = I$, where $I$ is the identity matrix; in practice one only has $A^{+}A \approx I$. Owing to this property, $A^{+}A$ is called the pseudo-identity matrix and is denoted $\tilde{\mathbb{I}} = A^{+}A$. The diagonal and off-diagonal elements of $\tilde{\mathbb{I}}$ deviate slightly from 1 and 0 respectively, because of the negligible, small-range singular-value nulling in equation (6).

The matrix $\tilde{\mathbb{I}}$ has $m$ rows and columns, equal to the number of enhancement-node neurons in the BLS. To understand and prune the enhancement-node neurons that cause large residual error, it suffices to know how much each row or column of $\tilde{\mathbb{I}}$ deviates from the corresponding row or column of $I \in \mathbb{R}^{m \times m}$.

FIG. 3 shows, for a BLS with tansig enhancement nodes, the difference between the matrix $A^{+}A$ and the identity matrix $I$ on the KDDCup99 data set: as the number of enhancement-node neurons increases, the absolute value of the difference between $A^{+}A$ and $I$ also rises with fluctuations, which demonstrates that the computation of $A^{+}$ via equation (6) is affected by truncation error.
Lemma 1. Let $A_{m-1} = [K(W_{hj}, \beta_{hj}, H_j)]$ ($h = 1, 2, \ldots, N$; $j = 1, 2, \ldots, m$) be the positive definite matrix of a BLS with $m$ enhancement-node neurons and $N$ instances. When the number of enhancement nodes is increased, $A_{m-1}$ is updated to $A_m = [A_{m-1} \; a_m] = [K(W_{hj}, \beta_{hj}, H_j)]$ ($h = 1, 2, \ldots, N$; $j = 1, 2, \ldots, m+1$). According to equation (6), the residual of $A_{m-1}$ can be defined as the number of its spectrum values that fall below the threshold,

$$E(A_{m-1}) = \left|\{\, j : \mu_j \le \alpha \,\}\right|$$

where $\alpha$ is the threshold and $\{\mu_j\}$, $j = 1, 2, \ldots, m$, is the eigenspectrum of $A_{m-1}^{\mathsf T} A_{m-1}$. Likewise, the residual of $A_m$ with $m+1$ enhancement nodes is

$$E(A_m) = \left|\{\, j : \rho_j \le \alpha \,\}\right|$$

where $\{\rho_j\}$, $j = 1, 2, \ldots, m+1$, is the eigenspectrum of $A_m^{\mathsf T} A_m$. It can be proved that:

$$E(A_m) \ge E(A_{m-1}) \tag{8}$$
To prove Lemma 1, assume the eigenvalues $\{\mu_j\}$ of $A_{m-1}^{\mathsf T} A_{m-1}$ are real and positive and sorted in descending order, $\mu_1 > \mu_2 > \cdots > \mu_m$, and that the eigenvalues $\{\rho_j\}$ of $A_m^{\mathsf T} A_m$ likewise satisfy $\rho_1 > \rho_2 > \cdots > \rho_{m+1}$. As guaranteed by the Courant-Fischer theorem, the eigenvalues of $A_{m-1}^{\mathsf T} A_{m-1}$ and $A_m^{\mathsf T} A_m$ interlace:

$$\rho_1 \ge \mu_1 \ge \rho_2 \ge \mu_2 \ge \cdots \ge \mu_m \ge \rho_{m+1} \tag{9}$$

According to equation (6), if $\theta_j \le \alpha$, the corresponding singular value on the diagonal of $R$ is set to 0. In the following proof the square root is ignored, as it has no effect on the argument. The largest eigenvalue cannot be 0: since $\rho_1 \ge \mu_1 > 0$, $\rho_1$ never contributes to the residual, and dropping $\rho_1$ from equation (9) gives:

$$\mu_1 \ge \rho_2 \ge \mu_2 \ge \cdots \ge \rho_m \ge \mu_m \ge \rho_{m+1} \tag{10}$$

that is, $\rho_{j+1} \le \mu_j$ for $j = 1, 2, \ldots, m$: whenever a new enhancement-node neuron is added, $\rho_{j+1} \le \mu_j$, so the spectrum values can only move downward and the error can only grow or stay the same.

Suppose that $T$ ($T < m$) eigenvalues of $A_{m-1}^{\mathsf T} A_{m-1}$ are set to 0, i.e. $\mu_j = 0$ for $j = (m-T+1), (m-T+2), \ldots, m$. Then, because $\rho_{j+1} \le \mu_j$, the same procedure yields $\rho_{j+1} = 0$ for $j = (m-T+1), (m-T+2), \ldots, m$. The $m-T$ eigenvalues $\mu_j$ remaining in iteration $(m-1)$ are all no smaller than the threshold $\alpha$, but each new value $\rho_{j+1}$ in the current iteration $m$ is no larger than the value $\mu_j$ of iteration $(m-1)$, i.e. $\rho_{j+1} \le \mu_j$, $j = 1, 2, \ldots, m-T$. Since the new values become smaller: 1) if at least one of them falls below the threshold $\alpha$, then $E(A_m) > E(A_{m-1})$; 2) if all of them remain no smaller than the threshold $\alpha$ of equation (6), then $E(A_m) = E(A_{m-1})$. In either case $E(A_m) \ge E(A_{m-1})$, and the proof of equation (8) is complete.
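The interlacing invoked for equation (9) is the Cauchy/Courant-Fischer interlacing of the Gram-matrix eigenvalues when a column is appended to $A$; it can be checked numerically (the sizes and the random matrix below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)
A_m = rng.standard_normal((25, 7))       # A_m = [A_{m-1} a_m]
A_prev = A_m[:, :6]                      # A_{m-1}: the same matrix without a_m
mu = np.sort(np.linalg.eigvalsh(A_prev.T @ A_prev))[::-1]   # spectrum {mu_j}
rho = np.sort(np.linalg.eigvalsh(A_m.T @ A_m))[::-1]        # spectrum {rho_j}
# rho_1 >= mu_1 >= rho_2 >= ... >= mu_m >= rho_{m+1}   (eq. 9, up to round-off)
interlaced = all(rho[j] >= mu[j] - 1e-9 and mu[j] >= rho[j + 1] - 1e-9
                 for j in range(6))
```

$A_{m-1}^{\mathsf T} A_{m-1}$ is the leading principal submatrix of $A_m^{\mathsf T} A_m$, which is exactly the setting of the interlacing theorem.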
The increase of the error $E(A_m)$ in the current BLS network depends on the new column $a_m$ added to $A_m$; $a_m$ is constructed by randomly initializing the input weights $W_{hj}$ and thresholds $\beta_{hj}$ and by the choice of activation function. Since $W_{hj}$ and $\beta_{hj}$ are generated randomly, the singular values and residuals in equation (6) cannot be well controlled; therefore the enhancement-node neurons with high residual values are pruned away.
To compute the relative error, an element-by-element subtraction is first performed between $A^{+}A$ and the identity matrix $I$; for each iteration, the sum of the absolute values of the element differences is divided by the sum of the elements of the identity matrix $I$. Some neuron nodes thus help to reduce the relative error, while others increase it considerably; this relative difference serves as the criterion for pruning the enhancement-node neurons.
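The per-column relative-error criterion just described can be sketched as follows; the example with a deliberately duplicated column is an illustrative assumption.

```python
import numpy as np

def relative_errors(A):
    """Per-column relative error: sum_i |(A^+ A - I)_ij| / sum(I)."""
    I_tilde = np.linalg.pinv(A) @ A
    I = np.eye(A.shape[1])
    return np.abs(I_tilde - I).sum(axis=0) / I.sum()
```

A redundant column drags its entries of the pseudo-identity away from 0 and 1, so its relative error stands out, which is exactly what the pruning criterion exploits.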
4. Sparse phase of BLS
The BLS network is sparsified by using the residuals of the design matrix $A$, through a feed-forward feature-selection strategy performed on the pseudo-identity matrix $\tilde{\mathbb{I}} = A^{+}A$. On this basis Algorithm 1, a BLS pruned by residuals, is designed and named RES-BLS.
First, the input weights $W_{hj}$ and thresholds $\beta_{hj}$ are used to create a matrix $A$ with the maximum number of enhancement-node neurons. One enhancement-node neuron is then selected and its column taken from $A$, and progressively more enhancement-node neurons are taken as candidate neurons, so as to minimize the relative error of enhancement-node neuron $j$, computed as follows:

$$\omega_j = \frac{\operatorname{sum}\left(\lvert \tilde{\mathbb{I}}_j - I_j \rvert\right)}{\operatorname{sum}(I)} \tag{11}$$

where $I_j \in \mathbb{R}^m$ is the $j$-th column vector of the identity matrix $I$, $\tilde{\mathbb{I}}_j$ is the $j$-th column of the matrix $\tilde{\mathbb{I}} = A^{+}A$, $\operatorname{sum}(I)$ is the sum of all elements of the identity matrix $I$, and $\omega_j$ stores the relative error of enhancement node $j$. Equation (11) is used to evaluate the $m_{\max}$ enhancement nodes; among all enhancement nodes, the enhancement-node neurons with relatively small relative error are retained, and the index of each candidate neuron $\omega_j$ is stored in the index vector $\tilde{\imath}$.
The remaining $m_{\max} - 1$ enhancement nodes are then taken as candidate nodes; the relative error of the remaining enhancement-node neurons can be computed by equation (11), and the index and residual value of the next candidate node are appended to $\tilde{\imath}$ and $\omega$, respectively. This computation is repeated up to the maximum number of enhancement nodes to be retained, $m_{\max} \cdot \theta_k\%$; the parameter $\theta_k$ is the sparsity parameter, so the number retained is smaller than $m_{\max}$.

By updating $\tilde{\imath}$ and $\omega$ it can be observed which column vectors in $A$ have the higher relative-error values; the corresponding index values are easily found in $A$, and the remaining columns can be pruned accordingly. The sparse design matrix left after pruning is $\tilde{A} \in \mathbb{R}^{N \times m}$, where $m < m_{\max}$ is the number of column vectors remaining in $A$.
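The sparse stage above, keeping the enhancement-node columns with the smallest relative error, can be sketched as follows; the split between feature and enhancement columns, the sizes, and the function name are illustrative assumptions, a simplified one-shot version of the iterative candidate selection.

```python
import numpy as np

def prune_enhancement_nodes(A, n_feature_cols, keep):
    """Keep the `keep` enhancement-node columns of A whose relative error is
    smallest; feature-node columns are left untouched (simplified sparse stage)."""
    I_tilde = np.linalg.pinv(A) @ A
    omega = np.abs(I_tilde - np.eye(A.shape[1])).sum(axis=0)  # per-column deviation
    enh = np.arange(n_feature_cols, A.shape[1])               # enhancement columns
    kept = np.sort(enh[np.argsort(omega[enh])[:keep]])        # smallest relative error
    cols = np.concatenate([np.arange(n_feature_cols), kept])
    return A[:, cols], cols
```

After pruning, the output weights are re-solved on the smaller design matrix, so the redundant enhancement nodes no longer inflate the network structure.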
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (2)

1. An intrusion detection method based on a residual error sparse width learning system is characterized in that: the method comprises the following steps:
step1, preprocessing an original intrusion detection data set, wherein the preprocessing processes of the intrusion detection data set based on the network and the intrusion detection data set based on the host are respectively as follows:
A. preprocessing the intrusion detection data set based on the network:
(a) high-dimensional feature mapping: converting the discrete features into numeric features using a high-dimensional feature mapping;
(b) data normalization: because values of the same attribute can differ greatly, which hampers the training of the neural network, normalizing the data to real numbers in [0, 1];
B. preprocessing an intrusion detection data set based on a host:
(a) file-type conversion: converting the txt files into xls files and separating the attributes of each data type, to facilitate processing in MATLAB;
(b) bag-of-words characterization of the data: because the raw data in the data set are text features and inconvenient to use directly, characterizing the data by word frequency using a bag-of-words model;
step2, standard data set partitioning: dividing a standard data set into a training set and a test set;
step 3, model training: training and parameter tuning of the BLS model;
(a) initializing the parameters of the BLS model: the number of feature nodes n, the number of enhancement nodes m, the sparsity parameter θ_k, the vector
Figure RE-FDA0003039861920000011
and the initial weights and thresholds of the network;
(b) calculating the feature node mapping values Z_i = φ(XW_θi + β_θi) and the enhancement node mapping values H_j = ξ(Z^n W_hj + β_hj) of the BLS model;
(c) merging the feature node mapping values and the enhancement node mapping values into a matrix A;
(d) calculating the residual error A⁺A of the BLS model, sorting the enhancement nodes by residual, computing the relative error of each enhancement node, and pruning the enhancement nodes according to the relative error, retaining those with smaller relative error; if the iteration termination condition is met, ending the iteration and proceeding to step (e), otherwise returning to step (d), where the relative error is:
Figure RE-FDA0003039861920000012
(e) outputting the optimal BLS model, i.e., the RES-BLS model;
step 4, inputting the test data into the trained RES-BLS intrusion detection model to obtain the classification result for each piece of data.
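The BLS core of steps (a)–(c), together with a ridge-regularized pseudoinverse readout, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the patent's exact RES-BLS routine: tanh is used for both φ and ξ, a single feature-node group is used, and the `ridge` parameter and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def bls_fit(X, Y, n_feature=20, n_enhance=40, ridge=1e-3):
    """Fit a minimal broad learning system: Z -> H -> A = [Z | H] -> W."""
    d = X.shape[1]
    # (b) feature node mapping Z = phi(X W_theta + beta_theta)
    W_theta = rng.standard_normal((d, n_feature))
    b_theta = rng.standard_normal(n_feature)
    Z = np.tanh(X @ W_theta + b_theta)
    # (b) enhancement node mapping H = xi(Z W_h + beta_h)
    W_h = rng.standard_normal((n_feature, n_enhance))
    b_h = rng.standard_normal(n_enhance)
    H = np.tanh(Z @ W_h + b_h)
    # (c) merge both mappings into the design matrix A
    A = np.hstack([Z, H])
    # output weights via ridge-regularized pseudoinverse: W = (A^T A + cI)^-1 A^T Y
    W = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ Y)
    return (W_theta, b_theta, W_h, b_h, W)

def bls_predict(model, X):
    W_theta, b_theta, W_h, b_h, W = model
    Z = np.tanh(X @ W_theta + b_theta)
    H = np.tanh(Z @ W_h + b_h)
    return np.hstack([Z, H]) @ W

# toy two-class data standing in for a preprocessed intrusion data set:
# the label is 1 when the first feature is positive
X = rng.standard_normal((200, 5))
Y = (X[:, 0] > 0).astype(float).reshape(-1, 1)
model = bls_fit(X, Y)
pred = (bls_predict(model, X) > 0.5).astype(float)
accuracy = (pred == Y).mean()
```

The residual-based enhancement-node pruning of step (d) would operate on the columns of `A` before the output weights are solved for.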
2. The intrusion detection method based on the residual error sparse width learning system according to claim 1, wherein the training or prediction time of the RES-BLS model is determined by its time complexity. Assume the number of training samples is m, the number of iterations is l, the numbers of neurons in the feature node layer and the output layer are n_1 and n_3 respectively, and the number of neurons in the enhancement node layer is n_2. In the original BLS model, the time complexity of processing one sample is o[(n_1+n_2)·n_3], so the overall time complexity of the BLS model is o{m·l·[(n_1+n_2)·n_3]}. The RES-BLS model improves on BLS by sorting the enhancement nodes according to relative error, which costs o(n_2·log n_2); the overall time complexity of the RES-BLS model is therefore o{m·l·[(n_1+n_2)·n_3 + n_2·log n_2]}.
CN202011524068.1A 2020-12-21 2020-12-21 Intrusion detection method based on residual error sparse width learning system Pending CN113159310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011524068.1A CN113159310A (en) 2020-12-21 2020-12-21 Intrusion detection method based on residual error sparse width learning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011524068.1A CN113159310A (en) 2020-12-21 2020-12-21 Intrusion detection method based on residual error sparse width learning system

Publications (1)

Publication Number Publication Date
CN113159310A true CN113159310A (en) 2021-07-23

Family

ID=76882698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011524068.1A Pending CN113159310A (en) 2020-12-21 2020-12-21 Intrusion detection method based on residual error sparse width learning system

Country Status (1)

Country Link
CN (1) CN113159310A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115967631A (en) * 2022-12-19 2023-04-14 天津大学 Internet of things topology optimization method based on breadth learning and application thereof
CN117370717A (en) * 2023-12-06 2024-01-09 珠海錾芯半导体有限公司 Iterative optimization method for binary coordinate reduction

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960339A (en) * 2018-07-20 2018-12-07 吉林大学珠海学院 A kind of electric car induction conductivity method for diagnosing faults based on width study
US20190183428A1 (en) * 2017-12-19 2019-06-20 Hill-Rom Services, Inc. Method and apparatus for applying machine learning to classify patient movement from load signals
CN111598236A (en) * 2020-05-20 2020-08-28 中国矿业大学 Width learning system network model compression method
CN111641598A (en) * 2020-05-11 2020-09-08 华南理工大学 Intrusion detection method based on width learning
CN111832748A (en) * 2020-08-24 2020-10-27 西南大学 Electronic nose width learning method for performing regression prediction on concentration of mixed gas

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190183428A1 (en) * 2017-12-19 2019-06-20 Hill-Rom Services, Inc. Method and apparatus for applying machine learning to classify patient movement from load signals
CN108960339A (en) * 2018-07-20 2018-12-07 吉林大学珠海学院 A kind of electric car induction conductivity method for diagnosing faults based on width study
CN111641598A (en) * 2020-05-11 2020-09-08 华南理工大学 Intrusion detection method based on width learning
CN111598236A (en) * 2020-05-20 2020-08-28 中国矿业大学 Width learning system network model compression method
CN111832748A (en) * 2020-08-24 2020-10-27 西南大学 Electronic nose width learning method for performing regression prediction on concentration of mixed gas

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
C. L. PHILIP CHEN等: "Broad Learning System: An Effective and Efficient Incremental Learning System Without the Need for Deep Architecture", 《IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS》 *
PEYMAN HOSSEINZADEH KASSANI: "Multimodal Sparse Classifier for Adolescent", 《IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS》 *
ZHIDA LI等: "Comparison of machine learning algorithms for detection of network intrusions", 《2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC)》 *
ZHANG SICONG et al.: "Intrusion detection method based on dCNN", Journal of Tsinghua University (Science and Technology) *
LI WANG: "Research on sparse broad learning systems and their applications", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115967631A (en) * 2022-12-19 2023-04-14 天津大学 Internet of things topology optimization method based on breadth learning and application thereof
CN117370717A (en) * 2023-12-06 2024-01-09 珠海錾芯半导体有限公司 Iterative optimization method for binary coordinate reduction
CN117370717B (en) * 2023-12-06 2024-03-26 珠海錾芯半导体有限公司 Iterative optimization method for binary coordinate reduction

Similar Documents

Publication Publication Date Title
Li et al. Building auto-encoder intrusion detection system based on random forest feature selection
Xu et al. An improved data anomaly detection method based on isolation forest
Rakkiyappan et al. Event-triggered H∞ state estimation for semi-Markov jumping discrete-time neural networks with quantization
CN111222133A (en) Multistage self-adaptive coupling method for industrial control network intrusion detection
CN113159310A (en) Intrusion detection method based on residual error sparse width learning system
Jagtap et al. Comparison of extreme-ANFIS and ANFIS networks for regression problems
Niranjan et al. ERCR TV: Ensemble of random committee and random tree for efficient anomaly classification using voting
Chen et al. A Generalized Matching Pursuit Approach for Graph-Structured Sparsity.
Tian et al. Adaptive normalized attacks for learning adversarial attacks and defenses in power systems
Kumar et al. Wind speed prediction using deep learning-LSTM and GRU
Zoltowski et al. Sparsity-promoting optimal control of spatially-invariant systems
Luo et al. ML-KELM: A kernel extreme learning machine scheme for multi-label classification of real time data stream in SIoT
Guang et al. Benchmark datasets for stochastic Petri net learning
Song et al. Real-time anomaly detection method for space imager streaming data based on HTM algorithm
Zeng et al. Computation of Adalines' sensitivity to weight perturbation
Wang et al. An anomaly detection method of industrial data based on stacking integration
Chakraborty et al. Brain-Inspired Spiking Neural Network for Online Unsupervised Time Series Prediction
Rao et al. Robust stability of nonlinear diffusion fuzzy neural networks with parameter uncertainties and time delays
Sun et al. Nonlinear function approximation based on least Wilcoxon Takagi-Sugeno fuzzy model
Liu et al. Network traffic big data prediction model based on combinatorial learning
Chakrabarti et al. A review on various artificial intelligence techniques used for transmission line fault location
Sharma et al. An adaptive sigmoidal activation function cascading neural networks
Liu An improved Bayesian network intrusion detection algorithm based on deep learning
Chen et al. Sparse LSTM neural network with hybrid PSO algorithm
Chen et al. Quantized minimum error entropy criterion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210723