CN101866403B

CN101866403B - Intrusion detection method based on improved OBS-NMF algorithm

Info

Publication number: CN101866403B
Application number: CN2010101991022A
Authority: CN
Inventors: 马文萍; 焦李成; 赵富家; 公茂果; 刘芳; 王爽; 尚荣华; 马晶晶
Original assignee: Xidian University
Current assignee: Discovery Turing Technology Xi'an Co ltd
Priority date: 2010-06-11
Filing date: 2010-06-11
Publication date: 2012-07-04
Anticipated expiration: 2030-06-11
Also published as: CN101866403A

Abstract

The invention discloses an intrusion detection method based on an improved OBS-NMF algorithm. It mainly addresses problems of the prior art such as poor capability for processing high-dimensional data, weak robustness, a narrow threshold selection range and unsatisfactory detection results. The implementation steps are as follows: (1) collecting the system calls of processes; (2) constructing and simplifying the training matrix; (3) performing a dimensionality-reducing decomposition of the training matrix; (4) checking whether the convergence condition is satisfied: if so, performing step (5), and if not, returning to step (3) and continuing to iterate until the maximum number of iterations is reached; (5) constructing a test matrix U; (6) using the basis matrix W to solve for the feature coefficient vector h_u of U; (7) computing the abnormality degree of each process vector in U; and (8) setting the threshold λ and outputting the detection results. The invention is simple to implement and offers good stability, high detection precision, a wide threshold selection range and strong real-time performance. It can be applied to real-time intrusion detection based on host system calls.

Description

Intrusion detection method based on an improved OBS-NMF algorithm

Technical field

The invention belongs to the field of computer security technology. In particular, it relates to a computer intrusion detection method that can be used for anomaly detection of computer process behavior.

Background art

Research on computer security began as early as the 1970s, but its results were long overlooked. The technical report that James P. Anderson prepared for the US Air Force in April 1980, titled "Computer Security Threat Monitoring and Surveillance", is recognized as the seminal work on intrusion detection. It was the first to explain the concept of intrusion detection in computer systems in detail, and it proposed using audit-trail data to monitor intrusion activity. People now depend more and more on computer networks, and traditional network security techniques can no longer provide effective protection. Intrusion detection complements these conventional techniques and has become a new direction in network security; it serves as an active line of defense for computer and network security.

Anomaly detection can detect unknown attacks, which is why anomaly-based intrusion detection has become a research focus in recent years. Many methods have been applied successfully to anomaly detection, such as statistical analysis, machine learning, neural networks and data mining. Intrusion detection research can be classified in many ways according to its data source, which may be audit events and command sequences, system calls, network packets, keystroke characteristics, or file system accesses. An anomaly detection system can model and detect intrusions from many different aspects of computer or network behavior. One key problem is how to select behavioral features of the user, system or network that best distinguish intrusive behavior from normal behavior.

In Unix/Linux and compatible operating systems, system calls are the interface through which resources pass between user space and kernel space. A single program may spawn several processes while it executes, and the system calls generated by one process are grouped as one "execution trace". Analyzing the system calls in an execution trace can reveal abnormal running states of a program, which in turn indicates whether the system is under attack. Compared with other data sources, system calls come in very few types: a Linux system with kernel 2.7.10 has fewer than 221 system calls, of which only about 80 are commonly used. In addition, hacker attacks inevitably leave traces in the system calls at the kernel layer. System-call-based intrusion detection modeling is therefore simple and efficient, and in recent years it has become the main subject of host-behavior modeling in anomaly detection.

In 1996, Forrest et al. first proposed characterizing the running state of a process by the fixed-length subsequences of system calls it generates during normal execution. Building on Forrest's work, Lee et al. used the Ripper software package to mine normal and abnormal patterns from system call sequences and described the system's running state as rules, which gave a concise and efficient normal model of the system. Warrender et al. used hidden Markov models (HMM) to model the hidden states behind system calls and also obtained quite good detection results. Wespi et al. extended Forrest's idea by modeling system calls with variable-length short sequences. Asaka et al. detected system intrusions with an optimized separating surface. Liao and Hu modeled program behavior with the k-nearest-neighbor (KNN) method and support vector machines (SVM), respectively. Other methods, such as first-order Markov models, neural networks and soft computing, have also been applied successfully to system-call-based program anomaly detection.

However, current intrusion detection systems (IDS) still have several problems:

- They often detect poorly, with missed detections and high false-alarm rates.
- Their data processing performance is low, and they handle massive high-dimensional data poorly.
- They lack real-time performance and adaptability, consume many resources, and lack extensibility.

Meanwhile, the volume of data in computing keeps growing explosively. In the UNM experiment that collected system calls from the sendmail daemon, only 112 e-mail messages produced more than 1,500,000 system calls. An effective intrusion detection method must therefore run in real time, so that an attack is detected before it damages the system.

Non-negative matrix factorization (NMF) has been introduced into intrusion detection in recent years. Its efficient dimensionality reduction has attracted wide attention, and it offers a simple implementation, interpretable factors and low resource usage. It has been applied successfully to face recognition, text classification, speech processing and other fields. In the paper "Research on real-time intrusion detection techniques for multiple information sources", Wang built an intrusion detection model with the NMF algorithm. That model takes system calls as analysis data, decomposes the high-dimensional vector data, and detects intrusive behavior in the low-dimensional space. The approach simplified the data, reduced resource usage, and achieved high detection precision and real-time performance. However, the model is not very stable, its convergence is not guaranteed, and its threshold is hard to select, all of which degrade detection performance.

Summary of the invention

The object of the invention is to overcome the above shortcomings of the prior art. It proposes an intrusion detection method based on an improved OBS-NMF algorithm that guarantees the convergence of OBS-NMF, widens the range from which the threshold can be chosen, and effectively improves detection precision and stability.

The technical solution of the invention treats intrusion detection based on host system calls as a classification problem. It works as follows:

- The raw data are preprocessed into a vector-matrix form suitable for processing.
- Non-negative matrix factorization, which is fast and effective for feature extraction and dimensionality reduction, decomposes the feature matrix.
- The OBS strategy, which has good convergence properties, improves the iteration mechanism of the original NMF algorithm.
- The vector angle reduces the reconstruction error of the data and reflects the differences between vectors well. The invention therefore uses the cosine distance as its convergence condition and the sine distance as its discriminant function for handling intrusions.

The concrete implementation steps are as follows:

(1) Run the client computer and use a server to launch intrusion attacks against it. Monitor the normal and abnormal processes generated while the system runs, and collect the system calls they produce during execution;

(2) For each collected process, construct a process vector ordered by system call number from smallest to largest, using the number of times each system call occurs as the vector elements. Set the size of the training matrix and enter each vector in turn as a column of the training matrix. This gives the training matrix V*:

{V^{*}}_{n \times m} = [\begin{matrix} 1 & 1 & 0 & \cdot \cdot \cdot & 3 \\ 0 & 2 & 2 & \cdot \cdot \cdot & 0 \\ \cdot \cdot \cdot & \cdot \cdot \cdot & \cdot \cdot \cdot & \cdot \cdot \cdot & \cdot \cdot \cdot \\ 0 & 0 & 1 & \cdot \cdot \cdot & 0 \end{matrix}] = [{v^{*}}_{1}, {v^{*}}_{2}, \cdot \cdot \cdot, {v^{*}}_{i}, \cdot \cdot \cdot, {v^{*}}_{m}]

where n is the largest system call number, m is the number of processes, and each element v is the number of times a system call occurs;

(3) Delete the rows of the training matrix in which more than 95% of the elements are 0, and mark the deleted rows. Then compute the probability of use of each system call within its process, which simplifies V* to the matrix V;

(4) Use the OBS-NMF algorithm to perform a dimensionality-reducing decomposition of the simplified training matrix V so that V ≈ WH. Here W is the weight matrix of V, of size n × r; H is the coefficient matrix of V, of size r × m; and r is the factorization rank. W and H are initialized with random values;

(5) Compute the weight matrix W and the coefficient matrix H with the following iterative formulas:

W^{+} \leftarrow {({(H \cdot H^{T})}^{-1} \cdot H \cdot V^{T})}^{T}

W = W^{+} + \Delta W^{+}

H^{+} \leftarrow {(W \cdot {(W^{T} \cdot W)}^{-1})}^{T} \cdot V

H = H^{+} + \Delta H^{+}

where:

- W⁺ and H⁺ are the transition variables of W and H, respectively;
- the increments ΔW⁺ and ΔH⁺ are derived from the squared-error increments caused by adjusting W⁺ and H⁺, respectively;
- j = 1, 2, ..., n, and e_j is the j-th column of the identity matrix;
- W_j = W⁺e_j and H_j = H⁺e_j;
- 5 < r < 20;

(6) Set the maximum number of iterations max iter = 1000. Check whether the iteration error between V and WH satisfies the convergence condition Conv > γ, where 0.8 < γ < 1. If it does, go to step (7). Otherwise, return to step (5) and continue iterating until max iter is reached. The iteration error is expressed as:

Conv = \min_{i} (\frac{v_{i}^{T} ({Wh}_{i})}{{| | v_{i} | |}_{2} {| | {Wh}_{i} | |}_{2}}), i = 1,2, \cdot \cdot \cdot m

where h_i denotes the i-th column vector of H;

(7) Take all normal and abnormal processes as test samples. Apply step (2) to them and delete the rows marked in step (3) to obtain the test matrix U. Then use the following formula to solve for the feature coefficient vector h_u of each column vector u of U:

h_{u} = {(W^{T} \cdot W)}^{- 1} \cdot W^{T} \cdot u;

(8) Compute the sine distance between h_u and each vector h_i of H. The minimum sine distance is the abnormality degree e of the process vector u in U:

e = 1 - \max_{i} (\frac{h_{u}^{T} h_{i}}{{| | h_{u} | |}_{2} {| | h_{i} | |}_{2}}), i = 1,2 \cdot \cdot \cdot m;

(9) Set the threshold 0 < λ < 0.1. If the discriminant function satisfies e > λ, the process is abnormal and the system is prompted to handle it; otherwise the process is reported as safe.

Compared with the prior art, the invention has the following advantages:

1. Strong robustness

The iteration mechanism of the original NMF algorithm imposes only weak constraints, so its detection performance depends heavily on the random choice of W, H and the training samples. The invention introduces the OBS idea into the original NMF algorithm and improves its iteration mechanism, which strengthens the constraint on the iteration direction. Test results show that the method is only slightly affected by the random choice of W, H and the training samples. It is therefore more robust than the NMF algorithm.

2. Real-time performance

The method uses the OBS-NMF algorithm to decompose massive high-dimensional data and maps the data to a low-dimensional space for detection. This reduces the data volume and the algorithmic complexity, so detection takes less time than with other intrusion detection methods. In addition, the invention uses the frequencies of system calls as analysis data, which gives better real-time performance than methods based on other features.

3. Wide threshold selection range and superior detection performance

The convergence condition and the discriminant function are based on the vector angle rather than on Euclidean distance. The angle-based versions fit the structure of the experimental data better and reflect the difference between normal and abnormal samples more clearly, which noticeably improves detection precision. Experimental results show that the method widens the threshold selection range and effectively improves detection performance.

4. Better algorithm convergence

The traditional NMF algorithm has long suffered from convergence problems. The invention improves its iteration mechanism by starting from the least-mean-square-error solution and further optimizing the iterative increment, which strengthens the constraint on the iteration direction. In addition, the cosine-angle convergence condition is stricter about error than a Euclidean-distance convergence condition, so more iterations are needed to satisfy it. As a result, the convergence of the algorithm is greatly improved.

Description of drawings:

Fig. 1 is the flow chart of the invention;

Fig. 2 shows the detection results of the original NMF algorithm on the CERTsendmail data and the UNMsendmail.log data;

Fig. 3 shows the detection results of the OBS-NMF algorithm with the improved iteration mechanism of the invention on the CERTsendmail data and the UNMsendmail.log data;

Fig. 4 shows the detection rate versus the threshold for the OBS-NMF algorithm with the iteration mechanism of the invention, on the CERTsendmail data and the UNMsendmail.log data;

Fig. 5 shows the detection rate versus the threshold for the OBS-NMF I algorithm with the improved discriminant function of the invention, on the CERTsendmail data and the UNMsendmail.log data;

Fig. 6 shows the detection rate versus the threshold for the OBS-NMF II algorithm with the improved convergence condition of the invention, on the CERTsendmail data and the UNMsendmail.log data;

Fig. 7 shows the detection rate of the invention versus the factorization rank r.

Embodiment

Referring to Fig. 1, the concrete implementation steps of the invention are as follows.

Step 1, data acquisition.

Run the client computer while the system operates normally. A single program may spawn several processes during execution. A process jumps to the kernel entry point called system_call through the interrupt instruction 0x80 and enters the system call handler. The handler invokes the relevant kernel function and, when it finishes, returns to user space. The system call sequence generated by each process is intercepted either by patching the kernel or with an application in the system, for example the strace tool.

Step 2, construct the process vectors.

The collected raw data are first grouped by process, so that all system calls belonging to one process form one group. Next, a process vector is built in order of system call number, from smallest to largest: the frequency of each system call in a group is counted, and the counts form a column vector. The behavior of a process is thus described by the system calls it issues. Intrusion detection becomes the problem of judging whether these vectors are normal, which both reduces the data and simplifies the problem.
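Steps 1 and 2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. It assumes a trace in which each line holds a process ID and a system call number separated by whitespace, a two-column layout of the kind used in the UNM trace files; other interception tools produce different formats. It also assumes calls are numbered 1..n. The function names `group_by_process` and `process_vector` are hypothetical.

```python
from collections import Counter


def group_by_process(lines):
    """Group 'pid call_number' trace lines into per-process call sequences.

    Lines that are blank or are not exactly two integers are skipped.
    Python dicts keep insertion order, so processes appear in first-seen order.
    """
    groups = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            continue
        pid, call = int(parts[0]), int(parts[1])
        groups.setdefault(pid, []).append(call)
    return groups


def process_vector(calls, n):
    """Frequency of system calls 1..n (assumed numbering), ordered by call number."""
    counts = Counter(calls)  # missing calls count as 0
    return [counts[k] for k in range(1, n + 1)]


groups = group_by_process(["551 4", "551 2", "552 5", "", "551 4"])
vectors = {pid: process_vector(calls, n=5) for pid, calls in groups.items()}
# vectors == {551: [0, 1, 0, 2, 0], 552: [0, 0, 0, 0, 1]}
```

If the system call numbering starts at 0, as it does on Linux, the range in `process_vector` would need to shift accordingly.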

Step 3, construct the training matrix.

The constructed process vectors are assembled into a matrix suitable for the method. The training matrix is sized to hold 30% of the process vectors. Each process vector is appended in turn as a column until the defined size is reached, so the raw data can be expressed as an n × m training matrix V*, written as

{V^{*}}_{n \times m} = [\begin{matrix} {v^{*}}_{11} & {v^{*}}_{12} & \cdot \cdot \cdot & {v^{*}}_{1 m} \\ {v^{*}}_{21} & {v^{*}}_{22} & \cdot \cdot \cdot & {v^{*}}_{2 m} \\ \cdot \cdot \cdot & \cdot \cdot \cdot & \cdot \cdot \cdot & \cdot \cdot \cdot \\ {v^{*}}_{n 1} & {v^{*}}_{n 2} & \cdot \cdot \cdot & {v^{*}}_{nm} \end{matrix}] = [{v^{*}}_{1}, {v^{*}}_{2}, \cdot \cdot \cdot, {v^{*}}_{m}]

where n is the largest system call number used and m is the number of processes. Each column vector of V* gives, for the corresponding group of data, the number of times each type of system call occurs.
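The simulation section below states that 30% of the normal processes are chosen at random for training. A minimal sketch of that selection uses random sampling without replacement and turns the chosen process vectors into the columns of V*. The function name `build_training_matrix` and its `seed` parameter are illustrative assumptions.

```python
import random


def build_training_matrix(process_vectors, fraction=0.3, seed=None):
    """Randomly pick a fraction of the process vectors and use them as columns.

    process_vectors: equal-length count vectors, one per process.
    Returns the training matrix as a list of n rows with m columns each.
    """
    rng = random.Random(seed)  # a seeded generator makes the split reproducible
    m = max(1, round(fraction * len(process_vectors)))
    chosen = rng.sample(process_vectors, m)  # sampling without replacement
    # zip(*chosen) turns the chosen column vectors into matrix rows.
    return [list(row) for row in zip(*chosen)]
```

With the 147 normal processes of the datasets used later, `round(0.3 * 147)` gives 44 training columns.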

Step 4, simplify the training matrix.

When V* is constructed, its rows correspond to the system call numbers. A process does not issue every system call: its calls occur in the order in which they are invoked, and only some of them are used. Rows with many zero elements are therefore deleted from V* to make a first reduction in the dimensionality of the data. Specifically, the zeros in each row are counted. If they exceed a given proportion, which the invention sets at 95% of the row, the row is deleted, meaning that this system call is not used as a behavioral feature. This removes the many unused or rarely used features and reduces the matrix. Finally, the system call counts are converted into the probabilities with which the calls occur in the corresponding vector. Other audit data can be processed in the same way, which ensures the adaptability and extensibility of the intrusion detection.
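Step 4 can be sketched as follows. The sketch reads "probability of occurrence in the corresponding vector" as normalizing each column (process) so that it sums to 1 over the rows that survive deletion; that order of operations is an assumption. The function also returns the deleted row indices, because Step 7 must delete the same rows from the test matrix. The name `simplify` is hypothetical.

```python
def simplify(V_star, zero_ratio=0.95):
    """Drop rows of V* whose share of zeros exceeds zero_ratio, then
    convert each column's counts into occurrence probabilities.

    Returns (V, deleted), where deleted lists the removed row indices.
    """
    m = len(V_star[0])
    deleted = [k for k, row in enumerate(V_star)
               if sum(1 for x in row if x == 0) / m > zero_ratio]
    drop = set(deleted)
    kept = [row for k, row in enumerate(V_star) if k not in drop]
    # Normalize every column (process) over the remaining rows;
    # an all-zero column is left as zeros.
    col_sums = [sum(row[i] for row in kept) for i in range(m)]
    V = [[row[i] / col_sums[i] if col_sums[i] else 0.0 for i in range(m)]
         for row in kept]
    return V, deleted


V, deleted = simplify([[1, 0], [1, 2], [0, 2], [0, 0], [3, 0]])
# deleted == [3]; each column of V now sums to 1
```

With only a few columns, a 95% threshold removes only all-zero rows; with many processes it also removes rarely used calls.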

Step 5, solve for the weight matrix W and the coefficient matrix H.

The idea of the original NMF algorithm is to find two matrices W and H such that V ≈ WH, that is,

V_{iu} \approx {(WH)}_{iu} = \sum_{a = 1}^{r} W_{ia} H_{au}

After the decomposition, W has dimensions n × r and H has dimensions r × m. Each column h of H can be regarded as the feature vector retained for the corresponding group of data. Based on the OBS strategy, the invention proposes the following iterative formulas:

W^{+} \leftarrow {({(H \cdot H^{T})}^{-1} \cdot H \cdot V^{T})}^{T}

W = W^{+} + \Delta W^{+}

H^{+} \leftarrow {(W \cdot {(W^{T} \cdot W)}^{-1})}^{T} \cdot V

H = H^{+} + \Delta H^{+}

where:

- W⁺ and H⁺ are the transition variables of W and H, respectively;
- the increments ΔW⁺ and ΔH⁺ are derived from the squared-error increments caused by adjusting W⁺ and H⁺, respectively;
- j = 1, 2, ..., n, and e_j is the j-th column of the identity matrix;
- W_j = W⁺e_j and H_j = H⁺e_j;
- r (5 < r < 20) is the factorization rank, that is, the dimension of the weight matrix after reduction. It is chosen so that r ≪ n.

With these iterative formulas, nonnegative initial matrices W and H are chosen at random, and each weight is adjusted during the computation. In practice, W and H are obtained by alternating iteration so that the reconstruction error ‖V − WH‖ is minimized.
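The text does not name the norm in ‖V − WH‖. Assuming the Frobenius norm, the element-wise product (WH)_iu = Σ_a W_ia H_au and the error it feeds can be sketched as follows (function names are illustrative):

```python
import math


def reconstruct(W, H):
    """(WH)_iu = sum over a of W_ia * H_au, for every entry."""
    r = len(H)
    return [[sum(W[i][a] * H[a][u] for a in range(r)) for u in range(len(H[0]))]
            for i in range(len(W))]


def reconstruction_error(V, W, H):
    """Frobenius norm ||V - WH|| (the choice of norm is an assumption)."""
    WH = reconstruct(W, H)
    return math.sqrt(sum((v - wh) ** 2
                         for row_v, row_wh in zip(V, WH)
                         for v, wh in zip(row_v, row_wh)))
```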

Step 6, check the termination condition.

Set the maximum number of iterations max iter = 1000 and the factorization rank r = 8. Consider the training samples V = [v₁, v₂, ..., v_m] used in the experiment. The training data treat vectors as the basic object of study, so the invention uses the cosine of the vector angle as the convergence condition: the minimum cosine similarity between corresponding column vectors of V and WH serves as the convergence measure:

Conv = \min_{i} (\frac{v_{i}^{T} ({Wh}_{i})}{{| | v_{i} | |}_{2} {| | {Wh}_{i} | |}_{2}}), i = 1,2, \cdot \cdot \cdot m

where h_i denotes the i-th column vector of H;

For a preset value 0.8 < γ < 1, the iteration stops when Conv ≥ γ. Otherwise it continues until the maximum number of iterations is reached.
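The Conv measure and the stopping rule of Step 6 can be sketched together. In this sketch, `update` is a hypothetical stand-in for one OBS-NMF iteration returning new (W, H). The loop stops as soon as Conv ≥ γ or after `max_iter` updates. The function names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def conv(V, W, H):
    """Conv = min over columns i of cos(v_i, W h_i)."""
    r, m = len(H), len(H[0])
    WH = [[sum(W[k][a] * H[a][i] for a in range(r)) for i in range(m)]
          for k in range(len(W))]
    # zip(*M) yields the columns of a row-major matrix M.
    return min(cosine(v, wh) for v, wh in zip(zip(*V), zip(*WH)))


def iterate(update, V, W, H, gamma=0.9, max_iter=1000):
    """Apply update(V, W, H) -> (W, H) until Conv >= gamma or max_iter updates.

    Returns (W, H, number_of_updates).
    """
    if not 0.8 < gamma < 1:
        raise ValueError("the method uses 0.8 < gamma < 1")
    for it in range(1, max_iter + 1):
        W, H = update(V, W, H)
        if conv(V, W, H) >= gamma:
            return W, H, it
    return W, H, max_iter
```

Taking the minimum means that a single poorly reconstructed training column is enough to keep the iteration running.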

Step 7, construct and simplify the test matrix.

All normal and abnormal processes are taken as test samples. A matrix is first constructed from the test samples by the same method used for the normal training matrix in Step 3, and is then simplified as in Step 4. Training and test samples must be represented by the same system calls, and the simplified test matrix must have the same dimension as the training matrix. The rows deleted must therefore be exactly the rows deleted from the training matrix, which represents the samples accurately with the smallest dimension. The result is the test matrix U.
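The key constraint in Step 7 is that the test matrix loses exactly the rows that were removed from the training matrix, whatever its own zero counts are. A sketch follows; the parameter `deleted_rows` is hypothetical and holds the indices of the rows removed in Step 4. Column normalization mirrors the probability conversion of Step 4, and its use here is an assumption.

```python
def simplify_test(U_star, deleted_rows):
    """Remove the training-deleted rows from U*, then normalize each column.

    deleted_rows: indices of the rows removed from the training matrix.
    """
    drop = set(deleted_rows)
    kept = [row for k, row in enumerate(U_star) if k not in drop]
    m = len(U_star[0])
    col_sums = [sum(row[i] for row in kept) for i in range(m)]
    return [[row[i] / col_sums[i] if col_sums[i] else 0.0 for i in range(m)]
            for row in kept]


U = simplify_test([[2, 1], [5, 0], [2, 3]], deleted_rows=[1])
# row 1 is dropped even though it is not mostly zero in the test data
```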

Step 8. Each column vector u of the test matrix U is taken as a test object. To reduce computational complexity, the invention uses the matrix W learned in training and solves for the coefficient vector h_u of the test data with the following formula:

h_{u} = {(W^{T} \cdot W)}^{- 1} \cdot W^{T} \cdot u

Step 9. Compare the sine distance between the coefficient vector h_u and each vector h_i of H. The minimum sine distance is the abnormality degree e of each process vector u in U:

e = 1 - \max_{i} (\frac{h_{u}^{T} h_{i}}{{| | h_{u} | |}_{2} {| | h_{i} | |}_{2}}), i = 1, 2, \cdot \cdot \cdot, m

This formula serves as the decision function for identifying anomalies.

Step 10. Set the threshold 0 < λ < 0.1. If e > λ, the process is abnormal and the system is prompted to handle it; otherwise the process is reported as safe.
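Steps 9 and 10 can be sketched together: the abnormality degree is one minus the best cosine match between h_u and any training column h_i, and the process is flagged when that value exceeds λ. The default λ = 0.05 is an illustrative choice within the stated range, and the function names are hypothetical. `cosine` assumes nonzero vectors.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def abnormality(h_u, H):
    """e = 1 - max_i cos(h_u, h_i), where h_i runs over the columns of H."""
    return 1.0 - max(cosine(h_u, h_i) for h_i in zip(*H))


def judge(e, lam=0.05):
    """Step 10: report 'abnormal' when e exceeds the threshold lam."""
    if not 0.0 < lam < 0.1:
        raise ValueError("the method restricts the threshold to 0 < lam < 0.1")
    return "abnormal" if e > lam else "safe"


H = [[1.0, 0.0], [0.0, 1.0]]  # two training columns: (1, 0) and (0, 1)
e = abnormality([1.0, 1.0], H)  # 1 - 1/sqrt(2), about 0.293
```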

The advantages of the invention are further illustrated by the following simulations:

1) Simulation conditions

All simulation experiments use public data from the University of New Mexico: the CERTsendmail data and the UNMsendmail.log data. Both datasets contain 147 normal processes.

- The CERTsendmail data contain 36 abnormal processes. This simulation uses only four of its syslog attacks: local, local2, remote1 and remote2.
- The UNMsendmail.log data contain two abnormal processes, UNMfwdloops and UNMsendmailsm. To test the method more thoroughly, the syslog attack data are also added to them as intrusion samples.

The factorization rank r is 8 and the maximum number of iterations is 1000. For training, 30% of the normal processes in the CERTSendmail and UNMsendmail.log data are chosen at random, and all processes are tested. The compared algorithms use the following convergence conditions and decision functions:

- The OBS-NMF algorithm of the invention uses Euclidean distance for both.
- The improved OBS-NMF I algorithm uses Euclidean distance as the convergence condition and the sine distance as the decision function.
- The improved OBS-NMF II algorithm uses the cosine angle as the convergence condition and the sine distance as the decision function.

NCD denotes the detection rate for normal samples, and ACD denotes the detection rate for abnormal samples.
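NCD and ACD are described above only as detection rates. A plausible reading, assumed in this sketch, is the share of normal samples judged normal (e ≤ λ) and the share of abnormal samples judged abnormal (e > λ). The function name is hypothetical.

```python
def detection_rates(scores, is_abnormal, lam):
    """Return (NCD, ACD) for abnormality degrees `scores` at threshold lam.

    NCD: share of normal samples judged normal (e <= lam).
    ACD: share of abnormal samples judged abnormal (e > lam).
    """
    normal = [e for e, bad in zip(scores, is_abnormal) if not bad]
    abnormal = [e for e, bad in zip(scores, is_abnormal) if bad]
    ncd = sum(e <= lam for e in normal) / len(normal)
    acd = sum(e > lam for e in abnormal) / len(abnormal)
    return ncd, acd


scores = [0.01, 0.02, 0.2, 0.05]
flags = [False, False, True, True]
# detection_rates(scores, flags, 0.1) == (1.0, 0.5)
```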

2) Simulation content

To verify the benefits of the improvement strategies of the invention, simulation experiments compare the OBS-NMF algorithm with the traditional NMF algorithm, and also compare it with the improved OBS-NMF I and OBS-NMF II algorithms of the invention.

2a) Comparison of the iteration mechanism of the invention with the traditional NMF algorithm

To show how the improved iteration mechanism affects algorithm stability, the improved OBS-NMF algorithm of the invention and the traditional NMF algorithm were each run on the CERT Sendmail data and the UNMsendmail.log data, using the same discriminant function and convergence condition. In the figures, asterisks denote normal samples and the other symbols denote different intrusions. The horizontal axis is the test process and the vertical axis is the abnormality degree. Fig. 2 shows the detection results of the original NMF algorithm on the two datasets: Fig. 2a for the CERT Sendmail data and Fig. 2b for the UNMsendmail.log data. Fig. 3 shows the results of the improved OBS-NMF algorithm of the invention: Fig. 3a for the CERTSendmail data and Fig. 3b for the UNMsendmail.log data.

2b) Comparison of the improved decision function with the unimproved OBS-NMF algorithm

To show how the improved decision function raises detection performance, the improved OBS-NMF I algorithm of the invention and the OBS-NMF algorithm were each run on the CERT Sendmail data and the UNMsendmail.log data, using the same iteration mechanism and convergence condition. Solid lines show the detection rate for normal samples and dashed lines the detection rate for abnormal samples. The horizontal axis is the threshold and the vertical axis is the detection rate. Fig. 4 shows how the detection rate of the OBS-NMF algorithm of the invention varies with the threshold: Fig. 4a for the CERT Sendmail data and Fig. 4b for the UNMsendmail.log data. Fig. 5 shows the same for the improved OBS-NMF I algorithm of the invention: Fig. 5a for the CERTSendmail data and Fig. 5b for the UNMsendmail.log data.

2c) Comparison of the improved convergence condition with the unimproved OBS-NMF I algorithm

To show how the improved convergence condition raises detection performance, the improved OBS-NMF II algorithm of the invention and the OBS-NMF I algorithm were each run on the CERT Sendmail data and the UNMsendmail.log data, using the same iteration mechanism and discriminant function. Solid lines show the detection rate for normal samples and dashed lines the detection rate for abnormal samples. The horizontal axis is the threshold and the vertical axis is the detection rate. Fig. 6 shows how the detection rate of the improved OBS-NMF II algorithm varies with the threshold: Fig. 6a for the CERT Sendmail data and Fig. 6b for the UNMsendmail.log data.

3) Analysis of simulation results

Fig. 2 shows that under the original NMF algorithm, the abnormality degree of normal samples varies widely: the algorithm is not robust and does not converge well. This is mainly because W and H are randomly chosen nonnegative matrices during the decomposition and the iterative formulas cannot guarantee convergence, which affects the detection of normal samples. Choosing different random training samples also changes the abnormality degree of normal samples noticeably, so the stability of the NMF algorithm depends strongly on how the normal samples are randomly selected. Fig. 3 shows that with the improved OBS-NMF algorithm of the invention, the abnormality degree of normal samples is clearly lower, more concentrated and more stable. This greatly narrows the range of abnormality degrees in the detection results and makes it easier to choose a suitable threshold for identification. Even when the test samples and the initial values of W and H are still chosen at random, the algorithm converges satisfactorily and intrusive behavior can be analyzed better. The robustness of the algorithm is therefore greatly improved.

Figs. 4 and 5 show that the choice of discriminant function strongly affects the identification of abnormal behavior. Under the same iteration rules, the improved OBS-NMF I algorithm of the invention detects clearly better than the OBS-NMF algorithm of the invention. This indicates that the sine-distance discrimination method introduced here is superior to discrimination based on Euclidean distance. Moreover, the key to judging intrusions is defining the threshold: whether the threshold is suitable directly determines the detection accuracy of the algorithm, so the size of the usable threshold range reflects the quality of the algorithm. The improved OBS-NMF I algorithm clearly widens the threshold range while maintaining a high detection rate. Discrimination based on the sine distance therefore reflects the difference between normal and abnormal samples to the greatest extent and effectively improves intrusion detection precision.
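The "threshold range" is not defined formally in the text. One reading, assumed here, is the set of λ values for which both detection rates stay at or above a target. As λ grows, NCD can only rise and ACD can only fall, so that set is a single interval. A grid search can locate it, as in this sketch with hypothetical names and an illustrative grid:

```python
def threshold_range(scores, is_abnormal, min_rate, steps=1000):
    """Smallest and largest lam on the grid k/steps (k = 1..steps-1) for which
    both NCD and ACD reach min_rate; returns None if no grid point qualifies.
    """
    normal = [e for e, bad in zip(scores, is_abnormal) if not bad]
    abnormal = [e for e, bad in zip(scores, is_abnormal) if bad]
    ok = []
    for k in range(1, steps):
        lam = k / steps
        ncd = sum(e <= lam for e in normal) / len(normal)
        acd = sum(e > lam for e in abnormal) / len(abnormal)
        if ncd >= min_rate and acd >= min_rate:
            ok.append(lam)
    # NCD is nondecreasing and ACD nonincreasing in lam, so ok is contiguous.
    return (ok[0], ok[-1]) if ok else None
```

The width `hi - lo` of the returned interval is one way to compare algorithms in the spirit of the discussion above.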

As can be seen from Fig. 5 and Fig. 6, the improved cosine-angle convergence condition further improves the detection performance of the algorithm: compared with OBS-NMF I, the improved OBS-NMF II algorithm of the invention further raises the anomaly degree of the abnormal samples, which shows that using the cosine angle as the convergence condition is superior to using the Euclidean distance. This is mainly because choosing the Euclidean distance as the convergence condition produces accumulated error, whereas the cosine angle, computed from dot products, not only suits the characteristics of the training data but is also consistent with the idea behind the discriminant function. In Fig. 6a, at the point where the detection rate of the improved OBS-NMF II algorithm first drops to 95.65%, the threshold range has extended to 0.146, compared with 0.094 in Fig. 5a. Comparing the improved OBS-NMF II algorithm in Fig. 6b with the improved OBS-NMF I algorithm in Fig. 5b, the threshold range is also further widened while the detection rate stays above 83.33%. Therefore, using the cosine angle as the convergence condition reflects the difference between normal and abnormal samples to the greatest extent. The simulation results show that, while maintaining a high detection rate, the improved OBS-NMF II algorithm of the invention substantially widens the range from which the threshold can be chosen.

In summary, the intrusion detection method of the invention achieves high detection accuracy in analyzing both normal and abnormal processes. Compared with the traditional NMF intrusion detection method, it offers better robustness, adaptivity, and detection accuracy, has clear advantages in processing massive high-dimensional data, and runs in real time, which makes it suitable for real-time intrusion detection.

4) Analysis of the influence of algorithm parameters

The detection performance of the OBS-NMF algorithm that the invention uses for intrusion detection is mainly affected by the factorization rank r. The rank determines the number of basis vectors, and hence the size of the error between the reconstructed matrix and the original matrix.

In the basis matrix obtained from the decomposition, the smaller r is, the greater the degree of data compression and the less accurate the result. Conversely, the larger r is, the more basis vectors there are and the more accurately the reconstructed matrix represents the original matrix, although once r reaches a sufficiently large value, further increases no longer change the detection result noticeably. Increasing r also speeds up convergence and reduces the number of iterations. Because intrusion detection must consider not only the detection accuracy of the algorithm but also its ability to process massive high-dimensional data in real time, the choice of r must balance accuracy against compression: it should meet the required accuracy while still allowing intrusions to be handled efficiently in real time, so r should lie within a reasonable range. The relationship between r and the detection rate is shown in Fig. 7. As Fig. 7 shows, the detection rate for abnormal samples increases with r, and when r > 18 the detection rate approaches 100%.

Claims

1. An intrusion detection method based on OBS-NMF, a nonnegative matrix factorization algorithm built on an improved neural network, comprising the following steps:

(2) For each collected normal process, construct a process vector ordered by system call number from smallest to largest, using the number of times each system call occurs as the vector element; set the size of the training matrix and enter each vector in turn as a column of the training matrix, constructing the training matrix V^{*} as follows:

V^{*}_{n \times m} = \begin{bmatrix} 1 & 1 & 0 & \cdots & 3 \\ 0 & 2 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 1 & \cdots & 0 \end{bmatrix} = [v^{*}_{1}, v^{*}_{2}, \ldots, v^{*}_{i}, \ldots, v^{*}_{m}],

where n is the maximum system call number, m is the number of processes, and the entries of each v^{*}_{i} are system call counts;
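For illustration only, the construction in step (2) can be sketched in Python as below. This is a minimal sketch, not the patented implementation: it assumes each process trace is a list of integer system call numbers in the range 1..n, and the helper names `process_vector` and `training_matrix` are hypothetical.

```python
from collections import Counter


def process_vector(trace, n):
    """Count how often each system call number 1..n occurs in one process trace.

    Index k of the returned list holds the count for system call number k + 1,
    so the entries are ordered by call number from smallest to largest.
    (Assumption: call numbers start at 1.)
    """
    counts = Counter(trace)
    return [counts.get(call, 0) for call in range(1, n + 1)]


def training_matrix(traces, n):
    """Place the process vectors as the columns of an n x m matrix (list of rows)."""
    columns = [process_vector(t, n) for t in traces]
    # Transpose: row j lists the counts of system call j + 1 across all m processes.
    return [list(row) for row in zip(*columns)]
```

For example, the two traces `[1, 2, 5, 5, 5]` and `[2, 2, 3, 3]` with n = 5 yield a 5 x 2 matrix whose first column is `[1, 1, 0, 0, 3]`.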

W^{+} \leftarrow \left( (H H^{T})^{-1} H V^{T} \right)^{T}

W = W^{+} + \Delta W^{+}

H^{+} \leftarrow \left( W (W^{T} W)^{-1} \right)^{T} V

H = H^{+} + \Delta H^{+},

where W^{+} and H^{+} are the intermediate (transition) variables of W and H, respectively,
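The first and third formulas are the least-squares solutions of V ≈ WH for W with H fixed and for H with W fixed, since ((HH^T)^{-1} H V^T)^T = V H^T (HH^T)^{-1} and (W (W^T W)^{-1})^T V = (W^T W)^{-1} W^T V. A minimal NumPy sketch of one update pass follows. The correction terms ΔW^{+} and ΔH^{+} are not defined in this section, so the sketch substitutes a hypothetical stand-in, clipping negative entries to zero, which is not the patent's correction. The name `ls_update` is likewise hypothetical.

```python
import numpy as np


def ls_update(V, H):
    """One pass of the W and H update formulas (V: n x m, H: r x m).

    Returns (W, H) with W of shape n x r and H of shape r x m.
    """
    # W+ <- ((H H^T)^{-1} H V^T)^T: least-squares W for the current H.
    # np.linalg.solve avoids forming the explicit inverse.
    W_plus = np.linalg.solve(H @ H.T, H @ V.T).T
    # Hypothetical stand-in for Delta W+: clip negatives so W stays nonnegative.
    W = np.maximum(W_plus, 0.0)
    # H+ <- (W (W^T W)^{-1})^T V = (W^T W)^{-1} W^T V, using the W just computed.
    H_plus = np.linalg.solve(W.T @ W, W.T @ V)
    # Hypothetical stand-in for Delta H+: clip negatives as well.
    H = np.maximum(H_plus, 0.0)
    return W, H
```

If V is exactly W0 H0 with full-rank nonnegative factors and the pass starts from H0, it returns W0 and H0 unchanged. With random starting values, clipping can zero out a whole column and make W^T W singular. That is one reason the sketch is not a substitute for the patent's own correction terms.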

(6) Set the maximum number of iterations max iter = 1000 and judge whether the iteration error between V and WH satisfies the convergence condition Conv > γ, where 0.8 < γ < 1. If it does, execute step (7); otherwise return to step (5) and continue iterating until the maximum number of iterations max iter is reached. The iteration error is expressed as:

\mathrm{Conv} = \min_{i} \frac{v_{i}^{T} (W h_{i})}{\|v_{i}\|_{2} \, \|W h_{i}\|_{2}}, \quad i = 1, 2, \ldots, m,

where h_{i} denotes the i-th column vector of the matrix H;
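The convergence test in step (6) takes the smallest cosine between any training column v_i and its reconstruction W h_i. The NumPy sketch below computes Conv and wraps it in the stopping rule of step (6). It is illustrative only: the helper names are hypothetical, the update rule is passed in as a callable rather than fixed, γ = 0.9 is an arbitrary choice inside the claimed interval 0.8 < γ < 1, and no column of V or WH may be all zeros.

```python
import numpy as np


def conv_measure(V, W, H):
    """Conv = min_i v_i^T (W h_i) / (||v_i||_2 ||W h_i||_2) over the columns of V."""
    WH = W @ H
    dots = np.sum(V * WH, axis=0)  # v_i^T (W h_i) for every column i
    # Assumes no all-zero column in V or WH, otherwise this divides by zero.
    norms = np.linalg.norm(V, axis=0) * np.linalg.norm(WH, axis=0)
    return float(np.min(dots / norms))


def iterate(V, H, update, gamma=0.9, max_iter=1000):
    """Apply update(V, H) -> (W, H) until Conv > gamma or max_iter passes have run.

    Returns (W, H, number_of_passes).
    """
    W, passes = None, 0
    while passes < max_iter:
        passes += 1
        W, H = update(V, H)
        if conv_measure(V, W, H) > gamma:
            break
    return W, H, passes
```

Because Conv is a cosine, it measures only the direction of each reconstructed column and ignores its length. This matches the cosine-based discriminant used at detection time.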

h_{u} = (W^{T} W)^{-1} W^{T} u;
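The coefficient vector h_u is the least-squares representation of a test process vector u in the learned basis W. A one-line NumPy sketch, with the hypothetical name `feature_coefficients`, could look like this:

```python
import numpy as np


def feature_coefficients(W, u):
    """h_u = (W^T W)^{-1} W^T u, solved without forming the inverse explicitly."""
    return np.linalg.solve(W.T @ W, W.T @ u)
```

When u lies in the span of W's columns, this recovers its exact coefficients. Otherwise it returns the least-squares fit, which is not constrained to be nonnegative. `np.linalg.lstsq(W, u, rcond=None)` would give the same result with better numerical behavior if W is ill-conditioned.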

e = 1 - \max_{i} \frac{h_{u}^{T} h_{i}}{\|h_{u}\|_{2} \, \|h_{i}\|_{2}}, \quad i = 1, 2, \ldots, m;

(9) Set the threshold 0 < λ < 0.1. If e > λ, the process is judged to be abnormal and the system is prompted to handle it; otherwise, the process is reported as safe.
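The anomaly degree e is one minus the largest cosine similarity between h_u and any training coefficient column h_i, so a process that resembles at least one normal process scores close to 0. A minimal NumPy sketch of this score and of the threshold test in step (9) is given below. The function names are hypothetical, λ = 0.05 is an arbitrary value inside the claimed interval 0 < λ < 0.1, and h_u and every column of H are assumed to be nonzero.

```python
import numpy as np


def anomaly_degree(h_u, H):
    """e = 1 - max_i cos(h_u, h_i) over the columns h_i of the training matrix H."""
    cos = (H.T @ h_u) / (np.linalg.norm(H, axis=0) * np.linalg.norm(h_u))
    return float(1.0 - np.max(cos))


def classify(h_u, H, lam=0.05):
    """Report 'abnormal' when e > lambda, otherwise 'safe'."""
    return "abnormal" if anomaly_degree(h_u, H) > lam else "safe"
```

With H = I (two orthogonal training columns), h_u = [1, 0] gives e = 0 and is reported safe. h_u = [1, 1] gives e = 1 - 1/√2 ≈ 0.29 and is reported abnormal.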

2. The intrusion detection method according to claim 1, wherein setting the training matrix size in step (2) consists of selecting 30% of the collected processes as the vectors of the training matrix.

3. The intrusion detection method according to claim 1, wherein computing the usage probability of each system call within its process in the training matrix in step (3) consists of first summing the elements of the process vector and then dividing each element by this sum to obtain the usage probability of each element.
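The normalization in claim 3 divides each count by the total number of system calls in the process, so each process vector sums to 1. A minimal Python sketch follows. The name `usage_probabilities` is hypothetical. Rejecting an empty process vector is an added assumption, because the claim does not say how a zero sum is handled.

```python
def usage_probabilities(process_vector):
    """Divide each system-call count by the total count of its process vector."""
    total = sum(process_vector)
    if total == 0:
        # Assumption: the claim does not cover a process with no system calls.
        raise ValueError("process vector contains no system calls")
    return [count / total for count in process_vector]
```

For example, the counts `[1, 1, 0, 0, 3]` become `[0.2, 0.2, 0.0, 0.0, 0.6]`.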