CN103023927B

CN103023927B - Intrusion detection method and system based on non-negative matrix factorization under sparse representation

Info

Publication number: CN103023927B
Application number: CN201310009206.6A
Authority: CN
Inventors: 陈善雄; 熊海灵; 伍胜
Original assignee: Southwest University
Current assignee: Southwest University
Priority date: 2013-01-10
Filing date: 2013-01-10
Publication date: 2016-03-16
Anticipated expiration: 2033-01-10
Also published as: CN103023927A

Abstract

The invention discloses an intrusion detection method and system based on non-negative matrix factorization (NMF) under sparse representation. Network data and host data are collected by obtaining raw network data and auditing privileged programs. The network data and host data are preprocessed to generate network feature data and short sequence vectors. The data test matrix is iteratively factorized by NMF, and the resulting basis matrix and weight matrix are sparsely represented. A projection matrix is used to sample the sparsely represented weight matrix, yielding a highly characteristic weight coefficient vector. Using the data in the feature vector library, this weight coefficient vector is matched against the feature vectors contained in the training data to determine whether anomaly features are present. The method and system use NMF for dimensionality reduction and adopt multiple divergences as the measure; the RIP condition from sparse representation is added to the family of joint divergence objective functions to constrain the NMF iteration. This reduces the dimension of the data to be inspected and makes it easier for an intrusion detection system to process massive high-dimensional data.

Description

Intrusion detection method and system based on non-negative matrix factorization under sparse representation

Technical field

The invention belongs to the field of intrusion detection technology, and in particular relates to an intrusion detection method based on non-negative matrix factorization under sparse representation.

Background art

Since James Anderson first proposed the basic concept of intrusion detection in the technical report "Computer Security Threat Monitoring and Surveillance", intrusion detection has passed through three stages: behavior rule matching, reliability detection, and machine learning detection. Studying intrusion detection with pattern recognition and machine learning gives a detection system adaptivity, learning ability, and survivability, and is an effective means of countering the various known and unknown attack patterns in current networks. The usual approach is to extract features of intrusion data or normal access data, build a feature vector library, and perform pattern matching to complete intrusion detection. Common machine learning methods include statistical learning, rule induction, decision trees, case-based reasoning, Bayesian networks, neural computing, hidden Markov models (HMM), and genetic algorithms. Applying machine learning to intrusion detection treats intrusion detection as a pattern recognition problem: distinguishing normal from abnormal system behavior according to network traffic features, host audit records, and similar data.

Research institutions in China and abroad have long paid close attention to intrusion detection based on machine learning and have carried out a large amount of research and development.

In 1999, Lindqvist et al. built an expert system tool set for intrusion detection, applying the learning and analysis functions of expert systems to intrusion detection for the first time. In 2001, Garcia et al. used evolution strategies to improve an expert-system-based intrusion detection system and raise its detection rate. Among intrusion detection systems based on state transition analysis, in 2005 Dit-Yand et al. described user behavior with an HMM and judged behavior whose distance from the user's profile exceeded a threshold to be an intrusion. In 2003, Yeung et al. used state transition analysis to design a system that detects whether host processes behave abnormally. In 2004, Hixon used Markov chains to detect network intrusions. In 2007, Tan Xiaobin, Wang Weiping et al. of the University of Science and Technology of China first established a hidden Markov model of the running state of a computer system, and on the basis of that model proposed an algorithm for real-time anomaly detection in computer systems.

In 2005, Nong Ye applied decision trees to computer intrusion detection. In the same year, Kalle Burbeck et al. used a fast incremental clustering data mining algorithm for anomaly detection, improving the standard BIRCH clustering framework; on the KDD data set the detection rate was 95% with a false alarm rate of 2.8%. Stolfo et al. of Columbia University applied data mining principles to design a probabilistic anomaly detector (probabilistic anomaly detection, PAD); for certain abnormal processes PAD achieved a detection rate of 95% with a false alarm rate of 2%. Tang Wan, Cao Yang et al. proposed a constraint-based gene expression programming (GEP) rule extraction algorithm (CGREA) to address the low detection rate for unknown attacks in intrusion detection and the low detection efficiency caused by numerous, complex rules. In tests, the algorithm reached a detection rate of 88% on three kinds of unknown attacks. Zheng Hongying, Liao Xiaofeng et al. converted the intrusion detection problem into an optimal data classification problem and introduced the simulated annealing algorithm to optimize the clustering result.

In summary, machine learning algorithms have attracted wide attention from researchers in China and abroad in the field of intrusion detection. However, several problems remain unsolved: strong dependence on the intrusion detection sample data, long training times caused by many repeated training samples, and the difficulty of labeling intrusion samples.

In machine learning, the matrix has always been an effective representation of data that also allows efficient data processing. Matrix analysis has a well-developed theoretical foundation and is therefore widely used in signal processing, image processing, economic analysis, medicine, biology, engineering, and other fields. As a branch of matrix theory, the theory of non-negative matrices originated in 1907, when Perron discovered an important property of the spectral radius of n-th order non-negative matrices, later developed further by Frobenius: the spectral radius of a non-negative matrix must be an eigenvalue of that matrix. Because of its close relationship with mathematical economics, computational mathematics, combinatorics, probability theory, physics, chemistry, and other disciplines, the theory has developed rapidly in recent decades and has become one of the most active research areas in linear algebra. In science and technology, negative values have no practical meaning for most data, so a non-negative matrix gives the result a clearer physical meaning as a data representation. In 1999, Lee and Seung proposed the non-negative matrix factorization (NMF) algorithm in "Nature" for image denoising and face reconstruction. As a tool for dimensionality reduction and data compression, NMF has over the past decade or so been applied to blind source separation, gene sequence prediction, face recognition, text clustering and data mining, speech processing, network intrusion detection, and other fields.

NMF offers good dimensionality reduction and feature extraction performance. Its feature extraction differs from traditional methods such as PCA (Principal Component Analysis), ICA (Independent Component Analysis), and SVD (Singular Value Decomposition) in that its representation of data "reflects the concept in human thinking that parts form the whole". NMF results are therefore highly interpretable, and the non-negativity constraint better matches the characteristics of real application data. At present, the most in-depth research on the data processing capability of NMF is that of the brain science research institute in Japan led by Andrzej Cichocki. That institute mainly uses NMF to process the brain waves produced by the brain, aiming to obtain the EEG signals associated with different human behaviors and to analyze the relationship between the EEG signal of each behavior and human thinking activity. Separating the signal of a given behavior from a mixed signal in this way is similar to separating normal access behavior and intrusion behavior from a network data stream.

The advantage of using NMF for intrusion detection is that it exploits the non-negativity of the data: the decomposition result has good feature representation ability and strong signal separation ability (in blind source separation, the basis matrix consists of basis signals with low linear correlation), so intrusions can be detected faster and with less overhead. Intrusion detection can be regarded as separating intrusion behavior from normal behavior in a large volume of network data. NMF has strong data separation ability, and the basis matrix obtained by factorizing the data matrix is highly interpretable, which meets the need in intrusion detection to separate and represent normal and abnormal data. A relationship matrix between network access behavior and communication processes is established, and intrusion processes are separated out by NMF. Deciding between legitimate and illegitimate access still requires building a feature library and performing feature matching; the advantage of NMF-based intrusion detection is that accurate matching can be achieved with fewer features.

Existing machine learning methods have thus received wide attention in intrusion detection research, but their strong dependence on the sample data, the long training times caused by many repeated training samples, and the difficulty of labeling intrusion samples have not been well solved.

Summary of the invention

The invention provides an intrusion detection method based on non-negative matrix factorization under sparse representation. It aims to solve the problems that existing machine learning methods depend strongly on the intrusion detection sample data, that many repeated training samples cause long training times, and that intrusion samples are difficult to label.

An object of the invention is to provide an intrusion detection method based on non-negative matrix factorization under sparse representation, the method comprising the following steps:

Step 1: collect network data and host data by obtaining raw network data and auditing privileged programs, and output the collected data;

Step 2: preprocess the network data and host data to generate network feature data and short sequence vectors, and store the generated network feature data and short sequence vectors or pass them on for detection;

Step 3: iteratively factorize the preprocessed data test matrix by non-negative matrix factorization, and sparsely represent the resulting basis matrix and weight matrix respectively;

Step 4: sample the sparsely represented weight matrix with a projection matrix that satisfies the RIP condition, obtaining a highly characteristic weight coefficient vector;

Step 5: using the data in the feature vector library, match the highly characteristic weight coefficient vector against the feature vectors contained in the training data, and determine whether anomaly features are present.

Further, in step 3, the basis matrix obtained by the iterative non-negative matrix factorization is, after sparse representation, fed back into the preceding stage of the iterative factorization.

Further, in step 5, the highly characteristic weight coefficient vector is matched against the feature vectors contained in the training data using the data in the feature vector library. If anomaly features are present, the test data indicates intrusion behavior, and the abnormal result is reported as an alarm and output. If anomaly features are not present, the iterative non-negative matrix factorization is performed again until the difference between the product of the basis matrix and the weight matrix and the original test data is less than a set threshold.

Further, the intrusion detection method also performs response operations such as logging, alerting, blocking, and discarding according to the detection result.

Further, in step 4, the projection matrix satisfying the RIP condition may be a uniform spherical matrix, a partial Hadamard matrix, or a Toeplitz matrix.

Further, the intrusion detection method processes the network signal using a joint divergence as the error measure of the matrix factorization while imposing a sparsity constraint, obtains a low-rank approximation of the network detection data matrix, discovers the intrinsic structural features of the data, and separates out intrusion information.

Another object of the invention is to provide an intrusion detection system based on non-negative matrix factorization under sparse representation, the system comprising:

a data acquisition and preprocessing module, for collecting and preprocessing the system call sequences of processes;

a training and dimensionality reduction module, for iteratively factorizing the sequence matrix obtained by the data acquisition and preprocessing module by non-negative matrix factorization to obtain a feature basis matrix and a weight coding matrix, so that system call sequences are converted from high-dimensional vectors into low-dimensional vectors and the low-dimensional subspace inherits the features of the high-dimensional space;

an anomaly detection module, for matching the feature vectors contained in the data under test against the original training data vectors and determining whether anomaly features are present.

Further, the data acquisition and preprocessing module comprises a data acquisition module and a data preprocessing module;

The data acquisition module further comprises:

a network data collector, for obtaining raw network data from the surrounding network environment and outputting the obtained raw network data;

a host data collector, for auditing privileged programs and outputting audit records;

The data preprocessing module further comprises:

a network data preprocessor, for IP fragment reassembly of datagrams, stream reassembly, attribute field mapping, and feature construction, which generates network feature data and sends it to the feature vector library for storage or to the detector for detection;

a host data preprocessor, for extracting program execution traces and generating short sequence vectors with a sliding window, which are sent to the feature vector library for storage or to the detector for detection.

Further, the training and dimensionality reduction module comprises:

a non-negative matrix factorization module, for iteratively factorizing the preprocessed data test matrix, and for changing the iteration rule of the factorization by adjusting the parameters of the multi-divergence joint objective function so as to obtain the best classification performance;

a sparse representation module, for sparsely representing the basis matrix and the weight matrix by means of a basis transform, under the basic principle that the coherence between the basis transform and the projection matrix is minimal;

a compressed data acquisition module, for sampling the sparse data with a projection matrix that satisfies the RIP condition, so that the access behavior of the data under test is represented with less feature data;

a feature vector library, for storing and distributing network data and host data.

Further, the feature vector library comprises a misuse detection training sample library (DB_MTRAIN), an anomaly detection training sample library (DB_ATRAIN), and an anomaly detection result anomalous sample library (DB_ARESULT). The contents of DB_ARESULT are added to DB_MTRAIN, thereby updating the misuse detection training sample library (DB_MTRAIN).

In the intrusion detection method and system based on non-negative matrix factorization under sparse representation provided by the invention, network data and host data are collected by obtaining raw network data and auditing privileged programs; the network data and host data are preprocessed to generate network feature data and short sequence vectors, which are stored or passed on for detection; the preprocessed data test matrix is iteratively factorized by non-negative matrix factorization, and the resulting basis matrix and weight matrix are sparsely represented respectively; the sparsely represented weight matrix is sampled with a projection matrix satisfying the RIP condition to obtain a highly characteristic weight coefficient vector; and, using the data in the feature vector library, this vector is matched against the feature vectors contained in the training data to determine whether anomaly features are present. The method and system exploit the dimensionality reduction advantage of non-negative matrix factorization and adopt multiple divergences as the measure; the RIP condition from sparse representation is added to the family of joint divergence objective functions to constrain the NMF iteration and reduce the dimension of the data to be inspected. This offers one solution to the difficulty intrusion detection systems have in processing massive high-dimensional data, and has considerable value for wider adoption and application.

Brief description of the drawings

Fig. 1 is a flow chart of the intrusion detection method based on non-negative matrix factorization under sparse representation provided by an embodiment of the invention;

Fig. 2 is a structural block diagram of the intrusion detection system based on non-negative matrix factorization under sparse representation provided by an embodiment of the invention;

Fig. 3 is a data flow diagram of the intrusion detection system based on non-negative matrix factorization under sparse representation provided by an embodiment of the invention;

Fig. 4 is a flow chart of the operating principle of the intrusion detection system based on non-negative matrix factorization under sparse representation provided by an embodiment of the invention.

In the figures: 21, data acquisition and preprocessing module; 211, data acquisition module; 212, data preprocessing module; 22, training and dimensionality reduction module; 221, non-negative matrix factorization module; 222, sparse representation module; 223, compressed data acquisition module; 224, feature vector library; 225, projection matrix; 23, anomaly detection module.

Detailed description of the embodiments

To make the object, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

Fig. 1 shows the flow of the intrusion detection method based on non-negative matrix factorization under sparse representation provided by an embodiment of the invention.

The intrusion detection method comprises the following steps:

In step S101, network data and host data are collected by obtaining raw network data and auditing privileged programs, and the collected data is output;

In step S102, the network data and host data are preprocessed to generate network feature data and short sequence vectors, and the generated network feature data and short sequence vectors are stored or passed on for detection;

In step S103, the preprocessed data test matrix is iteratively factorized by non-negative matrix factorization, and the resulting basis matrix and weight matrix are sparsely represented respectively;

In step S104, the sparsely represented weight matrix is sampled with the projection matrix 225, which satisfies the RIP condition, to obtain a highly characteristic weight coefficient vector;

In step S105, using the data in the feature vector library 224, the highly characteristic weight coefficient vector is matched against the feature vectors contained in the training data to determine whether anomaly features are present.
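Steps S103 to S105 can be summarized in notation. The symbols below are not taken from the original text; they are introduced here only for illustration: V is the non-negative data test matrix, W the basis matrix, H the weight matrix with columns h, Ψ the basis transform, s the sparse coefficients, Φ the projection matrix 225, and ε the reconstruction threshold. The fourth line is the standard statement of the restricted isometry property (RIP); the source does not say whether the condition is imposed on Φ alone or on the product ΦΨ, so that choice is an assumption.

```latex
\begin{aligned}
V &\approx W H, && V \in \mathbb{R}_{\ge 0}^{n \times m},\ W \in \mathbb{R}_{\ge 0}^{n \times r},\ H \in \mathbb{R}_{\ge 0}^{r \times m} \\
h &= \Psi s, && s \text{ has at most } k \text{ non-zero entries} \\
y &= \Phi h, && \Phi \in \mathbb{R}^{p \times r},\ p < r \\
(1-\delta_k)\,\|s\|_2^2 &\le \|\Phi \Psi s\|_2^2 \le (1+\delta_k)\,\|s\|_2^2, && 0 < \delta_k < 1 \\
\|V - W H\|_F &< \varepsilon && \text{(stopping test for the iteration)}
\end{aligned}
```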

In an embodiment of the invention, in step S103, the basis matrix obtained by the iterative non-negative matrix factorization is, after sparse representation, fed back into the preceding stage of the iterative factorization.

In an embodiment of the invention, in step S105, the highly characteristic weight coefficient vector is matched against the feature vectors contained in the training data using the data in the feature vector library 224. If anomaly features are present, the test data indicates intrusion behavior, and the abnormal result is reported as an alarm and output. If anomaly features are not present, the iterative non-negative matrix factorization is performed again until the difference between the product of the basis matrix and the weight matrix and the original test data is less than a set threshold.

In an embodiment of the invention, the intrusion detection method also performs response operations such as logging, alerting, blocking, and discarding according to the detection result.

In an embodiment of the invention, in step S104, the projection matrix 225 satisfying the RIP condition may be a uniform spherical matrix, a partial Hadamard matrix, or a Toeplitz matrix.
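Two of the matrix families named above can be constructed in a few lines. The following is a minimal sketch, not the patent's implementation: the function names, the 1/sqrt(m) scaling, the random row selection, and the ±1 Toeplitz entries are all assumptions made for illustration, and the code only builds the matrices; it does not verify that they satisfy the RIP condition.

```python
import random

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Double the size: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def partial_hadamard(m, n, rng):
    """Keep m randomly chosen rows of the n x n Hadamard matrix, scaled by 1/sqrt(m)."""
    rows = sorted(rng.sample(range(n), m))
    full = hadamard(n)
    scale = m ** -0.5
    return [[scale * x for x in full[r]] for r in rows]

def random_toeplitz(m, n, rng):
    """m x n Toeplitz matrix: entry (i, j) depends only on i - j; entries are +/- 1/sqrt(m)."""
    scale = m ** -0.5
    # One random value per diagonal, indexed by (i - j) shifted to be non-negative.
    diag = [rng.choice((-1.0, 1.0)) * scale for _ in range(m + n - 1)]
    return [[diag[i - j + n - 1] for j in range(n)] for i in range(m)]

rng = random.Random(0)          # fixed seed so the example is reproducible
phi_h = partial_hadamard(4, 8, rng)
phi_t = random_toeplitz(4, 8, rng)
```

Both matrices here map an 8-dimensional vector to 4 measurements; structured matrices like these are often preferred in practice because they need less storage than a fully random matrix.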

In an embodiment of the invention, the intrusion detection method processes the network signal using a joint divergence as the error measure of the matrix factorization while imposing a sparsity constraint, obtains a low-rank approximation of the network detection data matrix, discovers the intrinsic structural features of the data, and separates out intrusion information.

Fig. 2 shows the structure of the intrusion detection system based on non-negative matrix factorization under sparse representation provided by an embodiment of the invention. For ease of explanation, only the parts relevant to the invention are shown.

The intrusion detection system comprises:

a data acquisition and preprocessing module 21, for collecting and preprocessing the system call sequences of processes;

a training and dimensionality reduction module 22, for iteratively factorizing the sequence matrix obtained by the data acquisition and preprocessing module 21 by non-negative matrix factorization to obtain a feature basis matrix and a weight coding matrix, so that system call sequences are converted from high-dimensional vectors into low-dimensional vectors and the low-dimensional subspace inherits the features of the high-dimensional space;

an anomaly detection module 23, for matching the feature vectors contained in the data under test against the original training data vectors and determining whether anomaly features are present.

In an embodiment of the invention, the data acquisition and preprocessing module 21 comprises a data acquisition module 211 and a data preprocessing module 212;

The data acquisition module 211 further comprises:

a network data collector, for obtaining raw network data from the surrounding network environment and outputting the obtained raw network data;

a host data collector, for auditing privileged programs and outputting audit records;

The data preprocessing module 212 further comprises:

a network data preprocessor, for IP fragment reassembly of datagrams, stream reassembly, attribute field mapping, and feature construction, which generates network feature data and sends it to the feature vector library 224 for storage or to the detector for detection;

a host data preprocessor, for extracting program execution traces and generating short sequence vectors with a sliding window, which are sent to the feature vector library 224 for storage or to the detector for detection.

In an embodiment of the invention, the training and dimensionality reduction module 22 comprises:

a non-negative matrix factorization module 221, for iteratively factorizing the preprocessed data test matrix, and for changing the iteration rule of the factorization by adjusting the parameters of the multi-divergence joint objective function so as to obtain the best classification performance;

a sparse representation module 222, for sparsely representing the basis matrix and the weight matrix by means of a basis transform, under the basic principle that the coherence between the basis transform and the projection matrix 225 is minimal;

a compressed data acquisition module 223, for sampling the sparse data with the projection matrix 225, which satisfies the RIP condition, so that the access behavior of the data under test is represented with less feature data;

a feature vector library 224, for storing and distributing network data and host data.

In an embodiment of the invention, the feature vector library 224 comprises a misuse detection training sample library (DB_MTRAIN), an anomaly detection training sample library (DB_ATRAIN), and an anomaly detection result anomalous sample library (DB_ARESULT). The contents of DB_ARESULT are added to DB_MTRAIN, thereby updating the misuse detection training sample library (DB_MTRAIN).

Fig. 3 shows the data flow diagram of the intrusion detection system based on non-negative matrix factorization under sparse representation provided by an embodiment of the invention.

The application principle of the invention is further described below with reference to the drawings and specific embodiments.

To address the problems of the prior art, the invention proposes a new anomaly detection modeling method. It exploits the dimensionality reduction advantage of non-negative matrix factorization and adopts multiple divergences as the measure; the RIP condition from sparse representation is added to the family of joint divergence objective functions to constrain the NMF iteration and reduce the dimension of the data to be inspected, offering one solution to the difficulty intrusion detection systems have in processing massive high-dimensional data.

The invention uses non-negative matrix factorization theory to propose a new intrusion detection method. It focuses on processing the network signal using a joint divergence as the error measure of the matrix factorization while imposing a sparsity constraint, so as to obtain a low-rank approximation of the network detection data matrix, discover the intrinsic structural features of the data, and separate out intrusion information. At the same time, the dimension of the feature data is greatly reduced, saving storage and computing resources. The innovation of the invention is as follows. To avoid the limited ability of NMF under a single divergence measure to extract features from network signals, a multi-divergence measure is adopted. The constructed basis transform and projection matrix 225 satisfy the restricted isometry property, giving a sparse, reduced representation of the matrix factorization. The best NMF decomposition method is sought and applied to intrusion detection, and corresponding simulation experiments were carried out, improving the efficiency of the intrusion detection system.

The procedure for intrusion detection based on non-negative matrix factorization under sparse representation is as follows. The collected data is preprocessed, the frequency of each element in the data is counted to form a test vector matrix, and this matrix is factorized under the non-negativity constraint. The basis vectors and weight vectors obtained in the intermediate stages of the iterative factorization are sparsely represented by means of a basis transform matrix, and the weight vectors are then compressively sampled with the projection matrix 225. The factor matrices obtained in this way both represent the data features effectively and reduce the amount of data to be processed. If the sparsified weight coefficient vector computed by the iterative factorization is very close to a feature vector contained in the training data, the test data is taken to be normal network information. The iteration rule of the factorization depends on the multi-divergence error measure, and the termination condition of the iteration must conform to the uniform convergence criterion of the joint divergence adopted, so as to ensure that the iteration converges to the global optimum.

An intrusion detection system based on non-negative matrix factorization is required to analyze intrusion attempts in real time, raise an alarm before the system is harmed, react to attacks in real time, and take countermeasures in response, giving the system the greatest possible security guarantee. The designed system functions are shown in Fig. 4:

1) Data acquisition module 211

The data acquisition module 211 is divided into a network data collector and a host data collector. The network data collector obtains raw network data from the surrounding network environment and provides it to the other parts of the system; the host data collector audits privileged programs, outputs audit records, and provides them to the other parts of the system.

2) Data preprocessing module 212

The data preprocessing module 212 is divided into a network data preprocessor and a host data preprocessor. The network data preprocessor performs IP fragment reassembly of datagrams, stream reassembly, attribute field mapping, feature construction, and related functions; it generates network feature data and sends it to the feature vector library 224 for storage or to the detector for detection. The host data preprocessor extracts program execution traces and generates short sequence vectors with a sliding window, which are sent to the feature vector library 224 for storage or to the detector for detection.
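The sliding-window step of the host data preprocessor can be sketched as follows. This is an illustration under assumptions rather than the patent's code: the function name `short_sequences`, the window length of 3, the step of one call, and the sample system-call trace are all hypothetical.

```python
def short_sequences(trace, window):
    """Slide a fixed-size window over a trace, one element at a time,
    and return each window as a tuple (a short sequence vector)."""
    if window <= 0:
        raise ValueError("window must be positive")
    # A trace shorter than the window yields no sequences.
    return [tuple(trace[i:i + window]) for i in range(len(trace) - window + 1)]

# Hypothetical system-call trace of one privileged process.
trace = ["open", "read", "mmap", "read", "close"]
seqs = short_sequences(trace, 3)
```

With this trace and a window of 3, `seqs` holds three overlapping short sequences, starting with `("open", "read", "mmap")`.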

3) Non-negative matrix factorization module 221

The non-negative matrix factorization module 221 iteratively factorizes the preprocessed data test matrix. During the factorization, the iteration rule can be changed by adjusting the parameters of the multi-divergence joint objective function, so as to obtain the best classification performance.
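For orientation, the sketch below shows the basic shape of an iterative NMF. It deliberately swaps in the classic Lee and Seung multiplicative updates for the squared Euclidean error, not the multi-divergence joint objective described in the patent, whose exact form is not given in this text. The function names, the rank, the iteration cap, the tolerance, and the small constant `eps` are assumptions.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, used here as the stopping measure."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5

def nmf(v, rank, iters=200, tol=1e-4, seed=0, eps=1e-9):
    """Factorize non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
        # Stop once the reconstruction error falls below the threshold.
        if frobenius_error(v, w, h) < tol:
            break
    return w, h

# Small non-negative example matrix (rows: events, columns: samples).
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0]]
w, h = nmf(v, rank=2)
```

The multiplicative form keeps W and H non-negative without any explicit projection, which is why it is a common starting point; a divergence-based variant would change only the two update lines and the error function.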

4) Sparse representation module 222

A basis transform is used to obtain the sparse representation of the basis matrix and the weight matrix; choosing an appropriate basis transform favors a compressed representation of the data. The basic principle is that the coherence between the basis transform and the projection matrix 225 should be minimal, which is the precondition for compressive sampling of the data.
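The coherence between a projection matrix and a basis transform can be measured numerically. The sketch below uses the standard definition from the compressed sensing literature, the largest absolute inner product between any normalized row of the projection matrix and any normalized column of the basis, scaled by sqrt(n); the patent text does not state which definition it uses, so this choice and the function names are assumptions.

```python
import math

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def coherence(phi_rows, psi_cols):
    """sqrt(n) * max |<phi_i, psi_j>| over unit-norm rows phi_i and columns psi_j.
    The value lies between 1 (maximally incoherent) and sqrt(n)."""
    n = len(psi_cols[0])
    phi = [normalize(r) for r in phi_rows]
    psi = [normalize(c) for c in psi_cols]
    return math.sqrt(n) * max(
        abs(sum(a * b for a, b in zip(r, c))) for r in phi for c in psi
    )

# Identity (spike) basis and a 4 x 4 Hadamard matrix as example inputs.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
hadamard4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]

mu_low = coherence(hadamard4, identity)   # Hadamard rows vs spike basis
mu_high = coherence(identity, identity)   # a basis measured against itself
```

The Hadamard and spike pair reaches the minimum value of 1, while measuring a basis against itself gives the maximum sqrt(4) = 2, which illustrates why the projection matrix should be chosen to be unlike the sparsifying basis.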

5) Compressed data acquisition module 223

The sparse data is sampled with the projection matrix 225, which must satisfy the RIP condition, so that the access behavior of the data under test is represented with less feature data.
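The sampling step itself is a matrix-vector product. The following sketch assumes a random ±1/sqrt(m) (Bernoulli) projection matrix, which is a common textbook choice and not one of the matrix families named in the patent; the vector length, the number of measurements, and the sparse weight vector are made-up values. Comparing the energy of one sampled vector with the original is only a sanity check and does not establish the RIP condition.

```python
import math
import random

def bernoulli_matrix(m, n, rng):
    """m x n matrix with independent entries +/- 1/sqrt(m)."""
    scale = 1 / math.sqrt(m)
    return [[rng.choice((-scale, scale)) for _ in range(n)] for _ in range(m)]

def sample(phi, x):
    """Compressive sampling: y = Phi x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

rng = random.Random(1)   # fixed seed for reproducibility
n, m = 64, 16            # original length and number of measurements (assumed)

# Hypothetical sparse weight coefficient vector with three non-zero entries.
h = [0.0] * n
for idx, val in ((3, 0.7), (20, 0.2), (41, 0.1)):
    h[idx] = val

phi = bernoulli_matrix(m, n, rng)
y = sample(phi, h)

# Ratio of squared norms; values near 1 mean the energy was roughly preserved.
ratio = sum(v * v for v in y) / sum(v * v for v in h)
```

Here a 64-dimensional vector is represented by 16 measurements, which is the sense in which the access behavior is described "with less feature data".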

6) feature vector library 224

The feature vector library 224 is responsible for storing and publishing network data and host data for use in training the detector parameters. It is divided into a misuse detection training sample library (DB_MTRAIN), an anomaly detection training sample library (DB_ATRAIN), and an anomaly detection result abnormal sample library (DB_ARESULT). The system adds DB_ARESULT to DB_MTRAIN, thereby updating DB_MTRAIN, and retrains the NMF detector so that the detection rules are updated and the system adapts automatically to new network environments.
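
The update of DB_MTRAIN from DB_ARESULT might look like the following in-memory sketch. The class name, the list-based storage, and the decision to clear DB_ARESULT after promotion are assumptions; the text only states that DB_ARESULT is added to DB_MTRAIN before retraining.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


@dataclass
class FeatureVectorLibrary:
    db_mtrain: List[Vector] = field(default_factory=list)   # misuse detection training samples
    db_atrain: List[Vector] = field(default_factory=list)   # anomaly detection training samples
    db_aresult: List[Vector] = field(default_factory=list)  # anomalies found by the detector

    def promote_anomalies(self) -> int:
        """Add DB_ARESULT to DB_MTRAIN and return how many samples were added."""
        added = len(self.db_aresult)
        self.db_mtrain.extend(self.db_aresult)
        # Assumption: promoted samples are cleared so they are not added twice.
        self.db_aresult.clear()
        return added


library = FeatureVectorLibrary(db_mtrain=[[0.9, 0.1]])
library.db_aresult.append([0.2, 0.8])
added = library.promote_anomalies()
```

A non-zero return value would be the caller's cue to retrain the NMF detector on the enlarged DB_MTRAIN.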

7) intrusion behavior judgment and output

The highly characterized weight coefficient vector is matched against the feature vectors contained in the training data. If the abnormal features are met, the test data indicates intrusion behavior. Otherwise, NMF processing is carried out again until the difference between the product of the basis matrix and the weight matrix and the original test data is less than the set threshold. Abnormal results are reported as alarms and output, and response operations of the detection model such as logging, alarming, blocking, and discarding are performed according to the detection result.

A good intrusion detection system needs not only high detection accuracy but also good real-time performance. To improve the real-time performance of intrusion detection, the transition characteristics of events are not considered; instead, the frequency characteristics of events are used to model system behavior and detect intrusions.

The data acquisition module 211 is mainly responsible for collecting data and performing preliminary processing on it; its acquisition capability determines the kinds of data that the intrusion detection system can analyze. Intrusion detection methods based on privileged process behavior mainly collect system-call-related data while a process is running. The data sources used in existing detection methods include system call numbers, system call parameters and return values, the addresses that trigger system calls, and changes in the process stack.

The data preprocessing module 212 first groups the raw data. For example, system call data are grouped by process, with data from the same process placed in one group, while command sequence data are grouped by a certain length. During detection, each group of data (for example, each process or each group of command sequences) is treated as a whole as the object of detection. During preprocessing, the frequency with which each element (for example, a system call or a command) occurs in each group is counted first, and the resulting counts form a column vector. The intrusion detection problem is thus converted into the problem of judging whether these vectors are normal, which simplifies both the data and the problem.
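
The frequency-counting step can be sketched as below. The function names, the four-element alphabet, and the two sample groups are hypothetical; elements outside the alphabet are simply ignored in this sketch.

```python
from collections import Counter
from typing import List, Sequence


def frequency_vector(events: Sequence[str], alphabet: Sequence[str]) -> List[int]:
    """Count how often each alphabet element occurs in one group of events."""
    counts = Counter(events)
    return [counts[symbol] for symbol in alphabet]


def build_matrix(groups: Sequence[Sequence[str]], alphabet: Sequence[str]) -> List[List[int]]:
    """Stack the per-group frequency vectors as the columns of one matrix."""
    columns = [frequency_vector(group, alphabet) for group in groups]
    return [list(row) for row in zip(*columns)]


# Hypothetical alphabet of system calls and two process traces.
alphabet = ["close", "open", "read", "write"]
groups = [
    ["open", "read", "read", "close"],
    ["open", "write", "write", "write", "close"],
]
matrix = build_matrix(groups, alphabet)
```

Each column of `matrix` is the frequency vector of one group, so the matrix is non-negative by construction and can be passed directly to a non-negative factorization.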

After the test data are preprocessed, the frequency of each element in each group is counted to form the test vector matrix, and a non-negatively constrained matrix factorization is applied to this matrix. The base vectors and weight vectors obtained in the intermediate steps of the iterative decomposition are sparsely represented through a basis transform matrix, and the weight vectors are then compressively sampled through the projection matrix 225. The resulting factor matrices both represent the data features effectively and reduce the amount of data to be processed. If the sparsified weight coefficient vector computed by the iterative decomposition is very close to the feature vectors contained in the training data, the test data is normal network information. The iteration rule of the decomposition depends on the multi-divergence error measure, and the termination condition of the iteration must conform to the uniform convergence criterion of the joint divergence adopted, ensuring convergence to the global optimum.

Sparse representation is one of the key steps of the present invention. System calls usually produce a very large volume of data. Take, for example, the sendmail data and the MIT lpr data from the University of New Mexico (UNM) in the United States. The normal sendmail data contain 147 processes, and the abnormal data contain 36 abnormal processes, covering syslog attacks (local1, local2, remote1, remote2), the sunsendmailcp (sscp) attack, and the forwarding loop (fwd) anomaly. The MIT lpr data contain 2703 normal processes and 1001 intrusion processes in total. Even the basis matrix and weight matrix produced by NMF are very dense, so later feature matching on them is inefficient. The present invention therefore adopts a sparsification method: the basis matrix and weight matrix data are projected by the DCT (discrete cosine transform) to obtain sparse data, which reduces the data volume without losing the features of the data.
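
A minimal sketch of DCT-based sparsification of one matrix column follows. It assumes the orthonormal DCT-II and a simple rule that keeps only the largest-magnitude coefficients; the text does not specify the thresholding rule, and the function names and sample column are hypothetical.

```python
import math
from typing import List


def _scale(k: int, n: int) -> float:
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a vector."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]


def idct(c: List[float]) -> List[float]:
    """Inverse of the orthonormal DCT-II (its transpose)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


def keep_largest(coeffs: List[float], keep: int) -> List[float]:
    """Zero every coefficient except the `keep` largest in magnitude."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(order[:keep])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]


# Hypothetical dense column of a weight matrix.
column = [3.0, 3.1, 2.9, 3.0, 3.2, 3.0, 2.8, 3.0]
coeffs = dct(column)
sparse = keep_largest(coeffs, keep=2)
approx = idct(sparse)
```

Because the column varies slowly, most of its energy falls in a few DCT coefficients, so `sparse` has few non-zero entries while `approx` stays close to the original column.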

The data compression stage is the greatest difference between the present invention and other NMF methods used for intrusion detection, and it is also the key step for increasing detection speed and achieving real-time detection. The result of the sparse representation can be compressively sampled according to compressed sensing theory, which yields concise behavioral features of the data. The key to this technique is the compression projection matrix 225; existing theory shows that uniform spherical matrices and partial Hadamard matrices can do this work effectively.
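
As an illustration of one of the projection matrices named above, the sketch below builds a partial Hadamard matrix by randomly selecting rows of a Sylvester-type Hadamard matrix and uses it to take compressed measurements. The row-selection scheme, the 1/sqrt(m) scaling, and all names are assumptions, and the sketch does not itself verify the RIP condition.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def hadamard(n: int) -> List[List[int]]:
    """Sylvester construction; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def partial_hadamard(n: int, m: int, seed: int = 0) -> Matrix:
    """Pick m of the n Hadamard rows at random and scale columns to unit norm."""
    rows = sorted(random.Random(seed).sample(range(n), m))
    full = hadamard(n)
    return [[v / math.sqrt(m) for v in full[r]] for r in rows]


def measure(phi: Matrix, x: List[float]) -> List[float]:
    """Compressed measurements y = phi * x."""
    return [sum(a * b for a, b in zip(row, x)) for row in phi]


# Hypothetical sparse coefficient vector of length 8, sampled with 4 measurements.
x = [0.0, 0.0, 1.5, 0.0, 0.0, -0.7, 0.0, 0.0]
phi = partial_hadamard(n=8, m=4, seed=1)
y = measure(phi, x)
```

Here the eight-entry vector is represented by four measurements, which is the data reduction the compression stage relies on.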

Finally, the features in the feature vector library 224 are matched against the compressed features. A reasonable threshold is set: a process whose value is greater than the threshold is regarded as an intrusion process, and one whose value is lower than the threshold is regarded as a legitimate process. The results are then output for later analysis.
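
The thresholding decision can be sketched as follows. The text does not say which quantity is compared with the threshold; this example assumes it is the Euclidean distance from the compressed feature to the nearest stored normal feature. The function names, sample library, and threshold value are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def anomaly_score(feature: Vector, library: List[Vector]) -> float:
    """Distance to the nearest stored feature (assumed matching measure)."""
    if not library:
        raise ValueError("feature library is empty")
    return min(math.dist(feature, ref) for ref in library)


def classify(feature: Vector, library: List[Vector], threshold: float) -> str:
    """Scores above the threshold are treated as intrusions, others as legitimate."""
    return "intrusion" if anomaly_score(feature, library) > threshold else "legitimate"


# Hypothetical compressed features of known normal processes.
normal_features = [[0.0, 0.0], [1.0, 1.0]]
near = classify([0.1, 0.0], normal_features, threshold=0.5)
far = classify([5.0, 5.0], normal_features, threshold=0.5)
```

In practice the threshold would be tuned on labelled training data to balance false alarms against missed intrusions.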

The intrusion detection method and system based on non-negative matrix factorization under sparse representation provided by the embodiment of the present invention work as follows. Network data and host data are collected: raw network data are obtained and privileged programs are audited. The network data and host data are preprocessed to generate network feature data and short sequence vectors, which are stored and detected. The data test matrix obtained from preprocessing is iteratively decomposed by non-negative matrix factorization, and the resulting basis matrix and weight matrix are each sparsely represented. The sparsely represented weight matrix data are sampled with the projection matrix 225, which satisfies the RIP condition, to obtain a highly characterized weight coefficient vector. Using the data in the feature vector library 224, this vector is matched against the feature vectors contained in the training data to judge whether the abnormal features are met. The method and system exploit the dimensionality reduction offered by non-negative matrix factorization, adopt multiple divergences as the measure, and add the RIP condition from sparse representation to the family of joint divergence objective functions so as to constrain the iterative non-negative matrix decomposition. This reduces the dimension of the data to be detected, offers a solution to the problem of processing high-dimensional, massive data in intrusion detection systems, and has strong value for popularization and application.

The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An intrusion detection method based on non-negative matrix factorization under sparse representation, characterized in that the intrusion detection method comprises the following steps:

Step 4: sampling the sparsely represented weight matrix data with a projection matrix that satisfies the restricted isometry property (RIP) condition, to obtain a highly characterized weight coefficient vector;

Step 5: using the data in the feature vector library, matching the highly characterized weight coefficient vector against the feature vectors contained in the training data, and judging whether the abnormal features are met;

In step 3, the basis matrix obtained by the iterative non-negative matrix decomposition is, after sparse representation, fed back into the preceding stage of the iterative non-negative matrix decomposition;

In step 5, using the data in the feature vector library, the highly characterized weight coefficient vector is matched against the feature vectors contained in the training data; when the abnormal features are met, the test data indicates intrusion behavior, and the abnormal result is output as an alarm; when the abnormal features are not met, the iterative non-negative matrix decomposition is carried out again until the difference between the product of the basis matrix and the weight matrix and the original test data is less than the set threshold;

The intrusion detection method further performs logging, alarming, blocking, and discarding response operations according to the detection result;

In step 4, the projection matrix satisfying the RIP condition may be a uniform spherical matrix, a partial Hadamard matrix, or a Toeplitz matrix;

The intrusion detection method uses the joint divergence as the error measure for the matrix factorization and imposes a sparsity constraint, processes the network information data to obtain a low-rank approximation of the network detection data matrix, discovers the intrinsic structural features of the data, and separates out the intrusion information.

2. An intrusion detection system based on non-negative matrix factorization under sparse representation, characterized in that the intrusion detection system comprises:

A training and dimensionality reduction module, configured to perform iterative non-negative matrix decomposition on the data test matrix obtained by the data acquisition and preprocessing module to obtain a feature basis matrix and a weight coding matrix, so that system call sequences are converted from high-dimensional vectors into low-dimensional vectors and the low-dimensional subspace inherits the features of the high-dimensional space;

An anomaly detection module, configured to match the original training data vectors against the feature vectors contained in the data under test and to judge whether the abnormal features are met;

The data acquisition and preprocessing module further comprises a data acquisition module and a data preprocessing module;

The data acquisition module further comprises:

The data preprocessing module further comprises:

A host data preprocessor, configured to extract program execution traces, generate short sequence vectors with a sliding window, and send them to the feature vector library for storage or to the detector for detection;

The training and dimensionality reduction module further comprises:

A data compression acquisition module, configured to sample the sparse data through a projection matrix that satisfies the restricted isometry property (RIP) condition and to represent the access behavior of the data under test with fewer feature data;

A feature vector library, configured to store and publish network data and host data;

The feature vector library further comprises a misuse detection training sample library (DB_MTRAIN), an anomaly detection training sample library (DB_ATRAIN), and an anomaly detection result abnormal sample library (DB_ARESULT); the anomaly detection result abnormal sample library (DB_ARESULT) is added to the misuse detection training sample library (DB_MTRAIN), thereby updating the misuse detection training sample library (DB_MTRAIN).