CN105930723A - Intrusion detection method based on feature selection - Google Patents

Intrusion detection method based on feature selection

Info

Publication number
CN105930723A
CN105930723A
Authority
CN
China
Prior art keywords
feature
attribute
data
intrusion detection
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610246178.3A
Other languages
Chinese (zh)
Inventor
陈星
戴远飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201610246178.3A priority Critical patent/CN105930723A/en
Publication of CN105930723A publication Critical patent/CN105930723A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The invention relates to an intrusion detection method based on feature selection, comprising the following steps: first, discretizing the original data; then performing feature selection on the discretized data; normalizing the selected data; and finally importing the normalized data into a classifier for training. The method shortens model training time and improves model accuracy.

Description

Intrusion detection method based on feature selection
Technical field
The present invention relates to the field of data mining, and in particular to an intrusion detection method based on feature selection.
Background technology
With the development of the Internet era, both the number of connections and the volume of data traffic keep growing, and with them the threat that malicious intrusions pose to computers and other devices. Building a network intrusion detection system is therefore increasingly important. Intrusion detection is a technique that discovers intrusions by collecting and analysing information about the protected system. Because intrusion detection must process data in real time and accurately predict whether a record is a threat, making timely and accurate predictions is a hard problem. Conventional intrusion detection systems rely on pattern matching: a rule is written by hand for each intrusion pattern and applied with if-else logic. This approach is labour-intensive and not very accurate, and, more importantly, the system cannot defend against a new type of attack when it first appears.
In recent years, adding machine-learning methods to intrusion detection systems has become a trend. Neural networks, support vector machines, naive Bayes, decision trees and other machine-learning methods have all been applied to intrusion detection. The collected features are first pre-processed, and a classifier is then trained on the processed data. When a real-time record passes through the system, the classifier predicts whether it is an intrusion: if it is judged a threat, the system blocks it automatically; otherwise the record is allowed through. For an intrusion detection system, the detection accuracy and the speed at which intrusions are found are critical factors, yet mainstream machine-learning methods only reach an accuracy of about 95%, train slowly, and cannot cope with redundant data.
Summary of the invention
In view of this, the purpose of the present invention is to propose an intrusion detection method based on feature selection that improves both the training time and the accuracy of the model.
The present invention is realised as follows: an intrusion detection method based on feature selection, in which the original data is first discretised, feature selection is performed on the discretised data, the selected data is normalised, and the normalised data is finally imported into a classifier for training.
Further, the discretisation uses the entropy minimisation discretisation method (EMD): the continuous values of the attribute to be divided are first arranged in order, and the midpoint of every pair of adjacent values is taken as a candidate breakpoint; each breakpoint in the candidate set is then evaluated in turn by splitting the data into two parts and computing the information entropy of the resulting partition, and the breakpoint that minimises the entropy is added to the breakpoint set. A minimum-description-length criterion decides when the discretisation stops.
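As an illustration only, the breakpoint search described above can be sketched as follows; `best_entropy_split` and its helper are hypothetical names, and both the MDL stopping rule and the recursive application to each partition are omitted.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_entropy_split(values, labels):
    """One step of entropy-minimisation discretisation: evaluate the
    midpoint of every pair of adjacent attribute values and return the
    cut that minimises the weighted class entropy of the two parts."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    best_cut, best_e = None, float("inf")
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                          # identical values: no midpoint
        cut = (xs[i] + xs[i - 1]) / 2         # midpoint candidate breakpoint
        left, right = ys[:i], ys[i:]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if e < best_e:
            best_cut, best_e = cut, e
    return best_cut, best_e
```

On a toy attribute whose low values are all one class and high values all the other, the selected breakpoint separates the two classes with zero residual entropy.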
Further, the discretisation may instead use proportional k-interval discretisation (PKID), which adjusts the size and number of the discrete intervals to find a balance between interval granularity and the desired accuracy, and uses this balance as the criterion for trading discretisation bias against variance.
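A minimal sketch of the proportional sizing rule, under the usual reading of PKID (the function name `pkid_cut_points` is illustrative): with n training instances, form roughly √n equal-frequency intervals of roughly √n instances each, so interval size and interval count grow together with the data.

```python
from math import isqrt

def pkid_cut_points(values):
    """Proportional k-interval discretisation sketch: divide the n sorted
    values into about sqrt(n) equal-frequency intervals, returning the
    values at which a new interval starts."""
    xs = sorted(values)
    n = len(xs)
    k = max(1, isqrt(n))          # interval count ~ sqrt(n)
    size = n // k                 # instances per interval ~ sqrt(n)
    return [xs[i * size] for i in range(1, k)]
```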
Further, the feature selection applied to the discretised data may use correlation-based feature selection (CFS), which discards features that have little influence on the class by means of an evaluation function:

M_S = k·r_cf / √(k + k(k−1)·r_ff);

where M_S is the heuristic merit of a subset S containing k features, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature correlation;
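The merit function can be computed directly from the two mean correlations; this is a sketch of the formula alone, not of the subset search that CFS wraps around it (`cfs_merit` is an illustrative name).

```python
from math import sqrt

def cfs_merit(k, r_cf, r_ff):
    """Heuristic merit M_S of a subset S of k features: mean feature-class
    correlation in the numerator, feature-feature redundancy in the
    denominator."""
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

A single feature's merit is just its class correlation, and adding redundant features (large r_ff) lowers the merit even when r_cf stays the same.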
Further, the feature selection applied to the discretised data may use the consistency-based filter method (CONS), which selects features by comparing the consistency level of the training samples projected onto a feature subset. In each round a subset S is generated at random from the features; if S contains fewer features than the best feature subset found so far, its inconsistency rate is computed, and if that rate is below a preset value, S becomes the new best feature subset.
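The inconsistency measure used by the filter can be sketched as follows (`inconsistency_rate` is an illustrative name; the random subset search is omitted): instances that agree on the selected features but disagree on the class are counted beyond the majority class of each group.

```python
from collections import Counter, defaultdict

def inconsistency_rate(rows, labels, subset):
    """Fraction of instances that are inconsistent when the data is
    projected onto the feature indices in `subset`."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(row[i] for i in subset)].append(y)
    # per group, everything beyond the majority class is inconsistent
    inconsistent = sum(len(g) - max(Counter(g).values())
                       for g in groups.values())
    return inconsistent / len(labels)
```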
Further, the feature selection applied to the discretised data may use the INTERACT method, an algorithm based on symmetric uncertainty (SU). The features are first sorted in descending order of SU; then, starting from the end of this ranking, each feature is evaluated in turn: if its consistency contribution is below a threshold the feature is removed, otherwise it is kept. The symmetric uncertainty SU is a measure of the ratio between the information gain IG and the entropies H of two features x and y:

SU(x, y) = 2·IG(x|y) / (H(x) + H(y));

IG(x|y) = H(x) + H(y) − H(x, y).
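The two formulas combine into a single score in [0, 1]; a sketch under the assumption that the features are sequences of discrete values (function names are illustrative):

```python
from collections import Counter
from math import log2

def H(seq):
    """Shannon entropy of a discrete-valued sequence."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())

def symmetric_uncertainty(x, y):
    """SU(x, y) = 2*IG(x|y) / (H(x) + H(y)), with the information gain
    IG(x|y) = H(x) + H(y) - H(x, y)."""
    hx, hy = H(x), H(y)
    ig = hx + hy - H(list(zip(x, y)))   # joint entropy via paired values
    return 2 * ig / (hx + hy) if hx + hy else 0.0
```

SU is 1 when one feature determines the other and 0 when they are independent, which is what makes it usable as a ranking key in INTERACT.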
Further, the normalisation uses deviation (min-max) normalisation (NOR):

y = (x − min) / (max − min);

where max is the maximum of the sample data and min is the minimum of the sample data.
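A sketch of the deviation normalisation applied to one column of values (`minmax_normalise` is an illustrative name; a real pipeline would reuse the training min and max on the test set):

```python
def minmax_normalise(xs):
    """Map sample values linearly into [0, 1] via y = (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    # a constant column carries no information; map it to 0.0
    return [(x - lo) / span if span else 0.0 for x in xs]
```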
Further, the classifier may be a naive Bayes classifier (Naive Bayes). Let D be a training set with a finite number of examples and A = {A1, A2, …, An} be n finite attributes; an example d ∈ D is described by a vector (a1, a2, …, an), where ai is the current value of attribute Ai, the class attribute is denoted C, and the function dom(Ai) yields the domain of attribute Ai. The predicted class of example d is the class with the maximum posterior probability given the attribute values, which maximises the probability of a correct prediction. Under the assumption that, given the class C, all attributes Ai are mutually independent, i.e. P(Ai | c, Aj) = P(Ai | c) for all Aj with P(c) > 0, the class posterior given the attribute values is computed as:

c* = argmax_{c ∈ C} P(c) · ∏_{i=1}^{n} P(Ai = ai | C = c).
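A counting-based sketch of the classifier defined above (no smoothing, hypothetical function names): training estimates P(c) and P(Ai = ai | C = c) from frequencies, and prediction takes the argmax of their product.

```python
from collections import Counter, defaultdict

def nb_train(rows, labels):
    """Estimate class priors and per-attribute conditionals by counting."""
    n = len(labels)
    class_count = Counter(labels)
    prior = {c: m / n for c, m in class_count.items()}
    cond = defaultdict(Counter)        # (attr index, class) -> value counts
    for row, c in zip(rows, labels):
        for i, a in enumerate(row):
            cond[(i, c)][a] += 1
    return prior, cond, class_count

def nb_predict(row, prior, cond, class_count):
    """argmax over classes of P(c) * prod_i P(A_i = a_i | C = c)."""
    best_c, best_p = None, -1.0
    for c, p in prior.items():
        for i, a in enumerate(row):
            p *= cond[(i, c)][a] / class_count[c]
        if p > best_p:
            best_c, best_p = c, p
    return best_c
```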
Further, the classifier may be a support vector machine classifier (SVM) with the classification function:

f(x) = sign( Σ_{i=1}^{l} y_i·a_i·K(x_i, x) + b );

where l is the number of training samples, x is the vector of the example to be classified, x_i and y_i are the attribute vector and class label of the i-th training sample, K(x_i, x) is the kernel function, and a_i and b are model parameters. The parameters a_i are obtained by solving the following quadratic programming problem:

max Q(a) = Σ_{i=1}^{l} a_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} a_i·a_j·y_i·y_j·K(x_i, x_j);

s.t. Σ_{i=1}^{l} a_i·y_i = 0, 0 ≤ a_i ≤ C, i = 1, …, l.

If the two-class model is

G(x) = ω·x + b,

setting its threshold to 0 gives:

ω·x_i + b > 0 for all x_i ∈ c1, and ω·x_i + b < 0 for all x_i ∈ c2;

and the plane with the maximum distance between the two classes of objects is selected.
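Solving the quadratic program is left to an optimiser, but evaluating the resulting classification function is direct; a sketch with a linear kernel and hypothetical trained parameters:

```python
def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return sum(a * b for a, b in zip(u, v))

def svm_decide(x, support, alphas, ys, b, kernel=linear_kernel):
    """f(x) = sign(sum_i y_i * a_i * K(x_i, x) + b), where the a_i are
    assumed to come from the box-constrained dual problem above."""
    s = sum(y * a * kernel(xi, x)
            for xi, a, y in zip(support, alphas, ys)) + b
    return 1 if s >= 0 else -1
```

With one support vector per class at +1 and -1 on a line and b = 0, the decision boundary sits at the origin.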
Further, the classifier may be a decision tree algorithm (Decision Tree), which is divided into two stages: tree construction and tree pruning.

The tree construction stage uses a top-down recursive approach: starting from the root node, a test attribute is selected at each node according to a given criterion, a branch is grown downward for every possible value of that attribute, and the training samples are partitioned accordingly, until all samples at a node fall into the same class or the number of samples at a node drops below a given value. Criteria for selecting the test attribute include information gain, information gain ratio, the Gini index and distance-based partitioning.

The pruning stage uses pre-pruning, post-pruning, or a combination of both. Pruning criteria include the minimum-description-length principle and the minimum expected error rate: the former encodes the decision tree in binary, the optimal pruned tree being the one that requires the fewest bits; the latter computes the expected error rate that would result from pruning the subtree at a given node.
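The information-gain criterion used in the construction stage can be sketched as follows (one attribute evaluation, not the full recursive builder; names are illustrative):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy of a multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(labels, column):
    """Gain of splitting on a discrete attribute column: class entropy
    before the split minus the weighted entropy of each branch."""
    branches = defaultdict(list)
    for v, y in zip(column, labels):
        branches[v].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in branches.values())
    return entropy(labels) - remainder
```

An attribute that perfectly separates the classes has gain equal to the class entropy; an attribute independent of the class has gain 0.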
Compared with the prior art, the present invention has the following beneficial effects. Conventional intrusion detection systems use pattern matching; to address the long classifier training time and limited accuracy of earlier approaches, the application introduces the idea of feature selection and proposes a network intrusion detection algorithm based on it. According to the experimental results, the feature-selection-based algorithm improves both the training time and the accuracy of the model, and in particular reaches a detection rate of more than 98%.
Accompanying drawing explanation
Fig. 1 is a schematic block diagram of an embodiment of the present invention.
Fig. 2 is a schematic diagram of the naive Bayes structural model of an embodiment of the present invention.
Detailed description of the invention
The present invention is further described below with reference to the accompanying drawings and an embodiment.
As shown in Fig. 1, this embodiment provides an intrusion detection method based on feature selection: the original data is first discretised, feature selection is performed on the discretised data, the selected data is normalised, and the normalised data is imported into a classifier for training.
In this embodiment, the discretisation uses the entropy minimisation discretisation method: the continuous values of the attribute to be divided are first arranged in order, and the midpoint of every pair of adjacent values is taken as a candidate breakpoint; each breakpoint in the candidate set is evaluated in turn by splitting the data into two parts and computing the information entropy of the resulting partition, and the breakpoint that minimises the entropy is added to the breakpoint set. A minimum-description-length criterion decides when the discretisation stops.
In this embodiment, the discretisation uses proportional k-interval discretisation: by adjusting the size and number of the discrete intervals a balance is found between interval granularity and the desired accuracy, and this balance serves as the criterion for trading discretisation bias against variance.
In this embodiment, the feature selection applied to the discretised data uses correlation-based feature selection, which discards features that have little influence on the class by means of an evaluation function:

M_S = k·r_cf / √(k + k(k−1)·r_ff);

where M_S is the heuristic merit of a subset S containing k features, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature correlation;
In this embodiment, the feature selection applied to the discretised data uses the consistency-based filter method, which selects features by comparing the consistency level of the training samples projected onto a feature subset. In each round a subset S is generated at random from the features; if S contains fewer features than the best feature subset found so far, its inconsistency rate is computed, and if that rate is below a preset value, S becomes the new best feature subset.
In this embodiment, the feature selection applied to the discretised data uses the INTERACT method, an algorithm based on symmetric uncertainty (SU). The features are first sorted in descending order of SU; then, starting from the end of this ranking, each feature is evaluated in turn: if its consistency contribution is below a threshold the feature is removed, otherwise it is kept. The symmetric uncertainty SU is a measure of the ratio between the information gain IG and the entropies H of two features x and y:

SU(x, y) = 2·IG(x|y) / (H(x) + H(y));

IG(x|y) = H(x) + H(y) − H(x, y).
In this embodiment, the normalisation uses deviation normalisation:

y = (x − min) / (max − min);

where max is the maximum of the sample data and min is the minimum of the sample data.
In this embodiment, as shown in Fig. 2, the classifier is a naive Bayes classifier. Let D be a training set with a finite number of examples and A = {A1, A2, …, An} be n finite attributes; an example d ∈ D is described by a vector (a1, a2, …, an), where ai is the current value of attribute Ai, the class attribute is denoted C, and the function dom(Ai) yields the domain of attribute Ai. The predicted class of example d is the class with the maximum posterior probability given the attribute values, which maximises the probability of a correct prediction. Under the assumption that, given the class C, all attributes Ai are mutually independent, i.e. P(Ai | c, Aj) = P(Ai | c) for all Aj with P(c) > 0, the class posterior given the attribute values is computed as:

c* = argmax_{c ∈ C} P(c) · ∏_{i=1}^{n} P(Ai = ai | C = c).
In this embodiment, the classifier is a support vector machine classifier (SVM) with the classification function:

f(x) = sign( Σ_{i=1}^{l} y_i·a_i·K(x_i, x) + b );

where l is the number of training samples, x is the vector of the example to be classified, x_i and y_i are the attribute vector and class label of the i-th training sample, K(x_i, x) is the kernel function, and a_i and b are model parameters. The parameters a_i are obtained by solving the following quadratic programming problem:

max Q(a) = Σ_{i=1}^{l} a_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} a_i·a_j·y_i·y_j·K(x_i, x_j);

s.t. Σ_{i=1}^{l} a_i·y_i = 0, 0 ≤ a_i ≤ C, i = 1, …, l.

If the two-class model is

G(x) = ω·x + b,

setting its threshold to 0 gives:

ω·x_i + b > 0 for all x_i ∈ c1, and ω·x_i + b < 0 for all x_i ∈ c2;

and the plane with the maximum distance between the two classes of objects is selected.
In this embodiment, the classifier is a decision tree algorithm, which is divided into two stages: tree construction and tree pruning.

The tree construction stage uses a top-down recursive approach: starting from the root node, a test attribute is selected at each node according to a given criterion, a branch is grown downward for every possible value of that attribute, and the training samples are partitioned accordingly, until all samples at a node fall into the same class or the number of samples at a node drops below a given value. Criteria for selecting the test attribute include information gain, information gain ratio, the Gini index and distance-based partitioning.

The pruning stage uses pre-pruning, post-pruning, or a combination of both. Pruning criteria include the minimum-description-length principle and the minimum expected error rate: the former encodes the decision tree in binary, the optimal pruned tree being the one that requires the fewest bits; the latter computes the expected error rate that would result from pruning the subtree at a given node.
In particular, this embodiment uses the KDD Cup 99 database, the only dataset that provides both labelled samples and test data. The dataset contains 41 attributes and one label: attributes 1-9 are basic features of a network connection, attributes 10-22 are content features, and attributes 23-41 are traffic features. The training data covers about 5,000,000 TCP connection records collected over seven weeks, each of roughly 100 bytes. The experiments use the 10% subset of KDD Cup 99, i.e. 494,201 records.
To demonstrate the advantage of the proposed algorithm over traditional algorithms, the three feature selection algorithms described above are combined with the two discretisation algorithms, giving the following table:

Table 1 Feature selection attributes

Each combination in the table produces a different feature subset, and combining these with different classifiers yields different results. The combinations are therefore tested below with each classifier.
This embodiment uses the 10% KDD 99 dataset: the first 300,000 records serve as the training set and the remaining 200,000 records as the test set. Records labelled normal. are marked +1 and all others −1; the processed data is brought into the models for training, giving support vector machine (SVM), decision tree (TREE) and naive Bayes (BAYES) models, whose accuracy is then tested on the remaining 200,000 records.
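The relabelling step described above is trivial but worth pinning down; a sketch (assuming the KDD Cup 99 convention that class strings carry a trailing dot, e.g. "normal."):

```python
def binarise_labels(labels):
    """Two-class setting used in the experiment: 'normal.' -> +1,
    every attack label -> -1."""
    return [1 if lab == "normal." else -1 for lab in labels]
```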
The performance of the three classifiers is first discussed for the two-class case:

Table 2 Classification accuracy after feature selection

As Tables 1 and 2 show, after feature selection every combination except CONS+PKID+NOR markedly improves classifier performance. Among the SVM classifiers, INTERACT+EMD+NOR_SVM achieves the highest accuracy, 98.35%; among the decision trees, INTERACT+EMD+NOR_TREE achieves the highest accuracy, 99.90%; and among the naive Bayes classifiers, INTERACT+PKID+NOR_BAYES achieves the highest accuracy, 98.32%. Of the three classifiers, the decision tree classifies best: its error rate stays below 1%, while that of the support vector machine stays below 2%. Because the naive Bayes model is built on conditional independence and probability statistics, it misclassifies when attributes are correlated or unevenly distributed, so its overall accuracy falls short of the other two.
The KDD Cup 99 dataset contains four major error types: Probe, DoS, U2R and R2L. Together with the normal class there are five classes in total, as shown in the table below:

Table 3 Proportion of each label in 10% KDD Cup 99

The present invention builds multi-class models for these five classes and tests them; the detailed data are shown in Tables 4, 5 and 6:
Table 4 Multi-class support vector machine (SVM) classification after feature selection
Table 5 Multi-class decision tree classification after feature selection
Table 6 Multi-class naive Bayes (Bayes) classification after feature selection
Finally, after multi-class feature selection, the average classification accuracy of the support vector machine and decision tree algorithms exceeds 95%; the comparatively low accuracy on particular classes is due to insufficient training data, which is not enough to generate an accurate classifier. For SVM, CFS+EMD+NOR_SVM achieves the best accuracy, 98.34%; for the decision tree, CONS+EMD+NOR_TREE achieves the best accuracy, 99.29%; and for naive Bayes, CONS+PKID+NOR_BAYES achieves the best accuracy, 91.83%. By comparison, the Bayes accuracy remains the lowest: the Bayes classifier is built on the probability distribution of the data, and the samples in the training set may not reflect the overall distribution well, so the model is biased and the accuracy drops.
The above are only preferred embodiments of the present invention; all equal changes and modifications made according to the scope of the present patent shall fall within the scope of the present invention.

Claims (10)

1. An intrusion detection method based on feature selection, characterised in that: original data is first discretised, feature selection is performed on the discretised data, the selected data is normalised, and the normalised data is imported into a classifier for training.
2. The intrusion detection method based on feature selection according to claim 1, characterised in that: the discretisation uses the entropy minimisation discretisation method; the continuous values of the attribute to be divided are first arranged in order, the midpoint of every pair of adjacent values is taken as a candidate breakpoint, each breakpoint in the candidate set is evaluated in turn by splitting the data into two parts and computing the information entropy of the resulting partition, and the breakpoint that minimises the entropy is added to the breakpoint set; a minimum-description-length criterion decides when the discretisation stops.
3. The intrusion detection method based on feature selection according to claim 1, characterised in that: the discretisation uses proportional k-interval discretisation, which adjusts the size and number of the discrete intervals to find a balance between interval granularity and the desired accuracy and uses this balance as the criterion for trading discretisation bias against variance.
4. The intrusion detection method based on feature selection according to claim 1, characterised in that: the feature selection applied to the discretised data uses correlation-based feature selection, which discards features with little influence on the class by means of an evaluation function:

M_S = k·r_cf / √(k + k(k−1)·r_ff);

where M_S is the heuristic merit of a subset S containing k features, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature correlation.
5. The intrusion detection method based on feature selection according to claim 1, characterised in that: the feature selection applied to the discretised data uses the consistency-based filter method, which selects features by comparing the consistency level of the training samples projected onto a feature subset; in each round a subset S is generated at random from the features, and if S contains fewer features than the best feature subset found so far, its inconsistency rate is computed; if that rate is below a preset value, S becomes the new best feature subset.
6. The intrusion detection method based on feature selection according to claim 1, characterised in that: the feature selection applied to the discretised data uses the INTERACT method, an algorithm based on symmetric uncertainty SU; the features are first sorted in descending order of SU, then, starting from the end of this ranking, each feature is evaluated in turn, and if its consistency contribution is below a threshold the feature is removed, otherwise it is kept; the symmetric uncertainty SU is a measure of the ratio between the information gain IG and the entropies H of two features x and y:

SU(x, y) = 2·IG(x|y) / (H(x) + H(y)); IG(x|y) = H(x) + H(y) − H(x, y).
7. The intrusion detection method based on feature selection according to claim 1, characterised in that: the normalisation uses deviation normalisation:

y = (x − min) / (max − min);

where max is the maximum of the sample data and min is the minimum of the sample data.
8. The intrusion detection method based on feature selection according to claim 1, characterised in that: the classifier uses a naive Bayes classifier; let D be a training set with a finite number of examples and A = {A1, A2, …, An} be n finite attributes, where an example d ∈ D is described by a vector (a1, a2, …, an), ai is the current value of attribute Ai, the class attribute is denoted C, and the function dom(Ai) yields the domain of attribute Ai; the predicted class of example d is the class with the maximum posterior probability given the attribute values, which maximises the probability of a correct prediction; under the assumption that, given the class C, all attributes Ai are mutually independent, i.e. P(Ai | c, Aj) = P(Ai | c) for all Aj with P(c) > 0, the class posterior given the attribute values is computed as:

c* = argmax_{c ∈ C} P(c) · ∏_{i=1}^{n} P(Ai = ai | C = c).
9. The intrusion detection method based on feature selection according to claim 1, characterised in that: the classifier uses a support vector machine classifier SVM with the classification function

f(x) = sign( Σ_{i=1}^{l} y_i·a_i·K(x_i, x) + b );

where l is the number of training samples, x is the vector of the example to be classified, x_i and y_i are the attribute vector and class label of the i-th training sample, K(x_i, x) is the kernel function, and a_i and b are model parameters; the parameters a_i are obtained by solving the quadratic programming problem

max Q(a) = Σ_{i=1}^{l} a_i − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} a_i·a_j·y_i·y_j·K(x_i, x_j),

s.t. Σ_{i=1}^{l} a_i·y_i = 0, 0 ≤ a_i ≤ C, i = 1, …, l;

if the two-class model is

G(x) = ω·x + b,

setting its threshold to 0 gives

ω·x_i + b > 0 for all x_i ∈ c1 and ω·x_i + b < 0 for all x_i ∈ c2;

and the plane with the maximum distance between the two classes of objects is selected.
10. The intrusion detection method based on feature selection according to claim 1, characterised in that: the classifier uses a decision tree algorithm, which is divided into two stages, tree construction and tree pruning;
the tree construction stage uses a top-down recursive approach: starting from the root node, a test attribute is selected at each node according to a given criterion, a branch is grown downward for every possible value of that attribute, and the training samples are partitioned accordingly, until all samples at a node fall into the same class or the number of samples at a node drops below a given value; the criteria for selecting the test attribute include information gain, information gain ratio, the Gini index and distance-based partitioning;
the pruning stage uses pre-pruning, post-pruning, or a combination of both; the pruning criteria include the minimum-description-length principle and the minimum expected error rate, the former encoding the decision tree in binary, the optimal pruned tree being the one that requires the fewest bits, and the latter computing the expected error rate that would result from pruning the subtree at a given node.
CN201610246178.3A 2016-04-20 2016-04-20 Intrusion detection method based on feature selection Pending CN105930723A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610246178.3A CN105930723A (en) 2016-04-20 2016-04-20 Intrusion detection method based on feature selection

Publications (1)

Publication Number Publication Date
CN105930723A true CN105930723A (en) 2016-09-07

Family

ID=56838578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610246178.3A Pending CN105930723A (en) 2016-04-20 2016-04-20 Intrusion detection method based on feature selection

Country Status (1)

Country Link
CN (1) CN105930723A (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106789912A (en) * 2016-11-22 2017-05-31 清华大学 Router data plane anomaly detection method based on classification regression tree
CN106874779A (en) * 2017-03-10 2017-06-20 广东工业大学 A kind of data mining method for secret protection and system
CN106874766A (en) * 2017-04-09 2017-06-20 上海云剑信息技术有限公司 The whitepack detection method that one point data is attacked in power system
CN106897413A (en) * 2017-02-20 2017-06-27 重庆邮电大学 A kind of hybrid characteristic selecting method based on harmony search
CN106897273A (en) * 2017-04-12 2017-06-27 福州大学 A kind of network security dynamic early-warning method of knowledge based collection of illustrative plates
CN106991536A (en) * 2017-04-09 2017-07-28 上海云剑信息技术有限公司 The black box detection method that one point data is attacked in power system
CN106992965A (en) * 2017-02-27 2017-07-28 南京邮电大学 A kind of Trojan detecting method based on network behavior
CN108304853A (en) * 2017-10-10 2018-07-20 腾讯科技(深圳)有限公司 Acquisition methods, device, storage medium and the electronic device for the degree of correlation of playing
CN108388563A (en) * 2017-02-03 2018-08-10 北京京东尚科信息技术有限公司 Information output method and device
CN109388944A (en) * 2018-11-06 2019-02-26 吉林大学 A kind of intrusion detection method based on KPCA and ELM
CN109450860A (en) * 2018-10-16 2019-03-08 南京航空航天大学 A kind of detection method threatened based on entropy and the advanced duration of support vector machines
CN109492667A (en) * 2018-10-08 2019-03-19 国网天津市电力公司电力科学研究院 A kind of feature selecting discrimination method for non-intrusive electrical load monitoring
WO2019080484A1 (en) * 2017-10-26 2019-05-02 北京深鉴智能科技有限公司 Method of pruning convolutional neural network based on feature map variation
CN110070141A (en) * 2019-04-28 2019-07-30 上海海事大学 A kind of network inbreak detection method
CN110135469A (en) * 2019-04-24 2019-08-16 北京航空航天大学 It is a kind of to improve the characteristic filter method and device selected based on correlative character
CN110138784A (en) * 2019-05-15 2019-08-16 重庆大学 A kind of Network Intrusion Detection System based on feature selecting
CN110138786A (en) * 2019-05-20 2019-08-16 福州大学 Web method for detecting abnormality and system based on SMOTETomek and LightGBM
CN110191081A (en) * 2018-02-22 2019-08-30 上海交通大学 The Feature Selection system and method for network flow attack detecting based on learning automaton
CN110719278A (en) * 2019-10-08 2020-01-21 苏州浪潮智能科技有限公司 Method, device, equipment and medium for detecting network intrusion data
CN111343175A (en) * 2020-02-22 2020-06-26 苏州浪潮智能科技有限公司 Method, system, equipment and medium for improving network intrusion detection precision
CN111901340A (en) * 2020-07-28 2020-11-06 四川大学 Intrusion detection system and method for energy Internet
CN113590872A (en) * 2021-07-28 2021-11-02 广州艾美网络科技有限公司 Method, device and equipment for generating dance spectral plane
CN113726810A (en) * 2021-09-07 2021-11-30 广东电网有限责任公司广州供电局 Intrusion detection system
CN116846688A (en) * 2023-08-30 2023-10-03 南京理工大学 Interpretable flow intrusion detection method based on CNN

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100738550B1 (en) * 2006-01-16 2007-07-11 삼성전자주식회사 Network intrusion detection system using genetic algorithm and method thereof
CN102164140A (en) * 2011-04-22 2011-08-24 西安电子科技大学 Method for intrusion detection based on negative selection and information gain
CN103699698A (en) * 2014-01-16 2014-04-02 北京泰乐德信息技术有限公司 Method and system for track traffic failure recognition based on improved Bayesian algorithm
US20150135318A1 (en) * 2013-11-12 2015-05-14 Macau University Of Science And Technology Method of detecting intrusion based on improved support vector machine

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
MANORANJAN DASH ET AL.: "Consistency-based search in feature selection", Artificial Intelligence *
MARK A. HALL: "Correlation-based feature selection for machine learning", Hamilton: The University of Waikato *
ZHENG ZHAO ET AL.: "Searching for interacting features", Proc. of the International Joint Conference on Artificial Intelligence (IJCAI) *
HE JUN: "Research and Simulation of an Intrusion Detection System Based on Support Vector Machines", China Master's Theses Full-text Database, Information Science and Technology *
ZHANG XIAOHUI ET AL.: "Research on IoT Security Based on a Balanced Binary Decision Tree SVM Algorithm", Technology Research *
ZHANG YONGJUN: "Research on an SVM-based Incremental Intrusion Detection Method", China Master's Theses Full-text Database, Information Science and Technology *
WANG GUOCAI: "Research and Application of the Naive Bayes Classifier", China Master's Theses Full-text Database, Information Science and Technology *
WANG WEIGUANG: "Research on Malicious Web Page Detection Based on Classification Algorithms", China Master's Theses Full-text Database, Information Science and Technology *

Similar Documents

Publication Publication Date Title
CN105930723A (en) Intrusion detection method based on feature selection
CN111181939B (en) Network intrusion detection method and device based on ensemble learning
Xu et al. A genetic programming model for real-time crash prediction on freeways
CN109871954B (en) Training sample generation method, abnormality detection method and apparatus
CN112258093A (en) Risk level data processing method and device, storage medium and electronic equipment
CN105373606A (en) Unbalanced data sampling method in improved C4.5 decision tree algorithm
CN110929939B (en) Landslide hazard susceptibility spatial prediction method based on clustering-information coupling model
CN110225055A (en) A kind of network flow abnormal detecting method and system based on KNN semi-supervised learning model
CN113159482A (en) Method and system for evaluating information security risk
CN104834940A (en) Medical image inspection disease classification method based on support vector machine (SVM)
CN108491511A (en) Data digging method and device, model training method based on diagram data and device
CN105574544A (en) Data processing method and device
Wang et al. A conscience on-line learning approach for kernel-based clustering
CN102045358A (en) Intrusion detection method based on integral correlation analysis and hierarchical clustering
CN113221960B (en) Construction method and collection method of high-quality vulnerability data collection model
CN114707571B (en) Credit data anomaly detection method based on enhanced isolation forest
CN105760649A (en) Big-data-oriented creditability measuring method
Chen et al. Pattern recognition using clustering algorithm for scenario definition in traffic simulation-based decision support systems
CN113378990A (en) Traffic data anomaly detection method based on deep learning
CN109086808A (en) Traffic high-risk personnel recognition methods based on random forests algorithm
CN114581694A (en) Network security situation assessment method based on improved support vector machine
CN106203520B (en) SAR image classification method based on depth Method Using Relevance Vector Machine
CN112888008B (en) Base station abnormality detection method, device, equipment and storage medium
CN117077018B (en) Data processing method, device and storage medium based on machine learning
CN112232206B (en) Face recognition method and face recognition platform based on big data and artificial intelligence

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160907