CN114897124A - Intrusion detection feature selection method based on improved wolf optimization algorithm - Google Patents
- Publication number
- CN114897124A (application number CN202210321742.9A)
- Authority
- CN
- China
- Prior art keywords
- wolf
- population
- wolfs
- feature
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Security & Cryptography (AREA)
- Evolutionary Biology (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides an intrusion detection feature selection method based on an improved grey wolf optimization algorithm. The wolf pack is perturbed with Cauchy mutation and Levy flight so that it can escape local optima, which improves global search capability, yields an optimal feature subset, and raises intrusion detection accuracy. Logistic chaotic mapping is used when the grey wolf population is initialized, which improves the quality of the initial population and the exploitation capability of the algorithm; a perturbation is applied when the grey wolf population is updated, which mitigates the tendency to fall into local optima and improves global search capability; finally, an optimal feature subset with a small number of features is obtained by training and applied to the test set to obtain a higher detection accuracy.
Description
Technical Field
The invention relates to the grey wolf optimization algorithm, kNN and related classification and recognition techniques, and provides a method for finding an optimal feature subset in an intrusion detection system.
Background
As network security risks grow, more and more protective measures have become available. Intrusion detection technology can effectively detect known and unknown attacks and provides identification and early-warning capabilities, which has made it one of the safest and most common protective measures recognized in the industry. Although intrusion detection systems are classified in various ways under different standards, many of them suffer from problems such as low detection efficiency. As the volume of network traffic data grows exponentially, the frequency and complexity of intrusions also keep increasing, so it is difficult to accurately detect attacks or anomalies in high-dimensional network traffic, and the time cost likewise rises exponentially with the data volume. Analysis of network traffic data shows that it has a large number of features, some of which are invalid or redundant. Removing the redundant features reduces the dimensionality of the high-dimensional traffic, which can effectively reduce the time cost and increase accuracy.
Feature selection is the best method for reducing the dimensionality of high-dimensional traffic data: it selects the most representative and most valuable feature subset from all features. Feature selection algorithms fall into three categories: filter, wrapper and embedded. The filter type first selects the features and then trains the classifier, and no learning algorithm is involved in selecting the feature subset. The wrapper type applies an intelligent optimization algorithm while selecting feature subsets, searching the solution space formed by all features for an optimal feature subset; it is currently the most widely studied type of feature selection algorithm. The embedded type combines the two and makes feature selection part of training the classifier.
The grey wolf is an apex predator that lives in packs with a strict social hierarchy, so the strongest, highest-ranking leader wolf can be clearly identified within a pack. The grey wolf optimization algorithm simulates the group hunting behaviour of grey wolves, which encircle the prey (the optimal fitness); each wolf represents one feature subset solution. The algorithm converges strongly and is easy to implement because it involves few parameters, but it also tends to fall into local optima. Feature selection based on the grey wolf optimization algorithm is a wrapper-type method: when searching the solution space for the optimal feature subset, the algorithm takes several feature subsets as base points, searches in their neighbourhoods, and finally obtains the optimal feature subset. This reduces the dimensionality of high-dimensional traffic data, lowers cost, improves intrusion detection accuracy, and thereby improves the efficiency of network defence.
Disclosure of Invention
The invention aims to solve the low accuracy and low speed of intrusion detection caused by factors such as a large number of data features and redundant features. Traditional intrusion detection relies on existing techniques to inspect traffic data and uses the grey wolf optimization algorithm to select a feature subset from the complex data features, but the existing algorithm tends to fall into local optima, which affects the selection of the optimal feature subset and the final accuracy. The invention provides an intrusion detection feature selection technique based on a grey wolf optimization algorithm combined with Cauchy mutation and Levy flight: Cauchy mutation and Levy flight are used to perturb the wolf pack so that it escapes local optima, which improves global search capability, yields an optimal feature subset, and improves intrusion detection accuracy.
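As a rough illustration of the two perturbations named above, the following sketch draws a Cauchy step and a Levy step and applies one of them to a leader position. Several details are assumptions rather than the patent's formulas, which are not reproduced in this text: the Levy exponent `beta = 1.5`, the Mantegna expression for `sigma`, the additive update `x + scale * step`, the `scale` value, clipping to [0, 1], and the helper names `cauchy_step`, `levy_step` and `perturb_leader`.

```python
import math
import random


def cauchy_step(rng):
    """Draw one standard Cauchy variate via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def levy_step(rng, beta=1.5):
    """Draw one Levy-like step with Mantegna's algorithm.

    beta = 1.5 is an assumed exponent; the source text does not state it.
    """
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, 1.0) * sigma      # u = w * sigma, with w ~ N(0, 1)
    v = rng.gauss(0.0, 1.0) or 1e-12     # v ~ N(0, 1); guard against exactly 0
    return u / abs(v) ** (1 / beta)


def perturb_leader(position, fitness_fn, rng, scale=0.1):
    """Perturb one leader and keep whichever position has lower fitness.

    The additive form x + scale * step is an assumption for illustration.
    """
    # Branch on a random number in [0, 1): Cauchy below 0.5, Levy otherwise.
    step = cauchy_step if rng.random() < 0.5 else levy_step
    candidate = [min(1.0, max(0.0, x + scale * step(rng))) for x in position]
    if fitness_fn(candidate) < fitness_fn(position):
        return candidate
    return position
```

Because the candidate replaces the leader only when its fitness is lower, a heavy-tailed jump can help a leader leave a local optimum without ever making the stored leader worse.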
The invention is realized by the following specific steps:
Step 1: Take 80% of the KDD Cup99 data set as the training set and 20% as the test set. kNN is chosen as the intrusion detection classifier, with k = 5, where k is the number of nearest neighbours: each sample can be represented by its k nearest neighbours.
Step 2: Set the parameters of the improved grey wolf algorithm. The wolf pack size N is 30 and the maximum number of iterations T is 50. Initialize the population: the position of each wolf is a vector X whose dimension is 41, the number of features in the selected data set. Initialization produces the first-generation wolf pack.
Step 3: Convert every position in the grey wolf population to binary using 0.5 as the threshold: in each dimension, a value above 0.5 is set to 1, meaning the feature represented by that dimension is selected; otherwise it is set to 0, meaning the feature is not selected.
Step 4: Apply the feature selection scheme given by each binary grey wolf position vector to the kNN training set to select the corresponding feature data.
Step 5: Train on the feature data obtained in step 4 to obtain the accuracy $\mathrm{Accuracy} = \frac{TP}{TP + FP}$, where TP is the number of correct detections and FP is the number of erroneous detections; the error rate is $\mathrm{error} = 1 - \mathrm{Accuracy}$.
Step 6: Calculate the fitness of the feature subset, $\mathrm{fitness} = a \cdot \mathrm{error} + b \cdot \frac{m}{n}$, where a and b are weights (typically a = 0.99 and b = 0.01), m is the number of features in this feature subset, and n is the total number of features. This formula is used as the fitness value because feature selection must consider two factors: accuracy and feature subset length.
Step 7: Sort the fitness values of the feature subset schemes calculated in step 6 and designate the wolves with the three smallest values as the three leader wolves α, β and γ. Among them, the head wolf α is the wolf with the best fitness and corresponds to the current best feature subset; the β wolf has the second-best fitness, obeys the head wolf, and corresponds to the current second-best feature subset; the γ wolf has the third-best fitness, obeys the α and β wolves, and corresponds to the current third-best feature subset.
Step 8: Generate a random number in [0, 1) to determine the perturbation strategy for the α, β and γ wolves. With 0.5 as the boundary, perform step 8.1 if the random number lies in [0, 0.5]; otherwise perform step 8.2.
Step 8.1: Apply a Cauchy mutation perturbation to α, β and γ in turn to obtain new positions [formula not reproduced].
Step 8.2: Apply a Levy flight perturbation to α, β and γ in turn to obtain new positions [formula not reproduced], where w is a random number following the standard normal distribution and $\Gamma(x) = (x-1)!$.
Step 8.3: Calculate the fitness value of the result obtained in step 8.1 or 8.2, compare it with the positions and fitness values of the three wolves before the perturbation, and store the better result.
Step 9: The model of grey wolves encircling the prey is $X(t+1) = X_p(t) - A \circ D$, where $X_p(t)$ is the position vector of the prey in the current generation (here, the positions of the three best wolves), A is the convergence factor, C is the swing factor, and D is the distance between wolves, with $A = 2a \circ r_1 - a$, $C = 2 r_2$ and $D = |C \circ X_p(t) - X(t)|$. Here $\circ$ denotes the Hadamard product, a decreases nonlinearly from 2 to 0 over the iterations [formula not reproduced], and $r_1$ and $r_2$ are random vectors in [0, 1].
Step 10: In each iteration, keep the three best wolves in the current population and calculate the positions of the candidate wolves. The positions of the next-generation grey wolf population are obtained from the following formulas: $D_\alpha = |C_1 \circ X_\alpha(t) - X(t)|$, $D_\beta = |C_2 \circ X_\beta(t) - X(t)|$, $D_\gamma = |C_3 \circ X_\gamma(t) - X(t)|$, $X_1 = X_\alpha - A_1 \circ D_\alpha$, $X_2 = X_\beta - A_2 \circ D_\beta$, $X_3 = X_\gamma - A_3 \circ D_\gamma$, together with a final formula [not reproduced], where $X_\alpha$, $X_\beta$ and $X_\gamma$ are the three best wolves in the current population, and $D_\alpha$, $D_\beta$ and $D_\gamma$ are the distances between the current candidate grey wolf and the three best wolves.
Step 11: Apply a Cauchy mutation perturbation to all grey wolves.
Step 12: Repeat steps 3 to 11 until the maximum number of iterations is reached.
Step 13: For kNN testing, extract the test set data using the obtained optimal feature subset and perform detection analysis.
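Steps 3 to 7 above can be sketched as follows, under several assumptions. The fitness is taken to be a × error + b × m / n with n the total number of features, the error is measured on a held-out validation split, an empty feature subset is scored as infinitely bad, and a tiny pure-Python kNN stands in for whatever kNN implementation is actually used. The function names are hypothetical.

```python
from collections import Counter


def binarize(position, threshold=0.5):
    """Step 3: values above the threshold select the feature (1), else 0."""
    return [1 if x > threshold else 0 for x in position]


def knn_predict(train_X, train_y, sample, k=5):
    """Classify one sample by majority vote of its k nearest neighbours."""
    scored = [
        (sum((a - b) ** 2 for a, b in zip(row, sample)), label)
        for row, label in zip(train_X, train_y)
    ]
    scored.sort(key=lambda pair: pair[0])   # squared Euclidean distance
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def fitness(position, train_X, train_y, val_X, val_y, a=0.99, b=0.01, k=5):
    """Steps 4 to 6: weighted sum of error rate and relative subset size."""
    mask = binarize(position)
    m, n = sum(mask), len(mask)
    if m == 0:
        return float("inf")   # assumption: an empty subset is the worst case

    def select(row):
        return [x for x, keep in zip(row, mask) if keep]

    reduced_train = [select(row) for row in train_X]
    correct = sum(
        knn_predict(reduced_train, train_y, select(row), k) == label
        for row, label in zip(val_X, val_y)
    )
    error = 1 - correct / len(val_y)
    return a * error + b * m / n


def pick_leaders(population, fitness_values):
    """Step 7: the three wolves with the smallest fitness become the leaders."""
    order = sorted(range(len(population)), key=lambda i: fitness_values[i])
    return [population[i] for i in order[:3]]
```

With a = 0.99 and b = 0.01, the error rate dominates the score, and the subset-size term mainly breaks ties between subsets of equal accuracy in favour of the smaller one.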
Compared with the prior art, the invention has the following beneficial effects:
Logistic chaotic mapping is used when the grey wolf population is initialized, which improves the quality of the initial population and the exploitation capability of the algorithm; a perturbation is applied when the grey wolf population is updated, which mitigates the tendency to fall into local optima and improves global search capability; finally, an optimal feature subset with a small number of features is obtained by training and applied to the test set to obtain a higher detection accuracy.
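The Logistic chaotic initialization mentioned above might look like the sketch below. The text does not give the map parameter or say how the chaotic sequence is laid out across the population, so the following are assumptions: the control parameter `mu = 4.0`, random seed values for the first iterate, and taking each wolf as the next iterate of the map. The function name `logistic_init` is hypothetical.

```python
import random


def logistic_init(n_wolves=30, dim=41, mu=4.0, seed=0):
    """Build an initial population in [0, 1] from the logistic map.

    x_{k+1} = mu * x_k * (1 - x_k) is applied per dimension; each wolf is
    the next iterate of the sequence (an assumed layout).
    """
    rng = random.Random(seed)
    # Random starting values away from 0 and 1, which are degenerate points.
    x = [rng.uniform(0.01, 0.99) for _ in range(dim)]
    population = []
    for _ in range(n_wolves):
        x = [mu * v * (1.0 - v) for v in x]
        population.append(x)
    return population
```

The defaults mirror the stated parameters (N = 30 wolves, 41 features), so `logistic_init()` returns 30 position vectors of length 41, each ready to be thresholded at 0.5.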
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
FIG. 1 is a schematic flow chart of the intrusion detection feature selection method based on the Cauchy mutation grey wolf optimization algorithm according to the present invention. The specific implementation steps are as follows:
1. Randomly draw data from KDD Cup99, with 80% as the training set and 20% as the test set.
2. Set all parameters used by the grey wolf optimization algorithm and initialize a grey wolf population using Logistic chaotic mapping.
3. Apply the feature selection scheme corresponding to each wolf in the initial population to the kNN model: extract the selected features from the training set and train to obtain the accuracy $\mathrm{Accuracy} = \frac{TP}{TP + FP}$ and the error rate $\mathrm{error} = 1 - \mathrm{Accuracy}$. Combine these with the number of features in the feature subset to calculate the fitness value of each scheme, $\mathrm{fitness} = a \cdot \mathrm{error} + b \cdot \frac{m}{n}$, and sort to obtain the three best wolves as the α, β and γ wolves.
4. Perturb these three wolves with Levy flight or Cauchy mutation [formulas not reproduced], calculate the fitness values after the perturbation, compare, and store the best result.
6. Apply a Cauchy mutation perturbation to the current grey wolf population to obtain the next-generation grey wolf population.
8. Check whether the maximum number of iterations has been reached. If not, return to step 4 and perform each step in turn; once the maximum number of iterations is reached, exit the loop, completing the search for the optimal feature subset.
9. Using the obtained optimal feature subset, extract features from the KDD Cup99 test set and run kNN detection to obtain the test accuracy.
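The core position update and the surrounding loop can be sketched as below. This is a simplified illustration, not the patented procedure: it omits the leader and population perturbations, combines X1, X2 and X3 by a plain average as in the standard grey wolf optimizer (the patent's combining formula is not reproduced in this text), and uses an assumed quadratic schedule for the nonlinear decrease of `a` from 2 to 0. The names `gwo_update` and `optimise` are hypothetical.

```python
import random


def gwo_update(wolf, leaders, a, rng):
    """Move one wolf toward the alpha, beta and gamma leaders."""
    candidates = []
    for leader in leaders:
        new_pos = []
        for x, x_lead in zip(wolf, leader):
            A = 2 * a * rng.random() - a      # A = 2a * r1 - a
            C = 2 * rng.random()              # C = 2 * r2
            D = abs(C * x_lead - x)           # D = |C * X_leader - X|
            new_pos.append(x_lead - A * D)    # X_k = X_leader - A * D
        candidates.append(new_pos)
    # Assumption: plain average of X1, X2, X3, clipped to [0, 1].
    return [min(1.0, max(0.0, sum(c) / 3)) for c in zip(*candidates)]


def optimise(fitness_fn, population, max_iter=50, seed=0):
    """Run the simplified loop and return the best position found."""
    rng = random.Random(seed)
    for t in range(max_iter):
        ranked = sorted(population, key=fitness_fn)
        leaders = ranked[:3]                  # the three best wolves are kept
        a = 2 * (1 - (t / max_iter) ** 2)     # assumed nonlinear decay from 2
        followers = [gwo_update(w, leaders, a, rng) for w in ranked[3:]]
        population = leaders + followers
    return min(population, key=fitness_fn)
```

Because the three leaders are carried over unchanged in every iteration, the best fitness in the population never gets worse from one generation to the next.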
Claims (1)
1. An intrusion detection feature selection method based on an improved grey wolf optimization algorithm, characterized by comprising the following steps:
step 1: take 80% of the KDD Cup99 data set as a training set and 20% as a test set; choose kNN as the intrusion detection classifier, with k = 5, where k is the number of nearest neighbours and each sample can be represented by its k nearest neighbours;
step 2: set the parameters of the improved grey wolf algorithm; the wolf pack size N is 30 and the maximum number of iterations T is 50; initialize the population, where the position of each wolf is a vector X whose dimension is 41, the number of features in the selected data set; initialization produces the first-generation wolf pack;
step 3: convert every position in the grey wolf population to binary using 0.5 as the threshold: in each dimension, a value above 0.5 is set to 1, meaning the feature represented by that dimension is selected; otherwise it is set to 0, meaning the feature is not selected;
step 4: apply the feature selection scheme given by each binary grey wolf position vector to the kNN training set to select the corresponding feature data;
step 5: train on the feature data obtained in step 4 to obtain the accuracy $\mathrm{Accuracy} = \frac{TP}{TP + FP}$, where TP is the number of correct detections and FP is the number of false detections, and the error rate is $\mathrm{error} = 1 - \mathrm{Accuracy}$;
step 6: calculate the fitness of the feature subset, $\mathrm{fitness} = a \cdot \mathrm{error} + b \cdot \frac{m}{n}$, where a and b are weights (usually a = 0.99 and b = 0.01), m is the number of features in the feature subset, and n is the total number of features; this formula is used as the fitness value because feature selection must consider two factors, accuracy and feature subset length;
step 7: sort the fitness values of the feature subset schemes calculated in step 6 and designate the wolves with the three smallest values as the three leader wolves α, β and γ; among them, the head wolf α is the wolf with the best fitness and corresponds to the current best feature subset; the β wolf has the second-best fitness, obeys the head wolf, and corresponds to the current second-best feature subset; the γ wolf has the third-best fitness, obeys the α and β wolves, and corresponds to the current third-best feature subset;
step 8: generate a random number in [0, 1) to determine the perturbation strategy for the α, β and γ wolves; with 0.5 as the boundary, perform step 8.1 if the random number lies in [0, 0.5], otherwise perform step 8.2;
step 8.1: apply a Cauchy mutation perturbation to α, β and γ in turn to obtain new positions [formula not reproduced];
step 8.2: apply a Levy flight perturbation to α, β and γ in turn to obtain new positions [formula not reproduced], where $u = w \cdot \sigma$, v and w are random numbers following the standard normal distribution, and $\Gamma(x) = (x-1)!$;
step 8.3: calculate the fitness value of the result obtained in step 8.1 or 8.2, and store and update the positions and fitness values of the three wolves;
step 9: the model of grey wolves encircling the prey is $X(t+1) = X_p(t) - A \circ D$, where $X_p(t)$ is the position vector of the prey in the current generation (here, the positions of the three best wolves), A is the convergence factor, C is the swing factor, and D is the distance between wolves, with $A = 2a \circ r_1 - a$, $C = 2 r_2$ and $D = |C \circ X_p(t) - X(t)|$; $\circ$ denotes the Hadamard product, a decreases nonlinearly from 2 to 0 over the iterations [formula not reproduced], and $r_1$ and $r_2$ are random vectors in [0, 1];
step 10: in each iteration, keep the three best wolves in the current population and calculate the positions of the candidate wolves; the positions of the next-generation grey wolf population are obtained from the following formulas: $D_\alpha = |C_1 \circ X_\alpha(t) - X(t)|$, $D_\beta = |C_2 \circ X_\beta(t) - X(t)|$, $D_\gamma = |C_3 \circ X_\gamma(t) - X(t)|$, $X_1 = X_\alpha - A_1 \circ D_\alpha$, $X_2 = X_\beta - A_2 \circ D_\beta$, $X_3 = X_\gamma - A_3 \circ D_\gamma$, together with a final formula [not reproduced], where $X_\alpha$, $X_\beta$ and $X_\gamma$ are the three best wolves in the current population, and $D_\alpha$, $D_\beta$ and $D_\gamma$ are the distances between the current candidate grey wolf and the three best wolves;
step 11: apply a Cauchy mutation perturbation to all grey wolves [formula not reproduced];
step 12: repeat steps 3 to 11 until the maximum number of iterations is reached;
step 13: for kNN testing, extract the test set data using the obtained optimal feature subset and perform detection analysis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210321742.9A CN114897124A (en) | 2022-03-25 | 2022-03-25 | Intrusion detection feature selection method based on improved wolf optimization algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210321742.9A CN114897124A (en) | 2022-03-25 | 2022-03-25 | Intrusion detection feature selection method based on improved wolf optimization algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114897124A (en) | 2022-08-12 |
Family
ID=82714788
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210321742.9A Pending CN114897124A (en) | 2022-03-25 | 2022-03-25 | Intrusion detection feature selection method based on improved wolf optimization algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114897124A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116886398A (en) * | 2023-08-03 | 2023-10-13 | 中国石油大学(华东) | Internet of things intrusion detection method based on feature selection and integrated learning |
CN117354013A (en) * | 2023-10-11 | 2024-01-05 | 中国电子科技集团公司第三十研究所 | Fishing attack detection method based on wolf group hunting algorithm |
- 2022-03-25 CN CN202210321742.9A patent/CN114897124A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116886398A (en) * | 2023-08-03 | 2023-10-13 | 中国石油大学(华东) | Internet of things intrusion detection method based on feature selection and integrated learning |
CN116886398B (en) * | 2023-08-03 | 2024-03-29 | 中国石油大学(华东) | Internet of things intrusion detection method based on feature selection and integrated learning |
CN117354013A (en) * | 2023-10-11 | 2024-01-05 | 中国电子科技集团公司第三十研究所 | Fishing attack detection method based on wolf group hunting algorithm |
CN117354013B (en) * | 2023-10-11 | 2024-04-23 | 中国电子科技集团公司第三十研究所 | Fishing attack detection method based on wolf group hunting algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114897124A (en) | Intrusion detection feature selection method based on improved wolf optimization algorithm | |
CN112491796B (en) | Intrusion detection and semantic decision tree quantitative interpretation method based on convolutional neural network | |
CN110070060B (en) | Fault diagnosis method for bearing equipment | |
CN110704840A (en) | Convolutional neural network CNN-based malicious software detection method | |
CN107392919B (en) | Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method | |
CN113344113B (en) | Yolov3 anchor frame determination method based on improved k-means clustering | |
CN112950445A (en) | Compensation-based detection feature selection method in image steganalysis | |
CN114692156B (en) | Memory segment malicious code intrusion detection method, system, storage medium and equipment | |
CN113098862A (en) | Intrusion detection method based on combination of hybrid sampling and expansion convolution | |
CN114065933B (en) | Unknown threat detection method based on artificial immunity thought | |
CN116484289A (en) | Carbon emission abnormal data detection method, terminal and storage medium | |
CN115987552A (en) | Network intrusion detection method based on deep learning | |
CN113591962B (en) | Network attack sample generation method and device | |
CN108737429B (en) | Network intrusion detection method | |
CN117155701A (en) | Network flow intrusion detection method | |
CN116520795A (en) | Industrial control system abnormality detection method based on field opening method | |
Thanh et al. | An approach to reduce data dimension in building effective network intrusion detection systems | |
CN111581640A (en) | Malicious software detection method, device and equipment and storage medium | |
CN115296837B (en) | Sustainable integrated intrusion detection method based on SSA optimization | |
CN115996135B (en) | Industrial Internet malicious behavior real-time detection method based on feature combination optimization | |
Maru et al. | Combining transformer with a discriminator for anomaly detection in multivariate time series | |
CN117390688B (en) | Model inversion method based on supervision training | |
CN111080727B (en) | Color image reconstruction method and device and image classification method and device | |
CN118298809A (en) | Open world fake voice attribution method and system based on soft comparison fake learning | |
CN116647409A (en) | Invasion detection method based on WK-1DCNN-GRU hybrid model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||